Arshad Kazi
Data Scientist & Computer Vision Engineer
About
Computer vision and machine learning engineer with five years of industry experience. My work has ranged from NLP and document understanding at a large financial data company to building real-time 3D pose estimation and player analysis systems at a sports tech startup. I have shipped models to production using FastAPI, Docker, and AWS, and worked across the full pipeline from data preparation through inference optimization.
On the research side, I work on 3D human pose reconstruction from 2D video, with a focus on temporal modeling and efficient inference. I tend to understand things by reimplementing them from scratch (NeRF, DINO, MotionBERT-style pose uplift) before working with them in applied settings. I have also published in NLP, specifically on multilingual code-switching with language models.
Experience
Data Scientist, Computer Vision
Remote Resource, Noida
Early-stage startup. I was brought in to independently build the computer vision side of a sports analytics platform: no existing codebase, no team to hand things off to. I owned the full system from video ingestion to structured insights.
- Built a 3D pose estimation pipeline for extracting skeletal keypoints and motion features from sports video. Engineered 15+ biomechanical features including joint angles, velocities, and trajectories for downstream performance analysis.
- Implemented temporal consistency mechanisms (smoothing and interpolation) to handle occlusions and improve stability across frames.
- Optimized inference to achieve near real-time processing at 15 to 25 FPS on standard GPU hardware.
- Built an object detection pipeline for fast-moving objects, reaching 70% detection accuracy under challenging conditions.
- Deployed a video junk classification system with 95% accuracy and low latency in production.
- Deployed the full system on AWS EC2 using Docker and FastAPI. Pitched project ideas directly to clients and iterated based on their feedback.
Independent Researcher
Self-directed
After nearly three years at Morningstar, I hit a wall. Not burnout exactly, more like a growing sense that I was drifting. The work was good on paper, but I had stopped feeling curious about it. I left without a clear plan, which felt uncomfortable at the time but turned out to be necessary.
I spent a few months trying to figure out what I actually wanted to work on. I explored a lot, read widely, experimented with ideas that had nothing to do with my job title, and got frustrated with how much of the ML ecosystem felt like repackaged hype. I became less interested in following trends and more interested in understanding things deeply. That shift eventually pulled me back toward research and toward computer vision specifically, which is where I had always found the most interesting problems.
During this period I also wrote and submitted a research paper, and started working on a 3D pose research project that I have continued alongside my current role.
- Published "Contextual Code Switching for Machine Translation using Language Models", accepted at IEEE SmartGenCon, Bangalore. Compared model size vs. accuracy for multilingual translation across LLMs, Prompt Tuning, and LoRA fine-tuning.
- Started research on 3D human pose reconstruction from 2D inputs using spatial and temporal modeling, ongoing since Jan 2025, currently in the paper-writing phase.
- Implemented 3D image segmentation for CT and MRI scans, including a custom UNet architecture for volumetric data on lower-RAM GPU setups.
Associate Data Scientist
Morningstar Inc., Mumbai
My first proper industry role. Worked on NLP and document understanding for financial data, a domain with real messiness (scanned PDFs, multilingual docs, inconsistent formatting) that forced me to care about production quality, not just benchmark numbers. Received Employee of the Month and Star of the Quarter recognition multiple times.
- Built a custom question-answering model for financial documents using RoBERTa trained on a custom dataset. Implemented text extraction from scanned images using Tesseract OCR.
- Automated 75% of document processing and deployed the pipeline on AWS Lambda, significantly reducing manual intervention.
- Implemented multi-class document classification using SVM, Random Forest, and XGBoost across multiple global markets.
- Built an end-to-end data extraction and preprocessing pipeline with Tesseract OCR, achieving 92% accuracy, above the manual baseline. Deployed on AWS EC2.
Research
Evaluated model size versus translation accuracy for multilingual code-switching tasks across multiple LLMs. Compared zero-shot prompting, Prompt Tuning, and LoRA fine-tuning strategies on unseen multilingual data.
Custom model for lifting 2D keypoints to 3D poses, with GPU-based training pipeline, custom loss function, and inference optimization. Paper writing currently underway.
Custom 3D UNet for volumetric medical image segmentation on lower-RAM GPU setups, with patch-wise training and organ-level labeling.
Projects
2D to 3D Human Pose Uplift
Transformer-based system inspired by MotionBERT that converts 2D keypoints into 3D human poses. Includes temporal modeling, full training pipeline, and 3D visualization.
DINO from Scratch
Full reimplementation of DINO (Self-Distillation with No Labels) in PyTorch with Vision Transformers, momentum encoders, centering, sharpening, and multi-view alignment.
NeRF from Scratch
Neural Radiance Fields implementation in PyTorch with positional encoding, hierarchical volume rendering, and efficient GPU batching for novel 3D view synthesis from 2D images.
3D U-Net for Medical Segmentation
3D U-Net for CT/MRI volumetric segmentation with patch-wise training, organ-level labeling, and volumetric visualization. Designed to train on lower-RAM GPUs.
YOLO-NAS on Indian Street Signs
Fine-tuned YOLO-NAS via SuperGradients on a custom Indian street sign dataset. Includes custom annotation support, training scripts, and optimized inference setup.
RAG on PDF
Retrieval-Augmented Generation pipeline for answering questions from PDFs using LangChain, FAISS, and LLMs. Optimized for local-first use and fast contextual retrieval.
Named Entity Recognition
Fine-tuned RoBERTa for token-level NER on custom data. Includes full preprocessing, training loop, and clean inference pipeline.
Notable
ML blog with 100k+ visits. Referenced in Wiley's Artificial Intelligence Programming with Python, pages 453–454.
Open source projects with 300+ stars on GitHub.
Top 3 at HackSRM: built a real-time deep learning model for heart disease prediction.
Top 0.01% in CodeVita (TCS) and HackWithInfy (Infosys) coding competitions.