About

This page gives a detailed overview of my past work.

💼 Work Experience

Wolters Kluwer (Jan 2023 - Present) - Machine Learning Engineer

Borrower Analytics 2.0 (UCC filings)
- Processed 65M UCC filings for information extraction from PDFs (>1B single-page images).
- Fine-tuned and deployed a vision-language model to extract 50+ fields from UCC1/UCC3 across diverse state formats; achieved 95% extraction accuracy.
- Served the model on a high-throughput serving stack, sustaining ~10 images/sec on H100 GPUs.
- Built an LLM-as-a-judge framework to generate high-quality labeled data, significantly reducing human labeling effort.
- Trained and deployed a transformer for text segmentation (97% recall) extracting complete collateral text.
- Trained a lien-classification model (95% accuracy) based on collateral text.
IRA Knowledge-Base Chatbot (RAG)
- Built a RAG chatbot over an XML-based IRA knowledge base for faster query resolution.
- Implemented query transformation, parent–child retrieval, hierarchical indexing, and multi-vector retrieval in LanceDB.
- Improved retrieval with ColBERT reranking and hybrid search (semantic + BM25).
- Reached ~97% CSAT, improving user experience and response accuracy.
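
The hybrid search step above combines a semantic ranking with a BM25 ranking. The fusion method used in the project is not stated here, so the following is only a minimal sketch assuming reciprocal rank fusion (RRF); the function name, the document IDs, and the `k=60` constant are illustrative assumptions, not details of the production system.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default (an assumption here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical outputs of the two retrievers for one query.
semantic_hits = ["doc_b", "doc_a", "doc_d"]
bm25_hits = ["doc_a", "doc_c", "doc_b"]

fused = reciprocal_rank_fusion([semantic_hits, bm25_hits])
print(fused)
```

Documents that appear near the top of both lists rise to the top of the fused list, which would then be passed to a reranker such as ColBERT.
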
WK AI Studio (internal fine-tuning platform)
- Built an end-to-end internal framework to fine-tune and deploy models to production.
- Enabled non-technical teams to fine-tune models with their own data.
- Integrated MLflow with full lineage, model registry, and dataset/model versioning for production tracking.
- Supported distributed multi-GPU training, 4-/8-bit training, LoRA/QLoRA, mixed precision, and differential learning rates, among other techniques.
BLMS (Business License Match/Search)
- Built a search engine to recommend required licenses for starting a business.
- Developed a HyDE (Hypothetical Document Embeddings) workflow to expand short user queries into meaningful search text.
- Used synthetically generated descriptions from an internal taxonomy and a reranker, achieving 96% retrieval accuracy.
Proviso (legal citations chatbot)
- Built a RAG-based chatbot on 91k legal citations for efficient legal query resolution.
- Enhanced retrieval with metadata filtering, query transformation, and techniques similar to those used in the IRA chatbot.
- Used BERTopic to generate hypothetical topics, which served as metadata filters to reduce the search space and improve query handling.
Earlier — Data Science Intern (Jan–Jul 2023)
- Built an in-house key information extraction solution using a transformer-based stack.
- Shipped a document classification pipeline.

Weights & Biases (May 2022 - Present) - Ambassador

Engineered optimized Kaggle notebooks with integrated W&B tracking and monitoring.
Authored technical reports showcasing W&B across medical imaging, visual-language models (Flamingo, BLIP-2), few-shot learning (SetFit), PyTorch 2.0, Hugging Face, RAG, and LLMs.
You can find all my W&B blogs here

📈 Competitions

I love participating in machine learning competitions. I am primarily active on Kaggle and am a Kaggle Competition Expert.

WSDM Cup — Multilingual Chatbot Arena (Kaggle, 2024) — 31/950 🥈

The challenge was to develop a reward model (as used in the RLHF stage) for multilingual human conversations from Chatbot Arena (formerly LMSYS).
Fine-tuned LLMs as reward models in a classification setting, using multi-stage training (pretraining, then fine-tuning), pseudo-labelling, LoRA/QLoRA, efficient inference techniques, and knowledge distillation.
Competition Link | Code

Kaggle — LLM Science Exam (2023) — 123/2664 🥈

Multiple-choice science questions; evaluated by MAP@3.
Dual retrieval: TF-IDF over curated Wikipedia corpora + dense retrieval (BGE-small-en v1.5) with FAISS.
Context assembly: join top-K passages into a single context string per question.
Answering model: DeBERTa-v3-Large as multiple-choice scorer (context + question paired with each option).
Ensembling: soft-average logits across checkpoints × retrieval variants for the final MAP@3 submission.
Competition Link
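
The MAP@3 metric and the soft-averaging step described above can be sketched in a few lines. This is a simplified illustration, not the competition code: the logits are made up, the five-option `"ABCDE"` layout and the helper names are assumptions, and real submissions averaged many more checkpoints and retrieval variants.

```python
def map_at_3(predictions, answers):
    """Mean average precision at 3 with exactly one correct answer per question.

    A question scores 1, 1/2, or 1/3 if the answer is ranked 1st, 2nd, or 3rd,
    and 0 otherwise.
    """
    total = 0.0
    for ranked, answer in zip(predictions, answers):
        for position, option in enumerate(ranked[:3], start=1):
            if option == answer:
                total += 1.0 / position
                break
    return total / len(answers)

def soft_average(logit_sets):
    """Average per-option logits element-wise across models for one question."""
    n_models = len(logit_sets)
    return [sum(column) / n_models for column in zip(*logit_sets)]

def top3(logits, options="ABCDE"):
    """Return the three option letters with the highest logits."""
    order = sorted(range(len(logits)), key=lambda i: -logits[i])
    return [options[i] for i in order[:3]]

# Hypothetical logits from two models for one five-option question.
model_1 = [2.0, 0.5, 1.0, -1.0, 0.0]
model_2 = [0.0, 0.1, 3.0, -0.5, 0.2]

averaged = soft_average([model_1, model_2])
prediction = top3(averaged)
score = map_at_3([prediction], ["A"])
print(prediction, score)
```

Here the averaged logits rank option C first and the correct answer A second, so the question contributes 1/2 to MAP@3.
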

DataSolve-India (Wolters Kluwer, 2022) — 1st place 🥇

The objective was to categorize regulations that are crucial for business compliance (multi-label classification).
Used a weighted hill-climbing ensemble of transformer models (DeBERTa-v3, RoBERTa) and GBDTs (CatBoost, XGBoost).
Competition Link | Code
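
For readers unfamiliar with hill-climbing ensembles, here is a minimal sketch of the general idea: greedy forward selection with replacement on validation predictions, where selection counts become the blend weights. The model names, predictions, and the negative-MSE metric are hypothetical placeholders; the actual competition solution and its metric may have differed.

```python
def hill_climb(model_preds, targets, metric, max_steps=20):
    """Greedy ensemble selection with replacement.

    Starting from an empty ensemble, repeatedly add the model whose inclusion
    gives the best averaged validation score; stop when nothing improves.
    Returns per-model weights (selection counts normalised to sum to 1)
    and the best validation score reached.
    """
    names = list(model_preds)
    counts = {name: 0 for name in names}
    running_sum = [0.0] * len(targets)
    best_score = float("-inf")
    for step in range(1, max_steps + 1):
        best_name = None
        for name in names:
            # Score the ensemble as if this model were added once more.
            blended = [(s + p) / step for s, p in zip(running_sum, model_preds[name])]
            score = metric(blended, targets)
            if score > best_score:
                best_score, best_name = score, name
        if best_name is None:  # no candidate improved the score
            break
        counts[best_name] += 1
        running_sum = [s + p for s, p in zip(running_sum, model_preds[best_name])]
    total = sum(counts.values())
    return {name: c / total for name, c in counts.items()}, best_score

def neg_mse(preds, targets):
    """Higher-is-better metric: negative mean squared error (placeholder)."""
    return -sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)

# Hypothetical out-of-fold predictions for three models.
targets = [1.0, 0.0, 1.0, 0.0]
val_preds = {
    "model_a": [0.9, 0.3, 0.6, 0.2],
    "model_b": [0.5, 0.1, 0.9, 0.4],
    "model_c": [0.5, 0.5, 0.5, 0.5],
}
weights, score = hill_climb(val_preds, targets, neg_mse)
print(weights, score)
```

Because a model can be picked more than once, strong models end up with larger weights, and models that never help (like `model_c` here) get weight zero.
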

U.S. Patent Phrase-to-Phrase Matching (Kaggle, 2022) — 31/1889 🥈

The task was to extract relevant information by determining the semantic similarity between key phrases in patent documents.
Used a hill-climbing ensemble over a range of transformer models trained with different strategies to ensure diversity.
Competition Link | Code

Happywhale — Whale & Dolphin Identification (Kaggle, 2022) — 132/1588 🥉

Individual re-identification using dorsal fin/marking signatures.
Curated and published resized training sets to accelerate iteration and stabilize training.
Built a robust visual re-ID pipeline (embedding model + nearest-neighbour matching), with heavy augmentation and careful per-individual CV to avoid leakage.
Iterated on mining schedules and validation sanity checks for consistent generalization.
Competition Link
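
The matching half of the re-ID pipeline (embedding model + nearest-neighbour matching) can be illustrated roughly as follows. This sketch assumes embeddings are already computed; the toy 3-dimensional vectors, the IDs, the `identify` helper, and the `new_threshold` cut-off for unseen individuals are all illustrative assumptions rather than the pipeline actually used.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def identify(query, gallery, new_threshold=0.5):
    """Return (individual_id, similarity) for the nearest gallery embedding.

    If even the best match is below new_threshold (a hypothetical value),
    the query is labelled "new_individual" instead.
    """
    best_id, best_sim = None, -1.0
    for individual_id, embedding in gallery:
        sim = cosine_similarity(query, embedding)
        if sim > best_sim:
            best_id, best_sim = individual_id, sim
    if best_sim < new_threshold:
        return "new_individual", best_sim
    return best_id, best_sim

# Toy gallery of (individual ID, embedding) pairs.
gallery = [
    ("whale_1", [1.0, 0.0, 0.0]),
    ("whale_2", [0.0, 1.0, 0.0]),
    ("dolphin_7", [0.6, 0.8, 0.0]),
]

known_id, known_sim = identify([0.1, 0.9, 0.1], gallery)
unknown_id, unknown_sim = identify([0.0, 0.0, 1.0], gallery)
print(known_id, unknown_id)
```

In practice the gallery would hold embeddings of the training images, and the threshold would be tuned on a validation split that keeps each individual's images together.
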

Amazon ML Challenge (HackerEarth, 2021) — 11/3294

The task was to categorize products into browse node IDs on a large dataset: 2.67GB of text with 9k+ classes.
Used standard handcrafted features, sentence embeddings, TF-IDF and a custom neural network to merge all features.
Competition Link | Code (team)

Bristol-Myers Squibb — Molecular Translation (Kaggle, 2021) — 50/874 🥈

The task was to interpret old chemical images and convert them back to the underlying chemical structure, annotated as InChI text.
Used a Vision Transformer (ViT) as the encoder and the original Transformer decoder.
Generated 12M synthetic images with RDKit for better ViT performance.
Competition Link

Sartorius — Cell Instance Segmentation (Kaggle, 2021) — 117/1505 🥉

Detect and delineate single neuronal cells in microscopy images (instance segmentation).
Implemented a Detectron2-based Mask R-CNN pipeline (with tuned anchors/thresholds) as the primary model.
Added a parallel Cellpose track for comparison; tracked results and failures to guide post-processing.
Applied TTA (flips/scales) and morphological post-processing (small-object cleanup, hole-filling) to refine masks.
Deployed a small demo app for qualitative review and error analysis.
Competition Link | Code
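
As an illustration of the small-object cleanup mentioned above, the sketch below removes connected foreground components under a size threshold from a binary mask. It is a plain-Python stand-in written for clarity; the function name, the 4-connectivity choice, and the `min_size` value are assumptions, and a real pipeline would more likely rely on an image-processing library.

```python
from collections import deque

def remove_small_objects(mask, min_size):
    """Zero out 4-connected foreground components with fewer than min_size pixels."""
    height, width = len(mask), len(mask[0])
    cleaned = [row[:] for row in mask]
    seen = [[False] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if mask[r][c] == 0 or seen[r][c]:
                continue
            # Breadth-first search to collect one connected component.
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Erase components that are too small to be a plausible cell.
            if len(component) < min_size:
                for y, x in component:
                    cleaned[y][x] = 0
    return cleaned

# Toy mask: one 2x2 blob and two isolated single-pixel specks.
mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
]
cleaned = remove_small_objects(mask, min_size=3)
for row in cleaned:
    print(row)
```

The 2x2 blob survives while both specks are removed; the input mask itself is left unmodified.
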

🌍 Open-Source Contributions

Fixed an example script in the Hugging Face 🤗 Transformers repository for XLA devices - PR
Made experiment trackers launch only on the main process in distributed setups in the 🤗 Accelerate library - PR
Fixed several examples and removed the check for main process… in the 🤗 Accelerate library - PR
Updated several 🤗 Transformers no_trainer scripts leveraging 🤗 Accelerate… - PR
Contributed a report to Weights & Biases showcasing the integration of MONAI and W&B - PR

🎤 Talks

I frequently give talks on machine learning and MLOps topics. Most of my talks to colleges and small groups are not recorded, but here is one notable recorded presentation:

Weights & Biases MLOps Conference (Fully Connected 2023) - I was a speaker at the inaugural Weights & Biases MLOps conference. You can listen to my talk here. Also here’s the announcement post.

I’m open to giving talks! If you’re interested in having me speak at your event, please reach out to me at atharvaaingle@gmail.com.