Hey, I'm

Amjad Ali

Data & AI Engineer focused on building reliable pipelines, intelligent systems, and real-world solutions.

Explore my projects, experience, and ongoing journey.

New York


Featured Project

ML Risk Model — Results Explorer

Dual-model approach: origination predictor (honest AUC) + behavioral segmentation (risk ranking)

AUC-ROC (Origination Model)
Capture at Top 10%
Capture at Top 20%
Risk Segments Scored
Model B, the origination model, uses only origination features (credit score, LTV, DTI, interest rate, loan age) and no payment history. This ensures genuine predictive capability without data leakage.
Feature importance from the origination-only model. Credit score is the dominant predictor — consistent with fundamental credit risk theory.
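
One way to enforce the origination-only restriction is an explicit feature allowlist applied before any training code sees the data. The sketch below is illustrative only: the column names and the `origination_view` helper are hypothetical, since the page names the feature concepts but not the dataset schema.

```python
# Hypothetical column names: the page lists the feature concepts, not the schema.
ORIGINATION_FEATURES = ("credit_score", "ltv", "dti", "interest_rate", "loan_age")


def origination_view(record: dict) -> dict:
    """Keep only allowlisted origination fields; fail loudly if one is missing."""
    missing = [name for name in ORIGINATION_FEATURES if name not in record]
    if missing:
        raise KeyError(f"missing origination features: {missing}")
    return {name: record[name] for name in ORIGINATION_FEATURES}


# Toy record with two made-up payment-history fields that must not reach the model.
loan = {
    "credit_score": 712,
    "ltv": 80.0,
    "dti": 36.5,
    "interest_rate": 6.25,
    "loan_age": 14,
    "months_delinquent": 2,
    "last_payment_amount": 0.0,
}
features = origination_view(loan)
print(features)
```

An allowlist is used here rather than a blocklist so that a payment-history column added later is excluded by default.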

Top risk segments from the behavioral model (Model A): the segments with the highest concentration of delinquent loans.

Credit Band | LTV | Rate | Vintage | Loans | Risk Score | Actual DLQ
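
As a rough illustration of how a segment table like this could be assembled, the sketch below groups loans by segment keys and ranks the segments by observed delinquency rate. It covers only the loan count and actual delinquency rate; the behavioral model's risk score is not reproduced. The field names (`credit_band`, `ltv_band`, `delinquent`) and the toy loans are assumptions, not the project's data.

```python
from collections import defaultdict


def rank_segments(loans, keys, min_loans=1):
    """Group loans by `keys` and rank segments by observed delinquency rate."""
    totals = defaultdict(lambda: [0, 0])  # segment -> [loan count, delinquent count]
    for loan in loans:
        segment = tuple(loan[k] for k in keys)
        totals[segment][0] += 1
        totals[segment][1] += loan["delinquent"]
    rows = [
        {"segment": seg, "loans": n, "dlq_rate": d / n}
        for seg, (n, d) in totals.items()
        if n >= min_loans
    ]
    return sorted(rows, key=lambda r: r["dlq_rate"], reverse=True)


# Toy loans with hypothetical field names and made-up values.
loans = (
    [{"credit_band": "<620", "ltv_band": ">90", "delinquent": d} for d in (1, 0)]
    + [{"credit_band": "620-679", "ltv_band": "80-90", "delinquent": d} for d in (1, 0, 0, 0)]
    + [{"credit_band": "740+", "ltv_band": "<80", "delinquent": 0} for _ in range(3)]
)
ranked = rank_segments(loans, keys=("credit_band", "ltv_band"))
for row in ranked:
    print(row)
```

The `min_loans` parameter is there because very small segments can show extreme delinquency rates by chance.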
Single-snapshot limitation.
The dataset is a single monthly snapshot, not a time series. True forward prediction requires sequential monthly data. The origination model predicts current delinquency status from the borrower profile, which is valuable for risk stratification but is not a deployment-ready forecasting tool.
Class imbalance (99:1).
Only ~1% of loans are delinquent, so precision is inherently low. The model addresses this with balanced class weights and is evaluated with lift metrics rather than raw precision.
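
The two ideas above can be sketched in plain Python. The weighting follows the common "balanced" heuristic, n_samples / (n_classes * class_count), and the capture function measures the share of all delinquent loans found in the top-scored fraction. The function names and toy data are illustrative assumptions, not the project's code.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


def capture_at_top(scores, labels, fraction):
    """Share of all positives found in the top `fraction` of loans by score."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("no positive labels to capture")
    n_top = max(1, round(len(scores) * fraction))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:n_top]) / total_pos


def lift_at_top(scores, labels, fraction):
    """Capture rate divided by the fraction reviewed; 1.0 is no better than random."""
    return capture_at_top(scores, labels, fraction) / fraction


# 99:1 imbalance: the rare class gets a much larger weight.
weights = balanced_class_weights([1] + [0] * 99)

# Toy ranking: 20 loans scored in descending order, delinquent at ranks 1 and 5.
scores = [20 - i for i in range(20)]
labels = [0] * 20
labels[0] = 1
labels[4] = 1
capture_10 = capture_at_top(scores, labels, 0.10)
capture_25 = capture_at_top(scores, labels, 0.25)
lift_10 = lift_at_top(scores, labels, 0.10)
print(weights, capture_10, capture_25, lift_10)
```

Capture and lift depend only on how loans are ranked, which is why they stay informative when the base rate is too low for raw precision to look meaningful.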
What would improve it.
Monthly time-series data, additional features (employment, payment amounts, forbearance history), gradient boosting ensembles, and a calibration layer for dollar-loss estimation.
Why the dual-model design matters.
The first attempt produced AUC = 1.0 because of data leakage. The redesign separates Model B (origination, AUC ~0.77), which gives an honest prediction, from Model A (behavioral), which is used only for segmentation. This distinction is what mortgage analytics teams value.
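
A simple screen for this kind of leakage is to compute the AUC of each feature on its own and flag any that separates the classes almost perfectly. The sketch below is a minimal illustration under assumptions: the 0.95 threshold, the `months_delinquent` field, and the toy rows are hypothetical, and the page does not say which feature caused the original leak.

```python
def auc(scores, labels):
    """AUC as the chance a random positive outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def flag_leaky_features(rows, labels, threshold=0.95):
    """Return features whose single-feature AUC is suspiciously close to 0 or 1."""
    flagged = {}
    for name in rows[0]:
        score = auc([row[name] for row in rows], labels)
        if max(score, 1 - score) >= threshold:
            flagged[name] = score
    return flagged


# Toy data: months_delinquent is a stand-in for a field that encodes the label.
labels = [0, 0, 0, 1, 1, 0]
rows = [
    {"credit_score": 760, "months_delinquent": 0},
    {"credit_score": 640, "months_delinquent": 0},
    {"credit_score": 720, "months_delinquent": 0},
    {"credit_score": 650, "months_delinquent": 2},
    {"credit_score": 700, "months_delinquent": 3},
    {"credit_score": 680, "months_delinquent": 0},
]
flagged = flag_leaky_features(rows, labels)
print(flagged)
```

The pairwise AUC here is quadratic in the number of rows, which is fine for a spot check on a sample but would be replaced by a rank-based computation on a full loan tape.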

Other Projects

Career & Education

July 2025 — Present
Research Assistant
University of New Haven · West Haven, CT
LLM-assisted software security research — vulnerability detection, multilingual code analysis across 6 languages, and explainable automated repair workflows for real codebases.
August 2023 — May 2025
MS in Data Science (STEM)
University of New Haven
AWS (Athena, Glue, S3, Lambda), Power BI, NLP, Math for Data Scientists.
June 2024 — August 2024
Solutions Engineering Intern
Bitwise Inc. · Schaumburg, IL
Built Python GenAI service with LangChain orchestration, multi-LLM routing, and SQL-to-PySpark conversion pipeline. Led cross-team final presentation across frontend, backend, and GenAI tracks.
August 2021 — August 2023
Senior Analyst
Capgemini
Fortune 50 licensing analytics — 50+ Snowflake SQL scripts, ETL migration from Informatica to AWS/Snowflake, end-to-end data validation across S3, DB2, and Snowflake.
March 2021 — May 2021
Data Analyst Intern
DevTown (Shape AI)
Supervised ML models on labeled datasets — 89.5% classification accuracy, 92.9% fraud recall. Deployed regression model as interactive web application.
August 2017 — May 2021
BE in Electronics & Telecommunication
University of Pune
AI, Machine Learning, Data Structures & Algorithms, OOP, SQL.

Credentials & Badges

Professional Certifications
OCI GenAI
OCI Generative AI Professional
OCI Vector Search
OCI AI Vector Search Professional
OCI Auto DB
OCI Autonomous Database Professional
Azure AZ-900
Azure Fundamentals (AZ-900)
Azure DP-900
Azure Data Fundamentals (DP-900)
Azure AI-900
Azure AI Fundamentals (AI-900)
HackerRank SQL Advanced
HackerRank SQL (Advanced) Certificate
Google Kaggle AI Agents
Google/Kaggle 5-Day AI Agents Intensive
Platform Achievements
Snowflake DE
Snowflake Data Engineering Bootcamp
Snowflake GenAI
Snowflake Gen AI Bootcamp
MS AI Skills Fest
Microsoft AI Skills Fest (GWR)
CodeSignal Streak
CodeSignal Streak — 244 days
Coding Badges
LeetCode SQL50
LeetCode SQL 50
LeetCode Pandas15
LeetCode Pandas 15
HR SQL Gold
HackerRank SQL Gold
HR 30-Day Gold
HackerRank 30 Days Gold
Tools & Technologies
CodeSignal Tools
Practiced on CodeSignal