Amjad Ali

Featured Project

Mortgage Servicing Analytics Platform

Loan Performance Analytics & Risk Intelligence

End-to-end analytics platform built on 992K Freddie Mac loans ($248B portfolio) spanning 11 origination years (vintages). It combines a SQL-powered analytical engine with automated data quality checks, interactive Tableau dashboards, ML-based risk scoring, and auto-generated executive reports.

992K loans · $248B portfolio · 11 vintages · 8 SQL queries
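The platform's SQL layer includes automated data quality checks. Below is a minimal sketch of how such checks can be written as plain SQL queries, each counting the rows that violate one rule. It uses Python's built-in sqlite3 module and an in-memory table. Several details are illustrative assumptions, not the platform's actual schema or rules:

- the table name `loans`;
- its columns (`loan_id`, `credit_score`, `orig_upb`);
- the rule thresholds (credit score between 300 and 850, positive original balance).

```python
import sqlite3

# Hypothetical rules: each query counts rows that violate one check (0 means pass).
CHECKS = {
    "missing_loan_id": "SELECT COUNT(*) FROM loans WHERE loan_id IS NULL",
    # COUNT(col) skips NULLs, so this counts only repeated non-null IDs.
    "duplicate_loan_id": "SELECT COUNT(loan_id) - COUNT(DISTINCT loan_id) FROM loans",
    "credit_score_out_of_range": "SELECT COUNT(*) FROM loans WHERE credit_score NOT BETWEEN 300 AND 850",
    "nonpositive_upb": "SELECT COUNT(*) FROM loans WHERE orig_upb <= 0",
}


def run_quality_checks(conn: sqlite3.Connection) -> dict:
    """Run every check and return {check_name: violating_row_count}."""
    return {name: conn.execute(sql).fetchone()[0] for name, sql in CHECKS.items()}


# Toy data in an in-memory database (the schema is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (loan_id TEXT, credit_score INTEGER, orig_upb REAL)")
conn.executemany(
    "INSERT INTO loans VALUES (?, ?, ?)",
    [
        ("A1", 720, 250000.0),
        ("A2", 9999, 180000.0),  # out-of-range credit score
        ("A2", 680, 0.0),        # duplicate ID and zero balance
        (None, 640, 300000.0),   # missing ID
    ],
)
report = run_quality_checks(conn)
failed = [name for name, count in report.items() if count > 0]
```

Keeping each rule as a standalone SQL string makes it easy to add checks, or to run the same rules against a production warehouse through a different database driver.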


ML Risk Model: Results Explorer

Dual-model approach: OriginRisk, an origination-time predictor with a leakage-free AUC, and SegmentIQ, which handles behavioral segmentation and risk ranking.

For complete methodology and architecture details, see the Technical Documentation.

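Headline metrics:

- AUC-ROC for OriginRisk: about 0.77.
- Share of all delinquencies captured in the top 10% and top 20% of risk-ranked loans: about 59% at the top 20%.
- Number of risk segments scored.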

This chart shows the cumulative lift curve, comparing the OriginRisk model against random selection.

- The gold line represents the model's performance. It plots the percentage of all delinquent loans captured when reviewing a given percentage of the portfolio, ranked by predicted risk score.
- The red dashed line represents random guessing, where reviewing 10% of loans would catch roughly 10% of delinquencies.

OriginRisk is a logistic regression model trained exclusively on origination-time features such as credit score, loan-to-value ratio, debt-to-income ratio, interest rate, and loan age. It uses no payment history or behavioral data, which prevents information about the target variable from leaking into the input features.

The higher the gold curve rises above the red baseline, the more effectively the model concentrates risk. In this portfolio, reviewing the top 20% of risk-ranked loans captures approximately 59% of all delinquencies. That is nearly three times the roughly 20% that random selection would catch.
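The capture and lift figures behind a curve like this can be computed directly from model scores and observed delinquency flags. Below is a minimal NumPy sketch. The function names are hypothetical, and the ten-loan example is toy data, not portfolio output.

```python
import numpy as np


def cumulative_capture(scores, labels):
    """Fraction of portfolio reviewed (x) vs. fraction of all delinquencies captured (y)."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")  # highest risk first
    hits = np.asarray(labels, dtype=int)[order]
    x = np.arange(1, len(hits) + 1) / len(hits)
    y = np.cumsum(hits) / hits.sum()
    return x, y


def capture_at(scores, labels, top_frac):
    """Share of delinquencies found in the top `top_frac` of risk-ranked loans."""
    x, y = cumulative_capture(scores, labels)
    n_review = max(1, int(round(top_frac * len(x))))
    return y[n_review - 1]


def lift_at(scores, labels, top_frac):
    """Capture divided by what random review would catch (which is top_frac itself)."""
    return capture_at(scores, labels, top_frac) / top_frac


# Toy example: 10 loans, 2 delinquent, ranked by a made-up risk score.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
capture_20 = capture_at(scores, labels, 0.2)
lift_20 = lift_at(scores, labels, 0.2)
```

Consider the portfolio-level figure quoted above: about 59% of delinquencies captured in the top 20%. It corresponds to a lift of roughly 0.59 / 0.20 ≈ 2.95, which is the "nearly three times" comparison.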

This chart displays the relative importance of each feature the OriginRisk model uses to predict delinquency. Importance reflects how much each variable contributes to the model's ability to distinguish delinquent loans from current ones, and higher values indicate stronger predictive power.

Credit score at origination is the single strongest signal. It is followed by:

- the number of borrowers on the loan (single-borrower loans carry more risk);
- the origination interest rate (higher rates mean larger monthly payments);
- the debt-to-income ratio (borrowers who are already financially stretched default more often).

These results are consistent with established credit risk theory and align with findings from large-scale European mortgage studies. Because the model uses only information available at the time of lending, these risk drivers can inform underwriting decisions before a loan is funded.
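The text does not specify exactly how importance is computed. For a logistic regression, one common choice is the absolute value of coefficients fitted on standardized features, so that every coefficient is on a per-standard-deviation scale.

The sketch below assumes that measure and fits the model with plain gradient descent in NumPy. It uses synthetic data with made-up effect sizes. It is not the OriginRisk implementation.

```python
import numpy as np


def fit_logistic(X, y, lr=0.5, epochs=3000):
    """Batch gradient descent on the mean log-loss of a logistic regression."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted delinquency probability
        grad = p - y
        w -= lr * (X.T @ grad) / n
        b -= lr * grad.mean()
    return w, b


def rank_features(X, y, names):
    """Rank features by |coefficient| after standardizing each column (assumed importance measure)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    w, _ = fit_logistic(Xs, y)
    return sorted(zip(names, np.abs(w)), key=lambda t: t[1], reverse=True)


# Synthetic data with made-up effects: credit score matters most, DTI a little, noise not at all.
rng = np.random.default_rng(0)
n = 5000
credit = rng.normal(740, 50, n)
dti = rng.normal(36, 8, n)
noise = rng.normal(0, 1, n)
logit = -4.0 - 0.04 * (credit - 740) + 0.03 * (dti - 36)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)
ranking = rank_features(np.column_stack([credit, dti, noise]), y, ["credit_score", "dti", "noise"])
```

In practice, a library solver such as scikit-learn's `LogisticRegression` would replace the hand-written loop. The loop is kept here only so the sketch depends on nothing beyond NumPy.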

This table shows the highest-risk loan segments identified by the SegmentIQ model. SegmentIQ uses all available features, including payment history, to score current portfolio risk. Each row represents a unique combination of credit score band, loan-to-value bucket, interest rate range, and origination vintage.

The "Risk Score" column shows the model's average predicted delinquency probability for that segment. The "Actual DLQ" column shows the observed delinquency rate. Segments where both values are high represent concentrated pockets of risk, where loss mitigation resources would have the greatest impact.

Subprime and Fair credit borrowers from the 2022 and 2023 vintages with elevated interest rates consistently appear at the top. This indicates that the combination of weaker credit profiles and rate-environment stress produces the highest delinquency concentrations.

Credit Band	LTV	Rate	Vintage	Loans	Risk Score	Actual DLQ
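A segment table like this is essentially a group-by over the four segment keys. Within each group, it averages the model scores and the observed delinquency flags. The sketch below uses only the Python standard library. The band cutoffs, bucket edges, 1-point rate ranges and record field names are hypothetical choices for illustration, not the ones SegmentIQ uses.

```python
from collections import defaultdict


# Hypothetical banding rules, for illustration only.
def credit_band(score):
    if score < 620:
        return "Subprime"
    if score < 680:
        return "Fair"
    if score < 740:
        return "Good"
    return "Excellent"


def ltv_bucket(ltv):
    if ltv <= 60:
        return "<=60"
    if ltv <= 80:
        return "60-80"
    if ltv <= 95:
        return "80-95"
    return ">95"


def rate_range(rate):
    low = int(rate)  # 1-point buckets, e.g. 7.25 -> "7-8%"
    return f"{low}-{low + 1}%"


def top_segments(loans, min_loans=1, k=10):
    """Rows of (band, LTV, rate, vintage, loans, mean risk score, actual DLQ rate)."""
    groups = defaultdict(lambda: [0, 0.0, 0])  # [loan count, sum of scores, delinquent count]
    for loan in loans:
        key = (credit_band(loan["credit_score"]), ltv_bucket(loan["ltv"]),
               rate_range(loan["rate"]), loan["vintage"])
        g = groups[key]
        g[0] += 1
        g[1] += loan["risk_score"]
        g[2] += loan["delinquent"]
    rows = [(*key, n, total / n, dlq / n)
            for key, (n, total, dlq) in groups.items() if n >= min_loans]
    rows.sort(key=lambda r: (r[5], r[6]), reverse=True)  # highest mean risk score first
    return rows[:k]


# Toy records (not portfolio data).
loans = [
    {"credit_score": 600, "ltv": 90, "rate": 7.2, "vintage": 2023, "risk_score": 0.08, "delinquent": 1},
    {"credit_score": 610, "ltv": 85, "rate": 7.5, "vintage": 2023, "risk_score": 0.06, "delinquent": 0},
    {"credit_score": 780, "ltv": 50, "rate": 3.1, "vintage": 2021, "risk_score": 0.002, "delinquent": 0},
]
table = top_segments(loans)
```

On a real portfolio, a `min_loans` threshold above 1 keeps tiny segments from topping the table on noise alone.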

Single snapshot limitation.
The dataset is a single monthly snapshot, not a time series, and true forward prediction requires sequential monthly data. The origination model predicts current delinquency status from the borrower's origination profile. That is valuable for risk stratification, but it does not make the model a deployment-ready forecasting tool.

Class imbalance (99:1).
Only about 1% of loans are delinquent, so precision is inherently low. This is addressed with balanced class weights, and the model is evaluated with lift metrics rather than raw precision.
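Balanced class weights up-weight the rare delinquent class so that both classes contribute equally to the training loss. The sketch below uses the rule n_samples / (n_classes × count_c), which is the formula scikit-learn documents for `class_weight="balanced"`. The 990/10 split is toy data chosen to mirror the stated 99:1 ratio.

The sketch also shows why raw precision is a weak yardstick here. Flagging loans at random yields a precision equal to the delinquency base rate.

```python
from collections import Counter


def balanced_class_weights(labels):
    """weight_c = n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


labels = [0] * 990 + [1] * 10  # ~1% delinquent, mirroring a 99:1 imbalance
weights = balanced_class_weights(labels)

# After weighting, each class carries the same total weight (n / 2).
total_current = labels.count(0) * weights[0]
total_delinquent = labels.count(1) * weights[1]

# Precision obtained by flagging loans at random equals the delinquency base rate.
base_rate_precision = labels.count(1) / len(labels)
```

With these weights, each delinquent loan counts about 99 times as much as a current loan during training (50 / 0.505).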

What would improve it.
Monthly time-series data, additional features (employment, payment amounts, forbearance history), gradient-boosting ensembles, and a calibration layer for dollar-loss estimation.

Why the dual model design matters.
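The first attempt produced AUC = 1.0 because of data leakage. The redesign separates two models:

- OriginRisk uses origination features only (AUC ≈ 0.77) and provides an honest predictive estimate.
- SegmentIQ uses behavioral features and is used for segmentation.

Mortgage analytics teams value this separation between leakage-free prediction and behavioral risk ranking.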
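One cheap guard against the kind of leakage that produced AUC = 1.0 is to score every candidate feature on its own. Any feature that separates the classes almost perfectly gets flagged.

The sketch below computes AUC as the probability that a random delinquent loan outranks a random current one. The 0.95 threshold and the `days_delinquent` field are hypothetical examples, not taken from the project.

```python
def roc_auc(scores, labels):
    """AUC as P(random positive scores above random negative), counting ties as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def flag_leaky_features(features, labels, threshold=0.95):
    """Flag any single feature that, on its own, separates the classes almost perfectly."""
    flagged = {}
    for name, values in features.items():
        auc = roc_auc(values, labels)
        auc = max(auc, 1.0 - auc)  # direction-agnostic: low values may signal risk too
        if auc >= threshold:
            flagged[name] = auc
    return flagged


# Toy data: days_delinquent is a behavioral field that effectively encodes the target.
labels = [0, 0, 1, 1, 0, 1]
features = {
    "credit_score": [720, 680, 650, 700, 740, 690],
    "days_delinquent": [0, 0, 30, 60, 0, 90],
}
flagged = flag_leaky_features(features, labels)
```

The pairwise comparison is quadratic in the number of loans. At the scale of roughly a million loans, a rank-based implementation such as scikit-learn's `roc_auc_score` would be the practical choice.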

Other Projects

Spotify Live Top 200 Dashboard

A Power BI data intelligence platform with a historical and real-time analytics engine, built on API pipelines and Power BI data modeling.

Power BI · REST API · CI/CD · DAX

StatGuard: Security Patch Assistant

Can a static-only, LLM-based agent safely assist with secure code review without ever executing the target program?

Gemini 2.5 · Bandit · Multi-Agent

Tic Tac Tourney

Game logic with Random, Minimax, and Heuristic AI agents competing against humans in a round-robin tournament.

AI Agents · Minimax · Alpha-Beta
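Minimax with alpha-beta pruning is the standard search behind an unbeatable tic-tac-toe agent. The following is a generic sketch, not the project's code. It assumes the board is a list of nine cells holding "X", "O", or " ", and scores positions as +1 for an X win, -1 for an O win, and 0 for a draw.

```python
from math import inf

# The eight winning lines on a 3x3 board indexed 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def alphabeta(board, player, alpha=-inf, beta=inf):
    """Minimax value of `board` with `player` to move: +1 X wins, -1 O wins, 0 draw."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if " " not in board:
        return 0
    maximizing = player == "X"
    value = -inf if maximizing else inf
    for i in range(9):
        if board[i] != " ":
            continue
        board[i] = player
        child = alphabeta(board, "O" if maximizing else "X", alpha, beta)
        board[i] = " "  # undo the move
        if maximizing:
            value = max(value, child)
            alpha = max(alpha, value)
        else:
            value = min(value, child)
            beta = min(beta, value)
        if alpha >= beta:  # remaining moves cannot change the result: prune
            break
    return value


def best_move(board, player):
    """Index of the move with the best exact minimax value for `player`."""
    best_i, best_v = None, None
    for i in range(9):
        if board[i] != " ":
            continue
        board[i] = player
        v = alphabeta(board, "O" if player == "X" else "X")
        board[i] = " "
        if best_v is None or (v > best_v if player == "X" else v < best_v):
            best_i, best_v = i, v
    return best_i


empty_value = alphabeta([" "] * 9, "X")
move = best_move(list("XX OO    "), "X")  # X can complete the top row
```

Searching from the empty board returns 0, reflecting that tic-tac-toe is a draw under perfect play. Pruning leaves that value unchanged; it only skips branches that cannot affect it.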

Psychometric Severity Prediction

Understanding how personality traits and demographics influence mental health outcomes using ML classification.

Logistic Regression · SVM · ANOVA

DeepSeek Spring AI

Querying DeepSeek and working with Spring AI to build a robust Spring Boot app that delivers interactive, personalized learning experiences.

Spring Boot · DeepSeek · Spring AI

Career & Education

July 2025 — Present

Research Assistant (Volunteer)

University of New Haven · West Haven, CT

LLM-assisted software security research covering vulnerability detection, multi-language code analysis across 6 programming languages, and explainable automated repair workflows for real codebases.

August 2023 — May 2025

MS in Data Science (STEM)

University of New Haven

AWS (Athena, Glue, S3, Lambda), Power BI, NLP, Math for Data Scientists.

June 2024 — August 2024

Solutions Engineering Intern

Bitwise Inc. · Schaumburg, IL

Built a Python GenAI service with LangChain orchestration, multi-LLM routing, and a SQL-to-PySpark conversion pipeline. Led the cross-team final presentation spanning the frontend, backend, and GenAI tracks.

August 2021 — August 2023

Senior Analyst

Capgemini

Licensing analytics for a Fortune 50 client using 50+ Snowflake SQL scripts. Migrated ETL from Informatica to AWS/Snowflake and performed end-to-end data validation across S3, DB2, and Snowflake.

March 2021 — May 2021

Data Analyst Intern

DevTown (Shape AI)

Built supervised ML models on labeled datasets, achieving 89.5% classification accuracy and 92.9% fraud recall. Deployed a regression model as an interactive web application.

August 2017 — May 2021

BE in Electronics & Telecommunication

University of Pune

AI, Machine Learning, Data Structures & Algorithms, OOP, SQL.