Hello! I'm
MS in Data Science @ University at Buffalo.
Applied NLP & Data Science Intern with hands-on work across LLM validation, machine learning, analytics dashboards, and cloud data systems.
MS Graduate
Python · SQL · NLP · LLMs · AWS · Azure · Spark · Power BI.
Open to full-time roles in Data Science, AI/ML Engineering, Data Engineering, and Analytics.
Data doesn't tell stories on its own; I build the systems that make it speak. From LLM-powered OCR/NLP validation that improved turnaround efficiency by 40% to merchant-risk pipelines that generate 15-minute snapshots and next-24-hour risk scores, I focus on work that ships, scales, and helps teams make better decisions. I am completing an MS in Data Science and Applications at the University at Buffalo, with experience across Python, SQL, Spark, Azure, AWS, Power BI, Tableau, FastAPI, and LLM-driven automation.
End-to-end obituary publishing platform using LLMs and NLP to automate content validation. Reduced manual review effort by 30–40% across 100+ records. Deployed in a live pilot to 50+ stakeholders.
AI-powered music recommendation app that analyzes images with OpenCLIP ViT-B/32, detects scene mood and confidence, and recommends Spotify tracks that match the user's visual vibe.
Large-scale news intelligence pipeline using GDELT, BigQuery, Python, and Tableau. Built partition-pruned extracts, anomaly detection, risk scoring, forecasts, and dashboards for global event monitoring.
Production-grade merchant risk platform using Python, SQL, Snowflake, SageMaker, and AWS. Generates 15-minute merchant snapshots and next-24-hour fraud-risk scores with versioned model outputs.
End-to-end Amazon review sentiment benchmark comparing traditional ML with transformer models. Fine-tuned DeBERTa-v3-base reached 93.5% accuracy with MLflow tracking and robustness tests.
Static outreach app for personalized email campaigns through Gmail API and Microsoft Graph. Supports merge tags, CSV validation, pacing controls, attachments, campaign history, and open tracking.
Integrated GPT-4o into Python-based OCR + NLP pipelines, achieving 98% accuracy in anomaly detection.
Developed a Streamlit human-in-the-loop validation interface, improving review turnaround efficiency by 40%.
Built React dashboard workflows with Python/FastAPI backend services for LLM-powered content validation.
Created automated Power BI dashboards with DAX and Power Query to track recycling KPIs and throughput.
Analyzed 1,500+ production records and reduced cycle delays by identifying bottlenecks and trend drivers.
Cleaned and transformed 20K+ rows for operational reporting, improving dashboard reliability and readiness.
sunnymalik0102@gmail.com · GitHub @Sunny-0102 · LinkedIn /in/himanshumalik0102