AWS Certified · MS Computer Science · New York, NY

Raj Kumar Nelluri

AI Engineer  |  Machine Learning Engineer  |  Data Engineer

Building production-ready AI systems — from Generative AI and RAG pipelines to scalable ML workflows and data engineering on AWS. Experienced in LangChain, vector databases, prompt engineering, and end-to-end ML deployment. Passionate about turning raw data and documents into intelligent, reliable AI products.

RAG · LangChain · GenAI
🔬 ML · MLOps · AWS

Building ML Systems That Work in Production

I'm an AWS Certified Cloud Practitioner with an MS in Computer Science from Pace University, New York, and a B.Tech in Artificial Intelligence from Amrita Vishwa Vidyapeetham. I focus on designing and deploying end-to-end machine learning pipelines and cloud-native data systems on AWS that go beyond notebooks into real production infrastructure.

On the ML engineering side, I've trained and deployed models using XGBoost, TensorFlow, and Scikit-learn for fraud detection, churn prediction, and demand forecasting. I work across the full ML lifecycle — feature engineering, hyperparameter tuning, cross-validation, SageMaker managed training jobs, and real-time inference endpoints — with monitoring and automated retraining via CloudWatch and EventBridge.

On the data engineering side, I build batch and streaming ETL pipelines that process 500,000+ records, design data lake architectures on Amazon S3, and manage relational data warehouses on AWS RDS. I understand the full data journey, from schema validation and data quality enforcement to transformation logic and storage: the infrastructure that makes ML systems reliable.

I'm actively seeking entry-level roles in Data Engineering, ML Engineering, or Cloud Engineering at US tech companies and startups.

MS Computer Science · Pace University, NY
AWS Certified Cloud Practitioner · Cloud Infrastructure
5+ ML/DE Projects · End-to-End Deployed
500K+ Records Processed · Scalable Pipelines

Technologies I Work With

⌨️

Programming

Python · SQL

Generative AI

LangChain · RAG Pipelines · OpenAI API · Vector Databases · ChromaDB · Embeddings · Prompt Engineering · Streamlit

🤖

Machine Learning

XGBoost · TensorFlow · Scikit-learn · Keras · Feature Engineering · Model Deployment · Hyperparameter Tuning · Model Monitoring

🔧

Data Engineering

ETL Pipelines · Batch Processing · Stream Processing · Data Validation · Data Quality · Schema Enforcement · Pandas · NumPy

☁️

Cloud Infrastructure (AWS)

SageMaker · Lambda · Kinesis · S3 · RDS · CloudWatch · EventBridge · Glue · EC2 · IAM

📊

Visualization

Tableau · Power BI · Matplotlib

🛠️

Tools

Git · Jupyter Notebook · VS Code · Flask · REST APIs

Machine Learning & Data Engineering Projects

Production-style ML and GenAI systems: real-time pipelines on AWS, model deployment, and automated monitoring.

Capstone · Pace University 2024

Insurance Fraud Detection System

Python · Scikit-learn · AWS S3 · Lambda · Kinesis · RDS · SageMaker

Cloud-native, real-time fraud detection pipeline on AWS that processes insurance claims through a 4-stage distributed architecture, from ingestion to inference.

~90% Accuracy · 15K+ Records · Real-Time Scoring
GitHub
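
The ingestion-to-inference flow above can be sketched as an AWS Lambda function subscribed to a Kinesis stream. This is a minimal illustration under stated assumptions, not the project's code: the claim fields (`claim_id`, `claim_amount`, `days_since_policy_start`), the toy `score_claim` rule, and the 0.5 flag threshold are all hypothetical, and `score_claim` stands in for what would be a call to the deployed model. The one AWS-specific detail it relies on is that Kinesis delivers each payload to Lambda base64-encoded under `record["kinesis"]["data"]`.

```python
import base64
import json
from typing import Any, Dict, List


def score_claim(claim: Dict[str, Any]) -> float:
    """Hypothetical stand-in for a model call (e.g. a SageMaker endpoint).

    A toy rule so the sketch runs offline: large claims filed soon after
    the policy starts are treated as riskier.
    """
    score = 0.0
    if claim.get("claim_amount", 0) > 10_000:
        score += 0.5
    if claim.get("days_since_policy_start", 365) < 30:
        score += 0.4
    return score


def handler(event: Dict[str, Any], context: Any = None) -> List[Dict[str, Any]]:
    """Lambda entry point for a Kinesis trigger: decode each record and score it."""
    results = []
    for record in event.get("Records", []):
        # Kinesis payloads arrive base64-encoded; these claims are JSON documents.
        claim = json.loads(base64.b64decode(record["kinesis"]["data"]))
        score = score_claim(claim)
        results.append({
            "claim_id": claim.get("claim_id"),
            "fraud_score": score,
            "flagged": score >= 0.5,  # hypothetical alert threshold
        })
    return results
```

Keeping the model call behind one function means the toy rule can be swapped for a real endpoint invocation without touching the record-decoding loop.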
Independent · GitHub 2026

Enterprise RAG Chatbot

Python · LangChain · OpenAI · ChromaDB · Streamlit

Production-grade RAG pipeline that answers questions from custom documents with mandatory source citations, using MMR retrieval, metadata-grounded responses, and prompt-engineered guardrails against hallucination.

MMR Retrieval · Top-5 Chunks Retrieved · Zero Hallucinations
🚀 Live Demo GitHub
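
MMR (Maximal Marginal Relevance) retrieval balances a chunk's relevance to the query against its redundancy with chunks already selected, which keeps near-duplicate passages from crowding the top-5 context window. Below is a minimal pure-Python sketch of that selection rule. It assumes embeddings are already available as plain lists of floats; `mmr_select` and its `lambda_mult` weight are illustrative names, not the project's or any library's implementation.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_select(query: Sequence[float], docs: List[Sequence[float]],
               k: int = 5, lambda_mult: float = 0.5) -> List[int]:
    """Greedily pick k document indices by Maximal Marginal Relevance.

    Each step chooses the candidate maximizing
    lambda_mult * sim(doc, query) - (1 - lambda_mult) * max sim(doc, selected).
    """
    relevance = [cosine(query, d) for d in docs]
    selected: List[int] = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        def mmr_score(i: int) -> float:
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With a duplicated chunk in the pool, for example `docs = [[1.0, 0.0], [1.0, 0.0], [0.6, 0.8]]` and `query = [1.0, 0.0]`, pure relevance (`lambda_mult=1.0`) returns both copies, while `lambda_mult=0.3` replaces the duplicate with the more diverse third chunk.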
Independent · GitHub 2025

Customer Churn Prediction — MLOps Pipeline

Python · XGBoost · AWS S3 · Lambda · SageMaker · Comprehend · CloudWatch · EventBridge

End-to-end MLOps pipeline that predicts SaaS customer churn, using NLP features extracted from support tickets and automated model retraining triggered by data drift.

0.67 Recall · 7K+ Tickets · Auto-Retraining
GitHub
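
The project description does not say which drift statistic triggers retraining. As one common option, the sketch below uses the Population Stability Index (PSI), which compares a feature's binned distribution in new data against a training-time baseline: PSI = sum over bins of (actual_i - expected_i) * ln(actual_i / expected_i). The bin edges, the `should_retrain` helper, and the 0.25 threshold (a widely used rule of thumb for a significant shift) are assumptions. In a setup like the one described, a scheduled EventBridge rule could run such a check and start a retraining job when it fires.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def bin_proportions(values: Sequence[float], edges: Sequence[float],
                    eps: float = 1e-6) -> List[float]:
    """Fraction of values per bin; sorted interior `edges` give len(edges) + 1 bins.

    Empty bins are floored at `eps` so the log term in PSI stays finite
    (the floored shares can therefore sum to slightly more than 1).
    """
    if not values:
        raise ValueError("values must be non-empty")
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [max(c / len(values), eps) for c in counts]


def population_stability_index(baseline: Sequence[float], current: Sequence[float],
                               edges: Sequence[float]) -> float:
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    expected = bin_proportions(baseline, edges)
    actual = bin_proportions(current, edges)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


def should_retrain(baseline: Sequence[float], current: Sequence[float],
                   edges: Sequence[float], threshold: float = 0.25) -> bool:
    """Hypothetical retraining trigger: fire when PSI exceeds the threshold."""
    return population_stability_index(baseline, current, edges) > threshold
```

In practice the check would run per feature against a baseline snapshot saved at training time.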
Independent · GitHub 2025

Retail Sales Forecasting on AWS

Python · XGBoost · Pandas · NumPy · AWS S3 · SageMaker

Scalable batch ETL pipeline that processes 500,000+ retail transactions for time-series demand forecasting, with the trained model deployed as a SageMaker real-time endpoint.

500K+ Transactions · 10+ Features · Live Endpoint
GitHub
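
Temporal feature engineering for demand forecasting usually means calendar fields plus lagged and rolling statistics of the target. The sketch below shows that idea for a single daily series; the specific features (`lag_1`, `lag_7`, a 7-day rolling mean) and the `make_temporal_features` name are illustrative assumptions, not the project's actual 10+ feature set. The rolling mean uses only prior days, so the target never leaks into its own features.

```python
from datetime import date
from typing import Dict, List, Optional, Sequence


def make_temporal_features(
    dates: Sequence[date],
    sales: Sequence[float],
    lags: Sequence[int] = (1, 7),
    window: int = 7,
) -> List[Dict[str, Optional[float]]]:
    """Build one feature row per day for a single, date-sorted daily series.

    Lag and rolling features are None until enough history exists; a
    training step would typically drop or impute those rows.
    """
    rows: List[Dict[str, Optional[float]]] = []
    for i, (day, target) in enumerate(zip(dates, sales)):
        row: Dict[str, Optional[float]] = {
            "day_of_week": day.weekday(),  # Monday == 0
            "month": day.month,
            "target": target,
        }
        for lag in lags:
            row[f"lag_{lag}"] = sales[i - lag] if i >= lag else None
        # Mean of the `window` days strictly before day i (no target leakage).
        past = sales[max(0, i - window):i]
        row[f"rolling_mean_{window}"] = sum(past) / window if len(past) == window else None
        rows.append(row)
    return rows
```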
Capstone · Amrita University 2023

Crypto Price Forecasting — Flask REST API

Python · XGBoost · TensorFlow · LSTM · CNN · Flask

Multi-model ML system benchmarking LSTM, CNN, and XGBoost for 7-day Bitcoin price forecasting, deployed as a Flask REST API with a live web dashboard.

3 Architectures · 7-Day Forecast · REST API
GitHub
Academic · Amrita University 2023

3D Face Generation with Neural Radiance Fields

Python · JAX · TensorFlow · OpenCV · COLMAP

Implemented Deformable NeRF for photorealistic 3D face reconstruction from monocular video, with multi-stage preprocessing for camera pose estimation.

PSNR Evaluated · 30% GPU Reduction · NeRF Architecture
GitHub

How I Build ML Systems

End-to-end production ML pipeline — from raw data to automated model monitoring.

01 📥 Data Ingestion: AWS S3 · AWS Kinesis · REST APIs
02 🔄 ETL & Transform: AWS Lambda · AWS Glue · Python
03 ⚗️ Feature Engineering: Pandas · Scikit-learn · AWS Comprehend
04 🧠 Model Training: SageMaker · XGBoost · TensorFlow
05 🚀 Deployment: SageMaker Endpoint · AWS RDS · Flask API
06 📡 Monitoring & Retrain: CloudWatch · EventBridge · Auto-Retrain

Insurance Fraud Detection
S3 Data Lake → Lambda → Kinesis Stream → SageMaker → RDS

Real-time claim scoring with distributed stream processing and SQL-based analyst reporting.

Customer Churn MLOps Pipeline
S3 → Lambda ETL → Comprehend NLP → SageMaker Training → CloudWatch + EventBridge

Automated ML lifecycle: NLP feature extraction → training → real-time endpoint → drift-triggered retraining.

Retail Sales Forecasting
Raw Transactions → Python ETL → S3 Data Lake → SageMaker Job → Real-Time Endpoint

500K+ transaction batch pipeline with temporal feature engineering and live prediction serving.

Engineering Capabilities

🤖

Machine Learning Engineering

  • End-to-end model training workflows
  • Feature engineering and selection
  • Hyperparameter tuning with cross-validation
  • Model evaluation (MAE, RMSE, Recall, Accuracy)
  • SageMaker managed training jobs
  • Real-time inference endpoint deployment
  • Batch inference pipelines

🔧

Data Engineering

  • Batch ETL pipeline design and execution
  • Real-time streaming pipelines (Kinesis)
  • Schema validation and enforcement
  • Data quality monitoring and alerting
  • Data lake architecture (S3)
  • Data warehouse management (RDS)
  • Large-scale data processing (500K+ records)

⚙️

MLOps & Cloud Automation

  • Automated model retraining pipelines
  • Data drift detection and alerting
  • CloudWatch metric monitoring
  • EventBridge scheduled automation
  • AWS Lambda serverless orchestration
  • Cloud-native infrastructure design
  • Model lifecycle management

Let's Build Something Together

🟢 Open to Machine Learning Engineer, Data Engineer, and Cloud Engineer opportunities at US tech companies and startups.