AI / ML Engineer
3 live AWS deployments: investment research, document retrieval, and neural audience intelligence — all automated, auditable, and measured by the work they replace.
Impact
Every number below comes from a production deployment that replaced a manual process — not a Kaggle notebook.
Stack
Selected Work
Every project below replaced a manual process that was costing someone real time, money, or accuracy.
01 — LLM Agent · Live Product
Designed and deployed a deterministic investment decision engine — a 7-step async LangGraph pipeline generating auditable BUY/HOLD/SELL recommendations in under 30 seconds. GPT-4o is constrained to explanation-only; all decisions are made by a normalized multi-factor scoring system. Zero LLM hallucination risk on recommendations. Running in production.
Try it live
Financial analysts spend days compiling research from 10+ scattered data sources. Manual work is slow and doesn't scale.
GPT-4o agent loop that autonomously calls live APIs — market data, financial news, SEC fundamentals — then synthesises a structured markdown report.
Multi-factor scoring engine (Technical 0–25, Fundamental 0–40, Sentiment 0–15) with conflict-aware logic — signal disagreement (variance > 0.15) overrides to HOLD, eliminating false conviction. GPT-4o constrained to explanation-only: zero influence over the decision, fully auditable outputs. Redis caching (900s TTL) cuts repeated query latency from ~15s to <100ms. LangGraph chosen for its structured async pipeline over ad-hoc tool-calling.
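The conflict-aware override can be sketched in a few lines. This is an illustrative toy, not the production code: the function name, the normalization to 0–1, and the BUY/SELL cutoffs are assumptions; only the variance-over-0.15 override to HOLD comes from the description above.

```python
from statistics import pvariance

VARIANCE_THRESHOLD = 0.15  # disagreement cutoff from the description above

def recommend(technical: float, fundamental: float, sentiment: float) -> str:
    """Combine factor scores (each normalized to 0-1) into a call."""
    signals = [technical, fundamental, sentiment]
    # High variance means the factors conflict, so the engine refuses
    # to express false conviction and overrides to HOLD.
    if pvariance(signals) > VARIANCE_THRESHOLD:
        return "HOLD"
    composite = sum(signals) / len(signals)
    if composite >= 0.6:
        return "BUY"
    if composite <= 0.4:
        return "SELL"
    return "HOLD"
```

The key design point is that the LLM never touches this function: the recommendation is fully deterministic, so identical inputs always produce identical, auditable outputs.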
A 5-factor confidence model (factors include data completeness, signal agreement, volatility, and news uncertainty) blocks recommendations below 20% confidence — solving the false-conviction problem of generic LLM finance tools. Portfolio ranking mode supports 2–8 tickers with proportional capital allocation. For a firm paying analysts $80k+/year, sub-30-second auditable decisions directly offset that cost.
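A minimal sketch of the confidence gate, under stated assumptions: only four of the five factors are named in the text, so only those four appear here, the equal weighting is a guess, and the "NO_CALL" abstention label is illustrative. The 20% floor is the one described above.

```python
MIN_CONFIDENCE = 0.20  # recommendations below 20% confidence are blocked

def confidence(data_completeness: float, signal_agreement: float,
               volatility: float, news_uncertainty: float) -> float:
    """Blend factor scores (each 0-1) into one confidence value."""
    # Volatility and news uncertainty reduce confidence, so invert them.
    return (data_completeness + signal_agreement
            + (1 - volatility) + (1 - news_uncertainty)) / 4

def gate(recommendation: str, conf: float) -> str:
    # Below the floor, abstain instead of guessing.
    return recommendation if conf >= MIN_CONFIDENCE else "NO_CALL"
```

Abstaining below a confidence floor is what separates this from generic LLM finance tools, which will happily produce a confident-sounding answer on thin data.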
02 — Neuroscience AI · Live Product · Meta FAIR TRIBE v2
Engineered a multimodal brain encoding pipeline that predicts second-by-second fMRI activations across 20,484 cortical vertices — giving studios audience emotional response intelligence before a trailer ever releases. Three neural networks (V-JEPA2, Wav2Vec-BERT, Llama 3.2) feed Meta FAIR's TRIBE v2 to map vision, audio, and language into 5 emotion channels across 7 brain regions. Live on AWS. Contributed PR #20 to Meta FAIR's TRIBE v2: an int8 compute_type fallback for WhisperX on non-CUDA devices. CLA signed, checks passing.
Try it live
Studios spend millions on trailers with no way to predict emotional response before release. Test screenings are expensive, slow, and limited to small samples — a studio can't know second-by-second where audiences peak in excitement or disengage.
Multimodal brain encoding pipeline: V-JEPA2 extracts video features, Wav2Vec-BERT processes audio, Llama 3.2 embeds the transcript — all three streams feed Meta FAIR's TRIBE v2, which predicts fMRI activations across 20,484 cortical vertices per second of video. Activations are mapped to 5 emotion channels (excitement, fear, joy, suspense, boredom) across 7 brain regions using neuroscience-based weights.
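The final mapping step — region-level activations into emotion channels via fixed weights — can be sketched as below. This is a toy: the region names, channel names, and every weight value are illustrative assumptions; the production pipeline uses neuroscience-based weights over TRIBE v2's vertex-level predictions.

```python
# Illustrative weights only: weights[emotion][region] is that region's
# assumed contribution to the emotion channel.
EXAMPLE_WEIGHTS = {
    "fear":       {"amygdala": 0.6, "insula": 0.4},
    "excitement": {"visual": 0.4, "auditory": 0.3, "dlPFC": 0.3},
}

def emotion_scores(activations: dict, weights: dict) -> dict:
    """Weighted sum of per-region activations for each emotion channel."""
    return {
        emotion: sum(w * activations.get(region, 0.0)
                     for region, w in region_weights.items())
        for emotion, region_weights in weights.items()
    }
```

Run once per second of video, this yields the second-by-second emotion curves the product surfaces to studios.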
Two-tier deployment: g4dn.xlarge GPU instance ($0.53/hr, Tesla T4 15GB VRAM) spun up on demand for inference only, while a t3.micro serves 24/7 with pre-computed JSON results at <100ms — keeping monthly cost near $0. 12.89GB across three extractors barely fits in T4 VRAM, requiring careful model loading order. Multi-stage Docker build (Node 18 + Python 3.11-slim) keeps the production image at ~500MB with no ML weights. Submitted PR #20 to Meta FAIR's TRIBE v2 — added int8 compute_type fallback for WhisperX on non-CUDA devices. Before the fix: hardcoded float16 caused ValueError on Apple Silicon and CPU-only Linux. After: the full pipeline runs everywhere. CUDA path unchanged, <1% WER degradation on transcription accuracy. CLA signed, checks passing.
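The core of the PR's fallback is a device-aware compute-type selection, sketched below. The helper name is illustrative; what matters is the logic: GPU keeps float16 (CUDA path unchanged), everything else gets int8 instead of crashing.

```python
def pick_compute_type(cuda_available: bool) -> str:
    # CUDA path unchanged: keep float16 precision on GPU.
    # Non-CUDA devices (Apple Silicon, CPU-only Linux) fall back to
    # int8 quantized inference, avoiding the hardcoded-float16 ValueError.
    return "float16" if cuda_available else "int8"
```

In practice the flag would come from something like `torch.cuda.is_available()`, and the result is passed through to WhisperX's `compute_type` argument when the model is loaded.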
Second-by-second emotion maps identify the 3 strongest and 3 weakest moments in a trailer — actionable edit intelligence. Persona simulation models how Action Lovers, Romance Fans, and Horror Enthusiasts respond differently. Competitive benchmarking against Oppenheimer, Avengers: Endgame, Inception, Interstellar, and The Dark Knight gives studios a quantified baseline. A full analysis costs ~$0.50 in GPU time, replacing a test screening that costs thousands.
03 — RAG Pipeline · Live Product
Engineered a production RAG system with MMR retrieval, hallucination guardrails, and a rule-based query classifier — no LLM call fires without sufficient retrieval confidence. 87% retrieval accuracy. Two interfaces: Streamlit chat UI + FastAPI developer API. Live on AWS with HTTPS.
Try it live
Enterprise knowledge locked in PDFs and Word docs means employees waste hours on manual search instead of actual work.
Full RAG pipeline: LangChain orchestration, ChromaDB vector store, GPT-4 generation with streaming. Supports multi-format doc ingestion.
MMR (Maximal Marginal Relevance) over plain cosine similarity — fetches a larger candidate pool and re-ranks for relevance and diversity, preventing near-duplicate chunks from reaching the LLM. Rule-based query classifier (no LLM cost) detects factual/complex/ambiguous intent and adjusts top_k/fetch_k accordingly. Three API cost guards: rate limiting (20 req/min), 500-char input cap, tiktoken token budget — all fire before any LLM call. Hallucination guardrails check retrieval confidence and return a fallback instead of guessing on low scores.
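The rule-based classifier can be sketched as below. The keyword lists, word-count thresholds, and top_k/fetch_k values are all illustrative assumptions; the point is the shape: pure string rules, zero LLM cost, run before any token is spent.

```python
RETRIEVAL_PARAMS = {
    "factual":   {"top_k": 3, "fetch_k": 10},
    "complex":   {"top_k": 6, "fetch_k": 20},
    "ambiguous": {"top_k": 4, "fetch_k": 15},
}

COMPLEX_MARKERS = ("compare", "explain why", "trade-off", "versus")

def classify(query: str) -> str:
    """Classify intent with string rules only -- no LLM call, no cost."""
    q = query.lower()
    words = q.split()
    if any(m in q for m in COMPLEX_MARKERS) or len(words) > 20:
        return "complex"
    # Pronoun-heavy or very short queries lack standalone context.
    if "it" in words or "this" in words or len(words) < 3:
        return "ambiguous"
    return "factual"

def retrieval_params(query: str) -> dict:
    # Complex queries get a wider MMR candidate pool (fetch_k) and more
    # final chunks (top_k); factual lookups stay cheap and tight.
    return RETRIEVAL_PARAMS[classify(query)]
```

Because classification is deterministic and free, it can gate every request without adding latency or API spend.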
Citation format [SOURCE N: filename — Page: X] makes every answer auditable — critical for regulated industries. Dual interfaces (Streamlit for business users, FastAPI Swagger at /api/docs for engineering teams) means one deployment serves both audiences. The cost guards and query classifier make this viable at scale without API spend ballooning.
About
Designed and deployed complete AI systems — not just notebooks. A model that never leaves a researcher's laptop isn't a product — it's a cost. Every system I've built started with a process someone was doing manually. My job was to make that unnecessary. Engineered across the full stack: model architecture and training, API development, containerisation, and AWS cloud deployment.
Owned three commercially critical areas end-to-end: LLM agents and RAG pipelines (GPT-4o, LangGraph, deterministic decision systems, hallucination mitigation), computer vision systems (YOLO, ResNet, real-time inference), and predictive ML (forecasting, churn, fraud). I care less about model accuracy on holdout sets and more about what happens when the system goes live. Every system is measured against accuracy, latency, and business outcome — not validation loss.
Capabilities
Full Stack
Contact
I'm looking for a full-time AI/ML Engineering role where there's a real problem — a bottleneck, a slow decision, a process someone is doing by hand. If that's your team, reach out directly. I reply within 24 hours.