AI / ML Engineer
3 live AWS deployments: investment research, document retrieval, and neural audience intelligence — all automated, auditable, and measured by the work they replace.
Impact
Every number below comes from a production deployment that replaced a manual process — not a Kaggle notebook.
Stack
Selected Work
Every project below replaced a manual process that was costing someone real time, money, or accuracy.
01 — LLM Agent · Live Product
Designed and deployed a deterministic investment decision engine — a 7-step async LangGraph pipeline generating auditable BUY/HOLD/SELL recommendations in under 30 seconds. GPT-4o is constrained to explanation-only; all decisions are made by a normalized multi-factor scoring system. Zero LLM hallucination risk on recommendations. Running in production.
Try it live
Financial analysts spend days compiling research from 10+ scattered data sources. Manual work is slow and doesn't scale.
GPT-4o agent loop that autonomously calls live APIs — market data, financial news, SEC fundamentals — then synthesises a structured markdown report.
Multi-factor scoring engine (Technical 0–25, Fundamental 0–40, Sentiment 0–15) with conflict-aware logic — signal disagreement (variance > 0.15) overrides to HOLD, eliminating false conviction. GPT-4o constrained to explanation-only: zero influence over the decision, fully auditable outputs. Redis caching (900s TTL) cuts repeated query latency from ~15s to <100ms. LangGraph chosen for its structured async pipeline over ad-hoc tool-calling.
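The conflict-aware override can be sketched in a few lines. This is an illustrative toy, not the production code: the function name, the normalization to 0–1, and the BUY/SELL cutoffs are assumptions; only the variance-over-0.15 override to HOLD comes from the description above.

```python
from statistics import pvariance

VARIANCE_THRESHOLD = 0.15  # disagreement cutoff from the description above

def recommend(technical: float, fundamental: float, sentiment: float) -> str:
    """Combine factor scores (each normalized to 0-1) into a call."""
    signals = [technical, fundamental, sentiment]
    # High variance means the factors conflict, so the engine refuses
    # to express false conviction and overrides to HOLD.
    if pvariance(signals) > VARIANCE_THRESHOLD:
        return "HOLD"
    composite = sum(signals) / len(signals)
    if composite >= 0.6:
        return "BUY"
    if composite <= 0.4:
        return "SELL"
    return "HOLD"
```

The key design point is that the LLM never touches this function: the recommendation is fully deterministic, so identical inputs always produce identical, auditable outputs.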
A 5-factor confidence model (factors include data completeness, signal agreement, volatility, and news uncertainty) blocks recommendations below 20% confidence — solving the false-conviction problem of generic LLM finance tools. Portfolio ranking mode supports 2–8 tickers with proportional capital allocation. For a firm paying analysts $80k+/year, sub-30-second auditable decisions directly offset that cost.
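A minimal sketch of the confidence gate, under stated assumptions: only four of the five factors are named in the text, so only those four appear here, the equal weighting is a guess, and the "NO_CALL" abstention label is illustrative. The 20% floor is the one described above.

```python
MIN_CONFIDENCE = 0.20  # recommendations below 20% confidence are blocked

def confidence(data_completeness: float, signal_agreement: float,
               volatility: float, news_uncertainty: float) -> float:
    """Blend factor scores (each 0-1) into one confidence value."""
    # Volatility and news uncertainty reduce confidence, so invert them.
    return (data_completeness + signal_agreement
            + (1 - volatility) + (1 - news_uncertainty)) / 4

def gate(recommendation: str, conf: float) -> str:
    # Below the floor, abstain instead of guessing.
    return recommendation if conf >= MIN_CONFIDENCE else "NO_CALL"
```

Abstaining below a confidence floor is what separates this from generic LLM finance tools, which will happily produce a confident-sounding answer on thin data.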
02 — Neuroscience AI · Live Product · Meta FAIR TRIBE v2
Engineered a multimodal brain encoding pipeline that predicts second-by-second fMRI activations across 20,484 cortical vertices — giving studios audience emotional response intelligence before a trailer ever releases. Three neural networks (V-JEPA2, Wav2Vec-BERT, Llama 3.2) feed Meta FAIR's TRIBE v2 to map vision, audio, and language into 5 emotion channels across 7 brain regions. Live on AWS. Contributed PR #20 to Meta FAIR's TRIBE v2: an int8 compute_type fallback for WhisperX on non-CUDA devices. CLA signed, checks passing.
Try it live
Studios spend millions on trailers with no way to predict emotional response before release. Test screenings are expensive, slow, and limited to small samples — a studio can't know second-by-second where audiences peak in excitement or disengage.
Multimodal brain encoding pipeline: V-JEPA2 extracts video features, Wav2Vec-BERT processes audio, Llama 3.2 embeds the transcript — all three streams feed Meta FAIR's TRIBE v2, which predicts fMRI activations across 20,484 cortical vertices per second of video. Activations are mapped to 5 emotion channels (excitement, fear, joy, suspense, boredom) across 7 brain regions using neuroscience-based weights.
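The final mapping step — region-level activations into emotion channels via fixed weights — can be sketched as below. This is a toy: the region names, channel names, and every weight value are illustrative assumptions; the production pipeline uses neuroscience-based weights over TRIBE v2's vertex-level predictions.

```python
# Illustrative weights only: weights[emotion][region] is that region's
# assumed contribution to the emotion channel.
EXAMPLE_WEIGHTS = {
    "fear":       {"amygdala": 0.6, "insula": 0.4},
    "excitement": {"visual": 0.4, "auditory": 0.3, "dlPFC": 0.3},
}

def emotion_scores(activations: dict, weights: dict) -> dict:
    """Weighted sum of per-region activations for each emotion channel."""
    return {
        emotion: sum(w * activations.get(region, 0.0)
                     for region, w in region_weights.items())
        for emotion, region_weights in weights.items()
    }
```

Run once per second of video, this yields the second-by-second emotion curves the product surfaces to studios.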
Two-tier deployment: g4dn.xlarge GPU instance ($0.53/hr, Tesla T4 15GB VRAM) spun up on demand for inference only, while a t3.micro serves 24/7 with pre-computed JSON results at <100ms — keeping monthly cost near $0. 12.89GB across three extractors barely fits in T4 VRAM, requiring careful model loading order. Multi-stage Docker build (Node 18 + Python 3.11-slim) keeps the production image at ~500MB with no ML weights. Submitted PR #20 to Meta FAIR's TRIBE v2 — added int8 compute_type fallback for WhisperX on non-CUDA devices. Before the fix: hardcoded float16 caused ValueError on Apple Silicon and CPU-only Linux. After: the full pipeline runs everywhere. CUDA path unchanged, <1% WER degradation on transcription accuracy. CLA signed, checks passing.
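The core of the PR's fallback is a device-aware compute-type selection, sketched below. The helper name is illustrative; what matters is the logic: GPU keeps float16 (CUDA path unchanged), everything else gets int8 instead of crashing.

```python
def pick_compute_type(cuda_available: bool) -> str:
    # CUDA path unchanged: keep float16 precision on GPU.
    # Non-CUDA devices (Apple Silicon, CPU-only Linux) fall back to
    # int8 quantized inference, avoiding the hardcoded-float16 ValueError.
    return "float16" if cuda_available else "int8"
```

In practice the flag would come from something like `torch.cuda.is_available()`, and the result is passed through to WhisperX's `compute_type` argument when the model is loaded.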
Second-by-second emotion maps identify the 3 strongest and 3 weakest moments in a trailer — actionable edit intelligence. Persona simulation models how Action Lovers, Romance Fans, and Horror Enthusiasts respond differently. Competitive benchmarking against Oppenheimer, Avengers: Endgame, Inception, Interstellar, and The Dark Knight gives studios a quantified baseline. A full analysis costs ~$0.50 in GPU time, replacing a test screening that costs thousands.
03 — RAG Pipeline · Live Product
Engineered a production RAG system with MMR retrieval, hallucination guardrails, and a rule-based query classifier — no LLM call fires without sufficient retrieval confidence. 87% retrieval accuracy. Two interfaces: Streamlit chat UI + FastAPI developer API. Live on AWS with HTTPS.
Try it live
Enterprise knowledge locked in PDFs and Word docs means employees waste hours on manual search instead of actual work.
Full RAG pipeline: LangChain orchestration, ChromaDB vector store, GPT-4 generation with streaming. Supports multi-format doc ingestion.
MMR (Maximal Marginal Relevance) over plain cosine similarity — fetches a larger candidate pool and re-ranks for relevance and diversity, preventing near-duplicate chunks from reaching the LLM. Rule-based query classifier (no LLM cost) detects factual/complex/ambiguous intent and adjusts top_k/fetch_k accordingly. Three API cost guards: rate limiting (20 req/min), 500-char input cap, tiktoken token budget — all fire before any LLM call. Hallucination guardrails check retrieval confidence and return a fallback instead of guessing on low scores.
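The rule-based classifier can be sketched as below. The keyword lists, word-count thresholds, and top_k/fetch_k values are all illustrative assumptions; the point is the shape: pure string rules, zero LLM cost, run before any token is spent.

```python
RETRIEVAL_PARAMS = {
    "factual":   {"top_k": 3, "fetch_k": 10},
    "complex":   {"top_k": 6, "fetch_k": 20},
    "ambiguous": {"top_k": 4, "fetch_k": 15},
}

COMPLEX_MARKERS = ("compare", "explain why", "trade-off", "versus")

def classify(query: str) -> str:
    """Classify intent with string rules only -- no LLM call, no cost."""
    q = query.lower()
    words = q.split()
    if any(m in q for m in COMPLEX_MARKERS) or len(words) > 20:
        return "complex"
    # Pronoun-heavy or very short queries lack standalone context.
    if "it" in words or "this" in words or len(words) < 3:
        return "ambiguous"
    return "factual"

def retrieval_params(query: str) -> dict:
    # Complex queries get a wider MMR candidate pool (fetch_k) and more
    # final chunks (top_k); factual lookups stay cheap and tight.
    return RETRIEVAL_PARAMS[classify(query)]
```

Because classification is deterministic and free, it can gate every request without adding latency or API spend.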
Citation format [SOURCE N: filename — Page: X] makes every answer auditable — critical for regulated industries. Dual interfaces (Streamlit for business users, FastAPI Swagger at /api/docs for engineering teams) means one deployment serves both audiences. The cost guards and query classifier make this viable at scale without API spend ballooning.
About
Designed and deployed complete AI systems — not just notebooks. A model that never leaves a researcher's laptop isn't a product — it's a cost. Every system I've built started with a process someone was doing manually. My job was to make that unnecessary. Engineered across the full stack: model architecture and training, API development, containerisation, and AWS cloud deployment.
Owned three commercially critical areas end-to-end: LLM agents and RAG pipelines (GPT-4o, LangGraph, deterministic decision systems, hallucination mitigation), computer vision systems (YOLO, ResNet, real-time inference), and predictive ML (forecasting, churn, fraud). I care less about model accuracy on holdout sets and more about what happens when the system goes live. Every system is measured against accuracy, latency, and business outcome — not validation loss.
Capabilities
Full Stack
Contact
I'm looking for a full-time AI/ML Engineering role where there's a real problem — a bottleneck, a slow decision, a process someone is doing by hand. If that's your team, reach out directly. I reply within 24 hours.