Atul Kumar — Software Engineer · AI Agents, LangGraph, AWS, Apache Spark, Distributed Systems
Software Engineer · AI Agents · Distributed Systems · Backend at Scale
I build production AI agents and large-scale distributed systems — from LangGraph + SageMaker retrieval pipelines to high-throughput Apache Spark systems processing millions of records and server-side streaming APIs sustaining tens of thousands of records per second.
About
Software Engineer focused on production AI/ML systems and distributed infrastructure. I have shipped a LangGraph + AWS SageMaker agent that deflects 80% of internal support queries with sub-20-second responses over a 40-document corpus, streamed 5M+ records per request at ~50K records/sec via a server-side streaming API, sped up a critical Spark pipeline 6× (12 hrs → 2 hrs), and built shared Spark libraries that cut new-pipeline build time by ~75% across the team. Currently an Engineering Associate at Goldman Sachs Engineering.
I own systems end-to-end across AWS — Glue, Spark, ECS, Step Functions, Aurora, SageMaker — and enjoy the full spectrum from agent design and RAG retrieval to streaming APIs, OCR, and near-real-time data infrastructure.
Skills & Expertise
Languages
Backend
Frontend
AI / ML
Data & Pipelines
Cloud & Infra
Other Tools
Professional Experience
Engineering Associate
TINA — Travel Insights & Navigation Agent
- Deflected 80% of travel & expense support queries from human agents by building TINA, an internal AI agent that resolves policy questions (missed flights, reimbursements, approvals) end-to-end.
- Cut average policy-answer latency to under 20 seconds across 40 policy documents by building a RAG pipeline that ingests document embeddings into AWS SageMaker and orchestrates multi-step retrieval with a LangGraph agent.
Travel & Expense Platform
- Enabled streaming export of 5M+ live transaction, report, and ledger records per request by engineering a server-side API that sustains ~50K records/sec within a 10-minute window, eliminating the timeouts and OOM failures from the prior batch approach.
- Eliminated the operations team's daily manual reconciliation across 80 global markets by building an auto/manual voucher creation flow for vendor (Amex) payments, with automatic generation of ledgers and statements and automatic voucher reconciliation.
- Met Goldman Sachs' Tech Raise Bar quality standard for the greenfield Travel & Expense platform by authoring an end-to-end integration test suite covering AWS Step Functions, ECS tasks and services, S3, and Aurora.
AI-Driven Invoice Lifecycle Management (POC)
- Eliminated manual data entry on every invoice submission by building an OCR layer that pre-fills 100% of form fields from invoice attachments for downstream human verification.
- Built an intelligent reviewer-assignment engine that routes invoices to reviewers based on expertise, calendar availability, and criticality, with cycle-time metrics fed back into the model for continuous improvement.
- Prevented duplicate payments across $2M+ in average daily invoice volume by adding a duplicate-detection layer over historical submissions.
Slate — Inter-Affiliate Outsourcing Agreements
- Drove data-backed feature prioritization for the next platform build by integrating GS Analytics into the Slate frontend, capturing feature-level telemetry from 800 users across 10 workflows in the agreement-creation pipeline.
High-Throughput Spark Pipeline on AWS Glue (FFIEC 009)
- Engineered a high-performance Apache Spark 3 pipeline on AWS Glue that ingests and processes data from 12 upstream sources under strict latency and correctness budgets.
- Reduced new-pipeline build time across the team by ~75% by designing shared libraries that standardize data processing and validation.
Engineering Analyst
Large-Scale Data Pipeline Re-Architecture (TIC B)
- Cut end-to-end pipeline runtime by 83% (12 hrs → 2 hrs) by re-architecting the workload on Apache Spark 3.
- Designed a versioning system that enables exact reproduction of any historical calculation across 60 onboarded data products by capturing the precise inputs and logic used for any past run.
- Replaced manual hand-offs with a fully automated end-to-end data pipeline and downstream generation flow.
- Made downstream data available 9 days earlier each cycle by removing legacy system dependencies and building a new data model.
Kafka-Driven NRT Pipelines (TIC SLT / SHCA / SHLA)
- Cut data-quality issue resolution from 4–5 days of manual email follow-up to near-real-time by building a UI over Kafka-driven NRT pipelines that surfaces issues as they occur.
- Made downstream data available 17 days earlier each cycle by shifting end-of-cycle batch processing to daily incremental processing.
Technology Intern
Natural Language Query Parser for ML Studio
- Removed the learning curve for Amex's ML Studio Logstash by building an end-to-end natural language query parser that translates plain-English questions into executable Logstash queries — an early NLP-to-DSL system.
Education
B.E. Computer Science
Class 12 (CBSE)
Class 10 (CBSE)
Achievements & Leadership
- Owned multiple regulatory reports end-to-end (including TIC-B) with full delivery responsibility — from raw data pipeline to final published report.
- Drove student engagement at PECfest by organizing technical PEC-ACM events with 45+ participants across 15 teams.
- Contributed as a volunteer researcher on the Indian-Origin Academicians Abroad project under DST.