Senior Associate/Manager - Applied AI Engineer, Technology Consulting
Job description
We’re hiring an Applied AI Engineer to own the AI stack end-to-end — from the cloud infrastructure it runs on, through the model layer, to the systems and evaluations that keep it working in production. This is a builder role that flexes with the situation: some weeks you’re heads-down shipping as an individual contributor, others you’re setting architectural direction and leading a small team. We want someone who can do both well and read which the moment calls for.
You’ll work across the full model landscape. That means getting the most out of frontier APIs (Claude, GPT, and their peers) and deploying, fine-tuning, and operating open-source models when that’s the right call — on cost, latency, control, or privacy. Knowing which to reach for, and why, is a core part of the job.
We’re looking for someone who can reason from first principles about why a transformer produces the output it does, get real results out of frontier models, run open-source models in production, and design the cloud infrastructure to serve them reliably and cost-effectively at scale.
The opportunity:
- Own the architecture of production AI systems — inference stacks, fine-tuning pipelines, retrieval and evaluation infrastructure, monitoring
- Build on frontier models (Claude, GPT, and peers) with real rigor: tool use, structured outputs, context and cost management, evals, and guardrails, not just prompt-and-pray
- Deploy and operate open-source models (Llama, Qwen, Mistral, DeepSeek, and whatever comes next) in our cloud environment, including quantization, serving frameworks (vLLM, TGI, SGLang, TensorRT-LLM), and multi-GPU inference
- Make the frontier-vs-open-source call deliberately, on cost, latency, control, and data sensitivity grounds — and be able to defend it
- Design the cloud infrastructure underneath it all: GPU orchestration, autoscaling, cost controls, VPC/networking, IAM, observability. This is not a “hand it to DevOps” role
- Fine-tune, distill, and evaluate models against real task metrics, not vibes or leaderboards
- Track the current research literature (arXiv, major labs, key conferences) and make judgment calls on what’s ready for production versus what’s still noise
- Depending on the project, contribute independently as a senior IC or lead and mentor a small team — and switch between the two as needed
- Partner with product and leadership to translate ambiguous problems into systems that actually ship
To qualify for the role, you must have:
Cloud & infrastructure engineering
- 6+ years of software / infrastructure engineering, with deep production experience on at least one major cloud (AWS, GCP, or Azure).
- Strong command of GPU infrastructure: instance selection, driver / CUDA stack, containerization, Kubernetes or an equivalent orchestrator, autoscaling patterns for inference workloads.
- IaC discipline (Terraform / Pulumi / CDK), CI/CD, monitoring (Prometheus / Grafana / OpenTelemetry), and cost management as second nature.
- Fluent in Python; comfortable in at least one systems-adjacent language (Go, Rust, C++) for the parts that need it.
ML / AI depth
- Genuine understanding of transformer internals: attention (including variants like MQA, GQA, sliding-window, and FlashAttention), positional encodings (RoPE, ALiBi), tokenization, KV cache mechanics, sampling, and how each part contributes to latency, memory, and quality.
- Proven ability to get production-grade results from frontier models — Claude, GPT, and peers — including tool use / function calling, structured outputs, retrieval, context and cost management, and building the evals and guardrails around them.
- Hands-on experience fine-tuning open-source LLMs — full fine-tuning, LoRA / QLoRA, preference optimization (DPO / ORPO / equivalents) — and knowing when each is appropriate.
- Practical familiarity with the modern training and inference stack: PyTorch, Hugging Face, DeepSpeed or FSDP, vLLM or equivalent, and evaluation frameworks.
- Ability to read a recent paper, explain what’s actually new, and give an honest assessment of whether it’s worth integrating.
- Track record of taking models — frontier or open weights — into a production system that real users depend on, with the evaluation, guardrails, and monitoring to match.
What we look for:
- Experience with distributed training at scale (multi-node).
- Contributions to open-source ML tooling or model releases.
- Background in classical ML / applied research prior to the LLM era.
- Experience running on-prem or hybrid GPU environments.
- Strong understanding of the mechanics before reaching for an abstraction.
- Ability to work as a senior individual contributor or as a team lead, adapting to the needs of the situation.
- Ability to communicate tradeoffs clearly to non-technical stakeholders.