Senior Associate/Manager - Applied AI Engineer, Technology Consulting

Location:

Other locations: Primary Location Only

Date: Jul 3, 2026

Requisition ID: 1722955

We’re hiring an Applied AI Engineer to own the AI stack end-to-end — from the cloud infrastructure it runs on, through the model layer, to the systems and evaluations that keep it working in production. This is a builder role that flexes with the situation: some weeks you’re heads-down shipping as an individual contributor, others you’re setting architectural direction and leading a small team. We want someone who can do both well and read which the moment calls for.

You’ll work across the full model landscape. That means getting the most out of frontier APIs (Claude, GPT, and their peers) and deploying, fine-tuning, and operating open-source models when that’s the right call — on cost, latency, control, or privacy. Knowing which to reach for, and why, is a core part of the job.

We’re looking for someone who can reason from first principles about why a transformer produces the output it does, drive real results out of frontier models, run open-source models in production, and design the cloud infrastructure to serve them reliably and cost-effectively at scale.

The opportunity:

Own the architecture of production AI systems — inference stacks, fine-tuning pipelines, retrieval and evaluation infrastructure, monitoring
Build on frontier models (Claude, GPT, and peers) with real rigor: tool use, structured outputs, context and cost management, evals, and guardrails, not just prompt-and-pray
Deploy and operate open-source models (Llama, Qwen, Mistral, DeepSeek, and whatever comes next) in our cloud environment, including quantization, serving frameworks (vLLM, TGI, SGLang, TensorRT-LLM), and multi-GPU inference
Make the frontier-vs-open-source call deliberately, on cost, latency, control, and data sensitivity grounds — and be able to defend it
Design the cloud infrastructure underneath it all: GPU orchestration, autoscaling, cost controls, VPC/networking, IAM, observability. This is not a “hand it to DevOps” role
Fine-tune, distill, and evaluate models against real task metrics, not vibes, not leaderboards
Track the current research literature (arXiv, major labs, key conferences) and make judgment calls on what’s ready for production versus what’s still noise
Depending on the project, contribute independently as a senior IC or lead and mentor a small team — and switch between the two as needed
Partner with product and leadership to translate ambiguous problems into systems that actually ship

To qualify for the role, you must have:

Cloud & infrastructure engineering

6+ years of software / infrastructure engineering, with deep production experience on at least one major cloud (AWS, GCP, or Azure).
Strong command of GPU infrastructure: instance selection, driver / CUDA stack, containerization, Kubernetes or an equivalent orchestrator, autoscaling patterns for inference workloads.
IaC discipline (Terraform / Pulumi / CDK), CI/CD, monitoring (Prometheus / Grafana / OpenTelemetry), and cost management as second nature.
Fluent in Python; comfortable in at least one systems-adjacent language (Go, Rust, C++) for the parts that need it.

ML / AI depth

Genuine understanding of transformer internals: attention (including variants like MQA, GQA, sliding-window, and flash attention), positional encodings (RoPE, ALiBi), tokenization, KV cache mechanics, sampling, and where each part contributes to latency, memory, and quality.
Proven ability to get production-grade results from frontier models — Claude, GPT, and peers — including tool use / function calling, structured outputs, retrieval, context and cost management, and building the evals and guardrails around them.
Hands-on experience fine-tuning open-source LLMs — full fine-tuning, LoRA / QLoRA, preference optimization (DPO / ORPO / equivalents) — and knowing when each is appropriate.
Practical familiarity with the modern training and inference stack: PyTorch, Hugging Face, DeepSpeed or FSDP, vLLM or equivalent, evaluation frameworks.
Ability to read a recent paper, explain what’s actually new, and give an honest assessment of whether it’s worth integrating.
Track record of taking models — frontier or open weights — into a production system that real users depend on, with the evaluation, guardrails, and monitoring to match.

What we look for:

Strong understanding of the mechanics before reaching for an abstraction.
Able to contribute as an individual contributor or as a team lead, adapting to the needs of the situation.
Communicates tradeoffs clearly to non-technical stakeholders.
