Founding LLM Inference Engineer
Full-time | On-site | San Francisco, CA
Compensation: $200K–$300K + 0.10%–1.00% equity
About the Role
We’re looking for a Founding LLM Inference Engineer to architect and optimize the large-scale inference systems behind our AI applications. You’ll build the backbone of an AI platform used by top enterprises, with a focus on performance, scalability, and reliability.
This is a hands-on, high-impact role: you’ll collaborate closely with research and product teams to bring new model capabilities into production quickly. If you’re excited about low-latency systems, high-throughput serving pipelines, and deploying state-of-the-art LLMs, this role is for you.
Tech stack: Python, CUDA, LLM serving frameworks (TGI, vLLM, TensorRT-LLM), and API integrations
What You’ll Do
What We’re Looking For
Benefits
👉 Ready to take the next step?
Apply now, or email Jenn at Recruiter@CareDynamicsFL.com to learn more.