Principal Engineer – AI Infrastructure Abstractions

San Jose, CA

As a Principal Engineer for AI Infrastructure Abstractions, you will design and implement the foundational systems that make shared AI compute environments scalable, secure, and developer-friendly. Your work will focus on creating abstractions that hide hardware complexity while providing predictable, cloud-native interfaces for AI workloads.

This position bridges infrastructure and applied AI, turning raw GPUs and accelerators into programmable, elastic, multi-tenant resources for both internal developers and enterprise clients.

Key Responsibilities

Architect abstractions that map logical compute constructs (vGPUs, GPU pools, workload queues) to physical devices.
Build APIs, services, and control planes that expose GPU and accelerator resources with strong isolation and quality-of-service guarantees.
Develop mechanisms for secure GPU sharing, including time-slicing, partitioning, and namespace isolation.
Work with orchestration and scheduling systems to ensure intelligent mapping of resources based on utilization, priority, and network topology.
Define policies for quotas, fair allocation, and resource elasticity in shared environments.
Integrate with AI/ML frameworks (PyTorch, TensorFlow, Triton, etc.) to optimize model training and inference workflows.
Deliver observability and monitoring capabilities that trace resource usage from logical abstractions to hardware.
Partner with platform security teams to strengthen access controls, onboarding processes, and tenant isolation.
Support internal developer adoption of abstraction APIs while maintaining high performance and low overhead.
Contribute to long-term compute platform strategy with a focus on modularity, abstraction, and scale.

Minimum Qualifications

Bachelor’s degree with 15+ years of experience, Master’s with 12+ years, or PhD with 8+ years.
Proven track record building production-grade infrastructure systems, preferably in Go, Python, or C++.
Strong experience with containerization and orchestration platforms (Kubernetes, Docker, KubeVirt).
Background in designing logical abstractions for compute, storage, or networking in multi-tenant systems.
Familiarity with integrating machine learning platforms (e.g., PyTorch, TensorFlow, Triton, MLflow).

Preferred Qualifications

Hands-on experience with GPU sharing, scheduling, or isolation (MIG, MPS, vGPUs, time-slicing, or device plugin models).
Deep knowledge of resource management: quotas, prioritization, fairness, elasticity.
Strong ability to think across hardware/software boundaries and design abstractions that scale.
