Principal Engineer – AI Infrastructure Abstractions
San Jose, CA
Apply for this job

As a Principal Engineer on AI Infrastructure Abstractions, you will design and implement the foundational systems that make shared AI compute environments scalable, secure, and developer-friendly. Your work will focus on creating abstractions that hide hardware complexity while providing predictable, cloud-native interfaces for AI workloads.


This position bridges infrastructure and applied AI, turning raw GPUs and accelerators into programmable, elastic, and multi-tenant resources for both internal developers and enterprise clients.


Key Responsibilities


  • Architect abstractions that map logical compute constructs (vGPUs, GPU pools, workload queues) to physical devices.
  • Build APIs, services, and control planes that expose GPU and accelerator resources with strong isolation and quality-of-service guarantees.
  • Develop mechanisms for secure GPU sharing, including time-slicing, partitioning, and namespace isolation.
  • Work with orchestration and scheduling systems to ensure intelligent mapping of resources based on utilization, priority, and network topology.
  • Define policies for quotas, fair allocation, and resource elasticity in shared environments.
  • Integrate with AI/ML frameworks (PyTorch, TensorFlow, Triton, etc.) to optimize model training and inference workflows.
  • Deliver observability and monitoring capabilities that trace resource usage from logical abstractions to hardware.
  • Partner with platform security teams to strengthen access controls, onboarding processes, and tenant isolation.
  • Support internal developer adoption of abstraction APIs while maintaining high performance and low overhead.
  • Contribute to long-term compute platform strategy with a focus on modularity, abstraction, and scale.
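To give a concrete feel for the first responsibility above, here is a deliberately minimal Go sketch of mapping a logical vGPU request onto a physical device with spare time-slice capacity. All type and function names here are hypothetical illustrations, not an actual platform API; real implementations would also handle concurrency, device health, and topology.

```go
package main

import (
	"errors"
	"fmt"
)

// PhysicalGPU models one device with a fixed number of shareable time slices.
type PhysicalGPU struct {
	ID        string
	Slices    int // total time-slice capacity
	Allocated int // slices currently reserved
}

// Pool maps logical vGPU requests onto physical devices.
type Pool struct {
	devices []*PhysicalGPU
}

// Allocate finds a device with enough free slices, reserves them, and
// returns an opaque logical handle that hides the physical device identity.
func (p *Pool) Allocate(tenant string, slices int) (string, error) {
	for _, d := range p.devices {
		if d.Slices-d.Allocated >= slices {
			d.Allocated += slices
			return fmt.Sprintf("vgpu-%s-%s", tenant, d.ID), nil
		}
	}
	return "", errors.New("no capacity")
}

func main() {
	pool := &Pool{devices: []*PhysicalGPU{{ID: "gpu0", Slices: 4}}}
	handle, err := pool.Allocate("team-a", 2)
	fmt.Println(handle, err)
}
```

The key design point is that tenants only ever see the logical handle, so the control plane stays free to migrate, repack, or partition the underlying hardware.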


Minimum Qualifications


  • Bachelor’s degree with 15+ years of experience, Master’s with 12+ years, or PhD with 8+ years.
  • Proven track record building production-grade infrastructure systems, preferably in Go, Python, or C++.
  • Strong experience with containerization and orchestration platforms (Kubernetes, Docker, KubeVirt).
  • Background in designing logical abstractions for compute, storage, or networking in multi-tenant systems.
  • Experience integrating with machine learning platforms (e.g., PyTorch, TensorFlow, Triton, MLflow).


Preferred Qualifications

  • Hands-on experience with GPU sharing, scheduling, or isolation (MIG, MPS, vGPUs, time-slicing, or device plugin models).
  • Deep knowledge of resource management: quotas, prioritization, fairness, elasticity.
  • Strong ability to think across hardware/software boundaries and design abstractions that scale.
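As an illustration of the resource-management knowledge called out above, the sketch below implements max-min fair allocation in Go: sort demands ascending, then grant each tenant the smaller of its demand and an equal share of the remaining capacity. This is a standard textbook scheme, shown here only as a sketch; the function and tenant names are invented for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// fairShare divides capacity among tenant demands using max-min fairness.
// Small demands are satisfied fully; large demands split the leftover equally.
func fairShare(capacity float64, demands map[string]float64) map[string]float64 {
	type req struct {
		tenant string
		demand float64
	}
	reqs := make([]req, 0, len(demands))
	for t, d := range demands {
		reqs = append(reqs, req{t, d})
	}
	// Process the smallest demand first so unused share flows to larger ones.
	sort.Slice(reqs, func(i, j int) bool { return reqs[i].demand < reqs[j].demand })

	alloc := make(map[string]float64)
	remaining := capacity
	for i, r := range reqs {
		share := remaining / float64(len(reqs)-i) // equal split of what is left
		grant := r.demand
		if grant > share {
			grant = share
		}
		alloc[r.tenant] = grant
		remaining -= grant
	}
	return alloc
}

func main() {
	// 10 units of capacity; tenant "c" over-asks and is capped at its fair share.
	fmt.Println(fairShare(10, map[string]float64{"a": 2, "b": 4, "c": 10}))
}
```

With capacity 10 and demands {a: 2, b: 4, c: 10}, tenants a and b get their full 2 and 4, and c is capped at the remaining 4.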

