Job Description
About Us:
AI needs a new infrastructure layer. We're building it at Modal.
Every era of computing brought new workloads that previous infrastructure couldn't support: mainframes, databases, and the cloud. Each time, the company that rebuilt the layer underneath defined the decade. AI is no different, except it touches everything instead of one slice, and the window to build the layer underneath it is open right now.
Our customers include category-defining companies like Lovable, Ramp, Cognition, DoorDash, and Suno. They rely on Modal for instant GPU access, sub-second container starts, and native storage, so it's simple to serve low-latency inference, fine-tune models, and access production-ready sandboxes at scale.
We recently raised a $355M Series C at a $4.65B valuation, led by General Catalyst and Redpoint Ventures. We've crossed $300M in ARR and grown fivefold since September.
Our team includes creators of popular open-source projects (e.g., Seaborn, Luigi), academic researchers, international olympiad medalists, and engineering and product leaders with decades of experience.
The Role:
At Modal, we sell cloud services atop which our customers run their critical production systems. As a rapidly growing new cloud infrastructure company, we seek to improve our reliability dramatically while scaling our platform, customer base, and team.
This role is for people who are deep systems thinkers, love stacking nines, and thrive on helping others move faster at scale. Responsibilities include:
Identifying architectural changes to improve reliability and performance.
Fostering a culture of reliability across Modal’s engineering organization.
Defining and implementing operational processes such as deployments and upgrades.
Operating systems such as Kubernetes, Postgres, and Redis.
Participating in on-call rotations and responding to production incidents.
Requirements:
5+ years of experience writing high-quality production code.
2+ years of on-call experience for critical production services.
Strong cloud skills and deep familiarity with at least one hyperscaler cloud (AWS preferred).
Familiarity with auto scaling, fleet management, and capacity planning at scale.
Experience operating databases, monitoring, CI/CD, and other infrastructure at scale.
Experience owning and scaling Kubernetes clusters to thousands of nodes is a plus.
Experience with systems safety research (e.g., STAMP) and control theory is a plus.
Ability to work in-person in our NYC or Stockholm offices.
Frequently asked questions
Is the Member of Technical Staff - Platform Engineering position at Modal remote?
The Member of Technical Staff - Platform Engineering role at Modal is an on-site or hybrid position.
What type of employment is the Member of Technical Staff - Platform Engineering role?
Modal is hiring for a full-time Member of Technical Staff - Platform Engineering position.
What skills are needed for the Member of Technical Staff - Platform Engineering job at Modal?
Key skills for this role include Kubernetes, AWS, and GPUs.
How do I apply for the Member of Technical Staff - Platform Engineering position at Modal?
You can apply for the Member of Technical Staff - Platform Engineering role directly through Modal's official application link provided on this page.