Himalayas

Staff Platform Engineer, AI/ML Infrastructure

Pfizer

Remote Worldwide · Full-time · Remote

Work mode

Remote

Job type

Full-time

Experience

5+ years

Salary

EUR 65,250 - 108,750

Job Description

Department: AI Software & Operations

Role Summary

The Staff Platform Engineer, AI/ML Infrastructure will provide technical leadership for the cloud platforms, deployment systems, and operational foundations that power enterprise-scale generative AI applications.

This role will define and evolve the infrastructure architecture for AI/ML platforms running across AWS, Kubernetes, serverless, and containerized environments.

The engineer will lead platform standards for reliability, scalability, observability, CI/CD, security, and developer enablement, while partnering closely with software engineering, AI engineering, security, and operations teams.

The ideal candidate combines deep hands-on cloud engineering experience with staff-level technical influence.

They are comfortable designing infrastructure patterns, writing infrastructure-as-code, improving delivery pipelines, mentoring engineers, and making architectural decisions that raise the operational maturity of AI platforms across multiple teams.

Key Responsibilities

Define and drive the technical strategy for AI/ML platform infrastructure supporting generative AI applications, LLM integrations, model routing, and enterprise AI services.

Architect, build, and operate scalable cloud platforms using AWS services such as EKS, ECS Fargate, Lambda, DynamoDB, S3, OpenSearch, Secrets Manager, CloudWatch, ALB, and MWAA.

Establish reusable infrastructure patterns using CloudFormation, Helm, and Terraform to support reliable multi-environment and multi-region deployments.

Lead CI/CD architecture using GitHub Actions, reusable workflows, OIDC-based AWS authentication, automated quality gates, deployment promotion, and environment approvals.

Design and improve observability across AI platforms, including CloudWatch dashboards, logs, alarms, Prometheus/Grafana, OpenSearch, Langfuse, and LLM-specific operational metrics.

Build platform capabilities for GenAI workloads, including model availability monitoring.

Partner with software engineering teams to improve deployment reliability, rollback strategies, health checks, autoscaling, load testing, and runtime performance.

Define and enforce security and compliance practices for infrastructure, including IAM permission boundaries, Secrets Manager usage, secret scanning, audit logging, tagging standards, and change-management controls.

Provide technical leadership for cost optimization, capacity planning, environment standardization, and operational resilience across development, test, production, and sandbox environments.
