Senior AI DevOps / LLMOps

Remote · Full-time · Senior · Gaming

At TechBiz Global, we provide recruitment services to the top clients in our portfolio. We are currently seeking a Senior AI DevOps / LLMOps specialist to join one of our clients' teams. If you're looking for an exciting opportunity to grow in an innovative environment, this could be the perfect fit for you.

Key Responsibilities

1. Automation of Build-to-Production

Design and implement robust CI/CD pipelines tailored for AI, covering model weights, dataset versioning, and application code.

Develop specialized workflows for PromptOps, ensuring that system prompts are version-controlled, tested for regressions, and deployed with the same rigor as traditional code.

Automate the deployment of agentic workflows, managing the complexities of stateful AI interactions and multi-agent handoffs.

2. AI Infrastructure as Code (IaC)

Provision and manage high-performance compute environments (GPU clusters, TPU pods) using Terraform, Pulumi, or Ansible.

Define and enforce Policy-as-Code for AI endpoints to ensure compliance with security, cost-usage limits, and data residency requirements.

Maintain a consistent environment across hybrid infrastructure, ensuring seamless parity between on-premises development and cloud production.

3. Safe Experimentation & Controlled Releases

Architect Progressive Delivery strategies for AI, including Canary releases, Blue-Green deployments, and Shadowing (where new models run in parallel with production to compare outputs).

Build “Evaluation-in-the-Loop” gates within the pipeline to automatically test for bias, hallucination, and performance degradation before a release.

Implement A/B testing frameworks specifically designed for LLM outputs and agentic behavior.

4. Monitoring & Observability

Establish deep observability into inference endpoints, tracking metrics like tokens-per-second, latency, and drift in model accuracy.

Integrate feedback loops that capture production “edge cases” to feed back into the training and fine-tuning pipelines.

Requirements

Must-Have Technical Skills:

Orchestration: Advanced Kubernetes (K8s) skills, specifically with Kubeflow, Ray, or NVIDIA Triton.

CI/CD & IaC: Expertise in GitHub Actions/GitLab CI, and Terraform or Pulumi.

AI Tooling: Experience with Weights & Biases, MLflow, LangSmith, or Arize Phoenix.

Hardware: Understanding of GPU virtualization, CUDA drivers, and on-premises hardware management.

Security: Familiarity with Open Policy Agent (OPA) and secret management (Vault).

Experience:

10+ years in DevOps, SRE, or Cloud Engineering.

2+ years of hands-on experience in MLOps or LLMOps, specifically moving LLMs from notebook to production.

Proven experience managing hybrid cloud environments (e.g., AWS/Azure + private data center).

Highlights

- Full-time, remote position

- Fluent English is required

Originally posted on Himalayas

Salary estimate

$165k–$250k / yr

Based on: software · senior · US market
