DevOps & Site Reliability Engineer
VoltaGrid · Houston
Job description
About the role
We are looking for a DevOps & Site Reliability Engineer to design, build and operate the cloud and on‑premises infrastructure that powers our applications. The role works closely with software engineering teams to ensure services are scalable, observable and resilient, while fostering a culture of operational excellence.
Key responsibilities
- Design, implement and maintain cloud infrastructure on AWS (or another major cloud provider).
- Manage and optimize Kubernetes clusters and containerized workloads in production.
- Develop infrastructure‑as‑code using Terraform and related tooling.
- Build and improve CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, etc.) for fast and safe deployments.
- Implement monitoring, alerting and observability solutions such as Prometheus, Grafana and Datadog.
- Define and track SLIs/SLOs; participate in incident response, root‑cause analysis and blameless post‑mortems.
- Automate repetitive tasks and create self‑service tooling to reduce toil.
- Configure and maintain on‑prem bare‑metal servers, Linux systems and virtualized infrastructure.
- Collaborate with development teams on system design, capacity planning and performance optimization.
- Participate in on‑call rotations and ensure production readiness of new services.
Required profile
- Minimum 4 years of experience in DevOps, SRE or infrastructure engineering.
- Strong experience with at least one major cloud provider (AWS preferred).
- Hands‑on experience operating Kubernetes and Docker in production.
- Proficiency with Terraform or comparable infrastructure‑as‑code tools.
- Experience building CI/CD pipelines using GitHub Actions, GitLab CI, Jenkins or similar.
- Solid understanding of monitoring, logging and tracing concepts.
- Strong scripting abilities in Bash, Python or Go.
- Proven incident‑management experience and familiarity with SLO‑based reliability practices.
- Deep Linux administration skills (Ubuntu, RHEL/CentOS).
- Knowledge of virtualization, networking, DNS, load balancing and security fundamentals.
Required skills
- AWS (or GCP/Azure)
- Kubernetes
- Docker
- Terraform
- GitHub Actions / GitLab CI / Jenkins
- Prometheus
- Grafana
- Datadog
- Bash
- Python
- Go
- Linux (Ubuntu, RHEL/CentOS)
- Virtualization platforms
- Networking, DNS, load balancing, security basics
Published 1 week ago
Expires 1 month from now