Senior Site Reliability Engineer (SRE) / DevOps Engineer Job at TekWissen LLC, Washington DC


Job Description

Overview:

TekWissen is a global workforce management provider headquartered in Ann Arbor, Michigan, that offers strategic talent solutions to clients worldwide. Our client is a provider of digital technology and transformation, information technology, and services.

Position: Senior Site Reliability Engineer (SRE) / DevOps Engineer

Location: Bothell, WA

Duration: 11 Months

Job Type: Temporary Assignment

Work Type: Hybrid

Job Description

  • We are seeking a highly experienced SRE / DevOps Engineer to support and scale a Kubernetes-based API Gateway platform built on a Java technology stack.
  • The role focuses on reliability, observability, automation, and performance, while also contributing to POCs around next-generation AI Gateway capabilities.

Key Responsibilities



Platform Reliability & Operations

  • Own reliability, availability, scalability, and performance of API Gateway services running on Kubernetes
  • Design and implement SRE best practices including SLIs, SLOs, SLAs, error budgets, and incident management
  • Lead production readiness reviews, root cause analysis (RCA), and post-incident improvements
  • Drive capacity planning, performance tuning, and resilience testing

Kubernetes & Cloud Engineering

  • Manage and optimize Kubernetes clusters (EKS / AKS / GKE / On-prem)
  • Develop and maintain Helm charts, manifests, and deployment strategies
  • Implement rollout strategies such as blue-green, canary, and rolling deployments
  • Collaborate with development teams to ensure cloud-native design patterns

Observability & Monitoring (Strong Focus)

  • Build and maintain enterprise-grade observability (O11y) solutions:
      • Prometheus & Grafana for metrics and dashboards
      • Splunk for centralized logging and alerting
      • OpenTelemetry for distributed tracing
  • Define actionable alerts and dashboards for platform and application health
  • Improve MTTR through better visibility and automation

CI/CD & Automation

  • Design and maintain CI/CD pipelines (Jenkins, GitHub Actions, GitLab CI, etc.)
  • Automate infrastructure using Infrastructure as Code (Terraform, CloudFormation, etc.)
  • Develop automation scripts using Python, Bash, or Groovy

Security & Compliance

  • Implement DevSecOps practices including secrets management, image scanning, and RBAC
  • Work closely with security teams on vulnerability remediation and compliance controls

Innovation & POCs

  • Actively contribute to POCs for AI Gateway / Intelligent API Gateway initiatives
  • Evaluate and prototype integrations with AI/ML-driven routing, observability, and security features
  • Stay current with emerging SRE, cloud, and AI gateway technologies

Required Skills & Qualifications



Must Have

  • 7 to 8 years of experience in SRE / DevOps / Platform Engineering
  • Strong hands-on experience with Kubernetes in production environments
  • Solid understanding of Java-based applications and JVM performance considerations
  • Deep expertise in Splunk, Prometheus, Grafana, and observability practices
  • Experience operating API Gateway platforms (Kong, Apigee, NGINX, Istio, etc.)
  • Strong Linux fundamentals and networking knowledge (TCP/IP, DNS, TLS)
  • Experience with cloud platforms (AWS / Azure / GCP)

Nice to Have

  • Experience with OpenTelemetry and distributed tracing
  • Exposure to AI Gateway / Intelligent Traffic Management concepts
  • Experience with service mesh (Istio / Linkerd)
  • Certification in Kubernetes (CKA / CKAD) or Cloud platforms

Soft Skills

  • Strong troubleshooting and problem-solving skills
  • Ability to work cross-functionally with developers, architects, and security teams
  • Proactive mindset with a passion for automation and reliability
  • Good documentation and communication skills

TekWissen Group is an equal opportunity employer supporting workforce diversity.

Job Tags

Temporary work
