Job Description
Overview:
TekWissen is a global workforce management provider headquartered in Ann Arbor, Michigan, that offers strategic talent solutions to clients worldwide. Our client is a provider of digital technology and transformation, information technology, and services.
Position: Senior Site Reliability Engineer (SRE) / DevOps Engineer
Location: Bothell, WA
Duration: 11 Months
Job Type: Temporary Assignment
Work Type: Hybrid
Job Description
- We are seeking a highly experienced SRE / DevOps Engineer to support and scale a Kubernetes-based API Gateway platform built on a Java technology stack.
- The role focuses on reliability, observability, automation, and performance, while also contributing to POCs around next-generation AI Gateway capabilities.
Key Responsibilities
Platform Reliability & Operations
- Own reliability, availability, scalability, and performance of API Gateway services running on Kubernetes
- Design and implement SRE best practices including SLIs, SLOs, SLAs, error budgets, and incident management
- Lead production readiness reviews, root cause analysis (RCA), and post-incident improvements
- Drive capacity planning, performance tuning, and resilience testing
Kubernetes & Cloud Engineering
- Manage and optimize Kubernetes clusters (EKS / AKS / GKE / On-prem)
- Develop and maintain Helm charts, manifests, and deployment strategies
- Implement rollout strategies such as blue-green, canary, and rolling deployments
- Collaborate with development teams to ensure services follow cloud-native design patterns
Observability & Monitoring (Strong Focus)
- Build and maintain enterprise-grade observability (O11y) solutions:
  - Prometheus & Grafana for metrics and dashboards
  - Splunk for centralized logging and alerting
  - OpenTelemetry for distributed tracing
- Define actionable alerts and dashboards for platform and application health
- Improve MTTR through better visibility and automation
CI/CD & Automation
- Design and maintain CI/CD pipelines (Jenkins, GitHub Actions, GitLab CI, etc.)
- Automate infrastructure using Infrastructure as Code (Terraform, CloudFormation, etc.)
- Develop automation scripts using Python, Bash, or Groovy
Security & Compliance
- Implement DevSecOps practices including secrets management, image scanning, and RBAC
- Work closely with security teams on vulnerability remediation and compliance controls
Innovation & POCs
- Actively contribute to POCs for AI Gateway / Intelligent API Gateway initiatives
- Evaluate and prototype integrations with AI/ML-driven routing, observability, and security features
- Stay current with emerging SRE, cloud, and AI gateway technologies
Required Skills & Qualifications
Must Have
- 7 to 8 years of experience in SRE / DevOps / Platform Engineering
- Strong hands-on experience with Kubernetes in production environments
- Solid understanding of Java-based applications and JVM performance considerations
- Deep expertise in Splunk, Prometheus, Grafana, and observability practices
- Experience operating API Gateway platforms (Kong, Apigee, NGINX, Istio, etc.)
- Strong Linux fundamentals and networking knowledge (TCP/IP, DNS, TLS)
- Experience with cloud platforms (AWS / Azure / GCP)
Nice to Have
- Experience with OpenTelemetry and distributed tracing
- Exposure to AI Gateway / Intelligent Traffic Management concepts
- Experience with service mesh (Istio / Linkerd)
- Certification in Kubernetes (CKA / CKAD) or Cloud platforms
Soft Skills
- Strong troubleshooting and problem-solving skills
- Ability to work cross-functionally with developers, architects, and security teams
- Proactive mindset with a passion for automation and reliability
- Good documentation and communication skills
TekWissen Group is an equal opportunity employer supporting workforce diversity.
Job Tags
Temporary work