Site Reliability Engineer (SRE) Job at MSD Malaysia, Rahway, NJ

  • MSD Malaysia
  • Rahway, NJ

Job Description

Site Reliability Engineer (SRE)

We are looking for a Site Reliability Engineer (SRE) to lead and establish the SRE domain within the organization.

You will be responsible for ensuring the reliability, availability, and performance of our systems and applications. You will collaborate closely with Software Engineers, DevOps teams, Security teams, and Program Managers to build and maintain scalable infrastructure, monitor critical systems, and automate repetitive tasks to improve efficiency and uptime. Your primary goal is to maintain an optimal balance between system stability, feature development, and fast delivery cycles.

Key Responsibilities:

  1. Monitoring: Monitor AWS infrastructure (Azure experience is an advantage) against defined KPIs using Datadog or an equivalent tool. Proven experience defining effective alerts, building synthetic tests, analyzing logs for error detection, detecting issues in Datadog, managing SLIs and SLOs, leveraging NOC activity, and defining flows.
  2. Architecture Understanding: Infrastructure: in-depth understanding of designing distributed systems and microservices in cloud-based environments. Business logic: understanding of complex cloud product architectures, including event-driven architecture, with a focus on how data flows and how messages interact between services.
  3. Continuous Improvement & Documentation: Develop and maintain technical documentation for processes, procedures, and systems; conduct post-incident reviews and implement preventive measures; and lead Root Cause Analysis (RCA) and incident management when issues arise.
  4. Infrastructure & Cloud: Proven experience with AWS services such as API Gateway, Lambda, SQS, SNS, S3, RDS, Redis cache, Kinesis, Global Accelerator, CloudFront, and Route 53, along with an understanding of the most common cloud services used in production environments and of infrastructure as code (IaC) using Terraform.
  5. Automation and CI/CD: Experience with Azure DevOps, GitHub Actions, Argo, GitOps, and artifact management using Artifactory. Ability to review pipelines and Helm charts (or equivalent) and to understand automation processes. Familiarity with Crossplane.
  6. Security (Preferred): Experience reviewing Web Application Firewall (WAF) rules and applying rate limiting to services and infrastructure based on data analysis, in collaboration with DevSecOps.

Personal Requirements:

  • Bachelor’s degree in computer science or equivalent proven experience.
  • At least 2-3 years in a hands-on DevOps or SRE position.
  • Strong communication skills for aligning, documenting, and sharing knowledge across cross-functional teams are a must.
  • Ability to work under high load and to lead sensitive situations and investigations, especially when customer-facing services are impacted.
  • Strong motivation for continuous learning and adopting new technologies, and excellent problem-solving skills with a proactive approach.

Job Tags

Night shift

Similar Jobs

Source One Technical Solutions

Learning Management System Administrator Job at Source One Technical Solutions

 ...Job Description W2 Only (No C2C or 3rd parties) Overview: We are seeking a strategic and technically skilled LMS Super Administrator to lead the administration and optimization of our enterprise-wide learning platform, Docebo. This role will serve as the... 

Essentia Health

PHYSICIAN - Radiology (0.6-1.0 FTE) - Hayward, WI Job at Essentia Health

 ...Radiologist needed (0.6-1.0 FTE) Hybrid position with opportunity to work remotely 50% of the time and on-site in Hayward, WI, 50% of the...  ...- 4:30 PM. No call, no weekends. Flexibility to work from home up to 3 days per week, if desired. Generous time off.... 

Schneider

Owner-operator Van Truckload truck driver Job at Schneider

Join the elite fleet of Schneider as an Owner-operator Van Truckload Truck Driver and become a part of a highly efficient transportation network. At Schneider, we are committed to delivering premier transportation and logistics solutions with a strong emphasis on safety... 

SP Software Solutions

Scrum Master (Philadelphia) Job at SP Software Solutions

 ...Job Title: Scrum Master Location: Philadelphia, PA (Hybrid Onsite 3 Days/Week) Duration: 6+ Month Contract Job Description We are seeking an experienced Scrum Master to join our healthcare client's technology team in Philadelphia, PA. The ideal candidate will... 

Hytek Finishes

Aerospace Quality Engineer Job at Hytek Finishes

 ...Quality Engineer: Elevate Aerospace Excellence | Kent, WA | Full-Time, Day Shift (M-F) | Reports to: Quality Manager | Salary: $85,000-$110,000 (DOE) | FAA Repair Station Environment. Advance aerospace innovation with precision and purpose. We're seeking...