Site Reliability Engineer (SRE) Job at Openkyber, California

  • Openkyber
  • California

Job Description

Overview:

Dataflix is seeking a highly experienced Senior or Lead Platform Engineer/Site Reliability Engineer (SRE)/Hadoop Admin to manage and enhance our petabyte-scale, on-premises data platform, which is built on the open-source Hadoop ecosystem. The ideal candidate brings deep technical expertise, a strong understanding of distributed systems, and extensive experience operating and optimizing large-scale data infrastructure.

Responsibilities:
  • Own and operate the end-to-end infrastructure of a large-scale, on-prem Hadoop-based data platform, ensuring high availability and reliability.
  • Design, implement, and maintain core platform components, including Hadoop, Hive, Spark, NiFi, Iceberg, ELK, OpenSearch, and Ambari.
  • Automate infrastructure management, monitoring, and deployments using CI/CD pipelines (GitLab) and scripting.
  • Implement and enforce security controls, access management, and compliance standards.
  • Perform system upgrades, patching, performance tuning, and troubleshooting across platform components.
  • Optimize observability and telemetry using tools like Prometheus, Grafana, and OpenTelemetry for real-time performance monitoring and alerting.
  • Proactively monitor system health, resolve incidents, and conduct root-cause analyses to prevent recurrence.
  • Collaborate with data engineering, analytics, and infrastructure teams to align platform capabilities with evolving needs.

Requirements:
  • 10+ years of experience in Platform Engineering, Site Reliability Engineering, or similar roles, with proven success managing large-scale, distributed Hadoop infrastructure.
  • Deep expertise in the Hadoop ecosystem, including HDFS, YARN, Hive, Spark, NiFi, Ambari, and Iceberg.
  • Strong Linux system administration skills (CentOS/Rocky preferred), including system tuning, performance optimization, and troubleshooting.
  • Proficiency in containerization and orchestration using Docker and Kubernetes.
  • Solid experience with automation and Infrastructure as Code, leveraging tools like GitLab CI/CD and scripting in Python and Bash.
  • Practical knowledge of monitoring and observability tools (e.g., Prometheus, Grafana, OpenTelemetry) and understanding of system health, alerting, and telemetry.
  • Familiarity with networking concepts, security protocols, and data compliance requirements.
  • Experience managing petabyte-scale data platforms and implementing disaster recovery strategies.
  • Understanding of data governance, metadata management, and operational best practices.
