Manager, Site Reliability

Technology | Santa Monica, CA | Full-Time

Hulu is a premium streaming TV destination that seeks to captivate and connect viewers with the stories they love. We create amazing experiences that celebrate the best of entertainment and technology. We’re looking for great people who are passionate about redefining TV through innovation, unconventional thinking and embracing fun. It’s a mission that takes some serious smarts, intense curiosity and determination to be the best. Come be part of the team that’s powering play.

SUMMARY

Hulu is looking for a Site Reliability Engineering Manager to lead our new team. As a Site Reliability Engineering Manager, you will work closely with our engineering and operations teams to identify manual processes that are candidates for automation and then build that automation. You will help improve the signal-to-noise ratio of our monitoring systems by partnering with service owners, improving telemetry, and ensuring that preventative and remediative solutions are operationalized for a non-engineering workforce. If you are a person who takes pride in stability and believes every operational problem is a software problem, this is a great role for you.

WHAT YOU’LL DO

  • Design, build, or improve systems that focus on the scalability, availability, and efficiency of Hulu’s services.
  • Identify mission-critical problems and solve them via automation and design improvements.
  • Build or improve monitoring and instrumentation to predict future scalability or latency risks and solve them before they ever manifest as customer-facing issues.
  • Develop best practices with development teams to improve scalability and reliability of Hulu’s services.
  • Design and improve the developer platform and infrastructure so that reliability and availability become an even more natural part of our software development process.

WHAT TO BRING

  • Experience as a Site Reliability Engineer or a Software Engineer focused on infrastructure and/or operations.
  • Experience with the building blocks of large-scale systems, including load balancing, fault tolerance, containers, instrumentation, predictive monitoring, etc.
  • Familiarity with commonly available services and tools (AWS, Docker, Redis, New Relic, Heroku, Hadoop, etc.).
  • Strong passion for automation, testing and code quality.
  • BS in Computer Science or equivalent preferred.
  • Familiarity with one or more of the following: Java, Python, Go.

Just like the best ensemble casts of our favorite shows, Hulu embraces diversity and is proud to be an Equal Opportunity Employer.
