Help Build the Open Cloud

Senior Site Reliability Engineer, Storage


Remote

At Joyent, engineers at every level directly influence our business and services. Our Site Reliability Engineering team is a hybrid of software and systems engineers responsible for the reliability, scalability, and automation of our platforms.

A senior engineer wants:

To architect the infrastructure and applications behind customer-facing APIs for high availability, reliability, and scalability, delivering a great customer experience.

To build systems that:

  • Are low toil for operators
  • Have elegant interfaces for users
  • Deal with the complexities of rigorous business logic
  • Supply enough context for engineers to make good decisions
  • Transparently handle failures

To participate and thrive in a delivery-oriented, goal-centric culture.

Responsibilities:

Change Management:

  • Contribute to the design and creation of the CI/CD system
  • Scale services up or out based on monitoring feedback to prevent overload
  • Plan and manage new service deployments

Day-to-day Monitoring Improvements:

  • Identify monitoring gaps based on incident root-cause analysis
  • Identify monitoring needs for new/modified services
  • Work with monitoring system owners to implement new metrics, dashboards and alarms

Standard Operating Procedures:

  • Create and maintain runbooks and Standard Operating Procedures for recurring processes (alarm handling, auditing, scaling)
  • Contribute to the product operator guide and debugging documentation

On-call Support (24x7):

  • Act as the first escalation point for application issues
  • Work with the incident response team to restore services
  • Work rotating shifts and weekend schedules as required

Root Cause Analysis:

  • Write RCA reports and work with engineering, operations, and customer support to produce action plans
  • Participate in action plans, focusing on mitigations for systemic resiliency and reliability issues, and champion improvement initiatives
  • Ensure RCA action plans are followed through

Operator Tooling:

  • Identify “toil” and look for automation opportunities
  • Create and maintain operator tools, and file product enhancement requests to replace workarounds

Capacity/Usage Monitoring:

  • Work with product teams on capacity usage thresholds and future capacity planning
  • Rectify underlying system issues that may have contributed to capacity problems

Qualifications

You’ll be a great fit if you are:

  • Great to work with, with strong communication and people skills
  • Passionate about building services in the cloud computing market
  • Not afraid to respectfully disagree with others when quality does not meet standards
  • Able to perform and make sound decisions under pressure

Additionally, you should have most (or all!) of the following:

  • 5+ years of experience in one of the following areas: software development, DevOps, SRE, cloud infrastructure, or QA
  • Experience building and running a cloud platform in a production environment
  • Deep hands-on technical expertise in designing and deploying Linux/Unix-based systems
  • Experience building and operating CI/CD pipelines
  • Experience building and maintaining authentication systems
  • Experience with monitoring tools such as Circonus, Prometheus, and InfluxDB/Telegraf/Kapacitor
  • Working knowledge of all aspects of cloud infrastructure services (compute, storage, network, authentication)

About Joyent

Joyent, a wholly owned subsidiary of Samsung, is the open cloud company. With its Triton Kubernetes services and support, Joyent helps its customers build and operate modern cloud native applications across multiple clouds. Joyent's Triton Private Regions provide low-cost, dedicated cloud infrastructure that gives its customers the ability to own their data and control their cloud costs.

To apply, please send a brief introduction, a copy of your resume, and a link to your GitHub or LinkedIn profile to jobs@joyent.com with "Senior Site Reliability Engineer, Storage" in the subject line. Qualified applicants with criminal histories will be considered for the position in a manner consistent with the Fair Chance Ordinance.
