Help Build the Open Cloud

Site Reliability Engineer (SRE)

San Francisco, CA, US
Remote

At Joyent, engineers at every level directly influence our business and services. Our Site Reliability Engineers are a hybrid of software and systems engineers responsible for reliability, scalability, and automation while keeping an eye on latency, performance, and capacity.

You Want

To automate infrastructure behind customer-facing APIs with high availability, reliability, and scalability.

To build systems that:

  • Look elegant and reliable on the outside
  • Deal with the complexities of rigorous business logic
  • Transparently handle hardware failures

To work in Go, and other tools as necessary, to build systems. To participate and thrive in a delivery-oriented, goal-centric culture.

Successful candidates will:

  • Design, write, and maintain software to improve the availability, scalability, reliability, performance, and efficiency of Joyent’s services, incorporating third-party open-source tools when available
  • Create new designs for a growing number of distributed systems
  • Design and implement the tools and processes used for deployment and change management
  • Plan and execute configuration management
  • Automate resource provisioning and allocation processes
  • Own, maintain, and continuously improve all systems provided as a service, such as monitoring and datastores
  • Engage in service capacity planning and demand forecasting, anticipating performance bottlenecks
  • Run software performance analysis
  • Plan and execute disaster recovery drills
  • Participate in rotating on-call duties

You have

A love of systems engineering, APIs, and making applications secure. You enjoy collaborating and coming up with reliable solutions that solve business needs. You constantly seek ways to ensure your systems meet the objectives while improving performance.

The ideal candidate doesn’t have all of the following, but is seeking to gain experience with them:

  • Comfort with languages such as Go and Python
  • Expertise in MariaDB/MySQL clustering
  • Minimum of 4 years of industry experience in engineering
  • Familiarity with algorithms, data structures, and complexity analysis
  • Experience working with Linux systems from kernel to shell including working with system libraries, file systems, and client-server protocols
  • Experience with networking (TCP/IP, UDP, ICMP, ARP, DNS, load balancing, etc.)
  • Experience with configuration management tools (Ansible)
  • Systematic problem solving
  • Elegant and simple solutions to complex problems
  • Strong sense of ownership and drive
  • Expertise in designing, analyzing, and troubleshooting large-scale distributed systems

Joyent offers

  • An opportunity to build a cloud solution and scale it to meet the needs of the world’s largest cloud consumers
  • A highly distributed, remote-friendly team (US preferred)
  • An opportunity to shape product and business strategy and to grow into new roles as the organization grows

About Joyent

Joyent, a wholly owned subsidiary of Samsung, is the open cloud company. With its Triton Kubernetes services and support, Joyent helps its customers build and operate modern cloud-native applications across multiple clouds. Joyent’s Triton Private Regions provide low-cost, dedicated cloud infrastructure that gives its customers the ability to own their data and control their cloud costs.

To apply, please submit a brief introduction, a copy of your resume, and a link to your GitHub or LinkedIn profile to jobs@joyent.com with Site Reliability Engineer (SRE) in the subject line. Qualified applicants with criminal histories will be considered for the position in a manner consistent with the Fair Chance Ordinance.
