Help Build the Open Cloud

Service Reliability Engineer (SRE)

San Francisco, CA, US
Remote

At Joyent, engineers at every level directly influence our business and services. Our Service Reliability Engineers are a hybrid of software and systems engineers responsible for reliability, scalability, and automation while keeping an eye on latency, performance, and capacity.

You want

To automate the infrastructure behind customer-facing APIs, delivering high availability with minimal downtime. To build systems that look elegant and reliable on the outside, even as they deal with the complexities of rigorous business logic and user permissions and hide hardware failures on the inside. To work in Go, and with other tools as necessary, to build stuff. To participate and thrive in a delivery-oriented, goal-centric culture.

Successful candidates will:

  • Design, write, and maintain software to improve the availability, scalability, latency, and efficiency of Joyent's services, incorporating third-party open-source tools when available
  • Create new designs for a growing number of distributed systems
  • Design and implement the tools and processes used for deployment and change management
  • Plan and execute configuration management
  • Own, maintain, and continuously improve all systems provided as a service, such as monitoring and datastores
  • Engage in service capacity planning and demand forecasting, anticipating performance bottlenecks
  • Automate resource provisioning and allocation processes
  • Run software performance analysis and system tuning
  • Plan and execute disaster recovery drills
  • Participate in rotating on-call duties

You have

A love of systems engineering, APIs, and constrained interface surface area. You’ve read the Google SRE book and have the background to argue the details of it. You laugh and cringe at Jepsen tests, subject your own systems to similar tests, and constantly seek ways to ensure your systems meet their obligations while improving performance.

The ideal candidate doesn’t have all of the following, but recognizes them and craves experience with them:

  • Fluency in one or more of: Go, Node.js, Python, C
  • Minimum of 4 years of industry experience in engineering
  • Familiarity with algorithms, data structures, and complexity analysis
  • In-depth knowledge of operating systems (processes, threads, IPC, concurrency, locks, mutexes, semaphores, etc.)
  • Experience working with Unix/Linux systems from kernel to shell and beyond, with experience working with system libraries, file systems, and client-server protocols
  • Experience with network protocols and theory (TCP/IP, UDP, ICMP, MAC addresses, IP packets, DNS, OSI layers, load balancing, etc.)
  • Experience with configuration management tools and cluster schedulers
  • Systematic problem-solving approach
  • Strong sense of ownership and drive
  • Expertise in designing, analyzing, and troubleshooting large-scale distributed systems
  • Experience with eBPF or other performance profiling tools

Joyent offers

An opportunity to build a greenfield cloud, and scale it to meet the needs of the world’s largest cloud consumers. A highly distributed, remote-friendly team (US preferred, but we can work with candidates up to 9 hours offset from US Pacific time). An opportunity to shape product and business strategy and grow into new roles as the organization grows.

About Joyent

Joyent, a wholly owned subsidiary of Samsung, is the open cloud company. With its Triton Kubernetes services and support, Joyent helps its customers build and operate modern cloud-native applications across multiple clouds. Joyent’s Triton Private Regions provide low-cost, dedicated cloud infrastructure that gives its customers the ability to own their data and control their cloud costs.

To apply, please submit a brief introduction, a copy of your resume, and a link to your GitHub or LinkedIn profile to jobs@joyent.com with Service Reliability Engineer (SRE) in the subject line. Qualified applicants with criminal histories will be considered for the position in a manner consistent with the Fair Chance Ordinance.

