At SafetyCulture, we’re solving problems that fundamentally change the way people work. Recently valued at AU$1.3Bn, we’re expanding into sensors/IoT and telematics and evolving into a communication and collaboration platform. With an ambitious goal to have 100 million workers using our products every day, we’re facing interesting technical challenges and looking for talented people to join our team.
The Role
As a Site Reliability Engineer at SafetyCulture, you’ll help to design, build and run resilient systems. You live and die by Murphy’s Law, knowing that anything that can go wrong will go wrong at the worst possible moment. You will help to foster a culture of designing for, and expecting, failure in production systems - a culture where learning and knowledge-sharing are expected.
You love to solve sticky cross-service and cross-domain problems, and have a passion for identifying root causes in complex scenarios. You understand how important it is for teams to analyze past incidents. Most importantly, you are a team player, are excited about the prospect of working in a fast-paced, demanding environment, and get that learning happens at the edge of the comfort zone.
How you can have an impact
As one of a core team of experienced SREs, you will shape and mature the culture, define the processes that the development teams will follow, and help the business scale to millions of users. You’ll coach and educate your engineering colleagues on systems reliability and fault-tolerance best practice, identify gaps in existing systems and come up with remediation plans. You’ll improve metrics such as MTTR and MTTF, and promote a culture of sustainable incident response and blameless post-mortems. We encourage involvement in the community, open source work, attending talks and events, and experimenting with new technologies.
What you’ll need
- Fluency in at least one modern programming language
- An ability to wrangle infrastructure tooling, and an understanding of why infrastructure-as-code is mandatory
- A solid level of understanding of observability, alerting and alarming best practice
- Excellent people skills, with an ability to build and maintain healthy cross-team relationships
- An ability to balance your love of systems engineering with a product mindset, and to build empathy with your customers and your product-engineering colleagues