Site Reliability Engineer

USA, Florida · Full-time Remote · Intermediate

About The Position

As Elementor’s SRE, you will work in a team responsible for keeping Elementor’s systems available, reliable, monitored, and performant 24/7. You will work closely with the company’s R&D and Product leadership.

Work Environment

Elementor is the leading website building platform for WordPress professionals.

Our vision is to empower web creators - developers, designers, and marketers - with the ability to create their futures, one pixel at a time. We provide our users with everything they need to become successful web creators.

A new website is created every ten seconds using Elementor! 

Since launching in Israel in 2016, Elementor has expanded to more than 180 countries and now powers over 10M websites. More than 7% of all websites around the world are built using Elementor!

Creativity, friendship, curiosity, motivation, and professionalism are the driving forces behind our journey.

We take great pride in Elementor's constant scaling.

Responsibilities

  • Build software and systems to monitor and track Elementor’s production environment to ensure maximum availability
  • Prevent problem recurrence by deploying and maintaining industry-standard solutions and building in-house software
  • Measure and optimize system performance, while advancing our ability to handle upcoming challenges 
  • Lead root cause analysis for production incidents
  • Ensure that key production metrics are transparent and understandable, while continuously updating them based on company needs
  • Work with the Development, DevOps, QA, and Customer Experience teams to align on potential issues and new features 

Requirements

  • 3+ years in hands-on DevOps/SRE/Tier 3 support roles
  • Proven skills in incident management and root cause analysis
  • Software development experience in one or more programming languages (NodeJS, Go, Python, Java, Ruby, JavaScript, etc.)
  • Experience running high-scale, high-availability systems
  • Experience with logging tools (ELK, Splunk, SumoLogic, etc.) 
  • Proficiency working with Kubernetes
  • Proficiency working with Azure, AWS, or GCP
  • A proactive approach to identifying problems, areas in need of improvement, and performance bottlenecks
  • Availability for 24/7 on-call shifts to handle critical incidents
  • Great communication skills, both verbal and written

Apply for this position