Senior Site Reliability Engineer (SRE)

Bybit

Software Engineering
Dubai - United Arab Emirates
Posted on Wednesday, July 10, 2024
Job description

We are seeking an experienced Senior Site Reliability Engineer (SRE) or Senior Operations Engineer to join our technical team. The successful candidate will be responsible for ensuring the stability, reliability, and high performance of our systems. This role will involve system architecture design, automated operations, troubleshooting, and performance optimization.

Key Responsibilities:

  1. Ensure the stable operation of online systems, guaranteeing high availability.
  2. Design and implement system monitoring, log analysis, and alert systems to quickly respond to and handle online faults.
  3. Optimize system performance, continually improving system stability and scalability.
  4. Design and implement automation tools and processes to enhance operational efficiency.
  5. Work closely with the development team, participating in system architecture design and reviews to ensure maintainability.
  6. Own system capacity planning and performance evaluation, and provide system optimization recommendations.
  7. Write and maintain operations documentation and manuals to ensure team members can quickly get up to speed and operate effectively.
  8. Conduct regular system security assessments and promptly fix security vulnerabilities.

Job Requirements:

  1. Bachelor's degree or above in Computer Science or a related field, with over five years of relevant work experience.
  2. Proficient in Linux operating systems and familiar with Shell, Python, or other scripting languages.
  3. Extensive experience in system operations and troubleshooting, with the ability to quickly locate and resolve issues.
  4. Familiar with common operations tools such as Ansible, Puppet, Chef, SaltStack, etc.
  5. Knowledgeable about container technology and related ecosystems like Docker, Kubernetes, etc.
  6. Familiar with cloud computing platforms such as AWS, GCP, Azure, with relevant usage experience preferred.
  7. Understanding of network protocols and architecture, with some network troubleshooting skills.
  8. Excellent communication skills and team spirit, able to effectively collaborate with development and testing teams.
  9. Strong sense of responsibility and ability to work under pressure, able to handle urgent situations.
  10. Strong learning ability and technical research spirit, able to quickly master new technologies and apply them to work.
  11. Fluency in English and Mandarin is required.

About Bybit:

Established in March 2018, Bybit is one of the fastest-growing cryptocurrency derivatives exchanges, with more than 30 million registered users. We aim to continue revolutionizing the industry by fusing the best of cryptocurrency and traditional finance. Our innovative, highly advanced, user-friendly platform has been designed from the ground up using best-in-class infrastructure to provide our users with the industry's safest, fastest, fairest, and most transparent trading experience. Built on customer-centric values, we endeavour to provide professional, 24/7 multi-language customer support that helps users in a timely manner.
