AI Tools Site Reliability Engineer - Graphic

MyShell

Software Engineering, Data Science
United States · Remote
Posted on Wednesday, June 26, 2024

About MyShell

MyShell is revolutionizing the AI landscape by building an open ecosystem for AI-native apps. Our powerful platform and intuitive toolkit empower anyone to create, access, and benefit from AI-powered applications. Launched in April 2023, MyShell has quickly gained global traction, attracting a diverse community of creators and users.

Our team of talented individuals from top institutions like MIT, Princeton, and Oxford is committed to fostering innovation in a supportive and transparent work environment. With funding from leading VCs, MyShell is poised to reshape the future of AI, making it accessible and integral to everyone's daily life. Join us on this thrilling journey as we redefine what's possible with AI.

About the Role

We are seeking an experienced Site Reliability Engineer to manage our AI tools, such as ControlNet. You will keep them running efficiently and stably while continuously optimizing system performance.

Responsibilities:

  • System Maintenance & Monitoring: Oversee daily operations of AI tools, including server, database, and network maintenance. Monitor system performance and address issues promptly.
  • Deployment & Release: Manage deployment and version releases of AI tools. Implement CI/CD processes for automated deployments.
  • Troubleshooting: Address and resolve issues in AI tools, analyze logs and monitoring data to find root causes, and propose solutions.
  • Performance Optimization: Enhance deployment architecture, improve efficiency and stability, and implement performance tuning strategies.
  • Security Management: Ensure tool security, conduct regular assessments, fix vulnerabilities, and implement data protection and backup strategies.
  • Collaboration & Documentation: Work closely with development teams, contribute to system design and optimization, and maintain operations documentation.

Qualifications:

  • Education: Bachelor's degree or higher in Computer Science, Software Engineering, or related fields.
  • Experience: 3+ years in operations engineering or related roles; experience with AI tools is preferred.
  • Skills: Proficiency in Linux, monitoring tools (e.g., Prometheus, Grafana, ELK), automation tools (e.g., Ansible, Puppet, Chef), scripting (e.g., Python, Shell), cloud platforms (e.g., AWS, Azure, GCP), and AI tools (e.g., ControlNet).
  • Other: Strong communication, teamwork, analytical, and problem-solving skills. Ability to work under pressure, a strong sense of responsibility, and a proactive attitude toward continuous learning.

Plus Points

  • Exceptional problem-solving abilities and strong communication skills.
  • Experience with AI or machine learning technologies and their integration into backend systems.
  • Contributions to open-source projects or a strong presence in the developer community.
  • Prior experience in a fast-paced startup environment.

What We Offer

  • Competitive salary and equity package, commensurate with experience and location.
  • Flexible working hours and a fully remote work environment, with the ability to collaborate effectively across time zones.
  • A dynamic and collaborative work environment that fosters innovation, growth, and professional development.
  • The opportunity to work on cutting-edge technologies and help shape the future of AI, transforming industries and making a global impact.