The Internet Archive (IA) is a non-profit digital library, a top-200 website at archive.org, and a repository of over 60PB of unique digital information. It runs on an integrated cluster of over 1200 VMs on over 700 "bare-metal" physical machines in multiple self-owned and operated data centers, all serving to advance our goal of “Universal Access to All Knowledge.”
The Web & Data Services team is part of the Web Archiving & Data Services department within IA that provides a range of earned-income services supporting IA’s mission and the mission of our partners. This includes a number of contract and subscription services around web harvesting, content delivery, computational research, web development, digital preservation, and online access to information.
We are looking for a production-oriented, "hands-on" team leader with experience managing a small engineering team and overseeing a variety of services across a group of remote/distributed staff. The ideal candidate will bring a user-oriented approach to technical leadership with a focus on growing services, continuous improvement, technical project management from design to release, ensuring operational resiliency and service delivery, and actively contributing to development. Candidates should also be skilled in management communications and able to work collaboratively with a distributed team of other engineers, managers, and program staff.
Skills and responsibilities
Responsibilities & Duties:
- Manage, contribute to, and oversee the Web & Data Services engineering team, including distributed and contract staff, and take a lead role in building, maintaining, and supporting new and existing services.
- Work directly with the Director, managers, and service teams to design, build, release, and expand new features, services, and development projects.
- Oversee and plan the group’s technical infrastructure and work closely with other Engineering Managers and Core Operations teams on hardware allocation, monitoring, and operational maintenance and planning.
- Manage and contribute to the development of software and services through the design, prototype, testing, and production release cycle. This includes establishing procedures that ensure agile, sustainable development and deployment practices.
- Ensure public-facing services meet performance requirements and client expectations and that internal systems remain scalable, efficient, and resilient.
- Hire and manage direct-report engineering staff to achieve service, department, and organization objectives.
- This job is remote friendly.
Skills & Experience:
- Experience as manager and mentor of an engineering team, ideally one with remote staff.
- Experience in a highly available 24x7 production environment and managing aspects of a large server cluster infrastructure.
- Strong advocate for the end-user experience of web-delivered services and an overall “customer service” mentality.
- Ability to document, communicate, and share critical knowledge with engineering, product, and management staff.
- Passion for automation, continuous improvement, reporting, and data-driven decision making, along with experience in open source practices and a habit of staying current with trends.
- Work history that includes production-level programming in high-transaction environments.
- Fluency in Linux system administration, Unix shell scripting, and Python; familiarity with Java and NoSQL databases is a plus.
- Experience deploying and administering search and web-host services.
- BS in Computer Science or equivalent work experience.
- Comfort working in a loosely structured environment and juggling many projects.
- Experience with Ansible, Git, Nagios, ELK stack, etc.
- Experience with big data analytics tools and systems, especially Hadoop/HDFS.
- Excellent oral and written communication and documentation skills, including with external, non-technical partners.
- MS in Computer Science or equivalent work experience.
- Flexibility, a sense of humor, and a mission-driven orientation.