The Internet Archive (IA) is a non-profit digital library, a top-200 website, and a repository of over 60PB (unique) of digital information. It runs across an integrated cluster of over 1200 VMs on over 700 "bare-metal" physical machines in multiple self-owned and operated data centers, all serving to advance our goal of “Universal Access to All Knowledge.” We are seeking a Senior Web Developer to help grow our suite of services for collecting, preserving, and providing access to the massive trove of historically important data now published on the web, while working in partnership with a global set of institutions to provide web, data, access, research, and preservation services to users.

The Archive-It team is responsible for maintaining a web application that automates high-quality captures of content from the web. The successful candidate will work in the Archive-It Group, building and maintaining high-quality software for the collection, preservation, and accessibility of web content.

Reporting Structure: The Senior Web Developer reports to the Engineering Manager for Archive-It and works closely with other departments. The position works alongside other web archiving engineers and program staff in the Web Archiving & Data Services Group, as well as with the broader Internet Archive infrastructure and engineering teams.

The role will help design and implement the future of a toolset and APIs that automate web capture using open source technologies and platforms. An ideal candidate is interested in developing harvest techniques and tools to enable archival capture and re-rendering of rich media, streaming content, and social media, as well as traditional web page content. This role contributes to defining deployment architectures and workflows, managing data at scale, and monitoring production systems.

Skills and responsibilities

An ideal candidate demonstrates independence and initiative, is a problem solver, works well autonomously, and has deep experience on the Unix/Linux command line and broad experience in systems architecture. Additionally, the ideal candidate is eager to help advance the state of preserving web-published content by working on the platform that drives a large portion of global web capture.

Essential Job Functions:
  • Contending with the complexity of a suite of tools that must capture web content with equal accuracy at both the micro and global scale
  • Configuring, maintaining, and improving web crawling tools
  • Contributing to the development of a distributed Python-based database used for crawl material deduplication, analysis, and reporting
  • Delivering on commitments with deadlines and project timelines while working in a collaborative team of engineers and project/product managers


  • Location: San Francisco
  • This job is remote friendly.
  • Deadline: n/a


Minimum qualifications

  • Strong experience in Unix shell scripting and Python coding required
  • Strong experience with Python, Bash, Java, and C-based debugging tools strongly preferred
  • Solid experience with Internet protocols (HTTP is a must) and strong knowledge of HTML, JavaScript, and web technologies in general
  • Knowledge of building and deploying web applications, databases, web-host services, and Linux system administration
  • Ability to work in, and enjoy, a loosely structured work environment

Preferred qualifications

  • Cluster computing experience is preferred, especially familiarity with Hadoop and related technologies and tools
  • Experience working with JavaScript and HTML in a large-scale application preferred
  • Experience or familiarity with Java preferred
  • Experience with applications designed to display archived web content
  • Experience with development environments and system monitoring/administration tools
  • Experience with open source practices, version control, and code review
  • Experience with Atlassian tool sets
  • Flexibility and a sense of humor are a plus
  • Requirements: Bachelor's Degree in Computer Science or a related field, and five years of progressively responsible experience in software development