Description

You will work on a small team, helping to improve and manage all aspects of our systems, which are deployed on AWS. We make heavy use of tools such as Docker, Packer, Terraform, Jenkins, and StatsD/Graphite/Grafana, and we continually evaluate new technologies as they become production-ready for government contexts.
You'll be responsible for pushing the limits of these critical systems, from billion-user load tests to achieving and maintaining sub-millisecond transaction times. By doing this, you'll help ensure that Nava continues delivering services that millions of Americans depend on. You care deeply about working on technology that affects people's lives, and you are passionate about building and maintaining large-scale systems that are well-designed, fast, scalable, and secure.

Skills and responsibilities

  • Work with fellow Infrastructure Engineers to build and maintain our production infrastructure to ensure ongoing reliability while maximizing development team efficiency
  • Troubleshoot and debug infrastructure, network, and operating system issues
  • Build and maintain operations software which automates the configuration, provisioning, deployment and monitoring of our core systems
  • Manage security systems, Linux file system permissions, and network firewalls
  • Perform automated deployments to create new or update existing production environments
  • Set up and maintain alarm systems for notifications on error conditions
  • Join our on-call rotation as a first line of defense during production issues

Details

  • Location: Washington, DC. We're open to filling this role in DC, SF, NYC, or remote.
  • This job is remote friendly.
  • Deadline: n/a

Qualifications

Minimum qualifications

Preferred qualifications

  • Previous experience maintaining a medium or larger scale production system
  • Significant experience in one or more of the following areas: cloud infrastructure, Unix/Linux, scripting, or security
  • Ability to automate procedural tasks using scripting or coding in Python, JavaScript, or Ruby
  • A thoughtful, adaptive, and collaborative mindset
  • Understanding of networking and HTTP
  • Ability to use the shell to achieve practical aims, e.g. navigating the shell, SSHing into machines, creating SSH keys, reading log files, moving files, searching within files, changing permissions, and starting/stopping services
  • Excellent written and verbal communication skills, technical and otherwise
  • Ability to pick up and learn new development and operations skills