Online media is in a state of flux. Twitter, Facebook, Gab, blogs and so-called fake news are all developments that have radically altered the landscape of news and information online. The Media Cloud project was created to track and understand this online media ecosystem. Come help us build data-centric tools for academic researchers and non-profits that let them investigate and track how speech moves across the internet.
Media Cloud is a joint project between UMass-Amherst, Northeastern University, and the Berkman Klein Center for Internet & Society at Harvard University. This position is with Media Cloud’s nonprofit arm, the Media Ecosystems Analysis Group, and you will work closely with members of the team from all centers.
We are a diverse and welcoming community of researchers and technologists who love to engage with hard questions about online media by using a combination of social, computer, and data sciences. You will work with all members of our small team, from senior faculty to junior developers, and thrive in an academic atmosphere that encourages experimentation, constant questioning, and validation at all levels of our platform.
Much of our substantive work focuses on issues of online hate-speech, race, democracy, and health. We strongly encourage women, people of color, and people of any sexual identity to apply.
Our entire team is remote, with members working all around the world, and we welcome applicants from any location.
Our upcoming technical roadmap includes ingesting new platforms into our data pipeline, analyzing images from news stories, and incorporating new sources of audience/readership data, as well as ongoing updates to improve the scalability, performance, and reliability of our existing pipeline.
Skills and responsibilities
- work on our server architecture, which collects and processes news stories and lets researchers analyze them via an API; you will spend roughly half your time planning, designing, and building, and the other half maintaining and running the project's data pipeline;
- work with senior engineers to establish a technical vision for the project;
- contribute to and follow a technical roadmap to meet research needs and to complete grant deliverables;
- collaborate with other developers, designers, and system administrators in implementing the technical roadmap;
- accurately communicate project status internally and externally to our community of users;
- maintain, upgrade and build systems within an existing (rather large) codebase to collect, archive, and analyze content from online media;
- write code that can scale systems to handle ever-expanding data requirements.
- college degree or other domain-specific accreditation, preferably in computer science, data science, or a related field;
- at least two years of experience working as a software engineer on big data systems;
- programming fluency — Python required;
- some experience with Linux;
- demonstrated ability to design, build, test, and deploy robust code;
- demonstrated ability to iterate quickly through prototypes;
- demonstrated ability to use data to validate architectural decisions;
- ability to work productively in a virtual environment with remote team members all over the world;
- interest in working on issues related to hate-speech, democracy, gender, race, or health.
- experience implementing and maintaining a production ETL pipeline;
- experience scaling platforms to handle large data sets;
- experience writing web crawlers or API scrapers;
- experience writing, maintaining, and optimizing SQL queries against databases;
- experience working with PostgreSQL and Solr / Lucene in Ubuntu environments;
- experience working with text-based data systems (e.g., NLP);
- experience working in a modern dev / systems environment including git and docker.