Pew Research Center is a nonpartisan fact tank that informs the public about the issues, attitudes and trends shaping America and the world. It conducts public opinion polling, demographic research, media content analysis and other empirical social science research in the areas of U.S. politics and policy views; media and journalism; internet and technology; religion and public life; Hispanic trends; global attitudes and U.S. social and demographic trends. The Center does not take policy positions. It is a subsidiary of The Pew Charitable Trusts. Pew Research Center’s work is carried out by a staff of 160.
In response to the changing nature of data on human behaviors and attitudes, the Center created Data Labs in 2015 to explore the emerging field of computational social science (CSS), which applies data science methods to social science research questions. Pew Research Center Labs uses cutting-edge data science and computational methods to contribute to our ongoing research in our key areas: politics; religion; journalism; science and technology; Hispanics; social trends; and global attitudes.
The Computational Social Scientist should be eager to join a team that uses new methods to contribute to the public good, and should have a background in both computation and social science. The Data Scientist will use data and modeling in creative ways to distill clear findings from complex datasets, and should be adaptable and comfortable trying out new approaches and languages.
The Data Scientist will contribute to all aspects of a wide range of CSS research projects. This includes development, original research and writing, as well as managing work on several projects at once.
- Proposes and executes computational social science data analysis.
- Manages defined CSS projects, collaborating on data collection, processing, validation, exploratory analysis, and production research.
- Helps select and scope CSS projects and applies appropriate research designs.
- Regularly writes and edits reports, Decoded posts, and Fact Tank posts.
- Reviews pull requests and conducts data and number checks.
- Leads internal CSS code classes and workshops and gives external presentations for research peers, journalists, and decision-makers.
- Stays abreast of trends in data science, new kinds of data sources, and methodologies.
Skills and responsibilities
- Proficiency in R (including ggplot2, tidyverse/dplyr) required
- Proficiency in Python (including pandas, scikit-learn, SciPy, and NumPy) required
- Familiarity with Natural Language Processing (preprocessing, term-document matrix representation, named entity recognition, POS taggers/parsers/etc.) preferred
- Familiarity with image data, OpenCV, and/or convolutional neural network frameworks (e.g., Caffe) preferred
- Experience writing about CSS research for both technical and general audiences preferred
- Experience using crowdsourcing (e.g., Mechanical Turk) to gather or generate data preferred
- Experience scraping unstructured data from the web preferred
- Experience compiling and/or using network data preferred
- Experience with SQL, MongoDB (or another NoSQL database), and Hadoop/Hive/Spark/Pig, etc. preferred
- BA required, advanced degree preferred.
- 5-9 years of research experience, with at least 3-5 years of specialized research and analysis experience expected. Often includes significant graduate training at the PhD level or equivalent experience in an applied setting.
- Experience interacting with web APIs, working with JSON data, and utilizing regex required.
- Experience working with very large data sets (data too large to fit into memory) required.
- Experience with machine learning (e.g., SVM, Random Forests, GBRT/GBDT, ensemble methods) required.