Data Science for Global Applications

Course materials for Data Science for Global Applications, by Dr. Anna Haensch and Dr. Karin Knudson

Project Check-In: Draft Data Audit

The purpose of this data audit is to gain an understanding of the datasets you are using, and to make sure that your data is reusable and your work is reproducible. It is also a mechanism for understanding the origins of the data you’re working with, which can help you uncover bias. Specifically, your data audit should address the following (adapted from the Mozilla Science Guide):

What

What is the title of the dataset? Provide as much context as possible, including a proper citation for the dataset where possible (you can get more information about citing datasets from the Tufts Library Research Guide). Are there any relevant research publications related to the dataset? This can include datasets that were used to create your dataset, or briefs written in support of the dataset.

Who

Who is responsible for the data? This can include PIs, research groups, or institutions that contributed to the data collection. This might also include specific contact information for a person who can answer questions about the data. Who else can use this data? If possible, identify the specific license assigned to the dataset.

Where

Where was the data collected? This can include multiple geographic locations. Where does the data live now?

When

When was the data collected? When writing about time, be sure to use ISO format (YYYY-MM-DD hh:mm:ss) and be as specific as possible. What timespan does the data cover?
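As a minimal sketch of the ISO formatting suggested above, the Python snippet below converts timestamps written in other conventions into YYYY-MM-DD hh:mm:ss. The helper name `to_iso` and the sample dates are made up for illustration, and the sketch assumes you already know which format your source data uses.

```python
from datetime import datetime


def to_iso(text, fmt):
    """Parse `text` using the strptime pattern `fmt`; return 'YYYY-MM-DD hh:mm:ss'."""
    parsed = datetime.strptime(text, fmt)
    # sep=" " gives the space-separated form used in this audit;
    # timespec="seconds" drops any sub-second detail.
    return parsed.isoformat(sep=" ", timespec="seconds")


# Hypothetical collection start and end, as they might appear in a source file.
start = to_iso("03/15/2021 14:30", "%m/%d/%Y %H:%M")
end = to_iso("15.06.2021", "%d.%m.%Y")

print("Timespan:", start, "to", end)
```

Note that a date with no time of day is reported as midnight, so if the time is genuinely unknown it is worth saying so in the audit rather than implying that level of precision.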

How

How was the data collected? What were the steps and instruments used to collect the data? How was the data processed? What steps were taken to clean the data? How were null values handled? What sort of pre-processing has been done?
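One way to start answering the null-value question is to count the missing entries in each column before deciding how to handle them. The sketch below uses only the Python standard library. The sample rows, the column names, and the set of tokens treated as "missing" are all assumptions made for illustration, so adjust them to match your own dataset.

```python
import csv
import io

# Hypothetical sample data; for a real file use open("your_file.csv", newline="").
SAMPLE = """site,date,temp_c
A,2021-03-15,12.5
B,,NA
C,2021-03-17,
"""

# Assumption: these tokens all mean "no value recorded" in this dataset.
MISSING_TOKENS = {"", "NA", "N/A", "null", "NaN"}


def count_missing(handle):
    """Return a dict mapping each column name to its number of missing entries."""
    reader = csv.DictReader(handle)
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = row.get(name)
            # DictReader fills short rows with None, so treat that as missing too.
            if value is None or value.strip() in MISSING_TOKENS:
                counts[name] += 1
    return counts


missing = count_missing(io.StringIO(SAMPLE))
for column, n in missing.items():
    print(f"{column}: {n} missing")
```

Counts like these only describe the data as you received it. Values that were dropped or imputed upstream will not show up, which is why the audit also asks how the data was processed.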

What else? Who else?

What or who might be missing in the data? What do you still not know about the process of creating the dataset after answering the questions above? Do you have any other questions or concerns about the data?

The data audit can be submitted as a bulleted list, but keep in mind that eventually it will be transformed into a narrative form and included as a section of your final paper called “Data Overview.”