Data Science for Global Applications

Course materials for Data Science for Global Applications, by Dr. Anna Haensch and Dr. Karin Knudson

Project Assignment and Rubric

You will identify a policy-related question of interest and use data to help answer or address it. Your project will involve:

  • identifying relevant data sources and discussing their provenance,
  • bringing together multiple datasets,
  • presenting relevant features of the data with a high-quality visual analysis,
  • applying a model, and
  • discussing your analysis and conclusions in writing.
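As an illustration of the "bringing together multiple datasets" step, here is a minimal sketch using pandas. The two tables and all of their column names are hypothetical and stand in for whatever sources you choose; an outer join is only one reasonable option, picked here because it makes unmatched rows easy to spot.

```python
import pandas as pd

# Hypothetical data: both tables and every column name are made up
# for illustration and do not come from any real source.
population = pd.DataFrame({
    "country": ["A", "B", "C"],
    "population_millions": [10.0, 25.5, 3.2],
})
health_spending = pd.DataFrame({
    "country": ["A", "B", "D"],
    "spending_per_capita_usd": [1200, 850, 400],
})

# An outer join keeps every country from both tables. indicator=True adds
# a "_merge" column recording which source(s) each row came from.
combined = population.merge(
    health_spending, on="country", how="outer", indicator=True
)

# Rows that appear in only one source are worth discussing in the report,
# since they usually point to gaps or naming mismatches between datasets.
unmatched = combined[combined["_merge"] != "both"]

print(combined)
print(unmatched["country"].tolist())
```

Checking the unmatched rows before modeling is one way to surface the unresolved questions about data collection that the rubric asks you to identify.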

Rubric

Data relevance and integration: Effectively integrates two or more data sources that are highly relevant to the policy question at hand.

Exploration of the data and its provenance: The source of the data is thoroughly discussed, and unresolved questions about the data collection and/or source are clearly identified.

Visual communication: The report uses a variety of visualizations that follow principles of effective design and that highlight important features of the data relevant to the policy question. Moreover, the choice of visualizations is suitable for the aspects of the data being highlighted.

Data modeling: The report includes appropriately applied models for regression, classification, and/or clustering. The method and results of the modeling are clearly described, as are the limitations of the analysis.

Written communication: The report is clearly written, at an appropriate level of technicality, with analysis that synthesizes and explains the visualization and modeling aspects of the report.

Clarity and appropriateness of conclusions: The report uses the data to give a great deal of insight related to the chosen policy question. Uncertainty and limitations of the data and analyses in the report are clearly communicated. The report includes enough information about the context of the policy question so that a non-expert can clearly understand the issue and its importance.

Notebook(s) with code: Along with the report, the student submits well-commented notebook(s) with the Python code used for the project.

Presentation: The presentation is submitted with an appropriate visualization, key takeaways, and discussion.

Guidelines for in-class one-slide presentation

Our final project presentations will take place in class on the 26th and 28th of April. In the interest of time, rather than a full-scale presentation, think of this as a one-slide “teaser” for your project that gives us just a small taste of what you’ve done. Each student is limited to one slide, which should contain:

  • The title of your project (it should be descriptive).
  • Your name.
  • Your favorite visualization from your project (it should have a title, clearly labeled axes, and a legend if appropriate).
  • One main takeaway from your project (a sentence or two highlighting something interesting that you discovered, such as the type of modeling that you did or a feature that you found surprisingly useful).

You can submit either a single .pptx file or a link to a one-slide Google Slides presentation, and we will put them all together as a single presentation.
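For the visualization on your slide, the following is a minimal sketch of a figure with a title, labeled axes, and a legend, using matplotlib. The numbers, labels, and output filename are all hypothetical placeholders, not course data; substitute your own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt

# Hypothetical values, made up purely to illustrate the layout.
years = [2015, 2016, 2017, 2018, 2019]
region_a = [62.1, 63.4, 64.0, 65.2, 66.8]
region_b = [48.5, 49.9, 52.3, 53.0, 55.7]

fig, ax = plt.subplots(figsize=(8, 5))

# The label on each line is what appears in the legend.
ax.plot(years, region_a, marker="o", label="Region A")
ax.plot(years, region_b, marker="s", label="Region B")

# A descriptive title and axis labels that include units.
ax.set_title("Households with internet access, by region")
ax.set_xlabel("Year")
ax.set_ylabel("Households with access (%)")
ax.set_xticks(years)
ax.legend(title="Region")

fig.tight_layout()
# Hypothetical filename; a PNG at this resolution pastes cleanly into a slide.
fig.savefig("teaser_figure.png", dpi=200)
```

Saving at a higher dpi than the default helps the figure stay legible when it is projected in class.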