Data Science for Global Applications

This course provides an overview of the principles and applications of data science. The first part of the course deals with the acquisition and handling of data, with an emphasis on contextualizing data and framing data analysis. The second part focuses on tools and techniques for data manipulation and visualization. The third part introduces students to several methods for modeling data. The course presents both theoretical frameworks and practical tools for implementing various algorithms for regression, classification, and clustering. Students will become proficient in Python-based tools for data analysis, including NumPy, pandas, and scikit-learn.

Essential Questions

The organizing questions for the three sections of the course are:

Getting and Cleaning Data
- Where does data come from and how do we make it useful?
- What is data?
- How do the tools we use obscure the data or help us understand the data more deeply?
- What are the human choices involved with analyzing and presenting data?

Visualizing Data and Communicating Findings
- What are the human choices involved with analyzing and presenting data?
- How can data analysis and visualization inform policy? How can it mislead?
- How can we communicate effectively about data (with different audiences)?
- How do the tools we use obscure the data or help us understand the data more deeply?

Modeling Data
- How do we validate modeling choices? Once we build a model, how do we know whether it is good?
- Why do we model data? What can models help us to achieve or learn?
- What kinds of assumptions do we make when we build a model? How can models mislead?

Semester schedule

Course schedule

Materials

Instructions for getting set up with Python, Jupyter, and conda on your own computer

Colab notebooks for in-class demos and exercises

Colab notebooks for Python exercises

Readings and journal prompts

Project

License

These materials for Data Science for Global Applications by Karin Knudson and Anna Haensch are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.