Bayesian Statistics

Karin Knudson

Course description

Bayesian statistics provides powerful tools for analyzing data, making inferences, and expressing uncertainty. This course will provide an introduction to a Bayesian perspective on statistics. Students will begin with some basics of probability and Bayes’ Theorem. We will spend the term looking at the far-reaching consequences and applications of this modest theorem as we learn to create and select statistical models, choose appropriate prior distributions, and apply our models to real data. This course will include a computational component with the freely available statistical software R.

Essential Questions

Resources

Grading

Grading will be weighted approximately according to the following percentages: 15% homework, 10% quizzes, 40% tests, 35% projects/presentations/labs.

Outline and Calendar

Test Topics

Final Project Assignment Outline

Goal: The goal of this project is to use Bayesian methods and computation to analyze and derive conclusions about some real data that has meaning to you.

Part 1: Questions
Come up with a question that you care about and wish to address. This can be a large question, broader than what the project itself can resolve. It should be an interesting question, one for which a full answer could have an impact.

Then, narrow this to one or several smaller, related questions that your statistical analysis will allow you to answer.

Part 2: Data
You may gather the data yourself or use another data source. You will submit your data. In your writeup, include the source of your data, as much as you know about the method of collection, and any concerns you have about the data, such as possible sources of bias. Be clear about which variables were measured and how they were measured.

Part 3: Exploratory Data Analysis
Read your data into R, and present numerical and graphical information that summarizes the data. For example, you may wish to report appropriate sample means or standard deviations, or present graphs such as histograms or scatterplots. State clearly what each summary or graph represents, and label your graphs carefully. The goal here is to get a handle on some basic features of the data set before you delve into a statistical analysis.
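As a rough illustration of the kind of summary described above, here is a minimal sketch in base R. It uses the built-in mtcars data set purely as a stand-in for your own data; the commented-out file name is a placeholder, and the choice of variables is an assumption made for the example.

```r
# Stand-in data: R's built-in mtcars data set (an assumption for illustration).
# For your own project you would read your file instead, for example:
# dat <- read.csv("my_data.csv")   # placeholder file name
dat <- mtcars[, c("mpg", "wt")]

# Numerical summaries: five-number summary, sample mean, sample standard deviation
print(summary(dat))
mpg_mean <- mean(dat$mpg)
mpg_sd <- sd(dat$mpg)
cat("Mean mpg:", round(mpg_mean, 2), " SD mpg:", round(mpg_sd, 2), "\n")

# Histogram of one variable, with a title and labeled axes
hist(dat$mpg,
     main = "Fuel efficiency of the cars in mtcars",
     xlab = "Miles per gallon",
     ylab = "Number of cars")

# Scatterplot of two variables, with a title and labeled axes
plot(dat$wt, dat$mpg,
     main = "Fuel efficiency versus weight",
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon")
```

Every graph here carries a title and axis labels with units, which is the level of labeling expected in the report.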

Part 4: Data Analysis
Perform an appropriate analysis of your data using Bayesian methods. Exactly how this looks will vary widely depending on the kind of data you have and the question you would like to answer. Communicate with me early and often about what methods might be appropriate for your data. If appropriate, you may include related frequentist methods in your analysis.

The methods you use need to go beyond the methods we have used in class. For example, it would be insufficient to simply estimate a proportion by using a beta prior with a binomial likelihood as you did in homework 12, although such a method could be one piece of your analysis. As another example, performing a simple linear regression as we will explore in class will not be sufficient, although expanding on the ideas of simple linear regression to consider a problem of multiple linear regression could work very well.
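For reference, the baseline beta-binomial analysis mentioned above can be sketched in a few lines of R. The counts (7 successes in 20 trials) and the Beta(1, 1) prior are invented for illustration and are not taken from any homework; a project would need to build on something like this rather than stop here.

```r
# Hypothetical data: 7 successes observed in 20 trials (made-up numbers)
successes <- 7
trials <- 20

# Beta(a, b) prior on the unknown proportion; Beta(1, 1) is uniform on [0, 1]
a_prior <- 1
b_prior <- 1

# Conjugate update: a Beta prior with a binomial likelihood gives a Beta posterior
a_post <- a_prior + successes
b_post <- b_prior + (trials - successes)

# Posterior mean and a central 95% credible interval
post_mean <- a_post / (a_post + b_post)
cred_int <- qbeta(c(0.025, 0.975), a_post, b_post)

cat("Posterior: Beta(", a_post, ",", b_post, ")\n")
cat("Posterior mean:", round(post_mean, 3), "\n")
cat("95% credible interval:", round(cred_int, 3), "\n")
```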

Other methods you could explore from a Bayesian perspective include: (multiple) linear regression (including ridge regression and/or the lasso), logistic regression, Poisson processes, classification methods, time series, Gibbs sampling (or other MCMC methods), Markov chains, or natural language processing.

Clearly explain your method. In particular, be sure to justify any prior distributions that you use, with the goal of making them acceptable to a skeptical audience.
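One possible way to support a choice of prior is a sensitivity check: rerun the analysis under several reasonable priors and show how much the conclusions move. The sketch below does this for a simple proportion. The data (7 successes in 20 trials) and the three candidate priors are assumptions chosen only for illustration.

```r
# Hypothetical data (made-up numbers)
successes <- 7
trials <- 20

# Candidate Beta(a, b) priors; the names and values are illustrative assumptions
priors <- list(
  flat      = c(1, 1),      # uniform on [0, 1]
  jeffreys  = c(0.5, 0.5),  # Jeffreys prior for a binomial proportion
  skeptical = c(10, 10)     # concentrated around 0.5
)

# For each prior, compute the posterior mean and a central 95% credible interval
sensitivity <- t(sapply(priors, function(p) {
  a <- p[1] + successes
  b <- p[2] + (trials - successes)
  c(mean = a / (a + b),
    lower = qbeta(0.025, a, b),
    upper = qbeta(0.975, a, b))
}))

# One row per prior, one column per posterior summary
print(round(sensitivity, 3))
```

If the rows of the table are close to one another, the conclusion does not depend much on the prior; if they differ substantially, the writeup should say so and explain which prior is most defensible.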

Part 5: Results
Present the results of your Bayesian analysis clearly, and interpret your analysis in the context of the question that you wished to answer.

The Output: You will submit a written report that includes all of the elements above and appropriate graphics. You will also summarize your methods and results in a 10-minute presentation to the class.