Bayesian Statistics

Karin Knudson

Course description

Bayesian statistics provides powerful tools for analyzing data, making inferences, and expressing uncertainty. This course will provide an introduction to a Bayesian perspective on statistics. Students will begin with some basics of probability and Bayes’ Theorem. We will spend the term looking at the far-reaching consequences and applications of this modest theorem as we learn to create and select statistical models, choose appropriate prior distributions, and apply our models to real data. This course will include a computational component with the freely available statistical software R.

Essential Questions

How is statistical analysis used and misused in society?
How can we mathematically analyze probability?
How do we construct and analyze a Bayesian model?
How can we effectively present, interpret, and evaluate the results of (Bayesian) statistical analysis?
How can we use computers to aid our statistical analysis?

Resources

An Introduction to Bayesian Statistics, Second Edition, by William Bolstad
Bayesian Data Analysis, Third Edition, by Gelman et al.
Doing Bayesian Data Analysis, Second Edition, by John Kruschke
R statistical software (https://cran.r-project.org)

Grading

Grading will be weighted approximately according to the following percentages: 15% homework, 10% quizzes, 40% tests, 35% projects/presentations/labs.

Outline and Calendar

Week 1 - Course expectations, introduction to language of statistics, misuse of statistics, sampling, data collection, R
- Day 1 - Introduction, Expectations, Misuse of Statistics examples, HW 1
- Day 2 (long block) - Introduction to language of statistics, sampling, HW 2
- Day 3 - Introduction to language of statistics, data collection, introduction to R, HW 3
Week 2 - Probability, Bayes’ Theorem, interpretations of Bayes’ Theorem, interpretations of probability, discrete distributions
- Day 1 - Probability, Bayes’ Theorem, HW 4
- Day 2 - Probability, Bayes’ Theorem, Interpretations, HW 5
- Day 3 (long block) - Introduction to language of statistics, data collection, introduction to R, HW 6
- Day 4 - Discrete distributions, HW 7
Week 3 - Probability, Bayes’ Theorem, discrete probability distributions, more R
- Day 1 - Probability, Bayes’ Theorem, descriptive statistics in R, HW 8
- Day 2 - Discrete probability distributions, HW 9
- Day 3 (long block) - Discrete distributions
- Day 4 - Test 1
Week 4 - Continuous distributions, Bayesian calculations with conjugate distributions - using beta prior with binomial likelihood
- Day 1 - Continuous distributions, HW 10
- Day 2 - Continuous distributions, HW 11
- Day 3 (long block) - Bayesian calculations with beta and binomial distributions, HW 12
- Day 4 - Bayesian calculations with beta and binomial distributions, HW 13
Week 5 - Normal distributions, Bayesian credible intervals, hypothesis testing
- Day 1 - Bayesian calculations with normally distributed random variables, HW 14
- Day 2 (long block) - Bayesian credible intervals, hypothesis testing, HW 15
Week 6 - Test 2, Comparison with frequentist analysis
- Day 1 - Review
- Day 2 - Test 2
- Day 3 (long block) - Comparison with frequentist analysis - p-values and confidence intervals (proportions), HW 15.5 (problems from AP Statistics course)
- Day 4 - Comparison with frequentist analysis - p-values and confidence intervals, HW 16
Week 7 - Continuous distributions, Bayesian calculations with conjugate distributions
- Day 1 - Confidence intervals, Poisson likelihood with Gamma prior, HW 17
- Day 2 - Bayesian models with continuous distributions, no HW
- Day 3 (long block) - Bayesian models with continuous distributions - difference of means (normally distributed)
- Day 4 - Review
Week 8 - Linear regression, Test 3, Projects, Guest Speaker
- Day 1 - Linear regression, HW: generate three possible project ideas
- Day 2 - Test 3
- Day 3 (long block) - Finding and loading data, HW 20
- Day 4 - Guest Speaker, HW: work on project
Week 9 - Linear regression, Projects, Guest Speaker
- Day 1 - Linear regression and projects, HW: project work
- Day 2 - Guest Speaker, HW: project work
- Day 3 (long block) - Projects, HW: project work
- Day 4 - Projects, HW: project draft
Week 10 - Projects
- Day 1 - Projects
- Day 2 - Projects
- Day 3 (long block) - Projects
- Day 4 - Project presentations
- Finals block - Final projects due, Project presentations

Test Topics

Test 1 Topics
- Statistics - some uses and basic vocabulary
  - Identifying misleading uses of statistics
  - Population
  - Sample
  - Statistic
  - Sampling error
  - Association
  - Causation vs. correlation
  - Confounding
  - Lurking variable
  - Bias, response bias
  - Observational study vs. Experiment
  - Placebo and placebo effect
  - Control group and treatment group
  - Blinding
- Introduction to probability and Bayes’ Theorem
  - Joint probability
  - Conditional Probability
  - Intersections and unions
  - Probability of disjoint (mutually exclusive) events
  - Independent events
  - Applications of probability rules (problems from class...)
- Discrete random variables
  - Mean (expected value)
  - Variance
  - Joint probabilities
  - Conditional probabilities
  - Independent random variables
  - Binomial random variables (probability distribution, mean, variance, applicability)
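As an illustration of the probability rules and Bayes’ Theorem listed above, the following minimal sketch updates a prior over disjoint hypotheses using the likelihood of an observation. It is written in Python purely for illustration (the course itself uses R). The function name `posterior` and the diagnostic-test numbers are hypothetical and are not taken from the course materials.

```python
def posterior(prior, likelihood):
    """Apply Bayes' Theorem over a set of disjoint, exhaustive hypotheses.

    prior[h]      = P(h)
    likelihood[h] = P(data | h)
    Returns a dict holding P(h | data) for each hypothesis h.
    """
    # Joint probability: P(h and data) = P(h) * P(data | h)
    joint = {h: prior[h] * likelihood[h] for h in prior}
    # Total probability of the data: sum of the joint probabilities
    evidence = sum(joint.values())
    # Conditional probability: P(h | data) = P(h and data) / P(data)
    return {h: joint[h] / evidence for h in joint}


# Hypothetical numbers: 1% prevalence, 95% sensitivity, 10% false-positive rate.
prior = {"disease": 0.01, "healthy": 0.99}
likelihood_positive = {"disease": 0.95, "healthy": 0.10}

post = posterior(prior, likelihood_positive)
print(post)
```

Under these assumed numbers, the posterior probability of disease after a positive result stays below 10%, because the prior probability is so small.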
Test 2 Topics
- Discrete random variables, including...
  - Binomial random variables
  - Poisson random variables
- Continuous random variables
  - Probability density function (pdf)
  - Integrating pdf to find probabilities
  - Cumulative distribution function
  - Mean, median, mode of continuous probability distributions
  - Variance and standard deviation of continuous probability distributions
  - IQR and quantiles
- Useful continuous distributions
  - Uniform distribution
  - Beta distribution
  - Gamma distribution
  - Normal distribution
  - Exponential distribution
- Bayesian calculations with...
  - Discrete priors
  - Uniform priors
  - Beta prior, binomial likelihood
  - Normal prior, normal likelihood with known variance
  - Improper priors
- Interpreting/summarizing posterior distributions
  - Posterior mean, median, mode
  - Posterior variance, standard deviation
  - Posterior IQR, quantiles
  - Bayesian credible interval (BCI) - equal-tailed or shortest
  - Formulating null and alternative hypotheses
  - Hypothesis testing
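For reference, the two conjugate updates named in the list above can be written out as follows. This is a sketch in one common notation; the symbols (a, b, y, n for the beta-binomial case and m, s, σ for the normal case) are assumptions chosen for illustration and may differ from the notation used in class.

```latex
% Beta prior, binomial likelihood: y successes observed in n trials
\[
\pi \sim \mathrm{Beta}(a, b), \qquad y \mid \pi \sim \mathrm{Binomial}(n, \pi)
\quad\Longrightarrow\quad
\pi \mid y \sim \mathrm{Beta}(a + y,\; b + n - y)
\]

% Normal prior, normal likelihood with known variance \sigma^2:
% observations y_1, ..., y_n with sample mean \bar{y}
\[
\mu \sim \mathrm{N}(m, s^2), \qquad y_i \mid \mu \sim \mathrm{N}(\mu, \sigma^2)
\quad\Longrightarrow\quad
\mu \mid y_1, \dots, y_n \sim \mathrm{N}\!\left(m', (s')^2\right)
\]
\[
\frac{1}{(s')^2} = \frac{1}{s^2} + \frac{n}{\sigma^2},
\qquad
m' = \frac{\dfrac{m}{s^2} + \dfrac{n\,\bar{y}}{\sigma^2}}{\dfrac{1}{s^2} + \dfrac{n}{\sigma^2}}
\]
```

In both cases the posterior stays in the same family as the prior, which is what makes these calculations tractable by hand.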
Test 3 Topics
- All topics from Tests 1 and 2
- Comparison of Bayesian and frequentist methods
- Null hypothesis significance testing and p-values (frequentist perspective), calculating and interpreting p-values (Chapter 9/12, outside materials)
- Frequentist confidence intervals
  - Interpreting confidence intervals
  - Constructing a confidence interval for the mean of a normal distribution with known variance
  - Constructing a confidence interval for a proportion
- Poisson likelihood with Gamma prior (calculating the posterior, constructing and interpreting a BCI, hypothesis testing)
- Bayesian hypothesis testing
  - When the null hypothesis is an inequality, we calculate P(H₀) under the posterior distribution
  - When the null hypothesis is an equality, we can check whether the null value of the parameter is in the BCI, or we can construct a region of practical equivalence (ROPE) and look at the relationship between the ROPE and the BCI.
- Difference of means of normal distributions with known common variance
  - For independent data or paired data
  - Calculating posterior for μ₁ − μ₂, using it to construct a BCI, perform hypothesis testing, etc.
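As a reference for the Poisson likelihood with Gamma prior topic and the inequality form of Bayesian hypothesis testing listed above, the update and the test can be sketched as follows. The notation is an assumption chosen for illustration: r is the shape and v the rate of the Gamma prior, g(μ | y) is the posterior density, and μ₀ is the null value.

```latex
% Gamma prior (shape r, rate v), Poisson likelihood for observations y_1, ..., y_n
\[
\mu \sim \mathrm{Gamma}(r, v), \qquad y_i \mid \mu \sim \mathrm{Poisson}(\mu)
\quad\Longrightarrow\quad
\mu \mid y_1, \dots, y_n \sim \mathrm{Gamma}\!\left(r + \sum_{i=1}^{n} y_i,\; v + n\right)
\]

% One-sided test of H_0: \mu \le \mu_0, evaluated under the posterior
\[
P(H_0 \mid y_1, \dots, y_n) = \int_{0}^{\mu_0} g(\mu \mid y_1, \dots, y_n)\, d\mu
\]
```

An equal-tailed 95% BCI for μ would then run from the 0.025 quantile to the 0.975 quantile of this posterior Gamma distribution.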

Final Project Assignment Outline

Goal: The goal of this project is to use Bayesian methods and computation to analyze and derive conclusions about some real data that has meaning to you.

Part 1: Questions
Come up with a question that you care about and that you wish to address. This can be a large question, broader than what you would like to solve with the project. It should be an interesting question - one for which a full answer could have an impact.

Then, narrow this to one or several smaller, related questions that your statistical analysis will allow you to answer.

Part 2: Data
You may gather the data yourself or use another data source. You will submit your data. In your writeup, include the source of your data, as much as you know about the method of collection, and any concerns you have about the data, such as possible bias. Be clear about which variables were measured and how they were measured.

Part 3: Exploratory Data Analysis
Read your data into R, and present numerical and graphical information that summarizes the data. For example, you may wish to report appropriate sample means or standard deviations, or present graphs like histograms or scatterplots. Be very clear about what the information you include represents, and carefully label your graphs. The goal here is to get a handle on some basic features of the data set before you delve into a statistical analysis.

Part 4: Data Analysis
Perform an appropriate analysis of your data using Bayesian methods. What this looks like will vary widely depending on the kind of data you have and the question you would like to answer. Please communicate with me early and often about which methods might be appropriate for your data. If appropriate, you may include related frequentist methods in your analysis.

The methods you use need to go beyond the methods we have used in class. For example, it would be insufficient to simply estimate a proportion by using a beta prior with a binomial likelihood as you did with homework 12 - although such a method could be one piece of your analysis. As another example, performing a simple linear regression as we will explore in class will not be sufficient - although expanding on the ideas of simple linear regression to consider a problem of multiple linear regression could work very well.

Other methods you could explore from a Bayesian perspective include: (multiple) linear regression (including ridge regression and/or the lasso), logistic regression, Poisson processes, classification methods, time series, Gibbs sampling (or other MCMC methods), Markov chains, or natural language processing.

Clearly explain your method. In particular, be sure to justify any prior distributions that you use, with the goal of making them acceptable to a skeptical audience.

Part 5: Results
Present the results of your Bayesian analysis clearly, and interpret your analysis in the context of the question that you wished to answer.

The Output: You will submit a written report that includes all of the elements above and appropriate graphics. You will also summarize your methods and results in a 10-minute presentation to the class.