Bayesian Statistics
Course description
Bayesian statistics provides powerful tools for analyzing data, making inferences, and expressing uncertainty. This course will provide an introduction to a Bayesian perspective on statistics. Students will begin with some basics of probability and Bayes’ Theorem. We will spend the term looking at the far-reaching consequences and applications of this modest theorem as we learn to create and select statistical models, choose appropriate prior distributions, and apply our models to real data. This course will include a computational component with the freely available statistical software R.
Essential Questions
How is statistical analysis used and misused in society?
How can we mathematically analyze probability?
How do we construct and analyze a Bayesian model?
How can we effectively present, interpret, and evaluate the results of (Bayesian) statistical analysis?
How can we use computers to aid our statistical analysis?
Resources
An Introduction to Bayesian Statistics, Second Edition, by William Bolstad
Bayesian Data Analysis, Third Edition, by Gelman et al.
Doing Bayesian Data Analysis, Second Edition, by John Kruschke
R statistical software (https://cran.r-project.org)
Grading
Grading will be weighted approximately according to the following percentages: 15% homework, 10% quizzes, 40% tests, 35% projects/presentations/labs.
Outline and Calendar
Week 1 - Course expectations, introductions to language of statistics, misuse of statistics, sampling, data collection, R
Day 1 - Introduction, Expectations, Misuse of Statistics examples, HW 1
Day 2 (long block) - Introduction to language of statistics, sampling, HW 2
Day 3 - Introduction to language of statistics, data collection, introduction to R, HW 3
Week 2 - Probability, Bayes’ Theorem, interpretations of Bayes’ Theorem, interpretations of probability, discrete distributions
Day 1 - Probability, Bayes’ theorem, HW 4
Day 2 - Probability, Bayes’ theorem, interpretations, HW 5
Day 3 (long block) - Introduction to language of statistics, data collection, introduction to R, HW 6
Day 4 - Discrete distributions, HW 7
Week 3 - Probability, Bayes’ theorem, discrete probability distributions, more R
Day 1 - Probability, Bayes’ theorem, descriptive statistics in R, HW 8
Day 2 - Discrete probability distributions, HW 9
Day 3 (long block) - Discrete distributions
Day 4 - Test 1
Week 4 - Continuous distributions, Bayesian calculations with conjugate distributions - using beta prior with binomial likelihood
Day 1 - Continuous distributions, HW 10
Day 2 - Continuous distributions, HW 11
Day 3 (long block) - Bayesian calculations with beta and binomial distributions, HW 12
Day 4 - Bayesian calculations with beta and binomial distributions, HW 13
Week 5 - Normal distributions, Bayesian credible intervals, hypothesis testing
Day 1 - Bayesian calculations with normally distributed random variables, HW 14
Day 2 (long block) - Bayesian credible intervals, hypothesis testing, HW 15
Week 6 - Test 2, Comparison with frequentist analysis
Day 1 - Review
Day 2 - Test 2
Day 3 (long block) - Comparison with frequentist analysis - p-values and confidence intervals (proportions), HW 15.5 (problems from AP Statistics course)
Day 4 - Comparison with frequentist analysis - p-values and confidence intervals, HW 16
Week 7 - Continuous distributions, Bayesian calculations with conjugate distributions
Day 1 - Confidence intervals, Poisson likelihood with Gamma prior, HW 17
Day 2 - Bayesian models with continuous distributions, no HW
Day 3 (long block) - Bayesian models with continuous distributions - difference of means (normally distributed)
Day 4 - Review
Week 8 - Linear regression, Test 3, Projects, Guest Speaker
Day 1 - Linear regression, HW: generate three possible project ideas
Day 2 - Test 3
Day 3 (long block) - Finding and loading data, HW 20
Day 4 - Guest Speaker, HW: work on project
Week 9 - Linear regression, Projects, Guest Speaker
Day 1 - Linear regression and projects, HW: project work
Day 2 - Guest Speaker, HW: project work
Day 3 (long block) - Projects, HW: project work
Day 4 - Projects, HW: project draft
Week 10 - Projects
Day 1 - Projects
Day 2 - Projects
Day 3 (long block) - Projects
Day 4 - Project presentations
Finals block - Final projects due, Project presentations
Test Topics
Test 1 Topics
Statistics - some uses and basic vocabulary
Identifying misleading uses of statistics
Population
Sample
Statistic
Sampling error
Association
Causation vs. correlation
Confounding
Lurking variable
Bias, response bias
Observational study vs. Experiment
Placebo and placebo effect
Control group and treatment group
Blinding
Introduction to probability and Bayes’ Theorem
Joint probability
Conditional Probability
Intersections and unions
Probability of disjoint (mutually exclusive) events
Independent events
Applications of probability rules (problems from class...)
Discrete random variables
Mean (expected value)
Variance
Joint probabilities
Conditional probabilities
Independent random variables
Binomial random variables (probability distribution, mean, variance, applicability)
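As a study aid for the Bayes’ Theorem items above, here is a minimal sketch of the calculation for a discrete set of hypotheses. The course itself works in R; this sketch uses Python only for illustration. The function name `posterior` and the screening-test numbers are hypothetical and do not come from class materials.

```python
def posterior(prior, likelihood):
    """Bayes' theorem for a discrete set of hypotheses.

    prior[h] is P(h); likelihood[h] is P(data | h).
    Returns P(h | data) for each hypothesis h.
    """
    # Joint probability P(h and data) for each hypothesis
    joint = {h: prior[h] * likelihood[h] for h in prior}
    # P(data), by the law of total probability
    evidence = sum(joint.values())
    return {h: joint[h] / evidence for h in joint}


# Hypothetical screening test: 1% prevalence, 95% sensitivity,
# 5% false-positive rate (all three numbers are made up).
prior = {"disease": 0.01, "healthy": 0.99}
likelihood_positive = {"disease": 0.95, "healthy": 0.05}

post = posterior(prior, likelihood_positive)
print(round(post["disease"], 3))  # prints 0.161
```

Even with a fairly accurate test, the posterior probability of disease after one positive result stays low because the prior probability is small.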
Test 2 Topics
Discrete random variables, including...
Binomial random variables
Poisson random variables
Continuous random variables
Probability density function (pdf)
Integrating pdf to find probabilities
Cumulative distribution function
Mean, median, mode of continuous probability distributions
Variance and standard deviation of continuous probability distributions
IQR and quantiles
Useful continuous distributions
Uniform distribution
Beta distribution
Gamma distribution
Normal distribution
Exponential distribution
Bayesian calculations with...
Discrete priors
Uniform priors
Beta prior, binomial likelihood
Normal prior, normal likelihood with known variance
Improper priors
Interpreting/summarizing posterior distributions:
Posterior mean, median, mode
Posterior variance, standard deviation
Posterior IQR, quantiles
Bayesian credible interval (BCI) - equal-tailed or shortest
Creating null vs. alternative hypothesis
Hypothesis testing
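The beta prior with binomial likelihood calculation and the posterior summaries listed above can be sketched as follows. This is an illustrative sketch, not the course's method: the course works in R, where a quantile function such as `qbeta` would give the interval directly, while this Python version approximates the equal-tailed interval by numerical integration on a grid. All function names and the data (7 successes in 20 trials with a uniform Beta(1, 1) prior) are hypothetical.

```python
import math


def beta_binomial_update(a, b, successes, trials):
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta posterior."""
    return a + successes, b + (trials - successes)


def beta_mean_var(a, b):
    """Closed-form mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


def beta_pdf(x, a, b):
    """Beta(a, b) density at x, for 0 < x < 1 (computed on the log scale)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def equal_tailed_interval(a, b, level=0.95, grid=20000):
    """Approximate equal-tailed credible interval via the midpoint rule."""
    tail = (1 - level) / 2
    width = 1.0 / grid
    cum, lower, upper = 0.0, None, None
    for i in range(grid):
        x = (i + 0.5) * width          # midpoint of the i-th grid cell
        cum += beta_pdf(x, a, b) * width
        if lower is None and cum >= tail:
            lower = x
        if cum >= 1 - tail:
            upper = x
            break
    return lower, upper


# Hypothetical data: uniform prior, 7 successes in 20 trials
a_post, b_post = beta_binomial_update(1, 1, successes=7, trials=20)
post_mean, post_var = beta_mean_var(a_post, b_post)
lower, upper = equal_tailed_interval(a_post, b_post)

print("posterior: Beta(%d, %d)" % (a_post, b_post))
print("posterior mean:", post_mean, "sd:", math.sqrt(post_var))
print("approximate 95% equal-tailed BCI:", (lower, upper))
```

The grid approach is slower than a built-in quantile function but makes the definition of the interval explicit: each endpoint cuts off 2.5% of the posterior probability in one tail.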
Test 3 Topics
All topics from Tests 1 and 2
Comparison of Bayesian and frequentist methods
Null hypothesis significance testing and p-values (frequentist perspective), calculating and interpreting p-values (Chapter 9/12, outside materials)
Frequentist confidence intervals
Interpreting confidence intervals
Constructing a confidence interval for the mean of a normal distribution with known variance
Constructing a confidence interval for a proportion
Poisson likelihood with Gamma prior (calculating the posterior, constructing and interpreting a BCI, hypothesis testing)
Bayesian hypothesis testing
When the null hypothesis is an inequality, we calculate P(H0) under the posterior distribution
When the null hypothesis is an equality, we can check whether the null value of the parameter is in the BCI, or we can construct a region of practical equivalence (ROPE) and look at the relationship between the ROPE and the BCI.
Difference of means of normal distributions with known common variance
For independent data or paired data
Calculating posterior for μ1 − μ2, using it to construct a BCI, perform hypothesis testing, etc.
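The hypothesis-testing and difference-of-means items above can be tied together in a short sketch. It assumes independent samples, a known common standard deviation, and flat (improper) priors on both means, in which case the posterior for μ1 − μ2 is normal with mean equal to the difference of the sample means and variance σ²(1/n1 + 1/n2). The course uses R; this Python sketch is only illustrative, and the function names, data values, and ROPE limits are hypothetical.

```python
import math
from statistics import NormalDist


def posterior_diff_means(mean1, n1, mean2, n2, sigma):
    """Posterior for mu1 - mu2 (independent samples, known sigma, flat priors)."""
    sd = sigma * math.sqrt(1 / n1 + 1 / n2)
    return NormalDist(mean1 - mean2, sd)


def credible_interval(post, level=0.95):
    """Equal-tailed BCI; for a normal posterior this is also the shortest."""
    tail = (1 - level) / 2
    return post.inv_cdf(tail), post.inv_cdf(1 - tail)


def rope_decision(interval, rope):
    """Compare a BCI with a region of practical equivalence (ROPE)."""
    lo, hi = interval
    rope_lo, rope_hi = rope
    if hi < rope_lo or lo > rope_hi:
        return "reject null value"    # BCI lies entirely outside the ROPE
    if rope_lo <= lo and hi <= rope_hi:
        return "accept null value"    # BCI lies entirely inside the ROPE
    return "undecided"                # BCI and ROPE partially overlap


# Hypothetical data: two groups of 25, known sigma = 5
post = posterior_diff_means(mean1=52.0, n1=25, mean2=50.0, n2=25, sigma=5.0)

# Inequality null H0: mu1 - mu2 <= 0, evaluated under the posterior
p_h0 = post.cdf(0.0)

# Equality null H0: mu1 - mu2 = 0, checked with a BCI and a ROPE
bci = credible_interval(post)
decision = rope_decision(bci, rope=(-0.5, 0.5))

print("P(H0: mu1 - mu2 <= 0 | data) =", p_h0)
print("95% BCI for mu1 - mu2:", bci)
print("ROPE decision:", decision)
```

For paired data, the same functions would apply to the mean and standard deviation of the within-pair differences instead of the two-sample formula.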
Final Project Assignment Outline
Goal: The goal of this project is to use Bayesian methods and computation to analyze and derive conclusions about some real data that has meaning to you.
Part 1: Questions
Come up with a question that you care about and that you wish to address. This can be a large question, broader than what you would like to solve with the project. It should be an interesting question - one for which a full answer could have an impact.
Then, narrow this to one or several smaller, related questions that your statistical analysis will allow you to answer.
Part 2: Data
You may gather the data yourself, or use another data source. You will submit your data. Include in your writeup the source of your data, as much as you know about the method of collection, and any concerns you have for the data in terms of bias, etc. Be clear about what variables were measured and how they were measured.
Part 3: Exploratory Data Analysis
Read your data into R, and present numerical and graphical information that summarizes the data. For example, you may wish to report appropriate sample means or standard deviations, or present graphs like histograms or scatterplots. Be very clear about what each piece of information you include represents, and carefully label your graphs. The goal here is to get a handle on some basic features of the data set before you delve into a statistical analysis.
Part 4: Data Analysis
Perform an appropriate analysis of your data using Bayesian methods. Exactly how this looks will vary widely depending on the kind of data you have and the question you would like to answer. Communicate with me early and often about what methods might be appropriate for your data. If appropriate, you may include related frequentist methods in your analysis.
The methods you use need to go beyond the methods we have used in class. For example, it would be insufficient to simply estimate a proportion by using a beta prior with a binomial likelihood as you did with homework 12 - although such a method could be one piece of your analysis. As another example, performing a simple linear regression as we will explore in class will not be sufficient - although expanding on the ideas of simple linear regression to consider a problem of multiple linear regression could work very well.
Other methods you could explore from a Bayesian perspective include: (multiple) linear regression (including ridge regression and/or the lasso), logistic regression, Poisson processes, classification methods, time series, Gibbs sampling (or other MCMC methods), Markov chains, or natural language processing.
Clearly explain your method. In particular, be sure to justify any prior distributions that you use, with the goal of making them acceptable to a skeptical audience.
Part 5: Results
Present the results of your Bayesian analysis clearly, and interpret your analysis in the context of the question that you wished to answer.
The Output: You will submit a written report that includes all of the elements above and appropriate graphics. You will also summarize your methods and results in a 10-minute presentation to the class.