# Bayesian Statistics

# Course Description

Bayesian statistics provides powerful tools for analyzing data, making inferences, and expressing uncertainty. This course will provide an introduction to a Bayesian perspective on statistics. Students will begin with some basics of probability and Bayes’ Theorem. We will spend the term looking at the far-reaching consequences and applications of this modest theorem as we learn to create and select statistical models, choose appropriate prior distributions, and apply our models to real data. This course will include a computational component with the freely available statistical software R.

# Essential Questions

How is statistical analysis used and misused in society?

How can we mathematically analyze probability?

How do we construct and analyze a Bayesian model?

How can we effectively present, interpret, and evaluate the results of (Bayesian) statistical analysis?

How can we use computers to aid our statistical analysis?

# Resources

Introduction to Bayesian Statistics, Second Edition, by William M. Bolstad

Bayesian Data Analysis, Third Edition, by Gelman et al.

Doing Bayesian Data Analysis, Second Edition, by John Kruschke

R statistical software (https://cran.r-project.org)

# Grading

Grading will be weighted approximately according to the following percentages: 15% homework, 10% quizzes, 40% tests, 35% projects/presentations/labs.

# Outline and Calendar

Week 1 - Course expectations, introduction to language of statistics, misuse of statistics, sampling, data collection, R

Day 1 - Introduction, Expectations, Misuse of Statistics examples, HW 1

Day 2 (long block) - Introduction to language of statistics, sampling, HW 2

Day 3 - Introduction to language of statistics, data collection, introduction to R, HW 3

Week 2 - Probability, Bayes’ Theorem, interpretations of Bayes’ Theorem, interpretations of probability, discrete distributions

Day 1 - Probability, Bayes’ Theorem, HW 4

Day 2 - Probability, Bayes’ Theorem, interpretations, HW 5

Day 3 (long block) - Introduction to language of statistics, data collection, introduction to R, HW 6

Day 4 - Discrete distributions, HW 7

Week 3 - Probability, Bayes’ Theorem, discrete probability distributions, more R

Day 1 - Probability, Bayes’ Theorem, descriptive statistics in R, HW 8

Day 2 - Discrete probability distributions, HW 9

Day 3 (long block) - Discrete distributions

Day 4 - Test 1

Week 4 - Continuous distributions, Bayesian calculations with conjugate distributions - using beta prior with binomial likelihood

Day 1 - Continuous distributions, HW 10

Day 2 - Continuous distributions, HW 11

Day 3 (long block) - Bayesian calculations with beta and binomial distributions, HW 12

Day 4 - Bayesian calculations with beta and binomial distributions, HW 13

Week 5 - Normal distributions, Bayesian credible intervals, hypothesis testing

Day 1 - Bayesian calculations with normally distributed random variables, HW 14

Day 2 (long block) - Bayesian credible intervals, hypothesis testing, HW 15

Week 6 - Test 2, Comparison with frequentist analysis

Day 1 - Review

Day 2 - Test 2

Day 3 (long block) - Comparison with frequentist analysis - p-values and confidence intervals (proportions), HW 15.5 (problems from AP Statistics course)

Day 4 - Comparison with frequentist analysis - p-values and confidence intervals, HW 16

Week 7 - Continuous distributions, Bayesian calculations with conjugate distributions

Day 1 - Confidence intervals, Poisson likelihood with Gamma prior, HW 17

Day 2 - Bayesian models with continuous distributions, no HW

Day 3 (long block) - Bayesian models with continuous distributions - difference of means (normally distributed)

Day 4 - Review

Week 8 - Linear regression, Test 3, Projects, Guest Speaker

Day 1 - Linear regression, HW: generate three possible project ideas

Day 2 - Test 3

Day 3 (long block) - Finding and loading data, HW 20

Day 4 - Guest Speaker, HW: work on project

Week 9 - Linear regression, Projects, Guest Speaker

Day 1 - Linear regression and projects, HW: project work

Day 2 - Guest Speaker, HW: project work

Day 3 (long block) - Projects, HW: project work

Day 4 - Projects, HW: project draft

Week 10 - Projects

Day 1 - Projects

Day 2 - Projects

Day 3 (long block) - Projects

Day 4 - Project presentations

Finals block - Final projects due, Project presentations

# Test Topics

Test 1 Topics

Statistics - some uses and basic vocabulary

Identifying misleading uses of statistics

Population

Sample

Statistic

Sampling error

Association

Causation vs. correlation

Confounding

Lurking variable

Bias, response bias

Observational study vs. Experiment

Placebo and placebo effect

Control group and treatment group

Blinding

Introduction to probability and Bayes’ Theorem

Joint probability

Conditional Probability

Intersections and unions

Probability of disjoint (mutually exclusive) events

Independent events

Applications of probability rules (problems from class...)

Discrete random variables

Mean (expected value)

Variance

Joint probabilities

Conditional probabilities

Independent random variables

Binomial random variables (probability distribution, mean, variance, applicability)

Test 2 Topics

Discrete random variables, including...

Binomial random variables

Poisson random variables

Continuous random variables

Probability density function (pdf)

Integrating the pdf to find probabilities

Cumulative distribution function

Mean, median, mode of continuous probability distributions

Variance and standard deviation of continuous probability distributions

IQR and quantiles

Useful continuous distributions

Uniform distribution

Beta distribution

Gamma distribution

Normal distribution

Exponential distribution

Bayesian calculations with...

Discrete priors

Uniform priors

Beta prior, binomial likelihood

Normal prior, normal likelihood with known variance

Improper priors

Interpreting/summarizing posterior distributions:

Posterior mean, median, mode

Posterior variance, standard deviation

Posterior IQR, quantiles

Bayesian credible interval (BCI) - equal-tailed or shortest

Formulating null and alternative hypotheses

Hypothesis testing

Test 3 Topics

All topics from Tests 1 and 2

Comparison of Bayesian and frequentist methods

Null hypothesis significance testing and p-values (frequentist perspective), calculating and interpreting p-values (Chapter 9/12, outside materials)

Frequentist confidence intervals

Interpreting confidence intervals

Constructing a confidence interval for the mean of a normal distribution with known variance

Constructing a confidence interval for a proportion

Poisson likelihood with Gamma prior (calculating the posterior, constructing and interpreting a BCI, hypothesis testing)

Bayesian hypothesis testing

When the null hypothesis is an inequality, we calculate $P(H_0)$ under the posterior distribution.

When the null hypothesis is an equality, we can check whether the null value of the parameter is in the BCI, or we can construct a region of practical equivalence (ROPE) and look at the relationship between the ROPE and the BCI.
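As a minimal sketch of both kinds of test, the example below uses a normal prior with a normal likelihood of known variance. Every number in it (the prior, the known standard deviation, the sample summary, the null value 100, and the ROPE half-width of 2) is made up for illustration. It is written in Python using only the standard library; in R, `pnorm` and `qnorm` play the roles of `cdf` and `inv_cdf`.

```python
from statistics import NormalDist

# Hypothetical inputs, invented for this illustration.
prior = NormalDist(mu=100.0, sigma=10.0)  # prior for the unknown mean
sigma = 15.0                              # known standard deviation of the data
n, ybar = 25, 106.0                       # sample size and sample mean

# Conjugate normal update: precisions (1 / variance) add, and the
# posterior mean is the precision-weighted average of prior mean and ybar.
prior_prec = 1 / prior.variance
data_prec = n / sigma**2
post_var = 1 / (prior_prec + data_prec)
post_mean = post_var * (prior_prec * prior.mean + data_prec * ybar)
posterior = NormalDist(post_mean, post_var**0.5)

# Inequality null, H0: mu <= 100. Its posterior probability is the
# posterior cdf evaluated at the null value.
p_h0 = posterior.cdf(100.0)

# 95% equal-tailed Bayesian credible interval.
bci = (posterior.inv_cdf(0.025), posterior.inv_cdf(0.975))

# Equality null, H0: mu = 100. Check the null value against the BCI,
# then compare the BCI with a ROPE of 100 +/- 2.
rope = (98.0, 102.0)
null_in_bci = bci[0] <= 100.0 <= bci[1]
bci_outside_rope = bci[1] < rope[0] or bci[0] > rope[1]
bci_inside_rope = rope[0] <= bci[0] and bci[1] <= rope[1]

print(f"posterior mean {post_mean:.2f}, sd {posterior.stdev:.2f}")
print(f"P(H0: mu <= 100) = {p_h0:.4f}")
print(f"95% BCI = ({bci[0]:.2f}, {bci[1]:.2f})")
print("null value in BCI:", null_in_bci)
print("BCI outside ROPE:", bci_outside_rope, "| BCI inside ROPE:", bci_inside_rope)
```

One common reading of the ROPE comparison: a BCI entirely outside the ROPE counts against the null value, a BCI entirely inside the ROPE supports practical equivalence, and an overlap leaves the question undecided.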

Difference of means of normal distributions with known common variance

For independent data or paired data

Calculating the posterior for $\mu_1 - \mu_2$, using it to construct a BCI, perform hypothesis testing, etc.
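The sketch below illustrates the difference-of-means calculation under stated assumptions: flat (improper) priors on both means, a known common standard deviation, and made-up data. For the paired case it additionally assumes a known standard deviation for the differences. It uses Python's standard library only; the same steps carry over to R with `mean`, `pnorm`, and `qnorm`.

```python
from statistics import NormalDist, fmean

# Hypothetical measurements, invented for this illustration.
group1 = [5.1, 4.8, 5.6, 5.3, 4.9]
group2 = [4.4, 4.7, 4.2, 4.9, 4.3]
sigma = 0.4  # known common standard deviation (assumed)

# Independent samples with flat priors: each posterior is
# Normal(sample mean, sigma^2 / n), and the variances of independent
# posteriors add, giving the posterior for mu1 - mu2.
diff_mean = fmean(group1) - fmean(group2)
diff_sd = sigma * (1 / len(group1) + 1 / len(group2)) ** 0.5
post_indep = NormalDist(diff_mean, diff_sd)

# 95% equal-tailed BCI and the posterior probability of H0: mu1 - mu2 <= 0.
bci = (post_indep.inv_cdf(0.025), post_indep.inv_cdf(0.975))
p_h0 = post_indep.cdf(0.0)

# Paired data: treat the two lists as matched pairs and analyze the
# differences as a single sample.
diffs = [a - b for a, b in zip(group1, group2)]
sigma_d = 0.5  # known standard deviation of the differences (assumed)
post_paired = NormalDist(fmean(diffs), sigma_d / len(diffs) ** 0.5)
bci_paired = (post_paired.inv_cdf(0.025), post_paired.inv_cdf(0.975))

print(f"independent: 95% BCI = ({bci[0]:.3f}, {bci[1]:.3f}), P(H0) = {p_h0:.4f}")
print(f"paired:      95% BCI = ({bci_paired[0]:.3f}, {bci_paired[1]:.3f})")
```

The paired analysis needs its own standard deviation because the spread of the differences depends on how strongly the two measurements in each pair are correlated.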

# Final Project Assignment Outline

**Goal**: The goal of this project is to use Bayesian methods and computation to analyze and derive conclusions about some real data that has meaning to you.

**Part 1: Questions**

Come up with a question that you care about and that you wish to address. This can be a large question, broader than what you would like to solve with the project. It should be an interesting question - one for which a full answer could have an impact.

Then, narrow this to one or several smaller, related questions that your statistical analysis will allow you to answer.

**Part 2: Data**

You may gather the data yourself, or use another data source. You will submit your data. Include in your writeup the source of your data, as much as you know about the method of collection, and any concerns you have for the data in terms of bias, etc. Be clear about what variables were measured and how they were measured.

**Part 3: Exploratory Data Analysis**

Read your data into R, and present numerical and graphical information that summarizes the data. For example, you may wish to report appropriate sample means or standard deviations, or present graphs like histograms or scatterplots. Be very clear about what the information that you include is representing, and carefully label your graphs. The goal here is to get a handle on some basic features of the data set before you delve into a statistical analysis.

**Part 4: Data Analysis**

Perform an appropriate analysis of your data using Bayesian methods. How exactly this looks will vary widely depending on the kind of data you have and what question you would like to answer. It will help to communicate with me early and often about what methods might be appropriate for your data. If appropriate, you may include related frequentist methods in your analysis.

The methods you use need to go beyond the methods we have used in class. For example, it would be insufficient to simply estimate a proportion by using a beta prior with a binomial likelihood as you did with homework 12 - although such a method could be one piece of your analysis. As another example, performing a simple linear regression as we will explore in class will not be sufficient - although expanding on the ideas of simple linear regression to consider a problem of multiple linear regression could work very well.

Other methods you could explore from a Bayesian perspective include: (multiple) linear regression (including ridge regression and/or the lasso), logistic regression, Poisson processes, classification methods, time series, Gibbs sampling (or other MCMC methods), Markov chains, or natural language processing.

Clearly explain your method. In particular, be sure to justify any prior distributions that you use, with the goal of making them acceptable to a skeptical audience.

**Part 5: Results**

Present the results of your Bayesian analysis clearly, and interpret your analysis in the context of the question that you wished to answer.

**The Output**: You will submit a written report that includes all of the elements above and appropriate graphics. You will also summarize your methods and results in a 10-minute presentation to the class.