Consider a coin toss.
We “know” that the probability of getting a heads (and tails) is $\frac{1}{2}$. We know that coin tosses are independent of each other.
In the language of probability, a coin toss is a Bernoulli random variable with parameter $\frac{1}{2}$ of getting heads (or tails).
The probability of a heads (or tails) is very simple. Let $X = 1$ be the outcome for heads and
$X = 0$ be the outcome for tails. Then the coin toss is a random variable $X$
with probability (mass) function
$P(X = 1) = P(X = 0) = \frac{1}{2}$.
Now suppose the probability of getting a heads or tails is no longer symmetric (or fair), i.e. we have $P(X = 1) = p \neq \frac{1}{2}$. The probability mass function is now
$P(X = x) = p^x (1 - p)^{1 - x}$, for $x \in \{0, 1\}$.
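As a quick concrete check of this mass function, here is a minimal simulation sketch (the value of $p$, the number of tosses, and the seed are arbitrary choices for the example): the relative frequency of heads over many independent tosses settles near $p$.

import random

def simulate_tosses(p, n, seed=0):
    # n independent Bernoulli(p) tosses; 1 records a heads, 0 a tails
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]

tosses = simulate_tosses(p=0.5, n=10_000)
# Relative frequency of heads: close to 0.5 for the fair coin, and near p in general
print(sum(tosses) / len(tosses))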
It seems that we are finished.
So far we have seen everything from a frequentist (statistician’s) view. A Bayesian statistician looks at this very differently: it is the difference between someone who views probability as objective and someone who views probability as subjective.
How do we know the probability of getting heads is $\frac{1}{2}$?
Instead of accepting the probability mass function as it is, we attach another probability to it: the probability of the probability of getting a heads, say $f(p)$.
This answers the question: why do we have to assume the probability of getting a heads is $\frac{1}{2}$?
We no longer do. We also no longer assume it is some fixed, known probability $p$. Our probability mass function now becomes
$P(X = x \mid p) = p^x (1 - p)^{1 - x}$, for $x \in \{0, 1\}$.
What value can $p$ take? This is the difference in our thinking. The parameter $p$
is no longer taken as a constant, but is assumed to have a distribution. We say this is the prior (before) distribution.
Then the distribution of $p$ given the observed toss $X$, by definition, is the posterior (after) distribution.
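To make the definition concrete, here is the posterior for a single toss written out with Bayes’ rule, using the $f(p)$ notation above (a standard computation, not specific to any particular prior):

$$
f(p \mid X = x) \;=\; \frac{P(X = x \mid p)\, f(p)}{\displaystyle\int_0^1 P(X = x \mid t)\, f(t)\, dt}
\;=\; \frac{p^x (1 - p)^{1 - x}\, f(p)}{\displaystyle\int_0^1 t^x (1 - t)^{1 - x}\, f(t)\, dt},
\qquad 0 \le p \le 1.
$$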
We present the connection between this inference and our usual (frequentist) inference.
Distribution of $p$
Assume the probability density of the probability of getting a heads is equal to one. This means the probability of heads being equal to one particular value is as likely as the probability of heads being equal to any other value between zero and one.
We are assuming that the probability of getting heads follows the continuous uniform distribution on the unit interval. We have
$f(p) = 1$, for $0 \le p \le 1$.
The mass function is just as before: with this uniform prior, the marginal probability of heads is $P(X = 1) = \int_0^1 p \, f(p) \, dp = \int_0^1 p \, dp = \frac{1}{2}$. With no further inference, we have the same Bernoulli distribution. Challenge: what happens as we change the probability distribution of $p$?
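As one way to explore the challenge, here is a small numerical sketch (the alternative prior, a Beta(2, 5) density, is an arbitrary choice for illustration, not from the post). It integrates $p \cdot f(p)$ over $[0, 1]$, so whatever prior we pick, the marginal probability of heads is simply the prior mean of $p$.

def marginal_heads(prior_density, n_grid=100_000):
    # Midpoint-rule approximation of P(X = 1) = ∫_0^1 p * f(p) dp
    h = 1.0 / n_grid
    return sum((i + 0.5) * h * prior_density((i + 0.5) * h) * h for i in range(n_grid))

# Uniform prior f(p) = 1 on [0, 1]: the marginal is 1/2, the fair coin again.
print(marginal_heads(lambda p: 1.0))

# A hypothetical Beta(2, 5) prior, f(p) = 30 p (1 - p)^4: the marginal becomes 2/7 ≈ 0.286,
# so the coin is no longer marginally fair once the prior favors small values of p.
print(marginal_heads(lambda p: 30.0 * p * (1.0 - p) ** 4))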

Posted by AH