Beta, Dirichlet and the Dirichlet Process
My reference notebook on the beta and Dirichlet distributions and the Dirichlet process
- Introduction
- Beta Distribution
- Dirichlet Distribution
- Dirichlet Process
- Helper Functions
- Plot for the Blog Post
- Sources
- References
!pip install gif
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta
from IPython.display import Image
Image(url='https://upload.wikimedia.org/wikipedia/commons/7/78/PDF_of_the_Beta_distribution.gif?1602925483101')
Introduction
This post is dedicated to the distributions named in the title. They are important in Bayesian statistics because draws from these distributions can themselves be interpreted as distributions. This property makes them useful as priors. As of today (17.10.2020), everything is heavily based on this blog post https://karinknudson.com/dirichletprocesses.html and the author's interview on the learn bayes podcast. In the future I will add the information I find in the papers linked in the sources section.
Beta Distribution
Draws from the beta distribution are values in $(0, 1)$. That means we can interpret each draw from the beta distribution as a "generation" of a Bernoulli random variable whose success probability equals the drawn value. In the Bayesian world it makes sense to use the beta distribution as a prior for Bernoulli/binomial distributed data, because the beta is the conjugate prior for these distributions. In this case the beta distribution even has a nice interpretation. Say someone gives us a coin but doesn't tell us whether it is fair. Let's say we trust this person and think it most likely that they gave us a fair coin, but we are not completely sure. We can describe our uncertainty as a beta distribution with parameters $a=5$ and $b=5$, which is symmetric around $0.5$.
x = np.linspace(0,1, 100)
y = beta.pdf(x, 5, 5)
plt.plot(x, y, label = "Beta(5, 5)")
plt.legend();
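To make the "draw as a distribution" idea concrete, here is a minimal sketch: each draw from Beta(5, 5) is used as the success probability of its own coin, which is then flipped a few times. The seed, the number of coins, and the number of flips are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

# Each draw from Beta(5, 5) is one candidate success probability for a coin.
p_draws = rng.beta(5, 5, size=3)

# Flip each of the three coins 10 times (1 = success, 0 = failure).
# p_draws[:, None] has shape (3, 1) and broadcasts against size=(3, 10).
flips = rng.binomial(n=1, p=p_draws[:, None], size=(3, 10))

for p, row in zip(p_draws, flips):
    print(f"p = {p:.3f}, flips = {row.tolist()}")
```

Every row is a sample from a different Bernoulli distribution, and the beta distribution describes how plausible each of those Bernoulli distributions is.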
Now we flip the coin $n$ times (binomial distribution). One can show that the posterior distribution for the success probability of the coin is proportional to
$x^{(a+k)-1}(1-x)^{(b+n-k)-1}$
which again is a beta distribution, namely $\text{Beta}(a+k,\ b+n-k)$. Here $k$ is the number of successes, so $n-k$ is the number of failures. We can therefore think of $a$ and $b$ as pseudocounts: setting $a^*=a+1$ in the prior has the same effect on the posterior as seeing one more success in the actual flipping of the coin.
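The conjugate update can be checked numerically. The sketch below assumes hypothetical data of $k=14$ successes in $n=20$ flips (these numbers are made up for illustration) and verifies that prior times likelihood differs from the $\text{Beta}(a+k,\ b+n-k)$ density only by a constant factor.

```python
import numpy as np
from scipy.stats import beta

a, b = 5, 5      # prior pseudocounts from the text
n, k = 20, 14    # hypothetical data: 14 successes in 20 flips

# Grid strictly inside (0, 1) to avoid dividing by zero at the endpoints.
x = np.linspace(0.01, 0.99, 99)

# Unnormalized posterior: prior density times the binomial likelihood kernel.
unnormalized = beta.pdf(x, a, b) * x**k * (1 - x)**(n - k)

# Claimed posterior: Beta(a + k, b + n - k).
posterior = beta.pdf(x, a + k, b + n - k)

# If the claim holds, the ratio is the same constant at every grid point.
ratio = unnormalized / posterior
is_constant = np.allclose(ratio, ratio[0])

prior_mean = a / (a + b)
post_mean = (a + k) / (a + b + n)
print(is_constant, prior_mean, post_mean)
```

The posterior mean $(a+k)/(a+b+n)$ sits between the prior mean and the observed frequency $k/n$, which is the pseudocount interpretation in action: the prior behaves like $a$ earlier successes and $b$ earlier failures.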
frames
frames[5]