Consider a case where there's only a single binary outcome: "success" with probability p, or "failure" with probability 1-p. We write

\begin{equation} X \sim Bern(p) \end{equation}

with the probability mass function:

\begin{equation} P(X=k) = \begin{cases} p & \text{if } k=1 \\ 1-p & \text{if } k=0 \end{cases} \end{equation}
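As a quick worked example (the value of p here is made up): for a biased coin with p = 0.7,

\begin{equation} P(X=1) = 0.7, \qquad P(X=0) = 1 - 0.7 = 0.3 \end{equation}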

This case-split form is sadly not differentiable, which is a problem for maximum likelihood parameter learning. Therefore, we write:

\begin{equation} P(X=k) = p^{k} (1-p)^{1-k} \end{equation}
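A minimal sketch of why this form matters, in plain Python (the function names are my own, not from any source): the smooth expression reproduces both branches of the PMF, and its log gives a differentiable log-likelihood whose gradient MLE can use.

```python
import math

def bern_pmf(k: int, p: float) -> float:
    """Smooth form p^k * (1-p)^(1-k); agrees with the piecewise PMF at k in {0, 1}."""
    return p ** k * (1 - p) ** (1 - k)

def log_likelihood_grad(samples, p: float) -> float:
    """d/dp of sum_i log bern_pmf(k_i, p) = sum_i (k_i/p - (1-k_i)/(1-p)).
    This derivative exists because the smooth form is differentiable in p."""
    return sum(k / p - (1 - k) / (1 - p) for k in samples)

# The smooth form reproduces both branches of the case split:
assert math.isclose(bern_pmf(1, 0.3), 0.3)
assert math.isclose(bern_pmf(0, 0.3), 0.7)
```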

This emulates the behavior of the piecewise function at k = 0 and k = 1, and we don't really care about its value anywhere else.

Properties of the Bernoulli distribution:
- expected value: p
- variance: p(1-p) (since E[X^2] = E[X] = p, the variance is p - p^2)

Bernoulli as indicator: if you are given a series of events with known probabilities, you can model each one as a Bernoulli random variable and add or subtract them; for example, by linearity of expectation, the expected number of events that occur is the sum of their individual probabilities.

MLE for Bernoulli:

\begin{equation} p_{MLE} = \frac{m}{n} \end{equation}

where m is the number of successes (events that occurred) out of n trials.
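The MLE formula follows from maximizing the log-likelihood of n independent samples with m successes; a sketch of the derivation:

\begin{equation} \log L(p) = m \log p + (n-m) \log (1-p), \qquad \frac{d}{dp} \log L(p) = \frac{m}{p} - \frac{n-m}{1-p} = 0 \implies p_{MLE} = \frac{m}{n} \end{equation}

And a minimal numeric sketch in Python (the seed, sample size, and true_p are arbitrary choices of mine):

```python
import random

def bernoulli_mle(samples) -> float:
    """p_MLE = m / n: number of successes over number of trials."""
    return sum(samples) / len(samples)

random.seed(0)
true_p = 0.3  # hypothetical ground-truth parameter
samples = [1 if random.random() < true_p else 0 for _ in range(10_000)]
print(bernoulli_mle(samples))  # should land close to 0.3
```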
