The probability of an event is the proportion of times the event occurs in many repeated trials. It is also “our belief that an event E occurs”.

“The probability of an outcome is a number between 0 and 1 which indicates how likely the outcome is to occur relative to other outcomes.”

Frequentist Definition of Probability

That is, it is a number between 0 and 1, whereby:

P(E) = \lim_{n \to \infty} \frac{n(E)}{n}
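The frequentist ratio n(E)/n can be watched settling down in a short simulation. This is only a sketch: the fair die, the event "the roll is even", and the helper names `relative_frequency`, `roll_die`, and `is_even` are all assumptions made for illustration.

```python
import random


def relative_frequency(event, trial, n, seed=0):
    """Estimate P(E) as n(E) / n over n simulated trials."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    hits = sum(1 for _ in range(n) if event(trial(rng)))
    return hits / n


def roll_die(rng):
    """Hypothetical experiment: one roll of a fair six-sided die."""
    return rng.randint(1, 6)


def is_even(outcome):
    """Hypothetical event E: the roll is even (true probability 1/2)."""
    return outcome % 2 == 0


if __name__ == "__main__":
    # The estimate should drift toward 0.5 as n grows.
    for n in (10, 1_000, 100_000):
        print(n, relative_frequency(is_even, roll_die, n))
```

With small n the ratio can be far from 1/2; with large n it should sit close to it.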
“Frequentist definition of probability”: the probability of E is the ratio of the number of times E occurs, n(E), to the number of trials n, in the limit of many trials. This ratio converges because of the law of large numbers.

uncertainty and probability

Say you are training some kind of model. When it outputs 0.8 for “motorcycle”, it is not that there is an 80\% chance that a motorcycle is there; it is that the model is 80\% confident that there is a motorcycle. Probability can represent not only the world, but also our understanding of the world.

axiom of probability

0 \leq P(E) \leq 1
P(S) = 1, where S is the sample space
if E and F are mutually exclusive, P(E) + P(F) = P(E \cup F)

This last axiom can be chained across any number of pairwise mutually exclusive events.

This results in three corollaries:

P(E^{C}) = 1 - P(E)

Proof: We know that E^{C} and E are mutually exclusive.
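Now, recall that something either happening or not happening has probability 1, that is, P(E \cup E^{C}) = P(S) = 1. So we have:

P(E) + P(E^{C}) = P(E \cup E^{C}) = 1

which rearranges to P(E^{C}) = 1 - P(E).

P(E \cup F) = P(E) + P(F) - P(E \cap F)

if E \subset F, P(E) \leq P(F)

conditional probability

“What is the new belief that something E happened, conditioned upon the fact that we know that F already happened?” Written as: P(E|F). Furthermore, we have:

P(X,Y) = P(X|Y)P(Y)

or equivalently, P(X|Y) = P(X,Y)/P(Y) whenever P(Y) > 0.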
In this case, we call Y the “evidence”. This allows us to find “what is the chance of x given y”. We can continue this to develop the probability chain rule:

P(A,B,C) = P(A|B,C)P(B,C)

and so:

P(A,B,C) = P(A|B,C)P(B|C)P(C)

and so on. If you are performing the chain rule on something that is already conditioned:

P(X,Y|A)

you can break it up just by remembering that A needs to be preserved as a condition, so:

P(X,Y|A) = P(X|Y,A)P(Y|A)
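As a sanity check, both the plain chain rule P(x,y,a) = P(x|y,a)P(y|a)P(a) and its conditioned form P(x,y|a) = P(x|y,a)P(y|a) can be verified on a small joint table. The sketch below assumes a made-up distribution over three binary variables; the weights and the helpers `prob`, `cond`, and `chain_rule_holds` are hypothetical, and exact fractions are used so equality can be tested without rounding.

```python
from fractions import Fraction
from itertools import product

NAMES = ("x", "y", "a")

# Hypothetical joint distribution P(x, y, a); the weights are arbitrary
# positive numbers, normalized so the table sums to 1.
weights = [1, 2, 3, 4, 5, 6, 7, 8]
joint = {
    outcome: Fraction(w, sum(weights))
    for outcome, w in zip(product((0, 1), repeat=3), weights)
}


def prob(**fixed):
    """Marginal probability that the named variables take the given values."""
    return sum(
        p
        for outcome, p in joint.items()
        if all(outcome[NAMES.index(name)] == value for name, value in fixed.items())
    )


def cond(target, given):
    """P(target | given), each passed as a dict of variable name -> value."""
    return prob(**target, **given) / prob(**given)


def chain_rule_holds():
    """Check both forms of the chain rule for every assignment of (x, y, a)."""
    for x, y, a in product((0, 1), repeat=3):
        p_x_given_ya = cond({"x": x}, {"y": y, "a": a})
        p_y_given_a = cond({"y": y}, {"a": a})
        # Plain chain rule: P(x, y, a) = P(x | y, a) P(y | a) P(a)
        if prob(x=x, y=y, a=a) != p_x_given_ya * p_y_given_a * prob(a=a):
            return False
        # Conditioned chain rule: P(x, y | a) = P(x | y, a) P(y | a)
        if cond({"x": x, "y": y}, {"a": a}) != p_x_given_ya * p_y_given_a:
            return False
    return True


if __name__ == "__main__":
    print(chain_rule_holds())
```

Note that the condition a appears on the right of every bar in the conditioned form, which is the "A needs to be preserved" point in code.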
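Now:

\sum_{x} P(x|y) = 1

because this is still a probability distribution over x.

law of total probability

Say you have two variables x, y. “What is the probability of x?”

P(x) = \sum_{y} P(x,y)

a.k.a.:

P(x) = P(x|y_1)P(y_1) + \dots + P(x|y_n)P(y_n)

by applying the conditional probability formula to each term. This is because:

P(x,y_i) = P(x|y_i)P(y_i)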
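To make the law of total probability concrete, the sketch below computes P(x) two ways: as the sum of P(x|y)P(y) over y, and by marginalizing the joint table P(x,y). The scenario (two machines producing parts, some defective) and all of its numbers are invented for illustration, as are the names `p_y`, `p_x_given_y`, `total_probability`, and `marginal`.

```python
from fractions import Fraction

# Hypothetical setup: y is which machine made a part, x is the part's status.
p_y = {"machine_a": Fraction(3, 5), "machine_b": Fraction(2, 5)}
p_x_given_y = {
    "machine_a": {"defective": Fraction(1, 100), "ok": Fraction(99, 100)},
    "machine_b": {"defective": Fraction(5, 100), "ok": Fraction(95, 100)},
}


def total_probability(x):
    """P(x) = sum over y of P(x | y) P(y)."""
    return sum(p_x_given_y[y][x] * p_y[y] for y in p_y)


# Joint table built term by term from P(x, y) = P(x | y) P(y).
joint = {
    (x, y): p_x_given_y[y][x] * p_y[y]
    for y in p_y
    for x in p_x_given_y[y]
}


def marginal(x):
    """P(x) = sum over y of P(x, y)."""
    return sum(p for (xi, _), p in joint.items() if xi == x)


if __name__ == "__main__":
    print(total_probability("defective"), marginal("defective"))
    # Each conditional distribution P(. | y) still sums to 1 over x.
    print(all(sum(dist.values()) == 1 for dist in p_x_given_y.values()))
```

The two functions agree because each term P(x|y)P(y) is exactly the joint entry P(x,y).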
If it is not conditional, it holds too:
Bayes rule

See: Bayes Theorem

independence

If X and Y are independent (written as X \perp Y), we know that P(x,y) = P(x)P(y) for all x, y. Formally:

P(A|B) = P(A)

if A and B are independent (and P(B) > 0). That is, P(AB) = P(A) \cdot P(B). You can check either of these statements (the latter is usually easier).

Independence is bidirectional: if A is independent of B, then B is independent of A. To show this, invoke Bayes Theorem.

This is generalized:

P(x_1, \dots, x_n) = P(x_1)P(x_2) \cdots P(x_n)

and, when this holds for all values of x_1, \dots, x_n, it tells us that the x_{j} in any subset are also independent of each other.
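The product test P(x,y) = P(x)P(y) is easy to run on a joint table. The following sketch assumes two toy distributions: `two_coins` (two separate fair flips) and `copied_coin` (the second value is a copy of the first). Both tables and the helpers `marginals` and `is_independent` are hypothetical names chosen for this example.

```python
from fractions import Fraction
from itertools import product


def marginals(joint):
    """Return the marginal tables P(x) and P(y) of a joint table P(x, y)."""
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0) + p
        p_y[y] = p_y.get(y, 0) + p
    return p_x, p_y


def is_independent(joint):
    """True if P(x, y) == P(x) P(y) for every pair of values (x, y)."""
    p_x, p_y = marginals(joint)
    return all(
        joint.get((x, y), 0) == p_x[x] * p_y[y]  # missing entries count as 0
        for x, y in product(p_x, p_y)
    )


# Two separate fair coin flips: every pair has probability 1/4.
two_coins = {(x, y): Fraction(1, 4) for x, y in product("HT", repeat=2)}

# One fair flip copied into both slots: only matching pairs can occur.
copied_coin = {("H", "H"): Fraction(1, 2), ("T", "T"): Fraction(1, 2)}

if __name__ == "__main__":
    print(is_independent(two_coins))
    print(is_independent(copied_coin))
```

The check must pass for all value pairs; a single failing pair, such as ("H", "T") in `copied_coin`, is enough to rule out independence.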