Entropy

Sun 15 March 2020


Entropy tells us the theoretical minimum average encoding size for events that follow a particular probability distribution.

It helps us measure uncertainty.

The number of bits needed to represent $N$ equally likely values is $\log_2(N)$.

Entropy is the sum, over all values, of each value's probability times the number of bits required to represent it. For example:

(0.5 x 1 bit)+(0.25 x 2 bits)+(0.125 x 3 bits)+(0.125 x 3 bits)=1.75 bits
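This arithmetic can be reproduced with a short sketch. The distribution below, with probabilities 0.5, 0.25, 0.125, 0.125, is the one from the example above, and base-2 logarithms are used so the result is in bits:

```python
import math

# Probabilities of the four values and the bits needed to encode each one
probs = [0.5, 0.25, 0.125, 0.125]
bits = [-math.log2(p) for p in probs]  # 1, 2, 3 and 3 bits

# Weighted average encoding size: sum of probability * bits for that value
avg_bits = sum(p * b for p, b in zip(probs, bits))
print(avg_bits)  # 1.75
```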

Entropy gives us the average encoding size of events drawn from a distribution $p$, encoded using $p$ itself.

Continuous

$$ \begin{aligned} H(p) &= - \int\limits_{X} p(x) \log p(x) dx \end{aligned} $$

Discrete

$$ \begin{aligned} H(p) &= - \sum\limits_{x \in X} p(x) \log p(x) \end{aligned} $$
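A minimal sketch of the discrete formula, assuming a plain Python list of probabilities and base-2 logs so the result is in bits (the function name `entropy` is just illustrative):

```python
import math

def entropy(p):
    """Discrete entropy H(p) = -sum_x p(x) * log2(p(x)), in bits."""
    return -sum(px * math.log2(px) for px in p if px > 0)

print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75, matching the example above
```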

Cross entropy gives us the average encoding size when events drawn from distribution $p$ are encoded using a different distribution $q$.

Continuous

$$ \begin{aligned} H(p, q) &= - \int\limits_{X} p(x) \log q(x) dx \end{aligned} $$

Discrete

$$ \begin{aligned} H(p, q) &= - \sum\limits_{x \in X} p(x) \log q(x) \end{aligned} $$
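A matching sketch for the discrete cross entropy, under the same assumptions (plain lists, base-2 logs, illustrative names; the uniform $q$ below is just an example):

```python
import math

def cross_entropy(p, q):
    """Cross entropy H(p, q) = -sum_x p(x) * log2(q(x)), in bits."""
    return -sum(px * math.log2(qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.25, 0.125, 0.125]   # true distribution
q = [0.25, 0.25, 0.25, 0.25]    # encoding distribution
print(cross_entropy(p, q))  # 2.0 bits, larger than H(p) = 1.75
```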

Kullback–Leibler divergence tells us how much one probability distribution differs from another. It is the difference between the cross entropy and the entropy: $D_{\text{KL}}(p \| q) = H(p, q) - H(p)$.

Continuous

$$ \begin{aligned} D_{\text{KL}} (p || q) = - \int\limits_{X} p(x) \log \frac{q(x)}{p(x)} dx \end{aligned} $$

Discrete

$$ D_{\text{KL}} (p || q) = - \sum\limits_{x \in X} p(x) \log \frac{q(x)}{p(x)} $$
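A final sketch for the discrete KL divergence, under the same assumptions as the earlier blocks, which also checks the relation $D_{\text{KL}}(p \| q) = H(p, q) - H(p)$ on the example distributions:

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) = -sum_x p(x) * log2(q(x) / p(x)), in bits."""
    return -sum(px * math.log2(qx / px) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.25, 0.125, 0.125]
q = [0.25, 0.25, 0.25, 0.25]
print(kl_divergence(p, q))  # 0.25 bits = H(p, q) - H(p) = 2.0 - 1.75
```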

Resources

Formulas taken from: https://leimao.github.io/blog/Cross-Entropy-KL-Divergence-MLE/