Entropy tells us the theoretical minimum average encoding size for events that follow a particular probability distribution.
Entropy also serves as a measure of uncertainty: the more uncertain the outcome, the higher the entropy.
Representing one of \(N\) equally likely values requires \( \log_2(N) \) bits.
Entropy is the sum, over all values, of the probability of a value times the number of bits required to encode it; an event with probability \(p_i\) needs \( \log_2(1/p_i) \) bits, so \( H(p) = -\sum_i p_i \log_2 p_i \).
For example, with probabilities 0.5, 0.25, 0.125, and 0.125:
(0.5 x 1 bit) + (0.25 x 2 bits) + (0.125 x 3 bits) + (0.125 x 3 bits) = 1.75 bits
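A minimal sketch of that calculation (the `entropy` function name and the use of Python's `math.log2` are my own choices, not from the source):

```python
import math

def entropy(p):
    """Entropy in bits: average encoding size under the code optimal for p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# The example distribution above: four values with probabilities 0.5, 0.25, 0.125, 0.125.
p = [0.5, 0.25, 0.125, 0.125]
print(entropy(p))  # 1.75 bits
```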
Entropy gives us the average encoding size when events drawn from a distribution \(p\) are encoded using the code optimal for \(p\).
Cross Entropy gives us the average encoding size when events drawn from \(p\) are encoded using the code optimal for a different distribution \(q\): \( H(p, q) = -\sum_i p_i \log_2 q_i \).
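A sketch of the same idea in code; the distribution \(q\) below is a made-up uniform distribution for illustration, not from the source:

```python
import math

def cross_entropy(p, q):
    """Average bits needed when events from p are encoded with the code optimal for q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical q that mis-models the p from the entropy example.
p = [0.5, 0.25, 0.125, 0.125]
q = [0.25, 0.25, 0.25, 0.25]
print(cross_entropy(p, q))  # 2.0 bits, larger than H(p) = 1.75 bits
```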
Kullback–Leibler Divergence tells us how one probability distribution differs from another; it is the number of extra bits incurred by encoding events from \(p\) with the code optimal for \(q\): \( D_{KL}(p \| q) = H(p, q) - H(p) = \sum_i p_i \log_2 \frac{p_i}{q_i} \).
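A sketch of the divergence using the same hypothetical \(p\) and \(q\) as in the cross-entropy example:

```python
import math

def kl_divergence(p, q):
    """Extra bits per event from encoding samples of p with the code optimal for q."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Same hypothetical p and q as in the cross-entropy sketch.
p = [0.5, 0.25, 0.125, 0.125]
q = [0.25, 0.25, 0.25, 0.25]
print(kl_divergence(p, q))  # 0.25 bits = H(p, q) - H(p) = 2.0 - 1.75
```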
Formulas taken from: https://leimao.github.io/blog/Cross-Entropy-KL-Divergence-MLE/