Abstract
In his paper on the mathematical theory of communication, Shannon defines entropy as a measure of uncertainty, that is, of the descriptive complexity of a random variable. This notion corresponds to entropy in statistical physics, which lies at the heart of thermodynamics and depends on the number of possible configurations of the physical system. In contrast to Fisher information, Shannon's approach is non-parametric. Mathematically, Shannon's entropy is the expectation of the negative log-probability.
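In standard notation (the symbols $H$, $p$, and $\mathcal{X}$ are assumed here, not fixed by the abstract), for a discrete random variable $X$ with distribution $p$ over an alphabet $\mathcal{X}$, this reads
\[
H(X) \;=\; \mathbb{E}\big[-\log p(X)\big] \;=\; -\sum_{x \in \mathcal{X}} p(x)\,\log p(x).
\]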
We establish the properties of the joint and conditional entropy of pairs of random variables, as well as their mutual information and relative entropy. We show that a large number of independent, identically distributed random variables is concentrated with high probability in a typical set whose log-cardinality is proportional to the entropy of this probability distribution. This result shows that the expected length of an optimal code is bounded below by the entropy, and that this bound can be approached arbitrarily closely by coding the typical set.
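In the usual formulation (notation assumed, not taken from the abstract), for $X_1,\dots,X_n$ i.i.d. with distribution $p$ of entropy $H$, the typical set $A_\epsilon^{(n)}$ satisfies, for $n$ large enough,
\[
\Pr\!\big(A_\epsilon^{(n)}\big) \ge 1-\epsilon
\qquad\text{and}\qquad
\big|A_\epsilon^{(n)}\big| \le 2^{\,n(H+\epsilon)},
\]
so that indexing typical sequences requires roughly $nH$ bits, which approaches the entropy bound per symbol.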