Abstract
Information geometry studies the geometric structures, distances, and invariance notions of a family of probability distributions called a statistical model. A parametric statistical model can be treated as a Riemannian manifold by equipping it with the Fisher metric tensor, which induces the Rao distance.
This Riemannian structure on the Fisher-Rao manifold was later generalized to a dual structure based on pairs of affine connections coupled to the Fisher metric. This dual structure makes it possible to explain the close interplay between estimators for statistical inference (maximum likelihood) and the construction of parametric statistical models (exponential families obtained by the maximum entropy principle), and it brings into play a generalized Pythagorean theorem. Applications of information geometry to statistics and neural network learning will be illustrated.
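For concreteness, here is a brief sketch of the central object mentioned above. Under standard regularity assumptions on a parametric model $\{p(x;\theta)\}$, the Fisher metric tensor is the matrix field

$$ g_{ij}(\theta) \;=\; \mathbb{E}_{p_\theta}\!\left[ \frac{\partial \log p(x;\theta)}{\partial \theta^i}\, \frac{\partial \log p(x;\theta)}{\partial \theta^j} \right], $$

and the Rao distance between two distributions $p_{\theta_0}$ and $p_{\theta_1}$ is the Riemannian geodesic distance between $\theta_0$ and $\theta_1$ induced by this metric.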