Information geometry and entropy
Information geometry (here and here) is a program which aims to apply the techniques of differential geometry to statistics. So we find statistical manifolds, such as the 2-dimensional manifold of normal distributions (mu, sigma), for which there are notions of Riemannian metrics, connections, curvature, etc. (see this for an analysis of standard cases: Bernoulli, Poisson, Gaussian). What I have yet to find is a very clear overview of the program as a whole. There are plenty of highly sophisticated presentations, and there are also some very gentle introductions, such as these videoed talks which towards the end of the second lecture touch on general Bregman divergences. But what I'd love is for someone to give a Baez-style sketch of the big picture.
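To pin down the simplest of these notions: the Riemannian metric usually meant here is the Fisher information metric, and the divergences appearing later are Bregman divergences generated by a convex function. A quick sketch, using what I take to be the textbook formulas rather than anything from the linked talks:

\[
g_{(\mu,\sigma)} = \frac{1}{\sigma^{2}}\, d\mu^{2} + \frac{2}{\sigma^{2}}\, d\sigma^{2}
\qquad \text{(Fisher metric on the normal family)},
\]
\[
D_{F}(p, q) = F(p) - F(q) - \langle \nabla F(q),\, p - q \rangle
\qquad \text{(Bregman divergence of a convex } F\text{)}.
\]

So the normal family is, up to rescaling, the hyperbolic plane, and taking \(F\) to be negative Shannon entropy turns \(D_{F}\) into the Kullback-Leibler divergence.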
Overfitting of data occurs when a set of models is so rich that members can be found which easily accommodate the data. These papers (here, here and here) by Vijay Balasubramanian (University of Pennsylvania) discuss in geometric terms what it is about a statistical manifold of models that corresponds to its capacity to represent data distributions.
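As far as I can make out, the key quantity in these papers is the Riemannian volume of the model manifold in the Fisher metric, which roughly counts how many distinguishable distributions the family contains; the 'razor' then penalises a model by something like

\[
\frac{d}{2}\,\ln\frac{N}{2\pi} \;+\; \ln \int_{\Theta} \sqrt{\det g(\theta)}\; d\theta ,
\]

a geometric refinement of the BIC penalty, with \(d\) the dimension of the manifold and \(N\) the number of data points. I may be garbling the subleading terms, but the moral seems to be that capacity is volume.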
Entropy is in the air, since the different 'alpha-connections' on the statistical manifolds correspond to different divergences. Now, entropy has a slippery character. Just as you think you're coming to terms with it, there's something new to take into consideration. An important component of the big picture comes from dynamical systems theory, where one finds the notion of the entropy of a map. There's even an entropy for braids. (See the conjecture that braids of maximum entropy have either 3 or 4 strands.)
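For the record, the divergences in question form Amari's one-parameter family, and the dynamical notion is topological entropy. If I have the conventions right:

\[
D_{\alpha}(p \,\|\, q) = \frac{4}{1-\alpha^{2}} \left( 1 - \int p(x)^{\frac{1-\alpha}{2}}\, q(x)^{\frac{1+\alpha}{2}}\, dx \right),
\]

which recovers the two Kullback-Leibler divergences in the limits \(\alpha \to \pm 1\), while for a map \(f\) the topological entropy is

\[
h_{\mathrm{top}}(f) = \lim_{\epsilon \to 0}\, \limsup_{n \to \infty}\, \frac{1}{n} \log N(f, n, \epsilon),
\]

with \(N(f, n, \epsilon)\) the maximal number of \((n,\epsilon)\)-separated orbit segments.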
No doubt we should imagine how, as data comes in, we move about our statistical manifold. My guess is that the entropy of the mapping corresponding to this updating has something to do with the notion of entropy/divergence at play in the first paragraph. And perhaps, in view of his expertise in dynamical systems, I ought to try to understand Smale's position.
This post marks my great state of confusion. If I could penetrate the fog, I'd have a good grip on what statistics has to tell us about learning.