Note that some papers could fall under different headings
H. Snoussi, Geometry of Prior Selection link
(mixture of Gaussians, blind source separation)
Several papers by Zhu/Rohwer link, especially 1995-1997
Huaiyu Zhu and Richard Rohwer, Information Geometry, Bayesian Inference, Ideal Estimates, and Error Decomposition link
Discussion about information geometry
http://math.ucr.edu/home/baez/corfield/2006/07/information-geometry-and-machine.html
Sumio Watanabe, link
(Layered models, such as neural nets, Bayesian networks)
Y. Altun and A. Smola, Unifying Divergence Minimization and Statistical Inference via Convex Duality link
(ML, MAP, GP classification and regression, graphical models, conditional random fields, sparse estimation methods are all mentioned)
M. Dudik and R. E. Schapire, Maximum entropy distribution estimation with generalized regularization link
For PAC-Bayesian bounds
Arindam Banerjee, On Bayesian Bounds link
Alex Smola, Summer School Taiwan 2006, lecture 2 link. (and conditional random fields)
J.-F. Cardoso, Dependence, correlation and Gaussianity in independent component analysis, Journal of Machine Learning Research, Vol. 4, pages 1177-1203, Dec. 2003. link
A. Paiva et al., Kernel Principal Components Are Maximum Entropy Projections link
M. Collins et al., A Generalization of Principal Component Analysis to the Exponential Family link
I. Csiszar and G. Tusnady, Information geometry and alternating minimization procedures. Statistics and Decisions, Supplement Issue, 1: 205-237, 1984.
A. Gunawardana and W. Byrne. Convergence theorems for generalized alternating minimization procedures. Journal of Machine Learning Research, (6):2049-2073, December 2005. link
T. Jebara et al., Maximum Entropy Discrimination link (and graphical models)
Koji Tsuda, Information Geometry of Diffusion Kernels link and link
Justin Dauwels, On information-geometric aspects of graphical models and kernel machines link
Noboru Murata et al., Information Geometry of U-Boost and Bregman Divergence link
Shinto Eguchi, Information Geometry and Statistical Pattern Recognition link
Guy Lebanon, An Extended Cencov-Campbell Characterization of Conditional Information Geometry link (AdaBoost and logistic regression)
see also his thesis
Guy Lebanon, Riemannian Geometry and Statistical Machine Learning link
Ikeda, S., Tanaka, T., and Amari, S. (2004). Stochastic reasoning, free energy, and information geometry. Neural Computation, 16. link
Shinto Eguchi and John Copas, Recent Developments in Discriminant Analysis from an Information Geometric Point of View link
E. Lauría, "Learning the structure of a Bayesian network," in Maximum Entropy and Bayesian Methods, AIP Conf. Proc., 2005, link
A Simple Approach for Finding Globally Optimal Bayesian Network Structure link (not MaxEnt)
C.-H. Yeang, An information geometric perspective on active learning link
Shun-ichi Amari, Hierarchy of Probability distributions link
Franz Josef Och, Hermann Ney, Discriminative Training and Maximum Entropy Models for Statistical Machine Translation link
V. Balasubramanian, MDL, Bayesian Inference and the Geometry of the Space of Probability Distributions, in Advances in Minimum Description Length: Theory and Applications, P.J. Grunwald et al. eds, pp. 81-99. MIT Press, 2005. link
C. Rodriguez, The ABC of Model Selection: AIC, BIC, and the New CIC. link
Miscellaneous
Topsoe manuscripts link
Shalizi link
See also Fuchun Peng's Maximum Entropy Models list, link