Abstract
This chapter discusses the role of information theory in the analysis of neural networks, using ideas from differential geometry. Information theory is useful for understanding neural preprocessing, for example predictive coding in the retina and principal component analysis and decorrelation processing in early visual cortex. The chapter introduces some concepts from information theory, focusing in particular on the entropy of a random variable and the mutual information between two random variables. One of the major uses of information theory has been the interpretation and guidance of unsupervised neural networks: networks that are not provided with a teacher or target output to emulate. The chapter describes how information theory relates to the more familiar supervised learning schemes, and discusses the use of error back-propagation (BackProp) to minimize mean squared error (MSE) in a multi-layer perceptron (MLP). Other distortion measures can be used in place of MSE; in particular, the chapter focuses on the information-theoretic cross-entropy distortion.
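For reference, the quantities named above take their standard textbook forms; the symbols p, t_i, and y_i below are illustrative notation, not taken from the chapter itself.

% Entropy of a discrete random variable X with distribution p(x):
\[
  H(X) = -\sum_{x} p(x)\,\log p(x)
\]
% Mutual information between two random variables X and Y:
\[
  I(X;Y) = \sum_{x,y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}
\]
% Cross-entropy distortion for binary targets t_i and network outputs y_i,
% often minimized in place of MSE when training an MLP with BackProp:
\[
  E_{\mathrm{CE}} = -\sum_{i}\bigl[\,t_i \log y_i + (1-t_i)\log(1-y_i)\,\bigr]
\]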