The Minimum Description Length Principle and Reasoning under Uncertainty Peter Grunwald Abstract: %Nr: DS-1998-03 %Author: Peter Grunwald %Title: The Minimum Description Length Principle and Reasoning under Uncertainty Most research reported in the thesis concerns the so-called Minimum Description Length (MDL) Principle. The MDL Principle is a general method for inductive inference. The fundamental idea behind the MDL Principle is that any regularity in a given set of data can be used to compress the data, i.e. to describe it using fewer symbols than needed to describe the data literally. The more regularities there are in the data, the more we can compress it. This leads to the view (which is just a version of Occam's famous razor) that the more we can compress a given set of data, the more we can say we have learned about the data. The thesis starts with an introduction to MDL intended for a general audience. It continues with some new theoretical research concerning MDL and related statistical methods, like Bayesian Statistics and the Maximum Entropy Principle. This is followed by a practical part, in which results of applying MDL on real-world data sets are reported. It ends with a part on non-monotonic reasoning and Artificial Intelligence's frame problem, not directly related to MDL. Question: Can we use simplistic models? The main question investigated in the thesis is the following: Under what circumstances is it safe or even advisable to use overly simplistic models for the data at hand? The result of statistical analysis of a given set of data is nearly always a model for this data that is really a gross simplification of the process that actually underlies these data. Nevertheless, such overly simple models are often succesfully applied in practice. How is this possible? And can we identify situations in which this is possible and situations in which it is not? These are the main questions we investigate. Though our research is in the framework of the MDL Principle, the results are relevant for other statistical methods also. Answer: 'Safe' and 'Risky' Statistics Briefly, we reach the following conclusion: overly simple models can be applied to make predictions and decisions in two different ways: a `safe' one and a `risky' one. If a model is used in the `safe' way, then it will be `reliable' in the following sense: the model gives a correct impression of the prediction error that will be made if the model is used to predict future data, even in the case where the model is a gross simplification of the process that truely underlies the given data. If the model is used in the `risky' way, there is no such guarantee (nevertheless, such usage of a model often makes sense). We state and prove several theorems which show that incorrect models can be `reliable' in the sense indicated above under many circumstances. The concept of 'reliability' has several applications. We mention three: * It can be used to extend the scope of the MDL Principle to arbitrary model classes and error measures. * It allows to distinguish between 'safe' and 'risky' versions of the method of Maximum Entropy (which is closely related to MDL), thereby explaining both successes and failures of this controversial principle. * It provides new insights in the ongoing debate between 'frequentist' and 'subjectivist' (Bayesian) statisticians. References 1. http://robotics.stanford.edu/~grunwald/thesis/longabstract.html 2. http://robotics.stanford.edu/~grunwald/thesispage.html 3. http://robotics.stanford.edu/~grunwald/index.html