The Minimum Description Length Principle and Reasoning under Uncertainty
Peter Grunwald

Abstract:
%Nr: DS-1998-03
%Author: Peter Grunwald
%Title: The Minimum Description Length Principle and Reasoning under Uncertainty
   
   Most research reported in the thesis concerns the so-called Minimum
   Description Length (MDL) Principle. The MDL Principle is a general
   method for inductive inference. The fundamental idea behind the MDL
   Principle is that any regularity in a given set of data can be used to
   compress the data, i.e. to describe it using fewer symbols than needed
   to describe the data literally. The more regularities there are in the
   data, the more we can compress it. This leads to the view (which is
   just a version of Occam's famous razor) that the more we can compress
   a given set of data, the more we can say we have learned about the
   data.
   The thesis starts with an introduction to MDL intended for a general
   audience. It continues with some new theoretical research concerning
   MDL and related statistical methods, like Bayesian Statistics and the
   Maximum Entropy Principle. This is followed by a practical part, in
   which results of applying MDL on real-world data sets are reported. It
   ends with a part on non-monotonic reasoning and Artificial
   Intelligence's frame problem, not directly related to MDL.
  
 
    Question: Can we use simplistic models?

   The main question investigated in the thesis is the following: 
   
    Under what circumstances is it safe or even advisable to use overly
                  simplistic models for the data at hand?

   The result of statistical analysis of a given set of data is nearly
   always a model for this data that is really a gross simplification of
   the process that actually underlies these data. Nevertheless, such
   overly simple models are often succesfully applied in practice. How is
   this possible? And can we identify situations in which this is
   possible and situations in which it is not?
   These are the main questions we investigate. Though our research is in
   the framework of the MDL Principle, the results are relevant for other
   statistical methods also.

   
    Answer: 'Safe' and 'Risky' Statistics
    
   Briefly, we reach the following conclusion: overly simple models can
   be applied to make predictions and decisions in two different ways: a
   `safe' one and a `risky' one. If a model is used in the `safe' way,
   then it will be `reliable' in the following sense: the model gives a
   correct impression of the prediction error that will be made if the
   model is used to predict future data, even in the case where the model
   is a gross simplification of the process that truely underlies the
   given data. If the model is used in the `risky' way, there is no such
   guarantee (nevertheless, such usage of a model often makes sense). We
   state and prove several theorems which show that incorrect models can
   be `reliable' in the sense indicated above under many circumstances.
   The concept of 'reliability' has several applications. We mention
   three:

     * It can be used to extend the scope of the MDL Principle to
       arbitrary model classes and error measures.

     * It allows to distinguish between 'safe' and 'risky' versions of
       the method of Maximum Entropy (which is closely related to MDL),
       thereby explaining both successes and failures of this
       controversial principle.

     * It provides new insights in the ongoing debate between
       'frequentist' and 'subjectivist' (Bayesian) statisticians.

       
References

   1. http://robotics.stanford.edu/~grunwald/thesis/longabstract.html

   2. http://robotics.stanford.edu/~grunwald/thesispage.html

   3. http://robotics.stanford.edu/~grunwald/index.html