Incomplete knowledge and statistics

It is often erroneously assumed that incomplete knowledge can always be described by statistics. But this is by no means the case.

If one knows about a number x only that it is in [0,1], one cannot apply statistics, since one knows nothing at all about the distribution (beyond a bound on its support). It is perfectly consistent with this knowledge that in fact always x=0.75, except that one does not know it, or that x oscillates regularly, or.... In this case the ignorance is simply a deterministic lack of information. In particular, it would be a mistake to assume that the distribution is uniform (the ignorance interpretation). Using the noninformative prior of the Bayesian school, which makes precisely this assumption, may therefore lead to seriously flawed conclusions.
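
Here is a minimal sketch (in Python, with made-up numbers) of why the bound alone does not single out the uniform distribution: the hypothetical scenario 'x is always 0.75' is just as consistent with the knowledge x in [0,1], yet predicts quite different statistics.

  import numpy as np

  rng = np.random.default_rng(0)

  uniform_guess = rng.uniform(0.0, 1.0, size=100_000)  # the "ignorance prior"
  actual_values = np.full(100_000, 0.75)                # one consistent reality

  for name, xs in [("uniform assumption", uniform_guess),
                   ("x always 0.75     ", actual_values)]:
      print(name, " mean =", round(xs.mean(), 3), " var =", round(xs.var(), 4))

  # Both scenarios respect 0 <= x <= 1, yet they predict different means
  # (about 0.5 vs. 0.75) and variances (about 1/12 vs. 0); the bound alone
  # cannot decide between them.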

More realistically, in engineering, an uncertainty of 5% in the elastic modulus of steel bars may be the only information available to an architect; but 3/4 of the bars used later in the building may show a deviation of 0.1%, and the remaining quarter one of 3.7%.
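
A small numerical sketch of this example (the population size and variable names are made up; the percentages are the ones quoted above):

  import numpy as np

  n = 1000                                                 # hypothetical number of bars
  deviations = np.concatenate([np.full(3 * n // 4, 0.1),   # 3/4 of the bars: 0.1%
                               np.full(n // 4, 3.7)])      # 1/4 of the bars: 3.7%

  print("max deviation :", deviations.max(), "%  (within the quoted 5% bound)")
  print("mean deviation:", deviations.mean(), "%")
  # The 5% figure is a correct bound, but the realized distribution is
  # nothing like a uniform spread over [0%, 5%].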

In general, all one can deduce from information that takes the form of deterministic bounds on a vector x of variables and/or on expressions in x is bounds on derived quantities y=f(x) that one would like to compute from it. This leads to global optimization problems, in which f(x) is minimized or maximized subject to the known constraints. See http://arnold-neumaier.at/glopt/intro.html
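
A minimal sketch of this recipe, using a general-purpose global optimizer from SciPy; the function f and the box constraints are made up for illustration, and a heuristic optimizer only approximates the bounds (rigorous enclosures require interval or complete global-optimization methods; see the link above):

  import numpy as np
  from scipy.optimize import differential_evolution

  def f(x):
      # derived quantity of interest, y = f(x1, x2)
      return x[0] * np.exp(x[1]) - x[1] ** 2

  bounds = [(0.0, 1.0), (-1.0, 2.0)]   # deterministic bounds on x1, x2

  lo = differential_evolution(f, bounds, seed=0)                  # minimize f
  hi = differential_evolution(lambda x: -f(x), bounds, seed=0)    # maximize f

  print("y lies (approximately) in", (round(lo.fun, 4), round(-hi.fun, 4)))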

The lack of knowledge that statistics can model is of a different kind. It assumes that the _maximal attainable_ knowledge about the system - at the given level of description - is a probability distribution, and that this probability distribution is indeed known. When this characterization applies, the system is called stochastic. A measurement done on a stochastic system provides only indirect information about the system, in that it helps to better estimate the probability distribution. (This interpretation gives the classical density and the quantum density equal ontological status, and makes the interpretation of quantum measurements much more intuitive than the traditional claim that a reading from a macroscopic instrument gives perfect information about a tiny quantum system coupled to it.)

The knowledge of the probability distribution can be replaced by qualitative knowledge of its form (e.g., 'some Gaussian distribution'), together with an incomplete sample from the ensemble of interest; in this case, however, the best statistics can offer are parameter estimation techniques that yield probability distributions compatible with the sample data at some confidence level.
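
A minimal sketch of this parametric situation, with a simulated sample standing in for measured data (the true parameters, sample size, and the 95% level are made up for illustration):

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(2)
  sample = rng.normal(loc=1.3, scale=0.4, size=25)   # 25 measurements, say

  m, s, n = sample.mean(), sample.std(ddof=1), len(sample)
  t = stats.t.ppf(0.975, df=n - 1)                   # two-sided 95% t quantile
  half_width = t * s / np.sqrt(n)

  print(f"estimated mean: {m:.3f} +- {half_width:.3f}  (95% confidence)")
  print(f"estimated std : {s:.3f}")
  # Any Gaussian whose mean lies in this interval is compatible with the
  # sample at the 95% level; the data do not determine the distribution
  # uniquely.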

There are also combinations of both kinds of incomplete information, where one knows that the maximal knowledge about a system should be stochastic, but one lacks complete information about the distribution. This is handled by the field of 'imprecise probability', although there is not yet a generally accepted way of analyzing such situations, and different schools with quite different basic approaches compete. See, e.g., the links in http://class.ee.iastate.edu/berleant/home/ServeInfo/Interval/intprob.html
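
One of the simplest imprecise-probability computations, as a sketch with made-up numbers: if the probability of an event is only known to lie in an interval, an expectation can only be bounded, not computed (p-boxes, credal sets, and the like generalize this idea):

  payoff_if_event, payoff_otherwise = 10.0, -2.0
  p_lo, p_hi = 0.2, 0.4                   # interval-valued probability of the event

  expectations = [p * payoff_if_event + (1 - p) * payoff_otherwise
                  for p in (p_lo, p_hi)]
  print("expected payoff lies in", (min(expectations), max(expectations)))
  # Since the expectation is linear in p, the endpoints of the interval
  # give the exact bounds: here [0.4, 2.8].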

Theoretical physics is always concerned with describing the maximal attainable knowledge about a system (at a given level of description), irrespective of what anyone actually knows about it. In this way, and only in this way, is it possible to get close to the objectivity that science is always striving for.


Arnold Neumaier (Arnold.Neumaier@univie.ac.at)
A theoretical physics FAQ