Briefly: the linear regression model. We suppose we can explain or predict y using a vector of variables x. As in Gauß’ estimation theory, y is supposed to be unobservable, and thus has to be estimated. The assumption that y depends on x is expressed this way: the posterior distribution Prob{ Y | X } is different from the prior distribution Prob{ Y }.

The minimization of variance of the difference between [our estimation of Y given X] and [Y] leads to a unique solution: the conditional expectation.

The linear hypothesis says that the estimated value should be an affine expression of X. Moreover, the affine parameters which minimise the variance of the error are given by:



The above linear model coincides with the optimal conditional expectation model when X,Y are Gaussian.

Michel Grabisch, in Modeling Data by the Choquet Integral
(liberally edited)