1.1. A motivating example: best-subsets regression

Consider observing data y = μ + ε with μ ∈ ℝⁿ and independent errors ε ~ N(0, σ²Iₙ), and producing an estimate of μ. A critical property of any estimator μ̂ is its so-called model complexity: informally, how flexibly it is able to conform to the observed response y. Commonly we set μ = Xβ for some n × p design matrix X.

For the ordinary-least-squares estimator μ̂, the most natural measure of complexity is the number p of fitted parameters, i.e., the degrees of freedom. For more general estimators, the effective degrees of freedom of Efron (1986), defined as σ⁻² ∑ᵢ₌₁ⁿ cov(yᵢ, μ̂ᵢ), has emerged as a popular and convenient measuring stick for comparing the complexity of very different fitting procedures. The name suggests that a method with p degrees of freedom has a complexity comparable to linear regression on p predictor variables, for which the effective degrees of freedom is p.

Now, suppose that instead of fitting a linear model using all p predictors, we fit the best-subsets regression of size k. That is, we find the best linear model that uses only a subset of k predictors, with k < p. What is the effective degrees of freedom, henceforth just degrees of freedom, of this method?

A simple, intuitive, and wrong argument predicts that the degrees of freedom, which depends on μ, is somewhere between k and p. We certainly expect it to be greater than k, since we use the data to select the best model of size k among all the possibilities. However, we have only p free parameters at our disposal, of which p − k must be set to zero, so best-subsets regression with k parameters is still less complex than the saturated model with all p parameters and no constraints.

As convincing as this argument may seem, it is contradicted by a simple simulation with n = 50 and p = 15. Here μ = Xβ, where the entries of X are independent N(0, 1) variates and the coefficients βⱼ are independent N(0, 4) variates, and σ² = 1. Figure 1 shows that the degrees of freedom for both best-subsets regression and another method, forward selection, can exceed the ambient dimension p for values of k < p. The degrees of freedom of best-subsets regression exceeded p for some k < p in 179 of 200 realizations of μ, or about 90% of the time.

[Figure 1: degrees of freedom of best-subsets regression and of forward selection, as a function of subset size k.]

1.2. Degrees of freedom in classical statistics

To understand why our intuition should lead us astray here, we must first review why effective degrees of freedom is defined as it is, and what classical concepts the definition is meant to generalize. The original meaning of degrees of freedom, the number of dimensions in which a random vector may vary, plays a central role in classical statistics. In ordinary linear regression with full-rank n × p predictor matrix X, the fitted response μ̂ = Xβ̂ is the orthogonal projection of y onto the p-dimensional column space of X, and the residual r = y − μ̂ is the projection onto its orthogonal complement, whose dimension is n − p. We say this linear model has p model degrees of freedom, with n − p residual degrees of freedom.
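To make the definition concrete, here is a minimal Monte Carlo sketch (in Python with NumPy; the seed, replication count, and all names are our choices, not the paper's) that estimates Efron's effective degrees of freedom σ⁻² ∑ᵢ cov(yᵢ, μ̂ᵢ) by drawing fresh noise around a fixed μ. For ordinary least squares the estimate should come out near p, while greedy forward selection with k < p predictors, used here as a cheap stand-in for exhaustive best-subsets search, can come out above p, in line with the simulation described above.

```python
# Monte Carlo sketch of Efron's effective degrees of freedom,
# df(mu_hat) = sigma^(-2) * sum_i cov(y_i, mu_hat_i),
# for OLS and for greedy forward selection of k < p predictors.
# Setup mirrors the simulation above (n = 50, p = 15, sigma^2 = 1,
# beta_j ~ N(0, 4)); the seed, replication count, and the use of
# forward selection instead of exhaustive best-subsets are our choices.
import numpy as np

rng = np.random.default_rng(0)
n, p, k, sigma, reps = 50, 15, 7, 1.0, 2000

# One fixed mean vector mu = X beta, held constant across replications.
X = rng.standard_normal((n, p))
beta = rng.normal(0.0, 2.0, size=p)  # sd 2, i.e., variance 4
mu = X @ beta

def ols_fit(y, Xs):
    """Least-squares fitted values using all columns of Xs."""
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    return Xs @ coef

def forward_selection_fit(y, X, k):
    """Greedily add the column that most reduces the RSS, k times."""
    active = []
    for _ in range(k):
        rss = [np.sum((y - ols_fit(y, X[:, active + [j]])) ** 2)
               if j not in active else np.inf
               for j in range(X.shape[1])]
        active.append(int(np.argmin(rss)))
    return ols_fit(y, X[:, active])

# Draw fresh noise around the fixed mu and record the fits.
Y = np.empty((reps, n))
F_ols = np.empty((reps, n))
F_fs = np.empty((reps, n))
for r in range(reps):
    y = mu + sigma * rng.standard_normal(n)
    Y[r], F_ols[r], F_fs[r] = y, ols_fit(y, X), forward_selection_fit(y, X, k)

def df_hat(Y, F):
    """sigma^(-2) times the sum over i of the sample cov(y_i, mu_hat_i)."""
    Yc, Fc = Y - Y.mean(0), F - F.mean(0)
    return (Yc * Fc).sum() / (reps - 1) / sigma**2

print(f"OLS df estimate:              {df_hat(Y, F_ols):6.2f}   (theory: p = {p})")
print(f"forward selection, k={k}, df: {df_hat(Y, F_fs):6.2f}   (can exceed p = {p})")
```

The covariance estimates carry Monte Carlo noise at this replication count, and whether forward selection's degrees of freedom actually exceeds p depends on the realized μ and on k; the paper's Figure 1 makes the same comparison over 200 realizations of μ.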