The Concept of Heteroscedasticity 3: GOLDFELD–QUANDT TEST
This sequence presents the Goldfeld–Quandt test for heteroscedasticity.
The disturbance term in a regression model is said to be homoscedastic if it has the same potential distribution, and in particular the same variance, in all observations. If this condition is not satisfied, it is said to be heteroscedastic. The possible types of heteroscedasticity are endless.
However, in one particularly common type the standard deviation of the distribution is proportional to the size of one of the explanatory variables.
This type of heteroscedasticity is illustrated in the diagram above. The standard deviation of the distribution is proportional to X.
The Goldfeld–Quandt test is a test for this type of heteroscedasticity. The observations are ordered by the value of the X variable and the sample is divided into three ranges: the 3/8 of the observations with the smallest values of X, the 3/8 with the largest values, and the remaining 1/4 in the middle.
In the present case with 28 observations, the lower, middle, and upper ranges have 11, 6, and 11 observations, respectively.
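The split can be checked with a few lines of arithmetic. This is only a sketch: the function name is hypothetical, and it assumes that 3/8 of the sample is rounded to the nearest whole number (with halves rounded up), which is the reading that gives 11, 6, and 11 for 28 observations.

```python
def goldfeld_quandt_split(n):
    """Return (lower, middle, upper) sub-sample sizes for n observations.

    Assumption: each outer range gets 3/8 of the sample, rounded to the
    nearest whole number (halves rounded up); the middle gets the rest.
    """
    outer = (3 * n + 4) // 8   # integer form of floor(3n/8 + 1/2)
    middle = n - 2 * outer
    return outer, middle, outer


print(goldfeld_quandt_split(28))  # prints (11, 6, 11)
```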
You then fit regression lines to the lower and upper ranges of the observations, as shown.
The regression line for the lower range has been buried under the observations. Here it is, in red.
You then compare the residual sums of squares for the two regressions. We will denote them RSS1 and RSS2 for the lower and upper ranges, respectively.
If the disturbance term is homoscedastic, there should be no systematic difference between RSS1 and RSS2.
However, if the standard deviation of the distribution of the disturbance term is proportional to the X variable, RSS2 is likely to be greater than RSS1.
If it is greater, the question is whether it is significantly greater. The test statistic is the F statistic shown above, F(n2 - k, n1 - k) = [RSS2 / (n2 - k)] / [RSS1 / (n1 - k)]. n1 and n2 are the numbers of observations in the lower and upper regressions. (Normally they will be the same.) k is the number of parameters in the model.
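The calculation can be sketched in plain Python for a simple regression of Y on X (so k = 2). The statistic computed is RSS2 divided by RSS1, each first divided by its degrees of freedom. The function names and the data are made up for illustration and are not the 28 observations used in this sequence. The resulting F value would still have to be compared with the critical value of F for the reported degrees of freedom.

```python
def ols_rss(x, y):
    """Fit y = b1 + b2*x by OLS and return the residual sum of squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    return sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))


def goldfeld_quandt_f(x, y, n_outer, k=2):
    """Return (F, df_upper, df_lower), using n_outer observations
    at each end of the sample after ordering by x."""
    pairs = sorted(zip(x, y))          # order observations by X
    lower = pairs[:n_outer]            # smallest values of X
    upper = pairs[-n_outer:]           # largest values of X
    rss1 = ols_rss(*zip(*lower))
    rss2 = ols_rss(*zip(*upper))
    df = n_outer - k                   # same for both ranges here
    return (rss2 / df) / (rss1 / df), df, df


# Made-up data: the disturbance alternates in sign and grows with x.
x = list(range(1, 17))
y = [2 + 3 * xi + 0.1 * xi * (-1) ** xi for xi in x]

f_stat, df2, df1 = goldfeld_quandt_f(x, y, n_outer=6)
print(f_stat, df2, df1)
```

Because the two ranges have the same size in this sketch, the degrees of freedom cancel and the statistic reduces to RSS2 / RSS1.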
In the present case we reject the null hypothesis of homoscedasticity at the 0.1% level. We therefore need to find an alternative to straightforward OLS regression.
Incidentally, why was the sample split into three ranges? Why not split it into two halves, and compare RSS for the regressions using the two halves?
The reason is that, by omitting the central range, you increase the contrast between the variances of the residuals, and you have a better chance of rejecting the null hypothesis of homoscedasticity.
However, the larger the omitted central section, the smaller will be the number of degrees of freedom in the subsample regressions, and this will make it more difficult to reject the null hypothesis.
Thus there is a trade-off in choosing the size of the omitted range: if it is too small, the contrast is weak, and if it is too large, too few degrees of freedom remain. On the basis of experimentation, Goldfeld and Quandt recommend omitting about a quarter of the observations.
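The cost in degrees of freedom is easy to tabulate. The sketch below uses a hypothetical helper and assumes a sample of 28 observations, k = 2, and an equal split of the retained observations between the lower and upper ranges.

```python
def subsample_df(n, n_omitted, k=2):
    """Degrees of freedom in each outer regression when n_omitted central
    observations are dropped and the rest are split equally."""
    n_outer = (n - n_omitted) // 2
    return n_outer - k


for n_omitted in (0, 6, 12, 18):
    print(n_omitted, subsample_df(28, n_omitted))
```

Omitting 0, 6, 12, and 18 observations leaves 12, 9, 6, and 3 degrees of freedom in each regression, so each increase in the omitted range buys extra contrast at the price of a less precise estimate of each variance.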