
I am currently thinking about formalizing some statistics (in Coq). One thing I don't understand is the logic of e.g. the Shapiro-Wilk test for normality. To explain my problem, let's first look at a Kolmogorov-Smirnov (KS) test for normality, which doesn't have this problem. A hypothesis test is in general a contradiction argument: one assumes a certain statistical property of an observation (the null hypothesis), shows that under this assumption the observation is highly improbable, and concludes that the assumption is likely not true. When I do a KS normality test, I assume as null hypothesis that the distribution is not normal and then show that the distance between the observed distribution and the assumed distribution is so small that this is unlikely. The statistic for the distribution distance derived by Kolmogorov is valid for any continuous distribution, so in essence the logic of such a test is:

"distribution is not normal" -> "distribution is continuous" -> observation is unlikely

from which one can conclude that either premise is likely false.
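For concreteness, here is a minimal sketch of these mechanics, assuming Python with scipy (my illustrative choice; the question does not specify a language). Note the caveat in the comments: the classical Kolmogorov null distribution for D requires the reference cdf to be fully specified in advance.

```python
# Minimal sketch (my example, assuming scipy) of the KS mechanics.
# Caveat: the classical Kolmogorov distribution for D applies only when
# the reference cdf is fully specified in advance; estimating mu and
# sigma from the same data would call for the Lilliefors variant.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=50)

# D = sup_t |F_n(t) - F(t)|, compared against Kolmogorov's distribution
d, p = stats.kstest(x, "norm", args=(0.0, 1.0))
print(f"D = {d:.4f}, p-value = {p:.4f}")
```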

Now let's look at the Shapiro-Wilk test. The difference from the KS test is that the distribution of the W statistic given in the 1965 paper by Shapiro and Wilk applies only to normal distributions (otherwise one could use the Shapiro-Wilk test for any distribution). So a normality argument based on the W statistic has the logic:

"distribution is not normal" -> "distribution is normal" -> unlikely

where the first premise is the null hypothesis and the second premise is required for applying the W statistic and reaching the "unlikely" conclusion. Again one can conclude from this that either premise is likely false, but in this case that is not very helpful.
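Again for concreteness, a minimal sketch assuming scipy (my choice): the p-value returned here is computed under the null distribution of W derived for normal samples, which is exactly the point above.

```python
# Illustrative sketch (assuming scipy): scipy.stats.shapiro returns the
# W statistic and a p-value computed under the null of normality.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=50)  # deliberately non-normal data

w, p = stats.shapiro(x)
print(f"W = {w:.4f}, p-value = {p:.4g}")  # small p => reject normality
```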

A non-normality test (assuming normality as the null hypothesis) would of course work.

Can someone please cut this knot for me?

Added (and later edited)

How do people work in practice with statistical methods requiring normality tests? The abstract of reference [1] says: "normal distribution ... is an underlying assumption of many statistical procedures". So people do a test with the null hypothesis that the data are normal, and in case the data are not normal this hypothesis is rejected and the method requiring normally distributed data cannot be used.

What happens if the test does not reject the null hypothesis of normality? I would think many people then apply the methods requiring normally distributed data without much further thought, since the scientific procedure for checking that the data are not non-normal was followed.
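A hedged sketch of that workflow, assuming scipy; the alpha = 0.05 threshold and the one-sample t-test are my illustrative choices, not something the question prescribes:

```python
# Sketch of the pre-test-then-proceed workflow (assuming scipy;
# alpha = 0.05 and the t-test are illustrative choices).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
sample = rng.normal(loc=5.0, scale=2.0, size=30)

_, p_norm = stats.shapiro(sample)
if p_norm > 0.05:
    # Normality not rejected: proceed with a normality-based procedure.
    t, p = stats.ttest_1samp(sample, popmean=5.0)
    print(f"t = {t:.3f}, p = {p:.3f}")
else:
    print("Normality rejected; use a nonparametric method instead.")
```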

Is this justified? From a logical point of view it is not, because after a Shapiro-Wilk test we know nothing at all in case the null hypothesis is accepted. Also, as Iosif pointed out (I hope I got him right), statistics claims nothing in this case.

What I wanted to say above is this: in case the null hypothesis is accepted - and I would say this is a frequent use case - some tests really say nothing at all, while other tests still give some information.

What I still don't understand is

  • The connection between a Shapiro-Wilk test and the applicability of methods which require normal distributed data. Reference [1] claims that there is such a connection, but I see this only in a negative sense - maybe it is meant in this way.
  • If it is possible to know more than nothing at all in case the null hypothesis is accepted and if other tests (like KS) are better in this respect compared to Shapiro-Wilk.
  • How to convince a formal logic system like Coq that some methods requiring (approximately) normally distributed data are applicable - as far as I understood Iosif, normality tests are always negative, so they can only show that such methods cannot be applied, but not that they can be applied.

References

  1. Mohd Razali, N. & Yap, B. (2011). Power Comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling Tests. Journal of Statistical Modeling and Analytics, 2(1).
  2. Shapiro, S. S. & Wilk, M. B. (1965). An Analysis of Variance Test for Normality (Complete Samples). Biometrika, 52(3/4), 591-611.
  • I like this question, but it may be a better fit for our sister site stats.stackexchange.com. However, we can first see how it fares here. – Vincent, Jun 6, 2018 at 8:36
  • @Vincent: thanks for the hint! It's not urgent, and I try to avoid cross-posting, so I will wait a week or so to see if I find enlightenment here :-) Commented Jun 6, 2018 at 8:47
  • @MichaelSoegtrop: (i) If you cannot reject H0, there is absolutely no point in testing. If your significance level is 99%, it means you allow an up to 0.99 type I error probability, and then testing can make only very, very little sense. (ii) "The probability that H0 is true" is not defined and has no meaning. Commented Jun 6, 2018 at 16:51
  • @IosifPinelis: exactly. Which leads to two questions: 1.) what are people doing with W limits for a significance level of 99%? 2.) Why don't people use a KS test with a non-normality null hypothesis to verify normality assumptions? What I am saying is that a Shapiro-Wilk test can only be used to reject normality, while reference [1] suggests using it to verify normality assumptions in order to justify the use of methods which require normality. [1] states that Shapiro-Wilk is better for this purpose than KS, while I claim that Shapiro-Wilk cannot be used at all for this. Commented Jun 6, 2018 at 17:03
  • @MichaelSoegtrop: I cannot find the term "verify" or anything of that root in [1]. I cannot find there or in any statistical publication any use of a 99% significance level or higher. Anyhow, as should be clear from my answer, about any claim of a proof of independence in any nontrivial case, especially with such a small sample size as 50 in your question, would be most likely quite wrong. Commented Jun 6, 2018 at 18:16

1 Answer

In statistics, a test is usually named after its null hypothesis. For instance, in tests for independence, the null hypothesis is that certain random variables are independent -- see e.g. https://newonlinecourses.science.psu.edu/stat500/node/56/ . So, in your case, a normality test should assume normality as the null hypothesis $H_0$. Indeed, the Wikipedia article "Normality test", referred to in a comment by the OP, has this: "data are tested against the null hypothesis that it is normally distributed", and then, as you wrote, there is no problem. (In particular, if the Shapiro--Wilk test, based on the normality assumption, says $H_0$ is not rejected, then it means that your data set does not rule out normality.)

In statistics, one usually cannot positively prove anything specific with certainty; rather, one may only decide to reject or not to reject a specific hypothesis. In particular, in the case of an independence test, the standard goal is not to definitely prove independence. That is quite impossible with the finite data sets one usually has. Indeed, to verify the independence of (say) two real-valued random variables (r.v.'s) $X$ and $Y$ with a joint cumulative distribution function (cdf) $F_{X,Y}$ and marginal cdf's $F_X$ and $F_Y$ means to verify the system $F_{X,Y}(x,y)=F_X(x)F_Y(y)$ of infinitely many equations, for all pairs of real $x,y$, with infinitely many unknowns, namely all the values of $F_{X,Y}(x,y),F_X(x),F_Y(y)$. On the other hand, it is incomparably easier to (statistically) disprove independence of such r.v.'s $X$ and $Y$: it suffices to (statistically) show that $F_{X,Y}(x,y)\ne F_X(x)F_Y(y)$ for just one pair of real $x,y$.
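A small hedged illustration of this asymmetry (my example, assuming scipy, not part of the original answer): statistically disproving independence takes only one detectable discrepancy, for instance in a 2x2 contingency table of observed counts.

```python
# My sketch: a chi-squared test can reject independence from a single
# detectable discrepancy between observed and expected cell counts.
import numpy as np
from scipy import stats

table = np.array([[30, 10],
                  [10, 30]])  # counts hinting at association

chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4g}")  # small p => reject independence
```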

Added:

The important feature of a test is not what it is called, but its power, and in this regard the Shapiro--Wilk test for normality does seem to "pass the test". Indeed, in the abstract of the paper Power Comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling Tests, cited as reference [1] by the OP, we find: "Results show that Shapiro-Wilk test is the most powerful normality test, followed by Anderson-Darling test, Lilliefors test and Kolmogorov-Smirnov test."
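To make the notion of power concrete, here is a crude Monte Carlo sketch (my construction, assuming scipy; this is not the methodology of the cited paper): estimate how often each test rejects at alpha = 0.05 when the data are in fact exponential rather than normal.

```python
# Crude power illustration (my sketch): rejection rates at alpha = 0.05
# when the data are exponential(1), i.e. the null of normality is false.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
alpha, n, reps = 0.05, 50, 2000
rej_sw = rej_ks = 0
for _ in range(reps):
    x = rng.exponential(scale=1.0, size=n)  # mean 1, sd 1
    _, p_sw = stats.shapiro(x)
    # KS against the fully specified N(1, 1), matching mean and sd
    _, p_ks = stats.kstest(x, "norm", args=(1.0, 1.0))
    rej_sw += p_sw < alpha
    rej_ks += p_ks < alpha
print(f"Shapiro-Wilk power ~ {rej_sw / reps:.2f}")
print(f"Kolmogorov-Smirnov power ~ {rej_ks / reps:.2f}")
```

With this setup the Shapiro-Wilk rejection rate should typically come out higher, consistent with the quoted abstract.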

As for the standard logic of testing, it is not "'distribution is normal' -> 'distribution is likely normal'", as suggested in a comment by the OP. Rather, it is like this:

Assume for a moment that $H_0$ is true. If the most powerful test we know (of the prescribed significance level) tells us that the data contradict this null assumption, then we'll reject $H_0$; otherwise, we won't reject $H_0$.

I can see no problem with this logic. In your case, the null hypothesis $H_0$ is that the distribution is normal.


More added: Concerning Table 6 on page 605 of [2]: the misleading term "Level" in the head of that table is not the significance level. It is actually the level of the quantiles of the distribution of the test statistic $W$. The use of that table is illustrated right after it, at the top of page 606 in [2]: for $n=7$, the "tabulated 50% point" (that is, the 50th percentile of the distribution of $W$) is $0.928$. This is less than the value $0.9530$ of $W$ on the sample of size $n=7$ considered in that example. In view of what is said on page 593 in [2], "the mean values of $W$ for non-null distributions tends [sic -- I.P.] to shift to the left of that for the null case", it appears that the test rejects the null hypothesis $H_0$ for small values of $W$, those less than a chosen critical value. The illustrative piece in [2] concludes thus: "Referring to Table 6, one finds the value of $W$ to be substantially larger than the tabulated 50% point, which is 0.928. Thus there is no evidence, from the $W$ test, of non-normality of this sample."

In other words, the $p$-value of the test in that situation was found to be very large, $>50\%$; therefore the null hypothesis, of normality, is not rejected. No claim of proving or establishing normality is made (and such a claim would be quite ridiculous here, given such a small sample size as $n=7$). One may also note that no use of a prescribed significance level is made here; instead, the $p$-value (a measure of the strength of evidence) is used.
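A hedged sketch of that table-lookup logic, assuming scipy (the data below are made up; they are not the $n=7$ sample from [2]):

```python
# Sketch of the Table 6 logic: compare the observed W with the
# tabulated 50th percentile for n = 7 (0.928, from Table 6 in [2]).
# The sample here is hypothetical, not the one used in the paper.
import numpy as np
from scipy import stats

x = np.array([148.0, 154.0, 158.0, 160.0, 161.0, 162.0, 166.0])  # n = 7
w, p = stats.shapiro(x)
print(f"W = {w:.4f}, p = {p:.3f}")
if w > 0.928:  # 50% point of the distribution of W for n = 7
    print("W exceeds the 50% point: no evidence of non-normality.")
```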

So far, I see nothing really objectionable in [2], except for the misleading use of the term "level".

  • Thanks for the nomenclature hint - I am a physicist and as such good at messing up mathematical notations :-) But a few points: 1.) In the paper by Shapiro and Wilk statistics for both ends are given (say 1% and 99%) 2.) I would interpret the first line of en.wikipedia.org/wiki/Normality_test otherwise: "In statistics, normality tests are used to determine if a data set is well-modeled by a normal distribution and to compute how likely it is for a random variable underlying the data set to be normally distributed." 3.) With a KS test one can show normality with the weak assumption that the true distribution of the data is continuous 4.) there are quite a few papers which compare the "power" of various tests for showing that a distribution is normal which name the Shapiro-Wilk test. Commented Jun 6, 2018 at 13:37
  • @MichaelSoegtrop: In the Wikipedia article "Normality test" that you are referring to, the only mention of "null" is this: "data are tested against the null hypothesis that it is normally distributed", which is in complete agreement with what I believe is the usual statistical practice. Leaving the nomenclature aside, indeed the important feature of a test is its power. I suggest you cite in your question papers comparing the power of the Shapiro--Wilk test with that of other tests. Commented Jun 6, 2018 at 13:58
  • An example article comparing normality tests (including Shapiro-Wilk) with the clear goal of positively showing normality is "Mohd Razali, Nornadiah & Yap, Bee. (2011). Power Comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling Tests. J. Stat. Model. Analytics. 2." Commented Jun 6, 2018 at 14:16
  • If you do a Shapiro-Wilk normality test with a low alpha value (as done in the article referenced above), you essentially have an argument of the form "distribution is normal" -> "distribution is likely normal", which is not much of an argument either. Shapiro and Wilk gave W statistics for low alpha values in their paper, suggesting such use. Commented Jun 6, 2018 at 14:20
