Second things first
Zero confidence
As I have previously pointed out, the idea that point estimates are primary and estimates of their uncertainty are secondary is seductive and commonly held but dangerous and potentially misleading. The alternative, of regarding the expression of uncertainty as primary and the point estimate as a particular value amongst many to which we might pay attention, has its attractions.
Figure 1. Various confidence intervals for a clinical trial in asthma.
Figure 1 shows a series of confidence intervals for the estimated treatment effect in millilitres of forced expiratory volume in one second in a (fictional) clinical trial in asthma comparing (say) a beta-agonist to placebo. The interval at the top is the conventional 95% confidence interval and various intervals with lesser degrees of confidence are also plotted, with the lowest shown being that corresponding to 5% confidence. Since the intervals have been calculated using the t-distribution (which is symmetric), the point estimate, shown by a blue dot, is in the middle. As the confidence drops towards zero, the interval collapses to the point estimate which is halfway between the two limits.
It therefore follows that one way of describing the point estimate is as a value in which we have zero confidence.
A continuous version of the plot is given by Figure 2, which also includes the previous intervals. This shows what has been referred to as a confidence distribution. It enables one to read off an interval for any desired level of confidence.
Figure 2. The various confidence intervals of Figure 1 augmented by a continuous confidence distribution.
So What?
"So what?", the reader may ask.
I calculated the confidence intervals by starting with a point estimate of 90 mL and noting that the standard deviation was 450 mL. Given that there were 200 patients per arm, this yielded a standard error of 45 mL with 398 degrees of freedom. These statistics, together with the critical values of the t-distribution, allowed me to calculate the confidence intervals.
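The calculation just described can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the code behind the figures: the helper names (`t_pdf`, `t_cdf`, `t_quantile`, `confidence_interval`) are hypothetical, and the t quantile is obtained by numerical integration and bisection rather than from a statistics package.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """CDF for x >= 0, using Simpson's rule on [0, x] (steps must be even)."""
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Quantile for p >= 0.5, found by bisection on the CDF."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def confidence_interval(estimate, se, df, level):
    """Symmetric two-sided interval: estimate +/- critical value * se."""
    crit = t_quantile(0.5 + level / 2, df)
    return estimate - crit * se, estimate + crit * se

# Values taken from the text: 90 mL estimate, SD 450 mL, 200 patients per arm.
sd, n_per_arm, estimate = 450.0, 200, 90.0
se = sd * math.sqrt(2 / n_per_arm)   # standard error of the difference, 45 mL
df = 2 * n_per_arm - 2               # 398 degrees of freedom

for level in (0.95, 0.75, 0.50, 0.25, 0.05):
    lower, upper = confidence_interval(estimate, se, df, level)
    print(f"{level:4.0%}: ({lower:7.2f}, {upper:7.2f})")
```

Every interval is centred on 90 mL, and calling `confidence_interval(estimate, se, df, 0.0)` returns an interval of essentially zero width at the point estimate.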
Everything started with the point estimate.
Am I not being disingenuous in claiming that point estimates are not primary?
One has to be careful. In my opinion, two currently fashionable approaches to analysing data are problematic because of the overemphasis on point estimation and the associated concepts of estimands and unbiasedness. The two approaches I am thinking of are structural causal models (SCM) with their associated devices of directed acyclic graphs and the propensity score (PS).
In my opinion these techniques live in the linear ordinary least squares world. This world has challenges enough of its own, such as how to deal with missing data and confounders and how to distinguish the latter from so-called colliders, and these approaches may be useful and even powerful in addressing them. However, they don't really explain how to deal with error structure.
A nice example is given by a recent extremely interesting blog by Kaspar Rufibach using a potential outcomes framework to discuss randomisation. He argues, quite correctly in my view, that when it comes to covariates in a clinical trial, it is not the covariates per se that matter but their combined effect and what randomisation enables is that these can be fairly estimated. (See Indefinite Irrelevance for a discussion.) He states:
One way to mathematically prove this is via *potential outcomes* (PO): at baseline, each patient has two POs,
Y(Z = 0) and
Y(Z = 1)
We are interested in the *average causal effect*
ACE = E(Y(1) - Y(0))
and it turns out this can be estimated from an RCT. Assuming consistency and exchangeability on top of linearity of E(.) ACE can be expressed in quantities actually *observed* in a trial:
Nevertheless, I think that this potential outcomes approach cannot explain everything that randomisation achieves. For example, potential outcomes have been used to develop the propensity score approach[1], the idea that if we can stratify subjects in a study by their probability of receiving one treatment or another, we can obtain an unbiased estimate. However, such an approach does not explain why a randomised blocks design should be analysed differently from a completely randomised design: after all, the point estimate does not change whether you use a two-sample or paired sample t-test. Yet, for both such designs, the PS would be 1/2 for every subject, implying that a single stratum would suffice.
However, to analyse a matched pair design like a completely randomised design is recognised as being an elementary statistical blunder.
The confidence interval cannot be estimated identically for the two designs. The usual defence that is given for the PS approach is that using it does not prevent your also addressing variability. In my opinion this is a very weak defence. The Rothamsted Approach [2,3] requires you to address variability from the beginning and it can explain why the two designs must be analysed differently. As RA Fisher put it:
...randomisation was never intended from the first moment it was advocated to exclude the elimination from the error of components which could be completely eliminated, as in the case of differences between blocks in a randomised block system.... It only requires that these components shall equally be eliminated from the estimation of error ... I often put this by saying that it is only the components which contribute to the actual error of the experiment which need to be randomised to provide an estimate of that error. (Letter to Harold Jeffreys 26 September 1938)[4]
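The contrast between the two designs can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are invented (a block effect with standard deviation 10, within-pair noise with standard deviation 1, and a treatment effect of 5), and the variable names are hypothetical. It shows that the point estimate is the same under both analyses while the standard error is not.

```python
import math
import random
import statistics

random.seed(1)

n_pairs = 30
# Hypothetical data: the two members of each pair (block) share a large
# block effect, and treatment adds 5 units on average.
control, treated = [], []
for _ in range(n_pairs):
    block_effect = random.gauss(0, 10)
    control.append(block_effect + random.gauss(0, 1))
    treated.append(block_effect + 5 + random.gauss(0, 1))

# Point estimate: identical whichever analysis is used.
diff_of_means = statistics.mean(treated) - statistics.mean(control)
pair_diffs = [t - c for t, c in zip(treated, control)]
mean_of_diffs = statistics.mean(pair_diffs)

# Standard error ignoring the blocks (two-sample analysis, pooled variance).
pooled_var = (statistics.variance(treated) + statistics.variance(control)) / 2
se_two_sample = math.sqrt(pooled_var * 2 / n_pairs)

# Standard error respecting the blocks (paired analysis).
se_paired = statistics.stdev(pair_diffs) / math.sqrt(n_pairs)

print(f"estimate (two-sample): {diff_of_means:.3f}")
print(f"estimate (paired):     {mean_of_diffs:.3f}")
print(f"SE ignoring blocks:    {se_two_sample:.3f}")
print(f"SE respecting blocks:  {se_paired:.3f}")
```

In this sketch the block component has been eliminated from the estimate by the design, so it must also be eliminated from the estimate of error, which is what the paired analysis does and the two-sample analysis does not.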
How do you like your squares?
Part of the problem, in my opinion, is that although the theory of randomisation was developed in a context, agriculture, in which the block, treatment and randomisation structures were extremely complex and the analysis had to be developed to take this into account, we medical statisticians tend to discuss it in the comparatively simple context of parallel group trials with the typical trivial treatment structure of a single factor (with often two levels) varied at the same level. In agriculture, however, split-plot designs were common and treatments could be varied at different levels. The effect on correct estimation of standard errors could be quite complex but even the estimation of treatment effects might not be simple.
The basic simplicity of parallel group trials lulls us into the lazy habit of using the simple linear model as our yardstick.
Now, consider the simple linear model, which is appropriate for a standard elementary error structure (independently and identically distributed outcomes) and which we now tend to express in Aitken's matrix algebra[5], as given, for example, in the first line of Figure 3 below. We shall see that the variance does not enter the expression for the estimate; it appears only in the expression for the variance of the estimate.
Figure 3. Multiple regression equations for estimates and variances of estimates expressed in matrix algebra for ordinary least squares (line 1) and generalised least squares (line 2).
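The figure itself is not reproduced in this text. For reference, the standard textbook forms of the two lines that the caption describes are given here; the notation (design matrix X, outcome vector Y, error variance \sigma^2 and error variance-covariance matrix V) is an assumption and may differ from that of the original figure.

```latex
% Line 1: ordinary least squares, errors i.i.d. with variance \sigma^2
\hat{\beta}_{\mathrm{OLS}} = (X^{\top}X)^{-1}X^{\top}Y, \qquad
\operatorname{Var}(\hat{\beta}_{\mathrm{OLS}}) = \sigma^{2}(X^{\top}X)^{-1}

% Line 2: generalised least squares, error variance-covariance matrix V
\hat{\beta}_{\mathrm{GLS}} = (X^{\top}V^{-1}X)^{-1}X^{\top}V^{-1}Y, \qquad
\operatorname{Var}(\hat{\beta}_{\mathrm{GLS}}) = (X^{\top}V^{-1}X)^{-1}
```

In the first line \sigma^2 appears only in the variance. In the second line V appears in the estimate as well.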
Thus, by proceeding to consider point estimates first and then the variance structure, there is no particular difficulty. If we now look at the second line of Figure 3, we see that for the more complicated case of generalised least squares the variance-covariance matrix enters the expression for the estimate itself.
The net consequence of this is that you cannot safely proceed by considering variation as a secondary matter; it has to be addressed first. Of course, the application of GLS is difficult in practice because the variance-covariance matrix is not known and has to be estimated, but that is another reason why variation cannot be banished to a secondary position[6].
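A deliberately simple special case may make the point concrete. The sketch below assumes the simplest possible model, a single common mean estimated from independent observations with known but possibly unequal variances (that is, a diagonal variance-covariance matrix). The function names and the numbers are hypothetical.

```python
def ols_mean(y):
    """OLS estimate of a common mean: the plain average.
    The variances play no role in the estimate."""
    return sum(y) / len(y)

def gls_mean(y, variances):
    """GLS estimate of a common mean when the variance-covariance matrix
    is diagonal: an inverse-variance weighted average, so the variances
    enter the estimate itself, not just its precision."""
    weights = [1 / v for v in variances]
    estimate = sum(w * yi for w, yi in zip(weights, y)) / sum(weights)
    variance = 1 / sum(weights)
    return estimate, variance

y = [10.0, 20.0]
print(ols_mean(y))               # the same whatever the variances are
print(gls_mean(y, [1.0, 1.0]))   # equal variances: agrees with OLS
print(gls_mean(y, [1.0, 4.0]))   # unequal variances: the estimate moves
```

With equal variances the two estimates coincide. With unequal variances the GLS estimate is pulled towards the more precise observation, so the point estimate cannot be settled before the variance structure has been considered.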
Even identifiability in causal inference may be illusory since without understanding variation we do not know what infinity has to look like. For example, in a cluster randomised trial, it is not sufficient for the patients studied to grow to infinity, the clusters have to.
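The cluster example can be sketched numerically. The formula used is the standard one for the variance of the mean of one arm under a simple two-level model with equal cluster sizes; the variance components (4 between clusters, 100 within) and the function name are assumptions chosen purely for illustration.

```python
def variance_of_arm_mean(n_clusters, patients_per_cluster,
                         var_between, var_within):
    """Variance of the mean of one arm under a two-level model with
    equally sized clusters (between-cluster plus within-cluster part)."""
    return (var_between / n_clusters
            + var_within / (n_clusters * patients_per_cluster))

var_between, var_within = 4.0, 100.0   # hypothetical variance components

# More and more patients, but the number of clusters stays at 10.
for m in (10, 100, 10_000, 1_000_000):
    v = variance_of_arm_mean(10, m, var_between, var_within)
    print(f"10 clusters x {m:>9} patients: variance = {v:.5f}")

# More and more clusters, each with only 10 patients.
for k in (10, 100, 10_000, 1_000_000):
    v = variance_of_arm_mean(k, 10, var_between, var_within)
    print(f"{k:>9} clusters x 10 patients: variance = {v:.5f}")
```

In the first loop the variance levels off at `var_between / 10` however many patients are added. Only in the second loop, where the clusters grow, does it go to zero.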
Does it matter?
Yes it does. First note that not all trials are parallel group trials. I have already evoked cluster randomised trials but I myself have frequently been involved in cross-over studies [7] and if these have incomplete block structures then estimation may be complicated.
A good example of a complex structure is the Lanarkshire Milk Experiment[8], in which two types of milk (pasteurised and raw) were given as a feeding supplement to schoolchildren. Within schools pupils were either allocated to receive milk or not but each school was only given one type of milk. Thus milk type varied between schools. How to go about estimating effects of diet appropriately cannot be decided without thinking carefully about components of variation.
Second, note that just as variances can vary at different levels so can covariances. Thus to estimate the necessary adjustment for a covariate may require one to go beyond OLS[9]. There may be more than one slope that matters. Failure to appreciate this has bedevilled discussion of Lord's Paradox.
Infinity is the universe but not our world
Furthermore, even if we understand that it is the joint effect of covariates that matters, thus freeing ourselves from worrying about indefinitely many confounders, if we still obsess about point estimates we shall worry whether our sample size is large enough for the point estimate to be correct. However large the sample size, we shall have zero confidence in the point estimate.
The infinity we invoke is only relevant to define our frame of reference. We have to accept that the inferential world we inhabit is a small and finite part of this universe and that that universe does not deliver certainty but governs our bets.
Other things being equal, the smaller the sample size, the wider the confidence interval. The validity of our confidence does not increase with increasing sample size. The task is to express our uncertainty validly. This is what randomisation is designed to help us do.
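A small simulation illustrates the distinction between the width of an interval and the validity of the confidence attached to it. To keep the sketch short it assumes normally distributed outcomes with a known standard deviation, so that a normal (rather than t) interval applies; the function name and all parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist

def coverage(n, reps=2000, true_mean=0.0, sigma=1.0, level=0.95, seed=0):
    """Proportion of simulated known-sigma intervals that contain the
    true mean, together with the width of those intervals."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if abs(sample_mean - true_mean) <= half_width:
            hits += 1
    return hits / reps, 2 * half_width

for n in (5, 200):
    cover, width = coverage(n)
    print(f"n = {n:>3}: width = {width:.3f}, coverage = {cover:.3f}")
```

Under these assumptions the interval for the larger sample is much narrower, but the proportion of intervals covering the true value is close to 95% in both cases.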
References
1. Rosenbaum, P.R. and D.B. Rubin, The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika, 1983. 70(1): p. 41-55.
2. Nelder, J.A., The analysis of randomised experiments with orthogonal block structure I. Block structure and the null analysis of variance. Proceedings of the Royal Society of London. Series A, 1965. 283: p. 147-162.
3. Nelder, J.A., The analysis of randomised experiments with orthogonal block structure II. Treatment structure and the general analysis of variance. Proceedings of the Royal Society of London. Series A, 1965. 283: p. 163-178.
4. Bennett, J.H., Statistical Inference and Analysis: Selected Correspondence of R.A. Fisher. 1990, Oxford: Oxford University Press. 380.
5. Aitken, A.C., On least-squares and linear combinations of observations. Proceedings of the Royal Society of Edinburgh, 1934. 55: p. 42-48.
6. Senn, S.J., Various varying variances: The challenge of nuisance parameters to the practising biostatistician. Statistical Methods in Medical Research, 2015. 24(4): p. 403-19.
7. Senn, S.J., Cross-over trials in drug development: theory and practice. Journal of Statistical Planning and Inference, 2001(96): p. 29-40.
8. Senn, S., Student and the Lanarkshire milk experiment. Eur J Epidemiol, 2022.
9. Kenward, M.G. and J.H. Roger, The use of baseline covariates in crossover studies. Biostatistics, 2010. 11(1): p. 1-17.
As always a great article! From my experience with the analysis of randomised parallel group trials, variance pre specification for repeated measures almost always went with the most general (unspecified) option as the requirement to prespecify the analysis before unblinding (together with the outsourcing of analyses) is seen as the least risky to the success of the trial (wrongly p<0.05). I fear that your suggestion to look at the variance first would follow the same path!
Stephen Senn, thanks for sharing! Here is a related post, https://lnkd.in/dDS8sD75. I'm curious to know your thoughts.
I really like the confidence curve, but I prefer to plot it using the upper- and lower-tailed p-value, rather than the confidence level. Plotting it with the confidence level makes it look like an unusual funnel.