Local Influence Analysis of Varying-Coefficient Model with Random Right Censorship
1. Introduction
Local influence analysis was proposed from the viewpoint of differential geometry [1]. Over the past thirty years or so, diagnostics and influence analysis for the linear regression model have been fully developed (Ref. [2,3]). The varying-coefficient model is a useful extension of the classical linear model and has been widely applied in statistical modelling; see, for example, Ref. [1,4-6]. However, all of the above results were obtained for uncensored data. In many applications, some of the responses and/or covariates are not observed completely but are censored, and the usual statistical techniques for complete data are not readily applicable. When the response is censored, the relationship between the response and the covariates has been widely studied in the literature [7-10].
So far, local influence analysis of the varying-coefficient model with random right censorship has not appeared in the literature; this paper attempts to study it. The paper is organized as follows. Local influence is introduced in Section 2; the model and the estimators are introduced in Section 3; the statistical diagnostics are given in Section 4; and an example illustrating our results is given in Section 5.
2. Local Influence
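Ref. [2,3] discuss the method of local influence analysis. Let $\theta$ be an unknown k-dimensional parameter whose domain is an open subset of Euclidean space, and let $L(\theta)$ be an objective function (for example, a likelihood function or a penalized log-likelihood function). Let $\omega$ be an n-vector that describes the perturbation, for example a case weight or a small shift. Consider the perturbed model, whose objective function is $L(\theta \mid \omega)$, and let $\hat\theta(\omega)$ be the estimate obtained from $L(\theta \mid \omega)$. Suppose there is an $\omega_0$ such that $L(\theta \mid \omega_0) = L(\theta)$ and $\hat\theta(\omega_0) = \hat\theta$, where $L(\theta \mid \omega)$ has continuous second-order partial derivatives and the displacement $LD(\omega) = 2[L(\hat\theta) - L(\hat\theta(\omega))]$ is a function of $\omega$. Geometrically, $G(\omega)$ denotes the n-dimensional surface
$$G(\omega) = (\omega^T, LD(\omega))^T \tag{1}$$
This surface is called the influence graph, and it varies with $\omega$. The rate of change of the influence graph at $\omega_0$ reflects the sensitivity of the model, where $\omega_0$ corresponds to the unperturbed model. This method is called local influence. Cook proposed using the influence curvature to measure the change of the influence graph near $\omega_0$.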
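Ref. [2,3] point out that the influence curvature of the influence graph at $\omega_0$ in a direction $d$ with $\|d\| = 1$ is given by
$$C_d = 2\,\left| d^T \Delta^T \ddot L^{-1} \Delta\, d \right| \tag{2}$$
where $\ddot L$ is the matrix of second derivatives of $L(\theta)$ with respect to $\theta$, evaluated at $\hat\theta$, and
$$\Delta = \left.\frac{\partial^2 L(\theta \mid \omega)}{\partial\theta\,\partial\omega^T}\right|_{\theta = \hat\theta,\ \omega = \omega_0} \tag{3}$$
$\Delta$ and $\ddot L$ are $k \times n$ and $k \times k$ matrices, respectively.
The influence matrix is given by
$$F = \Delta^T \ddot L^{-1} \Delta \tag{4}$$
Formula (2) shows that the maximal influence curvature is $C_{\max} = 2|\lambda_{\max}|$, where $\lambda_{\max}$ is the eigenvalue of $F$ whose absolute value is maximal, and $d_{\max}$ is the corresponding eigenvector, which is called the direction of maximal influence curvature. Ref. [5] pointed out that the diagonal values of the influence matrix are also important diagnostic statistics.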
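The quantities in this section can be computed with a few lines of linear algebra. The following is a minimal sketch, assuming that the mixed-derivative matrix (written `delta`, $k \times n$) and the Hessian of the objective (written `l_ddot`, $k \times k$) are already available, and assuming Cook's convention that the influence matrix is $\Delta^T \ddot L^{-1} \Delta$. The function name `local_influence` and the toy numbers are hypothetical and serve only as illustration.

```python
import numpy as np

def local_influence(delta, l_ddot):
    """Return (F, c_max, d_max) for a k x n matrix delta and a k x k Hessian l_ddot."""
    delta = np.asarray(delta, dtype=float)
    l_ddot = np.asarray(l_ddot, dtype=float)
    # Influence matrix F = delta^T l_ddot^{-1} delta (n x n), via a linear solve.
    F = delta.T @ np.linalg.solve(l_ddot, delta)
    F = (F + F.T) / 2.0  # remove rounding asymmetry before the symmetric eigensolver
    eigvals, eigvecs = np.linalg.eigh(F)
    j = int(np.argmax(np.abs(eigvals)))  # eigenvalue of largest absolute value
    d_max = eigvecs[:, j]
    # Eigenvectors are defined up to sign; make the largest component positive.
    if d_max[np.argmax(np.abs(d_max))] < 0:
        d_max = -d_max
    return F, 2.0 * abs(eigvals[j]), d_max

# Toy inputs (hypothetical numbers, not taken from the paper).
delta = np.array([[1.0, 0.0, 2.0],
                  [0.0, 1.0, 0.0]])
l_ddot = -np.eye(2)
F, c_max, d_max = local_influence(delta, l_ddot)
```

Large components of `d_max`, or large diagonal values of `F` in absolute value, point to the cases to which the fit is most sensitive.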
3. The Model and Estimators
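Let $Y$ be the response variable and $(U, X)$ be its associated covariates. The varying-coefficient regression model assumes the following structure:
$$Y = X^T \beta(U) + \varepsilon \tag{5}$$
where $X = (X_1, \dots, X_p)^T$ is of dimension $p$ and $\beta(\cdot) = (\beta_1(\cdot), \dots, \beta_p(\cdot))^T$ is a p-dimensional vector of unknown coefficient functions. $\varepsilon$ is a stochastic error with $E(\varepsilon \mid U, X) = 0$.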
Consider model (5), where $Y$ is the survival time. Let $C$ be the censoring time associated with the survival time $Y$. Assume that $Y$ and $C$ are conditionally independent given the associated covariates $(U, X)$. Denote $Z = \min(Y, C)$ and $\delta = I(Y \le C)$, where $I(\cdot)$ is the indicator function. The observations are $(Z_i, \delta_i, U_i, X_i)$, $i = 1, \dots, n$, which are a random sample from $(Z, \delta, U, X)$. Thus, instead of observing $Y_i$, we observe the pairs $(Z_i, \delta_i)$, where $Z_i = \min(Y_i, C_i)$ and $\delta_i = I(Y_i \le C_i)$. Observations with $\delta_i = 1$ are uncensored, and observations with $\delta_i = 0$ are censored. Model (5) is then called the varying-coefficient regression model with random right censorship. Let $G$ be the common distribution function of the censoring times $C_i$.
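Lemma. Let $Y_G = \delta Z / (1 - G(Z))$, with $G$ continuous. Then $E(Y_G \mid U, X) = E(Y \mid U, X)$.
Proof. Since $\delta Z = \delta Y$
and
$$E(\delta \mid Y, U, X) = P(C \ge Y \mid Y, U, X) = 1 - G(Y),$$
thus
$$E(Y_G \mid U, X) = E\left\{ \frac{Y}{1 - G(Y)}\, E(\delta \mid Y, U, X) \,\Big|\, U, X \right\} = E(Y \mid U, X).$$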
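Now we consider the following model:
$$Y_{iG} = X_i^T \beta(U_i) + e_i, \quad i = 1, \dots, n \tag{6}$$
where $Y_{iG} = \delta_i Z_i / (1 - G(Z_i))$ and the $e_i$ are i.i.d. with $E(e_i \mid U_i, X_i) = 0$. In practice $G$ is unknown, and we replace it with $\hat G_n$, the Kaplan-Meier product-limit estimator of $G$ (Ref. [11]). The expression of $\hat G_n$ is as follows:
$$1 - \hat G_n(z) = \prod_{i:\, Z_{(i)} \le z} \left( \frac{n - i}{n - i + 1} \right)^{1 - \delta_{(i)}} \tag{7}$$
where $Z_{(1)} \le \dots \le Z_{(n)}$ are the order statistics of $Z_1, \dots, Z_n$ and $\delta_{(i)}$ is the censoring indicator associated with $Z_{(i)}$.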
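The transformation step can be sketched in code. The following is a minimal illustration, assuming the standard product-limit form $1 - \hat G_n(z) = \prod_{Z_{(i)} \le z} \{(n-i)/(n-i+1)\}^{1-\delta_{(i)}}$ for the censoring distribution, no ties among the observed times, and the transformed response $\delta_i Z_i / (1 - \hat G_n(Z_i))$. The function names and the five data points are hypothetical.

```python
import numpy as np

def censoring_survival_km(z, delta):
    """Product-limit estimate of 1 - G(Z_i) at each observed time (no ties assumed)."""
    z = np.asarray(z, dtype=float)
    delta = np.asarray(delta, dtype=int)
    n = len(z)
    order = np.argsort(z, kind="stable")
    ranks = np.arange(1, n + 1)
    # A factor differs from 1 only at censored observations (delta = 0).
    factors = ((n - ranks) / (n - ranks + 1.0)) ** (1 - delta[order])
    surv = np.empty(n)
    surv[order] = np.cumprod(factors)
    return surv

def transform_response(z, delta):
    """Synthetic response delta_i * Z_i / (1 - G_hat(Z_i)); censored cases become 0."""
    z = np.asarray(z, dtype=float)
    delta = np.asarray(delta, dtype=int)
    surv = censoring_survival_km(z, delta)
    y = np.zeros(len(z))
    uncensored = delta == 1
    y[uncensored] = z[uncensored] / surv[uncensored]
    return y

# Hypothetical toy data: observed times and censoring indicators.
z = np.array([2.0, 5.0, 3.0, 8.0, 6.0])
delta = np.array([1, 0, 1, 1, 0])
surv = censoring_survival_km(z, delta)
y_transformed = transform_response(z, delta)
```

Uncensored observations late in the follow-up are inflated the most, because few cases remain at risk of censoring at that point.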
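Let $Y_{i\hat G} = \delta_i Z_i / (1 - \hat G_n(Z_i))$; model (5) is then transformed into the following varying-coefficient regression model:
$$Y_{i\hat G} = X_i^T \beta(U_i) + e_i, \quad i = 1, \dots, n \tag{8}$$
We now want to estimate the unknown coefficient function vector $\beta(\cdot)$ based on the transformed data $\{(Y_{i\hat G}, U_i, X_i)\}_{i=1}^n$. For the varying-coefficient model there are many estimators of $\beta(\cdot)$. Here we use the B-spline estimator.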
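Let $\{t_k\}$ be the knots in the range of $U$, let $B_1(u), \dots, B_K(u)$ be the basis functions of the B-splines of order $m$, and let
$S_m$ be the space of B-spline functions of order $m$. By Lemma 1.2 of Ref. [3], every smooth coefficient function can be approximated by a B-spline function, $\beta_l(u) \approx \sum_{k=1}^K \gamma_{lk} B_k(u)$. The B-spline estimator of the coefficient functions in model (8) is the solution of
$$\min_{\gamma} \sum_{i=1}^n \Big( Y_{i\hat G} - \sum_{l=1}^p \sum_{k=1}^K \gamma_{lk} B_k(U_i) X_{il} \Big)^2 \tag{9}$$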
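For convenience of exposition, write
$B(u) = (B_1(u), \dots, B_K(u))^T$,
$\gamma_l = (\gamma_{l1}, \dots, \gamma_{lK})^T$, $\gamma = (\gamma_1^T, \dots, \gamma_p^T)^T$,
$D_i = X_i \otimes B(U_i)$, $D = (D_1, \dots, D_n)^T$,
$Y_{\hat G} = (Y_{1\hat G}, \dots, Y_{n\hat G})^T$;
then $\sum_{l=1}^p \sum_{k=1}^K \gamma_{lk} B_k(U_i) X_{il} = D_i^T \gamma$, and Formula (9) can be transformed into the following minimization problem:
$$\min_{\gamma} \| Y_{\hat G} - D\gamma \|^2 \tag{10}$$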
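Using the least-squares method, with $D$ the $n \times pK$ matrix whose i-th row is $(X_i \otimes B(U_i))^T$ and $B(u)$ the vector of B-spline basis functions, the estimator of $\gamma$ is
$$\hat\gamma = (D^T D)^{-1} D^T Y_{\hat G}.$$
The estimator of the l-th coefficient function, $l = 1, \dots, p$, is
$$\hat\beta_l(u) = B(u)^T \hat\gamma_l.$$
Then the estimator of the coefficient function vector is
$$\hat\beta(u) = (I_p \otimes B(u)^T)\, \hat\gamma = (I_p \otimes B(u)^T)(D^T D)^{-1} D^T Y_{\hat G} \tag{11}$$
where $I_p$ is the $p \times p$ identity matrix and $\otimes$ is the Kronecker product of matrices.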
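The B-spline least-squares estimator can be illustrated with a short sketch. It assumes clamped knots on a known interval, a basis evaluated by the Cox-de Boor recursion, and a design matrix whose i-th row is the Kronecker product of the covariate vector with the basis vector at $U_i$. The code is parameterised by the spline degree (order minus one). All function names and the simulated data are hypothetical and do not come from the paper.

```python
import numpy as np

def clamped_knots(a, b, n_interior, degree):
    """Knot vector on [a, b] with repeated end knots."""
    inner = np.linspace(a, b, n_interior + 2)
    return np.concatenate([np.full(degree, a), inner, np.full(degree, b)])

def bspline_basis(u, knots, degree):
    """Evaluate all B-spline basis functions at u via the Cox-de Boor recursion."""
    u = np.atleast_1d(np.asarray(u, dtype=float))
    t = np.asarray(knots, dtype=float)
    n0 = len(t) - 1
    B = np.zeros((len(u), n0))
    for k in range(n0):  # degree-0 indicators of [t_k, t_{k+1})
        B[:, k] = (t[k] <= u) & (u < t[k + 1])
    last = max(k for k in range(n0) if t[k] < t[k + 1])
    B[u == t[-1], last] = 1.0  # include the right end point
    for d in range(1, degree + 1):
        Bn = np.zeros((len(u), n0 - d))
        for k in range(n0 - d):
            left = t[k + d] - t[k]
            right = t[k + d + 1] - t[k + 1]
            if left > 0:
                Bn[:, k] += (u - t[k]) / left * B[:, k]
            if right > 0:
                Bn[:, k] += (t[k + d + 1] - u) / right * B[:, k + 1]
        B = Bn
    return B

def fit_varying_coefficients(y, x, u, knots, degree):
    """Least-squares fit; returns the p x K coefficient array and the design matrix."""
    B = bspline_basis(u, knots, degree)  # n x K
    n, p = x.shape
    K = B.shape[1]
    D = (x[:, :, None] * B[:, None, :]).reshape(n, p * K)  # row i = X_i kron B(U_i)
    gamma, *_ = np.linalg.lstsq(D, y, rcond=None)
    return gamma.reshape(p, K), D

def evaluate_beta(gamma, u_new, knots, degree):
    """Estimated coefficient functions at u_new, one column per covariate."""
    return bspline_basis(u_new, knots, degree) @ gamma.T

# Simulated noise-free data with beta_1(u) = 1 + 2u and beta_2(u) = u^2.
rng = np.random.default_rng(0)
n, degree = 50, 3
u = rng.uniform(0.0, 1.0, n)
x = np.column_stack([np.ones(n), rng.normal(size=n)])
y = x[:, 0] * (1.0 + 2.0 * u) + x[:, 1] * u**2
knots = clamped_knots(0.0, 1.0, n_interior=2, degree=degree)
gamma_hat, D = fit_varying_coefficients(y, x, u, knots, degree)
u_grid = np.linspace(0.0, 1.0, 5)
beta_hat = evaluate_beta(gamma_hat, u_grid, knots, degree)
```

In an application, `y` would be the transformed responses rather than simulated values, and the number of interior knots would have to be chosen by the analyst.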
4. The Local Influence of the Model
4.1. Weighted Perturbation Model
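Suppose that $\omega = (\omega_1, \dots, \omega_n)^T$ is a vector of case weights, with $\omega_0 = (1, \dots, 1)^T$ corresponding to the unperturbed model. With $D_i^T$ the i-th row of the design matrix $D$, the weighted perturbation model has the objective function
$$L(\gamma \mid \omega) = \sum_{i=1}^n \omega_i \left( Y_{i\hat G} - D_i^T \gamma \right)^2 \tag{12}$$
Substituting this result into (3) yields
$$\Delta = -2\, D^T \mathrm{diag}(\hat e) \tag{13}$$
where $\hat e = Y_{\hat G} - D\hat\gamma$ is the residual vector, and the matrix of second derivatives of $L(\gamma)$ with respect to $\gamma$
is given by
$$\ddot L = 2\, D^T D \tag{14}$$
Substituting (13) and (14) into (4), we obtain the corresponding influence matrix
$$F_w = 2\, \mathrm{diag}(\hat e)\, D (D^T D)^{-1} D^T\, \mathrm{diag}(\hat e) \tag{15}$$
Here $d_w$, the eigenvector of $F_w$ associated with the eigenvalue of largest absolute value, denotes the direction of maximal influence curvature.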
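A minimal numerical sketch of the case-weight diagnostics follows. It assumes a least-squares objective, for which the influence matrix takes the form $2\,\mathrm{diag}(\hat e)\,H\,\mathrm{diag}(\hat e)$, with $H$ the hat matrix of the design matrix and $\hat e$ the residual vector. This form is an assumption of the sketch, and the function name and toy data (a straight line with one shifted response) are hypothetical.

```python
import numpy as np

def case_weight_influence(D, y):
    """Diagonal of F_w and direction d_w for a least-squares fit of y on D."""
    D = np.asarray(D, dtype=float)
    y = np.asarray(y, dtype=float)
    gamma, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ gamma
    H = D @ np.linalg.pinv(D)  # hat (projection) matrix
    # Assumed form: F_w = 2 * diag(resid) @ H @ diag(resid), written elementwise.
    F_w = 2.0 * resid[:, None] * H * resid[None, :]
    eigvals, eigvecs = np.linalg.eigh((F_w + F_w.T) / 2.0)
    d_w = eigvecs[:, int(np.argmax(np.abs(eigvals)))]
    return np.diag(F_w), d_w

# Hypothetical toy data: exact line y = 1 + 2x, with observation 7 shifted by 10.
x = np.arange(10.0)
D = np.column_stack([np.ones(10), x])
y = 1.0 + 2.0 * x
y[7] += 10.0
diag_w, d_w = case_weight_influence(D, y)
```

In this toy case both the diagonal values and the absolute components of `d_w` peak at the shifted observation, which is the behaviour the index plots of such diagnostics are meant to reveal.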
4.2. Response Variable Perturbation Model
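Suppose that $\omega = (\omega_1, \dots, \omega_n)^T$ is a vector of small shifts added to the transformed responses, with $\omega_0 = (0, \dots, 0)^T$ corresponding to the unperturbed model. With $D_i^T$ the i-th row of the design matrix $D$, the response variable perturbation model has the objective function
$$L(\gamma \mid \omega) = \sum_{i=1}^n \left( Y_{i\hat G} + \omega_i - D_i^T \gamma \right)^2 \tag{16}$$
Substituting this result into (3) yields
$$\Delta = -2\, D^T \tag{17}$$
The matrix of second derivatives of $L(\gamma)$ with respect to $\gamma$ is given by
$$\ddot L = 2\, D^T D \tag{18}$$
Substituting (17) and (18) into (4), we obtain the corresponding influence matrix
$$F_r = 2\, D (D^T D)^{-1} D^T \tag{19}$$
Here $d_r$, the eigenvector of $F_r$ associated with the eigenvalue of largest absolute value, denotes the direction of maximal influence curvature.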
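For the response perturbation, a hedged sketch is given under the assumption that the shift is added directly to the transformed responses and that the objective is the residual sum of squares. Under that assumption the influence matrix is twice the hat matrix of the design matrix, so its diagonal values are twice the leverages. Because a multiple of a projection matrix has equal nonzero eigenvalues, a numerically computed leading eigenvector should be read with care in this setting, and the sketch reports only the diagonal. The function name and toy design are hypothetical.

```python
import numpy as np

def response_influence_diag(D):
    """Diagonal of 2 * H, where H is the hat matrix of D (full column rank assumed)."""
    D = np.asarray(D, dtype=float)
    Q, _ = np.linalg.qr(D)         # reduced QR: columns of Q span the columns of D
    leverage = np.sum(Q**2, axis=1)  # h_ii = squared norm of the i-th row of Q
    return 2.0 * leverage

# Hypothetical toy design: intercept and slope, with one remote design point.
x = np.array([0.0, 1.0, 2.0, 3.0, 20.0])
D = np.column_stack([np.ones(5), x])
diag_r = response_influence_diag(D)
```

The remote design point receives the largest diagonal value, and the values sum to twice the number of columns of the design matrix.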
5. An Illustrative Example
(Malignant Tumour Data) We now consider an example to illustrate the above results. The data come from a clinical research trial (see Ref. [4]): 205 cancer patients were treated at Odense University Hospital and followed until the end of 1977. The survival times of some individuals were censored because of death from other causes or because the trial ended. Ref. [11] used a linear semi-parametric model to fit these data. We used the varying-coefficient model to fit the data of 57 patients, where $U$ denotes the thickness of the tumour and $X$ denotes the sex (1 is male, 0 is female). Considering that there was a relation between the thickness of the tumour and the sex, we supposed that there was a relation between the coefficient and the thickness $U$. Hence, we used the varying-coefficient model to analyze these data. The results are shown in Table 1 and Figures 1-4.
Figure 1. The direction of maximal influence curvature $d_{wj}$.
Figure 2. The diagonal value of influence matrix $F_{wj}$.
Figure 3. The diagonal value of influence matrix $F_{rj}$.
Figure 4. The direction of maximal influence curvature $d_{rj}$.
Figures 1 and 2 show that the first and the fourth observations are outliers, and Figures 3 and 4 lead to the same conclusion. Indeed, the diagnostic based on the diagonal values of the influence matrix agrees with the one based on the direction of maximal influence curvature, and this result is similar to that of Li Yali [12].