1. Introduction
Consider a lifetime data model in which the event of interest is a failure (or death) due to one of m possible causes, where m is a nonzero integer. By convention, the index 0 corresponds to the state of functioning (or of life) of the observed individual. It is assumed that the observation stops when a failure (or death) occurs, but the observation may be right-censored in a non-informative way. Examples of such censoring include the occurrence of the event from another cause, the withdrawal of the individual from the study, or the end of the study. Under right censoring, the failure times of the censored individuals and their causes are not known to the experimenter. A data model of this kind is commonly called a “competing risks model” and is studied in fields such as medical research, demography, actuarial science, economics and industrial reliability. Andersen et al. [4] give an illustration and detailed mathematical techniques for competing risks in biomedical applications. For example, in the study of AIDS, the competing risks can be 1) death due to AIDS, 2) death due to tuberculosis or 3) death due to other causes, in which case m = 3 (see Figure 1).
It is important to note that in most competing risks models, the functions that characterize the probability distribution of the variable of interest and the marginal distributions are not always identifiable (see Tsiatis [5], Heckman and Honoré [6]). The issues to be addressed include the estimation of the underlying functions for the different causes and of the effects of covariates on the rates of occurrence of the competing risks. One difficulty is that the cause of failure of an observed individual may only be known after an autopsy, while nothing is known about the individuals censored during follow-up. In addition, the cause-specific incidence distributions do not satisfactorily describe the marginal probabilities of failure from the various causes in competing risks models. Independence assumptions on the competing risks can help ensure identifiability in some cases, but they are not always reasonable in such models.
1.1. Related Works
The Nelson-Aalen and Kaplan-Meier [3] estimators are generally studied in the literature following two approaches: first, the martingale method (Aalen [1] [2]; Andersen et al. [4]; Fleming and Harrington [7]; Prentice et al. [8]) and second, the law of the iterated logarithm (Breslow and Crowley [9], Földes and Rejtö [10] or Major and Rejtö [11], Földes and Rejtö [12], Gill [13], Csörgö and Horváth [14], Ying [15] and Chen and Lo [16]). Recently, applications have been made in the context of competing risks (Latouche [17]; Belot [18]). Latouche [17] states that, when planning clinical trials, evaluating the number of patients to include is a critical issue because no such formula exists for the Fine and Gray [19] model. He therefore computes the number of patients required in the competing risks setting for inference based on the cumulative incidence function, and then studies the properties of the Fine and Gray model when it is misspecified. Belot [18] presents data obtained from randomized clinical trials on prostate cancer patients who died from several causes.
1.2. Contributions
In this paper, the stochastic processes developed by Aalen [1] [2] are adapted to the Nelson-Aalen and Kaplan-Meier [3] estimators in a context of competing risks (e.g. Aalen and Johansen [20], Andersen et al. [4]). We focus only on the probability distributions of the failure times of individuals whose causes are known, which leads us to consider a partition of the individuals into subgroups, one for each cause. We provide a new proof of the consistency of the Nelson-Aalen estimator in the context of competing risks by using the martingale method. Under regularity assumptions on a sequence of integers that depends on the number of observations, we obtain an almost sure rate of convergence for the Kaplan-Meier estimator [3] which is the same as that obtained by Giné and Guillou [21].
The rest of the paper is organized as follows: Section 2 describes preliminary results and the notation used in the paper, and Section 3 evaluates the conditional distribution functions for the specific causes. Section 4 contains the main results of the paper as well as some properties of the estimators obtained. The last section concludes the paper.
2. Preliminary Results
Lifetime analysis (also referred to as survival analysis) is the area of statistics that focuses on analyzing the time duration between a given starting point and a specific event. This endpoint is often called failure and the corresponding length of time is called the failure time or survival time or lifetime.
Figure 1. Example of a model with 3 competing risks.
Formally, a failure time is a nonnegative random variable (r.v.) that describes the length of time from a time origin until an event of interest occurs. We will suppose throughout that
The most basic quantities used to summarize and describe the time elapsed from a starting point until the occurrence of an event of interest are the distribution function and the hazard function. The cumulative distribution function $F$ at time $t$, also called the lifetime distribution or the failure distribution, is the probability that the failure time $X$ of an individual is less than or equal to $t$. It is given for $t \ge 0$ by:
$$F(t) = P(X \le t).$$
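The function $F$ is right-continuous, nondecreasing and satisfies $F(t) = 0$ for $t < 0$ and $\lim_{t \to \infty} F(t) = 1$. We denote by $F_-$ the left-continuous function obtained from $F$ in the following way:
$$F_-(t) = \lim_{s \uparrow t} F(s).$$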
The distribution of $X$ may equivalently be dealt with in terms of the survival function $S$, which is given, for $t \ge 0$, by:
$$S(t) = 1 - F(t) = P(X > t).$$
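The cumulative hazard function $\Lambda$ is defined for $t \ge 0$ by:
$$\Lambda(t) = \int_0^t \frac{dF(s)}{1 - F_-(s)}.$$
When $F$ is continuous, the relation $\Lambda(t) = -\log(1 - F(t))$ is valid for all $t \ge 0$ such that $F(t) < 1$. We can then call $-\Lambda$ the log-survival function.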
If $F$ admits a derivative with respect to Lebesgue measure on $[0, \infty)$, the probability density function $f$ exists and is defined for $t \ge 0$ by:
$$f(t) = \frac{dF(t)}{dt}.$$
Heuristically, the function $f$ may be seen as the instantaneous probability of experiencing the event.
Under the same differentiability hypothesis, the hazard function $\lambda$ exists and is defined for $t \ge 0$ such that $F(t) < 1$ by:
$$\lambda(t) = \frac{f(t)}{1 - F(t)}.$$
The quantity $\lambda(t)\,dt$ can be interpreted as the instantaneous probability that an individual dies at time $t$, conditionally on having survived until that time.
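As a numerical illustration of the definitions above, the following minimal Python sketch checks that integrating the hazard function recovers minus the log-survival function for a continuous lifetime. The Weibull distribution, its parameters and all function names are assumptions chosen for illustration only; they do not come from the paper.

```python
import math

# Assumed example (not from the paper): Weibull lifetime with shape k, scale b.
def weibull_cdf(t, k=1.5, b=2.0):
    return 1.0 - math.exp(-((t / b) ** k))

def weibull_pdf(t, k=1.5, b=2.0):
    return (k / b) * (t / b) ** (k - 1) * math.exp(-((t / b) ** k))

def hazard(t, k=1.5, b=2.0):
    # Hazard function: density divided by survival function.
    return weibull_pdf(t, k, b) / (1.0 - weibull_cdf(t, k, b))

def cumulative_hazard(t, k=1.5, b=2.0, steps=20000):
    # Midpoint-rule approximation of the integral of the hazard over [0, t].
    h = t / steps
    return sum(hazard((i + 0.5) * h, k, b) for i in range(steps)) * h

t = 1.7
lam_numeric = cumulative_hazard(t)
lam_closed = -math.log(1.0 - weibull_cdf(t))   # minus the log-survival
print(lam_numeric, lam_closed)
```

The two printed values should agree up to the quadrature error, which is what the log-survival relation predicts when the distribution function is continuous.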
For an extensive introduction to lifetime analysis, the reader is referred e.g. to the books of Cox and Oakes [22] and Kalbfleisch and Prentice [23] .
The main difficulty in the analysis of lifetime data lies in the fact that the actual failure times of some individuals may not be observed. An observation is right-censored if it is known to be greater than a certain value while its exact value is unknown. Let $C$ be the nonnegative r.v. with distribution function $G$ that stands for the censoring time of the individual. As before, the nonnegative r.v. $X$ with distribution function $F$ denotes the failure time of the individual. If $X$ is censored, instead of $X$ we observe $C$, which gives the information that $X$ is greater than $C$. In any case, the observable r.v. consists of $(Z, \delta) = (\min(X, C), \mathbf{1}\{X \le C\})$, where $\mathbf{1}\{\cdot\}$ denotes the indicator function. The nonnegative r.v. $Z$ stands for the observed duration, which may correspond either to the event of interest ($\delta = 1$) or to a censoring time ($\delta = 0$).
In what follows, it is assumed that the failure time $X$ and the censoring time $C$ are independent. Consequently, the random variable $Z = \min(X, C)$ has the distribution function $H$ given by
$$1 - H(t) = (1 - F(t))(1 - G(t)).$$
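The following subdistribution functions of $H$ will be needed:
$$H^{(0)}(t) = P(Z \le t,\ \delta = 0)$$
and
$$H^{(1)}(t) = P(Z \le t,\ \delta = 1).$$
The relation
$$H(t) = H^{(0)}(t) + H^{(1)}(t)$$
is valid for any $t \ge 0$.
The relations that connect the subdistribution functions $H^{(0)}$ and $H^{(1)}$ to the distribution functions $F$ and $G$ are given by:
$$H^{(0)}(t) = \int_0^t (1 - F(s))\,dG(s)$$
and
$$H^{(1)}(t) = \int_0^t (1 - G_-(s))\,dF(s),$$
where $G_-(s) = \lim_{u \uparrow s} G(u)$.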
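The cumulative hazard function of $X$ can be expressed as:
$$\Lambda(t) = \int_0^t \frac{dH^{(1)}(s)}{1 - H_-(s)},$$
where $H_-$ denotes the left-continuous version of $H$.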
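The relations of this subsection can be checked by simulation. The sketch below assumes the usual definitions, namely an observed time Z = min(X, C), an indicator delta = 1 when X <= C, and subdistribution functions obtained by splitting P(Z <= t) according to delta. The exponential distributions, the rates and the function names are illustrative assumptions, not taken from the paper.

```python
import math
import random

def simulate_censored(n, rate_x=1.0, rate_c=0.5, seed=0):
    """Draw n pairs (Z, delta) with Z = min(X, C) and delta = 1 if X <= C."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        x = rng.expovariate(rate_x)   # failure time X
        c = rng.expovariate(rate_c)   # censoring time C, independent of X
        sample.append((min(x, c), 1 if x <= c else 0))
    return sample

def empirical_subdistributions(sample, t):
    """Empirical versions of H(t), H0(t) (censored) and H1(t) (uncensored)."""
    n = len(sample)
    h0 = sum(1 for z, d in sample if z <= t and d == 0) / n
    h1 = sum(1 for z, d in sample if z <= t and d == 1) / n
    h = sum(1 for z, _ in sample if z <= t) / n
    return h, h0, h1

sample = simulate_censored(100_000)
t = 0.8
h, h0, h1 = empirical_subdistributions(sample, t)

# Closed forms for independent exponentials with rates 1.0 and 0.5.
h_theory = 1.0 - math.exp(-1.5 * t)      # 1 - (1 - F(t)) (1 - G(t))
h1_theory = (1.0 / 1.5) * h_theory       # integral of (1 - G) dF over [0, t]
print(h, h0 + h1, h_theory, h1, h1_theory)
```

The empirical H splits exactly into its two subdistribution parts, while the comparison with the closed forms holds only up to Monte Carlo error.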
Kaplan and Meier [3] introduced the product-limit estimator for the survival distribution function. The estimator of the cumulative hazard function is the Nelson-Aalen estimator introduced by Nelson [24] [25] and generalized by Aalen [1] [2] .
Let $(Z_i, \delta_i)$, for $i = 1, \ldots, n$, be independent copies of the random vector $(Z, \delta)$. Let $Z_{1,n} \le \cdots \le Z_{n,n}$ be the order statistics associated to the sample $Z_1, \ldots, Z_n$. If there are ties between a failure time (or several failure times) and a censoring time, then the failure time(s) is (are) ranked ahead of the censoring time(s).
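We define the empirical counterparts of $H$, $H^{(0)}$ and $H^{(1)}$ by:
$$H_n(t) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{Z_i \le t\}, \quad H_n^{(0)}(t) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{Z_i \le t,\ \delta_i = 0\}, \quad H_n^{(1)}(t) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{Z_i \le t,\ \delta_i = 1\}.$$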
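The Kaplan-Meier product-limit estimator $F_n$ of $F$ is defined for $t \ge 0$ by:
$$1 - F_n(t) = \prod_{i:\, Z_{i,n} \le t} \left(1 - \frac{\delta_{i,n}}{n - i + 1}\right),$$
where $\delta_{i,n}$ denotes the censoring indicator attached to the order statistic $Z_{i,n}$.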
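The Nelson-Aalen estimator $\Lambda_n$ for $\Lambda$ is then defined for $t \ge 0$ by:
$$\Lambda_n(t) = \int_0^t \frac{dH_n^{(1)}(s)}{1 - H_{n-}(s)},$$
where $H_{n-}$ denotes the left-continuous version of the empirical distribution function $H_n$ of the observed times.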
The following relations are valid for
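where $G_n$, the Kaplan-Meier estimator of $G$, is defined for $t \ge 0$ by:
$$1 - G_n(t) = \prod_{i:\, Z_{i,n} \le t} \left(1 - \frac{1 - \delta_{i,n}}{n - i + 1}\right),$$
with $\delta_{i,n}$ the censoring indicator attached to the order statistic $Z_{i,n}$.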
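The two estimators of this section can be sketched in a few lines of Python. The code below uses the usual risk-set form (number of failures divided by number at risk at each distinct failure time), which is an assumption about the formulas intended here; the function name and the toy sample are hypothetical.

```python
def kaplan_meier_nelson_aalen(sample):
    """sample: list of (z, delta) with delta = 1 for a failure, 0 for censoring.

    Returns a list of (time, survival estimate, cumulative hazard estimate)
    evaluated at each distinct failure time.
    """
    data = sorted(sample)
    n = len(data)
    at_risk = n              # individuals with observed time >= current time
    surv, cumhaz = 1.0, 0.0
    out = []
    i = 0
    while i < n:
        t = data[i][0]
        deaths = removed = 0
        while i < n and data[i][0] == t:   # group tied observations
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk     # product-limit step
            cumhaz += deaths / at_risk         # Nelson-Aalen increment
            out.append((t, surv, cumhaz))
        at_risk -= removed
    return out

# Hypothetical toy sample with one tie at time 3.
toy = [(1, 1), (2, 0), (3, 1), (3, 1), (4, 0), (5, 1)]
estimates = kaplan_meier_nelson_aalen(toy)
for row in estimates:
    print(row)
```

Grouping tied observations before updating the risk set is one way to respect the convention that failures are ranked ahead of censoring times at the same instant.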
Let be a sequence of integers between and In order to always have asymptotical results, we suppose that the sequence satisfies the following hypothesis:
for large enough, the sequence is non-increasing and
for large enough, the sequence is non-increasing and there exists a constant such that with is a non-increasing sequence such that:
Condition is required when applying the results of Giné and Guillou [21] while Condition is required when applying the results of Csörgö [26].
The following theorem states a law of the iterated logarithm-type (LIL-type) result on the increasing intervals mentioned above.
Theorem 1 (Csörgö [26]; Giné and Guillou [21]) Let be a sequence of integers such that and, for the almost sure results, satisfying We have (see Note 1):
If, in addition, is assumed continuous, then we also have:
Proof. See Csörgö [26] ; Giné and Guillou [21] .
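To give a feeling for uniform results over random intervals, the sketch below computes the largest deviation between the Kaplan-Meier estimator and the true distribution function up to an upper order statistic. The form of the interval (from 0 to the order statistic of rank n - k), the choice k = floor(sqrt(n)), the exponential model and all names are assumptions made for illustration; the normalization appearing in the theorem is not reproduced.

```python
import math
import random

def simulate_censored(n, rate_x=1.0, rate_c=0.5, seed=3):
    """Pairs (Z, delta) with independent exponential failure and censoring."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        x, c = rng.expovariate(rate_x), rng.expovariate(rate_c)
        out.append((min(x, c), 1 if x <= c else 0))
    return out

def sup_km_error(sample, k, cdf):
    """Largest |F_n(t) - F(t)| over t in [0, Z_(n-k)], assuming no ties."""
    data = sorted(sample)
    n = len(data)
    surv = 1.0
    worst = 0.0
    for i in range(n - k):               # first n - k order statistics
        z, delta = data[i]
        # Deviation just before z (left limit of the step function).
        worst = max(worst, abs((1.0 - surv) - cdf(z)))
        if delta == 1:
            surv *= 1.0 - 1.0 / (n - i)  # n - i individuals still at risk
            # Deviation right after the jump at z.
            worst = max(worst, abs((1.0 - surv) - cdf(z)))
    return worst

n = 20_000
k = int(math.sqrt(n))
sample = simulate_censored(n)
err = sup_km_error(sample, k, lambda z: 1.0 - math.exp(-z))
print(k, err)
```

Because the estimator is a step function and the true distribution function is increasing, it is enough to compare the two at the observed times, before and after each jump.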
The continuity of is required to linearize the Kaplan-Meier process. Indeed, if is continuous, then can be approximated by on the random interval Precisely, we have the following result.
Proposition 1 (Giné and Guillou [21] ) Let be a sequence of integers satisfying and Hypothesis. If is continuous, then
Proof. See Giné and Guillou [21] .
3. Evaluation of the Conditional Functions of Distribution to the Specific Causes
Let be a continuous random variables representing respectively the lifetimes in each of the risks competing, be the set of index cause, where 0 corresponds to the condition of the individual observed, the random variable of the event of interest and the random variable case, where if for all is the distribution function of
the survival function such that the random variable C of the event censoring right, and for technical reasons, such that if (and) and if.
We notice that and are observable and is so only for uncensored.
We assume that the censoring is non-informative. The joint law is completely specified by the cause-specific incidence distributions defined by
(1)
which are none other than the sub-distributions of the specific cause of failure
The cumulative hazard rate of specific-cause corresponding to is given by
(2)
Let be n-sample of observable triplet where and , with and where represent the time that an individual is subject to the cause If and are independent, the random variable admits distribution function defined by Then the Nelson-Aalen estimator of is given for by (see e.g. in Andersen et al. [4] )
(3)
with
and where
(4)
is the counting of the number of failures observed in case of the time interval and
(5)
is the number of individuals in the sample observation that survive beyond time Thus, for any
(6)
represents the number of individuals at risk, that is, those who may still fail from a specific cause or be censored.
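The counting and at-risk quantities described above translate directly into a cause-specific Nelson-Aalen computation. In the sketch below, each observation is a pair (z, xi), where xi = 0 codes a censored time and xi = j >= 1 codes a failure from cause j; this coding, the function name and the toy data are assumptions made for illustration.

```python
def cause_specific_nelson_aalen(sample, cause, t):
    """Sum, over failure times s <= t from the given cause, of
    (number of failures from that cause at s) / (number at risk at s)."""
    total = 0.0
    failure_times = sorted({z for z, xi in sample if xi == cause and z <= t})
    for s in failure_times:
        d_j = sum(1 for z, xi in sample if z == s and xi == cause)  # jump of the counting process
        y = sum(1 for z, _ in sample if z >= s)                     # at risk just before s
        total += d_j / y
    return total

# Hypothetical toy sample: two causes and one censored observation.
toy = [(1, 1), (2, 2), (3, 0), (4, 1), (5, 2), (6, 1)]
na_1 = cause_specific_nelson_aalen(toy, cause=1, t=6)
na_2 = cause_specific_nelson_aalen(toy, cause=2, t=6)
print(na_1, na_2, na_1 + na_2)
```

On this toy sample the two cause-specific estimates add up to the all-cause Nelson-Aalen estimate, since every failure belongs to exactly one cause and the risk set is shared.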
An estimator analogous to (2), restricted to the subgroup of individuals failing from a given cause, is given by
(7)
and with and
The relation between the cumulative hazard rate and survival in the subgroup Aj is given by (see Note 2)
(8)
A nonparametric estimator of the lifetime distribution function in the subgroups is defined by
(9)
is given by
(10)
The size of the subgroup is not observable because not all cause-specific subgroups are accessible. Nevertheless, we can assign to each individual a probability of belonging to one of the subgroups. Thus, the size can be estimated (see e.g. Satten and Datta [27] or Datta and Satten [28]) using an estimator of the probability that a given individual of the sample belongs to the subgroup at risk for the specific cause. The final estimators of the cumulative hazard rate due to the specific cause and of the corresponding distribution function then have the respective expressions
(11)
and for
(12)
4. Main Results
Let be a positive random variable and be a censoring variable such that and In this model of random censorship, for a sample subject to a specific causes we can observe the couple where and with and where is the time that an individual is subject to the cause
For a given and an individual with the counting process is defined by:
Therefore, if an individual undergoes event before time then otherwise We can also define the counting process
The information accumulated over time is naturally modeled by a filtration, which expresses the fact that past information is contained in the current information. We therefore use the natural filtration where
For and for we have
If denotes the left limit at of we have
since the quantity takes only the values 0 and 1.
For a given we define the function
which indicates whether the individual is still at risk just before time (the individual has not yet undergone the event). Therefore
• if then, and
• if then,
where is the natural filtration (all the information available at time), and where the notation refers to the formal writing of the stochastic integral
which is well defined because is an increasing process. The expression of as a function of the counting process is given by
Thus, we have
The stochastic process defined for and by
is the martingale associated with the subject at risk Thereafter is the compensating process because it is the integral of the product of two predictable processes.
Theorem 2 Let be an absolutely continuous lifetime and be a censoring variable for any arbitrary distribution Let be the risk function associated with Let’s put and .
For the process defined by
is a martingale if and only if
for such that
Proof. See Breuils ([30] , p. 25) and Fleming and Harrington ([7] , p. 26).
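The martingale property can be illustrated by simulation in the simplest case. The sketch below assumes a constant hazard, a single-individual counting process equal to 1 once an uncensored failure has been observed by time t, and an at-risk indicator equal to 1 as long as the observed time has not been passed. Under these assumptions the compensator is the hazard multiplied by min(Z, t). The distributions, rates and function name are illustrative and not taken from the paper.

```python
import random

def mean_martingale_value(n, t, hazard_rate=1.0, censor_rate=0.5, seed=2):
    """Monte Carlo mean of M(t) = N(t) - integral_0^t Y(s) * hazard ds."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.expovariate(hazard_rate)    # failure time, constant hazard
        c = rng.expovariate(censor_rate)    # independent censoring time
        z = min(x, c)
        delta = 1 if x <= c else 0
        counting = 1 if (z <= t and delta == 1) else 0   # N(t)
        compensator = hazard_rate * min(z, t)            # hazard * time spent at risk
        total += counting - compensator
    return total / n

mean_m = mean_martingale_value(100_000, t=1.0)
# Same simulation with a deliberately wrong hazard in the compensator.
mean_wrong = mean_martingale_value(100_000, t=1.0) - 0.5 * 0.0  # placeholder kept at the true model
print(mean_m)
```

The printed mean should be close to zero, in line with a martingale started at zero; replacing the hazard in the compensator by a different value would shift this mean away from zero.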
For a given and a given, the expressions of, and are those of formulas (4), (5) and (2) respectively. Using these notations, we can directly obtain the following preliminary result:
Proposition 2 For a given and a given, the stochastic process defined by
(13)
is the martingale associated with the subject specific cause
Proof.
The martingale represents the difference between the number of failures due to a specific cause observed in the time interval, i.e., and the number of failures predicted by the model for the cause. This representation corresponds to the Doob-Meyer decomposition.
The first result of this paper concerns the consistency of the Nelson-Aalen estimator in the competing risks setting, based on the martingale approach.
Theorem 3 For such that we have
Proof.
where the expectation of the martingale (specific for cause) is equal to zero and where Indeed,
Hence the result.
Using the fact that
we have:
It follows that is an asymptotically unbiased estimator of Hence the result.
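The consistency statement can be illustrated numerically. The sketch below assumes two independent latent exponential lifetimes (one per cause) and independent exponential censoring, in which case the cause-specific cumulative hazard of cause 1 at time t is simply its rate multiplied by t. These modeling choices, the rates and the function names are assumptions for illustration only.

```python
import random

def simulate_competing_risks(n, rates=(0.6, 0.4), censor_rate=0.5, seed=1):
    """Return pairs (z, xi): xi = 0 if censored, otherwise the failing cause."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        latent = [rng.expovariate(r) for r in rates]   # one latent time per cause
        t = min(latent)
        cause = latent.index(t) + 1
        c = rng.expovariate(censor_rate)
        sample.append((t, cause) if t <= c else (c, 0))
    return sample

def nelson_aalen_cause(sample, cause, t):
    """Cause-specific Nelson-Aalen estimate at time t (continuous data, no ties)."""
    data = sorted(sample)
    n = len(data)
    total = 0.0
    for i, (z, xi) in enumerate(data):
        if z > t:
            break
        if xi == cause:
            total += 1.0 / (n - i)   # n - i individuals at risk at time z
    return total

t = 1.0
true_value = 0.6 * t                 # cumulative hazard of cause 1 in this model
errors = {}
for n in (200, 2_000, 20_000):
    estimate = nelson_aalen_cause(simulate_competing_risks(n), cause=1, t=t)
    errors[n] = abs(estimate - true_value)
    print(n, estimate, errors[n])
```

The estimation error at the largest sample size should be small, which is the behaviour that consistency describes; a single simulated path does not guarantee a monotone decrease across sample sizes.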
Our second LIL-type result provides almost sure and in probability rates of convergence of to for uniformly over the random increasing intervals. (See Deheuvels and Einmahl [31] [32] for very fine functional laws of the iterated logarithm, valid at a point or on a compact set strictly included in the support of H.) This result is consistent with that of Stute [33], which constitutes a compromise between the results of Breslow and Crowley [9], Földes and Rejtö [10] or Major and Rejtö [11], and those of Földes and Rejtö [12], Gill [13], Csörgö and Horváth [14], Ying [15] and Chen and Lo [16].
Following Giné and Guillou [34] , we say that a non-increasing sequence of numbers is regular if there exists a constant such that for all We denote by the following hypothesis:
for large enough, the sequence is regular non-increasing and there exists a constant such that with is a non-increasing sequence such that
Theorem 4 Let be a sequence of integers such that for all and which satisfies hypothesis for the almost-sure part. For all we assume that is continuous. Then,
where is the Landau symbol in the almost sure sense, and
where is the Landau symbol in probability.
Both results of the theorem above provide a rate, in probability, of uniform convergence of to for all over random increasing intervals
To prove Theorem 4, we draw on results from the theory of empirical processes; linearizing the Kaplan-Meier process requires a continuity condition on First, under Hypothesis we have the following result:
Lemma 1 Let be a sequence of integers such that and, for the almost-sure results, such that is satisfied. The rate of convergence of to is given by
Proof. The proof of this result follows straightforwardly from the proof of the first part of Theorem 1 concerning the supremum of
Proof of Theorem 4. The following decomposition is obtained for by means of integration by parts:
(14)
Equality (14) entails that:
Notice that the assumption of continuity of for ensures that is continuous according to proposition 1. We then conclude with Theorem 1 and Lemma 1.
5. Conclusion
In this paper, we have adapted the stochastic processes of Aalen [1] [2] to the Nelson-Aalen and Kaplan-Meier [3] estimators in a context of competing risks. We have focused particularly on the probability distributions of the failure times of individuals whose causes are known, which leads us to consider a partition of the individuals into subgroups, one for each cause. We have also provided some asymptotic properties of the nonparametric estimators obtained.
Acknowledgements
I would like to thank Prof. Nicolas Gabriel ANDJIGA, Prof. Celestin NEMBUA CHAMENI and Prof. Eugene Kouassi for their support and their advice. I would also especially like to thank Prof. Kossi Essona GNEYOU for his collaboration and cooperation during the preparation of this paper.
NOTES
1 is the Landau in almost sure sense and is the Landau in probability.
2 denote the product integral (see Gill & Johansen ).