1. Introduction
Distributed statistical inference has been widely discussed over the past decade as an effective method for analyzing large or dispersed data sets, and a substantial body of results has accumulated. Representative works include the following. On the theoretical side, Chen and Zhou [1] put forward a distributed Hill estimator and proved its oracle property; Volgushev et al. (2017) [2] proposed distributed inference for quantile regression processes, together with a method for assessing the efficiency of this inference at almost no additional computational cost. On the applied side, Mohammed et al. (2020) [3] proposed a technique for dividing a deep neural network (DNN) into multiple partitions, which reduces the total latency of DNN inference; Smith and Hollinger (2018) [4] proposed a distributed inference-based multi-robot exploration technique that uses the observed map structure to infer unobserved map features, shortening the cumulative exploration path length in their trials; Ye (2017) [5] studied the stability of the beta coefficient in the Chinese stock market and identified the best beta estimation window; and Mitra (2019) [6] used a smooth linear transfer function to measure the amplitude and direction of market movements, showing that the proposed classification better captures the asymmetric behavior of beta.
Market beta, also known as systematic risk or equity beta, measures a stock’s sensitivity to overall market movements. The development of market beta can be traced back to the early 20th century, when economists and financial analysts began to understand the importance of systematic risk in determining asset returns. One pioneering work in this area is Markowitz’s (1952) [7] paper on portfolio selection, which laid the foundation for modern portfolio theory and emphasized the importance of diversification. Black and Scholes (1973) [8] introduced the concept of beta to measure the systematic risk of individual securities. Banks with a higher beta are expected to suffer larger capital losses in the event of an extremely adverse shock to the financial system.
Estimating market beta involves analyzing historical data on a stock’s returns and their correlation with market returns. In the standard econometric model, the stock’s returns are regressed on the market returns over a specific time period; this approach has been used to evaluate the beta of financial returns on commodities and currencies (Atanasov and Nitschka (2014) [9]; Lettau et al. (2013) [10]), stocks (Post and Versijp (2004) [11]), and active trading strategies (Mitchell and Pulvino (2001) [12]). However, in extreme cases the conditional regression is based on a small number of tail observations, which may produce a relatively large variance of the estimator; moreover, financial market data are mostly heavy-tailed, which may further increase the error. To avoid these problems, Oordt and Zhou (2017) [13] proposed a new method to estimate market
.
Let X and Y be continuous random variables with distribution functions
and
, respectively. Assume that
and
are heavy-tailed with tail indices
and
, respectively. This means that
(1.1)
where
and
are slowly varying functions as
. Let
with small
, the relation between Y and X under extreme X is given by
(1.2)
is the error term, assumed to be independent of X under the condition
.
To construct an estimator by the EVT method, we consider the following tail dependence measure from multivariate EVT (see, e.g., Hult and Lindskog (2002) [14]),
(1.3)
where
denotes the quantile function of Y defined as
. We also assume the usual second-order condition (see, e.g., de Haan and Stadtmüller (1996) [15]) for X, which quantifies the speed of convergence in this relation as
(1.4)
where
and
,
.
Suppose (1.4) holds; under the linear model in (1.2), with
,
, the following conclusion is given in Oordt and Zhou (2017) [13] :
(1.5)
Naturally, given independent and identically distributed (i.i.d.) observations
with the i.i.d. unobserved error terms
, we mimic the limit procedure
by considering only the lowest k observations in the tail region, such that
and
as
, Oordt and Zhou (2017) [13] give an estimator of
as
(1.6)
To prove asymptotic normality, the second-order condition for Y is given by:
(1.7)
where
is an eventually positive or negative function,
,
. Then, Drees and Huang (1998) [16] define
. For this dependence structure, we assume that,
as
for some positive function
, with a speed of convergence as follows: there exists a
for which, as
,
(1.8)
for all
. From this we readily obtain
(1.9)
Assuming that conditions (1.4), (1.7) and (1.8) hold, and supposing
, where
Oordt and Zhou (2017) [13] prove the asymptotic normality of
.
This estimation method can be used not only for assessing investment risk but also in banking (Oordt and Zhou (2018)) [17], insurance, and other fields. However, for reasons of confidentiality, banks may not share their operating losses with one another, and insurance companies cannot release individual observations in order to protect customer privacy. Banks and insurers can therefore only compute statistics on their own data and share the results, in such a way that individual records cannot be re-identified from the shared information. Distributed statistical inference is well suited to these situations: it analyzes data stored on multiple machines, typically via a divide-and-conquer algorithm that estimates the required parameters on each machine and transmits the results to a central machine, which combines them, usually by simple averaging, into a computationally feasible estimator.
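The divide-and-conquer scheme just described can be sketched in a few lines. This is an illustrative sketch only, not code from any of the cited works; `estimator` stands for an arbitrary per-machine statistic.

```python
import numpy as np

def divide_and_conquer(data, k, estimator):
    """Estimate on each of k machines, then average the k results."""
    machines = np.array_split(data, k)              # each machine keeps its own chunk
    local_estimates = [estimator(chunk) for chunk in machines]
    return np.mean(local_estimates)                 # only k scalars are shared

# Toy usage: each machine reports its sample mean; the center averages them.
rng = np.random.default_rng(0)
sample = rng.standard_t(df=4, size=10_000)
pooled = divide_and_conquer(sample, k=20, estimator=np.mean)
```

With equal-sized machines, averaging the local sample means reproduces the full-sample mean exactly, which is the simplest instance of the scheme.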
The objective of this paper is to apply the divide-and-conquer idea to estimating market
. Suppose that independent and identically distributed (i.i.d.) observations
are distributed across k different machines, with m observations on each machine,
, and we assume as
,
(1.10)
We follow a divide-and-conquer algorithm: we first estimate
on each machine and then take the average over the k machines as the distributed estimator
for
,
(1.11)
Sorting the observations
in the j-th machine, we obtain the order statistics
, of which only the first d are selected to estimate
, where
,
as
. From (1.5), we have
(1.12)
where the tail index is estimated using the Hill estimator of Hill (1975) [18]:
.
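For concreteness, the Hill estimator can be computed from the top d order statistics as follows. This is a standard textbook sketch, stated for the upper tail (for the lower tail, apply it to −X); in the distributed setting it is applied within each machine.

```python
import numpy as np

def hill_estimator(x, d):
    """Hill (1975) estimator of the tail index alpha from the top d order statistics."""
    xs = np.sort(np.asarray(x, dtype=float))           # ascending order
    gamma_hat = np.mean(np.log(xs[-d:] / xs[-d - 1]))  # estimate of 1/alpha
    return 1.0 / gamma_hat

# Sanity check on a classical Pareto sample with P(X > x) = x**(-4):
rng = np.random.default_rng(1)
pareto = rng.pareto(4.0, size=100_000) + 1.0           # support [1, inf)
alpha_hat = hill_estimator(pareto, d=2_000)            # close to 4
```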
The estimator of the dependence measure is provided by multivariate EVT (see Embrechts et al. (2000) [19]), that is
where
is the
-th highest order statistic of
. Finally,
,
.
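For illustration, the per-machine estimate and the averaging step (1.11) can be sketched as follows. This is only a sketch under explicit assumptions: we assume the single-machine estimator combines the Hill estimate, the nonparametric tail dependence estimate, and the ratio of empirical tail quantiles in the form τ̂^(1/α̂) · Q̂_Y / Q̂_X, written here in an upper-tail convention; the function names `beta_machine` and `beta_distributed` are ours, not the paper's.

```python
import numpy as np

def beta_machine(x, y, d):
    """Sketch of a single-machine EVT beta estimate from the top d tail points.

    Assumes the structure tau_hat**(1/alpha_hat) * (Q_Y / Q_X), with the tail
    quantiles estimated by upper order statistics (upper-tail convention).
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    m = len(x)
    xs, ys = np.sort(x), np.sort(y)
    qx, qy = xs[m - d], ys[m - d]                      # empirical tail quantiles
    # Hill estimate of 1/alpha for X from its top d order statistics.
    gamma = np.mean(np.log(xs[-d:] / xs[-d - 1]))
    # Nonparametric tail dependence: fraction of joint exceedances, rescaled.
    tau = np.mean((x >= qx) & (y >= qy)) * m / d
    return tau ** gamma * qy / qx                      # tau^(1/alpha_hat) * Q_Y/Q_X

def beta_distributed(x, y, k, d):
    """Divide-and-conquer step (1.11): average the k machine-level estimates."""
    xs, ys = np.array_split(x, k), np.array_split(y, k)
    return np.mean([beta_machine(xj, yj, d) for xj, yj in zip(xs, ys)])
```

With Y = 2X exactly, every machine recovers β = 2, which provides a quick sanity check of the implementation.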
When
, we require some additional conditions to ensure the asymptotic normality of the Hill estimator:
(1.13)
Suppose there exists a sequence
as
such that, for sufficiently large n, we have
, which implies that the linear model in (1.2) applies for sufficiently large n.
The remainder of this paper is organized as follows. Section 2 provides the main results; the finite-sample behavior of
are considered in Section 3; all proofs are deferred to Section 4.
2. Main Innovations and Results
The innovations of this paper are:
· A new beta estimator is proposed using the distributed idea under extreme market conditions, where observations are scarce and heavy-tailed.
· The numerical simulations consider contaminated data; the estimator achieves the expected performance, making the method more tolerant of imperfect data.
The results of this paper are:
· The consistency of
.
Theorem 2.1. Under the linear tail model in (1.2), assume that (1.1) and (1.4) hold, and that (1.13) holds when
. Then, as
,
.
· The asymptotic normality of
.
Theorem 2.2. Assume that the conditions in Theorem 2.1 hold,
as
. Suppose both (1.7) and (1.8) hold,
. Further assume that
, where
Then, as
,
3. Simulation
We conduct two sets of simulations to demonstrate the finite sample performance of the distributed beta estimator
. For each simulation, we consider three linear models, that is,
. We generate samples of size n = 10,000. Based on r = 1000 repetitions, we obtain the finite sample squared bias, variance and Mean Squared Error (MSE) of our estimator.
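The reported quantities follow the standard decomposition MSE = squared bias + variance over the r repetitions; a minimal sketch of this computation (the helper name is ours):

```python
import numpy as np

def mse_decomposition(estimates, true_value):
    """Squared bias, variance and MSE of a vector of repeated estimates."""
    estimates = np.asarray(estimates, dtype=float)
    bias2 = (estimates.mean() - true_value) ** 2   # squared bias
    var = estimates.var()                          # variance across repetitions
    return bias2, var, bias2 + var                 # MSE = bias^2 + variance

# Sanity check: the MSE equals the mean squared deviation from the true value.
est = np.array([1.9, 2.1, 2.05, 1.95])
b2, v, mse = mse_decomposition(est, 2.0)
```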
3.1. Comparison for Different Levels of d
In the first set of simulations, we vary the level of d in the distributed beta estimator to verify the theoretical results on the oracle property. The oracle sample
contains n = 10,000 observations stored in k machines with m observations each. We fix k = 20 and m = 500 and compare the finite sample performance of the distributed beta estimator with that of the oracle beta estimator for different values of d. Since the Student’s t-distribution is known to be heavy-tailed with tail index equal to its degrees of freedom, we simulate X and ε by random draws from a Student’s t-distribution with four degrees of freedom. According to Lemma 1.3.1 in Embrechts et al. (1997) [20], the sum of two heavy-tailed random variables is also heavy-tailed, and the tail index of the sum is determined by the smaller of the two tail indices. The observations for Y are then constructed by combining the simulated X and
ε, which guarantees that Y is also heavy-tailed and
.
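The data-generating process just described can be reproduced as follows (a sketch of our own; β is set to 1 here for illustration, whereas the simulations use three linear models with different values of β):

```python
import numpy as np

rng = np.random.default_rng(2024)          # arbitrary seed
n, beta = 10_000, 1.0                      # beta varies across the three models
x = rng.standard_t(df=4, size=n)           # heavy-tailed regressor, tail index 4
eps = rng.standard_t(df=4, size=n)         # heavy-tailed error, tail index 4
y = beta * x + eps                         # linear tail model (1.2); Y keeps tail index 4
```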
The first column of Figure 1 compares the MSE of the distributed beta estimator
and the oracle beta estimator
. First, the MSE gradually decreases as β increases. Theoretically, τ increases with β, while the MSE decreases as τ increases, so the simulation results agree with the theory. Second, the second and third columns of Figure 1 decompose the MSE into squared bias and variance. We observe a bias-variance trade-off for both estimators: as d increases, the bias increases while the variance decreases. When the number of tail observations is small, the variance of the oracle beta estimator is smaller than that of the distributed beta estimator; as the number of observations increases, the two variances become equal, in line with Theorem 2.2.

Figure 1. Finite sample performance of the distributed beta estimator and the oracle beta estimator for different levels of d. The blue lines report the simulation results for the distributed EVT approach; the yellow lines report those for the EVT approach.
3.2. Contaminated Data
In the second set of simulations, we examine whether the distributed estimator retains good properties when the data are contaminated. We simulate three cases: X contaminated, ε contaminated, and both X and ε contaminated. The total number of observations is unchanged; that is, n = 10,000 observations are divided across k = 20 machines with m = 500 observations each.
Figure 2. X is contaminated: finite sample performance of the distributed beta estimator and the oracle beta estimator for different levels of p. The blue lines represent the simulation results of the distributed beta estimator, and the red lines represent the corresponding oracle results.

Figure 2 shows the MSE, squared bias, and variance of the two estimators when X is contaminated. We again draw 10,000 observations of ε from a Student’s t-distribution with 4 degrees of freedom, while observations of X are drawn from a standard normal distribution with probability 0.1 and from a Student’s t-distribution with 4 degrees of freedom with probability 0.9; thus, on average, 1000 of the 10,000 observations are contaminated. We then sort the observations on each machine and use (1.12) to get
.
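The contamination mechanism used here (a standard normal draw with probability 0.1, a t(4) draw with probability 0.9) can be sketched as follows; the helper name is ours:

```python
import numpy as np

def contaminated_sample(n, p_contam=0.1, rng=None):
    """Draw from N(0,1) with probability p_contam and from t(4) otherwise."""
    if rng is None:
        rng = np.random.default_rng()
    contaminated = rng.random(n) < p_contam    # about p_contam * n contaminated draws
    clean = rng.standard_t(df=4, size=n)       # heavy-tailed "true" observations
    noise = rng.standard_normal(n)             # thin-tailed contaminating draws
    return np.where(contaminated, noise, clean)

x = contaminated_sample(10_000, rng=np.random.default_rng(7))
```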
The third column of Figure 2 shows the variance of the two estimators for different values of d, which is almost the same as in the uncontaminated case. When the number of tail observations is small, the variance of the distributed estimator is larger than that of the oracle estimator, and as d increases the variance approaches zero. In the first column, the MSE stays below 0.05, indicating good estimation performance.
Figure 3 shows the MSE, squared bias, and variance of the two estimators when ε is contaminated. We again draw 10,000 observations of X from a Student’s t-distribution with 4 degrees of freedom, while observations of ε are drawn from a standard normal distribution with probability 0.1 and from a Student’s t-distribution with 4 degrees of freedom with probability 0.9; thus, on average, 1000 of the 10,000 observations are contaminated. We then sort the observations on each machine and use (1.12) to get
. Obviously, Figure 3 is basically consistent with Figure 1, indicating that the choice of ε does not affect the properties of the estimator, in agreement with the theory that the random error may be thin-tailed.

Figure 3. ε is contaminated: finite sample performance of the distributed beta estimator and the oracle beta estimator for different levels of p. The blue lines represent the simulation results of the distributed beta estimator, and the red lines represent the corresponding oracle results.
Figure 4 shows the MSE, squared bias, and variance of the two estimators when both ε and X are contaminated. Observations of ε and X are each drawn from a standard normal distribution with probability 0.1 and from a Student’s t-distribution with 4 degrees of freedom with probability 0.9; thus, on average, 1000 of the 10,000 observations are contaminated. We then sort the observations on each machine and use (1.12) to get
. Similar to Figure 1, the results are consistent with the theory, indicating that the distributed estimator behaves comparably when the data are contaminated.
4. Proof
In order to prove the main results, we need the following two lemmas.
Lemma 4.1. Assuming that (1.3) and (1.7) hold,
,
, then as
, we have
Figure 4. Both ε and X are contaminated: finite sample performance of the distributed beta estimator and the oracle beta estimator for different levels of p. The blue lines represent the simulation results of the distributed beta estimator, and the red lines represent the corresponding oracle results.
Proof. According to Theorem 2.3.3 in de Haan and Ferreira (2006) [21], as
,
is a regularly varying function with index
, then
where
is a slowly varying function, since
, we have
, then,
similarly,
. □
Let
with
to be specified later. It is clear that for any
,
.
Lemma 4.2. Suppose
as
. Further assume that (1.4) and (1.7) hold,
, where
then as
,
(4.1)
(4.2)
(4.3)
Proof. Let
,
, such that
Notice that
, the choice of
is feasible. And we have that
and
. We first prove that as
,
.
From the heavy-tailed property of the distribution function of Y in (1.1), we obtain that
,
is a slowly varying function as
, then
Since
,
, we have
, together with
, as
, we have
(4.4)
Then we prove (4.1) first: Notice that
The penultimate step is based on (4.4). As
, since
, then,
Next, we prove (4.2): for some
, we write
The last step uses the condition that
. By (4.4), the denominator converges to
, which is positive and finite. Same as (4.1), we have
Then, we prove (4.3): by Lemma 4.1, we know that
. Recalling the second-order condition (1.4), we have
replacing ux and u by
and
, together with the fact that
, we get that
Compared with (4.3), since
The penultimate step uses Lagrange’s mean value theorem, where x is between 1 and
. Hence, (4.3) is proved since
. □
Proof of Theorem 2.1. Since
(4.5)
where,
here we show that
,
,
,
,
uniformly for
separately.
We first deal with
. For
in the neighborhood of (1, 1), denote
then
can be written as
and
is in the neighborhood of (1, 1). According to Corollary 2.2.2 in de Haan and Ferreira (2006) [21], as
,
where
,
. Combining with
, we have
, then, as
,
Hence, for any
, as
, we have
A similar relation for Y holds. Therefore, in order to prove that
, we will prove a more general result that
uniformly for all
,
for some
.
Since the tail dependence function
, denote
since the observations on different machines are independent and identically distributed, we have
Applying Chebyshev’s inequality, as
, we have
The penultimate step uses the convergence of
, that is (1.9), then as
,
.
Hence, what remains to be proved is that
holds uniformly for all
. If
, as
,
If
, applying Lemma 1 in Oordt and Zhou (2017) [13] with
directly gives that
holds uniformly for all
. We further simplify the denominator as follows: as
,
the last step uses the second order condition of X, that is (1.4).
From (1.5), as
, we get that
holds uniformly for
. In addition, as
,
holds uniformly for
. Combined with
, we get that
holds uniformly for
, as
. Hence, we proved that
, together with the consistency of
, we have
uniformly for
, thus,
.
Next, we deal with
. Note that the observations on different machines are independent and identically distributed; similarly to the proof in Oordt and Zhou (2017) [13], if
, then the consistency of
leads to
, as
; if
, to prove that as
, there is
, it suffices to prove
Theorem 3.2.5 in de Haan and Ferreira (2006) [21] guarantees the asymptotic normality of
under conditions (1.3) and (1.13): as
,
that is,
therefore, it only remains to prove that
.
If
,
, by (1.13),
, then,
If
, by (1.5), for sufficiently large n, we have
for some
and
, where the last step uses Potter’s inequality. Therefore,
. Thus,
,
uniformly for
.
For
, according to Theorem 2.2.1 in de Haan and Ferreira (2006) [21], as
, for
, we have
then
Since
,
The last step exploits the properties of slowly varying functions; then we have
then
uniformly for
.
For
, same as
, we know
, then
Finally, according to (1.5),
, then
.
From the above analysis,
□
Proof of Theorem 2.2. From (4.5),
, we analyze
separately. We deal with
first. By definition,
is a homogeneous function of the first degree. According to Lemma 2 in Oordt and Zhou (2017) [13] , for
, we have
. Thus, for
,
; for
,
, hence, the partial derivatives of R at the neighborhood of (1, 1) exist as
and
, where
denotes the partial derivatives of R with respect to x and y, respectively. Since the stable tail dependence function
, Theorem 2 in Section 2 of Huang (1992) [22] gives the asymptotic normality of
as
,
, we have
where
is a continuous zero-mean Gaussian process, its covariance is
and
;
;
, then we have
Let
then
the second step uses the Delta method, where
is between
and
. Thus,
, and
(4.6)
Next we deal with
. According to Theorem 3.2.5 in de Haan and Ferreira (2006) [21], the same Gaussian process also controls the convergence of the tail exponents, i.e.,
where
is the same zero-mean Gaussian process as above, as
,
. Using the Delta method, let
,
,
, we have
From Lemma 4.1, as
,
, then,
Let
thus,
and
(4.7)
For
. According to Theorem 4 in Chen et al. (2021) [23], we know that
let
, then
, and
(4.8)
Finally, we deal with
. Similar to
, we know that
using the Delta method, let
, then
,
,
Let
, then
and
(4.9)
Thus,
where
, combining (4.6), (4.7), (4.8) and (4.9), we have
Based on the expression for
, we get
, where
And
is independent and identically distributed across machines; by the Central Limit Theorem,
where,
Therefore, what remains to be proved is the following deterministic relation
According to (1.5), we know that
; by the Delta method, the above relation is equivalent to
Next, from (1.8) and (1.9), we have
, combined with
,
, we get that
what remains to be proved is
(4.10)
Notice that Lemma 1 in Oordt and Zhou (2017) [13] can be written as
(4.11)
when
and
. By (1.5), we have
, then, for any
, for sufficiently small p, we have
, then, for sufficiently large n,
. Thus, (4.11) is equivalent to
i.e., as
, we have
It remains to prove that the convergence rate is
. By referring to the set
, without loss of generality, let
,
, by Lemma 4.2,
We prove (4.10) by dealing with the three sets
. The limit relation in (4.3) implies that
and the limit relation in (4.1) implies that
For C1, since X and ε are independent, we have that
the limit relation (4.2) implies that
. Together with (4.3), we have
Since
, combining
,
and
, we have
then
Therefore,
□