New Probability Distributions in Astrophysics: II. The Generalized and Double Truncated Lindley
1. Introduction
The Lindley distribution, after [1] [2], has one parameter. In recent years it has been the subject of many generalizations. Among others, we mention a two-parameter version [3], a two-parameter weighted version [4], the generalized Poisson-Lindley distribution [5], the extended Lindley distribution [6] and the transmuted Lindley-geometric distribution [7]. Several generalizations of the Lindley distribution can be found in a recent review [8]. The Lindley distribution is useful in modeling biological data from grouped mortality studies [4] [9]. Its first application in astrophysics was to the initial mass function (IMF) for stars and to the luminosity function for galaxies [10]. The IMF is routinely modeled by the lognormal distribution, so the following question arises naturally: can a Lindley distribution, or one of its generalizations, be an alternative to the lognormal fit for the IMF? To answer this question, the paper is organized as follows. Section 2 reviews the notion of statistical sample and the Lindley distribution. Section 3 reviews five generalizations of the Lindley distribution. Section 4 introduces the double truncated Lindley distribution. Section 5 fits the six new Lindley distributions to four samples of stellar masses.
2. Preliminaries
We report some basic information on the adopted sample and on the original Lindley distribution with one parameter.
2.1. The Sample
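The experimental sample consists of the data $x_i$, with $i$ varying between 1 and $n$. The sample mean, $\bar{x}$, is
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i , \tag{1}$$
the unbiased sample variance, $s^2$, is
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2 , \tag{2}$$
and the sample $r$th moment about the origin, $\bar{x}_r$, is
$$\bar{x}_r = \frac{1}{n}\sum_{i=1}^{n} x_i^{\,r} . \tag{3}$$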
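Readers who want to reproduce the sample statistics can translate Equations (1)-(3) directly into a few lines of Python. This is a minimal sketch. The function name `sample_statistics` and the five masses are hypothetical: they were chosen only to exercise the formulas and are not data used in this paper.

```python
def sample_statistics(x, r):
    """Return the sample mean, the unbiased variance and the r-th moment about the origin."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(x) / n                                        # Equation (1)
    variance = sum((xi - mean) ** 2 for xi in x) / (n - 1)   # Equation (2): divisor n - 1
    moment_r = sum(xi ** r for xi in x) / n                  # Equation (3)
    return mean, variance, moment_r


# Hypothetical masses in solar units, used only to illustrate the formulas.
masses = [0.1, 0.25, 0.4, 0.8, 1.5]
mean, variance, third = sample_statistics(masses, 3)
print(f"mean = {mean:.4f}, s^2 = {variance:.4f}, third moment = {third:.4f}")
```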
2.2. The Lindley Distribution with One Parameter
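The Lindley probability density function (PDF) with one parameter, $f(x;c)$, is
$$f(x;c) = \frac{c^2}{c+1}\,(1+x)\,e^{-cx} , \tag{4}$$
where $x > 0$ and $c > 0$.
The cumulative distribution function (CDF), $F(x;c)$, is
$$F(x;c) = 1 - \frac{1+c+cx}{1+c}\,e^{-cx} . \tag{5}$$
At $x = 0$, $f(0;c) = c^2/(c+1)$, which is not zero.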
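The average value or mean, $\mu(c)$, is
$$\mu(c) = \frac{c+2}{c\,(c+1)} , \tag{6}$$
and the variance, $\sigma^2(c)$, is
$$\sigma^2(c) = \frac{c^2+4c+2}{c^2\,(c+1)^2} . \tag{7}$$
The $r$th moment about the origin for the Lindley distribution, $\mu'_r(c)$, is
$$\mu'_r(c) = \frac{\Gamma(r+1)\,(c+r+1)}{c^r\,(c+1)} , \tag{8}$$
where
$$\Gamma(z) = \int_0^{\infty} e^{-t}\,t^{z-1}\,dt \tag{9}$$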
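is the gamma function, see [11]. The third and fourth central moments, $\mu_3(c)$ and $\mu_4(c)$, are
$$\mu_3(c) = \frac{2\,(c^3+6c^2+6c+2)}{c^3\,(c+1)^3} , \tag{10a}$$
$$\mu_4(c) = \frac{3\,(3c^4+24c^3+44c^2+32c+8)}{c^4\,(c+1)^4} . \tag{10b}$$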
More details can be found in [2].
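The following sketch serves as a consistency check of Equations (4)-(8). It codes the PDF, the CDF and the raw moments of the one-parameter Lindley distribution, then recovers the mean and variance from the raw moments. The central moments follow from the standard expansions $\mu_3 = \mu'_3 - 3\mu'_2\mu + 2\mu^3$ and $\mu_4 = \mu'_4 - 4\mu'_3\mu + 6\mu'_2\mu^2 - 3\mu^4$, so they can be compared with Equations (10a)-(10b). The function names are illustrative.

```python
import math


def lindley_pdf(x, c):
    """One-parameter Lindley PDF, Equation (4): c^2/(c+1) (1+x) exp(-c x), x > 0, c > 0."""
    return c * c / (c + 1.0) * (1.0 + x) * math.exp(-c * x)


def lindley_cdf(x, c):
    """Lindley CDF, Equation (5): 1 - (1 + c + c x)/(1 + c) exp(-c x)."""
    return 1.0 - (1.0 + c + c * x) / (1.0 + c) * math.exp(-c * x)


def lindley_raw_moment(r, c):
    """r-th moment about the origin, Equation (8): Gamma(r+1) (c+r+1) / (c^r (c+1))."""
    return math.gamma(r + 1.0) * (c + r + 1.0) / (c ** r * (c + 1.0))


def lindley_central_moments(c):
    """Mean, variance and third/fourth central moments built from the raw moments."""
    m1, m2, m3, m4 = (lindley_raw_moment(r, c) for r in (1, 2, 3, 4))
    variance = m2 - m1 ** 2
    mu3 = m3 - 3.0 * m2 * m1 + 2.0 * m1 ** 3
    mu4 = m4 - 4.0 * m3 * m1 + 6.0 * m2 * m1 ** 2 - 3.0 * m1 ** 4
    return m1, variance, mu3, mu4


c = 1.5
mean, variance, mu3, mu4 = lindley_central_moments(c)
print(mean, (c + 2.0) / (c * (c + 1.0)))                             # closed-form mean
print(variance, (c * c + 4.0 * c + 2.0) / (c * c * (c + 1.0) ** 2))  # closed-form variance
```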
3. Generalizations of the Lindley Distribution
We review the statistics of five generalizations of the Lindley distribution: the two-parameter, power, generalized, new generalized and new weighted Lindley distributions.
3.1. The Lindley Distribution with Two Parameters
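The Lindley PDF with two parameters (TPLD) [3] is
$$f(x;b,c) = \frac{c^2}{bc+1}\,(b+x)\,e^{-cx} , \tag{11}$$
where $x > 0$, $c > 0$ and $bc > -1$. The CDF of the TPLD is
$$F(x;b,c) = 1 - \frac{1+bc+cx}{bc+1}\,e^{-cx} . \tag{12}$$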
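The average value or mean of the TPLD is
$$\mu(b,c) = \frac{bc+2}{c\,(bc+1)} , \tag{13}$$
and the variance of the TPLD is
$$\sigma^2(b,c) = \frac{b^2c^2+4bc+2}{c^2\,(bc+1)^2} . \tag{14}$$
The mode of the TPLD is at
$$x_{\mathrm{mode}} = \begin{cases} \dfrac{1-bc}{c} , & bc < 1 ,\\[4pt] 0 , & bc \geq 1 , \end{cases} \tag{15}$$
see Equation (2.3) in [3]. The $r$th moment about the origin for the TPLD, $\mu'_r(b,c)$, is
$$\mu'_r(b,c) = \frac{\Gamma(r+1)\,(bc+r+1)}{c^r\,(bc+1)} . \tag{16}$$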
The two parameters b and c can be obtained from the following matching conditions
(17a)
(17b)
which give
(18)
and
(19)
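A worked inversion may help to show how b and c follow from the data. This sketch rests on two assumptions. First, (17a) and (17b) are read as equating the TPLD mean and variance to the sample mean $\bar{x}$ and variance $s^2$. Second, the TPLD is taken in the form $f(x;b,c) = c^2(b+x)e^{-cx}/(bc+1)$. With these, the ratio $R = s^2/\bar{x}^2$ depends only on $u = bc$ through $R = (u^2+4u+2)/(u+2)^2$, which rearranges to $(u+2)^2 = 2/(1-R)$. The derivation below is ours and need not coincide in form with Equations (18)-(19). The function names are hypothetical.

```python
import math


def tpld_mean_var(b, c):
    """Mean and variance of f(x;b,c) = c^2 (b + x) exp(-c x) / (b c + 1)."""
    u = b * c
    mean = (u + 2.0) / (c * (u + 1.0))
    variance = (u * u + 4.0 * u + 2.0) / (c * c * (u + 1.0) ** 2)
    return mean, variance


def tpld_from_mean_var(mean, variance):
    """Invert the mean and variance for (b, c).

    With u = b c the ratio R = variance / mean^2 = (u^2 + 4u + 2) / (u + 2)^2,
    which rearranges to (u + 2)^2 = 2 / (1 - R); then c = (u + 2) / (mean (u + 1)).
    """
    ratio = variance / mean ** 2
    # R < 1/2 would give b < 0 (negative density near x = 0); R >= 1 has no solution.
    if not 0.5 <= ratio < 1.0:
        raise ValueError("TPLD moment matching needs 1/2 <= s^2 / mean^2 < 1")
    u = -2.0 + math.sqrt(2.0 / (1.0 - ratio))
    c = (u + 2.0) / (mean * (u + 1.0))
    return u / c, c


# Round trip: the moments of b = 0.5, c = 2 give back these parameters up to rounding.
print(tpld_from_mean_var(*tpld_mean_var(0.5, 2.0)))
```

Because the admissible root is explicit, no iterative solver is needed for the TPLD. The PLD and NWL below lack this shortcut.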
3.2. The Power Lindley Distribution
The power Lindley PDF with two parameters (PLD) according to [3] is
(20)
where $b > 0$, $c > 0$ and $x > 0$. The CDF of the PLD is
(21)
The average value or mean of the PLD is
(22)
and the variance of the PLD is
(23)
where
(24)
and
(25)
The mode of the PLD is at
(26)
The rth moment about the origin for the PLD is
(27)
The two parameters b and c of the PLD can be found by numerically solving the nonlinear system given by Equation (17a) and Equation (17b).
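Any numerical solution of (17a)-(17b) for the PLD needs the PLD moments. This sketch assumes the power Lindley form $f(x;\alpha,\beta) = \alpha\beta^2/(\beta+1)\,(1+x^\alpha)\,x^{\alpha-1}e^{-\beta x^\alpha}$. The labels $\alpha$ and $\beta$ are ours, and how they map onto b and c of Equation (20) cannot be recovered here. Under this form, $Y = X^\alpha$ follows the one-parameter Lindley distribution with parameter $\beta$. The CDF and the raw moments therefore follow from the one-parameter results. The sketch checks the moment formula against Simpson quadrature.

```python
import math


def pld_pdf(x, alpha, beta):
    """Assumed PLD: alpha beta^2/(beta+1) (1 + x^alpha) x^(alpha-1) exp(-beta x^alpha), x > 0."""
    xa = x ** alpha
    return (alpha * beta ** 2 / (beta + 1.0) * (1.0 + xa) * x ** (alpha - 1.0)
            * math.exp(-beta * xa))


def pld_cdf(x, alpha, beta):
    """Lindley CDF evaluated at y = x^alpha: 1 - (1 + beta y/(beta+1)) exp(-beta y)."""
    y = x ** alpha
    return 1.0 - (1.0 + beta * y / (beta + 1.0)) * math.exp(-beta * y)


def pld_raw_moment(r, alpha, beta):
    """E[X^r] = E[Y^k] with k = r/alpha for Y ~ Lindley(beta): Gamma(k+1)(beta+k+1)/((beta+1) beta^k)."""
    k = r / alpha
    return math.gamma(k + 1.0) * (beta + k + 1.0) / ((beta + 1.0) * beta ** k)


def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


alpha, beta = 2.0, 1.5
for r in (1, 2, 3):
    numeric = simpson(lambda x: x ** r * pld_pdf(x, alpha, beta), 0.0, 12.0)
    print(r, numeric, pld_raw_moment(r, alpha, beta))
```

With closed-form moments available, (17a)-(17b) become two smooth equations in two unknowns that any standard 2D root finder can handle. No specific solver is implied by the text.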
3.3. The Generalized Lindley Distribution
The generalized Lindley PDF with three parameters (GLD) according to [12] is
(28)
where $a, b, c > 0$ and $x > 0$. The CDF of the GLD is
(29)
where $M_{\kappa,\mu}(z)$ is the Whittaker M function, see [11]. The average value or mean of the GLD is
(30)
and the variance of the GLD is
(31)
The hazard rate function, $h(x;a,b,c)$, of the GLD is
(32)
and Figure 1 shows an example. Here the CDF, Equation (29), and the hazard rate function, Equation (32), are given in closed form, in contrast to what was asserted by [12].
Figure 1. Plot of the three-dimensional surface of the hazard rate function when b = 3 and c = 0.5.
The mode of the GLD is at
(33)
The rth moment about the origin for the GLD is
(34)
and in particular the third moment is
(35)
The three parameters a, b and c of the GLD can be obtained by numerically solving the following three non-linear equations
(36a)
(36b)
(36c)
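Whatever root finder is used for (36a)-(36c), it needs the GLD raw moments. This sketch makes two assumptions. First, the GLD of [12] has the form $f(x) = \theta^2(\theta x)^{\alpha-1}(\alpha+\gamma x)e^{-\theta x}/((\gamma+\theta)\Gamma(\alpha+1))$, using our own labels $\alpha$, $\gamma$, $\theta$; their correspondence with a, b, c is not recoverable here. Second, (36a)-(36c) equate the first three raw moments to $\bar{x}_1$, $\bar{x}_2$, $\bar{x}_3$ of Equation (3).

Under this form the GLD is a two-component gamma mixture:
- weight $\theta/(\gamma+\theta)$ on Gamma($\alpha$, rate $\theta$);
- weight $\gamma/(\gamma+\theta)$ on Gamma($\alpha+1$, rate $\theta$).

The mixture gives the raw moments in closed form and a simple sampler for testing a fit. For $\alpha = \gamma = 1$ it reduces to the one-parameter Lindley PDF.

```python
import math
import random


def gld_pdf(x, alpha, gamma_, theta):
    """Assumed GLD: theta^2 (theta x)^(alpha-1) (alpha + gamma x) e^(-theta x) / ((gamma+theta) Gamma(alpha+1))."""
    return (theta ** 2 * (theta * x) ** (alpha - 1.0) * (alpha + gamma_ * x)
            * math.exp(-theta * x) / ((gamma_ + theta) * math.gamma(alpha + 1.0)))


def gld_raw_moment(r, alpha, gamma_, theta):
    """E[X^r] = Gamma(alpha+r) (alpha theta + gamma (alpha+r)) / ((gamma+theta) Gamma(alpha+1) theta^r)."""
    return (math.gamma(alpha + r) * (alpha * theta + gamma_ * (alpha + r))
            / ((gamma_ + theta) * math.gamma(alpha + 1.0) * theta ** r))


def moment_residuals(params, sample_moments):
    """mu'_r(params) - xbar_r for r = 1, 2, 3: the system a root finder would drive to zero."""
    alpha, gamma_, theta = params
    return [gld_raw_moment(r, alpha, gamma_, theta) - m
            for r, m in zip((1, 2, 3), sample_moments)]


def gld_sample(n, alpha, gamma_, theta, rng=random):
    """Draw n values from the two-component gamma mixture equivalent to the assumed GLD."""
    weight = theta / (gamma_ + theta)  # probability of the Gamma(alpha, theta) component
    draws = []
    for _ in range(n):
        shape = alpha if rng.random() < weight else alpha + 1.0
        draws.append(rng.gammavariate(shape, 1.0 / theta))  # second argument is a scale, 1/rate
    return draws
```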
3.4. The New Generalized Lindley Distribution
The new generalized Lindley PDF with three parameters (NGLD) according to [13] is
(37)
where $a, b, c > 0$ and $x > 0$. The CDF of the NGLD is
(38)
where
(39)
and the incomplete Gamma function that appears above is defined by
(40)
see [11]. The average value of the NGLD is
(41)
and the variance of the NGLD is
(42)
The rth moment about the origin for the NGLD is
(43)
and the third moment is
(44)
The three parameters a, b and c of the NGLD are obtained by numerically solving the three non-linear Equations (36a), (36b) and (36c).
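Python's standard library offers `math.gamma` and `math.erfc` but no incomplete gamma function, which Equation (40) requires. This sketch assumes that (40) is the upper incomplete gamma function $\Gamma(a,z) = \int_z^\infty e^{-t}t^{a-1}\,dt$ of [11]; the lower one is then $\Gamma(a) - \Gamma(a,z)$. It uses the split popularized by [16]: a power series for $z < a+1$ and a continued fraction, evaluated with the modified Lentz method, for $z \geq a+1$. This is our own transcription.

```python
import math


def _lower_series(a, z, eps=1e-15, max_iter=10_000):
    """gamma(a, z) = z^a e^(-z) sum_{n>=0} z^n / (a (a+1) ... (a+n)); used for z < a + 1."""
    term = total = 1.0 / a
    for n in range(1, max_iter):
        term *= z / (a + n)
        total += term
        if abs(term) < abs(total) * eps:
            break
    return total * math.exp(-z + a * math.log(z))


def _upper_continued_fraction(a, z, eps=1e-15, max_iter=10_000):
    """Gamma(a, z) by the modified Lentz continued fraction; used for z >= a + 1."""
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return math.exp(-z + a * math.log(z)) * h


def upper_incomplete_gamma(a, z):
    """Gamma(a, z) = integral from z to infinity of e^(-t) t^(a-1) dt, for a > 0 and z >= 0."""
    if a <= 0.0 or z < 0.0:
        raise ValueError("require a > 0 and z >= 0")
    if z == 0.0:
        return math.gamma(a)
    if z < a + 1.0:
        return math.gamma(a) - _lower_series(a, z)
    return _upper_continued_fraction(a, z)


# Usage check: the Lindley survival function 1 - F(x;c) equals (c Gamma(1, cx) + Gamma(2, cx)) / (c + 1).
c, x = 1.5, 0.8
print((c * upper_incomplete_gamma(1.0, c * x) + upper_incomplete_gamma(2.0, c * x)) / (c + 1.0))
print((1.0 + c + c * x) * math.exp(-c * x) / (1.0 + c))
```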
3.5. The New Weighted Lindley Distribution
The new weighted Lindley PDF with two parameters (NWL) according to [14] is
(45)
where $b, c > 0$ and $x > 0$. The CDF of the NWL is
(46)
where
(47)
The average value of the NWL is
(48)
and the variance of the NWL is
(49)
where
(50)
The rth moment about the origin for the NWL is
(51)
where
(52)
The two parameters b and c of the NWL can be found by numerically solving the nonlinear system given by Equation (17a) and Equation (17b).
4. The Double Truncated Lindley Distribution
Let $X$ be a random variable defined in $[x_l, x_u]$. The double truncated (DTL) version of the Lindley PDF with one parameter, $f(x;x_l,x_u,c)$, is
(53)
where the double truncation raises the number of parameters from one to three, see [15]. The double truncated Lindley distribution with scale, which has four parameters, was introduced in [10].
Its CDF, $F(x;x_l,x_u,c)$, is
(54)
where
(55)
The average value, $\mu(x_l,x_u,c)$, is
(56)
The $r$th moment about the origin for the DTL, $\mu'_r(x_l,x_u,c)$, is
(57)
where
(58)
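The three parameters that characterize the DTL can be found as follows. Consider the sample of stellar masses $x_1, x_2, \dots, x_n$ and let $x_{(1)}, x_{(2)}, \dots, x_{(n)}$ denote their order statistics, so that $x_{(1)} = \min(x_1, x_2, \dots, x_n)$ and $x_{(n)} = \max(x_1, x_2, \dots, x_n)$. The first two parameters, $x_l$ and $x_u$, are
$$x_l = x_{(1)} , \qquad x_u = x_{(n)} . \tag{59}$$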
The third parameter c can be found by solving the following non-linear equation
(60)
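The recipe above can be sketched in Python under two assumptions of ours:
- The DTL PDF is the one-parameter Lindley PDF (4) renormalized on $[x_l, x_u]$, i.e. proportional to $(1+x)e^{-cx}$.
- Equation (60) matches the DTL mean to the sample mean $\bar{x}$. The text says only that (60) is non-linear, so it might instead be a likelihood equation.

With the substitution $t = x - x_l$, the needed integrals reduce to lower incomplete gamma functions of integer order, which can be evaluated without cancellation. Because the density is an exponential family in $x$, the DTL mean decreases strictly with $c$, so bisection on $\log c$ finds the root. All names and the bracket $[10^{-3}, 10^{3}]$ are hypothetical choices.

```python
import math


def lower_gamma_int(n, z):
    """Lower incomplete gamma gamma(n, z) = integral from 0 to z of t^(n-1) e^(-t) dt, integer n >= 1."""
    if z > n + 30.0:
        # Complement (n-1)! - Gamma(n, z); the tail Gamma(n, z) is a short finite sum and tiny here.
        tail = math.exp(-z) * sum(z ** k / math.factorial(k) for k in range(n))
        return math.factorial(n - 1) * (1.0 - tail)
    # Series z^n e^(-z) sum_j z^j / (n (n+1) ... (n+j)): all terms positive, no cancellation.
    term = total = 1.0 / n
    j = 1
    while term > 1e-17 * total:
        term *= z / (n + j)
        total += term
        j += 1
    return z ** n * math.exp(-z) * total


def dtl_integrals(xl, x, c):
    """Integrals over [xl, x] of (1+t) e^(-c (t-xl)) and t (1+t) e^(-c (t-xl)).

    The constant c^2/(c+1) e^(-c xl) is omitted because it cancels in every ratio below."""
    i0, i1, i2 = (lower_gamma_int(k + 1, c * (x - xl)) / c ** (k + 1) for k in range(3))
    norm = (1.0 + xl) * i0 + i1
    first = xl * (1.0 + xl) * i0 + (1.0 + 2.0 * xl) * i1 + i2
    return norm, first


def dtl_pdf(x, xl, xu, c):
    if not xl <= x <= xu:
        return 0.0
    return (1.0 + x) * math.exp(-c * (x - xl)) / dtl_integrals(xl, xu, c)[0]


def dtl_cdf(x, xl, xu, c):
    x = min(max(x, xl), xu)
    return dtl_integrals(xl, x, c)[0] / dtl_integrals(xl, xu, c)[0]


def dtl_mean(xl, xu, c):
    norm, first = dtl_integrals(xl, xu, c)
    return first / norm


def fit_dtl(masses, c_lo=1e-3, c_hi=1e3):
    """x_l, x_u from the order statistics (Equation (59)); c by matching the sample mean (assumed)."""
    xl, xu = min(masses), max(masses)
    target = sum(masses) / len(masses)
    if not dtl_mean(xl, xu, c_hi) < target < dtl_mean(xl, xu, c_lo):
        raise ValueError("sample mean is not bracketed by [c_lo, c_hi]")
    lo, hi = math.log(c_lo), math.log(c_hi)
    for _ in range(200):  # bisection on log c
        mid = 0.5 * (lo + hi)
        if dtl_mean(xl, xu, math.exp(mid)) > target:
            lo = mid      # model mean too large: c must increase
        else:
            hi = mid
    return xl, xu, math.exp(0.5 * (lo + hi))
```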
5. Application to the IMF
We report the statistics adopted to fit four samples of stars with the lognormal distribution, the Lindley distribution and its generalizations, and the double truncated Lindley distribution.
5.1. The Involved Statistics
The merit function $\chi^2$ is computed according to the formula
$$\chi^2 = \sum_{i=1}^{n} \frac{(T_i - O_i)^2}{T_i} , \tag{61}$$
where $n$ is the number of bins, $T_i$ is the theoretical value, and $O_i$ is the experimental value represented by the frequencies. The theoretical frequency distribution is given by
$$T_i = N\,\Delta x_i\,p(x) , \tag{62}$$
where $N$ is the number of elements of the sample, $\Delta x_i$ is the magnitude of the size interval, and $p(x)$ is the PDF under examination.
A reduced merit function $\chi^2_{red}$ is evaluated by
$$\chi^2_{red} = \chi^2 / NF , \tag{63}$$
where $NF = n - k$ is the number of degrees of freedom, $n$ is the number of bins, and $k$ is the number of parameters. The goodness of the fit can be expressed by the probability $Q$, see Equation 15.2.12 in [16], which involves the degrees of freedom and $\chi^2$. According to [16] p. 658, the fit “may be acceptable” if $Q > 0.001$.
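The binned statistics above can be sketched in a few lines. $Q$ is the regularized upper incomplete gamma function $Q(\nu/2, \chi^2/2)$. For an integer number of degrees of freedom it has closed forms: a finite Poisson sum when $\nu$ is even, and `math.erfc` plus a finite sum when $\nu$ is odd. This avoids a general incomplete-gamma routine. Evaluating $p$ at the bin centre in Equation (62) is our assumption, as are all names. The $\chi^2$ value in the usage lines is made up and is not taken from the tables.

```python
import math


def merit_function(observed, theoretical):
    """chi^2 = sum_i (T_i - O_i)^2 / T_i over the bins, as in Equation (61)."""
    return sum((t - o) ** 2 / t for o, t in zip(observed, theoretical))


def theoretical_counts(pdf, centres, widths, n_sample):
    """T_i = N * Delta x_i * p(x), Equation (62), evaluating p at each bin centre (assumption)."""
    return [n_sample * w * pdf(x) for x, w in zip(centres, widths)]


def chi2_probability(chi2, nu):
    """Q(nu/2, chi^2/2): probability of a chi^2 at least this large for nu degrees of freedom.

    Even nu: finite Poisson sum. Odd nu: erfc plus the finite sum produced by the
    recurrence Q(a+1, x) = Q(a, x) + x^a e^(-x) / Gamma(a+1), starting from a = 1/2."""
    if nu < 1:
        raise ValueError("need at least one degree of freedom")
    x = 0.5 * chi2
    if nu % 2 == 0:
        return math.exp(-x) * sum(x ** j / math.factorial(j) for j in range(nu // 2))
    q = math.erfc(math.sqrt(x))
    for j in range(1, (nu - 1) // 2 + 1):
        q += math.exp(-x) * x ** (j - 0.5) / math.gamma(j + 0.5)
    return q


n_bins, k = 20, 2   # 20 bins as in the tables; k = 2 for a two-parameter PDF
chi2 = 25.0         # hypothetical value, for illustration only
nu = n_bins - k
print(chi2 / nu, chi2_probability(chi2, nu))  # reduced chi^2 and Q; Q > 0.001 "may be acceptable"
```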
The Akaike information criterion (AIC), see [17], is defined by
$$AIC = 2k - 2\ln(L) , \tag{64}$$
where $L$ is the likelihood function and $k$ the number of free parameters in the model. We assume a Gaussian distribution for the errors, so the likelihood function can be derived from the $\chi^2$ statistic, $L \propto \exp(-\chi^2/2)$, where $\chi^2$ has been computed by Equation (61), see [18] [19]. Now the AIC becomes
$$AIC = 2k + \chi^2 . \tag{65}$$
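Equation (64) turns into a one-line score once a likelihood is chosen. Assuming Gaussian errors with $L \propto \exp(-\chi^2/2)$, $-2\ln L$ equals $\chi^2$ up to an additive constant. That constant is the same for every model fitted to the same bins, so models can be ranked by $2k + \chi^2$. The model names and $\chi^2$ values below are made up. They show how the $2k$ penalty can reverse a ranking based on $\chi^2$ alone.

```python
def aic_from_chi2(chi2, k):
    """AIC = 2k - 2 ln L with L proportional to exp(-chi^2/2), i.e. 2k + chi^2 up to a constant."""
    return 2.0 * k + chi2


def rank_by_aic(fits):
    """fits maps a model name to (chi^2, number of free parameters); returns names, best first."""
    return sorted(fits, key=lambda name: aic_from_chi2(*fits[name]))


# Made-up chi^2 values, NOT taken from Tables 1-8.
fits = {"three-parameter model": (30.0, 3), "one-parameter model": (33.0, 1)}
print(rank_by_aic(fits))
```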
The Kolmogorov-Smirnov test (K-S), see [20] [21] [22], does not require binning the data. The K-S test, as implemented by the FORTRAN subroutine KSONE in [16], finds the maximum distance, $D$, between the theoretical and the astronomical CDF as well as the significance level $P_{KS}$, see formulas 14.3.5 and 14.3.9 in [16]. If $P_{KS} \geq 0.1$, the goodness of the fit is believable.
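The K-S procedure can be sketched in Python, following the logic of KSONE and its series helper in [16]. $D$ is the largest gap between the model CDF and the empirical step function, checked on both sides of each step. The significance is $P_{KS} = Q_{KS}\big((\sqrt{N} + 0.12 + 0.11/\sqrt{N})\,D\big)$, with $Q_{KS}(\lambda) = 2\sum_{j\geq 1}(-1)^{j-1}e^{-2j^2\lambda^2}$. This is our transcription, not the FORTRAN code. The function names and the toy uniform-CDF data are illustrative only.

```python
import math


def kolmogorov_q(lam, eps1=1e-3, eps2=1e-8):
    """Q_KS(lam) = 2 sum_{j>=1} (-1)^(j-1) exp(-2 j^2 lam^2), with the stopping rules of [16]."""
    a2 = -2.0 * lam * lam
    fac, total, previous = 2.0, 0.0, 0.0
    for j in range(1, 101):
        term = fac * math.exp(a2 * j * j)
        total += term
        if abs(term) <= eps1 * previous or abs(term) <= eps2 * total:
            return total
        fac = -fac
        previous = abs(term)
    return 1.0  # no convergence: happens only for very small lam, where Q_KS -> 1


def ks_one(data, cdf):
    """Return D and the significance level P_KS of a sample against a continuous model CDF."""
    xs = sorted(data)
    n = len(xs)
    d, f_before = 0.0, 0.0
    for j, x in enumerate(xs, start=1):
        f_after = j / n                  # empirical CDF just after the j-th point
        f_model = cdf(x)
        d = max(d, abs(f_before - f_model), abs(f_after - f_model))
        f_before = f_after
    sqrt_n = math.sqrt(n)
    return d, kolmogorov_q((sqrt_n + 0.12 + 0.11 / sqrt_n) * d)


# Toy check against a uniform CDF on [0, 1]; the fit is believable if P_KS >= 0.1.
print(ks_one([0.1, 0.3, 0.5, 0.7, 0.9], lambda x: x))
```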
5.2. The Selected Sample of Stars
The first test is performed on NGC 2362 where the 271 stars have a range
, see [23] and CDS catalog J/MNRAS/384/675/Table 1.
Table 1. Numerical values of $\chi^2_{red}$, AIC, probability Q, D, the maximum distance between theoretical and observed CDF, and $P_{KS}$, significance level, in the K-S test of the lognormal distribution, see Equation (66), for different mass distributions. The number of linear bins, n, is 20.
The second test is performed on the low-mass IMF in the young cluster NGC 6611, see [24] and CDS catalog J/MNRAS/392/1034. This massive cluster has an age of 2 - 3 Myr and contains masses from
. Therefore the brown dwarf (BD) region,
is covered.
The third test is performed on the $\gamma$ Velorum cluster where the 237 stars have a range
, see [25] and CDS catalog J/A+A/589/A70/Table 5.
The fourth test is performed on the young cluster Berkeley 59 where the 420 stars have a range
, see [26] and CDS catalog J/AJ/155/44/Table 3.
5.3. The Lognormal Distribution
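Let $X$ be a random variable defined in $(0, \infty)$. The lognormal PDF, following [27] or formula (14.2) in [28], is
$$f(x;m,\sigma) = \frac{1}{x\,\sigma\sqrt{2\pi}}\,\exp\left(-\frac{\ln^2(x/m)}{2\sigma^2}\right) , \tag{66}$$
where $m$ is the median and $\sigma$ the shape parameter. The CDF is
$$F(x;m,\sigma) = \frac{1}{2} + \frac{1}{2}\,\mathrm{erf}\left(\frac{\ln(x/m)}{\sqrt{2}\,\sigma}\right) , \tag{67}$$
where $\mathrm{erf}(x)$ is the error function, defined as
$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt , \tag{68}$$
see [11]. The average value or mean, $\mu(m,\sigma)$, is
$$\mu(m,\sigma) = m\,e^{\sigma^2/2} , \tag{69}$$
the variance, $\sigma^2(m,\sigma)$, is
$$\sigma^2(m,\sigma) = m^2\,e^{\sigma^2}\left(e^{\sigma^2} - 1\right) , \tag{70}$$
and the second moment about the origin, $\mu'_2(m,\sigma)$, is
$$\mu'_2(m,\sigma) = m^2\,e^{2\sigma^2} . \tag{71}$$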
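The following sketch checks Equations (66)-(71) for a lognormal with median $m$ and shape $\sigma$. It codes the PDF and the CDF (through `math.erf`) and the moments. It also includes maximum-likelihood estimates as one common way to obtain the two parameters: $m$ is the exponential of the mean of $\ln x$, and $\sigma$ is the standard deviation of $\ln x$ with a $1/n$ divisor. The text does not state which estimator produced Table 1, so that part is an assumption.

```python
import math


def lognormal_pdf(x, m, sigma):
    """Equation (66): exp(-ln^2(x/m) / (2 sigma^2)) / (x sigma sqrt(2 pi)), x > 0."""
    return (math.exp(-math.log(x / m) ** 2 / (2.0 * sigma ** 2))
            / (x * sigma * math.sqrt(2.0 * math.pi)))


def lognormal_cdf(x, m, sigma):
    """Equation (67): 1/2 + erf(ln(x/m) / (sqrt(2) sigma)) / 2."""
    return 0.5 + 0.5 * math.erf(math.log(x / m) / (math.sqrt(2.0) * sigma))


def lognormal_moments(m, sigma):
    """Mean (69), variance (70) and second moment about the origin (71)."""
    s2 = sigma ** 2
    mean = m * math.exp(s2 / 2.0)
    variance = m ** 2 * math.exp(s2) * (math.exp(s2) - 1.0)
    second = m ** 2 * math.exp(2.0 * s2)
    return mean, variance, second


def lognormal_ml(sample):
    """Maximum-likelihood m and sigma: exp(mean of ln x) and the 1/n standard deviation of ln x."""
    logs = [math.log(x) for x in sample]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return math.exp(mu), sigma


print(lognormal_moments(0.3, 0.8))
```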
The statistics for the lognormal distribution for these four astronomical samples of stars are reported in Table 1.
5.4. The Generalizations of the Lindley Distribution
The statistics for the Lindley distribution and its generalizations are reported in the following tables: Table 2 for the Lindley distribution with one parameter, Table 3 for the TPLD, Table 4 for the PLD, Table 5 for the GLD, Table 6 for the NGLD and Table 7 for the NWL. The best fit for NGC 2362 is obtained with the PLD, see Figure 2.
The best fit for NGC 6611 is obtained with the Lindley PDF with one parameter, see Figure 3.
Table 2. Numerical values of $\chi^2_{red}$, AIC, probability Q, D, the maximum distance between theoretical and observed CDF, and $P_{KS}$, significance level, in the K-S test of the Lindley distribution with one parameter for different mass distributions. The number of linear bins, n, is 20.
Table 3. Numerical values of $\chi^2_{red}$, AIC, probability Q, D, the maximum distance between theoretical and observed CDF, and $P_{KS}$, significance level, in the K-S test of the TPLD with two parameters for different mass distributions. The number of linear bins, n, is 20.
Table 4. Numerical values of $\chi^2_{red}$, AIC, probability Q, D, the maximum distance between theoretical and observed CDF, and $P_{KS}$, significance level, in the K-S test of the PLD with two parameters for different mass distributions. The number of linear bins, n, is 20.
Table 5. Numerical values of $\chi^2_{red}$, AIC, probability Q, D, the maximum distance between theoretical and observed CDF, and $P_{KS}$, significance level, in the K-S test of the GLD with three parameters for different mass distributions. The number of linear bins, n, is 20.
Table 6. Numerical values of $\chi^2_{red}$, AIC, probability Q, D, the maximum distance between theoretical and observed CDF, and $P_{KS}$, significance level, in the K-S test of the NGLD with three parameters for different mass distributions. The number of linear bins, n, is 20.
Table 7. Numerical values of $\chi^2_{red}$, AIC, probability Q, D, the maximum distance between theoretical and observed CDF, and $P_{KS}$, significance level, in the K-S test of the NWL with two parameters for different mass distributions. The number of linear bins, n, is 20.
The best fit for the $\gamma$ Velorum cluster is obtained with the lognormal PDF, see Figure 4.
The best fit for the young cluster Berkeley 59 is obtained with the NGLD, see Figure 5.
5.5. The Double Truncated Lindley
The statistics for the DTL with three parameters are reported in Table 8. Figure 6 shows the CDF of the DTL for NGC 6611. Among the distributions analysed here, the DTL gives the best fit for this cluster.
Table 8. Numerical values of $\chi^2_{red}$, AIC, probability Q, D, the maximum distance between theoretical and observed CDF, and $P_{KS}$, significance level, in the K-S test of the DTL for different mass distributions. The number of linear bins, n, is 20.
6. Conclusion
In this paper we explored five generalizations of the Lindley distribution, as well as the double truncated Lindley distribution, and compared them with the lognormal distribution. For each of the four clusters analysed here, Table 9 reports the distribution that gives the best fit to the IMF. For three of the four clusters (NGC 2362, NGC 6611 and Berkeley 59), a member of the Lindley family suggested here fits better than the lognormal distribution. The lognormal remains the best fit only for the $\gamma$ Velorum cluster. Figure 7 shows the CDF for NGC 6611 together with four fitting curves.
Figure 2. Empirical PDF of mass distribution for NGC 2362 cluster data (273 stars + BDs) when the number of bins, n, is 20 (steps with blue full line) with a superposition of the PLD (red dashed line). Theoretical parameters as in Table 4.
Figure 3. Empirical PDF of mass distribution for NGC 6611 cluster data when the number of bins, n, is 20 (steps with blue full line) with a superposition of the Lindley PDF with one parameter (red dashed line). Theoretical parameters as in Table 2.
Figure 4. Empirical PDF of mass distribution for $\gamma$ Velorum cluster data when the number of bins, n, is 20 (steps with blue full line) with a superposition of the lognormal PDF (red dashed line). Theoretical parameters as in Table 1.
Figure 5. Empirical PDF of mass distribution for the young cluster Berkeley 59 when the number of bins, n, is 20 (steps with blue full line) with a superposition of the NGLD (red dashed line). Theoretical parameters as in Table 6.
Figure 6. Empirical CDF of mass distribution for NGC 6611 cluster data (blue dotted line) with a superposition of the DTL CDF (red line). Theoretical parameters as in Table 8.
Table 9. Best fits: name of the cluster, name of the distribution, D, the maximum distance between theoretical and observed CDF, and $P_{KS}$, significance level, in the K-S test.
Figure 7. Part of the empirical CDF of mass distribution for NGC 6611 cluster data (orange circles) with a superposition of the DTL CDF (black full line), the lognormal (red dashed line), the Lindley with one parameter (green dot-dash-dot-dash line) and the TPLD (blue dot line).