1. Introduction
Cerebrovascular disease is a serious health problem worldwide. “Cerebrovascular disease” refers to a group of conditions that affect blood flow and/or blood vessels in the brain [1]. It includes stroke, carotid stenosis, vertebral and intracranial stenosis, aneurysms, and vascular malformations [2]. Stroke, or sudden loss of blood flow to part of the brain, was the world’s second-leading cause of death [3] and the third-leading cause of disability-adjusted life years (DALYs) [4]. Stroke can occur due to occlusion, e.g., by a blood clot or other embolus that travels to the brain, or due to the rupture of a blood vessel in the brain [5] [6]. According to GBD 2019 Stroke Collaborators [7], there were 12.2 million incident cases of stroke, 101 million individuals who had experienced stroke, 143 million DALYs due to stroke, and 6.55 million deaths from stroke in 2019. The World Stroke Organization (WSO) [8] estimated the occurrence of almost 14 million new stroke cases, 5.5 million deaths, and 116 million DALYs each year. The WSO also estimated that 80 million people were living with the impact of stroke and that 1 in 4 people over age 25 would experience stroke in their lifetime.
In the United States, cerebrovascular disease (stroke) caused 160,264 deaths in 2020, accounting for 4.7% of deaths from all causes, or 38.8 deaths per 100,000 standardized population [9]. Every year, more than 795,000 people have a stroke. Stroke-related costs in the United States came to nearly $53 billion between 2017 and 2018. This total included the cost of health care services, medications to treat stroke, and missed days of work [10] [11].
Stroke is classified into two types [12]: ischemic or embolic stroke, caused by occlusion of the blood supply to the brain, and hemorrhagic stroke, caused by bleeding in or around the brain. A transient ischemic attack (TIA), also known as a mini-stroke, is caused by temporary occlusion and is a serious warning sign of a future stroke [6]. Another problem of stroke is that it frequently requires long-term rehabilitation [13] - [18], which causes additional medical costs and deteriorates patients’ quality of life (QOL). Gadidi et al. [19] studied the recovery of 139 patients who suffered their first-ever stroke between February 1 and March 30, 2004. At 4 years post-stroke, 9 patients were lost to follow-up, 59 had died, and 71 were surviving and were reassessed. Among the reassessed patients, 42.3% had serious activity limitations, 28.2% were classified as somewhat restricted in activity, and 78.1% felt they had not completely recovered.
Since cerebrovascular disease, including stroke, is a very serious problem, various studies of the risk factors and of methods for preventing and detecting cerebrovascular disease have been done [20] - [40]. Various treatment guidelines have also been published [41] [42] [43] [44]. Moreover, high mortality rates have been reported for COVID-19 patients with cerebrovascular disease [45].
Knowledge of risk factors is very important, not only for prevention but also for urgent treatment, as the etiology of stroke determines treatment. Stroke is a medical emergency that requires immediate treatment [17] [46] [47] and time is one of the most critical factors. The American Heart Association (AHA) has recently developed a campaign “Stroke Target III” [48] [49] to improve treatments and outcomes of ischemic stroke, the most prevalent type, accounting for about 87% of all strokes [11] .
The risk factors for stroke are classified as nonmodifiable (outside your control) and modifiable (within your control). The American Stroke Association (ASA) [50] [51] considers age, family history, race, gender, prior stroke, TIA and heart attack to be nonmodifiable factors, and high blood pressure, smoking, diabetes, diet, physical inactivity, obesity, high blood cholesterol, artery disease, peripheral artery disease, atrial fibrillation and sickle cell disease to be modifiable factors. Hankey [52] estimated that about 90% of all strokes were attributable to modifiable risk factors.
In Japan, the medical cost of cerebrovascular disease in fiscal year 2019 was 1.825 trillion yen [53]. In 2020, cerebrovascular disease caused 102,978 deaths, making it the fourth-leading cause of death and accounting for 7.5% of the nation’s 1,372,755 deaths [54]. The number of cerebrovascular disease patients in Japan was estimated at 1,182,000 in 2017 [55].
In the present study, the risk factors for cerebrovascular disease are reexamined with logit (logistic regression) models using the JMDC Claims Database [56], which includes 13,157,681 medical checkups performed on 3,233,271 individuals in Japan between January 2005 and September 2019. While retrospective in nature, the study is not strictly cross-sectional; its distinctive feature is that data from the year prior to cerebrovascular events or non-events are examined.
2. Data and Models
2.1. Data
In Japan, the Industrial Safety and Health Act requires that most employees aged 40 or older undergo mandatory medical checkups at least once a year. Younger employees and family members of employees may undergo medical checkups on a voluntary basis. The JMDC Claims Database is a nationwide health information database that collects data from various health insurance societies providing employment-based health insurance. The results of 13,157,681 medical checkups obtained from 3,233,271 individuals between January 2005 and September 2019 are included. The database tracks various health information of individuals, including histories of cerebrovascular disease.
The ASA [50] states that “A person who has had one or more transient ischemic attacks (TIAs) is almost 10 times more likely to have a stroke than someone of the same age and sex who hasn’t.” Nawata [57] confirmed this statement; the probability of having an ischemic stroke in persons with a history of cerebrovascular disease is much higher than that in those without a cerebrovascular disease history. This fact makes it especially important to prevent first-time cerebrovascular disease. In this study, individuals who had no cerebrovascular disease history at year t and had data (either positive or negative) concerning cerebrovascular disease at year t + 1 (i.e., the following year) are selected, and risk factors of experiencing first-time cerebrovascular disease are analyzed. After excluding the observations with missing covariate values, 2,657,864 observations satisfy these criteria; of these, 5984 observations (0.23%) experienced cerebrovascular disease at year t + 1.
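The selection rule described above (no history at year t, known status at year t + 1) can be sketched as follows. This is only a minimal illustration: the record layout `(person_id, year, has_history)` and the function name `build_pairs` are hypothetical and do not reflect the actual JMDC Claims Database schema.

```python
# Hypothetical record layout: (person_id, year, has_history), where
# has_history is 1 if a cerebrovascular disease history exists by that year.
def build_pairs(records):
    """Return (person_id, year_t, status_at_t_plus_1) for eligible observations."""
    by_key = {(pid, year): hist for pid, year, hist in records}
    pairs = []
    for (pid, year), hist in sorted(by_key.items()):
        next_status = by_key.get((pid, year + 1))
        # Keep only observations with no history at t and known status at t + 1.
        if hist == 0 and next_status is not None:
            pairs.append((pid, year, next_status))
    return pairs

# Toy data (invented for illustration only).
records = [
    (1, 2010, 0), (1, 2011, 0), (1, 2012, 1), (1, 2013, 1),
    (2, 2010, 0), (2, 2012, 0),   # no checkup in 2011, so no usable pair
    (3, 2015, 1), (3, 2016, 1),   # history already present at the first checkup
]
pairs = build_pairs(records)
events = sum(status for _, _, status in pairs)
print(pairs, events / len(pairs))
```

In this toy data only person 1 contributes observations, and the first-time event is counted once, in the pair for year 2011.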
2.2. Logit Models
Logit models are used in the analysis. Let $Y_{it}$ be a dummy variable taking 1 if person i had a cerebrovascular disease history by year t and 0 otherwise. From selection of the sample, $Y_{it} = 0$ for all observations and $Y_{i,t+1} = 0$ or $Y_{i,t+1} = 1$ at year t + 1.
The following variables are used in the model as nonmodifiable covariates. To avoid causality problems, all values of the covariates are measured at year t.
Age: age of an individual;
Female (dummy variable) 1 female, 0 male;
Family (dummy variable) 1 a family member with cardiovascular disease, 0 otherwise;
t1 (time trend) year – 2004;
Heart_D (dummy variable) 1 individual has a history of heart disease by year t, 0 otherwise.
For the modifiable covariates, the following variables are used.
BMI (body mass index) weight (kg)/height (m)²;
SBP (systolic blood pressure) mmHg;
DBP (diastolic blood pressure) mmHg;
HDL (high-density lipoprotein cholesterol) mg/dL;
LDL (low-density lipoprotein cholesterol) mg/dL;
Triglyceride (serum triglyceride level) mg/dL;
ALT (alanine aminotransferase) units per liter (U/L);
AST (aspartate aminotransferase) U/L;
GGP (γ-glutamyl transferase) U/L;
B_Sugar (blood sugar) mg/dL;
HbA1c (hemoglobin A1c) %;
U_Sugar (urine sugar, integers of 1 - 5) 1 undetected, 2 around 50 mg/dL, 3 around 100 mg/dL, 4 around 250 mg/dL, 5 around 500 mg/dL or over;
U_Protein (urine protein; integers of 1 - 5) 1 undetected, 2 around 15 mg/dL, 3 around 30 mg/dL, 4 around 100 mg/dL, 5 250 mg/dL or over;
Weight_1 (dummy variable) 1 weight changed by 3 kg or more in a year, 0 otherwise;
Weight_20 (dummy variable) 1 weight increased by 10 kg or more from age 20, 0 otherwise;
Eat_Fast (dummy variable) 1 eating faster than other people, 0 otherwise;
Late_Supper (dummy variable) 1 eating supper within two hours of bedtime three times or more in a week, 0 otherwise;
No_Breakfast (dummy variable) 1 not eating breakfast three times or more in a week, 0 otherwise;
Exercise (dummy variable) 1 doing exercise for 30 minutes or more twice or more in a week for more than a year, 0 otherwise;
Activity (dummy variable) 1 doing physical activities (walking or equivalent) for one hour or more daily, 0 otherwise;
Speed (dummy variable) 1 walking faster than other people of a similar age and the same gender, 0 otherwise;
Sleep (dummy variable) 1 sleeping well, 0 otherwise;
Alcohol_Freq (frequency of alcohol intake, integer 0 - 2) 0 never, 1 sometimes, 2 every day;
Alcohol_Amount (amount of alcohol intake, integer 0 - 4) 0 none, 1 drinking less than 180 ml of Japanese sake (with an alcohol percentage of about 15%) or equivalent alcohol per day when drinking, 2 drinking 180 - 360 ml, 3 drinking 360 - 540 ml, 4 drinking 540 ml or more;
Smoke (dummy variable) 1 smoking habit, 0 otherwise.
In addition to these variables, the side effects of taking medications are of great interest and concern, and the following variables are considered in the analysis.
M_Antihypertensive (dummy variable) 1 taking antihypertensive medications, 0 otherwise;
M_Glucose (dummy variable) 1 taking medications to control glucose levels (including insulin injections), 0 otherwise;
M_Cholesterol (dummy variable) 1 taking medication to control cholesterol or triglycerides, 0 otherwise.
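The summary of these covariates is given in Table 1 (SD: standard deviation). We consider the observations satisfying $Y_{it} = 0$ (no cerebrovascular disease history by year t) and perform an analysis using the logit model.
Model A:
$$
\begin{aligned}
P[Y_{i,t+1} = 1 \mid Y_{it} = 0] = \Lambda(&\beta_0 + \beta_1 \mathrm{Age} + \beta_2 \mathrm{Female} + \beta_3 \mathrm{Family} + \beta_4 \mathrm{t1} + \beta_5 \mathrm{Heart\_D} \\
&+ \beta_6 \mathrm{BMI} + \beta_7 \mathrm{SBP} + \beta_8 \mathrm{DBP} + \beta_9 \mathrm{HDL} + \beta_{10} \mathrm{LDL} + \beta_{11} \mathrm{Triglyceride} \\
&+ \beta_{12} \mathrm{ALT} + \beta_{13} \mathrm{AST} + \beta_{14} \mathrm{GGP} + \beta_{15} \mathrm{B\_Sugar} + \beta_{16} \mathrm{HbA1c} \\
&+ \beta_{17} \mathrm{U\_Sugar} + \beta_{18} \mathrm{U\_Protein} + \beta_{19} \mathrm{Weight\_1} + \beta_{20} \mathrm{Weight\_20} \\
&+ \beta_{21} \mathrm{Eat\_Fast} + \beta_{22} \mathrm{Late\_Supper} + \beta_{23} \mathrm{No\_Breakfast} + \beta_{24} \mathrm{Exercise} \\
&+ \beta_{25} \mathrm{Activity} + \beta_{26} \mathrm{Speed} + \beta_{27} \mathrm{Sleep} + \beta_{28} \mathrm{Alcohol\_Freq} \\
&+ \beta_{29} \mathrm{Alcohol\_Amount} + \beta_{30} \mathrm{Smoke} + \beta_{31} \mathrm{M\_Antihypertensive} \\
&+ \beta_{32} \mathrm{M\_Glucose} + \beta_{33} \mathrm{M\_Cholesterol}), 
\end{aligned}
\tag{1}
$$
where $Y_{i,t+1}$ takes 1 if person i had a cerebrovascular disease history by year t + 1 and 0 otherwise, and Λ is the distribution function of the logistic distribution given by
$$
\Lambda(u) = \frac{\exp(u)}{1 + \exp(u)}.
$$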
If there are no side effects of the medications, the coefficients of dummy variables that represent taking medications become zero. All covariates are values at year t.
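As a small numerical illustration of the logit model, the sketch below evaluates the logistic distribution function and the implied one-year probability for a given covariate profile. The coefficient values and the intercept are hypothetical placeholders, not the estimates in Table 2, and only three covariates are included for brevity.

```python
import math

def logistic_cdf(u):
    """Lambda(u) = exp(u) / (1 + exp(u)), arranged to avoid overflow."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def event_probability(intercept, coefs, covariates):
    """Probability of the event for one individual under a logit model."""
    index = intercept + sum(coefs[name] * value for name, value in covariates.items())
    return logistic_cdf(index)

# Hypothetical coefficients (NOT the paper's estimates); the medication
# coefficient is set to zero to mimic the "no side effects" case.
intercept = -8.5
coefs = {"Age": 0.04, "Heart_D": 0.8, "M_Antihypertensive": 0.0}

no_med = {"Age": 55, "Heart_D": 0, "M_Antihypertensive": 0}
with_med = dict(no_med, M_Antihypertensive=1)

p_no_med = event_probability(intercept, coefs, no_med)
p_with_med = event_probability(intercept, coefs, with_med)
print(p_no_med, p_with_med)
```

With a zero coefficient on the medication dummy, the two probabilities coincide, which is the sense in which "no side effects" corresponds to zero coefficients in Model A.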
Clearly, taking medications affects health factors; for example, taking antihypertensive medications affects blood pressure. Conversely, health factors affect whether an individual is taking medications or not, so problems of endogeneity might occur. Hence, the reduced form model, which does not contain the variables that represent taking medications, is also considered.
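Model B:
$$
\begin{aligned}
P[Y_{i,t+1} = 1 \mid Y_{it} = 0] = \Lambda(&\beta_0 + \beta_1 \mathrm{Age} + \beta_2 \mathrm{Female} + \beta_3 \mathrm{Family} + \beta_4 \mathrm{t1} + \beta_5 \mathrm{Heart\_D} \\
&+ \beta_6 \mathrm{BMI} + \beta_7 \mathrm{SBP} + \beta_8 \mathrm{DBP} + \beta_9 \mathrm{HDL} + \beta_{10} \mathrm{LDL} + \beta_{11} \mathrm{Triglyceride} \\
&+ \beta_{12} \mathrm{ALT} + \beta_{13} \mathrm{AST} + \beta_{14} \mathrm{GGP} + \beta_{15} \mathrm{B\_Sugar} + \beta_{16} \mathrm{HbA1c} \\
&+ \beta_{17} \mathrm{U\_Sugar} + \beta_{18} \mathrm{U\_Protein} + \beta_{19} \mathrm{Weight\_1} + \beta_{20} \mathrm{Weight\_20} \\
&+ \beta_{21} \mathrm{Eat\_Fast} + \beta_{22} \mathrm{Late\_Supper} + \beta_{23} \mathrm{No\_Breakfast} + \beta_{24} \mathrm{Exercise} \\
&+ \beta_{25} \mathrm{Activity} + \beta_{26} \mathrm{Speed} + \beta_{27} \mathrm{Sleep} + \beta_{28} \mathrm{Alcohol\_Freq} \\
&+ \beta_{29} \mathrm{Alcohol\_Amount} + \beta_{30} \mathrm{Smoke}),
\end{aligned}
\tag{2}
$$
where the left-hand side is the probability that person i, who had no cerebrovascular disease history by year t, has cerebrovascular disease at year t + 1.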
3. Results of Estimation
The results of the estimation are given in Table 2, and we obtained similar results in both models. Among the nonmodifiable variables, the estimates of Age and Heart_D were positive. The t-values were quite large, and the estimates were significant at any reasonable significance level. The estimates of the other nonmodifiable variables were not significant at the 5% level. For modifiable variables, the estimates of DBP, HbA1c, U_Protein and Weight_1 were positive and significant at the 1% level in both models. The estimates of No_Breakfast were positive and significant at the 1% and 5% levels in Models A and B, respectively. The estimate of BMI was positive and significant at the 1% level in Model A. The estimates of Exercise and Weight_20 were positive and significant at the 5% level in Model A and Model B, respectively. On the other hand, the estimates of Triglyceride, ALT, Speed and Smoke were negative and significant at the 1% level in both models.
The estimates of the three dummy variables that represent taking medications were positive in Model A, and their t-values were quite large, so that they were significant at any reasonable significance level. In particular, the estimate and t-value of M_Antihypertensive were 0.660 and 19.156, respectively. This might imply that taking these medications, especially those for hypertension, would constitute a very important risk factor for cerebrovascular disease, and we must consider the side effects of these medications.
None of the other modifiable variables was significant in either Model A or Model B. (In Table 2, SE denotes the standard error.)
4. Discussion
For an individual without a history of cerebrovascular disease, the overall probability of developing cerebrovascular disease within one year was very small, with a gross rate of just 0.23%. Therefore, the odds ratio (OR) and its confidence interval (CI) are approximately equal to the probability ratio (PR) and its CI, as shown in Appendix A. Among the nonmodifiable risk variables for cerebrovascular disease, age and a history of heart disease are considered the most important. As shown in Figure 1, the OR comparing persons aged 60 to those aged 50 is 1.43 with a 95% CI of 1.41 - 1.46, and the OR comparing persons aged 70 to those aged 50 is 2.05 with a 95% CI of 1.99 - 2.12. The risk of persons aged 70 is about twice as large as that of those aged 50. The OR for a heart disease history (comparing those with and without a heart disease history) is 2.29 with a 95% CI of 2.18 - 2.41. This means that individuals with a history of heart disease develop cerebrovascular disease at rates more than double those of individuals without a heart disease history. It is necessary for these individuals to pay special attention to prevention of cerebrovascular disease. Since cerebrovascular disease, especially stroke, is a medical emergency, it is also important to ensure that medical personnel know these facts for proper treatment.
Figure 1. Odds ratios and 95% confidence intervals of nonmodifiable variables.
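The two age-related ORs reported above can be cross-checked against each other. The sketch below assumes that age enters the logit index linearly, so that the OR for a 20-year difference is the square of the OR for a 10-year difference; the variable names are illustrative.

```python
import math

or_10_years = 1.43  # reported OR, age 60 versus age 50

# Under a linear age term, log(OR) is proportional to the age difference.
beta_age = math.log(or_10_years) / 10.0   # implied per-year log-odds coefficient
or_20_years = math.exp(20.0 * beta_age)   # implied OR, age 70 versus age 50

print(round(or_20_years, 2))
```

The implied value is about 2.04, which agrees with the reported 2.05 up to the rounding of the published 10-year OR.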
Figure 2 shows the ORs and 95% CIs of the modifiable variables whose estimates are significant at the 5% level in Model A, except for dummy variables that represent taking medications. For these variables, not only the estimates but also their distributions are important. Let z be a variable of interest. When z is a numerical variable, the OR is calculated by comparing z and (z + one standard deviation). When z is a dummy variable, the OR is calculated by comparing z = 0 and z = 1. For U_Protein, the majority of the values are 1 or 2, so the OR is calculated by comparing U_Protein = 1 and U_Protein = 2. Among these variables, triglyceride level (Triglyceride) and recent large weight change (3 kg or more within a year, Weight_1) are important factors. The OR of Triglyceride is 0.73 with a 95% CI of 0.67 - 0.79. The fact that a higher triglyceride level reduces the risk of cerebrovascular disease seems inconsistent with the expected result. For example, the CDC [58] advises on its website to “limit foods high in saturated fat. Saturated fats come from animal products…” However, Sauvaget et al. [59] reported that higher consumption of animal fat and cholesterol appeared to reduce the risk of deaths from cerebral infarction in Japan. Their finding is consistent with the result of this study. The OR of Weight_1 is 1.32 with a 95% CI of 1.29 - 1.37. This means that a recent large (≥3-kg) weight change would increase the risk of cerebrovascular disease by about 30%, and individuals in this category should recognize this fact. The ORs (95% CIs) of other important variables are 1.10 (1.07 - 1.12) for DBP, 1.09 (1.07 - 1.12) for U_Protein, 1.11 (1.06 - 1.15) for No_Breakfast, 0.86 (0.84 - 0.88) for Speed, and 0.88 (0.85 - 0.91) for Smoke. Each of these variables changes the risk of cerebrovascular disease by about 10%. The ORs (95% CIs) for BMI and Exercise are 0.94 (0.92 - 0.95) and 1.07 (1.03 - 1.10), respectively.
The effects of ALT and HbA1c on the risk of cerebrovascular disease are relatively small.
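The OR rule described above (one standard deviation for a numerical variable, 0 versus 1 for a dummy variable) can be sketched as follows. The coefficient, standard error and standard deviation values are hypothetical and are not taken from Tables 1 and 2; the interval shown is the conventional Wald interval with a critical value of 1.96, which is an assumption about how such CIs are usually formed rather than a statement of the procedure used in this study.

```python
import math

def odds_ratio(beta, se, delta=1.0, z_crit=1.96):
    """OR and Wald CI for a change of `delta` in a covariate with coefficient beta."""
    point = math.exp(beta * delta)
    a = math.exp((beta - z_crit * se) * delta)
    b = math.exp((beta + z_crit * se) * delta)
    return point, min(a, b), max(a, b)

# Numerical variable: compare z with z + one standard deviation.
# beta, se and sd are hypothetical illustration values.
or_num, lo_num, hi_num = odds_ratio(beta=0.008, se=0.001, delta=12.0)

# Dummy variable: compare z = 0 with z = 1 (delta = 1).
or_dum, lo_dum, hi_dum = odds_ratio(beta=-0.15, se=0.02)

print(round(or_num, 2), round(or_dum, 2))
```

Scaling by the standard deviation makes the ORs of variables measured in different units (mmHg, mg/dL, U/L) roughly comparable, which is why the distribution of each variable matters as well as its estimate.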
Figure 3 shows the ORs and 95% CIs of dummy variables that represent taking medications. The ORs (95% CIs) are 1.94 (1.86 - 2.00) for M_Antihypertensive, 1.26 (1.19 - 1.34) for M_Glucose and 1.42 (1.36 - 1.48) for M_Cholesterol. These results may imply that taking these medications would increase the risk of cerebrovascular disease. In particular, taking antihypertensive medications almost doubles the risk of cerebrovascular disease; this conforms with the results of Nawata [57]. The results of this study suggest that every 10-mmHg increment of DBP increases the risk of cerebrovascular disease by 10%; however, the negative side effects of antihypertensive medications are much greater. All medications have risks [58], and various studies have been done about the side effects of antihypertensive drugs [60] - [64]. There are several types of antihypertensive medications [65] [66]. It is necessary to explore this finding in greater detail and manage antihypertensive therapy carefully so as to minimize the negative side effects of treatment [67] - [71].
Figure 2. Odds ratios and 95% confidence intervals of modifiable variables.
Figure 3. Odds ratios and 95% confidence intervals of dummy variables that represent taking medications.
Although the effects are not as large as those of antihypertensive medications, medications controlling glucose and cholesterol levels would increase the risk of cerebrovascular disease. Hence, the side effects of these medications should also be carefully considered when prescribing these medications.
5. Conclusions
In this study, we analyzed the risk factors for cerebrovascular disease, which is a very serious problem worldwide, using data from 2,678,054 medical checkups obtained from the JMDC Claims Database. The sample period was from January 2005 to September 2019. Logit models were used in the analysis. The data of individuals who had no history of cerebrovascular disease at year t and had information (either positive or negative) at the next year (i.e., year t + 1) were analyzed. Among the nonmodifiable factors, age and a history of heart disease are important risk factors. The risk of persons aged 70 is about twice as large as that of those aged 50. A heart disease history is also an especially important factor; the risk more than doubles. Therefore, individuals with a history of heart disease must pay careful attention to preventing cerebrovascular disease.
Among the modifiable factors, triglyceride level and recent large weight change are very important factors, and they change the risk of cerebrovascular disease by about 30%. DBP, U_Protein, No_Breakfast, Speed and Smoke are other important modifiable factors, which change the risk of cerebrovascular disease by about 10%. Taking medications for hypertension, hyperglycemia or hypercholesterolemia is also associated with a statistically significant increase in the risk of cerebrovascular disease. In particular, taking antihypertensive medications almost doubles the risk. All medications have side effects, and it is necessary to manage these medications carefully to minimize negative side effects.
As the dataset comprises health checkups of workers in Japan, it does not include individuals aged 76 or over and includes relatively few individuals in their early 70s, especially those too feeble to work. Age is a critical factor for cerebrovascular disease, and it is necessary to collect data on older individuals. The medications are not evaluated by pharmacological class, and more useful results might be obtained by doing so. Finally, the dataset contains information only on Japanese individuals, so potential effects of ethnicity on cerebrovascular disease are not considered. The results might be different in other countries or regions. These are subjects for future study.
Acknowledgements
This study is a part of the research project “Basic research for exploring the ideal medical intervention after the advent of the new coronavirus” at the Research Institute of Economy, Trade and Industry (RIETI). The JMDC Claims Database was purchased by RIETI from the JMDC Cooperation for the project. The author would like to thank the project leader, Yoichi Sekizawa, for his very helpful cooperation. The author would also like to thank an anonymous referee for his/her helpful comments and suggestions. This study was approved by the Institutional Review Boards of Hitotsubashi University.
Appendix A. Odds Ratio and Probability Ratio
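Let z be the variable whose effect on the probability we want to know, and let x be a vector of the other variables. Suppose that z takes two different values, $z_1$ and $z_2$ (the values of x are the same in both cases). Let Y be a binary variable in which Y = 1 if the objective incident occurs (in this case, having cerebrovascular disease) and Y = 0 otherwise. Define $P_1 = P(Y = 1 \mid z_1, x)$ and $P_2 = P(Y = 1 \mid z_2, x)$. From the definition of the logistic distribution, the log of odds is calculated by
$$
\log \frac{P_j}{1 - P_j} = \beta_z z_j + x'\beta, \quad j = 1, 2, \tag{3}
$$
where $\beta_z$ is the coefficient of z and $\beta$ is the coefficient vector of x. When $P_j$ is small enough (i.e., $P_j \approx 0$),
$$
\log \frac{P_j}{1 - P_j} = \log P_j - \log(1 - P_j) \approx \log P_j. \tag{4}
$$
Let $OR = \dfrac{P_2/(1 - P_2)}{P_1/(1 - P_1)}$ and $PR = P_2/P_1$. If $P_j$ is small enough, we get
$$
\log OR \approx \log PR. \tag{5}
$$
From (3), (4) and (5),
$$
\log PR \approx \log OR = \beta_z (z_2 - z_1). \tag{6}
$$
The probability ratio (PR) and its variance are obtained by
$$
\widehat{PR} \approx \exp\{\hat{\beta}_z (z_2 - z_1)\}, \text{ and} \tag{7}
$$
$$
V(\widehat{PR}) \approx \{(z_2 - z_1)\,\widehat{PR}\}^2\, V(\hat{\beta}_z),
$$
where $\hat{\beta}_z$ is the estimate of $\beta_z$.
When the values of $P_1$ and $P_2$ are sufficiently small, we can approximately use the same formulas of the OR for the PR.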
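The quality of this approximation can be checked numerically. The sketch below converts an OR into the corresponding PR for a given baseline probability; the baseline of 0.0023 is the gross one-year rate reported in the text and the OR of 2.29 is the reported value for a heart disease history, both used here purely for illustration, while the baseline of 0.30 is a hypothetical contrast case.

```python
def probability_ratio(odds_ratio, p1):
    """PR = P2 / P1 implied by an odds ratio and a baseline probability P1."""
    odds1 = p1 / (1.0 - p1)
    odds2 = odds_ratio * odds1
    p2 = odds2 / (1.0 + odds2)
    return p2 / p1

or_value = 2.29

pr_rare = probability_ratio(or_value, p1=0.0023)   # rare outcome, as in this study
pr_common = probability_ratio(or_value, p1=0.30)   # hypothetical common outcome

print(round(pr_rare, 3), round(pr_common, 3))
```

For the rare outcome the PR stays within about 0.01 of the OR, whereas for the common outcome the two diverge substantially, which is why the OR can be read as a ratio of probabilities only when $P_1$ and $P_2$ are small.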