Can machine learning use routine clinical data to improve cardiovascular risk prediction?

2021-06-25 15:48:06 

  Research Background

  Cardiovascular disease (CVD) remains the leading cause of death worldwide. In 2012, 17.5 million people died from cardiovascular disease; of these, 7.4 million died from coronary heart disease and 6.7 million from stroke [1]. The American College of Cardiology/American Heart Association (ACC/AHA) guidelines assess cardiovascular risk from established risk factors such as hypertension, cholesterol, age, smoking, and diabetes. These risk factors are included in most CVD risk prediction tools (ACC/AHA [2], QRISK2 [3], Framingham [4], Reynolds [5]). Even so, we still cannot accurately predict the cardiovascular risk that patients face and offer the corresponding preventive treatment.

  All of the CVD prediction models above assume that each risk factor is linearly related to CVD outcomes [7]. Such models may oversimplify complex relationships, which include non-linear interactions among a large number of risk factors. We therefore need better methods that take many risk factors into account and capture the subtle relationships between risk factors and outcomes.

  Machine learning (ML) uses “big data” for pattern recognition and computational learning, offering an alternative to standard predictive modelling that may address the limitations above. It relies on the computer to learn all the complex and non-linear interactions between variables by minimising the error between predicted and observed outcomes [8]. In addition, ML may identify latent variables that can be inferred from other variables.

  So far, no large-scale study has applied machine learning to prognostic assessment using routine clinical data. This study aims to determine which machine learning algorithms are most accurate and to assess whether machine learning can improve the accuracy of cardiovascular risk prediction in a large general primary care population.

  Research Methods

  This prospective cohort study selected 378,256 patients with no history of cardiovascular disease or inherited lipid disorders. The established algorithm (ACC/AHA guidelines), built on 8 core baseline variables (gender, age, smoking status, systolic blood pressure, blood pressure treatment, total cholesterol, HDL cholesterol, and diabetes) [2], was compared with four machine learning algorithms (random forest, logistic regression, gradient boosting machines, and neural networks) for predicting the first cardiovascular event over 10 years (1 January 2005 to 1 January 2015). Predictive accuracy was evaluated by the area under the “receiver operating characteristic” curve (AUC), and by the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for predicting a 7.5% cardiovascular risk (the threshold for initiating statins).

  The 8 core variables were used to derive the baseline risk prediction model, using the equations published in the 2013 ACC/AHA CVD risk guidelines [2]. In addition, 9 continuous variables had missing data. It was assumed that the missingness of some clinical variables (e.g., body mass index and laboratory results) may itself be informative about these patients. Considering how normal BMI values are recorded in primary health care [23], we created dummy variables to indicate whether each of these continuous variables was missing. The categorical population variables, Townsend deprivation index and ethnicity, were given a separate “unknown” category in the analysis. In total, 30 variables recorded before baseline (not counting the dummy variables for missing values) were analysed in the machine learning models (Table 1).
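
  The missing-value handling described above can be sketched as follows. This is a minimal illustration, not the study's code: the field names, the toy records, and the use of the observed mean as a placeholder fill value are all assumptions made for the example.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[object]]


def add_missing_indicators(
    records: List[Record], continuous: List[str], categorical: List[str]
) -> List[Record]:
    """Return copies of the records with missing-value handling applied."""
    # Mean of the observed values per column. Using it as the fill value is
    # an assumption of this sketch; the source does not state a fill rule.
    means = {}
    for col in continuous:
        observed = [r[col] for r in records if r.get(col) is not None]
        means[col] = sum(observed) / len(observed) if observed else 0.0

    prepared = []
    for record in records:
        row = dict(record)
        for col in continuous:
            missing = row.get(col) is None
            # Dummy variable: 1 when the continuous value was not recorded.
            row[col + "_missing"] = 1 if missing else 0
            if missing:
                row[col] = means[col]
        for col in categorical:
            # Categorical variables get a separate "unknown" category.
            if row.get(col) is None:
                row[col] = "unknown"
        prepared.append(row)
    return prepared


# Hypothetical toy records, not data from the study.
patients = [
    {"bmi": 27.5, "ethnicity": "white"},
    {"bmi": None, "ethnicity": None},
    {"bmi": 31.0, "ethnicity": "asian"},
]
prepared = add_missing_indicators(patients, ["bmi"], ["ethnicity"])
```

  Keeping the indicator as its own column lets a model learn from the fact that a value was never recorded, which is how a variable such as “BMI missing” can end up ranked as a risk factor.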


  Machine Learning Algorithms

  To compare the machine learning risk algorithms, we randomly sampled 75% of the CPRD cohort as a “training” cohort, used to derive the CVD risk algorithms; the remaining 25% formed the “validation” cohort, used to apply and test the algorithms. Four common machine learning algorithms were used: logistic regression [25], random forest [26], gradient boosting machines [27], and neural networks [28]. The algorithms were developed in RStudio, using the caret package (http://CRAN.R-project.org/package=caret) for neural networks and H2O (http://www.h2o.ai) for the remaining algorithms. The hyper-parameters of each model were determined by grid search with two-fold cross-validation on the training cohort.
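
  The workflow of a random training/validation split followed by a grid search with two-fold cross-validation can be sketched as below. This is only an illustration under stated assumptions: the data are synthetic, and a one-dimensional k-nearest-neighbour classifier stands in for the study's models, which were fitted in R with caret and H2O.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, int]  # (feature value, binary outcome)


def split_cohort(data: Sequence[Point], train_fraction: float, seed: int):
    """Randomly split the data into a training and a validation cohort."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train: Sequence[Point], x: float, k: int) -> int:
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def accuracy(train: Sequence[Point], test: Sequence[Point], k: int) -> float:
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


def two_fold_grid_search(train: List[Point], grid: Sequence[int]):
    """Pick the hyper-parameter with the best mean score over two folds."""
    half = len(train) // 2
    fold_a, fold_b = train[:half], train[half:]
    best_k, best_score = grid[0], -1.0
    for k in grid:
        # Fit on one fold, score on the other, then swap the roles.
        score = (accuracy(fold_a, fold_b, k) + accuracy(fold_b, fold_a, k)) / 2
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


# Synthetic cohort: the outcome becomes more likely as the feature grows.
rng = random.Random(0)
cohort: List[Point] = []
for _ in range(400):
    x = rng.uniform(0.0, 1.0)
    cohort.append((x, 1 if rng.random() < x else 0))

training, validation = split_cohort(cohort, 0.75, seed=1)
best_k, cv_score = two_fold_grid_search(training, [1, 5, 15])
# The validation cohort is touched only once, after tuning is finished.
validation_accuracy = accuracy(training, validation, best_k)
```

  The point of the design is that hyper-parameters are chosen using the training cohort alone, so the validation cohort gives an estimate of performance that is not biased by the tuning.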

  Statistical Analysis

  This study presents the descriptive characteristics of the population, giving the number (%) for categorical variables and the mean (SD) for continuous variables. The performance of the machine learning prediction algorithms developed in the training cohort was evaluated in the validation cohort by calculating Harrell’s c-statistic [29], a measure of the total area under the receiver operating characteristic curve (AUC). Standard errors and 95% confidence intervals for the c-statistic were estimated with a jack-knife procedure [30]. In addition, using the > 7.5% 10-year CVD risk threshold at which the ACC/AHA guidelines [2] recommend starting lipid-lowering treatment, the observed and expected cases and non-cases in the validation cohort were compared by binary classification analysis. This process provides the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Statistical analysis of algorithm performance was carried out in Stata 13 MP4.
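
  For a binary outcome, the c-statistic equals the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case. The sketch below computes it by direct pairwise comparison and estimates its standard error with a leave-one-out jack-knife. It is an illustration only, not the Stata routine used in the study; the scores and labels are made-up values, and the normal-approximation confidence interval is an assumption of this example.

```python
import math
from typing import List


def c_statistic(scores: List[float], labels: List[int]) -> float:
    """Share of case/non-case pairs ranked correctly (ties count half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    concordant = 0.0
    for case_score in cases:
        for non_case_score in non_cases:
            if case_score > non_case_score:
                concordant += 1.0
            elif case_score == non_case_score:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


def jackknife_se(scores: List[float], labels: List[int]) -> float:
    """Leave-one-out jack-knife standard error of the c-statistic."""
    n = len(scores)
    leave_one_out = [
        c_statistic(scores[:i] + scores[i + 1:], labels[:i] + labels[i + 1:])
        for i in range(n)
    ]
    mean = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((v - mean) ** 2 for v in leave_one_out)
    return math.sqrt(variance)


# Hypothetical predicted risks and observed outcomes (1 = CVD case).
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2, 0.3, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

auc = c_statistic(scores, labels)  # 13 of 15 pairs are concordant
se = jackknife_se(scores, labels)
ci_low, ci_high = auc - 1.96 * se, auc + 1.96 * se
```

  The pairwise loop is quadratic in the cohort size, so on a cohort of tens of thousands of patients a rank-based formula would be the practical choice; the direct version is shown because it mirrors the definition.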

  Research Results

  Data Extraction

  In this study, a total of 383,592 patients met the eligibility criteria. After excluding 5,336 patients with erroneous records (i.e., non-numerical entries for blood pressure or cholesterol) or extreme observations (more than 5 SD from the mean), the analysis cohort consisted of 378,256 patients. The cohort was then randomly divided into a training sample of 295,267 patients, used to train the machine learning algorithms, and the remaining 82,989 patients, used for validation (Fig 1).
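
  The two exclusion rules above (non-numerical entries and values more than 5 SD from the mean) can be sketched as a small filter. This is an illustrative assumption about how such a rule might be coded, not the study's extraction script: the field name, the toy rows, and the use of the population standard deviation are all choices made for the example.

```python
import math
import statistics
from typing import Dict, List, Optional


def parse_number(value: object) -> Optional[float]:
    """Return the value as a finite float, or None if it is not numeric."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return number if math.isfinite(number) else None


def clean_cohort(
    rows: List[Dict[str, object]], fields: List[str], max_sd: float = 5.0
) -> List[Dict[str, object]]:
    """Drop rows with non-numeric entries or extreme values in any field."""
    numeric = []
    for row in rows:
        parsed = {f: parse_number(row.get(f)) for f in fields}
        if all(v is not None for v in parsed.values()):
            numeric.append({**row, **parsed})

    kept = numeric
    for f in fields:
        values = [r[f] for r in numeric]
        mean = statistics.mean(values)
        sd = statistics.pstdev(values)  # population SD, an assumption here
        kept = [r for r in kept if sd == 0 or abs(r[f] - mean) <= max_sd * sd]
    return kept


# Hypothetical systolic blood pressure entries: 50 plausible values,
# one extreme value, and one non-numerical entry.
rows = [{"sbp": "120" if i % 2 == 0 else "130"} for i in range(50)]
rows += [{"sbp": "900"}, {"sbp": "n/a"}]

cleaned = clean_cohort(rows, ["sbp"])
excluded = len(rows) - len(cleaned)  # the extreme and the non-numeric row
```

  The reported cohort sizes are consistent with each other: 383,592 eligible patients minus 5,336 exclusions gives 378,256, which is also the sum of the 295,267 training and 82,989 validation patients.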


  In the whole cohort of 378,256 patients, 24,970 (6.6%) developed cardiovascular disease over the 10 years of follow-up. Among CVD cases there were significantly fewer women than men (42% F, 58% M), whereas among non-cases women only slightly outnumbered men (52% F, 48% M). The mean baseline age was 65.3 years for patients who developed cardiovascular disease and 57.3 years for those who did not (P<0.001). Further characteristics of patients with and without CVD are shown in Table 2.


  The input variables of the machine learning models are listed in Table 2. For the ACC/AHA baseline model and the machine learning logistic regression, variable importance was determined by the coefficient effect size. The random forest and gradient boosting machine models, which are based on decision trees, ranked variables by how often each was selected as a decision node, while the neural network used the overall weighting of each variable within the model. The top 10 risk factors for each CVD prediction algorithm are shown in Table 3.


  The standard risk factors in the ACC/AHA algorithm, stratified by gender, are age, total cholesterol, high-density lipoprotein cholesterol, smoking, blood pressure, and diabetes. Several risk factors in the ACC/AHA model (age, gender, smoking) were top risk factors for all four machine learning algorithms. Diabetes, a prominent factor in many CVD algorithms, was not among the top risk factors in the machine learning models (although HbA1c was used as a proxy in the random forest model). Machine learning identified other risk factors not found in previous risk prediction tools, including medical conditions such as COPD and severe mental illness, the prescribing of oral corticosteroids, and biomarkers such as triglyceride levels. Random forest and gradient boosting machines were the most similar in risk factor selection and ranking, with some differences in rank order and with BMI substituted for systolic blood pressure. Logistic regression and neural networks prioritised medical conditions, such as atrial fibrillation, chronic kidney disease, and rheumatoid arthritis, over biometric risk factors. The neural network also gave age a lower weighting as a risk factor and included “BMI missing” as a protective factor for CVD.

  The predictive accuracy of all models, measured by discrimination (AUC c-statistic), is shown in Table 4.


  The ACC/AHA risk model served as the comparison baseline (AUC 0.728, 95% CI 0.723-0.735). Compared with the baseline model, all of the machine learning algorithms tested showed statistically significant improvements in discrimination (from an increase of 1.7% for the random forest algorithm to an increase of 3.6% for the neural network). The ACC/AHA baseline model correctly predicted 4,643 of 7,404 cases, giving a sensitivity of 62.7% and a PPV of 17.1%. The random forest algorithm correctly predicted 191 more CVD cases than the baseline model, raising sensitivity to 65.3% and PPV to 17.8%, while logistic regression correctly predicted 324 more CVD cases (sensitivity 67.1%; PPV 18.3%). Gradient boosting machines and neural networks performed best, correctly predicting 354 more cases (sensitivity 67.5%; PPV 18.4%) and 355 more cases (sensitivity 67.5%; PPV 18.4%), respectively. The ACC/AHA baseline model correctly predicted 53,106 of 75,585 non-cases, giving a specificity of 70.3% and an NPV of 95.1%. Compared with the baseline ACC/AHA model, the number of non-cases correctly predicted increased by 191 for the random forest algorithm and by 355 for the neural network.
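
  The baseline figures above can be reproduced from the four reported counts alone. The sketch below is a worked check, not code from the study; the function and argument names are chosen for this example.

```python
from typing import Dict


def classification_metrics(
    true_positives: int, total_cases: int, true_negatives: int, total_non_cases: int
) -> Dict[str, float]:
    """Derive sensitivity, specificity, PPV and NPV from validation counts."""
    false_negatives = total_cases - true_positives
    false_positives = total_non_cases - true_negatives
    return {
        "sensitivity": true_positives / total_cases,
        "specificity": true_negatives / total_non_cases,
        "ppv": true_positives / (true_positives + false_positives),
        "npv": true_negatives / (true_negatives + false_negatives),
    }


# ACC/AHA baseline counts reported for the validation cohort.
baseline = classification_metrics(
    true_positives=4643, total_cases=7404,
    true_negatives=53106, total_non_cases=75585,
)
for name, value in baseline.items():
    print(f"{name}: {100 * value:.1f}%")

# Sensitivity after the 191 additional cases found by the random forest.
random_forest_sensitivity = (4643 + 191) / 7404
```

  The counts give a sensitivity of 62.7%, a specificity of 70.3%, a PPV of 17.1%, and an NPV of 95.1%, matching the reported values, and the 191 additional cases correspond to the reported random forest sensitivity of 65.3%. The low PPV alongside a high NPV reflects the low share of cases in the validation cohort (7,404 of 82,989 patients).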

  Analysis Conclusion

  Compared with the established ACC/AHA risk prediction algorithm, machine learning significantly improved the accuracy of cardiovascular risk prediction. All of the machine learning algorithms tested were better at identifying both the individuals who would go on to develop CVD and those who would not. Unlike established risk prediction methods, the machine learning methods used here are not limited to a small set of risk factors and incorporate more pre-existing medical conditions. The neural network performed best, improving predictive accuracy by 3.6%.

  Advantages and Limitations

  This study is the first to apply machine learning to routine data in patients’ electronic medical records. It confirms that machine learning can better predict CVD risk in a large general population. The range of machine learning algorithms used in this study showed that the models based on decision trees were similar to each other, with gradient boosting machines outperforming random forests. Neural networks and logistic regression placed more emphasis on categorical variables and CVD-related medical conditions, clustering patients with similar characteristics into groups. This may help further exploration of different predictive risk factors and the development of new risk prediction approaches and algorithms. In addition, missing values or non-response are ignored by conventional CVD risk prediction tools [2-5]. This study shows that missing values, particularly for routine biometric variables such as BMI, are independent predictors of CVD.

  It has to be acknowledged that the “black box” nature of machine learning algorithms, neural networks in particular, can be difficult to interpret. This refers to the inherent complexity of how the risk factor variables interact and of their independent effects on the outcome. However, improvements in data visualisation have improved the understanding of these models by illustrating the importance of the network connections between risk factors [35] (see Fig 2 for an example visualisation of a neural network model).


  In addition, we recognise that as the number of potential risk factors increases, the complexity of the model may lead to over-fitting and produce implausible results. We addressed this through active and appropriate selection of pre-training, hyper-parameter selection, and regularisation [36]. Although we validated the performance of the machine learning algorithms on a separate dataset (cross-validation), a common method used to develop the established cardiovascular risk algorithms applied in clinical practice [2-5,34,37], it must be acknowledged that the jack-knife procedure may produce more accurate results, as in other prediction modelling on, for example, genomic or proteomic datasets [38,39]. Furthermore, the established risk prediction algorithms have been developed within a binary classification framework, which often leads to imbalanced datasets. Ensemble learning has been shown to be a solution for building balanced datasets that improve predictive performance [40]. These methods are not yet common when developing risk prediction models on clinical datasets, but their utility should be explored in future research.

  Summary

  As computing power in healthcare systems improves, the use of machine learning to improve disease risk prediction in clinical practice will become widespread [7]. Compared with the established risk prediction method, this study shows that machine learning algorithms predict cardiovascular disease cases better, increasing the absolute number of cases correctly predicted while successfully excluding non-cases.


  References

  1. World Health Organization. Global Status Report on Noncommunicable Diseases. Geneva, Switzerland: World Health Organization, 2014.

  2. Goff DC, Lloyd-Jones DM, Bennett G, Coady S, D’Agostino RB, Gibbons R, et al. 2013 ACC/AHA guideline on the assessment of cardiovascular risk: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. Circulation 2013; 135(11): 1-50.

  3. Hippisley-Cox J, Coupland C, Vinogradova Y, Robson J, Minhas R, Sheikh A, et al. Predicting cardiovascular risk in England and Wales: prospective derivation and validation of QRISK2. BMJ 2008; 336(7659): 1475-82. https://doi.org/10.1136/bmj.39609.449676.25 PMID: 18573856

  4. D’Agostino RB, Vasan RS, Pencina MJ, Wolf PA, Cobain M, Massaro JM, et al. General cardiovascular risk profile for use in primary care: the Framingham Heart Study. Circulation 2008; 117(6): 743-53. https://doi.org/10.1161/CIRCULATIONAHA.107.699579 PMID: 18212285

  5. Ridker PM, Buring JE, Rifai N, Cook NR. Development and validation of improved algorithms for the assessment of global cardiovascular risk in women: the Reynolds Risk Score. JAMA 2007; 297(6): 611-9. https://doi.org/10.1001/jama.297.6.611 PMID: 17299196

  6. Ridker PM, Danielson E, Fonseca FAH, Genest J, Gotto AM, Kastelein JJP, et al. Rosuvastatin to prevent vascular events in men and women with elevated C-reactive protein. New England Journal of Medicine 2008; 359(21): 2195-207. https://doi.org/10.1056/NEJMoa0807646 PMID: 18997196

  7. Obermeyer Z, Emanuel EJ. Predicting the future - big data, machine learning, and clinical medicine. New England Journal of Medicine 2016; 375(13): 1216-9. https://doi.org/10.1056/NEJMp1606181 PMID: 27682033

  8. Dreiseitl S, Ohno-Machado L. Logistic regression and artificial neural network classification models: a methodology review. Journal of Biomedical Informatics 2002; 35(5-6): 352-9. PMID: 12968784

  9. Berglund E, Lytsy P, Westerling R. Adherence to and beliefs in lipid-lowering medical treatments: a structural equation modeling approach including the necessity-concern framework. Patient Education and Counseling 2013; 91(1): 105-12. https://doi.org/10.1016/j.pec.2012.11.001 PMID: 23218590

  10. Herrett E, Thomas SL, Schoonen WM, Smeeth L, Hall AJ. Validation and validity of diagnoses in the General Practice Research Database: a systematic review. British Journal of Clinical Pharmacology 2010; 69(1): 4-14. https://doi.org/10.1111/j.1365-2125.2009.03537.x PMID: 20078607

  11. Eeg-Olofsson K, Cederholm J, Nilsson PM, Zethelius B, Svensson AM, Gudbjornsdottir S, et al. New aspects of HbA1c as a risk factor for cardiovascular diseases in type 2 diabetes: an observational study from the Swedish National Diabetes Register (NDR). Journal of Internal Medicine 2010; 268(5): 471-82. https://doi.org/10.1111/j.1365-2796.2010.02265.x PMID: 20804517

  12. Emerging Risk Factors Collaboration. C-reactive protein, fibrinogen, and cardiovascular disease prediction. New England Journal of Medicine 2012; 367(14): 1310-20. https://doi.org/10.1056/NEJMoa1107477 PMID: 23034020

  13. Jardine AG, Gaston RS, Fellstrom BC, Holdaas H. Prevention of cardiovascular disease in adult recipients of kidney transplants. The Lancet; 378(9800): 1419-27.

  14. Mason JE, Starke RD, Van Kirk JE. Gamma-glutamyl transferase: a novel cardiovascular risk biomarker. Preventive Cardiology 2010; 13(1): 36-41. https://doi.org/10.1111/j.1751-7141.2009.00054.x PMID: 20021625

  15. Mullerova H, Agusti A, Erqou S, Mapel DW. Cardiovascular comorbidity in COPD: systematic literature review. Chest 2013; 144(4): 1163-78. https://doi.org/10.1378/chest.12-2847 PMID: 23722528

  16. Osborn DP, Hardoon S, Omar RZ, Holt RI, King M, Larsen J, et al. Cardiovascular risk prediction models for people with severe mental illness: results from the Prediction and Management of Cardiovascular Risk in People With Severe Mental Illnesses (PRIMROSE) research program. JAMA Psychiatry 2015; 72(2): 143-51. https://doi.org/10.1001/jamapsychiatry.2014.2133 PMID: 25536289

  17. Ray WA, Chung CP, Murray KT, Hall K, Stein CM. Atypical antipsychotic drugs and the risk of sudden cardiac death. New England Journal of Medicine 2009; 360(3): 225-35. https://doi.org/10.1056/NEJMoa0806994 PMID: 19144938

  18. Sin DD, Wu L, Man SF. The relationship between reduced lung function and cardiovascular mortality: a population-based study and a systematic review of the literature. Chest 2005; 127(6): 1952-9. https://doi.org/10.1378/chest.127.6.1952 PMID: 15947307

  19. Souverein PC, Berard A, Van Staa TP, Cooper C, Egberts ACG, Leufkens HGM, et al. Use of oral glucocorticoids and risk of cardiovascular and cerebrovascular disease in a population based case-control study. Heart 2004; 90(8): 859-65. https://doi.org/10.1136/hrt.2003.020180 PMID: 15253953

  20. Wannamethee SG, Shaper AG, Perry IJ. Serum creatinine concentration and risk of cardiovascular disease: a possible marker for increased risk of stroke. Stroke; a Journal of Cerebral Circulation 1997; 28(3): 557-63.

  21. Weng SF, Kai J, Guha IN, Qureshi N. The value of aspartate aminotransferase and alanine aminotransferase in cardiovascular disease risk assessment. Open Heart 2015; 2(e000272): 1-10.

  22. Batista GEAPA, Monard MC. An analysis of four missing data treatment methods for supervised learning. Applied Artificial Intelligence 2003; 17(5-6): 519-33.

  23. Bhaskaran K, Forbes HJ, Douglas I, Leon DA, Smeeth L. Representativeness and optimal use of body mass index (BMI) in the UK Clinical Practice Research Datalink (CPRD). BMJ Open 2013; 3(e003389): 1-8.

  24. Assmann G, Cullen P, Schulte H. Simple scoring scheme for calculating the risk of acute coronary events based on the 10-year follow-up of the Prospective Cardiovascular Münster (PROCAM) study. Circulation 2002; 105(3): 310-5. PMID: 11804985

  25. Hosmer DW, Lemeshow S, Sturdivant RX. Applied Logistic Regression, 3rd edition. New Jersey, USA: John Wiley & Sons; 2013.

  26. Breiman L. Random forests. Machine Learning 2001; 45(1): 5-32.

  27. Friedman J. Greedy function approximation: a gradient boosting machine. The Annals of Statistics 2001; 29(5): 1189-232.

  28. Hagan M, Demuth H, Beale M, De Jesus O. Neural Network Design, 2nd edition. Boston: PWS Publishers; 2014.

  29. Newson R. Comparing the predictive power of survival models using Harrell’s C or Somers’ D. The Stata Journal 2010; 10(3): 339-58.

  30. Newson R. Confidence intervals for rank statistics: Somers’ D and extensions. The Stata Journal 2006; 6(3): 309-34.

  31. The Emerging Risk Factors Collaboration. C-reactive protein, fibrinogen, and cardiovascular disease prediction. New England Journal of Medicine 2012; 367(14): 1310-20. https://doi.org/10.1056/NEJMoa1107477 PMID: 23034020

  32. Waljee AK, Higgins PDR, Singal AG. A primer on predictive models. Clinical and Translational Gastroenterology 2014; 5(1): e44.

  33. Dybowski R, Gant V, Weller P, Chang R. Prediction of outcome in critically ill patients using artificial neural network synthesised by genetic algorithm. The Lancet 1996; 347(9009): 1146-50.

  34. Voss R, Cullen P, Schulte H, Assmann G. Prediction of risk of coronary events in middle-aged men in the Prospective Cardiovascular Münster Study (PROCAM) using neural networks. International Journal of Epidemiology 2002; 31(6): 1253-62. PMID: 12540731

  35. Olden J, Jackson D. Illuminating the “black box”: a randomization approach for understanding variable contributions in artificial neural networks. Ecological Modelling 2002; 2002(154): 135-50.

  36. Bengio Y. Practical recommendations for gradient-based training of deep architectures. In: Montavon G, Orr GB, Müller K-R, eds. Neural Networks: Tricks of the Trade: Second Edition. Berlin, Heidelberg: Springer Berlin Heidelberg; 2012: 437-78.

  37. Woodward M, Brindle P, Tunstall-Pedoe H. Adding social deprivation and family history to cardiovascular risk assessment: the ASSIGN score from the Scottish Heart Health Extended Cohort (SHHEC). Heart 2007; 93(2): 172-6. https://doi.org/10.1136/hrt.2006.108167 PMID: 17090561

  38. Chen J, Long R, Wang XL, Liu B, Chou KC. dRHP-PseRA: detecting remote homology proteins using profile-based pseudo protein sequence and rank aggregation. Sci Rep 2016; 6(32333): 1-7.

  39. Liu B, Long R, Chou KC. iDHS-EL: identifying DNase I hypersensitive sites by fusing three different modes of pseudo nucleotide composition into an ensemble learning framework. Bioinformatics 2016; 32(16): 2411-8. https://doi.org/10.1093/bioinformatics/btw186 PMID: 27153623

  40. Liu B, Wang S, Dong Q, Li S, Liu X. Identification of DNA-binding proteins by combining auto-cross covariance transformation and ensemble learning. IEEE Trans Nanobioscience 2016; 15(4): 328-44.

  41. Kennedy EH, Wiitala WL, Hayward RA, Sussman JB. Improved cardiovascular risk prediction using nonparametric regression and electronic health record data. Medical Care 2013; 51(3): 251-8. https://doi.org/10.1097/MLR.0b013e31827da594 PMID: 23269109

  42. National Institute for Health and Care Excellence. Cardiovascular disease: risk assessment and reduction, including lipid modification. London, UK: National Institute for Health and Care Excellence, 2016.

  43. NHS England Board. Personalised Medicine Strategy. London, UK: National Health Service England (NHS England), 2015.

  44. Precision Medicine Initiative (PMI) Working Group. The Precision Medicine Initiative Cohort Program - Building a Research Foundation for 21st Century Medicine. Washington D.C.: National Institutes of Health (NIH), 2015.


  Literature Source

  Weng, S. F., Reps, J., Kai, J., Garibaldi, J. M., & Qureshi, N. (2017). Can machine-learning improve cardiovascular risk prediction using routine clinical data? PLOS ONE, 12(4), e0174944. DOI: 10.1371/journal.pone.0174944

  Reprinted source: Southern Medical anesthesia
