
Predicting 30-day readmission following total knee arthroplasty using machine learning and clinical expertise applied to clinical administrative and research registry data in an Australian cohort

Abstract

Background: Thirty-day readmission is an increasingly important problem for total knee arthroplasty (TKA) patients. The aim of this study was to develop a risk prediction model using machine learning and clinical insight for 30-day readmission in primary TKA patients.

Method: Data used to train and internally validate a multivariable predictive model were obtained from a single tertiary referral centre for TKA located in Victoria, Australia. Hospital administrative data and clinical registry data were utilised, and predictors were selected through systematic review and subsequent consultation with clinicians caring for TKA patients. Logistic regression and random forest models were compared to one another. Calibration was evaluated by visual inspection of calibration curves and calculation of the integrated calibration index (ICI). Discriminative performance was evaluated using the area under the receiver operating characteristic curve (AUC-ROC).

Results: The models developed in this study demonstrated adequate calibration for use in the clinical setting, despite having poor discriminative performance. The best-calibrated readmission prediction model was a logistic regression model trained on administrative data using risk factors identified from systematic review and meta-analysis, which are available at the initial consultation (ICI = 0.012, AUC-ROC = 0.589). Models developed to predict complications associated with readmission also had reasonable calibration (ICI = 0.012, AUC-ROC = 0.658).

Conclusion: Discriminative performance of the prediction models was poor, although machine learning provided a slight improvement. The models were reasonably well calibrated, meaning they provide accurate patient-specific probabilities of these outcomes. This information can be used in shared clinical decision-making for discharge planning and post-discharge follow-up.

Keywords: Readmission, Total knee arthroplasty, Machine learning, Registry data

*Correspondence: Daniel J. Gould, daniel.gould@unimelb.edu.au. Full list of author information is available at the end of the article.

© The Author(s) 2023. Open Access: this article is licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

Introduction

Unplanned hospital readmission following total knee arthroplasty (TKA) disrupts the patient's recovery and incurs high costs to the healthcare system [1–3]. Readmission can be predicted according to the patient's characteristics and a broad range of risk factors [4]. Although reasonable accuracy can be achieved in certain clinical settings [5, 6], predicting outcomes, especially readmission, is often difficult in TKA patients [4, 5]. This is due to many factors, including a lack of data of sufficient granularity to discriminate between TKA patients who experience deleterious outcomes and those who have an uncomplicated postoperative course [4, 7].

Machine learning algorithms offer an avenue for potential predictive performance gain given their ability to capture complex patterns in the data [4, 7]. This technique does not rely on pre-specified relationships between predictors and outcomes, instead utilising computational and statistical principles to derive patterns without direct human input. Clinical insight into predictor selection can be used in conjunction with machine learning to increase the clinical relevance and interpretability of the model [8].

The aim of this study was to develop a clinically applicable multivariable predictive model for 30-day readmission following TKA, compliant with best practice guidelines [9], to be used in shared decision-making between patient and surgeon.

Materials and methods

Patient selection

Inclusion criteria: all primary TKA patients identified in the administrative database at the study hospital for whom data were available in the St Vincent's Melbourne Arthroplasty Outcomes (SMART) registry, including simultaneous bilateral procedures, TKA for inflammatory arthropathies, and TKA for traumatic aetiologies. Administrative data are available for use in the live clinical environment, whereas the SMART registry contains additional clinically relevant information not available in the live clinical setting which might improve predictive performance. The SMART registry is a prospective registry comprising longitudinal data for TKA and total hip arthroplasty patients at the study hospital, with 100% capture of elective procedures. It has been described in detail previously [10]. Unplanned 30-day readmission was defined as readmission to the hospital for a complication, or monitoring for a suspected complication, within 30 days following discharge from the orthopaedic unit after TKA surgery, for any cause. This included admission to non-orthopaedic units for any reason that was not part of the routine postoperative course or was not planned for any other reason related to the patient's comorbidities. Exclusion criteria: revision, unicondylar, and patellofemoral arthroplasty, and planned readmissions, including admissions to the hospital for other procedures such as chemotherapy or other planned surgical procedures. Admissions to the rehabilitation unit or "hospital in the home" service were also excluded.

Data processing

Data were randomly split into training (75%) and testing (25%) sets, with testing set data kept separate from training set data in every step of the model development and evaluation processes. A seed was set to ensure reproducibility of random number generation and of results, and to ensure the test set was not used in any stage of the model training process. A primary TKA procedure was considered a case, with both primary TKA surgeries for each individual patient with a bilateral TKA grouped into either the training set or the testing set. All models were trained using fivefold cross-validation with 10 repeats [11] to obtain a more stable estimate of training set performance before evaluating models on the testing set.
The value of machine learning was also explored, in terms of its ability to be used in conjunction with evidence from the literature and clinical insight on readmission risk factors to enhance predictive performance and clinical applicability, and to determine how well these population-level risk factors translated into a risk prediction model for individualised patient prognostication based on specific risk profiles. Knowledge of risk factors provides population-level information regarding patient characteristics that are associated with readmission, whereas predictive models provide individualised patient-level probability estimates for the patient's risk [12]. The value of information available at discharge from the hospital following TKA surgery was also evaluated in terms of its ability to enhance predictive performance. The model was developed with the future intention of being implemented in a hospital's existing information technology infrastructure, facilitating automatic information retrieval from the patient's medical record. However, the added value of information available in a research registry was also explored.

There were four considerations for models developed in this study:

1) Temporal availability of predictors: initial consultation with the orthopaedic surgeon, specifically when TKA surgery is offered to the patient, or immediately prior to discharge. The rationale for this was that a model using data available at the initial consultation would allow the maximum amount of time to implement risk mitigation strategies and discharge planning.
2) Model architecture: logistic regression or random forest. Logistic regression is commonly used for the prediction of binary outcomes in healthcare, and it is a familiar and intuitive approach for clinicians, but machine learning has the potential to improve predictive performance in orthopaedics [13]. Random forest and logistic regression have been used in prior literature on readmission prediction [5].
3) Dataset availability of predictors: administrative database, or only in the registry. The rationale for this was that a model using administrative data could potentially be integrated into the hospital's information technology system for automatic data processing.
4) Variable selection method: high importance in Delphi and focus group, high and moderate importance in Delphi and focus group, or systematic review predictors.

A comparison of the different predictor selection approaches was included because readmission is a complex phenomenon with many potentially influential predictors [14]. Therefore, comparing a variety of approaches increased the likelihood of developing a predictive model with strong predictive performance as well as clinical relevance [8].

To reduce overfitting where there were fewer than 10 events (readmissions) per variable [15], variable-reduction strategies were employed. For logistic regression, the least absolute shrinkage and selection operator was used. For random forest, models were retrained using only the highest-ranked predictors according to variable importance factors. A sketch of both strategies is given below.
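The two predictor-reduction strategies just described can be illustrated in R. This is a hedged sketch rather than the study code: `x_train` (a numeric predictor matrix with column names) and `y_train` (the 0/1 readmission outcome) are assumed objects, and keeping the top 10 predictors is an arbitrary illustration, not a value reported by the authors.

```r
# Sketch: LASSO-penalised logistic regression and importance-based pruning for
# a random forest. Assumes x_train (numeric matrix) and y_train (0/1 vector).
library(glmnet)
library(randomForest)

set.seed(2023)

# 1) LASSO: cross-validation chooses the penalty; predictors whose coefficients
#    are shrunk to exactly zero are dropped from the logistic regression model.
cv_lasso  <- cv.glmnet(x_train, y_train, family = "binomial", alpha = 1)
kept_coef <- coef(cv_lasso, s = "lambda.min")
kept_vars <- setdiff(rownames(kept_coef)[as.vector(kept_coef != 0)], "(Intercept)")

# 2) Random forest: fit on all predictors, rank by permutation importance, then
#    refit using only the highest-ranked predictors (top 10, purely illustrative).
rf_full   <- randomForest(x = x_train, y = factor(y_train), importance = TRUE)
imp       <- importance(rf_full, type = 1)   # mean decrease in accuracy
top_vars  <- rownames(imp)[order(imp[, 1], decreasing = TRUE)][1:10]
rf_pruned <- randomForest(x = x_train[, top_vars], y = factor(y_train))
```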
Missing data were considered missing at random except for Veteran's RAND 12-item health survey (VR-12) [16] scores, which were only collected routinely from 1 January 2006. Variables with more than 20% missing data were excluded [17]. For the remaining variables, k-nearest neighbours imputation (k = 5 nearest neighbours) was used because it is considered adequate for the purpose of prediction [18] and performs well for ≤ 20% missing data [17].

To test the impact of k-nearest neighbours' imputation on model performance, two logistic regression models, with the least absolute shrinkage and selection operator, were trained to evaluate alternative strategies for handling missingness in the VR-12 variables. One model was trained with all predictors except for the VR-12 variables. The other model was trained with all predictors using data from 1 January 2006 onwards. Again, testing set data were kept separate from training set data in every step of the model development and evaluation processes.
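One way to apply the k-nearest neighbours imputation described above is sketched below. The paper does not state which implementation was used, so the choice of caret's preProcess is an assumption, and `train_data` and `test_data` are the hypothetical objects from the earlier splitting example.

```r
# Sketch: k-nearest neighbours imputation (k = 5) fitted on the training data and
# applied unchanged to the test data, so no test-set information leaks into the
# imputation step. caret's knnImpute also centres and scales the predictors.
library(caret)

num_vars  <- names(train_data)[sapply(train_data, is.numeric)]  # numeric predictors only

pp        <- preProcess(train_data[, num_vars], method = "knnImpute", k = 5)
train_imp <- predict(pp, newdata = train_data[, num_vars])
test_imp  <- predict(pp, newdata = test_data[, num_vars])
```

Fitting the imputation on the training set and reusing it for the test set mirrors the authors' statement that the test set was kept out of every development step.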
A version of the registry dataset was generated without merging with the administrative dataset, to determine whether the greater number of events (readmissions) available in the registry improved the predictive performance [19]. Whereas the administrative database only includes data from 1 July 2002 onwards, the registry contains data from 1998. The predictor selection method and model architecture for the best-performing model overall were applied to this registry-only dataset.

Risk factors were selected from the systematic review and meta-analysis [14] carried out by the authors of this study for which there was moderate- or high-quality evidence and which correlated with readmission. To utilise the knowledge of clinicians [8], a modified Delphi survey and focus group study was carried out [20]. Variables selected for the model were those with a high-importance vote by a simple majority of ≥ 50%. Predictors voted as high-importance in the Delphi survey, despite lack of systematic review evidence for being readmission risk factors, included the following: preoperative patient-reported pain level, dementia, intensive care unit/high dependency unit admission prior to discharge, and return to theatre prior to discharge. Dementia was the only one of these predictors which has been investigated in the literature, and it did not increase the risk of readmission. The following predictors were correlated with readmission in the literature but did not receive a high-importance vote in the Delphi survey: number of prior emergency department presentations (12 months), age, sex, low socioeconomic status, historical knee procedures, depression, diabetes, history of cancer, hypertension, chronic kidney disease, anaemia, coagulopathy, body mass index, arrhythmia, and peripheral vascular disease. There is also evidence that length of stay is correlated with readmission risk [21], but it did not receive a majority high-importance vote.

The Supplementary file, which has its own table of contents for ease of navigation, contains the full list of predictors (Tables S1–S3). Table S4 contains a list of all readmission prediction models developed in the primary analysis stage of this study. Table S5 depicts the amount of missingness in each variable.

Outcome evaluation

The majority of captured readmissions were to the index hospital where the TKA procedure took place. However, the registry captures some non-index institution readmissions based on patient self-report at the routine six-week follow-up appointment. Details of the data collection and quality control processes carried out to ensure accurate capture of readmissions in accordance with these criteria have been described previously [10].

Model discrimination was measured using the area under the receiver operating characteristic curve (AUC-ROC) [22]. A perfect classifier has an AUC-ROC of 1, while random guessing yields an AUC-ROC of 0.5. Calibration was evaluated on the test set by visual inspection of the calibration curve [23] and numerical evaluation using the Integrated Calibration Index (ICI) [24]. A perfectly calibrated model has an ICI of 0.

Two existing 30-day readmission risk prediction models, the LACE+ score [25] and the model of Ali et al. [26], were compared to the bespoke models developed in this study. A logistic regression model and a random forest model trained on all predictors considered throughout the model development process were also developed and fully evaluated. The predictability of the most common causes of readmission was also compared to the prediction of readmission as an independent outcome. Patients with and without missing data for variables which had ≥ 10% missing data were also compared according to baseline demographics and readmission rate.

The best-performing model, in terms of discriminative performance, at initial consultation and discharge was a random forest model trained on systematic review predictors using the combined (registry + administrative) dataset. These models were fully evaluated in the Results section.

Analysis

All statistical analyses were performed using R (v4.1.1) [27]. The packages used are listed in Table S6 (Supplementary file).
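The two evaluation metrics described above can be computed with a few lines of R. The snippet below is an illustrative sketch, not the authors' code: it assumes a vector of predicted probabilities `p_hat` and the observed 0/1 outcomes `y_test` on the held-out test set, and it follows the loess-based definition of the ICI from Austin and Steyerberg [24].

```r
# Sketch: discrimination (AUC-ROC) and calibration (calibration curve and ICI)
# on the test set, given predicted probabilities p_hat and 0/1 outcomes y_test.
library(pROC)

roc_obj <- roc(y_test, p_hat)          # receiver operating characteristic curve
auc(roc_obj)                           # AUC-ROC; 1 = perfect, 0.5 = chance
ci.auc(roc_obj)                        # 95% confidence interval for the AUC

# Calibration: loess-smoothed observed outcome as a function of predicted risk.
cal_fit    <- loess(y_test ~ p_hat)
p_observed <- predict(cal_fit, newdata = data.frame(p_hat = p_hat))

ici <- mean(abs(p_observed - p_hat))   # Integrated Calibration Index; 0 = perfect
ici

# Calibration curve: smoothed observed risk against predicted risk, with the
# diagonal marking perfect calibration.
ord <- order(p_hat)
plot(p_hat[ord], p_observed[ord], type = "l",
     xlab = "Predicted probability", ylab = "Observed probability")
abline(0, 1, lty = 2)
```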
To test the impact of the selected strategy for handling missing data on model performance, sensitivity analyses were conducted using different strategies for variables with a large proportion of missing data. The initial consultation logistic regression model using systematic review variables had the best calibration of all readmission prediction models in this study.

Results

Figure 1 depicts a flowchart of patients included in the final analysis cohort. The date range was restricted to surgeries performed prior to 30 March 2020.

Fig. 1 Cohort generation flow diagram. (SMART registry: St Vincent's Melbourne Arthroplasty Outcomes registry; KA: Knee Arthroplasty; TKA: Total Knee Arthroplasty; PAS: Patient Administration System)

The readmission rate was 6.811%. Tables 1, 2 and 3 contain summary statistics for predictors included in this study. Table 1 contains demographics and patient-reported variables, Table 2 contains comorbidities, and Table 3 contains variables related to healthcare utilisation and the index hospital admission.

Results for primary readmission prediction models

The training set performance of all models developed in the main readmission prediction model development process is contained in Table S7 (Supplementary file). Comparisons of baseline demographics and readmission rate for variables with ≥ 10% missingness (missing vs. non-missing) are contained in Tables S8–S10. No variables had > 20% missing data. The initial consultation random forest model achieved an AUC-ROC of 0.617 (95% CI 0.538–0.696). The discharge random forest model achieved an AUC-ROC of 0.692 (95% CI 0.621–0.764). ROC curves for these models evaluated on the test set are presented below (Figs. 2 and 3). Variable importance factors for these models are contained in Tables S11 and S12, along with training set ROC curves in Figs. S1 and S2 (Supplementary file).

Fig. 2 ROC curve—initial consultation random forest model trained on systematic review predictors using the combined (registry + administrative) dataset
Fig. 3 ROC curve—discharge random forest model trained on systematic review predictors using the combined (registry + administrative) dataset

Calibration curves for these models are presented below (Figs. 4 and 5). The initial consultation random forest model achieved an ICI of 0.031. The discharge random forest model achieved an ICI of 0.019. The appearance of these calibration curves indicates an overestimation of risk. Precision-recall curves (Figs. S3–S4) and additional performance metrics (Table S13) are available for these models in the Supplementary file.

Fig. 4 Calibration curve—initial consultation random forest model trained on systematic review predictors using the combined (registry + administrative) dataset
Fig. 5 Calibration curve—discharge random forest model trained on systematic review predictors using the combined (registry + administrative) dataset

The best-calibrated readmission prediction model was a logistic regression model trained on variables available in the administrative dataset at the initial consultation. These predictors were age, sex, hospital admissions and emergency presentations in the past 12 months, socioeconomic status, and the number of prior knee procedures. The ROC curve and calibration curve for this model are presented below (Figs. 6 and 7, respectively). AUC-ROC was 0.589 (95% CI 0.506–0.673), and ICI was 0.012. The full performance evaluation of this model is available in the Supplementary file (Figs. S5–S10, and Tables S14–S17).

Fig. 6 ROC curve—initial consultation logistic regression model using systematic review predictors in the administrative dataset
Fig. 7 Calibration curve—initial consultation logistic regression model using systematic review predictors in the administrative dataset

The LACE+ score achieved an AUC-ROC of 0.583 (0.545–0.620) and the model of Ali et al. [26] achieved 0.563 (0.525–0.602). The ICI was 0.642 for LACE+ and 0.100 for Ali et al. [26]. Full performance evaluations are also provided in the Supplementary file for these previously developed models from the prior literature (Figs. S11–S16, and Tables S18–S21), for the random forest model trained on all predictors (Figs. S17–S28, and Tables S22–S28), and for the logistic regression model trained on all predictors (Figs. S29–S36, and Tables S29–S34).

Predictor summary statistics for the registry dataset not merged with administrative data are contained in the Supplementary file (Table S35), along with the amount of missingness per variable (Table S36) and the dataset cohort creation flow diagram (Fig. S37). Full performance evaluation of the random forest models trained on this dataset is also contained there (Figs. S38–S45, and Tables S37–S42).
The complication-specific models demonstrated comparable performance to the readmission prediction models, albeit generally with slightly better discriminative performance. We also evaluated the outcome definitions and predictors for each readmission-related complication from the literature (Table S43), causes of readmission in this study cohort (Table S44), further information on outcome variable generation (Table S45) and predictor variable generation (Tables S46–S51), predictor variable preparation and missingness (Table S52), and the comparison of baseline characteristics for participants with ≥ 10% missing data for a given variable (Table S53). Baseline characteristics were also compared for patients who experienced each complication and those who did not (Tables S54–S60). Full performance evaluations for all complication-specific models are also contained in the Supplementary file (Figs. S46–S83, and Tables S61–S81).

The model developed using all study predictors for the combined outcome variable indicating any complication associated with readmission achieved an AUC-ROC of 0.658 (0.570–0.746). This was an improvement over the readmission prediction models; however, discriminative performance still falls short of the commonly accepted AUC-ROC threshold of 0.7 [29]. The ROC curve for this model is presented below (Fig. 8). The best-calibrated complication-specific model was a logistic regression model which achieved an ICI of 0.012, indicating good calibration overall, but the calibration curve clearly shows an underestimation of risk at higher predicted probabilities. The calibration curve for this model is presented below (Fig. 9).

Fig. 8 ROC curve – discharge random forest model using all study predictors in the combined dataset to predict any complication associated with readmission
Fig. 9 Calibration curve – initial consultation logistic regression model using all study predictors in the administrative dataset to predict any complication associated with readmission

The training set AUC-ROC for the logistic regression model with all predictors was 0.677 with k-nearest neighbours' imputation, 0.655 for all predictors using data from 1 January 2006 onwards, and 0.677 for all predictors except for VR-12 scores.

Table 1 Summary statistics and comparison between readmitted and non-readmitted patients: demographics and patient-reported variables

Feature | Non-readmitted cases (n = 3434) | Readmitted cases (n = 251) | P-value
Age, mean (SD) | 69.539 (8.8) | 69.956 (8.9) | 0.472
Sex, % female | 63.8% | 61.0% | 0.397
BMI, mean (SD) | 33.1 (6.4) | 34.3 (7.8) | 0.017
Smoking | 269 (7.8%) | 21 (8.4%) | 0.856
Low SES: pensioner card | 1702 (49.6%) | 152 (60.6%) | 0.001
SEIFA score | | | 0.045
  1 | 374 (10.9%) | 19 (7.6%) |
  2 | 228 (6.6%) | 18 (7.2%) |
  3 | 273 (8.0%) | 11 (4.4%) |
  4 | 289 (8.4%) | 17 (6.8%) |
  5 | 404 (11.8%) | 26 (10.4%) |
  6 | 264 (7.7%) | 19 (7.6%) |
  7 | 589 (17.2%) | 45 (17.9%) |
  8 | 371 (10.8%) | 30 (12.0%) |
  9 | 450 (13.1%) | 41 (16.3%) |
  10 | 192 (5.6%) | 25 (10.0%) |
Poor access to post-op care (lives far from hospital, lack of access to allied health support, lack of access to telehealth support) | | | 0.049
  Major cities in Australia | 2645 (77.0%) | 208 (82.9%) |
  Inner regional Australia | 649 (18.9%) | 38 (15.1%) |
  Outer regional or remote Australia | 131 (3.8%) | 4 (1.6%) |
  Missing | 9 (0.3%) | 1 (0.4%) |
Patient-related biopsychosocial (lower education level, poor health literacy, non-English speaking) | | | 0.567
  Interpreter required | 567 (16.5%) | 37 (14.7%) |
  Missing | 31 (0.9%) | 5 (2.0%) |
Preoperative patient-reported level of function, mean (SD) | | |
  Mental function | 44.4 (15.1) | 42.7 (16.3) | 0.133
  Physical function | 24.7 (7.8) | 23.9 (7.5) | 0.157
Preoperative patient-reported pain levels | | | 0.330
  One | 22 (0.6%) | 1 (0.4%) |
  Two | 125 (3.6%) | 6 (2.4%) |
  Three | 450 (13.1%) | 26 (10.4%) |
  Four | 1375 (40.0%) | 106 (42.2%) |
  Five | 960 (28.0%) | 84 (33.5%) |
  Missing | 502 (14.6%) | 28 (11.2%) |

Categorical variables were compared using the chi-squared test, or Fisher's exact test in cases of counts below 10; continuous variables were compared using Student's t-test. SD standard deviation, BMI body mass index, SEIFA Socioeconomic Indexes for Areas [28], SES socioeconomic status. Variables were derived from the SMART registry or the administrative database. Pain levels refer to "How much did pain interfere with your normal work?": One = Not at all; Two = A little bit; Three = Moderately; Four = Quite a bit; Five = Extremely.
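The group comparisons reported in Tables 1–3 can be reproduced with standard R tests. The snippet below is a generic sketch with fabricated example data, not the study data: it simply shows the chi-squared test (with Fisher's exact test when a cell count falls below 10) for categorical variables and Student's t-test for continuous variables, as stated in the table footnotes.

```r
# Sketch: univariable comparisons between readmitted and non-readmitted patients,
# using the tests named in the table footnotes. Example data are fabricated.
set.seed(1)
readmitted <- rbinom(500, 1, 0.07)            # 0 = not readmitted, 1 = readmitted
smoking    <- rbinom(500, 1, 0.08)            # example categorical predictor
age        <- rnorm(500, mean = 70, sd = 9)   # example continuous predictor

# Categorical: chi-squared test, or Fisher's exact test when any count is below 10.
tab <- table(smoking, readmitted)
if (any(tab < 10)) fisher.test(tab) else chisq.test(tab)

# Continuous: Student's t-test comparing means between the two groups.
t.test(age ~ readmitted, var.equal = TRUE)
```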
Table 2 Summary statistics and comparison between readmitted and non-readmitted patients: comorbidities

Feature | Non-readmitted cases (n = 3434) | Readmitted cases (n = 251) | P-value
Hypertension | 2289 (66.7%) | 165 (65.7%) | 0.819
Peripheral vascular disease | 129 (3.8%) | 13 (5.2%) | 0.337
Diabetes | 757 (22.0%); with end-organ damage 15 (0.4%) | 70 (27.9%); with end-organ damage 1 (0.4%) | 0.080
Coagulopathy | 19 (0.6%) | 0 | 0.636
Charlson Comorbidity Index | Zero = 1780 (51.8%); One = 956 (27.8%); ≥ Two = 698 (20.3%) | Zero = 113 (45.0%); One = 70 (27.9%); ≥ Two = 68 (27.1%) | 0.026
CHF | 100 (2.9%) | 10 (4.0%) | 0.334
Liver disease | 85 (2.5%) | 8 (3.2%) | 0.528
Depression | 394 (11.5%) | 36 (14.3%) | 0.206
Previous stroke | 209 (6.2%) | 21 (8.4%) | 0.191
Anaemia | 59 (1.7%) | 9 (3.6%) | 0.047
History of cancer | 340 (9.9%) | 31 (12.4%) | 0.256
High risk of infection (immunocompromised state, active IVDU, infection in other primary joint replacement) | 1466 (42.7%) | 126 (50.2%) | 0.024
CKD | 125 (3.6%) | 16 (6.4%) | 0.039
Arrhythmia | 18 (0.5%) | 0 | 0.630
Pulmonary disease | 174 (5.1%) | 23 (9.2%) | 0.008
Dementia | 12 (0.3%) | 0 | 1
Substance abuse | 59 (1.7%) | 3 (1.2%) | 0.798

Categorical variables were compared using the chi-squared test, or Fisher's exact test in cases of counts below 10; continuous variables were compared using Student's t-test. SD standard deviation, SEIFA Socioeconomic Indexes for Areas [28], IVDU intravenous drug use, CKD chronic kidney disease. Variables were derived from the SMART registry.

Discussion

In summary, the discriminative performance of all models was poor, although machine learning models outperformed logistic regression to a small degree. However, the logistic regression model trained on administrative data available in the clinical environment, using systematic review predictors available at an initial consultation, was reasonably well calibrated. This is useful because it suggests that interventions to mitigate or respond to readmission risk could be implemented at a much earlier point in time than at discharge following TKA surgery [30]. These findings are in keeping with prior literature demonstrating the difficulty of developing predictive models capable of distinguishing between readmitted and non-readmitted patients in various clinical populations, especially following surgery and specifically TKA [5]. Comparable performance to the primary model development procedure was achieved in the sensitivity analysis pertaining to different strategies for handling missingness in the VR-12 data, providing support for the use of k-nearest neighbours imputation.

One particular type of machine learning which has received a large amount of attention in the literature pertaining to the prediction of surgical outcomes, including in orthopaedics and knee arthroplasty specifically, is deep learning [4]. This type of machine learning has demonstrated potential in terms of improved discriminative performance for outcomes post-TKA [4]; however, it generally requires a high volume of complex data to fully unlock its potential [7]. In many cases, deep learning is not guaranteed to improve predictive performance compared with other modelling techniques [31]. As data capture continues to expand in orthopaedics, it is possible there will be improvements in predictive performance, which in turn could improve the quality of shared clinical decision-making [32]. One thing is clear: artificial intelligence and machine learning are here to stay in the orthopaedic field [32, 33]. It is important to temper expectations [34] and focus more on the human interaction between patient and clinician as they work together to achieve the best possible surgical outcome [33].
Table 3 Summary statistics and comparison between readmitted and non-readmitted patients: healthcare utilisation and index hospital admission

Feature | Non-readmitted cases (n = 3434) | Readmitted cases (n = 251) | P-value
Prior healthcare utilisation
Increasing number of previous admissions | Zero = 3223 (93.9%); One = 116 (3.4%); Two = 50 (1.5%); ≥ Three = 45 (1.3%) | Zero = 230 (91.6%); One = 10 (4.0%); Two = 4 (1.6%); ≥ Three = 7 (2.8%) | 0.234
Number of prior ED presentations (12 months) | Zero = 3298 (96.0%); One = 81 (2.4%); ≥ Two = 55 (1.6%) | Zero = 234 (93.2%); One = 11 (4.4%); ≥ Two = 6 (2.4%) | 0.083
Historical knee procedures | Zero = 1234 (35.9%); One = 1324 (38.6%); Two = 760 (22.1%); ≥ Three = 116 (3.4%) | Zero = 82 (32.7%); One = 77 (30.7%); Two = 51 (20.3%); ≥ Three = 41 (16.3%) | < 0.001
Variables related to index hospital admission
In-hospital complication (any) during index admission | 477 (13.9%) | 76 (30.3%) | < 0.001
ICU/HDU admission during index admission | Zero = 3303 (96.2%); One = 64 (1.9%); ≥ Two = 67 (2.0%) | Zero = 237 (94.4%); One = 4 (1.6%); Two = 10 (4.0%) | 0.109
Return to theatre during index admission | 10 (29.1%) | 6 (2.4%) | < 0.001
Length of stay in days, mean (SD) | 8.993 (4.4) | 11.283 (8.9) | < 0.001
Duration of operation in minutes, mean (SD) | 119.796 (34.5) | 120.928 (36.3) | 0.632
Wound class (not clean) | 7 (20.4%) | 0 | 1
Transfusion during surgery, number of packed red blood cells | Zero = 3110 (90.6%); One = 63 (1.8%); Two = 206 (6.0%); ≥ Three = 55 (1.6%) | Zero = 212 (84.5%); One = 5 (1.0%); Two = 21 (8.4%); ≥ Three = 13 (5.2%) | < 0.001

Categorical variables were compared using the chi-squared test, or Fisher's exact test in cases of counts below 10; continuous variables were compared using Student's t-test. SD standard deviation, BMI body mass index, SEIFA Socioeconomic Indexes for Areas [28], ED emergency department, ICU intensive care unit, HDU high dependency unit. Variables were derived from the SMART registry or the administrative database.

Some risk factors were consistently associated with readmission. Presented in this section are the predictors with the largest regression coefficients in the LACE+ model and the model developed by Ali et al. [26], compared with the strongest predictors in the bespoke models developed in this study. In both of these models, length of stay and number of prior emergency department visits were among the top five strongest predictors. Length of stay was also consistently among the top five strongest predictors in the models developed in the current study, while the number of prior emergency department visits was one of the strongest predictors in the initial consultation administrative database model that exhibited the best overall calibration. Older age was the other strongest predictor in the Ali et al. model, and age as a continuous predictor was also among the strongest predictors in the random forest models developed in the current study which demonstrated the best discriminative performance. On the other hand, the remaining top predictors in the LACE+ model were urgent admissions in the previous year, Charlson Comorbidity Index, and male sex. Charlson Comorbidity Index was also among the strongest predictors in the main random forest model developed using data available at the initial consultation in this study, with length of stay replacing it in the model developed using the same architecture with predictors available at discharge. Admissions in the past 12 months, though not specifically urgent admissions, was one of the strongest predictors in the initial consultation administrative database model that achieved the best overall calibration. Male sex was not a strong predictor in any of the models developed in this study.
There were also newly identified predictors for readmission: number of historical knee procedures, socioeconomic status, and body mass index (BMI). There was evidence from the systematic review and meta-analysis [14] that these risk factors correlated with readmission; however, BMI and low socioeconomic status only received a majority vote of moderate importance in the Delphi survey, and the number of historical knee procedures received a majority low-importance vote [20].

Models trained on all predictors had similar performance to the primary study models. This suggests that using clinical insight, instead of purely relying on statistical or machine learning predictor selection, has value in terms of increasing clinical relevance and applicability without sacrificing predictive performance. The model trained only on clinical registry data also performed similarly to the primary models developed using both administrative and registry data.

The models developed in prior studies did not perform well on the datasets used in this study.
These were the LACE+ score [25] and the model developed by Ali et al. [26]. In accordance with Stessel et al. [35], compromises had to be made when applying these models because not all variables were available in the dataset used for this study, and some proxy variables had to be generated based on what was available. These models performed poorly on both discrimination and calibration. These findings are in keeping with prior literature in which bespoke models have outperformed existing models such as LACE [36]. Important considerations when interpreting the poor performance of these models include the following: the current study was not a formal external validation study, there was incomplete variable availability, both models were developed outside Australia (Ali et al. in the UK, LACE+ in Canada), the Ali et al. model was developed for risk factor identification rather than prediction, and the LACE+ model was not developed specifically for TKA patients.

The most common causes of readmission were identified from prior literature [37, 38]. These were surgical site infection, venous thromboembolism, joint-specific complications, gastrointestinal complications, cardiac complications, and infection (non-surgical site). Causes of readmission in this study cohort are listed in Table S44 (Supplementary file). These outcome variables were generated based on definitions derived from the literature and the variables available in the data for each outcome category. There are multiple advantages to using a general readmission prediction model implemented alongside complication-specific models. It enables the identification of patients with high readmission risk and can provide insight into their risk of specific complications. It also facilitates the identification of patients who are at high risk for readmission but not for any specific common cause. These readmissions might be unexpected from a clinical point of view but nonetheless can be anticipated and prepared for through post-discharge follow-up. In line with the readmission prediction model evaluation, the best-calibrated complication prediction model was described. This logistic regression model predicted any complication using all predictors in this study available in the administrative database at the initial consultation: sex, age, rurality, socioeconomic status, number of hospital admissions and emergency presentations in the past 12 months, and number of prior knee procedures.

The most well-calibrated models developed in this study, for both readmission prediction and prediction of complications associated with readmission, were developed using data captured routinely in the live clinical environment and available at the initial consultation. This facilitates automated data processing by the predictive model. The result can be displayed to patients and surgeons alongside the incidence for the whole cohort of patients at the institution, to compare the patient's risk to that of other patients. Well-calibrated models that do not have strong discriminative performance can still be useful in shared decision-making, due to their ability to calculate individualised probabilistic estimates of readmission [39]. Provided here is an example of how the model can be used in the process of shared clinical decision-making. Imagine there is a patient with a predicted probability of 0.33 for readmission, using the best-calibrated model developed for readmission in this study. The highest predicted probability calculated by this model is 0.4 (see the x-axis of Fig. 7), so a predicted probability of 0.33 is towards the higher end of possible individualised predicted probabilities. The clinician might opt to provide the percentage value, 33%, or a natural frequency, in this case 1 in 3, to describe the predicted probability and explain that this is the proportion of patients just like them who would be readmitted following TKA surgery. They can inform the patient that this is almost five times as high as the average readmission rate for the cohort in this study, which was 6.8% or approximately 1 in 15. The patient and clinician can then decide whether they believe the patient's discharge planning should include flagging them for additional follow-up at one or more checkpoints within the 30 days following discharge after TKA surgery [40]. The output of calibrated predictive models such as that developed in this study should not dictate decisions made between patient and clinician, but should instead empower both parties in the shared clinical decision-making process, which still requires intuition and consideration of the human elements that cannot be captured by a statistical tool [41].
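The risk-communication arithmetic in the example above is simple enough to script. The sketch below is illustrative only and uses the figures quoted in the text (a predicted probability of 0.33 against a cohort readmission rate of 6.8%); the rounding to a "1 in N" natural frequency is one common presentation choice, not a prescription from the authors.

```r
# Sketch: express a calibrated predicted probability as a percentage, a natural
# frequency, and a multiple of the cohort's baseline readmission rate.
p_patient <- 0.33    # individual predicted probability from the calibrated model
p_cohort  <- 0.068   # overall 30-day readmission rate in the study cohort

percentage   <- 100 * p_patient                    # 33%
one_in_n     <- as.integer(round(1 / p_patient))   # about 1 in 3
times_cohort <- p_patient / p_cohort               # roughly 5 times the cohort rate

cat(sprintf("Predicted risk: %.0f%% (about 1 in %d), %.1f times the cohort average (about 1 in %.0f).\n",
            percentage, one_in_n, times_cohort, 1 / p_cohort))
```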
Strengths of this study include a comprehensive predictor selection strategy which involved clinical input and machine learning while prioritising model parsimony. The model development, internal validation, and evaluation processes were in line with the guidelines [9]. The models were bespoke [36] and developed on a well-described and diverse clinical population which is demographically representative of the broader Australian TKA population [10]. Comprehensive information on the data used, as well as information required by readers to apply the models in different clinical settings or replicate this process to develop their own bespoke model [42], was provided. The corresponding author can also be contacted for information and clarification if necessary.

The limitations of this study include that this was a single-institution study. The only way to fully capture non-index institution readmissions would be through linkage to external datasets. The main limitation was that the model does not have strong discriminative performance; therefore, it should not be used to distinguish between patients perceived to be at high risk of readmission in a binary manner. Rather, it can be used to inform decision-making given it was well calibrated. In order to improve the discriminative performance of the model, future work could focus on expanding data capture to facilitate the utilisation of strong predictors for readmission or associated complications in this patient population that are currently not captured in the databases available for the development of predictive models. Before being deployed, the model will need to be pilot tested in the clinical environment to determine whether it can be implemented into existing workflows.
Conclusions

The discriminative performance of the readmission prediction and complication prediction models was poor, although machine learning models had slightly better discriminative performance than logistic regression models. The model developed using administrative data available at the initial consultation between the patient and orthopaedic surgeon was reasonably well calibrated. Models developed to predict complications commonly associated with readmission were also reasonably well calibrated and can be used in conjunction with readmission prediction models in shared clinical decision-making.

Abbreviations

TKA: Total knee arthroplasty. SMART: St Vincent's Melbourne Arthroplasty Outcomes. VR-12: Veteran's RAND 12-item health survey. AUC-ROC: Area under the receiver operating characteristic curve. ICI: Integrated Calibration Index. HDU: High dependency unit. LACE: Length of stay (L), acuity of the admission (A), comorbidity of the patient (C), and emergency department use in the 6 months before admission (E). BMI: Body mass index.

Supplementary Information

The online version contains supplementary material available at https://doi.org/10.1186/s42836-023-00186-3. Additional file 1.

Acknowledgements

We acknowledge the following contributors: Sharmala Thuraisingam, for assisting the first author in understanding various conceptual aspects of predictive model development and evaluation; Aaron Stork and Nicolas McInnes for facilitating access to the administrative database; Bede McKenna, Amanda Lee, and Spira Stojanovik for constructing SQL (Structured Query Language) queries to extract data from the administrative database.

Authors' contributions

D.J.G. coordinated the study and drafted the manuscript, with P.F.M.C., M.M.D., T.S., J.A.B., and S.B. providing intellectual content. P.F.M.C., M.M.D., J.A.B., and T.S. co-designed the study with D.J.G. D.J.G., M.M.D., and P.F.M.C. contributed to the data acquisition. T.S. and J.A.B. contributed to the statistical analysis of the data. D.J.G., P.F.M.C., and M.M.D. contributed to the clinical interpretation of the findings. All authors contributed to revising the manuscript prior to submission and have all reviewed and approved the final manuscript. All authors agree to be accountable for all aspects of the manuscript and will work together to ensure questions relating to the accuracy and integrity of any part of it are appropriately investigated and resolved.

Funding

No funding was received directly for this study. D.J.G., S.B., and T.S. receive no funding. P.F.M.C. had the following funding sources to declare: royalties from Johnson and Johnson, consultancy with Johnson & Johnson, consultancy with Stryker Corporation (paid personally); Australian National Health & Medical Research Council Practitioner Fellowship (paid to institution); HCF Foundation, BUPA Foundation, St Vincents Health Australia, Australian Research Council (grant support provided to institution for research unrelated to the current manuscript); Axcelda cartilage regeneration project, patent applied for device, composition of matter and process (institution and personally). M.M.D. had the following funding sources to declare: National Health and Medical Research Council, HCF Foundation, BUPA Foundation, St Vincents Health Australia, Australian Research Council (grant support provided to institution for research unrelated to the current manuscript, paid to institution). J.A.B. had the following funding sources to declare: National Health and Medical Research Council, Australian Research Council (grant support provided to institution for research unrelated to the current manuscript); patent application no. PCT/AU2020/050926 titled "System and Method for Audio Annotation", Khan, Velloso and Bailey.

Availability of data and materials

Individual patient data are not publicly available. Requests for additional information can be sent to the corresponding author.

Declarations

Ethics approval and consent to participate: Ethical approval for this study was obtained from the St Vincent's Hospital Melbourne (SVHM) Human Research Ethics Committee (reference number: HREC/76656/SVHM-2021-272152(v2)).

Consent for publication: Not applicable.

Competing interests: PC: Royalties from Johnson and Johnson; consultancy with Johnson & Johnson; consultancy with Stryker Corporation; Emeritus Board Member, Musculoskeletal Australia; Chair, Research Committee, Australian Orthopaedic Association (now completed term). MD: Research support paid to my institution for Investigator Initiated Research from Medacta, Medibank, HCF Foundation; Chair, Australian Orthopaedic Association Research Foundation Research Advisory Committee. No other competing interests for any authors.
Author details

Department of Surgery, St Vincent's Hospital Melbourne, University of Melbourne, Level 2 Clinical Sciences Building, 29 Regent Street, Fitzroy, VIC 3065, Australia. School of Computing and Information Systems, University of Melbourne, Doug McDonell Building, Parkville, VIC 3052, Australia. School of Health Sciences and Social Work, Griffith University, Nathan Campus, Nathan, QLD 4111, Australia. Department of Orthopaedics, St Vincent's Hospital Melbourne, Level 3/35 Victoria Parade, Fitzroy, VIC 3065, Australia.

Received: 14 December 2022. Accepted: 10 April 2023.

References

1. Jencks SF, Williams MV, Coleman EA. Rehospitalizations among patients in the Medicare fee-for-service program. N Engl J Med. 2009;360(14):1418–28.
2. ACSQHC. Avoidable hospital readmissions: report on Australian and international indicators, their use and the efficacy of interventions to reduce readmissions. Sydney: Australian Commission on Safety and Quality in Health Care; 2019.
3. McIlvennan CK, Eapen ZJ, Allen LA. Hospital readmissions reduction program. Circulation. 2015;131(20):1796–803.
4. Lopez CD, Gazgalis A, Boddapati V, Shah RP, Cooper HJ, Geller JA. Artificial learning and machine learning decision guidance applications in total hip and knee arthroplasty: a systematic review. Arthroplast Today. 2021;11:103–12.
5. Futoma J, Morris J, Lucas J. A comparison of models for predicting early hospital readmissions. J Biomed Inform. 2015;56:229–38.
6. Ashfaq A, Sant'Anna A, Lingman M, Nowaczyk S. Readmission prediction using deep learning on electronic health records. J Biomed Inform. 2019;97:103256.
7. Hinterwimmer F, Lazic I, Suren C, Hirschmann MT, Pohlig F, Rueckert D, Burgkart R, von Eisenhart-Rothe R. Machine learning in knee arthroplasty: specific data are key—a systematic review. Knee Surg Sports Traumatol Arthrosc. 2022;30(2):376–88.
8. Steyerberg EW. Clinical prediction models. CH (Switzerland): Springer Nature Switzerland AG; 2019. https://doi.org/10.1007/978-3-030-16399-0.
9. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Br J Surg. 2015;102(3):148–58.
10. Gould D, Thuraisingam S, Shadbolt C, Knight J, Young J, Schilling C, et al. Cohort profile: the St Vincent's Melbourne Arthroplasty Outcomes (SMART) Registry, a pragmatic prospective database defining outcomes in total hip and knee replacement patients. BMJ Open. 2021;11(1):e040408.
11. Refaeilzadeh P, Tang L, Liu H. Cross-validation. Encycl Database Syst. 2009;5:532–8.
12. Manning DW, Edelstein AI, Alvi HM. Risk prediction tools for hip and knee arthroplasty. J Am Acad Orthop Surg. 2016;24(1):19–27.
13. Oosterhoff JH, Gravesteijn BY, Karhade AV, Jaarsma RL, Kerkhoffs GM, Ring D, et al. Feasibility of machine learning and logistic regression algorithms to predict outcome in orthopaedic trauma surgery. JBJS. 2022;104(6):544–51.
14. Gould D, Dowsey MM, Spelman T, Jo O, Kabir W, Trieu J, et al. Patient-related risk factors for unplanned 30-day hospital readmission following primary and revision total knee arthroplasty: a systematic review and meta-analysis. J Clin Med. 2021;10(1):134.
15. Pavlou M, Ambler G, Seaman SR, Guttmann O, Elliott P, King M, et al. How to develop a more accurate risk prediction model when there are few events. BMJ (Clinical research ed). 2015;351:h3868.
16. Kazis LE, Miller DR, Skinner KM, Lee A, Ren XS, Clark JA, et al. Applications of methodologies of the Veterans Health Study in the VA healthcare system: conclusions and summary. J Ambul Care Manag. 2006;29(2):182–8.
17. Emmanuel T, Maupong T, Mpoeleng D, Semong T, Mphago B, Tabona O. A survey on missing data in machine learning. J Big Data. 2021;8(1):1–37.
18. Choudhury A, Kosorok MR. Missing data imputation for classification problems. arXiv preprint arXiv:2002.10709. 2020.
19. Van Calster B, Nieboer D, Vergouwe Y, De Cock B, Pencina MJ, Steyerberg EW. A calibration hierarchy for risk models was defined: from utopia to empirical data. J Clin Epidemiol. 2016;74:167–76.
20. Gould D, Dowsey M, Spelman T, Bailey J, Bunzli S, Rele S, et al. Established and novel risk factors for 30-day readmission following total knee arthroplasty: a modified Delphi and focus group study to identify clinically important predictors. J Clin Med. 2023;12(3):747.
21. Mahajan SM, Nguyen C, Bui J, Kunde E, Abbott BT, Mahajan AS. Risk factors for readmission after knee arthroplasty based on predictive models: a systematic review. Arthroplast Today. 2020;6(3):390–404.
22. Steyerberg EW, Vergouwe Y. Towards better clinical prediction models: seven steps for development and an ABCD for validation. Eur Heart J. 2014;35(29):1925–31.
23. Austin PC, Steyerberg EW. Graphical assessment of internal and external calibration of logistic regression models by using loess smoothers. Stat Med. 2014;33(3):517–35.
24. Austin PC, Steyerberg EW. The Integrated Calibration Index (ICI) and related metrics for quantifying the calibration of logistic regression models. Stat Med. 2019;38(21):4051–65.
25. van Walraven C, Wong J, Forster AJ. LACE+ index: extension of a validated index to predict early death or urgent readmission after hospital discharge using administrative data. Open Med. 2012;6(3):e80.
26. Ali AM, Loeffler MD, Aylin P, Bottle A. Predictors of 30-day readmission after total knee arthroplasty: analysis of 566,323 procedures in the United Kingdom. J Arthroplasty. 2019;34(2):242–8.e1.
27. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2013. http://www.R-project.org/.
28. Australian Bureau of Statistics. Socio-economic indexes for areas (SEIFA). Canberra: Australian Bureau of Statistics; 2011.
29. Yang S, Berdine G. The receiver operating characteristic (ROC) curve. Southwest Respir Crit Care Chron. 2017;5(19):34–6.
30. Amarasingham R, Moore BJ, Tabak YP, Drazner MH, Clark CA, Zhang S, et al. An automated model to identify heart failure patients at risk for 30-day readmission or death using electronic medical record data. Med Care. 2010;48(11):981–8.
31. Christodoulou E, Ma J, Collins GS, Steyerberg EW, Verbakel JY, Van Calster B. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol. 2019;110:12–22.
32. Younis MU. Impact of artificial intelligence integration on surgical outcome. J Dow Univ Health Sci. 2021;15(2):103–9.
33. Kumar V, Patel S, Baburaj V, Vardhan A, Singh PK, Vaishya R. Current understanding on artificial intelligence and machine learning in orthopaedics–a scoping review. J Orthop. 2022;34:201–6.
34. Wellington IJ, Cote MP. Editorial commentary: machine learning in orthopaedics: venturing into the valley of despair. Arthroscopy. 2022;38(9):2767–8.
35. Stessel B, Fiddelers AA, Marcus MA, van Kuijk SM, Joosten EA, Peters ML, et al. External validation and modification of a predictive model for acute postsurgical pain at home after day surgery. Clin J Pain. 2017;33(5):405.
36. Yu S, Farooq F, Van Esbroeck A, Fung G, Anand V, Krishnapuram B. Predicting readmission risk with institution-specific prediction models. Artif Intell Med. 2015;65(2):89–96.
37. Curtis GL, Jawad M, Samuel LT, George J, Higuera-Rueda CA, Little BE, et al. Incidence, causes, and timing of 30-day readmission following total knee arthroplasty. J Arthroplasty. 2019;34(11):2632–6.
38. Ramkumar PN, Chu C, Harris J, Athiviraham A, Harrington M, White D, et al. Causes and rates of unplanned readmissions after elective primary total joint arthroplasty: a systematic review and meta-analysis. Am J Orthop. 2015;44(9):397–405.
39. Munn JS, Lanting BA, MacDonald SJ, Somerville LE, Marsh JD, Bryant DM, et al. Logistic regression and machine learning models cannot discriminate between satisfied and dissatisfied total knee arthroplasty patients. J Arthroplasty. 2022;37(2):267–73.
40. Hamar GB, Coberley C, Pope JE, Cottrill A, Verrall S, Larkin S, et al. Effect of post-hospital discharge telephonic intervention on hospital readmissions in a privately insured population in Australia. Aust Health Rev. 2017;42(3):241–7.
41. Bonner C, Trevena LJ, Gaissmaier W, Han PK, Okan Y, Ozanne E, et al. Current best practice for presenting probabilities in patient decision aids: fundamental principles. Med Decis Making. 2021;41(7):821–33.
42. Fujimori R, Liu K, Soeno S, Naraba H, Ogura K, Hara K, et al. Acceptance, barriers, and facilitators to implementing artificial intelligence-based decision support systems in emergency departments: quantitative and qualitative evaluation. JMIR Form Res. 2022;6(6):e36501.

Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Predicting 30-day readmission following total knee arthroplasty using machine learning and clinical expertise applied to clinical administrative and research registry data in an Australian cohort

Loading next page...
 
/lp/springer-journals/predicting-30-day-readmission-following-total-knee-arthroplasty-using-iw5pnyFlVo

References (55)

Publisher
Springer Journals
Copyright
Copyright © The Author(s) 2023
eISSN
2524-7948
DOI
10.1186/s42836-023-00186-3
Publisher site
See Article on Publisher Site

Abstract

Background Thirty-day readmission is an increasingly important problem for total knee arthroplasty ( TKA) patients. The aim of this study was to develop a risk prediction model using machine learning and clinical insight for 30-day readmission in primary TKA patients. Method Data used to train and internally validate a multivariable predictive model were obtained from a single tertiary referral centre for TKA located in Victoria, Australia. Hospital administrative data and clinical registry data were utilised, and predictors were selected through systematic review and subsequent consultation with clinicians caring for TKA patients. Logistic regression and random forest models were compared to one another. Calibration was evalu- ated by visual inspection of calibration curves and calculation of the integrated calibration index (ICI). Discriminative performance was evaluated using the area under the receiver operating characteristic curve (AUC-ROC). Results The models developed in this study demonstrated adequate calibration for use in the clinical setting, despite having poor discriminative performance. The best-calibrated readmission prediction model was a logistic regression model trained on administrative data using risk factors identified from systematic review and meta-analysis, which are available at the initial consultation (ICI = 0.012, AUC-ROC = 0.589). Models developed to predict complications associ- ated with readmission also had reasonable calibration (ICI = 0.012, AUC-ROC = 0.658). Conclusion Discriminative performance of the prediction models was poor, although machine learning provided a slight improvement. The models were reasonably well calibrated, meaning they provide accurate patient-specific probabilities of these outcomes. This information can be used in shared clinical decision-making for discharge plan- ning and post-discharge follow up. Keywords Readmission, Total knee arthroplasty, Machine learning, Registry data *Correspondence: Daniel J. Gould daniel.gould@unimelb.edu.au Full list of author information is available at the end of the article © The Author(s) 2023. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/. Gould et al. Arthroplasty (2023) 5:30 Page 2 of 15 not planned for any other reason related to the patient’s Introduction comorbidities. Exclusion criteria: revision, unicondylar, Unplanned hospital readmission following total knee and patellofemoral arthroplasty, planned readmissions arthroplasty (TKA) disrupts the patient’s recovery and including admissions to the hospital for other procedures incurs high costs to the healthcare system [1–3]. 
Readmission can be predicted according to the patient's characteristics and a broad range of risk factors [4]. Although reasonable accuracy can be achieved in certain clinical settings [5, 6], predicting outcomes, especially readmission, is often difficult in TKA patients [4, 5]. This is due to many factors, including a lack of data of sufficient granularity to discriminate between TKA patients who experience deleterious outcomes and those who have an uncomplicated postoperative course [4, 7].

Machine learning algorithms offer an avenue for potential predictive performance gain given their ability to capture complex patterns in the data [4, 7]. This technique does not rely on pre-specified relationships between predictors and outcomes, instead utilising computational and statistical principles to derive patterns without direct human input. Clinical insight into predictor selection can be used in conjunction with machine learning to increase the clinical relevance and interpretability of the model [8].

The aim of this study was to develop a clinically applicable multivariable predictive model for 30-day readmission following TKA, compliant with best practice guidelines [9], to be used in shared decision-making between patient and surgeon.

Materials and methods

Patient selection
Inclusion criteria: all primary TKA patients identified in the administrative database at the study hospital for whom data were available in the St Vincent's Melbourne Arthroplasty Outcomes (SMART) registry, including simultaneous bilateral procedures, TKA for inflammatory arthropathies, and TKA for traumatic aetiologies. Administrative data are available for use in the live clinical environment, whereas the SMART registry contains additional clinically relevant information not available in the live clinical setting which might improve predictive performance. The SMART registry is a prospective registry comprising longitudinal data for TKA and total hip arthroplasty patients at the study hospital, with 100% capture of elective procedures. It has been described in detail previously [10]. Unplanned 30-day readmission was defined as readmission to the hospital for a complication, or monitoring for a suspected complication, within 30 days following discharge from the orthopaedic unit after TKA surgery, for any cause. This included admission to non-orthopaedic units for any reason that was not part of the routine postoperative course or was not planned for any other reason related to the patient's comorbidities. Exclusion criteria were revision, unicondylar, and patellofemoral arthroplasty, and planned readmissions, including admissions to the hospital for other procedures such as chemotherapy or other planned surgical procedures. Admissions to the rehabilitation unit or "hospital in the home" service were also excluded.

Data processing
Data were randomly split into training (75%) and testing (25%) sets, with testing set data kept separate from training set data at every step of the model development and evaluation processes. A seed was set so that random number generation was reproducible and the test set was not used at any stage of model training. A primary TKA procedure was considered a case, with both primary TKA surgeries for each individual patient with bilateral TKA grouped into either the training set or the testing set. All models were trained using fivefold cross-validation with 10 repeats [11] to obtain a more stable estimate of training set performance before evaluating models on the testing set.
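As a concrete illustration of the splitting and resampling scheme described above, the following minimal R sketch (not the authors' code) shows a seeded 75/25 split and fivefold cross-validation with 10 repeats using the caret package; the data frame `tka` and outcome column `readmit30` are hypothetical names. Note that keeping both knees of a bilateral patient in the same set, as done in this study, requires a patient-level split rather than the simple row-level split shown here.

```r
# Minimal sketch, assuming a data frame `tka` with a two-level factor outcome
# `readmit30` (levels "no"/"yes"). Not the authors' code.
library(caret)

set.seed(2023)                                   # reproducible split and resampling
train_idx <- createDataPartition(tka$readmit30, p = 0.75, list = FALSE)
train_set <- tka[train_idx, ]
test_set  <- tka[-train_idx, ]                   # held out from every training step

# Fivefold cross-validation repeated 10 times on the training set only
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 10,
                     classProbs = TRUE, summaryFunction = twoClassSummary)

# Example: a logistic regression base learner trained under this resampling scheme
fit <- train(readmit30 ~ ., data = train_set, method = "glm",
             family = binomial, metric = "ROC", trControl = ctrl)
```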
The value of machine learning was also explored, in terms of its ability to be used in conjunction with evidence from the literature and clinical insight on readmission risk factors to enhance predictive performance and clinical applicability, and to determine how well these population-level risk factors translated into a risk prediction model for individualised patient prognostication based on specific risk profiles. Knowledge of risk factors provides population-level information regarding patient characteristics that are associated with readmission, whereas predictive models provide individualised patient-level probability estimates of the patient's risk [12]. The value of information available at discharge from the hospital following TKA surgery was also evaluated in terms of its ability to enhance predictive performance.

The model was developed with the future intention of being implemented in a hospital's existing information technology infrastructure, facilitating automatic information retrieval from the patient's medical record. However, the added value of information available in a research registry was also explored.

There were four considerations for the models developed in this study:

1) Temporal availability of predictors: initial consultation with the orthopaedic surgeon, specifically when TKA surgery is offered to the patient, or immediately prior to discharge. The rationale for this was that a model using data available at the initial consultation would allow the maximum amount of time to implement risk mitigation strategies and discharge planning.
2) Model architecture: logistic regression or random forest. Logistic regression is commonly used for the prediction of binary outcomes in healthcare, and it is a familiar and intuitive approach for clinicians, but machine learning has the potential to improve predictive performance in orthopaedics [13]. Random forest and logistic regression have been used in prior literature on readmission prediction [5].
3) Dataset availability of predictors: administrative database, or only in the registry. The rationale for this was that a model using administrative data could potentially be integrated into the hospital's information technology system for automatic data processing.
4) Variable selection method: high importance in the Delphi survey and focus group, high and moderate importance in the Delphi survey and focus group, or systematic review predictors.

A comparison of the different predictor selection approaches was included because readmission is a complex phenomenon with many potentially influential predictors [14]. Therefore, comparing a variety of approaches increased the likelihood of developing a predictive model with strong predictive performance as well as clinical relevance [8].

To reduce overfitting, strategies were employed to account for there being fewer than 10 events (readmissions) per variable [15]. For logistic regression, the least absolute shrinkage and selection operator was used. For random forest, models were retrained using only the highest-ranked predictors according to variable importance factors.
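The shrinkage and variable-reduction steps just described could be sketched as follows in R. This is an assumed illustration rather than the study code: `x_train` (a numeric predictor matrix with column names) and `y_train` (a binary outcome factor) are hypothetical inputs, and the cut-off of 10 top-ranked variables is chosen arbitrarily for the example.

```r
# Minimal sketch, assuming `x_train` (numeric matrix) and `y_train` (factor).
library(glmnet)
library(randomForest)

# LASSO-penalised logistic regression: shrinks weak predictors towards zero
cv_lasso <- cv.glmnet(x_train, y_train, family = "binomial", alpha = 1)
coef(cv_lasso, s = "lambda.min")            # predictors retained after shrinkage

# Random forest: rank predictors, then retrain on the highest-ranked subset
rf_full  <- randomForest(x_train, y_train, importance = TRUE)
imp      <- importance(rf_full, type = 1)   # mean decrease in accuracy
top_vars <- names(sort(imp[, 1], decreasing = TRUE))[1:10]   # e.g. top 10
rf_small <- randomForest(x_train[, top_vars], y_train)       # reduced model
```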
Missing data were considered missing at random, except for Veteran's RAND 12-item health survey (VR-12) [16] scores, which were only collected routinely from 1 January 2006. Variables with more than 20% missing data were excluded [17]. For the remaining variables, k-nearest neighbours imputation (k = 5 nearest neighbours) was used because it is considered adequate for the purpose of prediction [18] and performs well for ≤ 20% missing data [17].

To test the impact of k-nearest neighbours imputation on model performance, two logistic regression models with the least absolute shrinkage and selection operator were trained to evaluate alternative strategies for handling missingness in the VR-12 variables. One model was trained with all predictors except for the VR-12 variables. The other model was trained with all predictors using data from 1 January 2006 onwards. Again, testing set data were kept separate from training set data in every step of the model development and evaluation processes.
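A minimal R sketch of the missing-data handling described above is given below, assuming a hypothetical training data frame `train_set`. The VIM package is one common implementation of k-nearest neighbours imputation and is not necessarily the package used by the authors; as in the study, imputation should be applied separately to the held-out test data to avoid leakage.

```r
# Minimal sketch, assuming a data frame `train_set`. Not the authors' code.
library(VIM)

miss_prop  <- colMeans(is.na(train_set))
keep       <- names(miss_prop[miss_prop <= 0.20])   # drop variables > 20% missing
train_kept <- train_set[, keep]

# k-nearest neighbours imputation with k = 5; imp_var = FALSE suppresses the
# extra indicator columns that VIM::kNN adds by default
train_imputed <- kNN(train_kept, k = 5, imp_var = FALSE)
```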
A version of the registry dataset was generated without merging with the administrative dataset, to determine whether the greater number of events (readmissions) available in the registry improved predictive performance [19]. Whereas the administrative database only includes data from 1 July 2002 onwards, the registry contains data from 1998. The predictor selection method and model architecture for the best-performing model overall were applied to this registry-only dataset.

Risk factors were selected from the systematic review and meta-analysis [14] carried out by the authors of this study for which there was moderate- or high-quality evidence and which correlated with readmission. To utilise the knowledge of clinicians [8], a modified Delphi survey and focus group study was carried out [20]. Variables selected for the model were those with a high-importance vote by a simple majority of ≥ 50%. Predictors voted as high-importance in the Delphi survey, despite a lack of systematic review evidence for being readmission risk factors, included the following: preoperative patient-reported pain level, dementia, intensive care unit/high dependency unit admission prior to discharge, and return to theatre prior to discharge. Dementia was the only one of these predictors which has been investigated in the literature, and it did not increase the risk of readmission. The following predictors were correlated with readmission in the literature but did not receive a high-importance vote in the Delphi survey: number of prior emergency department presentations (12 months), age, sex, low socioeconomic status, historical knee procedures, depression, diabetes, history of cancer, hypertension, chronic kidney disease, anaemia, coagulopathy, body mass index, arrhythmia, and peripheral vascular disease. There is also evidence that length of stay is correlated with readmission risk [21], but it did not receive a majority high-importance vote.

The Supplementary file, which has its own table of contents for ease of navigation, contains the full list of predictors (Tables S1–S3). Table S4 contains a list of all readmission prediction models developed in the primary analysis stage of this study. Table S5 depicts the amount of missingness in each variable.

Outcome evaluation
The majority of captured readmissions were to the index hospital where the TKA procedure took place. However, the registry captures some non-index institution readmissions based on patient self-report at the routine six-week follow-up appointment. Details of the data collection and quality control processes carried out to ensure accurate capture of readmissions in accordance with these criteria have been described previously [10].

Model discrimination was measured using the area under the receiver operating characteristic curve (AUC-ROC) [22]. A perfect classifier has an AUC-ROC of 1, while random guessing yields an AUC-ROC of 0.5. Calibration was evaluated on the test set by visual inspection of the calibration curve [23] and numerical evaluation using the Integrated Calibration Index (ICI) [24]. A perfectly calibrated model has an ICI of 0.
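The two evaluation metrics can be illustrated with the short R sketch below. It is an assumed implementation, not the study code: `obs` (observed 0/1 outcomes) and `pred` (predicted probabilities on the test set) are hypothetical vectors, and the ICI is computed as the mean absolute difference between the predicted probabilities and a loess-smoothed calibration curve, following the definition in [24].

```r
# Minimal sketch, assuming vectors `obs` (0/1) and `pred` (probabilities).
library(pROC)

roc_obj <- roc(obs, pred)
auc(roc_obj)                   # discrimination: area under the ROC curve
ci.auc(roc_obj)                # 95% confidence interval for the AUC

# Integrated Calibration Index: distance between predictions and smoothed
# observed risk; 0 indicates perfect calibration
cal_fit  <- loess(obs ~ pred)
cal_pred <- predict(cal_fit, newdata = data.frame(pred = pred))
ici <- mean(abs(cal_pred - pred))
ici
```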
Two existing 30-day readmission risk prediction models, the LACE+ score [25] and Ali et al. [26], were compared to the bespoke models developed in this study. A logistic regression model and a random forest model trained on all predictors considered throughout the model development process were also developed and fully evaluated. The predictability of the most common causes of readmission was also compared to the prediction of readmission as an independent outcome. Patients with and without missing data for variables which had ≥ 10% missing data were also compared according to baseline demographics and readmission rate.

The best-performing model, in terms of discriminative performance, at initial consultation and at discharge was a random forest model trained on systematic review predictors using the combined (registry + administrative) dataset. These models were fully evaluated in the Results section. To test the impact of the selected strategy for handling missing data on model performance, sensitivity analyses were conducted using different strategies for variables with a large proportion of missing data. The initial consultation logistic regression model using systematic review variables had the best calibration of all readmission prediction models in this study.

Analysis
All statistical analyses were performed using R (v4.1.1) [27]. The packages used are listed in Table S6 (Supplementary file).

Results
Figure 1 depicts a flowchart of patients included in the final analysis cohort. The date range was restricted to surgeries performed prior to 30 March 2020. The readmission rate was 6.811%. Tables 1, 2 and 3 contain summary statistics for predictors included in this study. Table 1 contains demographics and patient-reported variables, Table 2 contains comorbidities, and Table 3 contains variables related to healthcare utilisation and the index hospital admission.

Fig. 1 Cohort generation flow diagram (SMART registry: St Vincent's Melbourne Arthroplasty Outcomes registry; KA: knee arthroplasty; TKA: total knee arthroplasty; PAS: Patient Administration System)
Table 1 Summary statistics and comparison between readmitted and non-readmitted patients: demographics and patient-reported variables

Feature | Non-readmitted cases (n = 3434) | Readmitted cases (n = 251) | P-value
Demographics
Age, mean (SD) | 69.539 (8.8) | 69.956 (8.9) | 0.472
Sex, % female | 63.8% | 61.0% | 0.397
BMI, mean (SD) | 33.1 (6.4) | 34.3 (7.8) | 0.017
Smoking | 269 (7.8%) | 21 (8.4%) | 0.856
Low SES
  Pensioner card | 1702 (49.6%) | 152 (60.6%) | 0.001
  SEIFA score | | | 0.045
    1 | 374 (10.9%) | 19 (7.6%) |
    2 | 228 (6.6%) | 18 (7.2%) |
    3 | 273 (8.0%) | 11 (4.4%) |
    4 | 289 (8.4%) | 17 (6.8%) |
    5 | 404 (11.8%) | 26 (10.4%) |
    6 | 264 (7.7%) | 19 (7.6%) |
    7 | 589 (17.2%) | 45 (17.9%) |
    8 | 371 (10.8%) | 30 (12.0%) |
    9 | 450 (13.1%) | 41 (16.3%) |
    10 | 192 (5.6%) | 25 (10.0%) |
Poor access to post-operative care (lives far from hospital, lack of access to allied health support, lack of access to telehealth support) | | | 0.049
  Major cities in Australia | 2645 (77.0%) | 208 (82.9%) |
  Inner regional Australia | 649 (18.9%) | 38 (15.1%) |
  Outer regional or remote Australia | 131 (3.8%) | 4 (1.6%) |
  Missing | 9 (0.3%) | 1 (0.4%) |
Patient-related biopsychosocial factors (lower education level, poor health literacy, non-English speaking) | | | 0.567
  Interpreter required | 567 (16.5%) | 37 (14.7%) |
  Missing | 31 (0.9%) | 5 (2.0%) |
Patient-reported variables
Preoperative patient-reported level of function, mean (SD)
  Mental function | 44.4 (15.1) | 42.7 (16.3) | 0.133
  Physical function | 24.7 (7.8) | 23.9 (7.5) | 0.157
Preoperative patient-reported pain level | | | 0.330
  One | 22 (0.6%) | 1 (0.4%) |
  Two | 125 (3.6%) | 6 (2.4%) |
  Three | 450 (13.1%) | 26 (10.4%) |
  Four | 1375 (40.0%) | 106 (42.2%) |
  Five | 960 (28.0%) | 84 (33.5%) |
  Missing | 502 (14.6%) | 28 (11.2%) |
Categorical variables were compared using the chi-squared test, or Fisher's exact test in cases of counts below 10; continuous variables were compared using Student's t-test. SD standard deviation, BMI body mass index, SEIFA Socioeconomic Indexes for Areas [28], SES socioeconomic status. Pain level refers to "How much did pain interfere with your normal work?": One = not at all; Two = a little bit; Three = moderately; Four = quite a bit; Five = extremely. Variables were derived from the SMART registry or the administrative database.
Table 2 Summary statistics and comparison between readmitted and non-readmitted patients: comorbidities

Feature | Non-readmitted cases (n = 3434) | Readmitted cases (n = 251) | P-value
Hypertension | 2289 (66.7%) | 165 (65.7%) | 0.819
Peripheral vascular disease | 129 (3.8%) | 13 (5.2%) | 0.337
Diabetes | Diabetes 757 (22.0%); with end-organ damage 15 (0.4%) | Diabetes 70 (27.9%); with end-organ damage 1 (0.4%) | 0.080
Coagulopathy | 19 (0.6%) | 0 | 0.636
Charlson Comorbidity Index | Zero 1780 (51.8%); One 956 (27.8%); ≥ Two 698 (20.3%) | Zero 113 (45.0%); One 70 (27.9%); ≥ Two 68 (27.1%) | 0.026
CHF | 100 (2.9%) | 10 (4.0%) | 0.334
Liver disease | 85 (2.5%) | 8 (3.2%) | 0.528
Depression | 394 (11.5%) | 36 (14.3%) | 0.206
Previous stroke | 209 (6.2%) | 21 (8.4%) | 0.191
Anaemia | 59 (1.7%) | 9 (3.6%) | 0.047
History of cancer | 340 (9.9%) | 31 (12.4%) | 0.256
High risk of infection (immunocompromised state, active IVDU, infection in other primary joint replacement) | 1466 (42.7%) | 126 (50.2%) | 0.024
CKD | 125 (3.6%) | 16 (6.4%) | 0.039
Arrhythmia | 18 (0.5%) | 0 | 0.630
Pulmonary disease | 174 (5.1%) | 23 (9.2%) | 0.008
Dementia | 12 (0.3%) | 0 | 1
Substance abuse | 59 (1.7%) | 3 (1.2%) | 0.798
Categorical variables were compared using the chi-squared test, or Fisher's exact test in cases of counts below 10; continuous variables were compared using Student's t-test. SD standard deviation, SEIFA Socioeconomic Indexes for Areas [28], IVDU intravenous drug use, CKD chronic kidney disease. Variables were derived from the SMART registry or the administrative database.
Table 3 Summary statistics and comparison between readmitted and non-readmitted patients: healthcare utilisation and index hospital admission

Feature | Non-readmitted cases (n = 3434) | Readmitted cases (n = 251) | P-value
Prior healthcare utilisation
Increasing number of previous admissions | Zero 3223 (93.9%); One 116 (3.4%); Two 50 (1.5%); ≥ Three 45 (1.3%) | Zero 230 (91.6%); One 10 (4.0%); Two 4 (1.6%); ≥ Three 7 (2.8%) | 0.234
Number of prior ED presentations (12 months) | Zero 3298 (96.0%); One 81 (2.4%); ≥ Two 55 (1.6%) | Zero 234 (93.2%); One 11 (4.4%); ≥ Two 6 (2.4%) | 0.083
Historical knee procedures | Zero 1234 (35.9%); One 1324 (38.6%); Two 760 (22.1%); ≥ Three 116 (3.4%) | Zero 82 (32.7%); One 77 (30.7%); Two 51 (20.3%); ≥ Three 41 (16.3%) | < 0.001
Variables related to index hospital admission
In-hospital complication (any) during index admission | 477 (13.9%) | 76 (30.3%) | < 0.001
ICU/HDU admission during index admission | Zero 3303 (96.2%); One 64 (1.9%); ≥ Two 67 (2.0%) | Zero 237 (94.4%); One 4 (1.6%); Two 10 (4.0%) | 0.109
Return to theatre during index admission | 10 (29.1%) | 6 (2.4%) | < 0.001
Length of stay in days, mean (SD) | 8.993 (4.4) | 11.283 (8.9) | < 0.001
Duration of operation in minutes, mean (SD) | 119.796 (34.5) | 120.928 (36.3) | 0.632
Wound class (not clean) | 7 (20.4%) | 0 | 1
Transfusion during surgery in number of packed red blood cells | Zero 3110 (90.6%); One 63 (1.8%); Two 206 (6.0%); ≥ Three 55 (1.6%) | Zero 212 (84.5%); One 5 (1.0%); Two 21 (8.4%); ≥ Three 13 (5.2%) | < 0.001
Categorical variables were compared using the chi-squared test, or Fisher's exact test in cases of counts below 10; continuous variables were compared using Student's t-test. SD standard deviation, BMI body mass index, SEIFA Socioeconomic Indexes for Areas [28], ED emergency department, ICU intensive care unit, HDU high dependency unit. Variables were derived from the SMART registry or the administrative database.
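For illustration, the univariable comparisons reported in Tables 1, 2 and 3 could be reproduced along the lines of the R sketch below. This is a hedged sketch with hypothetical column names (`smoking`, `age`, `readmitted` in a data frame `tka`), not the authors' analysis code.

```r
# Minimal sketch of the tests named in the table footnotes. Not the authors' code.
# Chi-squared test for categorical variables, switching to Fisher's exact test
# when any cell count is below 10; Student's t-test for continuous variables.
compare_categorical <- function(x, group) {
  tab <- table(x, group)
  if (any(tab < 10)) fisher.test(tab) else chisq.test(tab)
}

compare_categorical(tka$smoking, tka$readmitted)        # e.g. the Smoking row
t.test(age ~ readmitted, data = tka, var.equal = TRUE)  # e.g. the Age row
```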
Results for primary readmission prediction models
The training set performance of all models developed in the main readmission prediction model development process is contained in Table S7 (Supplementary file). Comparisons of baseline demographics and readmission rate for variables with ≥ 10% missingness (missing vs. non-missing) are contained in Tables S8–S10. No variables had > 20% missing data. The initial consultation random forest model achieved an AUC-ROC of 0.617 (95% CI 0.538–0.696). The discharge random forest model achieved an AUC-ROC of 0.692 (95% CI 0.621–0.764). ROC curves for these models evaluated on the test set are presented below (Figs. 2 and 3). Variable importance factors for these models are contained in Tables S11 and S12, along with training set ROC curves in Figs. S1 and S2 (Supplementary file).

Calibration curves for these models are presented below (Figs. 4 and 5). The initial consultation random forest model achieved an ICI of 0.031. The discharge random forest model achieved an ICI of 0.019. The appearance of these calibration curves indicates an overestimation of risk. Precision-recall curves (Figs. S3–S4) and additional performance metrics (Table S13) are available for these models in the Supplementary file. The best-calibrated readmission prediction model was a logistic regression model trained on variables available in the administrative dataset at the initial consultation. These predictors were age, sex, hospital admissions and emergency presentations in the past 12 months, socioeconomic status, and the number of prior knee procedures. The ROC curve and calibration curve for this model are presented below (Figs. 6 and 7, respectively). AUC-ROC was 0.589 (95% CI 0.506–0.673), and ICI was 0.012. The full performance evaluation of this model is available in the Supplementary file (Figs. S5–S10, and Tables S14–S17).

The LACE+ score achieved an AUC-ROC of 0.583 (0.545–0.620), and the model of Ali et al. [26] achieved 0.563 (0.525–0.602). The ICI was 0.642 for LACE+ and 0.100 for Ali et al. [26]. Full performance evaluations are also provided in the Supplementary file for these previously developed models from the prior literature (Figs. S11–S16, and Tables S18–S21), the random forest model trained on all predictors (Figs. S17–S28, and Tables S22–S28), and the logistic regression model trained on all predictors (Figs. S29–S36, and Tables S29–S34).

Predictor summary statistics for the registry dataset not merged with administrative data are contained in the Supplementary file (Table S35), along with the amount of missingness per variable (Table S36) and the dataset cohort creation flow diagram (Fig. S37). A full performance evaluation of the random forest models trained on this dataset is also contained there (Figs. S38–S45, and Tables S37–S42). The complication-specific models demonstrated comparable performance to the readmission prediction models, albeit generally with slightly better discriminative performance.

We also evaluated the outcome definitions and predictors for each readmission-related complication from the literature (Table S43), causes of readmission in this study cohort (Table S44), further information on outcome variable generation (Table S45) and predictor variable generation (Tables S46–S51), predictor variable preparation and missingness (Table S52), and a comparison of baseline characteristics for participants with ≥ 10% missing data for a given variable (Table S53). Baseline characteristics were also compared for patients who experienced each complication and those who did not (Tables S54–S60). Full performance evaluations for all complication-specific models are also contained in the Supplementary file (Figs. S46–S83, and Tables S61–S81).

The model developed using all study predictors for the combined outcome variable indicating any complication associated with readmission achieved an AUC-ROC of 0.658 (0.570–0.746). This was an improvement over the readmission prediction models; however, discriminative performance still falls short of the commonly accepted AUC-ROC threshold of 0.7 [29]. The ROC curve for this model is presented below (Fig. 8). The best-calibrated complication-specific model was a logistic regression model which achieved an ICI of 0.012, indicating good calibration overall, but the calibration curve clearly shows an underestimation of risk at higher predicted probabilities. The calibration curve for this model is presented below (Fig. 9).

The training set AUC-ROC for the logistic regression model with all predictors was 0.677 with k-nearest neighbours imputation, 0.655 for all predictors using data from 1 January 2006 onwards, and 0.677 for all predictors except for the VR-12 scores.

Fig. 2 ROC curve – initial consultation random forest model trained on systematic review predictors using the combined (registry + administrative) dataset
Fig. 3 ROC curve – discharge random forest model trained on systematic review predictors using the combined (registry + administrative) dataset
Fig. 4 Calibration curve – initial consultation random forest model trained on systematic review predictors using the combined (registry + administrative) dataset
Fig. 5 Calibration curve – discharge random forest model trained on systematic review predictors using the combined (registry + administrative) dataset
Fig. 6 ROC curve – initial consultation logistic regression model using systematic review predictors in the administrative dataset
Fig. 7 Calibration curve – initial consultation logistic regression model using systematic review predictors in the administrative dataset
Fig. 8 ROC curve – discharge random forest model using all study predictors in the combined dataset to predict any complication associated with readmission
Fig. 9 Calibration curve – initial consultation logistic regression model using all study predictors in the administrative dataset to predict any complication associated with readmission
These were the can be displayed to the patients and surgeons alongside the LACE + score [25] and the model developed by Ali et al. incidence for the whole cohort of patients at the institution [26]. In accordance with Stessel et  al. [35], compromises to compare the patient’s risk to that of other patients. Well- had to be made when applying these models because not calibrated models that do not have strong discriminative all variables were available in the dataset used for this performance can still be useful in shared decision-making, study and some proxy variables had to be generated based due to their ability to calculate individualised probabilistic on what was available in the datasets used in this study. estimates of readmission [39]. Provided here is an exam- These models performed poorly on discrimination and ple of how the model can be used in the process of shared calibration. These findings are in keeping with prior liter - clinical decision-making. Imagine there is a patient with ature in which bespoke models have outperformed exist- a predicted probability of 0.33 for readmission, using the ing models such as LACE [36]. Important considerations best-calibrated model developed for readmission in this when interpreting the poor performance of these models study. The highest predicted probability calculated by this include the following: the current study was not a formal model is 0.4 (see the x-axis of Fig. 7), so a predicted prob- external validation study, there was incomplete variable ability of 0.33 is towards the higher end of possible individ- availability, and both models were developed outside ualised predicted probabilities. The clinician might opt to Australia (Ali et al. in the UK, LACE + in Canada), the Ali provide the percentage value, 33%, or a natural frequency, et  al.’s model was developed for risk factor identification in this case, 1 in 3, to describe the predicted probability rather than prediction, and the LACE + model was not and explain that this is the proportion of patients just like developed specifically for TKA patients. them who would be readmitted following TKA surgery. The most common causes of readmission were iden - They can inform the patient that this is almost five times tified from prior literature [37, 38]. These were surgical as high as the average readmission rate for the cohort in site infection, venous thromboembolism, joint-specific this study, which was 6.8% or approximately 1 in 15. The complications, gastrointestinal complications, cardiac patient and clinician can then decide whether they believe complications, and infection (non-surgical site). Causes the patient’s discharge planning should include flagging of readmission in this study cohort are listed in Table S44 them for additional follow-up at one or more checkpoints (Supplementary file). These outcome variables were within the 30 days following discharge after TKA surgery generated based on definitions derived from the litera - [40]. The output of calibrated predictive models such as ture and the variables available in the data for each out- that developed in this study should not dictate decisions come category. There are multiple advantages to using made between patient and clinician, but should instead a general readmission prediction model implemented empower both parties in the shared clinical decision-mak- alongside complication-specific models. 
The models developed in prior studies did not perform well on the datasets used in this study. These were the LACE+ score [25] and the model developed by Ali et al. [26]. In accordance with Stessel et al. [35], compromises had to be made when applying these models because not all variables were available in the dataset used for this study, and some proxy variables had to be generated based on what was available. These models performed poorly on both discrimination and calibration. These findings are in keeping with prior literature in which bespoke models have outperformed existing models such as LACE [36]. Important considerations when interpreting the poor performance of these models include the following: the current study was not a formal external validation study, there was incomplete variable availability, both models were developed outside Australia (Ali et al. in the UK, LACE+ in Canada), the Ali et al. model was developed for risk factor identification rather than prediction, and the LACE+ model was not developed specifically for TKA patients.

The most common causes of readmission were identified from prior literature [37, 38]. These were surgical site infection, venous thromboembolism, joint-specific complications, gastrointestinal complications, cardiac complications, and infection (non-surgical site). Causes of readmission in this study cohort are listed in Table S44 (Supplementary file). These outcome variables were generated based on definitions derived from the literature and the variables available in the data for each outcome category. There are multiple advantages to using a general readmission prediction model implemented alongside complication-specific models. It enables the identification of patients with high readmission risk and can provide insight into their risk of specific complications. It also facilitates the identification of patients who are at high risk of readmission but not of any specific common cause. These readmissions might be unexpected from a clinical point of view but can nonetheless be anticipated and prepared for through post-discharge follow-up. In line with the readmission prediction model evaluation, the best-calibrated complication prediction model was described. This logistic regression model predicted any complication using all predictors in this study available in the administrative database at the initial consultation: sex, age, rurality, socioeconomic status, number of hospital admissions and emergency presentations in the past 12 months, and number of prior knee procedures.

The most well-calibrated models developed in this study, for both readmission prediction and prediction of complications associated with readmission, were developed using data captured routinely in the live clinical environment and available at the initial consultation. This facilitates automated data processing by the predictive model. The result can be displayed to patients and surgeons alongside the incidence for the whole cohort of patients at the institution, to compare the patient's risk with that of other patients. Well-calibrated models that do not have strong discriminative performance can still be useful in shared decision-making, due to their ability to calculate individualised probabilistic estimates of readmission [39]. Provided here is an example of how the model can be used in the process of shared clinical decision-making. Imagine there is a patient with a predicted probability of 0.33 for readmission, using the best-calibrated model developed for readmission in this study. The highest predicted probability calculated by this model is 0.4 (see the x-axis of Fig. 7), so a predicted probability of 0.33 is towards the higher end of possible individualised predicted probabilities. The clinician might opt to provide the percentage value, 33%, or a natural frequency, in this case 1 in 3, to describe the predicted probability and explain that this is the proportion of patients just like them who would be readmitted following TKA surgery. They can inform the patient that this is almost five times as high as the average readmission rate for the cohort in this study, which was 6.8%, or approximately 1 in 15. The patient and clinician can then decide whether the patient's discharge planning should include flagging them for additional follow-up at one or more checkpoints within the 30 days following discharge after TKA surgery [40]. The output of calibrated predictive models such as that developed in this study should not dictate decisions made between patient and clinician, but should instead empower both parties in the shared clinical decision-making process, which still requires intuition and consideration of the human elements that cannot be captured by a statistical tool [41].
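The risk-communication step described above can be expressed in a few lines of R. This is an illustrative sketch only, using the example probability of 0.33 and the cohort readmission rate of 6.8% reported in the text; it is not part of the published model.

```r
# Minimal sketch: expressing a calibrated predicted probability as a percentage,
# a natural frequency ("1 in n"), and a comparison with the cohort rate.
p_patient <- 0.33    # example predicted probability from the calibrated model
p_cohort  <- 0.068   # cohort readmission rate reported in this study (6.8%)

sprintf("Predicted risk: %.0f%% (about 1 in %d patients like you)",
        100 * p_patient, round(1 / p_patient))
sprintf("Cohort average: %.1f%% (about 1 in %d); the patient's risk is roughly %.1f times higher",
        100 * p_cohort, round(1 / p_cohort), p_patient / p_cohort)
```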
Strengths of this study include a comprehensive predictor selection strategy which involved clinical input and machine learning while prioritising model parsimony. The model development, internal validation, and evaluation processes were in line with the guidelines [9]. The models were bespoke [36] and developed on a well-described and diverse clinical population which is demographically representative of the broader Australian TKA population [10]. Comprehensive information on the data used, as well as information required by readers to apply the models in different clinical settings or to replicate this process and develop their own bespoke model [42], was provided. The corresponding author can also be contacted for information and clarification if necessary. The limitations of this study include that it was a single-institution study; the only way to fully capture non-index institution readmissions would be through linkage to external datasets. The main limitation was that the model does not have strong discriminative performance; therefore, it should not be used to distinguish between patients perceived to be at high risk of readmission in a binary manner. Rather, it can be used to inform decision-making, given that it was well calibrated. To improve the discriminative performance of the model, future work could focus on expanding data capture to facilitate the utilisation of strong predictors for readmission or associated complications in this patient population that are currently not captured in the databases available for the development of predictive models. Before being deployed, the model will need to be pilot tested in the clinical environment to determine whether it can be implemented into existing workflows.

Conclusions
The discriminative performance of the readmission prediction and complication prediction models was poor, although machine learning models had slightly better discriminative performance than logistic regression models. The model developed using administrative data available at the initial consultation between the patient and orthopaedic surgeon was reasonably well calibrated. Models developed to predict complications commonly associated with readmission were also reasonably well calibrated and can be used in conjunction with readmission prediction models in shared clinical decision-making.

Abbreviations
TKA: Total knee arthroplasty
SMART: St Vincent's Melbourne Arthroplasty Outcomes
VR-12: Veteran's RAND 12-item health survey
AUC-ROC: Area under the receiver operating characteristic curve
ICI: Integrated Calibration Index
HDU: High dependency unit
LACE: Length of stay (L), acuity of the admission (A), comorbidity of the patient (C), and emergency department use in the 6 months before admission (E)
BMI: Body mass index

Supplementary Information
The online version contains supplementary material available at https://doi.org/10.1186/s42836-023-00186-3 (Additional file 1).

Acknowledgements
We acknowledge the following contributors: Sharmala Thuraisingam, for assisting the first author in understanding various conceptual aspects of predictive model development and evaluation; Aaron Stork and Nicolas McInnes, for facilitating access to the administrative database; and Bede McKenna, Amanda Lee, and Spira Stojanovik, for constructing SQL (Structured Query Language) queries to extract data from the administrative database.

Authors' contributions
D.J.G. coordinated the study and drafted the manuscript, with P.F.M.C., M.M.D., T.S., J.A.B., and S.B. providing intellectual content. P.F.M.C., M.M.D., J.A.B., and T.S. co-designed the study with D.J.G. D.J.G., M.M.D., and P.F.M.C. contributed to the data acquisition. T.S. and J.A.B. contributed to the statistical analysis of the data. D.J.G., P.F.M.C., and M.M.D. contributed to the clinical interpretation of the findings. All authors contributed to revising the manuscript prior to submission and have reviewed and approved the final manuscript. All authors agree to be accountable for all aspects of the manuscript and will work together to ensure questions relating to the accuracy and integrity of any part of it are appropriately investigated and resolved.

Funding
No funding was received directly for this study. D.J.G., S.B., and T.S. receive no funding. P.F.M.C. had the following funding sources to declare: royalties from Johnson and Johnson; consultancy with Johnson & Johnson; consultancy with Stryker Corporation (paid personally); Australian National Health & Medical Research Council Practitioner Fellowship (paid to institution); HCF Foundation, BUPA Foundation, St Vincent's Health Australia, and Australian Research Council (grant support provided to the institution for research unrelated to the current manuscript); and the Axcelda cartilage regeneration project, with a patent applied for covering a device, composition of matter and process (institution and personally). M.M.D. had the following funding sources to declare: National Health and Medical Research Council, HCF Foundation, BUPA Foundation, St Vincent's Health Australia, and Australian Research Council (grant support provided to the institution for research unrelated to the current manuscript). J.A.B. had the following funding sources to declare: National Health and Medical Research Council and Australian Research Council (grant support provided to the institution for research unrelated to the current manuscript); and patent application no. PCT/AU2020/050926 titled "System and Method for Audio Annotation", Khan, Velloso and Bailey.
Availability of data and materials
Individual patient data are not publicly available. Requests for additional information can be sent to the corresponding author.

Declarations

Ethics approval and consent to participate
Ethical approval for this study was obtained from the St Vincent's Hospital Melbourne (SVHM) Human Research Ethics Committee (reference number: HREC/76656/SVHM-2021-272152(v2)).

Consent for publication
Not applicable.

Competing interests
PC: royalties from Johnson and Johnson; consultancy with Johnson & Johnson; consultancy with Stryker Corporation; Emeritus Board Member, Musculoskeletal Australia; Chair, Research Committee, Australian Orthopaedic Association (term now completed). MD: research support paid to the institution for investigator-initiated research from Medacta, Medibank, and HCF Foundation; Chair, Australian Orthopaedic Association Research Foundation Research Advisory Committee. No other competing interests for any authors.

Author details
1 Department of Surgery, St Vincent's Hospital Melbourne, University of Melbourne, Level 2 Clinical Sciences Building, 29 Regent Street, Fitzroy, VIC 3065, Australia. 2 School of Computing and Information Systems, University of Melbourne, Doug McDonell Building, Parkville, VIC 3052, Australia. 3 School of Health Sciences and Social Work, Griffith University, Nathan Campus, Nathan, QLD 4111, Australia. 4 Department of Orthopaedics, St Vincent's Hospital Melbourne, Level 3/35 Victoria Parade, Fitzroy, VIC 3065, Australia.

Received: 14 December 2022. Accepted: 10 April 2023.

References
1. Jencks SF, Williams MV, Coleman EA. Rehospitalizations among patients in the Medicare fee-for-service program. N Engl J Med. 2009;360(14):1418–28.
2. ACSQHC. Avoidable hospital readmissions: report on Australian and international indicators, their use and the efficacy of interventions to reduce readmissions. Sydney: Australian Commission on Safety and Quality in Health Care; 2019.
3. McIlvennan CK, Eapen ZJ, Allen LA. Hospital readmissions reduction program. Circulation. 2015;131(20):1796–803.
4. Lopez CD, Gazgalis A, Boddapati V, Shah RP, Cooper HJ, Geller JA. Artificial learning and machine learning decision guidance applications in total hip and knee arthroplasty: a systematic review. Arthroplast Today. 2021;11:103–12.
5. Futoma J, Morris J, Lucas J. A comparison of models for predicting early hospital readmissions. J Biomed Inform. 2015;56:229–38.
6. Ashfaq A, Sant'Anna A, Lingman M, Nowaczyk S. Readmission prediction using deep learning on electronic health records. J Biomed Inform. 2019;97:103256.
7. Hinterwimmer F, Lazic I, Suren C, Hirschmann MT, Pohlig F, Rueckert D, Burgkart R, von Eisenhart-Rothe R. Machine learning in knee arthroplasty: specific data are key – a systematic review. Knee Surg Sports Traumatol Arthrosc. 2022;30(2):376–88.
8. Steyerberg EW. Clinical prediction models. Cham, Switzerland: Springer Nature Switzerland AG; 2019. https://doi.org/10.1007/978-3-030-16399-0.
9. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. J Br Surg. 2015;102(3):148–58.
10. Gould D, Thuraisingam S, Shadbolt C, Knight J, Young J, Schilling C, et al. Cohort profile: the St Vincent's Melbourne Arthroplasty Outcomes (SMART) Registry, a pragmatic prospective database defining outcomes in total hip and knee replacement patients. BMJ Open. 2021;11(1):e040408.
11. Refaeilzadeh P, Tang L, Liu H. Cross-validation. Encycl Database Syst. 2009;5:532–8.
12. Manning DW, Edelstein AI, Alvi HM. Risk prediction tools for hip and knee arthroplasty. J Am Acad Orthop Surg. 2016;24(1):19–27.
13. Oosterhoff JH, Gravesteijn BY, Karhade AV, Jaarsma RL, Kerkhoffs GM, Ring D, et al. Feasibility of machine learning and logistic regression algorithms to predict outcome in orthopaedic trauma surgery. JBJS. 2022;104(6):544–51.
14. Gould D, Dowsey MM, Spelman T, Jo O, Kabir W, Trieu J, et al. Patient-related risk factors for unplanned 30-day hospital readmission following primary and revision total knee arthroplasty: a systematic review and meta-analysis. J Clin Med. 2021;10(1):134.
15. Pavlou M, Ambler G, Seaman SR, Guttmann O, Elliott P, King M, et al. How to develop a more accurate risk prediction model when there are few events. BMJ. 2015;351:h3868.
16. Kazis LE, Miller DR, Skinner KM, Lee A, Ren XS, Clark JA, et al. Applications of methodologies of the Veterans Health Study in the VA healthcare system: conclusions and summary. J Ambul Care Manag. 2006;29(2):182–8.
17. Emmanuel T, Maupong T, Mpoeleng D, Semong T, Mphago B, Tabona O. A survey on missing data in machine learning. J Big Data. 2021;8(1):1–37.
18. Choudhury A, Kosorok MR. Missing data imputation for classification problems. arXiv preprint arXiv:2002.10709. 2020.
19. Van Calster B, Nieboer D, Vergouwe Y, De Cock B, Pencina MJ, Steyerberg EW. A calibration hierarchy for risk models was defined: from utopia to empirical data. J Clin Epidemiol. 2016;74:167–76.
20. Gould D, Dowsey M, Spelman T, Bailey J, Bunzli S, Rele S, et al. Established and novel risk factors for 30-day readmission following total knee arthroplasty: a modified Delphi and focus group study to identify clinically important predictors. J Clin Med. 2023;12(3):747.
21. Mahajan SM, Nguyen C, Bui J, Kunde E, Abbott BT, Mahajan AS. Risk factors for readmission after knee arthroplasty based on predictive models: a systematic review. Arthroplast Today. 2020;6(3):390–404.
22. Steyerberg EW, Vergouwe Y. Towards better clinical prediction models: seven steps for development and an ABCD for validation. Eur Heart J. 2014;35(29):1925–31.
23. Austin PC, Steyerberg EW. Graphical assessment of internal and external calibration of logistic regression models by using loess smoothers. Stat Med. 2014;33(3):517–35.
24. Austin PC, Steyerberg EW. The Integrated Calibration Index (ICI) and related metrics for quantifying the calibration of logistic regression models. Stat Med. 2019;38(21):4051–65.
25. van Walraven C, Wong J, Forster AJ. LACE+ index: extension of a validated index to predict early death or urgent readmission after hospital discharge using administrative data. Open Med. 2012;6(3):e80.
26. Ali AM, Loeffler MD, Aylin P, Bottle A. Predictors of 30-day readmission after total knee arthroplasty: analysis of 566,323 procedures in the United Kingdom. J Arthroplasty. 2019;34(2):242–8.e1.
27. R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2013. http://www.R-project.org/.
28. Australian Bureau of Statistics. Socio-economic indexes for areas (SEIFA). Canberra: Australian Bureau of Statistics; 2011.
29. Yang S, Berdine G. The receiver operating characteristic (ROC) curve. Southwest Respir Crit Care Chron. 2017;5(19):34–6.
30. Amarasingham R, Moore BJ, Tabak YP, Drazner MH, Clark CA, Zhang S, et al. An automated model to identify heart failure patients at risk for 30-day readmission or death using electronic medical record data. Med Care. 2010;48(11):981–8.
31. Christodoulou E, Ma J, Collins GS, Steyerberg EW, Verbakel JY, Van Calster B. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol. 2019;110:12–22.
32. Younis MU. Impact of artificial intelligence integration on surgical outcome. J Dow Univ Health Sci. 2021;15(2):103–9.
33. Kumar V, Patel S, Baburaj V, Vardhan A, Singh PK, Vaishya R. Current understanding on artificial intelligence and machine learning in orthopaedics: a scoping review. J Orthop. 2022;34:201–6.
34. Wellington IJ, Cote MP. Editorial commentary: machine learning in orthopaedics: venturing into the valley of despair. Arthroscopy. 2022;38(9):2767–8.
35. Stessel B, Fiddelers AA, Marcus MA, van Kuijk SM, Joosten EA, Peters ML, et al. External validation and modification of a predictive model for acute postsurgical pain at home after day surgery. Clin J Pain. 2017;33(5):405.
36. Yu S, Farooq F, Van Esbroeck A, Fung G, Anand V, Krishnapuram B. Predicting readmission risk with institution-specific prediction models. Artif Intell Med. 2015;65(2):89–96.
37. Curtis GL, Jawad M, Samuel LT, George J, Higuera-Rueda CA, Little BE, et al. Incidence, causes, and timing of 30-day readmission following total knee arthroplasty. J Arthroplasty. 2019;34(11):2632–6.
38. Ramkumar PN, Chu C, Harris J, Athiviraham A, Harrington M, White D, et al. Causes and rates of unplanned readmissions after elective primary total joint arthroplasty: a systematic review and meta-analysis. Am J Orthop. 2015;44(9):397–405.
39. Munn JS, Lanting BA, MacDonald SJ, Somerville LE, Marsh JD, Bryant DM, et al. Logistic regression and machine learning models cannot discriminate between satisfied and dissatisfied total knee arthroplasty patients. J Arthroplasty. 2022;37(2):267–73.
40. Hamar GB, Coberley C, Pope JE, Cottrill A, Verrall S, Larkin S, et al. Effect of post-hospital discharge telephonic intervention on hospital readmissions in a privately insured population in Australia. Aust Health Rev. 2017;42(3):241–7.
41. Bonner C, Trevena LJ, Gaissmaier W, Han PK, Okan Y, Ozanne E, et al. Current best practice for presenting probabilities in patient decision aids: fundamental principles. Med Decis Making. 2021;41(7):821–33.
42. Fujimori R, Liu K, Soeno S, Naraba H, Ogura K, Hara K, et al. Acceptance, barriers, and facilitators to implementing artificial intelligence-based decision support systems in emergency departments: quantitative and qualitative evaluation. JMIR Form Res. 2022;6(6):e36501.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
