D. Kaplan, S. Yavuz (2019). An Approach to Addressing Multiple Imputation Model Uncertainty Using Bayesian Model Averaging. Multivariate Behavioral Research, 55.
Atanu Bhattacharjee (2020). Longitudinal Data Analysis. Bayesian Approaches in Oncology Using R and OpenBUGS.
A. Gelman (2006). Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Analysis, 1.
A. Rotnitzky, D. Heitjan, M. Gomes (2009). Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis by Daniels, M. J. and Hogan, J. W. Biometrics, 65.
R. Little (1995). Modeling the Drop-Out Mechanism in Repeated-Measures Studies. Journal of the American Statistical Association, 90.
H. Thijs, G. Molenberghs, B. Michiels, G. Verbeke, D. Curran (2002). Strategies to fit pattern-mixture models. Biostatistics, 3(2).
D. Hedeker, R. D. Gibbons (1997). Application of random-effects pattern-mixture models for missing data in longitudinal studies. Psychological Methods, 2.
M. J. Daniels, J. W. Hogan (2008). Missing data in longitudinal studies: Strategies for Bayesian modelling and sensitivity analysis. doi:10.1201/9781420011180
Margaret Wu, K. Bailey (1988). Analysing changes in the presence of informative right censoring caused by death and withdrawal. Statistics in Medicine, 7(1-2).
R. Little (1994). A Class of Pattern-Mixture Models for Normal Incomplete Data. Biometrika, 81.
Margaret Wu, R. Carroll (1988). Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics, 44.
(1964). Phenothiazine treatment of acute schizophrenia: Effectiveness. Archives of General Psychiatry, 10.
J. Schafer (1997). Analysis of Incomplete Multivariate Data.
J. W. Graham, B. J. Taylor, A. E. Olchowski, P. E. Cumsille (2006). Planned missing data designs in psychological research. Psychological Methods, 11.
R. Little, D. Rubin (2002). Statistical Analysis with Missing Data.
Wensheng Guo, S. Ratcliffe, T. Have (2004). A Random Pattern-Mixture Model for Longitudinal Data With Dropouts. Journal of the American Statistical Association, 99.
G. M. Fitzmaurice, M. Davidian, G. Verbeke, M. Molenberghs (2008). Longitudinal data analysis. doi:10.1201/9781420011579
R. Little, Linda Yau (1996). Intent-to-treat analysis for longitudinal studies with drop-outs. Biometrics, 52(4).
D. Spiegelhalter, N. Best, B. Carlin, A. Linde (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64.
(1964). Phenothiazine Treatment in Acute Schizophrenia.
G. Fitzmaurice, N. Laird, J. Ware (2004). Applied Longitudinal Analysis.
J. J. Heckman (1976). The common structure of statistical models of truncation, sample selection, and limited dependent variables and a simple estimator for such models. Annals of Economic and Social Measurement, 5.
Margaret Wu, Kent Bailey (1989). Estimation and comparison of changes in the presence of informative right censoring: conditional linear model. Biometrics, 45(3).
R. Little (1993). Pattern-Mixture Models for Multivariate Incomplete Data. Journal of the American Statistical Association, 88.
A. E. Gelfand, S. E. Hills, A. Racine-Poon, A. F. M. Smith (1990). Illustration of Bayesian inference in normal data models using Gibbs sampling. Journal of the American Statistical Association, 85.
M. Hinne, Q. Gronau, D. Bergh, E. Wagenmakers (2019). A Conceptual Introduction to Bayesian Model Averaging. Advances in Methods and Practices in Psychological Science, 3.
N. Laird (1988). Missing data in longitudinal studies. Statistics in Medicine, 7(1-2).
(1967). Differences in clinical effects in three phenothiazines in acute schizophrenia. Diseases of the Nervous System, 28.
G. Molenberghs, M. Kenward (2007). Missing Data in Clinical Studies.
J. W. Hogan, N. M. Laird (1997). Mixture models for the joint distribution of repeated measures and event times. Statistics in Medicine, 16.
Valid inference can be drawn from a random-effects model for repeated measures that are incomplete if whether the data are missing or not, known as missingness, is independent of the missing data. Data that are missing completely at random or missing at random are two data types for which missingness is ignorable. Given ignorable missingness, statistical inference can proceed without addressing the source of the missing data in the model. If the missingness is not ignorable, however, recommendations are to fit multiple models that represent different plausible explanations of the missing data. A popular choice in methods for evaluating nonignorable missingness is a random-effects pattern-mixture model that extends a random-effects model to include one or more between-subjects variables that represent fixed patterns of missing data. Generally straightforward to implement, a fixed pattern-mixture model is one among several options for assessing nonignorable missingness, and when it is used as the sole model to address nonignorable missingness, understanding the impact of missingness is greatly limited. This paper considers alternatives to a fixed pattern-mixture model for nonignorable missingness that are generally straightforward to fit and encourage researchers to give greater attention to the possible impact of nonignorable missingness in longitudinal data analysis. Patterns of both monotonic and non-monotonic (intermittently) missing data are addressed. Empirical longitudinal psychiatric data are used to illustrate the models. A small Monte Carlo data simulation study is presented to help illustrate the utility of such methods.

Keywords: Nonignorable missingness · Nonlinear mixed-effects models · Three-level hierarchical models

* Shelley A. Blozis, sablozis@ucdavis.edu
Department of Psychology, University of California, Davis, Davis, CA, USA
Behavior Research Methods

Psychological and behavioural data are gathered at multiple points in time to study how variables change or develop. Despite efforts to obtain complete data in a longitudinal study, missing data can sometimes be unavoidable. Further, measures can be taken at different times for different subjects, making the data analysis complex. As random-effects models naturally allow for responses to be observed at different points in time between subjects, the models naturally handle missing response data. If some values are missing, valid inference depends on whether the missingness (i.e., whether or not the data are missing) is independent of the missing response (Laird, 1988). If independent, the missingness is ignorable. Missingness is not ignorable, however, if the missingness is related to the missing data, even after conditioning on model covariates (Little & Rubin, 2002).

A potentially serious problem with nonignorable missingness is that model inference can be biased, making it essential to address the missingness (Little & Rubin, 2002). For example, if the data for participants who drop from a study tend to differ from those who completed the study, then accounting for subject attrition is informative when modelling the longitudinal data. A problem in evaluating the mechanism giving rise to missing data, however, is that any model applied to empirical data is sensitive to unverifiable assumptions. Indeed, when drawing inference from a random-effects model, certainty about whether the missingness is ignorable or not is problematic because the analyst has access to only the observed data, but ignorable missingness involves the missing data. Further, the fit of a given model to data that are not complete is based on how well the model fits the observed data and not the unobserved data, creating a challenge in efforts to assess possible nonignorable missingness. For these reasons, recommendations include fitting multiple models that represent plausible explanations of the missing data in a given application (Molenberghs & Kenward, 2007). Importantly, the fit of any model is not testable given that only observed data are available for analysis. Inference may proceed by making comparisons between models that differ in their assumptions about the missing data process, while assessing the sensitivity of inference about the longitudinal process under different but plausible mechanisms for the missing data.

Missingness in longitudinal data

Several approaches are available for addressing missingness in longitudinal data, and before describing some of the major frameworks for this, it is useful to introduce notation for the data model of a longitudinal outcome and a separate model for non-response. Consider longitudinal data for a normal outcome variable $Y_i = (Y_{i1}, \ldots, Y_{in})'$, where measures for all $i = 1, \ldots, N$ individuals are planned for $n$ occasions. Interest in $Y_i$ often concerns how the response depends on time and possibly covariates that may vary by occasion or the individual, and thus, a data model is generated based on these considerations. Letting $X_i$ be an $n \times p$ matrix that contains study design information (e.g., measures of time when $Y_i$ was observed and covariates), the multivariate density of $Y_i$ is conditional on $X_i$ and a set of unknown parameters $\gamma_y$ that link $Y_i$ to $X_i$: $f(Y_i \mid X_i, \gamma_y)$. Primary interest generally lies in the inference about the elements contained in $\gamma_y$.

In a longitudinal study, it is common for measures of the outcome variable to have patterns of incomplete data that vary between individuals, and so a separate model for non-response can be specified to clarify those patterns. Let $R_i = (R_{1i}, \ldots, R_{ni})'$ be a set of variables for individual $i$ that indicates missingness in the outcome variable at each occasion $t$, $t = 1, \ldots, n$, where $R_{ti} = 1$ if $Y_{ti}$ is observed, and $R_{ti} = 0$ if $Y_{ti}$ is missing. In a given problem, several factors could affect non-response, including the outcome $Y_i$ and covariates. Similar to the data model for $Y_i$, a multivariate density of $R_i$ is defined: $f(R_i \mid Y_i, X_i, \gamma_r)$, where $\gamma_r$ is a set of unknown parameters that links indicators of non-response to the longitudinal response and covariates. Assuming incomplete longitudinal data, let $Y_i$ now be a full data vector that is comprised of a set of observed values $Y_{oi}$ and missing values $Y_{mi}$, where the number of missing and complete values can vary between individuals. Taken together, the joint density of $Y_i$ and $R_i$ is

$$f(Y_i, R_i \mid X_i, \gamma_y, \gamma_r). \tag{1}$$

Rubin (1976) provided a framework for three types of missing data mechanisms, namely missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR). Under MCAR, the missingness is independent of the observed and missing values of $Y_i$. Under MAR, the missingness is dependent on $Y_{oi}$ but is independent of $Y_{mi}$. Under MNAR, the missingness is dependent on $Y_{mi}$, whether or not it is dependent on $Y_{oi}$. These mechanisms can be understood by factorization of the joint density in (1). To that end, factorizations of the joint density based on three major modelling frameworks for missing data are reviewed first before returning to add further clarification to the three missing data mechanisms.

Modelling frameworks for missing data

Three major modelling frameworks for missing data are the selection model, pattern-mixture model and shared parameter model, each distinguished by their factorization of the joint density in (1). For the selection model (cf. Heckman, 1976),

$$f(Y_i, R_i \mid X_i, \gamma_y, \gamma_r) = f(Y_i \mid X_i, \gamma_y)\, f(R_i \mid Y_i, X_i, \gamma_r),$$

where the first factor is the marginal density of the longitudinal process that depends on covariates, and the second is the marginal density of the missingness process that is conditional on the longitudinal response and covariates. For the pattern-mixture model (cf. Little, 1993, 1994, 1995),

$$f(Y_i, R_i \mid X_i, \gamma_y, \gamma_r) = f(Y_i \mid R_i, X_i, \gamma_y)\, f(R_i \mid X_i, \gamma_r),$$

where the first factor specifies that the marginal density of the longitudinal outcome depends on indicators of missingness and covariates. The second factor specifies that the missingness depends on covariates but not on the longitudinal response. For the shared-parameter model (cf. Wu & Carroll, 1988; Wu & Bailey, 1988, 1989),

$$f(Y_i, R_i \mid X_i, \gamma_y, \gamma_r, b_i) = f(Y_i \mid X_i, \gamma_y, b_i)\, f(R_i \mid Y_i, X_i, \gamma_r, b_i),$$

where the first factor is the marginal density of $Y_i$ that depends on $X_i$ and random effect $b_i$, and the second is the marginal density of the missingness process that is conditional on $Y_i$, $X_i$ and random effect $b_i$. Random effects contained in $b_i$ vary by individual, such as a random intercept or random slope of a linear growth model for $Y_i$. Clearly, the factorization of the shared parameter model is based on the selection model with the addition that both factors share the random effect $b_i$. For example, a data model based on a random-effects growth model that includes a random intercept and random slope could specify that the missingness also depends on these two random effects.

Returning to Rubin's (1976) framework for the three missing data mechanisms, the mechanisms can be clarified using the selection model framework, though it is noted that the missing data mechanisms are not dependent on a particular framework. Under MCAR, the joint density of $Y_i$ and $R_i$ can be specified as

$$f(Y_i, R_i \mid X_i, \gamma_y, \gamma_r) = f(Y_i \mid X_i, \gamma_y)\, f(R_i \mid X_i, \gamma_r).$$

The implication of MCAR is that valid inference from the data model can be made independent of the missing data process. Under MAR, the full data vector $Y_i$ is partitioned into its observed $Y_{oi}$ and missing $Y_{mi}$ components, and then the joint density is

$$f(Y_i, R_i \mid X_i, \gamma_y, \gamma_r) = f(Y_i \mid X_i, \gamma_y)\, f(R_i \mid Y_{oi}, X_i, \gamma_r),$$

such that the missing data process depends on observed values of the outcome variable but not those that are missing. The implication of MAR is that valid inference of the longitudinal process can be made independent of the missing data process provided that all available data are analysed. Finally, under MNAR, the joint density is

$$f(Y_i, R_i \mid X_i, \gamma_y, \gamma_r) = f(Y_i \mid X_i, \gamma_y)\, f(R_i \mid Y_i, X_i, \gamma_r),$$

such that the missing data process depends on the full data vector that includes observed and missing values of the outcome variable. The implication of MNAR is that valid inference of the longitudinal process cannot be made independent of the missing data process. Herein lies the problem of MNAR in that the observed data alone are not enough to inform the analyst about the missingness. It is therefore up to the analyst to carefully consider possible mechanisms that may account for missing data in a given problem and to then proceed with an understanding that inferences under these assumptions depend on the observed, and not the missing, data.

Pattern-mixture models

Among the major frameworks for addressing nonignorable missingness in longitudinal data is a random-effects pattern-mixture model in which one or more between-subject indicator variables are created to represent fixed patterns of missing data and individuals are classified according to these patterns (Hedeker & Gibbons, 1997; Little, 1995; Molenberghs & Kenward, 2007). Under the pattern-mixture model factorization of the joint density between the outcome variable and indicators of missingness, the longitudinal outcome is dependent on the missingness. Thus, under this framework, the longitudinal response is conditioned on patterns that describe when the longitudinal outcome is observed and when it is missing. Specifically, between-subject indicator variables are created to represent patterns of missing data. For example, an indicator variable could represent whether or not a subject has complete data versus any pattern of missing data. In a random-effects pattern-mixture model, pattern effects are fixed. Fixed pattern effects are included in the model for the longitudinal response with the assumption that, conditional on the pattern, the missingness is ignorable. Pattern indicators can also be used to predict coefficients of a growth model, such as a random slope, similar to how other subject-level attributes are used to predict characteristics of change in a longitudinal response. Again, conditional on the pattern of missing data, the missingness is assumed to be ignorable. Random-effects pattern-mixture models are generally straightforward to estimate using computer software designed to estimate random-effects models and have been shown to be effective in addressing nonignorable missing data in longitudinal research (Fitzmaurice et al., 2008; Molenberghs & Kenward, 2007).

The random-effects pattern-mixture model is widely applied in the behavioural sciences, possibly due to its ease of application. For instance, in a review of articles that cited Hedeker and Gibbons' (1997) article on applications of this model, about 200 peer-reviewed articles reported an application of a random-effects pattern-mixture to address data that were possibly MNAR. Of those articles, nearly all applied a single model that represented the missingness by a single variable that denoted each subject's completion status (e.g., a binary indicator of whether or not a study participant completed the planned assessments or a variable equal to the number of assessments completed), and no alternative models to address the missing data were considered. This is counter, however, to recommendations that researchers consider multiple models that represent plausible explanations about the missing data. Thus, the purpose of this paper is to consider extensions of the random-effects pattern-mixture model and illustrate their applications in assessing the impact of missing data on the statistical inference of a random-effects model for longitudinal data.

Illness severity in a sample of patients with schizophrenia

Prior to describing models aimed to address nonignorable missingness, a longitudinal study is described that motivates a practice of relying on multiple models that differ in their assumptions about a missing data process and use of a sensitivity analysis to assess the impact of nonignorable missingness under different mechanisms. The data are first analysed by fitting different growth models assuming ignorable missingness. Using the best-fitting model among those considered, models that make different assumptions about the missing data process are applied and compared.

The longitudinal study was designed to examine the effects of psychiatric medications in the treatment of mental illness in a sample of patients with schizophrenia. The National Institute of Mental Health Schizophrenia Collaborative Study was a nationwide controlled study of a psychopharmacological treatment (phenothiazine treatment) in acute schizophrenia (National Institute of Mental Health, Psychopharmacology Service Center Collaborative Study Group, 1964; National Institute of Mental Health, Psychopharmacology Research Branch Collaborative Study Group, 1967). Data were collected from nine public, private and university hospitals. Within hospitals, newly admitted patients diagnosed with schizophrenia and who met the study criteria were randomly assigned to one of four study medications, including a placebo, using a double-blind assignment process. Patients were first stratified according to sex, and within each sex, were randomly assigned to a drug treatment.

Data for the 437 patients reported in Fitzmaurice et al. (2011) and Hedeker and Gibbons (1997) are studied here. A global rating of illness severity was measured by Item 79 of the Inpatient Multidimensional Psychiatric Scale (IMPS) (Lorr & Klett, 1966) and assessed using a 7-point ordinal scale. The response scale had values denoting the severity of illness as: 1 = normal, not at all ill; 2 = borderline mentally ill; 3 = mildly ill; 4 = moderately ill; 5 = markedly ill; 6 = severely ill and 7 = among the most extremely ill. Observed ratings, which include non-integer values falling between values of the measurement scale, were obtained beginning with a baseline assessment and follow-ups spanning up to 6 weeks thereafter. Patients who received any psychiatric drug were combined into one group because there were no detectable differences in illness ratings between these groups (see Hedeker & Gibbons, 1997). The score is analysed here as a continuous variable. Table 1 provides sample descriptives of illness scores, separately for patients assigned to receive the placebo or a drug, for week = baseline, 1, 2, …, 6. Figure 1 displays observed trajectories of nine individuals from each of the two patient groups. Displays of all cases are in Hedeker and Gibbons (1997). Individual differences in the observed trajectories are evident, as is the nonlinearity in the form of change for some patients.

Table 1 Descriptive statistics for mental illness scores (n = 437)

| Group | Week | Proportion with missing data | Mean | SD | Minimum | Maximum |
|---|---|---|---|---|---|---|
| Placebo | Baseline | .01 | 5.4 | .83 | 3 | 7 |
| Placebo | Week 1 | .03 | 5.0 | 1.2 | 1 | 7 |
| Placebo | Week 2 | .95 | 5.8 | .57 | 5 | 6.5 |
| Placebo | Week 3 | .19 | 4.7 | 1.2 | 1.5 | 6.5 |
| Placebo | Week 4 | .98 | 5.5 | .71 | 5 | 6 |
| Placebo | Week 5 | .98 | 4.3 | 2.5 | 2.5 | 6 |
| Placebo | Week 6 | .35 | 4.2 | 1.4 | 1.5 | 6.5 |
| Drug | Baseline | .01 | 5.4 | .88 | 2 | 7 |
| Drug | Week 1 | .02 | 4.4 | 1.2 | 1 | 7 |
| Drug | Week 2 | .97 | 3.3 | 1.6 | 1 | 6 |
| Drug | Week 3 | .13 | 3.8 | 1.4 | 1 | 7 |
| Drug | Week 4 | .97 | 2.5 | 1.3 | 1 | 5 |
| Drug | Week 5 | .98 | 2.9 | 1.6 | 1.5 | 6 |
| Drug | Week 6 | .19 | 3.1 | 1.4 | 1 | 7 |

Fig. 1 IMPS79 Scores for subsamples of nine patients by group (left: drug; right: placebo)

Patterns of missing data

According to the study's description, the plan was to obtain illness measures at baseline and 1, 3 and 6 weeks following baseline. For a small number of patients (2–5% of the total subject count), scores were also obtained at weeks 2, 4 and 5. Given the data collection plan, it seemed reasonable to assume that missing scores at weeks 2, 4 and 5 for the vast majority of patients were missing by design, and thus, missing completely at random. Data missing by design are missing completely at random if the missingness is independent of both the observed and the missing data (Graham et al., 2006). Assuming missingness in weeks 2, 4 and 5 is ignorable, the analysis proceeds here in addressing patterns of missing data with regard to baseline and weeks 1, 3 and 6, noting that data from all weeks, as available, are included in the reported analyses.

Indicators of patterns of missing data were generated based on data at baseline and weeks 1, 3 and 6, resulting in nine patterns (see Table 2). Pattern 1 reflects complete data at baseline and weeks 1, 3 and 6 and so corresponds to a pattern assumed to reflect ignorable missingness, as discussed earlier. Patterns 2, 3 and 7 correspond to patterns of monotonic dropout. Patterns 4, 5, 6 and 8 show intermittently missing data. Pattern 9 has a pattern of both intermittently missing data and possible attrition. Fisher's exact test of independence between pattern and treatment group was statistically significant at the .05 level (p = .016). Based on the observed and expected cell counts provided in Table 2, the placebo group has lower than expected counts of patients with complete data at baseline and weeks 1, 3 and 6, whereas the drug group tends to have lower than expected patient counts for patterns involving missing data. Based on this, it may be important to address the missing data when studying differences in illness ratings between the patient groups.

Table 2 Patterns of missing data (n = 437). X marks an observed score and a dot marks a missing score; group cells give the frequency with the expected count in parentheses.

| Pattern | Baseline | Week 1 | Week 3 | Week 6 | Placebo | Drug | Total |
|---|---|---|---|---|---|---|---|
| 1 | X | X | X | X | 64 (77) | 248 (235) | 312 |
| 2 | X | X | X | . | 19 (13) | 34 (40) | 53 |
| 3 | X | X | . | . | 18 (11) | 27 (34) | 45 |
| 4 | X | X | . | X | 3 (3) | 10 (10) | 13 |
| 5 | X | . | X | X | 2 (1) | 3 (4) | 5 |
| 6 | . | X | X | X | 1 (1) | 2 (2) | 3 |
| 7 | X | . | . | . | 0 (1) | 3 (2) | 3 |
| 8 | X | . | . | X | 0 (0) | 2 (2) | 2 |
| 9 | X | . | X | . | 1 (0) | 0 (1) | 1 |

Growth models assuming ignorable missingness

Different growth models were first fit to the illness ratings under the assumption that the missingness was ignorable with the goal of characterizing scores across weeks for the two patient groups. Let $y_{ti}$ denote an illness rating at week $t$ for patient $i$. Week was centred to the baseline assessment: week = 0, 1, …, 6. A drug treatment indicator denoted whether or not a patient received one of four psychiatric medications (Drug = 0 if patient $i$ was given a placebo; Drug = 1 if a patient was given a psychiatric drug) and was used to predict each coefficient of a given growth model at the subject level. Illness ratings were assumed to follow a random-effects model:

$$\text{Level 1: } y_{ti} = f(\beta_i, \text{week}_{ti}) + e_{ti}$$

$$\text{Level 2: } \beta_i = g(\gamma, \text{Drug}_i) + u_i$$

where, at the first level, $f(\cdot)$ denotes a function of a set of random coefficients in $\beta_i$ and the week of assessment, and $e_{ti}$ is the residual. At the second level, $g(\cdot)$ denotes a vector-valued function in which the random coefficients at the first level are regressed on the treatment indicator, $\text{Drug}_i$, where $\gamma$ is a set of fixed effects that link the random coefficients and the treatment indicator; $u_i$ is the set of residuals corresponding to the level-2 regressions. The conditional random effects $u_i$ were assumed to be bivariate normal with means equal to zero and variance-covariance matrix $\Phi_u$:

$$\Phi_u = \begin{pmatrix} \phi_{u_0} & \\ \phi_{u_1 u_0} & \phi_{u_1} \end{pmatrix}.$$

Conditional on the treatment effects, the missingness was assumed to be ignorable. Four different growth functions (see Table 3) were fit to model change in ratings: linear growth, quadratic growth, linear growth using a square root transformation of week, and exponential growth, with all but the linear model used to address possible nonlinearity in the form of change. The quadratic function allows for non-constant change rates, but due to the parabolic shape of the function, scores would be expected to decrease and later increase with time. The linear function that uses the square root of week changes the time metric and helps to linearize the form of the relationship between the outcome and time. The exponential function includes a lower asymptote to capture stability in illness ratings if it is expected that patients would initially show improvement by a decreasing trend in their clinical ratings and later achieve stability in their illness status.

Table 3 Four growth models fitted to illness ratings (n = 437)

| Growth form | Level 1 | q | Min(ESS) | DIC |
|---|---|---|---|---|
| Linear | $\beta_{0i} + \beta_{1i}\,\text{week}_{ti}$ | 7 | 8769 | 4378.320 |
| Linear with $\sqrt{\text{week}}$ | $\beta_{0i} + \beta_{1i}\sqrt{\text{week}_{ti}}$ | 7 | 8769 | 4146.871 |
| Quadratic | $\beta_{0i} + \beta_{1i}\,\text{week}_{ti} + \beta_{2i}\,\text{week}_{ti}^2$ | 8 | 8804 | 4288.535 |
| Exponential | $\beta_{1i} - (\beta_{1i} - \beta_{0i})\exp\{-\beta_{2i}\,\text{week}_{ti}\}$ | 8 | 8804 | 4204.464 |

Level 2 regressions:
$\beta_{0i} = \gamma_{00} + \gamma_{01}\,\text{Drug}_i + u_{0i}$
$\beta_{1i} = \gamma_{10} + \gamma_{11}\,\text{Drug}_i + u_{1i}$
$\beta_{2i} = \gamma_{20} + \gamma_{21}\,\text{Drug}_i$

Notes: q is the total number of model parameters. "Min(ESS)" is the lower bound on effective sample size calculated using the R package mcmcse (Flegal et al., 2021). DIC is the deviance information criterion. In each function, 'week' is centred to the baseline occasion: week = 0, 1, …, 6. The random coefficients of each model were regressed on Drug at the second level. The residuals of the first two level-2 equations (e.g., $u_{0i}$ in the level 2 regression of a model's intercept) could covary with each other. $\beta_{2i}$ was assumed to be non-randomly varying. The residual variance at level 1 could differ between patient groups in each model fitted: $\sigma^2_{e_i} = \exp(\tau_0 + \tau_1\,\text{Drug}_i)$.

To help decipher the information in Table 3, the linear growth model is described as an example. At level 1, the growth function is $\beta_{0i} + \beta_{1i}\,\text{week}_{ti}$, and at level 2, each growth coefficient is regressed on Drug: $\beta_{0i} = \gamma_{00} + \gamma_{01}\,\text{Drug}_i + u_{0i}$ and $\beta_{1i} = \gamma_{10} + \gamma_{11}\,\text{Drug}_i + u_{1i}$. The intercept of each level 2 regression is the fixed growth coefficient (intercept and slope, respectively) for the placebo group, and the effect of Drug is the difference in the growth coefficient between the placebo and drug group. For example, $\gamma_{00}$ is the expected illness rating for the placebo group at baseline, and $\gamma_{01}$ is the expected baseline difference in ratings between the placebo and drug group. The residual of each level 2 regression is the random effect conditional on Drug. For example, $u_{0i}$ is the residual corresponding to the subject-specific intercept of the growth model after conditioning on Drug. Although not shown in Table 2, the level 1 residual variance was modelled using an exponential function to permit heterogeneity of variance between treatment groups: $\sigma^2_{e_i} = \exp(\tau_0 + \tau_1\,\text{Drug}_i)$.

Estimation

Maximum likelihood and Bayesian estimation are the major approaches to the estimation of random-effects models. Here, estimation is carried out using PROC MCMC for SAS/STAT software version 9.4. Bayesian estimation was selected for the current analysis due to the greater flexibility in how a random-effects model may be specified, a feature that will become evident when models for nonignorable missingness are described. PROC MCMC is a flexible simulation procedure for which Bayesian estimation is carried out by repeatedly sampling from a posterior distribution using the Markov chain Monte Carlo (MCMC) approach (for details, see Gelfand et al., 1990). The primary sampling mechanism for PROC MCMC is a self-tuned random walk Metropolis algorithm (Chen, 2009). The samples drawn are used to estimate the posterior marginal distributions from which statistical inference of the model parameters may be drawn. In fitting a random-effects model using Bayesian methods, the fixed effects and the random effects are treated as random variables.

Footnote (SAS): SAS System for Windows. Copyright © 2016 by SAS Institute Inc. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc., Cary, NC, USA.

Weakly informative prior distributions were specified for most of the model parameters. Fixed effects were assumed to have Gaussian priors with mean = 0 and variance = 1000. The prior distribution of the intercept of the growth model was restricted to have a lower bound of 1 and an upper bound of 7, given that the illness rating scale was bounded between 1 and 7. The prior of the variance-covariance matrix of the random coefficients at the subject level and those at the pattern level (when applicable) was assumed to be an inverse-Wishart with small degrees of freedom (e.g., quadratic model assumed 3 df). As an exponential model was used for the residual variance at the first level, the parameters of the variance model were assumed to have Gaussian priors with mean = 0 and variance = 1000. A lower bound for effective sample size (ESS) was calculated using the R package mcmcse (Flegal et al., 2021) assuming a 95% confidence level. The observed ESS of each model parameter was compared to this minimum criterion value. Markov chains were run for 10,000,000 iterations with 50,000 burn-in iterations and thinning set to 1000, with thinning done to reduce memory requirements. Specifications were set to meet the minimum ESS needed across the planned models. Model convergence was judged by inspecting trace and autocorrelation plots and meeting the lower bound ESS for a given model. The posterior mean of fixed effects (assuming symmetric posterior distributions) and the posterior mode for variance parameters (assuming non-symmetric posterior distribution), along with 95% highest posterior density intervals (HPDI), for parameter estimates are reported.

Results: Growth models assuming ignorable missingness

Indices of fit for the four growth models are given in Table 3. The models take into account possible differences between the treatment groups by including the effects of Drug and assume that the missingness is ignorable. Conditioning on the effects of Drug, the linear model using a square-root transformation of week had the lowest deviance information criterion (DIC) value. Assuming that a square-root transformation of week provided a more suitable representation than the exponential growth model that included a lower asymptote could be an indication that illness ratings, across the patient samples, were not tending towards clinical stabilization. Going forward, the illness ratings are described by the linear growth model using the square root transformation of week.

Footnote (DIC): The deviance information criterion (Spiegelhalter et al., 2002) is calculated using the posterior mean estimates of the model parameters where both fixed and random effects are treated as random variables; this is in contrast to a marginal information criterion that is conditional on the fixed effects alone.

Estimates from models assuming ignorable missingness are summarized in the first column of estimates in Table 4. Posterior means of fixed effects and posterior medians for variance parameters, with 95% HPDI, are reported. Based on the estimates, patient groups differed in the expected change over time ($\hat{\gamma}_{11} = -0.65$, 95% HPDI: (−0.81, −0.49)) but not in their expected baseline levels ($\hat{\gamma}_{01} = 0.05$, 95% HPDI: (−0.14, 0.27)). The residual variance of illness ratings was greater for those assigned to a drug relative to those assigned to the placebo ($\hat{\tau}_1 = 0.28$, HPDI: (0.07, 0.50)). Between-subject heterogeneity of variance in the expected baseline levels ($\hat{\phi}_{u_0} = 0.47$, HPDI: (0.36, 0.59)) and in the slopes ($\hat{\phi}_{u_1} = 0.32$, HPDI: (0.25, 0.39)) is notable.

Models for nonignorable missingness

As described earlier, about 23.3% of patients are considered to have (unplanned) incomplete data. Reasons for the missing data are not described in the documentation cited previously for the National Institute of Mental Health Schizophrenia Collaborative Study, so it is reasonable to consider different scenarios that might account for the sources of the missing data, and importantly, study how inferences about the longitudinal process are sensitive to different assumptions.

The patterns in Table 2 are used to formulate models that represent possible missing data processes. Three of the nine patterns (patterns 2, 3 and 7) reflect a monotonic dropout pattern, and four others (patterns 4, 5, 6 and 8) reflect intermittently missing data. The last pattern (pattern 9) reflects intermittently missing data but possible attrition near the end of the planned observation period. Based on the result of a Fisher's exact test of independence between pattern and treatment group, the patterns of missing data may be informative in the analysis of the illness ratings. The models considered here use information from the patterns in multiple ways with a goal of representing multiple plausible models for the missing data. The goal in fitting multiple

Table 4 Bayesian estimates of a growth model for illness ratings under different missingness mechanisms (n = 437)

Columns, each reporting M (95% Int): (1) Ignorable missingness; (2) Fixed dropout pattern; (3) Fixed dropout pattern after averaging effects across patterns; (4) Random pattern-mixture on Pattern 1; (5) Random pattern-mixture on Pattern 2; (6) Timing of dropout and growth as independent processes; (7) Timing of dropout depends on growth parameters.

Intercept, $\gamma_{00}$: (1) 5.34 (5.16, 5.51); (2) 5.22 (5.02, 5.43); (3) 5.29 (5.12, 5.47); (4) 5.32 (5.10, 5.53); (5) 5.32 (5.06, 5.56); (6) 5.34 (5.16, 5.51); (7) 5.35 (5.18, 5.52)
Drug, $\gamma_{01}$: (1) 0.05 (−0.14, 0.27); (2) 0.20 (−0.02, 0.46); (3) 0.11 (−0.09, 0.32); (4) 0.05 (−0.14, 0.24); (5) 0.06 (−0.14, 0.24); (6) 0.05 (−0.14, 0.27); (7) 0.05 (−0.14, 0.25)
Drop, $\gamma$: 0.32 (−0.05, 0.68)
$\sqrt{\text{week}}$, $\gamma_{10}$: (1) −0.33 (−0.46, −0.19); (2) −0.39 (−0.55, −0.36); (3) −0.33 (−0.48, −0.21); (4) −0.34 (−0.52, −0.15); (5) −0.32 (−0.44, 0.25); (6) −0.33 (−0.46, −0.19); (7) −0.34 (−0.48, −0.21)
−0.65 (−0.81, −0.49) −0.65 (−0.81, −0.49) −0.69 (−0.85, −0.53) −0.66 (−0.80, −0.52) −0.66 (−0.83, −0.51) −0.65 (−0.81, −0.49)
−0.65 (−0.82,−0.50) week*Drug, γ 0.26(−0.07,0.56) week*Drop, γ WS model for : Intercept, τ −0.81 (−1.01,−0.62) −0.82 (−1.02,−0.63) −0.82 (−1.02,−0.63) −0.74 (−0.95,−0.55) −0.74 (−1.01,−0.61) −0.81 (−1.01,−0.62) −0.81 (−1.00,−0.61) Drug, τ 0.28 (0.07,0.50) 0.28 (0.06,0.50) 0.28 (0.06,0.50) 0.28 (0.06,0.49) 0.28 (0.05,0.50) 0.28 (0.07,0.50) 0.28 (0.06,0.50) BS model: 0.47 (0.36,0.59) 0.47 (0.36,0.58) 0.47 (0.36,0.58) 0.34 (0.21,0.47) 0.34 (0.36,0.58) 0.47 (0.36,0.59) 0.47 (0.36,0.59) −0.03 (−0.09,0.04) −0.04 (−0.10,0.02) −0.04 (−0.10,0.02) 0.04 (−0.03,0.11) 0.04 (−0.09,0.03) −0.03 (−0.09,0.04) −0.03 (−0.09,0.04) u u 1 0 0.32 (0.25,0.39) 0.31 (0.24,0.38) 0.31 (0.24,0.38) 0.22 (0.16,0.29) 0.23 (0.25,0.39) 0.32 (0.25,0.39) 0.32 (0.25,0.39) Random pattern: 0.01 (0.00,0.03) 0.02 (0.00,0.05) −0.00 (−0.02,0.01) −0.00 (−0.03,0.02) v v 1 0 0.01 (0.00,0.03) 0.01 (0.00,0.05) Week-of-last-observation model: Intercept, α 1.63 (1.54,1.72) 1.63 (1.54,1.72) Drug, α 0.13 (0.03,0.24) 0.13 (0.03,0.23) Random intercept, α −0.01 (−0.10,0.07) Random slope, α 0.09 (−0.05,0.22) WS model for : Intercept, κ −1.49 (−1.75,−1.23) −1.47 (−1.75,−1.18) Drug, κ −0.31 (−0.64,−0.02) −0.36 (−0.70,−0.04) Note: M = posterior mean for fixed effects, M = posterior mode for variance parameters. Under the growth model, “WS model” refers to the within-subject residual variance model and “BS model” refers to the between-subject residual variance-covariance model. Under the week-of-last-observation model, “WS model” refers to the within-subject residual variance model. Behavior Research Methods models that reflect plausible missing data processes was to = + Drug + Drop + Drug ∗ Drop + u 0ik 00 01 ik 02 ik 03 ik ik 0ik evaluate if the parameters of the longitudinal model for the illness ratings were sensitive to the assumptions made about = + Drug + Drop + Drug ∗ Drop + u . 1ik 10 11 ik 12 ik 13 ik ik 1ik the missingness. 
The first model considered is a pattern-mixture random- The coefficients of the growth model are functions of effects model with a single fixed pattern of dropout. This Drug, Drop and their interaction. The residuals u and u 0ik 1ik th model uses the 6 week of observation to indicate whether of the level 2 equations are conditional random effects. As or not a patient provided data at the final planned assess- was done in the model that assumed ignorable missingness ment. Thus, this single indicator groups the 101 individuals and all forthcoming models that assume nonignorable miss- with patterns of monotonic dropout and the one individual ingness, the two random effects could covary. with a combination of intermittently missing data and drop- The growth coefficients were allowed to differ between out patterns. This model assumes that patterns of intermit- groups based on the indicator of dropout. To ease the com- tently missing data are not important and that the effects of parison of estimates from this model to other models, the monotonic dropout, regardless of the timing, do not differ overall population effects are calculated by averaging across from each other. The second and third models are random patterns, weighted by the proportions of subjects within pat- pattern-mixture models that treat the missing data pattern terns (cf. Little, 1993; 1995; Hogan & Laird, 1997): as a random effect. Specifically, the second model uses all Drop=0 Drop=0 Drop=1 Drop=1 = + nine patterns and assumes that monotonic dropout and inter- mittently missing data patterns are important, and the third where γ is a fixed growth coefficient, such as the model’s model uses five of the nine patterns to include only mono- Drop = 0 intercept, π is the population proportion of individu- tonic dropout patterns. Drop = 1 als with no pattern of dropout, and π is the population Subject attrition is common in longitudinal investiga- proportion of individuals with a pattern of dropout. 
Using tions, and so additional models were specified in which the the sample proportion of patients with a pattern of dropout growth model for illness ratings was estimated jointly with (.2334), estimates of the population averages were obtained a model for the week when the patient was last observed. In for the model’s fixed intercept and the effects of week , the first model, the timing of dropout was regressed on Drug, Drug and the interaction between Drug and week. and in the second model, was additionally regressed on the random intercept and slope of the growth model. Thus, the Random pattern‑mixture model Next, random pattern-mix- latter model links the timing of dropout to the illness ratings ture models were specified. Two pattern sets were tested, through the random growth coefficients that characterize each set assumed to be from a population of missing data change in the illness ratings, and the former assumes the two patterns. The first, Pattern Set 1, included patterns of inter - processes are independent. The second of these two models mittently missing data, patterns of monotonic dropout and is known as a shared parameter model in which coefficients a combination of the two (pattern 9). If intermittently miss- of one model are shared with those of the other model and ing data were missing at random, and thus, ignorable, then where estimation of the two models is done jointly (Albert it would not be important to include those patterns in an & Follman, 2009; Wu & Carroll, 1988). analysis. So, a second set, Pattern Set 2, included only the monotonic pattern of missingness (patterns 2, 3, 7 and 9; as Fixed patternm ‑ ixture model The first model was a random- pattern 9 possibly has a pattern of attrition, it was included effects model with a single fixed-pattern effect. An indicator here as a pattern of dropout). 
Note that Pattern Set 2 is com- of dropout was assumed to account for differences in the prised of patterns used to make the indicator Dropout, but longitudinal trajectories between those who completed treat- in this model, the pattern effect is assumed to be random, ment and those who did not, defined by whether or not the and as such, the model permits differences in effects due to th patient was observed at the 6 week. The indicator, hence- differences in the timing of when a patient dropped from the forth called Drop, was equal to 1 for patients with patterns study. If a patient was considered to have complete data, then k = 2, 3, 7 and 9 in Table 2 (n = 102 (23.3%) patients) and missingness was assumed to be ignorable and their model otherwise was equal to 0 (see Hedeker & Gibbons, 1997). To was specified by Eqs. (1 ) and (2). Otherwise, patients who fit this model, the model in Eq. (1 ) was extended to include had a pattern of missing data had a longitudinal model that the effect of Drop and its interaction with Drug: included the random patterns effect v and v : 0k 1k y = + week + e tik 0ik 1ik tik tik, y = + week + e , tik 0ik 1ik tik tik where where, at the subject level, 1 3 Behavior Research Methods intercept and slope of the model for illness ratings were = + Drug + u 0ik 00k 01 ik 0ik shared parameters in the model for ln(MaxWeek) : ik = + Drug + u 1ik 10k 11 ik 1ik, y = + week + e tik 0ik 1ik tik tik and at the pattern level, where = + v 00k 00 0k = + Drug + u 0ik 00 01 ik 0ik = + v . 10k 10 1k = + Drug + u , 1ik 10 11 ik 1ik Conditional on a random pattern of missing data, miss- and ingness was assumed to be ignorable. The random pattern ln (MaxWeek) = + Drug + + + , effects v and v were assumed to be bivariate normal with ik 0 1 i 2 0ik 3 1ik ik 0k 1k means equal to zero and variance-covariance matrix Φ : where α and α are the effects of the random intercept β 2 3 0ik and slope β of the longitudinal model on the timing of 1ik = . dropout. 
Under this model, it is assumed that the timing of v v v 1 0 1 dropout is dependent on the subject-specific aspects of the Estimation of the data model for illness ratings was also longitudinal trajectory. Thus, nonignorable missingness is carried out simultaneously with a model that predicted accounted for through the relationship between the timing the log-transformed (to reduce positive skew) value of the of dropout and aspects of change in the illness ratings that week when a patient was last observed, henceforth called characterize the observed and the missing illness ratings. ln(MaxWeek). The variable MaxWeek might be considered Conditional on the random coefficients β and β , the lon- 0ik 1ik a proxy for the actual time of dropout from the study. A gitudinal response y and the week of the last observation tik higher value of ln(MaxWeek) indicates greater time spent ln(MaxWeek) are independent. Finally, the residual in the ik in the study. As the models that aim to address nonignorable model for the timing of dropout was assumed to be normally missingness take into account a patient’s pattern of missing- distributed with mean equal to 0 and a variance that could ness, the outcome y includes an added subscript k to denote different between treatment groups. Specifically, similar to tik the missing pattern for the individual. The joint model is the model used to represent the residual variance of the presented as growth model for illness ratings, an exponential function was √ used to model the residual variance for the regression of the y = + week + e (1) 2 tik 0ik 1ik tik tik timing of dropout: = exp + Drug . 0 1 i where = + Drug + u 0ik 00 01 ik 0ik Results = + Drug + u 1ik 10 11 ik 1ik, Results from fitting the model that assumed ignorable and missingness (described earlier) and those that assumed a mechanism of nonignorable missingness are summarized ln (MaxWeek) = + Drug + . (2) ik 0 1 i ik in Table 4. 
The posterior mean for fixed effects and the pos- terior median for variance parameters, with 95% HPDI, are The model for ln(MaxWeek) includes an intercept, α , ik 0 reported for the parameter estimates. For the pattern-mixture a treatment effect, α and the residual of the regression, ε . 1 ik model with a fixed pattern effect that reflected whether or Similar to the residual variance of the model for illness rat- not a patient had complete data, estimates are given for the ings, the residual variance of the model for the last week of model with estimates based on how the model was param- observation was allowed to differ between treatment groups eterized (discussed earlier), as well as a set of estimates for by using an exponential model: = exp + Drug . 0 1 i which the fixed effects are the population-averaged esti- Thus, under this model, the longitudinal process and the mates, as previously discussed. For the random pattern- timing of dropout are assumed to be independent. mixture models, estimates are provided for the model that used Pattern Set 1 to define the pattern effects in which the Shared parameter, random‑effects model Last, a shared effects related to attrition and intermittently missing data, parameter random-effects model was fit in which the random and for the model that used Pattern Set 2 to define the pattern 1 3 Behavior Research Methods effects in which the effects related only to patterns of attri- = + X + X + u , 1i 10 11 1i 12 2i 1i tion. Finally, estimates are provided for the joint model for the longitudinal outcome and the timing of dropout, fol- where γ = 1, γ = 0.5, γ = 1, γ = 2, γ = 0.2 and γ = 0.5. 00 01 02 10 11 12 lowed by estimates of the shared parameter model. Further, X = .5X + e . 1i 2i xi From the table of parameter estimates, it is clear that sim- The residual at the first level was assumed to be inde- ilar conclusions can be drawn about the marginal growth pendent and identically distributed (i.i.d.) 
normal as 2 2 parameters for the two patient populations. That is, whether e ∼ 0, I , where = 0.3 and I was a 6 × 6 identity i 6 6 e e the missingness is considered to be ignorable or nonignor- matrix. The residuals at the second level were assumed to able, similar conclusions are reached about treatment ee ff cts be independent between subjects and multivariate normal: on the illness trajectories of the two patient groups. The pat- u 0 0i tern-mixture model with a fixed pattern effect showed differ - 0 ∼ mvn , , u 0 1i u u u ences in the expected change between those with complete 1 0 1 data versus those with incomplete data. The average effects, where = 1 , = 0.5 , and = 0.1. u u u u 0 1 1 0 however, yield estimated population parameters that are Missingness in y was generated according to a logistic ti close to the estimates resulting from all other models. This regression model for a set of binary dependent variables that is similar to the results reported in Hedeker and Gibbons represented missing (R = 1) or not missing (R = 0) in y at ti ti ti (1997). Estimates between the two random pattern-mixture waves t = 2,…,6, where missingness depended on the covar- models were comparable whether patterns of intermittently iates X and X . Letting η denote the logit at wave t of the 1i 2i t missing data were included or not when defining the ran- probability that y was missing, η by wave was specified as ti t dom pattern effect, and estimates from these two models were comparable to those from all other models that were =−1 + 0.2X + 0.3X 2 1i 2i fit. Under the shared parameter model in which timing of dropout was regressed on the random intercept and slope of =−1 + 0.4X + 0.6X 3 1i 2i the longitudinal model, dropout was not dependent on the random coefficients of the growth model, a result that also =−1 + 0.8X + 1.2X 4 1i 2i suggests ignorable missingness. =−1 + 1.6X + 2.4X 5 1i 2i A simulation study =−1 + 3.2X + 4.8X . 
6 1i 2i To validate the random pattern-mixture model as a viable The probability that y was missing when X = 0 and ti 1i approach to addressing non-ignorable missingness, a small X = 0 was P(1/(1 + exp {1})) = .27, with the probability of 2i simulation study was conducted. A set of 100 data sets was missingness increasing over waves for X = 1 and increased 1i generated for 400 subjects, measured from one to up to six values of X according to the coefficients specified in the 2i occasions (coded as wave = 1,…6) under a random-effects equations relating to the logit. This data-generating model linear growth model for a single normal variable y with a therefore generated both monotonic and non-monotonic subject-level binary covariate X (simulated as X ~N(0, 1) 1i 1i missingness in y . ti with a cutpoint at 0) and a subject-level continuous covari- Four linear growth models, with X as a predictor of the 1i ate X (X ~N(0, 1)). Unlike the covariate X that was used 2i 2i 1i random intercept and slope, were fitted to the simulated data. in the generating model and the models fitted for analysis, The first assumed that the missingness was ignorable. The X was only used in the data-generating model to simulate 2i second model included an indicator variable that denoted an unmeasured covariate that was related to y through the ti whether or not an individual had any pattern of monotonic parameters of the growth model, related to the covariate X , 1i dropout (drop = 1; drop = 0 otherwise). Thus, this model i i and predicted missingness in y . The response y at wave t ti ti ignored patterns of intermittently missing data and assumed for subject i was generated by that the effects of monotonic missingness were equal. The third model included a numeric covariate that denoted the y = + wave − 1 + e ti 0i 1i ti ti timing of monotonic dropout (timing ), with this model also ignoring patterns of intermittently missing data. 
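The missingness mechanism of the simulation, and the dropout covariates used by the second and third fitted models, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's simulation script: the logit coefficients are copied from the equations in the text, but drawing the indicators independently across waves, the function names, and the coding of `drop` and `timing` (one unbroken run of missing waves ending at wave 6, and the last observed wave) are hypothetical choices made here.

```python
import math
import random

# Logit coefficients (intercept, X1 slope, X2 slope) for waves 2..6, copied from
# the equations in the text; wave 1 is treated as always observed.
LOGIT_COEFS = {
    2: (-1.0, 0.2, 0.3),
    3: (-1.0, 0.4, 0.6),
    4: (-1.0, 0.8, 1.2),
    5: (-1.0, 1.6, 2.4),
    6: (-1.0, 3.2, 4.8),
}


def prob_missing(wave, x1, x2):
    """Probability that y is missing at `wave` (inverse logit of eta_t)."""
    b0, b1, b2 = LOGIT_COEFS[wave]
    eta = b0 + b1 * x1 + b2 * x2
    return 1.0 / (1.0 + math.exp(-eta))


def simulate_indicators(x1, x2, rng):
    """Draw R_t (1 = missing, 0 = observed) for waves 1..6.

    Assumption: indicators are drawn independently across waves.
    """
    r = {1: 0}
    for wave in sorted(LOGIT_COEFS):
        r[wave] = int(rng.random() < prob_missing(wave, x1, x2))
    return r


def dropout_covariates(r):
    """Return (drop, timing) for one subject.

    Hypothetical coding: drop = 1 when the subject is observed at every wave up
    to some wave before the last one and missing at every wave after it;
    timing = last wave at which the subject was observed.
    """
    waves = sorted(r)
    last_observed = max(w for w in waves if r[w] == 0)
    no_gaps = all(r[w] == 0 for w in waves if w <= last_observed)
    drop = int(no_gaps and last_observed < waves[-1])
    return drop, last_observed


if __name__ == "__main__":
    rng = random.Random(2023)  # arbitrary seed, for reproducibility only
    indicators = simulate_indicators(x1=1, x2=0.5, rng=rng)
    print(indicators, dropout_covariates(indicators))
```

With both covariates at zero, `prob_missing(2, 0, 0)` evaluates 1/(1 + exp{1}), which rounds to the .27 quoted in the text. Because each wave is drawn separately, a subject can be missing at wave 3 and observed again at wave 4, which is how a generator of this kind produces both monotonic and non-monotonic patterns.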
The fourth model clustered subjects by pattern of missing data. For each parameter, the average 95% credible interval is reported along with bias in Table 5. As shown in Table 5, the magnitude of parameter bias is consistently lowest under the random pattern-mixture model that captures patterns of monotonic and non-monotonic missingness in $y_{ti}$. Though none of the fitted models was the model that generated the data, the random pattern-mixture model provided parameter estimates of the growth model that were closest to the true parameter values.

Table 5 Results from 100 simulated data sets under different assumed missingness data mechanisms (n=400 subjects)

| Parameter | True value | Ignorable missingness: Bias | Ignorable missingness: AIW | Single dropout pattern: Bias | Single dropout pattern: AIW | Timing of dropout: Bias | Timing of dropout: AIW | Random pattern-mixture: Bias | Random pattern-mixture: AIW |
|---|---|---|---|---|---|---|---|---|---|
| Intercept, γ00 | 1 | −0.24 | 0.76 | −0.42 | 0.58 | −0.37 | 0.63 | 0.05 | 1.05 |
| X1, γ01 | 0.5 | 0.48 | 0.98 | 0.20 | 0.70 | 0.34 | 0.84 | −0.18 | 0.32 |
| X2, γ02 | 1 | | | | | | | | |
| drop | | | | | 1.05 | | | | |
| timing | | | | | | | 0.22 | | |
| wave, γ10 | 2 | −0.14 | 1.86 | −0.22 | 1.78 | −0.19 | 1.81 | 0.02 | 2.02 |
| X1 ∗ wave, γ11 | 0.2 | 0.22 | 0.42 | 0.11 | 0.31 | 0.16 | 0.36 | −0.10 | 0.10 |
| X2 ∗ wave, γ12 | 0.5 | | | | | | | | |
| drop ∗ wave | | | | | 0.49 | | | | |
| timing ∗ wave | | | | | | | 0.11 | | |
| WS model for σ²_e: | | | | | | | | | |
| Intercept, τ0 | −1.20 | −0.01 | −1.21 | −0.01 | −1.21 | −0.01 | −1.21 | −0.01 | −1.21 |
| BS model: | | | | | | | | | |
| φ_u0 | 1 | 0.94 | 1.94 | 0.73 | 1.73 | 0.84 | 1.84 | 0.36 | 1.36 |
| φ_u1u0 | 0.1 | 0.45 | 0.55 | 0.36 | 0.46 | 0.40 | 0.50 | 0.18 | 0.28 |
| φ_u1 | 0.5 | 0.23 | 0.73 | 0.19 | 0.69 | 0.20 | 0.70 | 0.09 | 0.59 |
| Random pattern: | | | | | | | | | |
| φ_v0 | | | | | | | | | 0.15 |
| φ_v1v0 | | | | | | | | | 0.07 |
| φ_v1 | | | | | | | | | 0.04 |

Note: Bias = AIW [average interval width] − true value, where AIW = average 95% credible interval. "WS model" refers to the within-subject residual variance model and "BS model" refers to the between-subject residual variance-covariance model. Bias is not applicable to the effects of 'drop' or 'timing', as well as the estimated three-level variances and covariance of the random pattern effects, because these parameters were not part of the data-generating model.

Discussion

Missing data in longitudinal studies are common, making it important in many problems for the analysis to address reasons why data are missing, and importantly, how they may impact inference from a longitudinal model. The issue is that the data from subjects with missing data may be different from those of subjects with complete data. If that is the case, then inference from a longitudinal model that assumes ignorable missingness will not reflect the full population that would include a combination of subjects with either complete or incomplete data. If data are missing solely by design (Graham et al., 2006), then it is reasonable to assume that the missingness is ignorable, leaving no need to also account for the missingness in the analysis. If data show patterns of subject attrition, however, then it is advisable to consider different scenarios about the source of the missingness. If correlates of the missingness or the missing data are available, then such variables can be included in the analysis, such as by including correlates as covariates. In other situations where correlates of the missingness or the missing values are not available, the researcher may consider models that reflect nonignorable missingness, including the use of pattern-mixture, selection and shared parameter models. This may be done by specifying a pattern-mixture model with one or more fixed pattern effects that allow the marginal effects of a growth model to differ according to groups defined by a finite number of missing data patterns (Hedeker & Gibbons, 1997). Alternatively, the pattern effect may be random (Guo et al., 2004), an option that may be more suitable to problems involving many patterns of missing data, including patterns reflective of intermittently missing data and subject attrition. Another option is to specify a shared parameter model in which the missingness depends on the observed and missing values of the measured outcome (Albert & Follman, 2009; Wu & Carroll, 1988). Here, for example, the missingness was represented by the last week that a patient was observed, and the observed and missing values of the measured illness ratings were represented by the random coefficients of the growth model.

Using a set of empirical data from a longitudinal study, this paper illustrated these major frameworks for dealing with nonignorable missingness. The aim was to show how to compare estimates of a longitudinal model under a range of different possible mechanisms of the missing data to assess whether inference from the model was sensitive to the assumptions made about the missing data (Daniels & Hogan, 2008). A pattern-mixture model was applied in which the pattern of missing data was fixed in one version of the model and assumed to be random in a different version. The version in which the pattern was fixed is the same as one of the models reported in Hedeker and Gibbons (1997). In the version of a pattern-mixture model that assumed a random pattern effect, it was possible to account for patterns of intermittently missing data and patterns of attrition. Although a random pattern-mixture model has been previously considered by Guo et al. (2004) (for a different set of empirical data), their application of the model was one in which the random pattern was designed to capture the effects of subject attrition. The application here proposes use of a random pattern-mixture model as a tool for evaluating whether intermittently missing data are possibly nonignorable.

In addition to fitting a pattern-mixture model with either a fixed or random pattern effect, a shared parameter model was applied in which the timing of dropout was dependent on the random coefficients of the longitudinal model. A shared parameter model in the context of missing data is a special case of a class of models known as selection models in which missingness is predicted by the observed and the missing values of the measured response of the growth models (Albert & Follman, 2009; Wu & Carroll, 1988). Here, the link between missingness and the illness ratings was made through a variable representing the last week a patient was observed and the random effects of the growth model that represented the illness ratings. Thus, in this nonignorable model, the missingness was specified to be related to both the observed and the missing values of the primary response through the random coefficients of the growth model. In a selection model, the assumption is that the parameters of the longitudinal response and dropout are independent after conditioning each on the random effects of a growth model.

Comparisons were made between the estimated fixed effects of the growth model, and inference about the marginal growth parameters did not differ greatly between models. Under the random pattern-mixture model in particular, it did not matter whether the random pattern effect included patterns of attrition alone or a combination of patterns reflecting attrition and intermittently missing data. Thus, including additional patterns to reflect intermittently missing data did not result in a different conclusion about group-level change in the illness ratings. Inference from the marginal longitudinal model also did not differ if dropout was allowed to depend on the random effects of the growth model. Thus, conditioning on the drug treatment effects, the missingness is arguably ignorable for the marginal aspects of illness ratings. This is consistent with the conclusions about the missingness for this particular data set that were presented in Hedeker and Gibbons (1997). (For examples of a sensitivity analysis that does result in differences between models, please see Molenberghs & Kenward, 2007).

The small collection of models considered here is used as a means to model nonignorable missingness in applications of random-effects models for longitudinal data. The work here relied on Bayesian estimation, instead of maximum likelihood estimation that is more common. Bayesian estimation was used primarily because this approach provides a great deal of flexibility in an analysis, which seemed particularly applicable to the estimation of the pattern-mixture model that assumed a random pattern effect. If the number of patterns of missing data is small, then using Bayesian estimation can permit testing of a model that assumes that the standard deviation of a single random pattern effect follows a half t distribution (Chen et al., 2016). This kind of problem is analogous to fitting three-level models for which the number of random subjects at the highest level is small (Gelman, 2006).

The empirical example presented in this paper was used to illustrate different ways in which one might address nonignorable missingness in a longitudinal data analysis that uses a random-effects model. Naturally, there are variations in the specific models that were tested here, such as those considered for a pattern-mixture model with a fixed pattern effect (see Hedeker & Gibbons, 1997). For example, for the illness ratings that were analysed in this report, the patterns of missing data included those reflective of attrition, as well as those reflective of intermittently missing data. Thus, in the analysis of this data set, models for nonignorable missingness were specified to reflect both patterns. Finally, it is also important to mention that although different models may be considered to represent a missing data process, the fact that an analysis may suggest that the missingness is ignorable does not imply certainty in that conclusion. That is, the models that one uses to represent a nonignorable process may not capture the true underlying process. For this reason, researchers must carefully consider including additional variables that may be correlated with either the missingness or the primary variables of a data model. If these added variables are correlated with the missingness or the variables of the data model, then conditioning on their effects may help to account for the missingness (Little & Rubin, 2002).

This paper focused on some of the major frameworks for analysing longitudinal data that are MNAR. These methods represent different ways in which an analyst can model missingness and its possible impact on inference from the substantive model that is often the primary interest. An important shortcoming from the application of any one framework in which an analyst then chooses to model missingness by a single model is that inference from the substantive model is done under the assumption that the model for missingness generated the missing data. Obviously, this strategy ignores the uncertainty about the true source of the missing data, in part from only having observed data available for analysis, but also from considering only a single model to represent the missingness. It is therefore recommended that data be analysed under multiple models of plausible mechanisms of missingness, as was done in this paper, with an understanding of the likely possibility that no one model of those considered accurately captures the missing data process.

One major strategy for handling missing data that was not considered here is multiple imputation (MI). The central idea of MI is to replace missing values in a data set with a set of multiple plausible values from which inference is drawn about the parameters of the marginal model. The process by which data are imputed is not necessarily dependent on the specification of a missingness process, although some approaches to using MI have included aspects of a pattern-mixture model to define the imputation model (e.g., Little & Yau, 1996; Thijs, Molenberghs, Verbeke, & Curran, 2002). A benefit of MI methods is that they can ease the constraints in how the mechanism for missingness is represented in the imputation model, and importantly, permit the inclusion of variables that predict missingness in the imputation process. This is helpful in situations in which there is no interest in including these particular variables as covariates in a longitudinal model. That is, these auxiliary variables can provide valuable information about the missing data during the imputation process but will not interfere with the goals of modelling the longitudinal outcome.

MI is indeed a Bayesian approach to missing data (Schafer, 1997). MI methods have the desirable aspect of being able to include many auxiliary variables to address sources of nonignorable missingness and are naturally not dependent on model specifications regarding particular missing data processes that are inherent to the frameworks considered in this paper. That said, MI also involves uncertainty in the imputation model itself, and methods have been designed to address this (Hinne et al., 2020; Kaplan & Yavuz, 2020) that might also be applied in accounting for nonignorable missingness in the context of longitudinal data.

Supplementary Information The online version contains supplementary material available at https://doi.org/10.3758/s13428-023-02128-y.

Data availability A dataset and scripts for analyses presented in the study are included as Supplementary Materials.

Declarations

Conflicts of interest The author has no relevant financial or non-financial interests to disclose.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Albert, P. S., & Follman, D. A. (2009). Shared-parameter models. In: Fitzmaurice, Verbeke & Molenberghs (ed). Longitudinal data analysis. Chapman & Hall / CRC Press. pp. 433–452.
Chen, F. (2009). Bayesian modeling using the MCMC procedure. SAS Institute Inc.
Chen, F., Brown, G., & Stokes, M. (2016). Fitting your favorite mixed models with PROC MCMC. SAS Institute Inc.
Daniels, M. J., & Hogan, J. W. (2008). Missing data in longitudinal studies: Strategies for Bayesian modelling and sensitivity analysis. Chapman & Hall.
Fitzmaurice, G. M., Davidian, M., Verbeke, G., & Molenberghs, M. (2008). Longitudinal data analysis. Chapman and Hall.
Fitzmaurice, G., Laird, N., & Ware, J. (2011). Applied longitudinal analysis (2nd ed.). John Wiley & Sons, Inc.
Flegal, J. M., Hughes, J., Vats, D., Dai, N., Gupta, K., & Maji, U. (2021). Mcmcse: Monte Carlo standard errors for MCMC. R package version 1.5-0.
Gelfand, A. E., Hills, S. E., Racine-Poon, A., & Smith, A. F. M. (1990). Illustration of Bayesian inference in normal data models using Gibbs sampling. Journal of the American Statistical Association, 85(412), 972–985. https://doi.org/10.2307/2289594
Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Analysis, 1(3), 515–534. https://doi.org/10.1214/06-BA117A
Graham, J. W., Taylor, B. J., Olchowski, A. E., & Cumsille, P. E. (2006). Planned missing data designs in psychological research. Psychological Methods, 11(4), 323–343. https://doi.org/10.1037/1082-989X.11.4.323
Guo, W., Ratcliffe, S. J., & Ten Have, T. T. (2004). A random pattern-mixture model for longitudinal data with dropouts. Journal of the American Statistical Association, 99(468), 929–937. https://doi.org/10.1198/016214504000000674
Heckman, J. J. (1976). The common structure of statistical models of truncation, sample selection, and limited dependent variables and a simple estimator for such models. Annals of Economic and Social Measurement, 5(4), 475–492.
Hedeker, D., & Gibbons, R. D. (1997). Application of random-effects pattern-mixture models for missing data in longitudinal studies. Psychological Methods, 2(1), 64–78. https://psycnet.apa.org/buy/1997-07778-004
Hinne, M., Gronau, Q. F., van den Bergh, D., & Wagenmakers, E.-J. (2020). A conceptual introduction to Bayesian model averaging. Advances in Methods and Practices in Psychological Science, 3(2), 200–215.
Hogan, J. W., & Laird, N. M. (1997). Mixture models for the joint distribution of repeated measures and event times. Statistics in Medicine, 16(3), 239–258. https://doi.org/10.1002/(SICI)1097-0258(19970215)
Kaplan, D., & Yavuz, S. (2020). An approach to addressing multiple imputation model uncertainty using Bayesian model averaging. Multivariate Behavioral Research, 55(4), 553–567. https://doi.org/10.1080/00273171.2019.1657790
Laird, N. M. (1988). Missing data in longitudinal studies. Statistics in Medicine, 7(1-2), 305–315. https://doi.org/10.1002/sim.4780070131
Little, R. J. A. (1993). Pattern-mixture models for multivariate incomplete data. Journal of the American Statistical Association, 88(421), 125–134. https://doi.org/10.1080/01621459.1993.
National Institute of Mental Health Psychopharmacology Research Branch Collaborative Study Group. (1967). Differences in clinical effects in three phenothiazines in acute schizophrenia. Diseases of the Nervous System, 28, 369–383.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3), 581–592. https://doi.org/10.2307/2335739
Schafer, J. L. (1997). Analysis of incomplete multivariate data. Chapman and Hall/CRC.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & Van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society, Series B, 64, 583–616.
Thijs, H., Molenberghs, G., Michiels, B., Verbeke, G., & Curran, D. (2002). Strategies to fit pattern-mixture models. Biostatistics, 3(2), 245–265. https://doi.org/10.1093/biostatistics/3.2.245
Wu, M. C., & Bailey, K. (1988). Analyzing changes in the presence of informative right censoring caused by death and withdrawal. Statistics in Medicine, 7(1-2), 337–346. https://doi.org/10.1002/sim.4780070134
Wu, M. C., & Bailey, K. R. (1989). Estimation and comparison of changes in the presence of informative right censoring: Condi-
10594 302 tional linear model. Biometrics, 45(3), 939–955. https:// doi. org/ Little, R. J. A. (1994). A class of pattern-mixture models for normal 10. 2307/ 25316 94 incomplete data. Biometrika, 81(3), 471–483. Wu, M. C., & Carroll, R. J. (1988). Estimation and comparison of Little, R. J. A. (1995). Modeling the drop-out mechanism in longitudinal changes in the presence of informative right censoring by mod- studies. Journal of the American Statistical Association, 90(431), eling the censoring process. Biometrics, 44(1), 175–188. https:// 1112–1121. https:// doi. org/ 10. 1080/ 01621 459. 1995. 10476 615doi. org/ 10. 2307/ 25319 05 Little, R., & Yau, L. (1996). Intent-to-treat analysis for longitudinal studies with drop-outs. Biometrics, 52(4), 1324–1333. https://doi. Open Practices Statement: The data set used in the examples is org/ 10. 2307/ 25328 47 available in Fitzmaurice et al. (2011) and Hedeker and Gibbons (1997) Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing and as a supplemental file. Scripts to analyse the empirical data and to nd data (2 ed). : John Wiley & Sons, Inc. generate and analyse the simulated data are available as supplemental Lorr, M., & Klett, C. J. (1966). Inpatient multidimensional psychiatric files. scale: Manual. Consulting Psychologists Press. Molenberghs, G., & Kenward, M. G. (2007). Missing data in clinical Publisher’s note Springer Nature remains neutral with regard to studies. : John Wiley & Sons, Ltd. jurisdictional claims in published maps and institutional affiliations. National Institute of Mental Health Psychopharmacology Service Center Collaborative Study Group. (1964). Phenothiazine treatment of acute schizophrenia: Effectiveness. Archives of General Psychiatry, 10, 246–261 https:// pubmed. ncbi. nlm. nih. gov/ 14089 354/ 1 3
Behavior Research Methods – Springer Journals
Published: May 23, 2023
Keywords: Nonignorable missingness; Nonlinear mixed-effects models; Three-level hierarchical models