The World Health Organization Composite International Diagnostic Interview short‐form (CIDI‐SF)

RONALD C. KESSLER, Department of Health Care Policy, Harvard Medical School, Boston, MA, USA
GAVIN ANDREWS, Clinical Research Unit for Anxiety Disorders, University of NSW at St Vincent’s Hospital, Sydney, Australia
DANIEL MROCZEK, Department of Psychology, Fordham University, Bronx, NY, USA
BEDIRHAN USTUN, Division of Epidemiology, Classification and Assessment, World Health Organization, Geneva, Switzerland
HANS-ULRICH WITTCHEN, Max Planck Institute of Psychiatry, Munich, Germany

ABSTRACT

Data are reported on a series of short-form (SF) screening scales of DSM-III-R psychiatric disorders developed from the World Health Organization’s Composite International Diagnostic Interview (CIDI). A multi-step procedure was used to generate CIDI-SF screening scales for each of eight DSM disorders from the US National Comorbidity Survey (NCS). This procedure began with the subsample of respondents who endorsed the CIDI diagnostic stem question for a given disorder and then used a series of stepwise regression analyses to select a subset of screening questions to maximize reproduction of the full CIDI diagnosis. A small number of screening questions, between three and eight for each disorder, was found to account for the significant associations between symptom ratings and CIDI diagnoses. Summary scales made up of these symptom questions correctly classify between 77% and 100% of CIDI cases and between 94% and 99% of CIDI non-cases in the NCS depending on the diagnosis. Overall classification accuracy ranged from a low of 93% for major depressive episode to a high of over 99% for generalized anxiety disorder. Pilot testing in a nationally representative telephone survey found that the full set of CIDI-SF scales can be administered in an average of seven minutes compared to over an hour for the full CIDI.
The results are quite encouraging in suggesting that diagnostic classifications made in the full CIDI can be reproduced with excellent accuracy with the CIDI-SF scales. Independent verification of this reproduction accuracy, however, is needed in a data set other than the one in which the CIDI-SF was developed.

Key words: Composite International Diagnostic Interview (CIDI), psychiatric diagnosis, diagnostic interviews, psychometrics, screening scales

This report introduces the World Health Organization (WHO) Composite International Diagnostic Interview (CIDI) Short-Form (CIDI-SF), a fully structured set of scales developed from the larger CIDI (WHO, 1990). The goal of the CIDI-SF is to provide a quick screen (average administration time of seven minutes) for the most commonly occurring psychiatric disorders assessed in the CIDI. The CIDI-SF was originally developed for use in the redesigned US National Health Interview Survey (NHIS; Adams and Morano, 1995), a large annual survey carried out in the United States under the auspices of the US government’s National Center for Health Statistics (NCHS). Prior to a major redesign scheduled for implementation before the year 2000, the NHIS made no consistent effort to assess psychiatric disorders. However, such an assessment was planned for inclusion in the redesigned NHIS if brief and accurate diagnosis-specific screening measures could be developed. The CIDI-SF, which was developed with support from the NCHS as part of the NHIS redesign effort, meets these requirements.

The original version of the CIDI-SF was designed to screen for DSM-III-R (APA, 1987) disorders. A subsequent revision of the CIDI-SF was made to generate DSM-IV (APA, 1994) diagnoses. This revision was carried out by Andrews and Kessler and consisted of adding items to implement the new DSM-IV requirement that symptoms lead to either ‘clinically significant distress or impairment in social, occupational, or other important areas of functioning.’ In addition, a screening scale for obsessive-compulsive disorder based on pilot data collected in Australia was added to the eight scales previously developed in the NCS. This modified CIDI-SF is scheduled for use in the conditions module of the redesigned NHIS once it is fielded. Further modifications to generate ICD-10 (WHO, 1992) diagnoses are currently under way for use in other surveys.

In this report we describe the data analysis strategies used to develop the CIDI-SF, we present data on the relationship between the CIDI-SF and the full CIDI, and we discuss special data-analysis issues involved in coding the CIDI-SF in continuous versus categorical form. More detailed appendix tables and a complete copy of the CIDI-SF are available on the International Consortium of Psychiatric Epidemiology (ICPE) home page, which can be accessed over the Internet through the URL http://www.hcp.med.harvard.edu/icpe.

Method

Sample

The initial CIDI-SF development was carried out in the US National Comorbidity Survey (NCS) (Kessler et al., 1994). The NCS was a nationally representative household survey administered in 1990–1992 to a sample of 8098 household respondents in the age range 15–54 in the coterminous United States. Interviews were conducted face-to-face in the homes of respondents and were administered in two parts. Only the Part I data were used in the CIDI-SF development. Part I was a modified version of CIDI 1.0 (WHO, 1990), which took an average of 65 minutes to complete. The response rate was 82.6%. The Part I data were weighted to adjust for differential non-response and variation in within-household probabilities of selection and post-stratified to match the cross-classification of a wide range of sociodemographic variables obtained in the NHIS (Kessler et al., 1995). All the NCS results reported here are based on these weighted data.

The Composite International Diagnostic Interview (CIDI)

The CIDI is a fully structured diagnostic interview designed to be used by trained interviewers who are not clinicians. Although the CIDI is capable of generating diagnoses according to the definitions and criteria of both the DSM and ICD systems, our initial work focused on DSM-III-R diagnoses. We attempted to develop short-form scales for eight CIDI syndromes: major depressive episode (MDE), generalized anxiety disorder (GAD), simple phobia (SiP), social phobia (SoP), agoraphobia with or without panic (AG), panic attacks (PA), alcohol dependence (AD), and drug dependence (DD). Both WHO Field Trials of the CIDI (Wittchen, 1994) and NCS clinical reappraisal studies (Kessler et al., 1998) documented acceptable validity of these CIDI diagnoses in comparison to blind clinical reinterviews. We experimented with the development of short-form scales for some of the other syndromes assessed in the NCS but decided not to produce final versions of such scales because of low validity of these syndromes in the NCS clinical reappraisal study. The latter include dysthymia (Kessler et al., 1998), bipolar disorder (Kessler et al., 1997), and non-affective psychosis (Kendler et al., 1996). In addition, we excluded other syndromes from CIDI-SF development because the full assessments in the CIDI are already quite brief. The two diagnoses of this sort that were assessed in the NCS are alcohol abuse and drug abuse. The CIDI uses only two questions to assess for each of these disorders, so the development of a short form was not necessary.

The eight CIDI-SF scales were defined without DSM diagnostic hierarchy rules. This was because some of the disorders higher in the hierarchy (such as bipolar disorder and non-affective psychosis for MDE and GAD) were not assessed in the CIDI-SF and because the logistical complexities of assessing other hierarchy relationships (such as GAD only within episodes of MDE or PA only occurring on exposure to phobic stimuli) were beyond the scope of what could be accomplished in the NHIS. Finally, the decision was made to assess panic attacks rather than panic disorder in the CIDI-SF because the designation of panic attacks as part of a panic disorder requires a determination of the temporal clustering of these attacks in a single month (four or more attacks within a single month or a month of persistent worry about having another attack) at any time in the respondent’s life. This requirement for lifetime assessment went beyond the time frame that could be considered in the NHIS.

Analysis procedures

Most CIDI diagnostic sections have a stem-branch structure. These sections begin by asking one or two initial questions to determine whether the master symptoms of the syndrome have occurred. For example, the stem question for panic asks whether the respondent ever had ‘a spell or attack when all of a sudden you felt frightened, anxious or very uneasy in situations when most people would not be afraid or anxious’. Failure to endorse the stem question leads to an automatic classification of not having the syndrome and a skip to the next section without being asked additional questions about the syndrome. Endorsement of the stem question, by comparison, leads to additional questions (the branch questions) about associated symptoms that might have clustered with the symptom assessed in the stem question into a clinically significant syndrome. Staying with the panic example, respondents who endorse the stem question are asked several follow-up questions designed to elicit an example of the situations in which they have attacks in order to make sure the attacks really occur in situations where most people would not be afraid.
Respondents are then asked 18 questions about symptoms of autonomic arousal that occur in panic attacks. These questions are followed by questions about whether the arousal symptoms usually begin rapidly on exposure to the phobic stimulus. Responses to all these questions are used to decide whether the respondent meets criteria for having a panic attack. Additional questions are then asked to determine whether the attacks cluster into a panic disorder and, if so, questions are asked about age of onset, recency, course, and overlap with other diagnoses for purposes of making hierarchy exclusions.

As it turns out, only a minority of the respondents who endorse a given diagnostic stem question in general population surveys go on to meet the full criteria for the syndrome. In the case of panic in the NCS, for example, 27.9% of respondents endorsed the stem question and 26.0% of those endorsers (7.3% of the total sample) were subsequently classified as ever having a true panic attack.

It is noteworthy that one very quick way of screening for psychiatric disorders with a subset of CIDI questions is to administer only the diagnostic stem questions and to assign each respondent a score equivalent to the probability of caseness based on a benchmark survey like the NCS. This set of questions could be administered in less than three minutes. The resulting weighted set of dichotomous variables would not have nearly as much precision as more thorough assessments. However, it could be very useful for certain purposes. For example, if such variables were available as part of a large prospective epidemiological study of risk factors for cardiovascular disease, they would provide enough information to evaluate whether a more detailed psychiatric assessment would be useful. This kind of rough-and-ready evaluation could be extremely useful in calling attention to the importance of psychiatric disorders as risk factors.
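As a quick check on the panic figures quoted above, the total-sample proportion follows from the stem endorsement rate and the conditional probability among endorsers, because the skip rule classifies every non-endorser as a non-case. The symbols S (stem endorsed) and D (full criteria met) are introduced here only for illustration and are not the authors' notation.

```latex
\[
P(D) \;=\; P(S)\,P(D \mid S) \;+\; P(\bar{S})\,\underbrace{P(D \mid \bar{S})}_{=\,0}
     \;=\; 0.279 \times 0.260 \;\approx\; 0.073
\]
```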
An administration time of only three minutes might also be short enough to convince researchers who are interested in other topics to take a provisional look at psychiatric disorders as possibly important risk factors.

Preliminary analysis of the NCS data showed that the predicted probabilities of caseness among respondents who endorsed diagnostic stem questions differ significantly as a function of sociodemographic and symptom characteristics. This means that the inclusion of additional information about these predictors could improve classification of predicted probabilities of caseness over use of the stem questions alone. The CIDI-SF development process built on this insight by carrying out empirical analyses of the predictors of caseness in the subsample of NCS respondents who endorsed diagnostic stem questions. We attempted to select an optimal set of questions to maximize caseness classification accuracy. Both sociodemographic variables (such as age, gender, marital status) and symptom (branch) questions were used in these analyses. However, we concentrated on symptom questions in developing the short-form measures as exploratory analysis showed that the significant predictors of caseness were almost always symptom questions rather than sociodemographics.

Selection of short-form items proceeded in four steps. First, a series of ordinary least squares (OLS) stepwise linear regression models was estimated for each CIDI/DSM-III-R diagnosis in the subsample of respondents who endorsed a stem question for that diagnosis. All symptom questions for the diagnosis were included as potential predictors. The use of OLS was appropriate despite the fact that the outcomes were dichotomous because the probabilities of caseness were all in the 20–80% range in the subsamples of stem endorsers. Plots of increments in explained variance were inspected to select the number of predictors that yielded optimal information. A parallel set of recursive partitioning models (Breiman et al., 1984) was estimated to search for significant non-additivities in the data. However, the additive specification was always found to yield better results in terms of cross-validated prediction accuracy. So we focused on additive models in further steps of model building.

Second, once the optimal number of predictors was selected, a series of all possible subsets regression models was estimated from the full set of predictors included in the stepwise procedure. All possible subsets regression is a procedure for generating a number of roughly equivalent models of pre-specified complexity from a larger set of predictors. To take a hypothetical example, let’s say we began with 20 predictors and discovered through stepwise analysis that no more than six of these were needed to capture the significant additive effects on the outcome. There are 38,760 different subsets of six items among 20 predictors. Many of these subsets will explain nearly as much variance in the outcome as the optimal subset. In order to investigate this for the CIDI symptom questions, we investigated the top 10 prediction equations, in terms of explained variance, for each syndrome. In each case we found that all 10 sets had coefficients of multiple determination that differed only trivially (i.e. in the fourth decimal place).

Third, in order to select among these functionally equivalent subsets, we carried out a series of more complex analyses of each one. These included the use of constrained regression analysis to test for the equivalence of slopes across items and comparisons of linear versus non-linear specifications of the functional form between symptom counts and the outcome. The best-fitting models in virtually all cases were additive with equal slopes. Therefore, the final decision about functional form depended on whether a linear or non-linear specification was used. In the linear specification, the only predictor variable was a count of the number of symptoms endorsed. In a non-linear specification there were separate dummy coded predictor variables (i.e. coded 0 or 1) to distinguish respondents who endorsed exactly one symptom, exactly two symptoms, and so forth, through all the symptoms. An F-test of the significance of the explained variance increment in the non-linear model versus the linear model was used to decide which of the two models was a better fit. In all cases the non-linear specification was found to be superior to the linear specification.

Comparison of coefficients of multiple determination across best-fitting non-linear models for each of the equivalent subsets of predictors for a given outcome failed to find any one set of predictors that clearly stood out as superior to the others for any of the eight syndromes. Therefore, a fourth step was implemented to make a final decision among the alternatives. This step began with the use of receiver operating characteristic (Kraemer, 1992) analysis to select the optimal dichotomous caseness cutpoint for the scales generated by each of the models assuming equal importance of false positives and false negatives. Subgroup differences were then examined in the associations between these short-form dichotomies and the full CIDI caseness measures. Subgroups were defined on the basis of age (15–34 versus 35–54), gender (male, female), race/ethnicity (non-Hispanic white, non-Hispanic black, Hispanic), years of education (0–12, 13+), region of the country (northeast, midwest, south, west) and urbanicity (major metropolitan area, nonmetropolitan urbanized area, rural area). Our goal was to determine whether one subset of items yielded a screening scale with more consistently high sensitivity or specificity in relation to the full CIDI than others across the full range of subgroups. A fairly clear basis for deciding on one final set of short-form questions for each of the eight syndromes was obtained from this step in the evaluation.

Results

The relationships between stem questions and lifetime diagnoses

Table 1 presents the CIDI stem questions for each of the eight syndromes assessed in the CIDI-SF. Table 2 presents data on the proportions of NCS respondents who endorsed these stem questions along with conditional probabilities of meeting full DSM-III-R criteria for each of the CIDI syndromes among stem endorsers. The syndromes are presented in rank order of stem question endorsement probabilities, from lowest to highest.

Table 1: CIDI 1.0 stem questions for CIDI-SF syndromes

Syndrome: Generalized anxiety disorder (GAD)
Stem questions: Have you ever had a period of a month or more when most of the time you felt worried and anxious? (IF YES) What is the longest period you’ve had of feeling worried and anxious?

Syndrome: Agoraphobia (AG)
Stem questions: Some people have such an unreasonably strong fear of being in a crowd, leaving home alone, traveling on buses, cars and trains, or crossing a bridge, that they always get very upset in such a situation or avoid it altogether. Did you ever go through a period when being in such a situation always frightened you badly?

Syndrome: Panic attack (PA)
Stem questions: Have you ever had a spell or attack when all of a sudden you felt frightened, anxious or very uneasy in situations when most people would not be afraid or anxious?

Syndrome: Drug dependence (DD)
Stem questions: Have you used any of these medicines in Part A (‘Part A’ refers to a CARD PRESENTED TO THE RESPONDENT FOR VISUAL INSPECTION THAT CONTAINS A ‘PART A’ LIST OF PRESCRIPTION DRUGS THAT CAN BE ABUSED AND A ‘PART B’ LIST OF ILLICIT DRUGS) on your own more than five times when they were not prescribed for you, either to relax, feel better, to feel high, or feel more active or alert? Now I’d like to ask you about your experiences with other drugs. Look at the drugs in Part B on the card. Have you ever used any of those more than five times? Have you ever taken any other drug more than five times on your own either to get high, to relax, or to make you feel better, more active or alert?

Syndrome: Social phobia (SoP)
Stem questions: Some people have such a strong unreasonable fear of doing things in front of others, like speaking in public, that they avoid it or feel extremely uncomfortable or uneasy about doing them. Have you ever had such a strong, unreasonable fear of: (i) speaking in public? (ii) having to use a toilet when away from home? (iii) eating or drinking in public? (iv) talking to people because you might have nothing to say or might sound foolish? (v) writing while someone watches? (vi) talking in front of a small group of people?

Syndrome: Simple phobia (SiP)
Stem questions: There are other things that make some people so unreasonably afraid that they try to avoid them. Have you ever had a strong unreasonable fear of: (i) heights? (ii) flying? (iii) seeing blood? (iv) storms, thunder or lightning? (v) snakes, birds, rats, insects or other animals? (vi) closed spaces? (vii) getting a (shot/injection) or going to the dentist? (viii) being in water, like a swimming pool or lake?

Syndrome: Major depressive episode (MDE)
Stem questions: In your lifetime, have you ever had two weeks or more when nearly every day you felt sad, blue, depressed? Have there ever been two weeks or longer when you lost interest in most things like work or hobbies or things you usually like to do for fun?

Syndrome: Alcohol dependence (AD)
Stem questions: Now I am going to ask some questions about your use of alcoholic beverages like (BEVERAGES POPULAR LOCALLY – BEER, WINE, OR LIQUOR). In your entire lifetime, have you ever had at least 12 drinks of any kind of alcoholic beverage? (IF NO, PROBE WITH THE FOLLOWING QUESTION) Not even if you count having wine with meals, or beer at a sports event, or champagne at a wedding? (IF YES TO EITHER OF FIRST TWO QUESTIONS) In the past 12 months, did you have at least 12 drinks of any kind of alcoholic beverage? (IF NO) At any one year period of your entire life, did you have at least 12 drinks of any kind of alcoholic beverage?

Table 2: Prevalences of stem question endorsement and conditional probabilities of meeting full CIDI/DSM-III-R diagnostic criteria for the syndromes among stem question endorsers in the NCS

                                      Prevalence of        Conditional probability
                                      stem question        of meeting full criteria
Syndrome                              %      (se)          P       (se)
Generalized anxiety disorder (GAD)    13.9   (0.4)         0.370   (1.4)
Agoraphobia (AG)                      18.4   (0.4)         0.363   (1.2)
Panic attack (PA)                     27.9   (0.5)         0.260   (0.9)
Drug dependence (DD)                  32.0   (0.5)         0.235   (0.8)
Social phobia (SoP)                   38.7   (0.5)         0.345   (0.8)
Simple phobia (SiP)                   51.6   (0.6)         0.218   (0.6)
Major depressive episode (MDE)        56.0   (0.6)         0.305   (0.7)
Alcohol dependence (AD)               71.3   (0.5)         0.198   (0.5)

As shown in the first column of Table 2, there is enormous variation in the proportion of NCS respondents who endorsed stem questions for the different disorders, from a low of 13.9% for GAD to a high of 71.3% for AD. The range of conditional probabilities of meeting full lifetime diagnostic criteria among stem endorsers is much narrower, from a low of 0.198 for AD to a high of 0.370 for GAD.

As noted above, one way of screening for these disorders would be to administer only the stem questions and assign each endorser the appropriate predicted probability of caseness. If the associations between the stems and caseness in a new study were the same as those in the NCS, this very brief screening method would yield fairly accurate total sample prevalence estimates.
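The stem-only screening idea can be sketched in a few lines. This is a minimal illustration rather than the authors' procedure: the figures are transcribed from Table 2, while the function names `stem_only_score` and `implied_prevalence` are hypothetical, and the sketch assumes a new sample behaves exactly like the NCS benchmark.

```python
# Figures transcribed from Table 2 of the text:
# syndrome -> (stem endorsement in percent, P(full criteria | stem endorsed)).
TABLE_2 = {
    "GAD": (13.9, 0.370),
    "AG": (18.4, 0.363),
    "PA": (27.9, 0.260),
    "DD": (32.0, 0.235),
    "SoP": (38.7, 0.345),
    "SiP": (51.6, 0.218),
    "MDE": (56.0, 0.305),
    "AD": (71.3, 0.198),
}


def stem_only_score(endorsed_stem: bool, p_case_given_stem: float) -> float:
    """Hypothetical helper: benchmark probability for endorsers, zero otherwise."""
    return p_case_given_stem if endorsed_stem else 0.0


def implied_prevalence(stem_pct: float, p_case_given_stem: float) -> float:
    """Total-sample prevalence (percent) implied by stem-only scoring."""
    return stem_pct * p_case_given_stem


# Implied prevalence for every syndrome, rounded to one decimal place.
implied = {name: round(implied_prevalence(*row), 1) for name, row in TABLE_2.items()}

if __name__ == "__main__":
    for name, pct in implied.items():
        print(f"{name}: {pct}%")
```

For panic attack the product reproduces the 7.3% of the total sample quoted earlier in the text. The same arithmetic gives a prevalence estimate for any new survey only to the extent that its stem-to-caseness associations match the benchmark.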
Alternatively, a new survey could administer the stems to all respondents and then administer the full CIDI only to a probability subsample of endorsers to generate study-specific conditional probabilities of CIDI caseness.

Predicting caseness among endorsers of the stem questions

The first column in Table 3 reports the number of CIDI symptom questions for each CIDI-SF syndrome. These were the numbers of predictors included in the stepwise regression models to improve individual-level prediction accuracy of CIDI caseness among stem question endorsers. The remainder of the table shows the number of these predictors found to add significantly to explained variance and the percentage of variance in the full CIDI caseness measures explained by this set of predictors in multiple regression analysis.

As shown in Table 3, the number of predictors that added significantly to the stepwise prediction equations was much smaller in every case than the total number of predictors considered. This demonstrates that there is empirical redundancy among the CIDI symptom questions, allowing some reduction in administration time by relying on psychometric analysis to select the most informative subsets of items. The percentages of explained variance are all substantial, from a low of 44.4% for MDE to a high of 99.3% for AG. It is important to remember that these are variance components in the subsamples of stem question endorsers. The respondents who did not endorse stem questions, of course, are always classified as non-cases and this is always perfectly consistent with the classification in the full CIDI.

Individual-level prediction accuracy

A more concrete sense of the overall predictive accuracy of the CIDI-SF in comparison to the CIDI is presented in Table 4, where we show the sensitivity, specificity, positive predictive value, negative predictive value, and percentage classification accuracy for the CIDI-SF scales at their optimal dichotomous caseness cutpoints.
An optimal cutpoint is defined here as the cutpoint on the symptom count scale that yields the highest percentage classification accuracy on the full CIDI. As shown in the table, very high percentages of the CIDI cases are correctly classified in the short-forms, with a range of correct classification (sensitivity) between 77.0% for DD and 100% for AG. The percentages of CIDI non-cases that are correctly classified in the short-forms (specificity) are also consistently quite high, with a range between 93.9% for MDE and 99.9% for DD. The percentages of CIDI-SF cases that are confirmed in the full CIDI (positive predictive value) range between 75.7% for MDE and 99.6% for AG. The percentages of CIDI-SF non-cases that are confirmed (negative predictive value) range between 86.9% for MDE and 100% for AG. The percentages of overall classification accuracy are excellent for all syndromes, ranging from 93.2% for MDE to 99.6% for GAD.

Table 3: Numbers of symptom questions used in stepwise regressions, numbers of significant predictors in optimal prediction equations, and explained variances in the CIDI caseness measures

                                      Number of symptom    Number of significant    Explained
Syndrome                              questions used       predictors               variance (%)
Generalized anxiety disorder (GAD)    23                   7                        66.4
Agoraphobia (AG)                      13                   2                        99.3
Panic attack (PA)                     18                   6                        69.9
Drug dependence (DD)                  18                   6                        78.3
Social phobia (SoP)                   9                    4                        84.2
Simple phobia (SiP)                   9                    3                        61.0
Major depressive episode (MDE)        21                   7                        44.4
Alcohol dependence (AD)               18                   6                        78.7

Table 4: Sensitivity (SENS), specificity (SPEC), positive predictive value (PPV), negative predictive value (NPV), and total classification accuracy (TCA) of CIDI-SF scales compared to CIDI/DSM-III-R diagnoses in the NCS (see note)

Syndrome                              SENS  (se)    SPEC (se)    PPV  (se)    NPV   (se)    TCA  (se)
Generalized anxiety disorder (GAD)    96.6  (0.9)   99.8 (0.1)   96.8 (0.9)   99.8  (0.2)   99.6 (0.1)
Agoraphobia (AG)                      100.0 (–)     99.9 (0.1)   99.6 (0.3)   100.0 (–)     99.9 (0.1)
Panic attack (PA)                     90.0  (1.0)   99.5 (0.1)   96.2 (0.6)   91.6  (0.3)   98.4 (0.1)
Drug dependence (DD)                  77.0  (1.5)   99.9 (0.1)   98.2 (0.5)   98.2  (0.2)   98.2 (0.1)
Social phobia (SoP)                   86.3  (1.0)   98.9 (0.1)   92.4 (0.8)   97.9  (0.2)   97.2 (0.1)
Simple phobia (SiP)                   92.9  (0.8)   96.2 (0.2)   76.4 (1.3)   99.1  (0.1)   95.9 (0.2)
Major depressive episode (MDE)        89.6  (0.8)   93.9 (0.3)   75.7 (1.0)   86.9  (0.4)   93.2 (0.3)
Alcohol dependence (AD)               93.6  (0.8)   96.2 (0.2)   80.2 (1.2)   98.9  (0.1)   95.8 (0.2)

Note: SENS is the proportion of CIDI cases correctly classified in the CIDI-SF; SPEC is the proportion of CIDI non-cases correctly classified in the CIDI-SF; PPV is the proportion of CIDI-SF cases confirmed in the CIDI; NPV is the proportion of CIDI-SF non-cases confirmed in the CIDI; and TCA is the percentage of respondents whose CIDI-SF classification is the same as their classification on the CIDI.

Subgroup variation in prediction accuracy

As noted above in the method section, the CIDI-SF items were selected with an eye towards minimizing subgroup variation in scale performance. Results are presented graphically in Figures 1–7. Each figure shows the association between scores on one of the CIDI-SF scales and probability of full CIDI caseness in the total sample as well as in subsamples of males and females. At each point on the scales we also present bars to indicate the highest and lowest probabilities of caseness across all 16 subgroups considered in the analyses (two for age, two for gender, three for race/ethnicity, two for education, four for region of the country, and three for urbanicity). In each case, respondents who failed to endorse the diagnostic stem question are coded zero on the scale along with respondents who endorsed the stem question but were negative on all symptom questions. No figure is presented for AG, as the classification accuracy was so near to perfect in the total sample that meaningful subsample differences did not exist.
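The accuracy measures defined in the note to Table 4, and the rule of choosing the cutpoint with the highest total classification accuracy, can be sketched as follows. This is an illustrative sketch under stated assumptions: the names `Accuracy`, `accuracy_at_cutpoint` and `best_cutpoint` are hypothetical, the ten-respondent data set is invented for the example, and the calculation is unweighted, whereas the NCS results in the text are based on weighted data.

```python
from dataclasses import dataclass


@dataclass
class Accuracy:
    sens: float  # proportion of full-interview cases screened positive
    spec: float  # proportion of full-interview non-cases screened negative
    ppv: float   # proportion of screen positives confirmed as cases
    npv: float   # proportion of screen negatives confirmed as non-cases
    tca: float   # proportion of respondents classified the same way by both


def _ratio(numerator: int, denominator: int) -> float:
    """Return a proportion, or NaN when the denominator is empty."""
    return numerator / denominator if denominator else float("nan")


def accuracy_at_cutpoint(scores, cases, cutpoint) -> Accuracy:
    """Treat score >= cutpoint as a short-form case and compare with full caseness."""
    tp = fp = tn = fn = 0
    for score, is_case in zip(scores, cases):
        screened_positive = score >= cutpoint
        if screened_positive and is_case:
            tp += 1
        elif screened_positive:
            fp += 1
        elif is_case:
            fn += 1
        else:
            tn += 1
    return Accuracy(
        sens=_ratio(tp, tp + fn),
        spec=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
        tca=_ratio(tp + tn, tp + fp + tn + fn),
    )


def best_cutpoint(scores, cases) -> int:
    """Cutpoint with the highest total classification accuracy (first one on ties)."""
    candidates = range(1, max(scores) + 1)
    return max(candidates, key=lambda c: accuracy_at_cutpoint(scores, cases, c).tca)


# Invented toy data: short-form symptom counts and full-interview caseness.
toy_scores = [0, 0, 0, 1, 1, 2, 2, 2, 3, 3]
toy_cases = [False, False, False, False, False, False, True, True, True, True]

chosen = best_cutpoint(toy_scores, toy_cases)
at_chosen = accuracy_at_cutpoint(toy_scores, toy_cases, chosen)
```

Maximizing total classification accuracy treats a false positive and a false negative as equally costly, which matches the equal-importance assumption stated in the method section. A different weighting of the two error types would generally move the chosen cutpoint.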
The figures show that there is a consistently strong monotonic relationship between scores on the CIDI-SF scales and the full CIDI for all syndromes. The general shape of the total sample curves can be seen as well in subsamples. There is no evidence of systematic statistically significant subgroup differences of a sort that would be indicated by a consistent tendency for one subgroup to have higher or lower conditional probabilities of caseness than others at several points along any of the scales. Furthermore, the highest subgroup conditional probabilities at a given scale score are, for the most part, lower than the lowest subgroup conditional probabilities at the next higher score.

There are only two exceptions to this generally positive picture. The first is a low conditional probability of CIDI caseness for SoP among respondents with high education who scored 2 on the CIDI-SF scale for this disorder. This is a true outlier in the sense that the low conditional probability is confined to this one subgroup and is considerably below the conditional probability among respondents in this same subsample who scored 1 on the short-form scale. The second exception is a pattern of widely varying conditional probabilities of CIDI caseness for GAD at intermediate scores (in the range 2–5) on the CIDI-SF scale for this disorder. This instability is due to the small number of respondents out of the 8098 in the NCS who scored 2 (n = 19), 3 (n = 24), or 4 (n = 70) on this CIDI-SF scale.

It is noteworthy that there is an inflection point at the high end of most CIDI-SF scales after which the curve becomes fairly flat. This indicates that the probability of true CIDI caseness above the inflection point is not strongly related to short-form scores. A pattern of this sort might be interpreted as a weakness of the short-forms in failing to discriminate at the top end. However, it is worth noting that the conditional probabilities of CIDI caseness are all quite high in the range of these scales above the inflection point. This means that the inflection point is more accurately interpreted as a ceiling effect than as a sensitivity failure at the high end of the scales.

Timing estimates

The length of the screening battery is a critical issue in epidemiological surveys. This is especially so in large general-purpose surveys such as the US National Health Interview Survey, where demands for space are made by many different researchers who have many different substantive interests. Because of the large NHIS sample size (over 50,000 respondents each year), the fact that the survey is carried out face-to-face rather than over the telephone, and the wide geographic dispersion of the sample (over 3 million square miles), the average total administration cost of NHIS questions is about $200,000 per minute per year. Seconds count in a situation of this sort. In order to obtain timing estimates for the CIDI-SF in a general population survey, the full set of eight scales was included in a nationally representative pilot for a survey on mid-life development carried out with

Abbreviations in Figures 1–7: NE = Northeast, MW = Midwest, So = South, WE = West, ME = Metro, UR = Urban, RU = Rural, WT = White, BL = Black, HI = Hispanic, YO = Young, Old = Old, HEd = Higher Education, LEd = Lower Education.

Figure 1: The association between scores on the CIDI-SF Generalized Anxiety Disorder Scale and probability of caseness in the full CIDI
Figure 2: The association between scores on the CIDI-SF Panic Scale and probability of caseness in the full CIDI
Figure 3: The association between scores on the CIDI-SF Drug Dependence Scale and probability of caseness in the full CIDI
Figure 4: The association between scores on the CIDI-SF Social Phobia Scale and probability of caseness in the full CIDI
Figure 5: The association between scores on the CIDI-SF Simple Phobia Scale and probability of caseness in the full CIDI
Figure 6: The association between scores on the CIDI-SF Major Depressive Episode Scale and probability of caseness in the full CIDI
Figure 7: The association between scores on the CIDI-SF Alcohol Dependence Scale and probability of caseness in the full CIDI

surveys carried out in a number of different countries using DSM-III, but also DSM-IV and ICD-10, criteria. In addition, their work will expand on the NCS analyses to include the full range of syndromes included in the CIDI, a number of which were not in the NCS. In the case of syndromes where the validity of the CIDI has been called into question, the expansion of the CIDI-SF will be carried out in conjunction with a broader effort on the part of the WHO CIDI Advisory Committee to improve the evaluation of this syndrome in the CIDI. A good case in point is non-affective psychosis (NAP), which is not evaluated with acceptable validity in the CIDI (Kendler et al., 1996). Professor Charles Pull is in charge of an effort by the WHO CIDI Advisory Committee to improve evaluation of NAP in the CIDI. Our efforts to develop a CIDI-SF assessment of NAP will be done in collaboration with Professor Pull and will build on his work to modify the assessment of this class of disorders in the full CIDI.

Another problem with the current version of the short-form scales is the mismatch between the lifetime time frame of the CIDI questions in the NCS and the 12-month time frame in the CIDI-SF. As noted in the introduction, this was dictated by the fact that the initial motivation and funding for short-form scale development required a 12-month version for use in the US National Health Interview Survey.
As part of the expanded development work to be carried out by the ICPE, this mismatch will be resolved. A lifetime version of the CIDI-SF will be developed based on analysis of a number of surveys that used the lifetime version of the CIDI. And a separate 12-month version of the CIDI-SF will be developed based on parallel analysis of surveys that used the recently developed 12-month version of the CIDI. A final limitation is the lack of validity data for the CIDI-SF. It is true that validity data exist for the full CIDI (Wittchen, 1994; Kessler et al., 1998) and that we have presented data in this report suggesting that there is a strong relationship between diagnoses based on the CIDI-SF and the full CIDI. However, it is important to recognize that the data reported in the tables of this paper represent part–whole associations between sets of responses to symptom questions collected in a single survey. Although the reduced sets of symptom questions are those used in the CIDI-SF, responses to these questions might well be different when they are presented as a separate short instrument rather than when they are embedded in a much longer support from the Network on Successful Mid-life Development of the John D. and Catherine T. MacArthur Foundation. Computer assisted interviewing was used and an internal clock employed to calculate the time between keystrokes in order to determine the average length of each CIDI-SF section. The 12month version of the scales was used. Timing estimates are as follows: GAD was assessed in an average of 24 s, with a range of 10–360 s and with 24% of respondents endorsing the stem question. AG was assessed in an average of 30 s, with a range of 2081 s and 14% endorsing the stem question. PA was assessed in an average of 38 s with a range of 13–191 s and 18% endorsing one of the two stem questions. DD was assessed in an average of 89 s with a range of 52–238 s and 17% endorsing the stem question. 
SoP was assessed in an average of 42 s with a range of 28–97 s and 21% endorsing the stem question. SiP was assessed in an average of 72 s with a range of 35–169 s and 45% endorsing the stem question. MDE was assessed in an average of 55 s with a range of 18–233 s and 33% endorsing at least one of the stem question. AD was assessed in an average of 68 s with a range of 20–181 s and 29% endorsing the stem question. Discussion Limitations Several limitations of the CIDI-SF scales in their current form need to be mentioned. Three of them deal with issues of coverage: • scale construction was designed to maximize concordance with DSM-III-R diagnoses rather than with diagnoses based on the more recent DSM-IV and ICD-10 systems; • scale construction was based on analysis of only a single survey with a restricted age range carried out in only one country; and • short-form scales were not developed for all syndromes covered in the CIDI. All three of these problems are being addressed in planned research to be carried out by the International Consortium in Psychiatric Epidemiology (ICPE), a group of researchers who have carried out CIDI surveys in different countries around the world and are collaborating in cross-national comparisons under the co-ordination of the WHO. The ICPE initiative will attempt to replicate the work reported in this paper in The WHO CIDI Short-Form (CIDI-SF) instrument. Furthermore, some important changes in question wording were made in the CIDI-SF based on pilot testing carried out at the National Center for Health Statistics Cognitive Survey Laboratory, making it even more important to carry out an independent validity study of the CIDI-SF. The most important change occurred in the stem question for MDE. The MDE stem question in the original CIDI asked respondents about a period of ‘two weeks or more when nearly every day you felt sad, blue or depressed’. 
This question does not include the DSM requirement that the depression must last most of the day. Therefore, the final version of the CIDI-SF added a separate question about duration within a day. Furthermore, based on cognitive laboratory evidence that respondents sometimes did not focus on the ‘nearly every day’ part of the stem question, an additional question was asked about duration throughout the two weeks. In the end, then, the original CIDI stem question was changed into three questions in the CIDI-SF:
(i) ‘During the past 12 months, was there ever a time when you felt sad, blue, or depressed for two weeks or more in a row?’
(ii) ‘For the next few questions, please think of the two-week period during the past 12 months when these feelings were worst. During that time, did the feelings of being sad, blue, or depressed usually last all day long, most of the day, about half of the day, or less than half the day?’
(iii) ‘During those two weeks, did you feel this way every day, almost every day, or less often?’
Because of these changes, it is important to carry out separate validity studies of the CIDI-SF rather than to rely on the CIDI validity studies. Although no CIDI-SF validity studies have as yet been done, we plan to do these once the ICPE expansion of the short-form scales is completed.

Uses of the CIDI-SF

The CIDI-SF diagnostic classifications are not sufficiently precise to replace the more complete diagnostic evaluations made in the full CIDI. However, as noted in the introduction, the full CIDI takes over an hour to administer whereas the CIDI-SF can be completed in an average of less than ten minutes. This means that the CIDI-SF can be a useful first-stage screening measure in large studies with two-phase designs (Newman et al., 1990).
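The "less than ten minutes" figure can be checked against the section-level means reported under Timing estimates. The short sketch below simply adds the eight reported means; it assumes each mean was computed over all pilot respondents, so that the means can be summed. The names `MEAN_SECONDS` and `total_minutes` are illustrative and are not part of the CIDI-SF materials.

```python
# Mean administration time per CIDI-SF section, in seconds, as reported
# in the Timing estimates section of this paper.
MEAN_SECONDS = {
    "GAD": 24, "AG": 30, "PA": 38, "DD": 89,
    "SoP": 42, "SiP": 72, "MDE": 55, "AD": 68,
}


def total_minutes(section_means):
    """Sum per-section mean times (in seconds) and convert to minutes."""
    return sum(section_means.values()) / 60.0


print(sum(MEAN_SECONDS.values()))             # 418
print(round(total_minutes(MEAN_SECONDS), 1))  # 7.0
```

The total of 418 s, about seven minutes, agrees with the average administration time quoted in the abstract.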
The CIDI-SF is also ideal for use in general-purpose epidemiological studies that cannot invest the hour or more needed to administer detailed psychiatric diagnostic interviews but nonetheless want to evaluate the importance of psychiatric disorders as risk factors for other outcomes. An intermediate strategy would be to administer the CIDI-SF to a large sample and then to follow this up with a gold standard clinical reappraisal interview in a stratified subsample of respondents that oversamples those with high predicted probabilities of caseness. The CIDI-SF responses could then be benchmarked against the clinical reinterviews to generate predicted probabilities of caseness for all respondents. Like conventional diagnostic scales, the symptom counts in the CIDI-SF can be dichotomized to generate yes–no caseness designation. However, as the scales were developed based on psychometric analysis rather than logical coverage of all DSM criteria, a feasible alternative might be to work with the continuous measures made up of the symptom counts. If this is done, though, it is important to transform the scale scores so as to assess probability of caseness. Provisional transformations of this sort are available for the current version of the CIDI-SF based on the relationship between scale scores and probabilities of caseness in the NCS. Tables of these transformations can be obtained from the WHO CIDI home page along with a copy of all the CIDI-SF scales. A question can be raised about how to analyse diagnostic data of the sort generated by a continuous measure of this kind with predicted probabilities of caseness. A simple computation of means can be used to estimate prevalences. These estimates should be more accurate when based on continuous measures that code each response for probability of caseness than on dichotomous measures. 
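As a rough illustration of the two coding options discussed above, the sketch below converts symptom counts into probabilities of caseness through a lookup table and then estimates prevalence as the mean of those probabilities. The table values, the cutpoint and the sample scores are hypothetical placeholders, not the NCS-based transformations distributed with the CIDI-SF, and survey weights are ignored for simplicity.

```python
# Hypothetical score-to-probability table for one CIDI-SF scale.
# These values are placeholders for illustration only.
SCORE_TO_PROB = {0: 0.0, 1: 0.05, 2: 0.25, 3: 0.60, 4: 0.90}


def caseness_probabilities(scores, table):
    """Map each symptom count to its tabulated probability of caseness."""
    return [table[s] for s in scores]


def prevalence_continuous(scores, table):
    """Prevalence estimate: mean of the probability-of-caseness scores."""
    probs = caseness_probabilities(scores, table)
    return sum(probs) / len(probs)


def prevalence_dichotomous(scores, cutpoint):
    """Prevalence estimate: share of respondents at or above a cutpoint."""
    return sum(1 for s in scores if s >= cutpoint) / len(scores)


# Ten made-up respondents, most of whom report no symptoms.
scores = [0, 0, 0, 0, 0, 0, 1, 2, 3, 4]
print(prevalence_continuous(scores, SCORE_TO_PROB))
print(prevalence_dichotomous(scores, cutpoint=3))
```

In this toy sample the two estimates differ because the dichotomy treats every respondent above the cutpoint as a certain case and every respondent below it as a certain non-case.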
The probability-of-caseness scores can also be used as continuous predictor variables in regression models that seek to evaluate the impact of psychiatric disorders on other outcomes. Given the strong positive skew in all these scales (the fact that the majority of respondents in general population surveys will have a predicted probability close to zero), though, it would be wise to evaluate the functional form of the relationships between these scales and other outcomes rather than to assume a linear relationship. In cases where two-stage interviewing is used in a single survey, standard procedures exist to compute the standard errors of prevalence estimates (Newman et al., 1990).

More complex issues arise in using the continuous versions of the CIDI-SF scales as outcome variables in risk factor models. This complexity arises due to the fact that probability-of-caseness versions of these scales are skewed (the majority of respondents have a score of either zero or close to zero) and have constrained 0–1 ranges. There are a number of different ways to estimate prediction equations with outcomes of this sort. The most flexible approach, and the one we recommend, is weighted logistic regression analysis. The idea here is to create a data file in which each respondent who has a predicted probability greater than zero but less than one is entered as two separate records (as if this one individual were actually two separate people). Both records would have exactly the same scores on all predictor variables but they would have different scores on the outcome, in one case a score of zero and in the other a score of one. The record with a score of one on the outcome would have a weight equal to the respondent’s predicted probability of caseness (p). The record with a score of zero on the outcome would have a weight equal to the respondent’s predicted probability of non-caseness (1 – p). This data file would be analysed using conventional logistic regression with weighted data to estimate parameters (Hosmer and Lemeshow, 1989) and using design-based procedures to estimate confidence intervals (Skinner et al., 1989).

It is worth noting that this same approach could be used with the full CIDI and, if so, would represent an attractive alternative to the probability-of-caseness weighting approach recently proposed by Surtees et al. (1997) and Wainwright et al. (1997). The basic notion here would be to carry out a fairly substantial clinical validation study as part of a large general population CIDI survey. A stratified probability sample of CIDI respondents, with an oversampling of CIDI cases, would be reinterviewed using a gold standard clinical reappraisal interview. Responses to the CIDI questions would be used to estimate prediction equations for the clinical diagnoses in the subsample of validity study respondents. These prediction equations, in turn, would be used to impute predicted probabilities of caseness to all CIDI respondents in the larger survey. The weighted logistic regression approach, finally, would then be used to analyse the data. The main attraction is that this strategy would correct for invalidity in the CIDI. The downside is that the clinical reappraisal subsample might have to be substantial in relation to the total sample for the bias reduction to outweigh the inefficiency introduced by weighting, so that total survey error decreases rather than increases. Further methodological studies are currently under way to investigate the conditions under which this weighting approach might be optimal to maximize precision of clinically validated prevalence estimates for a fixed cost.

Acknowledgements

The work reported here was carried out in conjunction with the International Consortium of Psychiatric Epidemiology (ICPE). More information about the ICPE can be obtained from http://www.hcp.med.harvard.edu/icpe. The work was supported by the John D. and Catherine T. MacArthur Foundation Research Network on Successful Midlife Development and NIMH grants R01MH46376, R01MH52861, R01MH49098, K05MH00507 and T32MH16806.

International Journal of Methods in Psychiatric Research, Wiley
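The weighted logistic regression approach recommended in the Discussion can be sketched as follows. This is a minimal illustration and not the authors' software: it handles a single predictor, the function names `expand_records` and `fit_weighted_logit` are hypothetical, the data are made up, and it returns point estimates only. The design-based confidence intervals that the paper calls for are not computed here.

```python
import math


def expand_records(xs, ps):
    """Build weighted records (x, y, weight) from predictor values xs and
    probabilities of caseness ps: y=1 with weight p, y=0 with weight 1-p.
    Respondents with p of exactly 0 or 1 contribute a single record."""
    records = []
    for x, p in zip(xs, ps):
        if p > 0.0:
            records.append((x, 1, p))
        if p < 1.0:
            records.append((x, 0, 1.0 - p))
    return records


def fit_weighted_logit(records, iterations=25):
    """Newton-Raphson fit of logit P(y=1) = b0 + b1*x using record weights."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, w in records:
            mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            r = w * (y - mu)          # weighted score contribution
            v = w * mu * (1.0 - mu)   # weighted information contribution
            g0 += r
            g1 += r * x
            h00 += v
            h01 += v * x
            h11 += v * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 information matrix against the score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Made-up example: a binary risk factor x and imputed probabilities p.
xs = [0, 0, 0, 0, 1, 1, 1, 1]
ps = [0.0, 0.0, 0.1, 0.3, 0.0, 0.2, 0.6, 1.0]
records = expand_records(xs, ps)
b0, b1 = fit_weighted_logit(records)
print(len(records), b0, b1)
```

With a single binary predictor the fitted probabilities reproduce the mean probability of caseness in each group (0.10 and 0.45 in this example), which gives a simple check that the record-doubling scheme behaves as described.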

The World Health Organization Composite International Diagnostic Interview short‐form (CIDI‐SF)


Publisher
Wiley
Copyright
Copyright © 1998 Whurr Publishers Ltd.
ISSN
1049-8931
eISSN
1557-0657
DOI
10.1002/mpr.47

Abstract

RONALD C. KESSLER, Department of Health Care Policy, Harvard Medical School, Boston, MA, USA GAVIN ANDREWS, Clinical Research Unit for Anxiety Disorders, University of NSW at St Vincent’s Hospital, Sydney, Australia DANIEL MROCZEK, Department of Psychology, Fordham University, Bronx, NY, USA BEDIRHAN USTUN, Division of Epidemiology, Classification and Assessment, World Health Organization, Geneva, Switzerland HANS-ULRICH WITTCHEN, Max Planck Institute of Psychiatry, Munich, Germany ABSTRACT Data are reported on a series of short-form (SF) screening scales of DSM-III-R psychiatric disorders developed from the World Health Organization’s Composite International Diagnostic Interview (CIDI). A multi-step procedure was used to generate CIDI-SF screening scales for each of eight DSM disorders from the US National Comorbidity Survey (NCS). This procedure began with the subsample of respondents who endorsed the CIDI diagnostic stem question for a given disorder and then used a series of stepwise regression analyses to select a subset of screening questions to maximize reproduction of the full CIDI diagnosis. A small number of screening questions, between three and eight for each disorder, was found to account for the significant associations between symptom ratings and CIDI diagnoses. Summary scales made up of these symptom questions correctly classify between 77% and 100% of CIDI cases and between 94% and 99% of CIDI non-cases in the NCS depending on the diagnosis. Overall classification accuracy ranged from a low of 93% for major depressive episode to a high of over 99% for generalized anxiety disorder. Pilot testing in a nationally representative telephone survey found that the full set of CIDI-SF scales can be administered in an average of seven minutes compared to over an hour for the full CIDI. The results are quite encouraging in suggesting that diagnostic classifications made in the full CIDI can be reproduced with excellent accuracy with the CIDI-SF scales. 
Independent verification of this reproduction accuracy, however, is needed in a data set other than the one in which the CIDI-SF was developed.

Key words: Composite International Diagnostic Interview (CIDI), psychiatric diagnosis, diagnostic interviews, psychometrics, screening scales

This report introduces the World Health Organization (WHO) Composite International Diagnostic Interview (CIDI) Short-Form (CIDI-SF), a fully structured set of scales developed from the larger CIDI (WHO, 1990). The goal of the CIDI-SF is to provide a quick screen (average administration time of 10 minutes) for the most commonly occurring psychiatric disorders assessed in the CIDI. The CIDI-SF was originally developed for use in the redesigned US National Health Interview Survey (NHIS; Adams and Morano, 1995), a large annual survey carried out in the United States under the auspices of the US government’s National Center for Health Statistics (NCHS). Prior to a major redesign scheduled for implementation before the year 2000, the NHIS made no consistent effort to assess psychiatric disorders. However, such an assessment was planned for inclusion in the redesigned NHIS if brief and accurate diagnosis-specific screening measures could be developed. The CIDI-SF, which was developed with support from the NCHS as part of the NHIS redesign effort, meets these requirements.

The original version of the CIDI-SF was designed to screen for DSM-III-R (APA, 1987) disorders. A subsequent revision of the CIDI-SF was made to generate DSM-IV (APA, 1994) diagnoses. This revision was carried out by Andrews and Kessler and consisted of adding items to implement the new DSM-IV requirement that symptoms lead to either ‘clinically significant distress or impairment in social, occupational, or other important areas of functioning.’ In addition, a screening scale for obsessive-compulsive disorder based on pilot data collected in Australia was added to the eight scales previously developed in the NCS. This modified CIDI-SF is scheduled for use in the conditions module of the redesigned NHIS once it is fielded. Further modifications to generate ICD-10 (WHO, 1992) diagnoses are currently under way for use in other surveys.

In this report we describe the data analysis strategies used to develop the CIDI-SF, we present data on the relationship between the CIDI-SF and the full CIDI, and we discuss special data-analysis issues involved in coding the CIDI-SF in continuous versus categorical form. More detailed appendix tables and a complete copy of the CIDI-SF are available on the International Consortium of Psychiatric Epidemiology (ICPE) home page, which can be accessed over the Internet through the URL http://www.hcp.med.harvard.edu/icpe.

Method

Sample

The initial CIDI-SF development was carried out in the US National Comorbidity Survey (NCS) (Kessler et al., 1994). The NCS was a nationally representative household survey administered in 1990–1992 to a sample of 8098 household respondents in the age range 15–54 in the coterminous United States. Interviews were conducted face-to-face in the homes of respondents and were administered in two parts. Only the Part I data were used in the CIDI-SF development. Part I was a modified version of CIDI 1.0 (WHO, 1990), which took an average of 65 minutes to complete. The response rate was 82.6%. The Part I data were weighted to adjust for differential non-response and variation in within-household probabilities of selection and post-stratified to match the cross-classification of a wide range of sociodemographic variables obtained in the NHIS (Kessler et al., 1995). All the NCS results reported here are based on these weighted data.

The Composite International Diagnostic Interview (CIDI)

The CIDI is a fully structured diagnostic interview designed to be used by trained interviewers who are not clinicians. Although the CIDI is capable of generating diagnoses according to the definitions and criteria of both the DSM and ICD systems, our initial work focused on DSM-III-R diagnoses. We attempted to develop short-form scales for eight CIDI syndromes: major depressive episodes (MDE), generalized anxiety disorder (GAD), simple phobia (SiP), social phobia (SoP), agoraphobia with or without panic (AG), panic attacks (PA), alcohol dependence (AD), and drug dependence (DD). Both WHO Field Trials of the CIDI (Wittchen, 1994) and NCS clinical reappraisal studies (Kessler et al., 1998) documented acceptable validity of these CIDI diagnoses in comparison to blind clinical reinterviews.

We experimented with the development of short-form scales for some of the other syndromes assessed in the NCS but decided not to produce final versions of such scales because of low validity of these syndromes in the NCS clinical reappraisal study. The latter include dysthymia (Kessler et al., 1998), bipolar disorder (Kessler et al., 1997), and non-affective psychosis (Kendler et al., 1996). In addition, we excluded other syndromes from CIDI-SF development because the full assessments in the CIDI are already quite brief. The two diagnoses of this sort that were assessed in the NCS are alcohol abuse and drug abuse. The CIDI uses only two questions to assess each of these disorders, so the development of a short form was not necessary.

The eight CIDI-SF scales were defined without DSM diagnostic hierarchy rules. This was due to the fact that some of the disorders higher in the hierarchy (such as bipolar disorder and non-affective psychosis for MDE and GAD) were not assessed in the CIDI-SF and the fact that the logistical complexities of assessing other hierarchy relationships (such as GAD only within episodes of MDE or PA only occurring on exposure to phobic stimuli) were beyond the scope of what could be accomplished in the NHIS. Finally, the decision was made to assess panic attacks rather than panic disorder in the CIDI-SF due to the fact that the designation of panic attacks as part of a panic disorder requires a determination of the temporal clustering of these attacks in a single month (four or more attacks within a single month or a month of persistent worry about having another attack) at any time in the respondent’s life. This requirement for lifetime assessment went beyond the time frame that could be considered in the NHIS.

Analysis procedures

Most CIDI diagnostic sections have a stem-branch structure. These sections begin by asking one or two initial questions to determine whether the master symptoms of the syndrome have occurred. For example, the stem question for panic asks whether the respondent ever had ‘a spell or attack when all of a sudden you felt frightened, anxious or very uneasy in situations when most people would not be afraid or anxious’. Failure to endorse a single question leads to an automatic classification of not having the syndrome and a skip to the next section without being asked additional questions about the syndrome. Endorsement of the stem question, by comparison, leads to additional questions (the branch questions) about associated symptoms that might have clustered with the symptom assessed in the stem question into a clinically significant syndrome. Staying with the panic example, respondents who endorse the stem question are asked several follow-up questions designed to elicit an example of the situations in which they have attacks in order to make sure the attacks really occur in situations where most people would not be afraid.
Respondents are then asked 18 questions about symptoms of autonomic arousal that occur in panic attacks. These questions are followed by questions about whether the arousal symptoms usually begin rapidly on exposure to the phobic stimulus. Responses to all these questions are used to decide whether the respondent meets criteria for having a panic attack. Additional questions are then asked to determine whether the attacks cluster into a panic disorder and, if so, questions are asked about age of onset, recency, course, and overlap with other diagnoses for purposes of making hierarchy exclusions. As it turns out, only a minority of the respondents who endorse a given diagnostic stem question in general population surveys go on to meet the full criteria for the syndrome. In the case of panic in the NCS, for example, 27.9% of respondents endorsed the stem question and 26.0% of those endorsers (7.3% of the total sample) were subsequently classified as ever having a true panic attack. It is noteworthy that one very quick way of screening for psychiatric disorders with a subset of CIDI questions is to administer only the diagnostic stem questions and to assign each respondent a score equivalent to the probability of caseness based on a benchmark survey like the NCS. This set of questions could be administered in less than three minutes. The resulting weighted set of dichotomous variables would not have nearly as much precision as more thorough assessments. However, it could be very useful for certain purposes. For example, if such variables were available as part of a large prospective epidemiological study of risk factors for cardiovascular disease they would provide enough information to evaluate whether a more detailed psychiatric assessment would be useful. This kind of rough-and-ready evaluation could be extremely useful in calling attention to the importance of psychiatric disorders as risk factors.
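The stem-only scoring idea can be illustrated with the panic figures quoted above. In this minimal sketch the function name `stem_only_score` is hypothetical; the two proportions are the NCS values reported in the text, and non-endorsers are scored zero because they are classified automatically as not having the syndrome.

```python
# Panic figures from the NCS as reported in the text.
P_STEM = 0.279             # proportion endorsing the panic stem question
P_CASE_GIVEN_STEM = 0.260  # proportion of endorsers meeting full criteria


def stem_only_score(endorsed_stem, p_case_given_stem):
    """Benchmark probability of caseness for endorsers, zero otherwise."""
    return p_case_given_stem if endorsed_stem else 0.0


# Implied proportion of the total sample with a true panic attack.
implied_prevalence = P_STEM * P_CASE_GIVEN_STEM
print(round(implied_prevalence, 3))  # 0.073
```

The product reproduces the 7.3% of the total sample cited in the text.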
And the administration time of only three minutes might be short enough to convince researchers who are interested in other topics to take a provisional look at psychiatric disorders as possibly important risk factors.

Preliminary analysis of the NCS data showed that the predicted probabilities of caseness among respondents who endorsed diagnostic stem questions differ significantly as a function of sociodemographic and symptom characteristics. This means that the inclusion of additional information about these predictors could improve classification of predicted probabilities of caseness over use of the stem questions alone. The CIDI-SF development process built on this insight by carrying out empirical analyses of the predictors of caseness in the subsample of NCS respondents who endorsed diagnostic stem questions. We attempted to select an optimal set of questions to maximize caseness classification accuracy. Both sociodemographic variables (such as age, gender, marital status) and symptom (branch) questions were used in these analyses. However, we concentrated on symptom questions in developing the short-form measures as exploratory analysis showed that the significant predictors of caseness were almost always symptom questions rather than sociodemographics.

Selection of short-form items proceeded in four steps. First, a series of ordinary least squares (OLS) stepwise linear regression models was estimated for each CIDI/DSM-III-R diagnosis in the subsample of respondents who endorsed a stem question for that diagnosis. All symptom questions for the diagnosis were included as potential predictors. The use of OLS was appropriate despite the fact that the outcomes were dichotomous because the probabilities of caseness were all in the 20–80% range in the subsamples of stem endorsers. Plots of increments in explained variance were inspected to select the number of predictors that yielded optimal information. A parallel set of recursive partitioning models (Breiman et al., 1984) was estimated to search for significant non-additivities in the data. However, the additive specification was always found to yield better results in terms of cross-validated prediction accuracy. So we focused on additive models in further steps of model building.

Second, once the optimal number of predictors was selected, a series of all possible subsets regression models was estimated from the full set of predictors included in the stepwise procedure. All possible subsets regression is a procedure for generating a number of roughly equivalent models of pre-specified complexity from a larger set of predictors. To take a hypothetical example, let’s say we began with 20 predictors and discovered through stepwise analysis that no more than six of these were needed to capture the significant additive effects on the outcome. There are 38,760 different subsets of six items among 20 predictors. Many of these subsets will explain nearly as much variance in the outcome as the optimal subset. In order to investigate this for the CIDI symptom questions, we investigated the top 10 prediction equations, in terms of explained variance, for each syndrome. In each case we found that all 10 sets had coefficients of multiple determination that differed only trivially (i.e. in the fourth decimal place).

Third, in order to select among these functionally equivalent subsets, we carried out a series of more complex analyses of each one. These included the use of constrained regression analysis to test for the equivalence of slopes across items and comparisons of linear versus non-linear specifications of the functional form between symptom counts and the outcome. The best-fitting models in virtually all cases were additive with equal slopes. Therefore, the final decision about functional form depended on whether a linear or non-linear specification was used. In the linear specification, the only predictor variable was a count of the number of symptoms endorsed. In a non-linear specification there were separate dummy coded predictor variables (i.e. coded 0 or 1) to distinguish respondents who endorsed exactly one symptom, exactly two symptoms, and so forth, through all the symptoms. An F-test of the significance of the explained variance increment in the non-linear model versus the linear model was used to decide which of the two models was a better fit. In all cases the non-linear specification was found to be superior to the linear specification.

Comparison of coefficients of multiple determination across best-fitting non-linear models for each of the equivalent subsets of predictors for a given outcome failed to find any one set of predictors that clearly stood out as superior to the others for any of the eight syndromes. Therefore, a fourth step was implemented to make a final decision among the alternatives. This step began with the use of receiver operating characteristic (Kraemer, 1992) analysis to select the optimal dichotomous caseness cutpoint for the scales generated by each of the models assuming equal importance of false positives and false negatives. Subgroup differences were then examined in the associations between these short-form dichotomies and the full CIDI caseness measures. Subgroups were defined on the basis of age (15–34 versus 35–54), gender (male, female), race/ethnicity (non-Hispanic white, non-Hispanic black, Hispanic), years of education (0–12, 13+), region of the country (northeast, midwest, south, west) and urbanicity (major metropolitan area, nonmetropolitan urbanized area, rural area). Our goal was to determine whether one subset of items yielded a screening scale with more consistently high sensitivity or specificity in relation to the full CIDI than others across the full range of subgroups. A fairly clear basis for deciding on one final set of short-form questions for each of the eight syndromes was obtained from this step in the evaluation.

Results

The relationships between stem questions and lifetime diagnoses

Table 1 presents the CIDI stem questions for each of the eight syndromes assessed in the CIDI-SF. Table 2 presents data on the proportions of NCS respondents who endorsed these stem questions along with conditional probabilities of meeting full DSM-III-R criteria for each of the CIDI syndromes among stem endorsers. The syndromes are presented in rank order of stem question endorsement probabilities, from lowest to highest. As shown in the first column of Table 2, there is enormous variation in the proportion of NCS respondents who endorsed stem questions for the different disorders, from a low of 13.9% for GAD to a high of 71.3% for AD. The range of conditional probabilities of meeting full lifetime diagnostic criteria among stem endorsers is much narrower, from a low of 0.198 for AD to a high of 0.370 for GAD.

Table 1: CIDI 1.0 stem questions for CIDI-SF syndromes

Syndrome: Stem questions

Generalized anxiety disorder (GAD): Have you ever had a period of a month or more when most of the time you felt worried and anxious? (IF YES) What is the longest period you’ve had of feeling worried and anxious?

Agoraphobia (AG): Some people have such an unreasonably strong fear of being in a crowd, leaving home alone, traveling on buses, cars and trains, or crossing a bridge, that they always get very upset in such a situation or avoid it altogether. Did you ever go through a period when being in such a situation always frightened you badly?

Panic attacks (PA): Have you ever had a spell or attack when all of a sudden you felt frightened, anxious or very uneasy in situations when most people would not be afraid or anxious?

Drug dependence (DD): Have you used any of these medicines in Part A (‘Part A’ refers to a CARD PRESENTED TO THE RESPONDENT FOR VISUAL INSPECTION THAT CONTAINS A ‘PART A’ LIST OF PRESCRIPTION DRUGS THAT CAN BE ABUSED AND A ‘PART B’ LIST OF ILLICIT DRUGS) on your own more than five times when they were not prescribed for you, either to relax, feel better, to feel high, or feel more active or alert? Now I’d like to ask you about your experiences with other drugs. Look at the drugs in Part B on the card. Have you ever used any of those more than five times? Have you ever taken any other drug more than five times on your own either to get high, to relax, or to make you feel better, more active or alert?

Social phobia (SoP): Some people have such a strong unreasonable fear of doing things in front of others, like speaking in public, that they avoid it or feel extremely uncomfortable or uneasy about doing them. Have you ever had such a strong, unreasonable fear of: (i) speaking in public? (ii) having to use a toilet when away from home? (iii) eating or drinking in public? (iv) talking to people because you might have nothing to say or might sound foolish? (v) writing while someone watches? (vi) talking in front of a small group of people?

Simple phobia (SiP): There are other things that make some people so unreasonably afraid that they try to avoid them. Have you ever had a strong unreasonable fear of: (i) heights? (ii) flying? (iii) seeing blood? (iv) storms, thunder or lightning? (v) snakes, birds, rats, insects or other animals? (vi) closed spaces? (vii) getting a (shot/injection) or going to the dentist? (viii) being in water, like a swimming pool or lake?

Major depressive episode (MDE): In your lifetime, have you ever had two weeks or more when nearly every day you felt sad, blue, depressed? Have there ever been two weeks or longer when you lost interest in most things like work or hobbies or things you usually like to do for fun?
Now I am going to ask some questions about your use of alcoholic beverages like (BEVERAGES POPULAR LOCALLY – BEER, WINE, OR LIQUOR). In your entire lifetime, have you ever had at least 12 drinks of any kind of alcoholic beverage? (IF NO, PROBE WITH THE FOLLOWING QUESTION) Not even if you count having wine with meals, or beer at a sports event, or champagne at a wedding? (IF YES TO EITHER OF FIRST TWO QUESTIONS) In the past 12 months, did you have at least 12 drinks of any kind of alcoholic beverage? (IF NO) At any one year period of your entire life, did you have at least 12 drinks of any kind of alcoholic beverage? Agoraphobia (AG) Panic attack (PA) Drug Dependence (DD) Social phobia (SoP) Simple phobia (SiP) Major depressive episode (MDE) Alcohol dependence (AD) Kessler et al. Table 2: Prevalences of stem question endorsement and conditional probabilities of meeting full CIDI/DSM-III-R diagnostic criteria for the syndromes among stem question endorsers in the NCS Prevalence of stem question % Generalized anxiety disorder (GAD) Agoraphobia (AG) Panic attack (PA) Drug dependence (DD) Social phobia (SoP) Simple phobia (SiP) Major depressive episode (MDE) Alcohol dependence (AD) 13.9 18.4 27.9 32.0 38.7 51.6 56.0 71.3 (se) (0.4) (0.4) (0.5) (0.5) (0.5) (0.6) (0.6) (0.5) Conditional probability of meeting full criteria P (se) 0.370 0.363 0.260 0.235 0.345 0.218 0.305 0.198 (1.4) (1.2) (0.9) (0.8) (0.8) (0.6) (0.7) (0.5) Syndrome As noted in the introduction, one way of screening for these disorders would be to administer only the stem questions and assign each endorser the appropriate predicted probability of caseness. If the associations between the stems and caseness in a new study were the same as those in the NCS, this very brief screening method would yield fairly accurate total sample prevalence estimates. 
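As a rough illustration of the stems-only screening method just described, the sketch below multiplies each stem endorsement prevalence in Table 2 by the corresponding conditional probability of caseness to obtain the total-sample prevalence implied by the NCS figures. The dictionary and function names are illustrative assumptions and are not part of the CIDI-SF. The calculation assumes, as in the NCS analyses, that respondents who do not endorse a stem question are always non-cases.

```python
# Stem endorsement prevalence (%) and conditional probability of full
# CIDI/DSM-III-R caseness among stem endorsers, copied from Table 2.
TABLE_2 = {
    "GAD": (13.9, 0.370),
    "AG": (18.4, 0.363),
    "PA": (27.9, 0.260),
    "DD": (32.0, 0.235),
    "SoP": (38.7, 0.345),
    "SiP": (51.6, 0.218),
    "MDE": (56.0, 0.305),
    "AD": (71.3, 0.198),
}


def stem_only_prevalence(stem_pct, p_case_given_stem):
    """Implied total-sample prevalence (%): P(stem) * P(case | stem).

    Non-endorsers contribute nothing because they are assumed to be non-cases.
    """
    return stem_pct * p_case_given_stem


def implied_prevalences(table=TABLE_2):
    """Implied prevalence (%) for every syndrome, rounded to one decimal place."""
    return {
        syndrome: round(stem_only_prevalence(pct, p), 1)
        for syndrome, (pct, p) in table.items()
    }


if __name__ == "__main__":
    for syndrome, prevalence in implied_prevalences().items():
        print(f"{syndrome}: {prevalence:.1f}%")
```

With the Table 2 values this gives, for example, about 5.1% for GAD and 14.1% for AD. In a new survey the same arithmetic would only be as good as the assumption that the NCS conditional probabilities carry over.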
Alternatively, a new survey could administer the stems to all respondents and then administer the full CIDI only to a probability subsample of endorsers to generate study-specific conditional probabilities of CIDI caseness.

Predicting caseness among endorsers of the stem questions

The first column in Table 3 reports the number of CIDI symptom questions for each CIDI-SF syndrome. These were the numbers of predictors included in the stepwise regression models to improve individual-level prediction accuracy of CIDI caseness among stem question endorsers. The remainder of the table shows the number of these predictors found to add significantly to explained variance and the percentage of variance in the full CIDI caseness measures explained by this set of predictors in multiple regression analysis. As shown in Table 3, the number of predictors that added significantly to the stepwise prediction equations was much smaller in every case than the total number of predictors considered. This demonstrates that there is empirical redundancy among the CIDI symptom questions, allowing some reduction in administration time by relying on psychometric analysis to select the most informative subsets of items. The percentages of explained variance are all substantial, from a low of 44.4% for MDE to a high of 99.3% for AG. It is important to remember that these are variance components in the subsamples of stem question endorsers. The respondents who did not endorse stem questions, of course, are always classified as non-cases and this is always perfectly consistent with the classification in the full CIDI.

Individual-level prediction accuracy

A more concrete sense of the overall predictive accuracy of the CIDI-SF in comparison to the CIDI is presented in Table 4, where we show the sensitivity, specificity, positive predictive value, negative predictive value, and percentage classification accuracy for the CIDI-SF scales at their optimal dichotomous caseness cutpoints.
An optimal cutpoint is defined here as the cutpoint on the symptom count scale that yields the highest percentage classification accuracy on the full CIDI. As shown in the table, very high percentages of the CIDI cases are correctly classified in the short-forms, with a range of correct classification (sensitivity) between 77.0% for DD and 100% for AG. The percentages of CIDI non-cases that are correctly classified in the short-forms (specificity) are also consistently quite high, with a range between 93.9% for MDE and 99.9% for DD. The percentages of CIDI-SF cases that are confirmed in the full CIDI (positive predictive value) range between 75.7% for MDE and 99.6% for AG. The percentages of CIDI-SF non-cases that are confirmed (negative predictive value) range between 86.9% for MDE and 100% for AG. The percentages of overall classification accuracy are excellent for all syndromes, ranging from 93.2% for MDE to 99.6% for GAD.

Table 3: Numbers of symptom questions used in stepwise regressions, numbers of significant predictors in optimal prediction equations, and explained variances in the CIDI caseness measures

                                      Symptom           Significant    Explained
Syndrome                              questions used    predictors     variance (%)
Generalized anxiety disorder (GAD)    23                7              66.4
Agoraphobia (AG)                      13                2              99.3
Panic attack (PA)                     18                6              69.9
Drug dependence (DD)                  18                6              78.3
Social phobia (SoP)                   9                 4              84.2
Simple phobia (SiP)                   9                 3              61.0
Major depressive episode (MDE)        21                7              44.4
Alcohol dependence (AD)               18                6              78.7

Table 4: Sensitivity (SENS), specificity (SPEC), positive predictive value (PPV), negative predictive value (NPV), and total classification accuracy (TCA) of CIDI-SF scales compared to CIDI/DSM-III-R diagnoses in the NCS¹

Syndrome                              SENS    (se)     SPEC   (se)     PPV    (se)     NPV     (se)     TCA    (se)
Generalized anxiety disorder (GAD)    96.6    (0.9)    99.8   (0.1)    96.8   (0.9)    99.8    (0.2)    99.6   (0.1)
Agoraphobia (AG)                      100.0   (–)      99.9   (0.1)    99.6   (0.3)    100.0   (–)      99.9   (0.1)
Panic attack (PA)                     90.0    (1.0)    99.5   (0.1)    96.2   (0.6)    91.6    (0.3)    98.4   (0.1)
Drug dependence (DD)                  77.0    (1.5)    99.9   (0.1)    98.2   (0.5)    98.2    (0.2)    98.2   (0.1)
Social phobia (SoP)                   86.3    (1.0)    98.9   (0.1)    92.4   (0.8)    97.9    (0.2)    97.2   (0.1)
Simple phobia (SiP)                   92.9    (0.8)    96.2   (0.2)    76.4   (1.3)    99.1    (0.1)    95.9   (0.2)
Major depressive episode (MDE)        89.6    (0.8)    93.9   (0.3)    75.7   (1.0)    86.9    (0.4)    93.2   (0.3)
Alcohol dependence (AD)               93.6    (0.8)    96.2   (0.2)    80.2   (1.2)    98.9    (0.1)    95.8   (0.2)

¹ SENS is the proportion of CIDI cases correctly classified in the CIDI-SF; SPEC is the proportion of CIDI non-cases correctly classified in the CIDI-SF; PPV is the proportion of CIDI-SF cases confirmed in the CIDI; NPV is the proportion of CIDI-SF non-cases confirmed in the CIDI; and TCA is the percentage of respondents whose CIDI-SF classification is the same as their classification on the CIDI.

Subgroup variation in prediction accuracy

As noted above in the method section, the CIDI-SF items were selected with an eye towards minimizing subgroup variation in scale performance. Results are presented graphically in Figures 1–7. Each figure shows the association between scores on one of the CIDI-SF scales and probability of full CIDI caseness in the total sample as well as in subsamples of males and females. At each point on the scales we also present bars to indicate the highest and lowest probabilities of caseness across all 16 subgroups considered in the analyses (two for age, two for gender, three for race/ethnicity, two for education, four for region of the country, and three for urbanicity). In each case, respondents who failed to endorse the diagnostic stem question are coded zero on the scale along with respondents who endorsed the stem question but were negative on all symptom questions. No figure is presented for AG, as the classification accuracy was so near to perfect in the total sample that meaningful subsample differences did not exist.
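The accuracy measures defined in the footnote to Table 4, together with the rule of taking as optimal the cutpoint with the highest total classification accuracy, can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the function names are assumptions, and the toy data are made up rather than drawn from the NCS.

```python
def accuracy_measures(tp, fp, fn, tn):
    """Return SENS, SPEC, PPV, NPV and TCA (all in %) for a 2x2 table.

    tp: CIDI cases classified as cases by the short form
    fp: CIDI non-cases classified as cases by the short form
    fn: CIDI cases classified as non-cases by the short form
    tn: CIDI non-cases classified as non-cases by the short form
    """
    return {
        "SENS": 100.0 * tp / (tp + fn),
        "SPEC": 100.0 * tn / (tn + fp),
        "PPV": 100.0 * tp / (tp + fp),
        "NPV": 100.0 * tn / (tn + fn),
        "TCA": 100.0 * (tp + tn) / (tp + fp + fn + tn),
    }


def confusion_counts(scores, cidi_case, cutpoint):
    """Cross-classify short-form caseness (score >= cutpoint) by full CIDI caseness."""
    tp = fp = fn = tn = 0
    for score, is_case in zip(scores, cidi_case):
        sf_case = score >= cutpoint
        if sf_case and is_case:
            tp += 1
        elif sf_case:
            fp += 1
        elif is_case:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def best_cutpoint(scores, cidi_case):
    """Return (cutpoint, TCA) with the highest TCA; ties go to the lowest cutpoint."""
    best = None
    for cutpoint in range(1, max(scores) + 1):
        tp, fp, fn, tn = confusion_counts(scores, cidi_case, cutpoint)
        tca = 100.0 * (tp + tn) / len(scores)
        if best is None or tca > best[1]:
            best = (cutpoint, tca)
    return best


# Made-up toy data (symptom counts and full-interview caseness), not NCS data.
scores = [0, 0, 1, 1, 2, 3, 3, 4]
cidi_case = [False, False, False, False, True, False, True, True]

if __name__ == "__main__":
    cutpoint, tca = best_cutpoint(scores, cidi_case)
    print(cutpoint, tca)
    print(accuracy_measures(*confusion_counts(scores, cidi_case, cutpoint)))
```

Maximizing TCA weights false positives and false negatives equally, which matches the assumption stated for the cutpoint selection. A study with different error costs would need a different criterion.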
The figures show that there is a consistently strong monotonic relationship between scores on the CIDI-SF scales and the full CIDI for all syndromes. The general shape of the total sample curves can be seen as well in subsamples. There is no evidence of systematic statistically significant subgroup differences of a sort that would be indicated by a consistent tendency for one subgroup to have higher or lower conditional probabilities of caseness than others at several points along any of the scales. Furthermore, the highest subgroup conditional probabilities at a given scale score are, for the most part, lower than the lowest subgroup conditional probabilities at the next higher score.

There are only two exceptions to this generally positive picture. The first is a low conditional probability of CIDI caseness for SoP among respondents with high education who scored 2 on the CIDI-SF scale for this disorder. This is a true outlier in the sense that the low conditional probability is confined to this one subgroup and is considerably below the conditional probability among respondents in this same subsample who scored 1 on the short-form scale. The second exception is a pattern of widely varying conditional probabilities of CIDI caseness for GAD at intermediate scores (in the range 2–5) on the CIDI-SF scale for this disorder. This instability is due to the small number of respondents out of the 8098 in the NCS who scored 2 (n = 19), 3 (n = 24), or 4 (n = 70) on this CIDI-SF scale.

It is noteworthy that there is an inflection point at the high end of most CIDI-SF scales after which the curve becomes fairly flat. This indicates that the probability of true CIDI caseness above the inflection point is not strongly related to short-form scores. A pattern of this sort might be interpreted as a weakness of the short-forms in failing to discriminate at the top end. However, it is worth noting that the conditional probabilities of CIDI caseness are all quite high in the range of these scales above the inflection point. This means that the inflection point is more accurately interpreted as a ceiling effect than as a sensitivity failure at the high end of the scales.

Figure 1: The association between scores on the CIDI-SF Generalized Anxiety Disorder Scale and probability of caseness in the full CIDI
Figure 2: The association between scores on the CIDI-SF Panic Scale and probability of caseness in the full CIDI
Figure 3: The association between scores on the CIDI-SF Drug Dependence Scale and probability of caseness in the full CIDI
Figure 4: The association between scores on the CIDI-SF Social Phobia Scale and probability of caseness in the full CIDI
Figure 5: The association between scores on the CIDI-SF Simple Phobia Scale and probability of caseness in the full CIDI
Figure 6: The association between scores on the CIDI-SF Major Depressive Episode Scale and probability of caseness in the full CIDI
Figure 7: The association between scores on the CIDI-SF Alcohol Dependence Scale and probability of caseness in the full CIDI

Key to subgroup labels in Figures 1–7: NE = Northeast, MW = Midwest, So = South, WE = West, ME = Metro, UR = Urban, RU = Rural, WT = White, BL = Black, HI = Hispanic, YO = Young, Old = Old, HEd = Higher Education, LEd = Lower Education.

Timing estimates

The length of the screening battery is a critical issue in epidemiological surveys. This is especially so in large general-purpose surveys such as the US National Health Interview Survey, where demands for space are made by many different researchers who have many different substantive interests. Because of the large NHIS sample size (over 50,000 respondents each year), the fact that the survey is carried out face-to-face rather than over the telephone, and the wide geographic dispersion of the sample (over 6 million square miles), the average total administration cost of NHIS questions is about $200,000 per minute per year. Seconds count in a situation of this sort.

In order to obtain timing estimates for the CIDI-SF in a general population survey, the full set of eight scales was included in a nationally representative pilot for a survey on mid-life development carried out with support from the Network on Successful Mid-life Development of the John D. and Catherine T. MacArthur Foundation. Computer assisted interviewing was used and an internal clock employed to calculate the time between keystrokes in order to determine the average length of each CIDI-SF section. The 12-month version of the scales was used. Timing estimates are as follows: GAD was assessed in an average of 24 s, with a range of 10–360 s and with 24% of respondents endorsing the stem question. AG was assessed in an average of 30 s, with a range of 20–81 s and 14% endorsing the stem question. PA was assessed in an average of 38 s with a range of 13–191 s and 18% endorsing one of the two stem questions. DD was assessed in an average of 89 s with a range of 52–238 s and 17% endorsing the stem question. SoP was assessed in an average of 42 s with a range of 28–97 s and 21% endorsing the stem question. SiP was assessed in an average of 72 s with a range of 35–169 s and 45% endorsing the stem question. MDE was assessed in an average of 55 s with a range of 18–233 s and 33% endorsing at least one of the stem questions. AD was assessed in an average of 68 s with a range of 20–181 s and 29% endorsing the stem question.

Discussion

Limitations

Several limitations of the CIDI-SF scales in their current form need to be mentioned. Three of them deal with issues of coverage:

• scale construction was designed to maximize concordance with DSM-III-R diagnoses rather than with diagnoses based on the more recent DSM-IV and ICD-10 systems;
• scale construction was based on analysis of only a single survey with a restricted age range carried out in only one country; and
• short-form scales were not developed for all syndromes covered in the CIDI.

All three of these problems are being addressed in planned research to be carried out by the International Consortium in Psychiatric Epidemiology (ICPE), a group of researchers who have carried out CIDI surveys in different countries around the world and are collaborating in cross-national comparisons under the co-ordination of the WHO. The ICPE initiative will attempt to replicate the work reported in this paper in surveys carried out in a number of different countries using not only DSM-III, but also DSM-IV and ICD-10, criteria. In addition, their work will expand on the NCS analyses to include the full range of syndromes included in the CIDI, a number of which were not in the NCS. In the case of syndromes where the validity of the CIDI has been called into question, the expansion of the CIDI-SF will be carried out in conjunction with a broader effort on the part of the WHO CIDI Advisory Committee to improve the evaluation of these syndromes in the CIDI. A good case in point is non-affective psychosis (NAP), which is not evaluated with acceptable validity in the CIDI (Kendler et al., 1996). Professor Charles Pull is in charge of an effort by the WHO CIDI Advisory Committee to improve evaluation of NAP in the CIDI. Our efforts to develop a CIDI-SF assessment of NAP will be done in collaboration with Professor Pull and will build on his work to modify the assessment of this class of disorders in the full CIDI.

Another problem with the current version of the short-form scales is the mismatch between the lifetime time frame of the CIDI questions in the NCS and the 12-month time frame in the CIDI-SF. As noted in the introduction, this was dictated by the fact that the initial motivation and funding for short-form scale development required a 12-month version for use in the US National Health Interview Survey. As part of the expanded development work to be carried out by the ICPE, this mismatch will be resolved. A lifetime version of the CIDI-SF will be developed based on analysis of a number of surveys that used the lifetime version of the CIDI, and a separate 12-month version of the CIDI-SF will be developed based on parallel analysis of surveys that used the recently developed 12-month version of the CIDI.

A final limitation is the lack of validity data for the CIDI-SF. It is true that validity data exist for the full CIDI (Wittchen, 1994; Kessler et al., 1998) and that we have presented data in this report suggesting that there is a strong relationship between diagnoses based on the CIDI-SF and the full CIDI. However, it is important to recognize that the data reported in the tables of this paper represent part–whole associations between sets of responses to symptom questions collected in a single survey. Although the reduced sets of symptom questions are those used in the CIDI-SF, responses to these questions might well be different when they are presented as a separate short instrument rather than when they are embedded in a much longer instrument. Furthermore, some important changes in question wording were made in the CIDI-SF based on pilot testing carried out at the National Center for Health Statistics Cognitive Survey Laboratory, making it even more important to carry out an independent validity study of the CIDI-SF. The most important change occurred in the stem question for MDE. The MDE stem question in the original CIDI asked respondents about a period of ‘two weeks or more when nearly every day you felt sad, blue or depressed’.
This question does not include the DSM requirement that the depression must last most of the day. Therefore, the final version of the CIDI-SF added a separate question about duration within a day. Furthermore, based on cognitive laboratory evidence that respondents sometimes did not focus on the ‘nearly every day’ part of the stem question, an additional question was asked about duration throughout the two weeks. In the end, then, the original CIDI stem question was changed into three questions in the CIDI-SF:

(i) ‘During the past 12 months, was there ever a time when you felt sad, blue, or depressed for two weeks or more in a row?’
(ii) ‘For the next few questions, please think of the two-week period during the past 12 months when these feelings were worst. During that time, did the feelings of being sad, blue, or depressed usually last all day long, most of the day, about half of the day, or less than half the day?’
(iii) ‘During those two weeks, did you feel this way every day, almost every day, or less often?’

Because of these changes, it is important to carry out separate validity studies of the CIDI-SF rather than to rely on the CIDI validity studies. Although no CIDI-SF validity studies have as yet been done, we plan to do these once the ICPE expansion of the short-form scales is completed.

Uses of the CIDI-SF

The CIDI-SF diagnostic classifications are not sufficiently precise to replace the more complete diagnostic evaluations made in the full CIDI. However, as noted in the introduction, the full CIDI takes over an hour to administer whereas the CIDI-SF can be completed in an average of less than ten minutes. This means that the CIDI-SF can be a useful first-stage screening measure in large studies with two-phase designs (Newman et al., 1990).
The CIDI-SF is also ideal for use in general-purpose epidemiological studies that cannot invest the hour or more needed to administer detailed psychiatric diagnostic interviews but nonetheless want to evaluate the importance of psychiatric disorders as risk factors for other outcomes. An intermediate strategy would be to administer the CIDI-SF to a large sample and then to follow this up with a gold standard clinical reappraisal interview in a stratified subsample of respondents that oversamples those with high predicted probabilities of caseness. The CIDI-SF responses could then be benchmarked against the clinical reinterviews to generate predicted probabilities of caseness for all respondents. Like conventional diagnostic scales, the symptom counts in the CIDI-SF can be dichotomized to generate yes–no caseness designation. However, as the scales were developed based on psychometric analysis rather than logical coverage of all DSM criteria, a feasible alternative might be to work with the continuous measures made up of the symptom counts. If this is done, though, it is important to transform the scale scores so as to assess probability of caseness. Provisional transformations of this sort are available for the current version of the CIDI-SF based on the relationship between scale scores and probabilities of caseness in the NCS. Tables of these transformations can be obtained from the WHO CIDI home page along with a copy of all the CIDI-SF scales. A question can be raised about how to analyse diagnostic data of the sort generated by a continuous measure of this kind with predicted probabilities of caseness. A simple computation of means can be used to estimate prevalences. These estimates should be more accurate when based on continuous measures that code each response for probability of caseness than on dichotomous measures. 
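The point about estimating prevalence from a simple mean of predicted probabilities can be illustrated with a short sketch. The score-to-probability values below are hypothetical placeholders chosen only for illustration; they are not the provisional transformations distributed with the CIDI-SF, and the function names are likewise assumptions.

```python
from statistics import mean

# HYPOTHETICAL transformation from symptom-count score to probability of
# caseness for one scale. The real provisional tables are not reproduced here.
P_CASE_BY_SCORE = {0: 0.00, 1: 0.05, 2: 0.25, 3: 0.60, 4: 0.85, 5: 0.95}


def prevalence_from_probabilities(scores, p_by_score=P_CASE_BY_SCORE):
    """Prevalence estimate as the mean predicted probability of caseness."""
    return mean(p_by_score[score] for score in scores)


def prevalence_from_dichotomy(scores, cutpoint):
    """Prevalence estimate as the share of respondents at or above a cutpoint."""
    return mean(1.0 if score >= cutpoint else 0.0 for score in scores)


if __name__ == "__main__":
    # Made-up sample of ten respondents; most score zero, as in general
    # population surveys.
    sample = [0, 0, 0, 0, 0, 0, 1, 2, 3, 5]
    print(prevalence_from_probabilities(sample))
    print(prevalence_from_dichotomy(sample, cutpoint=3))
```

The two estimates differ because the dichotomy counts every respondent at or above the cutpoint as a full case and every respondent below it as a non-case, whereas the probability coding gives each respondent partial credit.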
The probability-of-caseness scores can also be used as continuous predictor variables in regression models that seek to evaluate the impact of psychiatric disorders on other outcomes. Given the strong positive skew in all these scales (the fact that the majority of respondents in general population surveys will have a predicted probability close to zero), though, it would be wise to evaluate the functional form of the relationships between these scales and other outcomes rather than to assume a linear relationship. In cases where two-stage interviewing is used in a single survey, standard procedures exist to compute the standard errors of prevalence estimates (Newman et al., 1990).

More complex issues arise in using the continuous versions of the CIDI-SF scales as outcome variables in risk factor models. This complexity arises due to the fact that probability-of-caseness versions of these scales are skewed (the majority of respondents have a score of either zero or close to zero) and have constrained 0–1 ranges. There are a number of different ways to estimate prediction equations with outcomes of this sort. The most flexible approach, and the one we recommend, is weighted logistic regression analysis. The idea here is to create a data file in which each respondent who has a predicted probability greater than zero but less than one is entered as two separate records (as if this one individual were actually two separate people). Both records would have exactly the same scores on all predictor variables but they would have different scores on the outcome, in one case a score of zero and in the other a score of one. The record with a score of one on the outcome would have a weight equal to the respondent’s predicted probability of caseness (p). The record with a score of zero on the outcome would have a weight equal to the respondent’s predicted probability of non-caseness (1 – p). This data file would be analysed using conventional logistic regression with weighted data to estimate parameters (Hosmer and Lemeshow, 1989) and using design-based procedures to estimate confidence intervals (Skinner et al., 1989).

It is worth noting that this same approach could be used with the full CIDI and, if so, would represent an attractive alternative to the probability-of-caseness weighting approach recently proposed by Surtees et al. (1997) and Wainwright et al. (1997). The basic notion here would be to carry out a fairly substantial clinical validation study as part of a large general population CIDI survey. A stratified probability sample of CIDI respondents, with an oversampling of CIDI cases, would be reinterviewed using a gold standard clinical reappraisal interview. Responses to the CIDI questions would be used to estimate prediction equations for the clinical diagnoses in the subsample of validity study respondents. These prediction equations, in turn, would be used to impute predicted probabilities of caseness to all CIDI respondents in the larger survey. The weighted logistic regression approach, finally, would then be used to analyse the data. The main attraction is that this strategy would correct for invalidity in the CIDI. The downside is that the size of the clinical reappraisal subsample might have to be substantial in relation to the total sample for the inefficiency introduced by weighting to be overcome by the bias reduction to decrease rather than increase total survey error. Further methodological studies are currently under way to investigate the conditions under which this weighting approach might be optimal to maximize precision of clinically validated prevalence estimates for a fixed cost.

Acknowledgements

The work reported here was carried out in conjunction with the International Consortium in Psychiatric Epidemiology (ICPE). More information about the ICPE can be obtained from http://www.hcp.med.harvard.edu/icpe. The work was supported by the John D. and Catherine T. MacArthur Foundation Research Network on Successful Midlife Development and NIMH grants R01MH46376, R01MH52861, R01MH49098, K05MH00507 and T32MH16806.
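The two-record weighting scheme recommended in the discussion of risk factor models can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, NumPy is assumed to be available, and the fit uses plain Newton-Raphson on the weighted log-likelihood. It returns point estimates only and does not attempt the design-based confidence intervals mentioned in the text.

```python
import numpy as np


def expand_records(X, p):
    """Enter each respondent twice: outcome 1 with weight p, outcome 0 with weight 1 - p."""
    p = np.asarray(p, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(p), -1)
    X2 = np.vstack([X, X])
    y2 = np.concatenate([np.ones(len(p)), np.zeros(len(p))])
    w2 = np.concatenate([p, 1.0 - p])
    keep = w2 > 0  # respondents with p of exactly 0 or 1 keep a single record
    return X2[keep], y2[keep], w2[keep]


def weighted_logit(X, y, w, n_iter=25):
    """Weighted logistic regression by Newton-Raphson.

    Returns the coefficient vector [intercept, slope_1, ...].
    """
    A = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-A @ beta))
        gradient = A.T @ (w * (y - mu))
        hessian = A.T @ (A * (w * mu * (1.0 - mu))[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


if __name__ == "__main__":
    # Made-up example: one binary risk factor and four respondents with
    # predicted probabilities of caseness p.
    X = [[0], [0], [1], [1]]
    p = [0.1, 0.3, 0.6, 0.8]
    X2, y2, w2 = expand_records(X, p)
    print(weighted_logit(X2, y2, w2))
```

In this toy example the model is saturated, so the fitted probabilities in the two risk-factor groups equal the group means of p (0.2 and 0.7). That gives a simple check that the weighting behaves as described.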

Journal

International Journal of Methods in Psychiatric Research, Wiley

Published: Nov 1, 1998
