Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Online assessment of musical ability in 10 minutes: Development and validation of the Micro-PROMS

Online assessment of musical ability in 10 minutes: Development and validation of the Micro-PROMS We describe the development and validation of a test battery to assess musical ability that taps into a broad range of music perception skills and can be administered in 10 minutes or less. In Study 1, we derived four very brief versions from the Profile of Music Perception Skills (PROMS) and examined their properties in a sample of 280 participants. In Study 2 (N = 109), we administered the version retained from Study 1—termed Micro-PROMS—with the full-length PROMS, finding a short-to-long-form correlation of r = .72. In Study 3 (N = 198), we removed redundant trials and examined test–retest reliability as well as convergent, discriminant, and criterion validity. Results showed adequate internal consistency (  = .73) and test–retest reliability (ICC = .83). Findings supported convergent validity of the Micro-PROMS (r = .59 with the MET, p < .01) as well as discriminant validity with short-term and working memory (r ≲ .20). Criterion-related validity was evidenced by significant correlations of the Micro-PROMS with external indicators of musical proficiency ( r = .37, ps < .01), and with Gold-MSI General Musical Sophistication (r = .51, p<.01). In virtue of its brevity, psychometric qualities, and suitability for online administration, the battery fills a gap in the tools available to objectively assess musical ability. Keywords Assessment · Music perception · Musicians · Musical ability · Musical aptitude · Psychometrics Introduction The growth of studies on associations between musical capacities and mental or neural functioning has revived Interest in musical ability has grown continuously over the interest in the creation of tools for the objective measure- past two decades (Zentner & Strauss, 2017). One reason ment of musical abilities (Zentner & Gingras, 2019). This for this development is the increasing corpus of studies development is driven by several factors. First, batteries indicating that musical ability is associated with a range for the assessment of musical abilities can be helpful in of nonmusical abilities. This includes not only general particularizing the musical skills involved in nonmusi- auditory skills (Grassi et al., 2017), but also reading, pho- cal ability or impairment. For example, dyslexia appears nological awareness, second language abilities, memory, to be primarily related to impairments in the perception executive functions, socio-emotional skills, and motor and reproduction of rhythm, rather than to other musi- skills (see Thaut & Hodges, 2018; Sala & Gobet, 2020, cal deficits (Bégel et  al., 2022; Boll-Avetisyan et  al., for an overview of relevant studies). Beyond their rele- 2020). Second, the ever-growing body of evidence on vance for basic research, a deeper understanding of these the benefits of music training often remains unclear as to associations also holds the potential to inform treatment whether outcomes are due to the music training itself or approaches to various conditions, such as dyslexia, demen- to preexisting individual differences in musical ability. tia, or autism spectrum disorder (e.g., Boll-Avetisyan Music ability tests can help disambiguate such findings. et al., 2020; Brancatisano et al., 2020; Lam et al., 2020; Third, even when outcomes can be attributed to musi- Marquez-Garcia et al., 2022). cal intervention, it may remain unclear as to whether the outcome was driven by an improvement in musical skills or by nonmusical features of the intervention. In all these * Marcel Zentner cases, objectively assessed musical skills can help rule marcel.zentner@uibk.ac.at out alternative explanations and enrich interpretation of Department of Psychology, University of Innsbruck, the findings. Innsbruck, Austria Vol.:(0123456789) 1 3 Behavior Research Methods Although batteries for the assessment of musical abilities Table 1 Overview of recent musical aptitude tests assessing general have been available for a century, in recent times, musical musical perception ability proficiency has more often been indirectly inferred from Name of test Year Subtest domain Trials musicianship status rather than measured directly by stand- MET 2010 M, R 104 ardized musical ability tests (Zentner & Gingras, 2019). One SMDT 2014 M, R, P 63 reason for this practice is that musical aptitude batteries cre- PROMS 2012 A, ER, L, M, P, R, TB, TE, TU 162 ated in the past century were primarily designed for use in PROMS-S 2017 A, ER, M, P, R, TB, TE, TU 68 educational contexts, such as for determining which children Mini-PROMS 2017 A, M, TE, TU 36 might be best suited for admission to music schools, learning an instrument, or playing in a band. Another reason relates to Note. Information compiled from Zentner and Gingras (2019). shortcomings of earlier batteries, such as outdated, unwieldy A Accent, ER Embedded Rhythms, L Loudness, M Melody, MET formats; deficiencies in stimulus design and control; and Musical Ear Test, Mini-PROMS abridged version of the Profile of Music Perception Skills, P Pitch, PROMS Profile of Music Perception gaps in psychometric evaluation and documentation (see Skills, PROMS-S Profile of Music Perception Skills-Short, R Rhythm, Zentner & Gingras, 2019). SMDT Swedish Musical Discrimination Test, TB Timbre, TE Tempo, In recognition of these limitations, over the past decade, TU Tuning. investigators have developed a number of musical aptitude A detailed description and illustration of the stimulus material used tests with better psychometric, acoustic, and practical prop- in the PROMS subtests is provided in Law and Zentner (2012). erties (Zentner & Gingras, 2019). These advantages have led to increased use of musical ability batteries in domains that are concerned with music, notably psychology and neurosci- especially when responses are scored and recorded auto- ence. Although this development is important in bolstering matically on the hosting platform. Finally, a test that can the credibility of relevant research findings, most current be administered online has great versatility as it can be batteries assess a relatively narrow range of musical skills, administered in such diverse environments as a school, under usually skills in melody and rhythm discrimination (see individual testing conditions in a laboratory, or on adult par- Zentner & Gingras, 2019, for an overview). Because musi- ticipants’ personal computers outside the laboratory. These cal ability encompasses aspects beyond melody or rhythm benefits became particularly obvious during the 2020–2022 perception, such batteries may have limited content valid- pandemic, when in-person or laboratory testing was difficult ity. The Profile of Music Perception Skills (PROMS) was or even impossible in many parts of the world. devised to assess a broader range of musical discrimination Online implementation and delivery of the PROMS skills, including those in the domains of timbre, tempo, tun- works through LimeSur vey (LimeSurvey GmbH n.d). ing, and nonmetric rhythm perception. LimeSurvey is a powerful, open-source survey web applica- The PROMS exists in several versions that have all been tion that provides a great deal of flexibility for customizing shown to be both valid and reliable (Law & Zentner, 2012; online assessments by embedding JavaScript. This allows Zentner & Gingras, 2019; Zentner & Strauss, 2017). Recent us, for example, to determine users’ operating systems and evidence includes demonstrations of large differences browsers in order to adapt the presentation of trials to user between musicians and nonmusicians on Mini-PROMS test specifications, which minimizes risk of technical errors scores (e.g., Sun et al., 2021; Vanden Bosch der Nederlanden and ensures a stable test delivery environment. Researchers et al., 2020), as well significant associations of PROMS-S interested in using the PROMS for research purposes receive scores with brain activation patterns involved in music pro- a research account that provides access to their own use of cessing (Rajan et al., 2021). Table 1 provides an overview the PROMS. of key characteristics of these versions, along with other Although one of the strengths of the PROMS is that it recently developed musical ability tests. A more detailed allows assessment of music perception skills that are miss- overview of musical ability tests for general and special ing from other music aptitude batteries (e.g., discrimination populations is provided in Table S1. skills in the domains of timbre, tuning, or tempo), sometimes A distinctive feature of the PROMS is that it has been optimized for online administration (Zentner & Strauss, The account offers researchers the possibility to tailor the online 2017). This development has been motivated by the advan- assessment to their needs, for example, by adding questionnaires that tages offered by web-based data collection, including the are of interest to the researcher. Researchers also have the option of ability to (a) reach more diverse samples, as well as rare or providing participants feedback on their results at the end of the test. Responses can be accessed by researchers at any time and down- specific subpopulations; (b) recruit a larger number of par - loaded in multiple formats, including csv or SPSS files. The data are ticipants, who provide higher statistical power; (c) conduct securely stored on university servers. The battery can be delivered in cross-cultural studies without significant recruiting chal- several languages. For more information, see: https:// musem ap. org/ lenges; and (d) run studies more quickly and inexpensively, resou rces/ proms/ proms- user- guide 1 3 Behavior Research Methods investigators are primarily interested in obtaining an over- Overview of studies all summative score of musical ability. This is likely to be the case, for instance, when musical aptitude is included In light of the preceding review and considerations, the as a control variable; when it is included as a secondary overarching objective of the current research was to derive variable; when changes in performance over time need to a battery from the full-length PROMS that could (a) be be assessed for musical abilities overall rather than domain administered online in no more than 10 min, (b) retain the by domain; or when musical ability needs to be assessed for broadest possible range of musical dimensions characteristic screening purposes. Furthermore, most current music apti- of the original PROMS, and (c) provide a valid and reliable tude batteries take at least 15 min to complete. When time score of overall perceptual musical ability. To this end, we with participants is limited, or when examining populations conducted three studies. In Study 1, we screened trials from with limited attentional resources such as children or special the full-length PROMS for inclusion in four very brief ver- populations, researchers may find that a battery that takes sions of the PROMS and evaluated their properties regarding even 15 min is too long. brevity, difficulty, reliability, and validity. In Study 2, the For these reasons, we sought to devise a musical test bat- version providing the best trade-off between these proper - tery that could be administered in no more than 10 min, all ties—termed Micro-PROMS—was administered with the while retaining the broad range of musical dimensions that full-length PROMS to examine short-to-long version cor- is distinctive of the PROMS. The development of short ver- relations and compare key psychometric properties of the sions of test batteries presents some challenges, however. two versions. In Study 3, we examined test–retest reliabil- First, although test trials or trials of a short version are usu- ity, convergent validity of the battery with the Musical Ear ally all included in the full-length form, one cannot assume Test, discriminant validity against short-term and working that the reliability and validity evidence of the full-length memory, and criterion validity with multiple separate indica- form automatically extends to the abbreviated form (Smith tors of musical proficiency. et al., 2000). As a consequence, it is essential to establish the reliability and validity of the new measure independently. Second, the examination of associations between the full- length and the abbreviated version requires the two versions Study 1 to be administered separately to avoid inflated estimates resulting from correlated error variance. Method Third, if the full-length version of a test has a multidimen- sional structure, the content validity of the short version’s Participants overall score is contingent on preserving the diversity of the domains of the long version. Such preservation is under- Participants were 280 students (174 female, 106 male, 0 mined, for example, if item selection for the short version is other) from the University of Innsbruck, aged 18 to 69 years one-sidedly based on statistical criteria, such as maximizing (M = 24.35, SD = 7.70, Mdn = 22). Six (2.1%) partici- internal consistency. This can lead to overrepresentation of pants considered themselves to be professional musicians, items from particular subscales that correlate more highly 43 (15.4%) semiprofessional musicians, 142 (50.7%) ama- with the overall score than do items from other subscales, teur musicians, 81 (15.4%) music-loving nonmusicians, and thereby reducing content validity (Kruyen et al., 2013; Smith eight (2.9%) nonmusicians. Of those classified as amateur et al., 2000). Thus, in shortening a test, researchers need musician or above (n = 191), 150 reported that they were to balance reliability and validity criteria, ideally attaining still practicing their instrument regularly, corresponding to a satisfactory reliability without sacrificing validity. proportion of 54.6% of musically active participants against Fourth, because of these problems of construct under- 45.4% of either nonactive amateurs or nonmusicians. or misrepresentation, extensive validation is particularly important in the case of abbreviated tests. This includes Creation of the Micro‑PROMS demonstrations of convergent validity, criterion validity, and discriminant validity. In the current case, this means that the Trials for the Micro-PROMS were taken from all subtests instrument should show high correlation with other musical of the full-length PROMS with the exception of the Loud- aptitude tests whose validity has already been established, ness subtest, which had already been removed from previ- and it should exhibit significant correlations with external ous PROMS versions because of its weak correlations with indicators of musical proficiency. At the same time, the test other subtests (Zentner & Strauss, 2017), and the Embedded should not be unduly related to generic, nonmusical skills Rhythm subtest, which would have required special instruc- of audition and cognition that a musical aptitude test might tions and practice trials, making the test longer than desired. nonetheless inadvertently tax if not carefully designed. 1 3 Behavior Research Methods The selection of trials was based on data from a sample to affect the performance. Participants are asked to indicate of 667 participants, cumulated over seven different studies whether reference and comparison are the same or different conducted both in the laboratory and remotely online. Trials by selecting one of five answer options: “Definitely same,” were retained for inclusion according to general principles of “Probably same,” “Probably different,” “ Definitely differ - item analysis, notably item difficulty, skewness, item-to-total ent,” and “I don’t know.” The distinction between “prob- correlation, and test–retest performance of individual trials. ably” and “definitely” was introduced in line with signal In this selection process, statistical criteria (e.g., high item- detection theory (Macmillan & Creelman, 2005; see also to-total correlations) were weighed against validity require- Hautus et al., 2021) to account for confidence level, whereas ments (e.g., representation of the broadest possible range of the “I don’t know” option was provided to reduce guessing. musical dimensions of the PROMS) so as to achieve a good For each correct answer, participants received 1 point for balance between reliability and validity. high confidence ratings (i.e., “Definitely same/different”) and Unlike previous PROMS versions, in which instructions 0.5 points for lower confidence ratings (i.e., “Probably same/ and practice trials are specific to each of the subtests, we different”). Wrong answers and “I don’t know” answers were aimed at including only one general instruction and one set scored with 0 points. of practice trials to make the test more time-effective. The As a commonly used paradigm in perceptual discrimi- use of a single instruction also made it possible to present nation tasks, confidence ratings allow for finer sensory trials either in fixed sequence, wherein trials belonging to judgments compared to the traditional binary forced choice the same subtest were presented in successive order and the (Mamassian, 2020). Although there is evidence suggesting order of subtests was also fixed, or in random sequence, that answer format using confidence ratings rather than a in which trials from any subtest could be preceded or fol- yes/no answer format only affects results in terms of percent lowed by trials from any other subtest. In consideration of correct responses, and not in terms of d  or association with these aspects, we created four different trial sets, described criteria (Goshorn & Goshorn, 2001), recently developed in detail in the next section. methods can fully account for confidence ratings (Aujla, 2022). One such measure is Vokey’s (2016) d , which is Measures derived from fitting receiver operating characteristic (ROC) curves via principal component analysis and will be reported Micro‑PROMS The four versions of the new PROMS had here as a more robust alternative to traditional sensitivity the following characteristics: Version 1 included four tri- measures such as d  or d . Some participants achieved hit als from five subtests of the full-length PROMS (Melody, and false-alarm rates of 0 or 1, indicating that they correctly Tuning, Accent, Rhythm, and Timbre), resulting in 20 trials identified either all or none of the stimuli of a given class that were presented in fixed order. Version 2 was identical to (i.e., same or different correct) and confidence level. In line Version 1, except that trials were presented in random order. with the specialized literature, these values were adjusted Versions 3 and 4 included three to five trials from seven using the 1/(2N) rule (Hautus, 1995; Macmillan & Kaplan, subtests of the full-length PROMS (the five subtests of Ver - 1985). sions 1 and 2, plus trials from the Pitch and Tempo subtests), We should note that throughout the analyses reported resulting in 23 trials that were presented in either fixed order in this research, using raw scores, d scores, or alternative (Version 3) or in random order (Version 4). For the fixed d-prime measures such d  (Macmillan &  Kaplan, 1985) led order versions, subtest order was balanced by alternating to very similar findings and the same conclusions. Hence, structural and sensory subtests (see Law & Zentner, 2012). we always report d in the descriptive sections of the results For Version 1, trials grouped by subtest were presented in but use raw scores for the correlational analyses for easier the following order: Melody, Timbre, Accent, Tuning, and interpretation. The code for computing sensitivity estimates Rhythm. For Version 2, the respective order was: Melody, for the Micro-PROMS can be found under the OSF link pro- Timbre, Tempo, Tuning, Rhythm, Pitch, and Accent. The vided at the end. characteristics of the four versions are summarized in the first three rows of Table  2. As with all previous PROMS versions (see Law & Zent- A small minority of participants (5.7%) only provided definitely ner, 2012; Zentner & Strauss, 2017), participants are pre- same or definitely different answers. Due to the absence of less than sented a reference stimulus twice, separated by an inter- fully confident answers, d values could not be computed and were replaced with d-prime values computed from the individual hit and stimulus interval of 1.5 sec, followed by the comparison false-alarm rates of the collapsed rating categories using the same- stimulus after an interval of 2.5 sec. The reference stimulus different paradigm from the psyphy-package (v 0.2-3; Knoblauch, was presented twice to facilitate its encoding, thereby leav- 2022; see also Hautus, 1995). Removing these participants did not ing less room for individual differences in memory capacity change the results. 1 3 Behavior Research Methods Table 2 Descriptive summaries of test performance for the four initial Micro-PROMS versions Features Version 1 2 3 4 Order of trials Fixed Random Fixed Random Number of trials 20 20 23 23 Subtests M, TU, A, R, TB M, TU, A, R, TB, P, TE Number of participants 74 72 60 74 Mean test duration in min (SD) 9.39 (2.11) 9.48 (1.77) 10.11 (1.44) 10.41 (1.31) Mean total score (SD) 14.52 (2.73) 14.86 (2.32) 15.23 (2.62) 15.39 (2.53) Mean d (SD) 1.85 (0.75) 1.94 (0.78) 1.77 (0.77) 1.77 (0.68) Mean item-to-total correlation .37 .32 .33 .32 Omega (ω) .73 .64 .63 .61 % of trials answered correctly 79.5% 81.1% 74.9% 78.9% Correlation with music training .53* .43* .43* .39* Note. PROMS Profile of Music Perception Skills, M Melody, TU Tuning, A Accent, R Rhythm, TB Timbre, P Pitch, TE Tempo. Correct answers combined regardless of confidence level. * All ps < .01. The instructions asked participants to take the test in a Music training As in earlier studies that examined the PROMS (Law & Zentner, 2012; Zentner & Strauss, 2017), quiet environment and to use headphones. After answer- ing questions about their sociodemographic background information on participants’ musical proficiency was assessed with multiple indicators: (a) self-rated level of and music training, a script randomly assigned partici- pants to one of the four Micro-PROMS versions. Each musicianship (0 = nonmusician; 1 = music-loving nonmu- sician; 2 = amateur musician; 3 = semiprofessional musi- version was preceded by a general instruction page, a sound calibration page to set the volume, and a practice cian; 4 = professional musician; (b) musical activity (0 = not playing an instrument or singing; 1 = used to play an trial to familiarize participants with the tasks. On average, the full sessions took 14.5 min to complete (SD = 4.2, instrument or sing but no longer practicing; 2 = regularly practicing an instrument or singing for 0–5 hours a week; Mdn = 13.9), whereas the duration of the battery itself was slightly under 10 min on average (Mdn = 9.8). 3 = regularly practicing an instrument or singing for 6–12 hours a week; 4 = regularly practicing an instrument or sing- In light of the exploratory and preliminary nature of the study, two criteria were used to determine sample size. For ing for 13–30 hours a week; 5 = regularly practicing an instrument or singing for more than 30 hours a week); and item analyses, we sought to achieve a case-to-trial ratio of 3:1. For preliminary evidence regarding validity, we (c) participants’ years of music training in relation to their age. Because the three variables were internally consistent determined the sample size based on previous correlations between PROMS scores and external indicators of music (ω = .83), they were z-transformed and the mean across the three variables used as a composite index reflecting individ- training, which were in the range of .35–.50 (Law & Zent- ner, 2012; Zentner & Strauss, 2017). Analyses using the uals’ extent of music training and music-making, henceforth termed “music training” for the sake of brevity. R package pwr (version 1.3-0; Champely, 2020) indicated that a total of N = 244 (n = 61 per group) would suffice for Procedure the present purposes. The test was administered autonomously online, hosted on Results and discussion LimeSurvey (version 2.64.1+; LimeSurvey GmbH n.d). By embedding a JavaScript code, users’ operating systems A summary of the main characteristics and outcomes of each of the four test versions is presented in Table 2. Regarding and browsers are recognized and test delivery is adapted accordingly. This ensures stable test delivery regardless of internal consistency, the version in which trials from five PROMS subtests were presented in fixed order performed the device or browser being used, and it also greatly reduces susceptibility to technical problems, such as delayed stimu- best (Version 1), followed by Version 2, 3, and 4. In light of our goal to retain as many subtests of the original PROMS lus delivery. The tool also allows researchers to track test- taking times for each of the test components. as possible to ensure content validity, we retained Version 3 1 3 Behavior Research Methods for further development (henceforth termed Micro-PROMS). a better balance of subtests with differing number of tri - We accepted the slightly lower internal consistency in als. Subtests were presented in the following order: Melody, view of the fact that some lesser-performing trials could Tuning, Timbre, Tempo, Rhythm, Pitch, and Accent. be replaced in subsequent studies to attain higher internal consistency. Full‑PROMS The PROMS includes nine subtests with 18 trials each for a total of 162 trials. In the original publica- tion, internal consistency of the total score was ω = .95, Study 2 test-retest was ICC = .88,  and,  on average participants had scored 109.60 points (SD = 17.88, see Law & Zentner, Study 2 had three main objectives. First, we aimed at 2012). improving the psychometric properties of the version retained from Study 1 by replacing trials that had performed Music‑Mindedness Questionnaire The Music-Mindedness least well. Second, we examined the extent to which this Questionnaire (MMQ, Zentner & Strauss, 2017) comprises very brief version would be correlated with the full-length a four-item Music Competence scale (e.g., “I can tell when 162-trial version of the PROMS (Full-PROMS). Third, we an instrument is out of tune”) and a four-item Music Appre- sought to broaden the empirical basis for evaluating the new ciation scale (e.g., “Musical experiences belong to the most instrument’s psychometric properties by gathering additional precious experiences in my life”), each rated on a five-point information about its validity. To this end, we administered scale (1 = “not at all”; 5 = “very much”). In the current a questionnaire designed to capture musical competence and study, we used a slightly longer version of the MMQ, com- music appreciation, expecting the Micro-PROMS to exhibit prising seven items per scale. Internal consistency in the higher correlations with the competence than with the appre- current study was ω = .90 for the Music Competence scale ciation part of the questionnaire. To achieve these aims, we and ω = .87 for the Music Appreciation scale. administered both versions of the PROMS and the question- naire to a new sample of listeners. Music training Questions and coding regarding the partici- pants’ music training and background were the same as those Method described in Study 1. The internal consistency of the music training composite score was ω = .86. Participants Procedure Participants were 109 psychology students of the University of Innsbruck (36 male, 73 female, 0 other) aged 18–52 years The procedure was similar to that of Study 1. Participants (M = 23.23, SD = 5.37, Mdn = 22), who completed both the were first administered the Micro-PROMS, followed by the Micro-PROMS and the Full-PROMS and received course MMQ and the Full-PROMS. Informed consent was obtained credit for participation. None of the participants considered from all participants before assessment. We expected the themselves to be professional musicians, eight (7.3%) con- correlation between the short and the long version to be r sidered themselves to be semiprofessional musicians, 34 ≳ .50 and validity correlations with external indicators of (31.2%) amateur musicians, 56 (51.4%) music-loving non- musical proficiency to be r ≳. 30 (see Table 2). Power was musicians, and 11 (10.1%) nonmusicians. Of those classified again estimated using the R package pwr (version 1.3-0; as amateur musicians or above (n = 42), 25 reported that Champely, 2020). Results showed that to detect effects of they still practiced regularly, corresponding to a proportion this size with a power of .80 (α = .05), a sample of at least of 22.9% musically active participants against 77.1% partici- 84 participants would be required. Differences in correla - pants that were either musically nonactive or nonmusicians. tion coefficients were examined using the R package cocor (Diedenhofen & Musch, 2015), using Dunn and Clark’s z Measures (1969). Sensitivity values were computed using the same approach as in Study 1. Micro‑PROMS To rectify some of the psychometric inad- equacies of the retained version, we removed eight trials because of their low difficulty (% correct > 90%) and com- paratively low item-to-total correlations. The trials were The “Loudness” subtest exhibited low correlations with all other replaced by four trials from Version 1 examined in Study 1 PROMS subtests (Law & Zentner, 2012). It was therefore removed in and by one trial taken from the original Full-PROMS, all of the short version of full-PROMS, the PROMS-Short (see Zentner & which had performed well psychometrically. Also, subtest Strauss, 2017). In the current study, it was administered for the sake order was altered slightly from the one in Study 1 to achieve of completeness, but not considered in subtest-level analyses. 1 3 Behavior Research Methods were remarkably similar to those of the longer version. Of Table 3 Side-by-side comparison of descriptive statistics and psycho- metric properties of Micro-PROMS and Full-PROMS particular note was the increase in internal reliability to ω = .75 compared to the version retained from Study 1 (ω = Key metrics Micro-PROMS Full-PROMS .63). We found neither any signic fi ant die ff rences for gender, Number of trials 20 162 nor any significant correlations between the PROMS total Mean item-to-total correlation r = .40 r = .32 scores and age. Reliability ω = .75 ω = .95 Mean test duration in min (SD) 10.92 (3.07) 60.55 (13.35) Preliminary evidence for validity Mean raw score (SD) 11.96 (3.20) 105.16 (19.09) Mean % of trials answered 68.4% 73.8% To obtain initial validity information, we examined asso- correctly ciations of the Micro-PROMS total score with the Music Mean skewness (SD) −0.47 (1.00) −0.97 (2.00) Competence and Music Appreciation subscales of the Mean kurtosis (SD) −0.46 (1.49) 3.51 (15.07) MMQ. Because the PROMS measures musical ability rather Mean d (SD) 1.23 (0.77) 1.57 (0.44) than music appreciation, we expected test scores to be more MMQ Music Competence r = .62** r = .60** strongly related to the MMQ-Competence scale than to the MMQ Music Appreciation r = .31** r = .24* MMQ-Appreciation scale. As anticipated, the correlation Music training composite r = .47** r = .42** of the Micro-PROMS with MMQ-Competence was signifi- cantly higher than with MMQ-Appreciation (r = .62 vs. r = Note. N = 109. PROMS Profile of Music Perception Skills, MMQ Music-Mindedness Questionnaire. .31, z = 3.71, p < .001). Correct answers combined regardless of confidence level. To particularize the unique variance associated with *p < .05. **p < .01. each of the scales, we ran a multiple linear regression with the Micro-PROMS score as the outcome variable and the two MMQ-scales as predictors. When both variables were Results entered simultaneously, only the competence scale main- tained a significant association with the Micro-PROMS (β Descriptive statistics and psychometric properties = 0.60, 95% CI [0.43, 0.77], p < .001), whereas the appre- ciation scale ceased to explain variance in Micro-PROMS The correlation between the Full-PROMS total score and the test scores (β = 0.04, 95% CI [−0.13, 0.21], p = .639). In further support of validity, the Micro-PROMS was strongly Micro-PROMS was r = .72, p < .001 (with Micro-PROMS items removed, r = .66, p < .001). An examination of cor- correlated with the music training composite score repre- senting external indicators of musical proficiency (r = .47, relations between the Micro-PROMS score and individual PROMS subtest correlations is also of interest, for the new p < .001). battery was conceived to represent the broadest possible range of musical dimensions that is distinctive of the full- Discussion length PROMS. This would not be the case, for instance, if the Micro-PROMS exhibited much stronger correlations Study 2 provided further evidence in support of the psycho- metric soundness of the Micro-PROMS. The high intercor- with certain subtests than with others. Thus, if the Micro- PROMS were to correlate highly with Timbre and Pitch but relation between the Micro-PROMS and the Full-PROMS indicates that the total score of the Micro-PROMS provides exhibit low correlations with Accent and Tempo, that would indicate that the Micro-PROMS primarily measures abilities a reasonable approximation to the total score of the full- length PROMS. This finding is noteworthy in light of the in timbre and pitch discrimination, and does not sufficiently account for discrimination abilities in the domains of rhythm drastic reduction of trials from 162 to 20 and the fact that the two versions of the battery were administered sepa- and tempo. Ideally, then, we should find that all subtests exhibit moderately strong correlations with the Micro- rately. Moreover, the changes to trials relative to the version retained from Study 1 led to satisfactory internal consist- PROMS total score. As shown by the individual PROMS subtest correlations with the Micro-PROMS total score, this ency. Validity correlations were similar to those that were obtained with the Full-PROMS. For example, the correla- was indeed the case: r = .48 (Embedded Rhythms), r = .51 (Pitch, Tempo), r = .54 (Tuning), r = .56 (Timbre), r = .58 tion between the music training composite and the Micro- PROMS was r = .47, which is not significantly different (Accent), r = .62 (Rhythm), r = .68 (Melody), all ps < .001. Table  3 provides additional information related to the from the r =.42 correlation found for the Full-PROMS (z = 0.73, p = .463). Finally, content validity was evidenced Micro-PROMS (left column) and the Full-PROMS (middle column), notably raw scores, d  scores, and internal consist- by similar-sized and substantial correlations between the Micro-PROMS and all of the subtests of the Full-PROMS. ency values. Overall, the key metrics of the Micro-PROMS 1 3 Behavior Research Methods This evidence is important in light of our goal to create an of 32 participants took part in a retest session conducted in instrument that would be capable of reflecting the broadest the laboratory after completing the full test battery. possible range of musical skills that can be assessed with the Overall, 14 participants (7.1%) considered themselves original full-length PROMS. as nonmusicians, 79 (39.9%) each as music-loving nonmu- Despite these encouraging results, not all trials performed sicians and amateur musicians, 25 (12.6%) as semiprofes- equally well, suggesting that some lesser-performing trials sional musicians, and 1 (0.5%) as professional musician. The could be removed to further shorten the battery. Moreover, majority of participants reported having either played an the case for validity was limited to correlations with par- instrument or sung at some earlier point in their lives (n = ticipants’ self-report of musical behavior and competence. 47, 23.7%) or that they were still practicing regularly (n = A stronger case for validity could be made if the predicted 87, 43.9%), compared to a third of participants (n = 64) that pattern of convergent and discriminant validity correlation reported having never been musically active in their lives. could be replicated on the basis of objective tests. Measures Study 3 Micro‑PROMS To improve the psychometric efficiency of the instrument, we removed three trials that had not added The goal of Study 3 was threefold—first, to replace and/ to reliability or validity. Furthermore, the Timbre subtest of or remove trials that had not worked well in Study 2 and the full-length PROMS had been represented by only one to re-examine the psychometric properties of the battery; trial in the version retained for Study 2. To ensure that each second, to evaluate the test–retest reliability of the battery; of the seven subtests of the original PROMS was represented and third, to expand the basis for evaluating the validity of by at least two trials, we added a trial from the Full-PROMS the battery. Convergent validity was examined against the Timbre subtest. As can be seen from Table 4, some musi- Musical Ear Test (MET), an established battery of musical cal dimensions were represented with two items, and some ability in melody and rhythm perception (Wallentin et al., with three items. This was done intentionally to achieve a 2010), whereas two composite indices of musical training balance of “sequential” and “sensory” components of musi- and expertise, including the Goldsmiths Musical Sophisti- cal perceptual ability, each represented with nine items. The cation Index (Gold-MSI), served as indicators of criterion distinction is based on factor analyses of original version validity. of the PROMS, which distinguished two PROMS factors, Discriminant validity was examined against the Digit with Accent, Rhythm, Melody loading on the sequential Span (DS) test, which measures both short-term and work- factor, and Pitch, Tempo, Timbre, Tuning on the sensory ing memory (Groth-Marnat et al., 2001). Furthermore, we factor (Law & Zentner, 2012). Subtest order was the same examined differential patterns, expecting stronger associa - as in Study 2. tions between the Micro-PROMS and scales tapping into music training or competence (e.g., MMQ-Competence; Music training Questions and coding regarding the partici- Gold-MSI perceptual abilities) than scales relating to music pants’ music training were the same as described in Stud- appreciation or engagement with music (e.g., MMQ-Appre- ies 1 and 2, comprising participants’ self-rated level of ciation; Gold-MSI emotions). musicianship, musical activity, and years of music training adjusted for age. Internal consistency was ω = .71. Method MusicM ‑ indedness Questionnaire (MMQ) As in Study 2, par- Participants ticipants completed the MMQ, which assesses music com- petence and music appreciation. Both scales were internally A total of 198 participants (77 male, 120 female, 1 other), consistent (ω = .85 and ω = .86, respectively). aged 15–59 years (M = 23.89, SD = 6.45, Mdn = 22), fully completed the Micro-PROMS as well as the MET. Of those Goldsmiths Musical Sophistication Index (Gold‑MSI) The participants, 196 (99.0%) also completed the MMQ, 165 Gold-MSI (Müllensiefen et al., 2014) is a multidimensional (83.3%) the DS test, and 105 (53.0%) the Gold-MSI. A total self-report questionnaire assessing participants’ musical sophistication on the basis of their active musical engage- ment, perceptual abilities, music training, singing abilities, A certain amount of attrition occurred for DS because test par- and emotional responses to music. It comprises 38 items and ticipants were asked to install special assessment software on their is sensitive to individual differences in music sophistication device. The Gold-MSI was added to the test battery at a second stage in both musicians and nonmusicians (Müllensiefen et al., of data collection to expand the basis for examining content valid- ity (see Procedure). 2014). In the current study, we used the German translation 1 3 Behavior Research Methods Table 4 Descriptive and psychometric information for each of the 18 Micro-PROMS items Item number Full-length Correct answer Mean SD% Correct RIT RIR ω if item is PROMS dropped 1 M2 D 0.91 0.27 93 .41 .35 .70 2 M11 D 0.76 0.40 81 .46 .35 .70 3 M12 D 0.62 0.42 74 .31 .15 .69 4 TU12 D 0.24 0.38 32 .32 .18 .72 5 TU17 S 0.85 0.30 93 .30 .19 .73 6 TU18 D 0.57 0.45 66 .46 .31 .71 7 TB14 D 0.39 0.44 47 .48 .34 .71 8 TB16 S 0.84 0.30 93 .23 .12 .73 9 TE5 D 0.81 0.36 86 .29 .16 .73 10 TE12 D 0.60 0.43 71 .53 .40 .68 11 R4 S 0.55 0.45 65 .32 .16 .72 12 R12 D 0.88 0.30 92 .41 .31 .70 13 R18 D 0.78 0.38 83 .40 .27 .69 14 P10 D 0.37 0.44 45 .52 .40 .68 15 P12 D 0.33 0.42 41 .39 .24 .72 16 A3 D 0.81 0.35 87 .43 .33 .70 17 A5 S 0.68 0.42 77 .35 .22 .71 18 A12 D 0.49 0.46 57 .33 .16 .71 Note. RIT = item-total correlation (correlation between item score and overall test score); RIR = item-rest correlation (correlation between item score and overall test score without the given item); M = Melody; TU = Tuning; TB = Timbre; TE = Tempo; R = Rhythm; P = Pitch; A = Accent. D = Different; S = Same. Item designation in the full-length PROMS. For example, the first item of the Micro-PROMS corresponds to Melody subtest item number 2 in the full-length PROMS, hence M2. Percentage of correct answers regardless of confidence level. by Schaal et al. (2014). Internal consistency for the subscales order (backward DS). Lists successively increase in length ranged from ω = .84 to ω = .89 and was ω = .91 for the and thus in difficulty. Assessment is stopped when partici- overall score of musical sophistication. pants fail to recall two consecutive lists of the same length. For the current study, the online version of the auditory Musical Ear Test (MET) The MET (Wallentin et al., 2010) DS test implemented in Inquisit 4 (Millisecond Software, comprises a Melody and a Rhythm subtest with 52 trials LLC, 2015) based on the procedure reported by Woods et al. each and takes approximately 20 min to complete (Cor- (2011) was administered. We report two estimates each for reira et al., 2022). Participants are asked to judge whether forward and backward recall: (1) total trial (TT)—the TT- two melodic or rhythmic phrases are identical or not. The score reflects the total number of both correct and incor - MET has been shown to have high internal consistency (α = rect trials presented prior to two consecutive errors at the .87) and to be significantly correlated with other measures same list length; (2) maximum length (ML)—the ML-score of musical expertise (Correira et al., 2022; Swaminathan reflects the maximum length of the list that was success- et al., 2021; Wallentin et al., 2010). For the present study, fully recalled. While the TT-score is similar to the widely an online version of the MET was implemented by the study used total correct score obtained from the Digit Span sub- authors using LimeSurvey (LimeSurvey GmbH n.d). Internal test of the Wechsler intelligence tests, it shows poorer test– consistency was ω = .73 for the Melody subtest, ω = .75 for retest reliability and convergent validity than the ML-score the Rhythm subtest, and ω = .85 for the total score. (Woods et al., 2011). DS test The DS test is a widely used tool for measuring both Procedure short-term and working memory (see Groth-Marnat et al., 2001). Participants are presented with a list of numbers for Similar to previous studies, instruments were administered a few seconds and are then asked to reproduce them from online via LimeSurvey (version 2.64.1+; LimeSurvey GmbH memory, either in the same order (forward DS) or in reverse n.d) and Inquisit 4 (Millisecond Software, LLC, 2015). 1 3 Behavior Research Methods Inquisit was used to administer the DS test and required was 4.00 and the highest score was 17.50 points. For the participants to install an app to enable remote assessment. MET, scores ranged between 51 and 98 points, with an aver- Participants were recruited via a university mailing list, as age score of 77.84 (SD = 9.42, Mdn = 79) out of 104 points. well as through the subreddit r/SampleSize (see Shatz, 2017) Mean scores and variances were similar to those reported and postings on the social media platforms Facebook and in a different online administration of the MET (Correira Instagram. Upon completion of the assessments, partici- et al., 2022), indicating that the current test conditions and pants received individual feedback on their performance on outcomes were comparable across the studies. the Micro-PROMS, the MET, and the DS test. Psychology The average sensitivity of the Micro-PROMS, based on students (n = 88, 44.4%) from the University of Innsbruck d (see Study 1, Method), was M = 1.39 (SD = 0.78; Mdn additionally received course credit for their participation. = 1.33) and M = 1.32 (SD = 0.80; Mdn = 1.28) for initial Participants were asked to complete the assessments in a and retest assessments, respectively (see Table 4 for detailed quiet environment and to use headphones to minimize pos- psychometric information for each of the 18 trials). Internal sible distractions. consistency was ω = .70 for the initial assessment and ω = Data collection took place in two stages (see Partici- .85 for the retest assessment. Test–retest reliability, com- pants). After agreeing to an informed consent statement, puted using a two-way mixed-effects model (single ratings, participants were asked to provide information relating to absolute agreement), was ICC = 0.83, 95% CI [0.68, 0.92]. sociodemographic and music background. After a sound This value is very close to the test–retest figures obtained for calibration test, in stage 1 of the data collection, measures the original version of the PROMS (Law & Zentner, 2012), were completed in the following order: (1) Micro-PROMS, as well as for the Short-PROMS and the Mini-PROMS (2) MMQ, (3) MET, (4) DS test. In stage 2 of the data col- (Zentner & Strauss, 2017). lection, the position of the Micro-PROMS, MET, and DS remained the same, and the Gold-MSI and MMQ were Convergent, discriminant, and criterion validity administered randomly in either second or fourth position of the sequence. After completion of the stage 2 assess- As shown in Table 5, the Micro-PROMS was highly and ments, participants were offered the opportunity to sign up significantly correlated with the MET and with MMQ- for a retest session against an allowance of 10 EUR. Thirty- Competence, lending support to its convergent validity. two individuals (16.2%) participated in the retest assess- Discriminant validity was evidenced by significant but ment, which took place in the laboratory. The time interval small correlations with DS forward, DS backward, and between the initial and the retest assessment was slightly Gold-MSI Emotions. Furthermore, convergent correlations over a week on average (M = 8.55, SD = 2.95, Mdn = 9). were significantly and markedly larger than discriminant From intercorrelations between the PROMS and other correlations (see Table S2). Correlations were almost iden- objective musical aptitude tests (Law & Zentner, 2012), tical when controlled for participants’ sex and age. With associations between the MET and the Gold-MSI (Cor- regard to criterion validity, the Micro-PROMS total score reira et al., 2022), and from findings of Studies 1 and 2, was significantly associated with both our music training we expected convergent and criterion correlations of r >. composite and the broader index of musicality provided 30. From the literature about associations between memory by the Gold-MSI total score. Figure 1 depicts the scores of capacity and musical aptitude, we expected these associa- the Micro-PROMS for participants with different levels of tions to fall within the range of r = .20–.30 (e.g., Kunert self-reported musicianship, the latter representing one of et  al., 2016; Swaminathan et al., 2021). As in the previ- the variables included in our music training composite (see ous two studies, power was estimated using the R package Measures in Study 1). pwr (version 1.3-0; Champely, 2020). Results showed that Overall, the pattern of convergent and discriminant valid- to detect correlations of this size with a power of .80 (α = ity correlations of the Micro-PROMS and the MET were .05), the current sample sizes were sufficient. Differences quite similar. For example, the Gold-MSI total score cor- in correlation coefficients were examined using the R pack - relations with the Micro-PROMS and the MET did not dif- age cocor (Diedenhofen & Musch, 2015), using Dunn and fer significantly (r = .52 vs. r = .49, z = 0.35, p = .723). Clark’s z (1969). Moreover, the Micro-PROMS and the MET total score each explained independent amounts of variance in Gold-MSI Results General Musical Sophistication when entered simultane- ously in a multiple regression (see Table 6). When the MET Descriptive statistics and reliability Melody and MET Rhythm subtests were entered separately, the Micro-PROMS was the sole significant predictor. As On average, participants scored 11.08 (SD = 2.85; Mdn = in previous studies, discriminant correlations of the Micro- 11) out of 18 points on the Micro-PROMS. The lowest score PROMS with memory capacity were somewhat lower than 1 3 Behavior Research Methods 1 3 Table 5 Means, standard deviations, and zero-order correlations of the Micro-PROMS and key validity measures Variable M SD n 1 2 3 4 5 6 7 8 9 10 11 12 13 14 1 Micro-PROMS 11.08 2.85 198 2 MET Melody score 38.58 5.80 198 .57** 3 MET Rhythm score 39.26 4.94 198 .45** .54** 4 MET total score 77.84 9.42 198 .59** .90** .85** a + 5 DS forward (2-error max length) 6.89 1.45 165 .16* .30** .15 .26** 6 DS backward (2-error max length) 6.15 1.47 165 .23** .25** .25** .29** .42** 7 MMQ Competence 2.99 0.86 196 .46** .42** .32** .43** .15 .12 + + 8 MMQ Appreciation 3.39 0.88 196 .18* .19** .12 .18* −.10 −.15 .48** 9 Music training composite −0.00 0.79 198 .26** .39** .16* .33** .11 .01 .75** .42** 10 Gold-MSI Active Engagement 3.68 1.10 105 .34** .15 .35** .28** −.08 −.10 .49** .69** .37** 11 Gold-MSI Perceptual Abilities 5.25 0.91 105 .45** .29** .38** .38** .06 .04 .76** .38** .59** .38** 12 Gold-MSI Singing Abilities 3.92 1.15 105 .46** .41** .25** .38** .09 .01 .66** .40** .54** .27** .60** 13 Gold-MSI Musical Training 3.69 1.36 105 .40** .49** .39** .51** .08 −.05 .79** .34** .79** .34** .61** .53** 14 Gold-MSI Emotions 5.41 0.93 105 .22* .12 .21* .18 −.31** −.27** .21* .62** .02 .51** .30** .22* .09 15 Gold-MSI General Musical Sophistication 3.99 0.98 105 .51** .46** .41** .49** .06 −.07 .86** .56** .77** .61** .75** .79** .82** .33** Note. DS = Digit Span; Gold-MSI = Goldsmiths Musical Sophistication Index; MET = Musical Ear Test; MMQ = Music-Mindedness Questionnaire; PROMS = Profile of Music Perception Skills. p < .10. *p < .05. **p < .01. If DS scores were computed using the Total Trial instead of the Maximum Length scoring, correlations with the Micro-PROMS were r = .15* and r = .14 ns. Correlations DS-forward DS-backward with the MET total score were r = .24** and r = .24**. MET-Subtest correlations were: MET-Melody with r = .26** and r = .21**; MET-Rhythm with DS-forrward DS-backward DS-forward DS-backward r = .14 ns., and with r = .21**. DS-forward DS-backward Behavior Research Methods Fig. 1 Distribution of Micro-PROMS total scores by different levels of self-reported musicianship status. Note. N = 198. Micro-PROMS scores differ as a function of self-reported musicianship level, F(4,193) = 4.47, p = .002 short-term memory and working memory, and against Table 6 Summary of multiple regression analyses predicting music sophistication from MET and Micro-PROMS scores self-report scales assessing emotional rather than ability components of musicality. The criterion validity correla- Gold-MSI General Musical Sophistication tions with the composite index of musical training was b SE 95% CI β p significant, if somewhat attenuated relative to the respec- [LL, UL] tive correlations reported in studies using the full-length (Intercept) 0.31 0.65 [−0.97, 1.59] .631 PROMS and the Mini-PROMS (Law & Zentner, 2012; Micro-PROMS 0.12 0.04 [0.04, 0.20] 0.33 .003 Zentner & Strauss, 2017), and those found in the present MET (Total score) 0.03 0.01 [0.01, 0.05] 0.29 .007 Studies 1 and 2. A possible explanation for the differences 2 2 Model Fit R = .310, R = .296, F(2, 102) = 22.87 is provided by the near absence of professional musicians adj (p < .001) in the current sample (see Fig. 1) and the resulting range (Intercept) 0.33 0.65 [−0.96, 1.62] .613 restriction in musicianship status. Still, the Micro-PROMS Micro-PROMS 0.12 0.04 [0.04, 0.20] 0.32 .003 explained significant and substantial amounts of variance MET (Melody score) 0.03 0.02 [−0.00, 0.07] 0.20 .069 in Gold-MSI General Musical Sophistication, even when MET (Rhythm score) 0.03 0.02 [−0.01, 0.06] 0.14 .187 controlling for MET scores. 2 2 Model Fit R = .310, R = .290, F(3, 101) = 15.13 adj (p < .001) Note. N = 105. Gold-MSI Goldsmiths Musical Sophistication Index, MET Musical Ear Test, PROMS Profile of Music Perception Skills. General discussion Across three studies involving over 580 participants, the those found for memory-to-MET correlations (Swaminathan current research introduced a test battery for the assess- ment of musical ability that has some distinctive features et al., 2021; Wallentin et al., 2010). relative to earlier batteries. First, it is capable of providing an overall assessment of musical ability in about 10 min, Discussion making it the shortest test battery of overall perceptual musical ability that we are aware of. Second, it takes a Taken together, these results offer solid evidence that the broad range of music perception skills into account. In addition to tasks relating to discrimination for melody and Micro-PROMS provides a reliable and valid assessment of musical ability, despite its short duration. Specifically, rhythm, it includes trials relating to discrimination skills in the domains of pitch, timbre, tuning, tempo, and accent. the instrument’s final 18-trial version proved internally consistent and also exhibited good test–retest reliability. Third, it has been devised for online administration that is easy for researchers and participants to use. In combining Importantly, the Micro-PROMS met all validity criteria for successful test performance: convergent validity with these features, the current measure goes an important step beyond previously existing measures toward meeting the the MET—a different, well-established battery of musi- cal ability—as well as with a self-report instrument relat- requirement for a tool that can be used online to identify musical ability when time is critical. ing to musical competence; discriminant validity against 1 3 Behavior Research Methods Nederlanden et al., 2020), than those reported for the MET Psychometric properties (Correira et al., 2022; Swaminathan et al., 2021; Wallentin et al., 2010; Zentner & Gingras, 2019). In the current study, Against the aim of providing an overall summative score of musical ability in a very short time, the Micro-PROMS the differences in correlations of the two instruments with memory outcomes were consistent with the earlier findings, met the psychometric criteria of successful test performance. Specifically, despite using less than 15% of the trials of the if somewhat less pronounced. The small associations between performance on the full-length PROMS, the total scores of the two instruments were quite highly correlated and exhibited similar psycho- Micro-PROMS and memory tasks could be due to two dis- tinctive aspects of the PROMS. First, trials assessing skills metric properties. Naturally, shortening a test battery to such an extent has costs. For example, the overall scores repre- in domains such as timbre, pitch, and tuning are shorter and somewhat less complex than trials assessing rhythm and sent different levels of granularity, with the Micro-PROMS functioning as a screening tool for general musical aptitude, melody perception, taxing memory capacity less as a result. Thus, Talamini et al. (2016) found that only the Melody whereas the Full-PROMS provides a very detailed profile of multiple music perception abilities. Thus, the Micro- subtest of the Mini-PROMS was substantially correlated with auditory working memory. Second, in all versions of PROMS offers no domain-specific results, with the con- sequence that specific strengths and weaknesses cannot be the PROMS, the reference stimulus is presented twice to facilitate its encoding, whereas in the MET and in other assessed as is the case with the longer forms of the PROMS. All the same, the diversity of the contents represented by the music aptitude batteries that we are aware of, the reference stimulus is presented only once. Both the shorter duration of subscales in the longer PROMS versions was preserved to some extent by including trials from nearly all subtests of the trials and the repetition of the reference stimulus seem to leave individual differences in memory skills little room the long version in the Micro-PROMS. Conceptually, then, the total score of the Micro-PROMS can be compared with to affect performance. the total score of the full-length PROMS. Empirically, the comparability was evaluated against a Implications and uses number of psychometric measures. Internal consistency was adequate, if somewhat lower than that of the full-length The Micro-PROMS will be particularly useful in situa- tions where time is critical and researchers are primar- PROMS. This was to be expected due to the Micro-PROMS’ small number of items capturing a wide range of musical ily interested in a summative, overall estimate of musical ability. This may be the case, for example, when musi- content. In terms of validity, the pattern of correlations related to convergent, discriminant, and criterion validity cal aptitude needs to be assessed as a secondary variable alongside several other constructs, when it is assessed as was comparable to that of the full-length PROMS. More spe- cifically, convergent and discriminant validity could be dem- a control variable, or when the target sample is a special population with limited attentional resources (e.g., chil- onstrated against both objective ability tests (i.e., MET, DS) and self-report scales (Gold-MSI, MMQ). Criterion validity dren, older adults, clinical samples). For the latter groups, the variations in trial content may be of additional help in correlations with composite indices of musical training and expertise were significant and sizeable. sustaining attention and concentration. Brevity can be crit- ical regardless of the target population, especially when A comparison between the Micro-PROMS and the MET revealed that the two instruments perform about equally investigators seek to obtain large and diverse samples. Thus, it has been found that the risk of dropout increases well in psychometric terms (see Correira et al., 2022; Swa- minathan et al., 2021). In terms of reliability, the MET has by up to 20% for each additional 10-min interval in web- based studies (Galesic & Bošnjak, 2009; see also Liu & slightly higher internal consistency than the Micro-PROMS, which was to be expected given the MET’s higher number Wronski, 2018). Like previous versions of the PROMS, the Micro- of trials and stronger homogeneity in trial content. Because the test–retest reliability of the MET remains to be exam- PROMS was specifically devised for online administration. The process of making the PROMS suitable for online test- ined, test–retest reliability comparisons could not be drawn between the two instruments. With regard to validity, the ing involved technical aspects, such as ensuring adaptability to variations in computer hardware, operating systems, and size of correlations between the two batteries and the Gold- MSI were very similar, and similar also relative to MET- types of browsers, as well as ensuring that participants can take the test in the absence of an experimenter by formulat- to-Gold-MSI correlations found in an earlier study with a large sample (Correira et al., 2022). In turn, the discrimi- ing instructions that are clear and easy to follow. Further- more, the settings allow researchers to provide automatically nant correlations of the Micro-PROMS with short-term and working memory have typically been lower, in the range generated feedback of results displayed at the end of the test, which can be an incentive for participation. of r ≈ .20 (e.g., Kunert et  al., 2016; Vanden Bosch der 1 3 Behavior Research Methods Although the limited control over the participants’ listen- Finally, although comparatively ample evidence for the ing environment is an understandable source of concern, battery’s convergent, discriminant, and criterion validity was research indicates that online and offline assessments of obtained in the current studies, the validation of any test bat- musical aptitude yield largely similar results (Correira et al., tery is a continuous process that will require different types 2022). In our own research, we found that PROMS key met- of independent studies to produce definite results. The pro- rics, such as internal reliability, trial difficulty, and validity cess might involve validation by examining associations with correlations, obtained in the laboratory (Law & Zentner, proximal indicators of musical behaviors, such as the ease 2012) and remotely online (Zentner & Strauss, 2017) were with which musical novices acquire skills in understand- very similar. This finding is consistent with replications of ing and/or producing music over time, or studies relating data from in-person testing by data acquired online (e.g., to distal criteria, i.e., nonmusical abilities that should none- Chetverikov & Upravitelev, 2016; Chierchia et al., 2019; theless be conceptually related to musical aptitude, such as Nussenbaum et al., 2020; Zentner & Strauss, 2017), includ- phonological awareness or vocal emotion recognition. Such ing in the auditory domain (Milne et al., 2021). information will be valuable but will take years to collect. Despite evidence suggesting that online assessments of Despite its limitations, the Micro-PROMS closes an musical ability are reliable, internet assessments will likely important gap in tools available for the assessment of musi- introduce a small amount of noise relative to in-person testing. cal ability. If a summative score of musical ability is all that Ultimately, the potential drawbacks of online testing need to researchers need, the Micro-PROMS represents an interest- be weighed against its advantages, such as the ease of reaching ing alternative to longer versions of the PROMS or to other diverse samples, rare or specific subpopulations, or large num- music aptitude batteries due to the broad array of music per- bers of participants who will in turn provide higher statistical ception skills covered by the test, its brevity, and the ease power. Sometimes there is simply no choice, as has been the with which it can be administered online. case in many parts of the world during the 2020–2022 pan- Supplementary Information The online version contains supplemen- demic. Researchers preferring to administer the Micro-PROMS tary material available at https://doi. or g/10. 3758/ s13428- 023- 02130-4 . via in-person or laboratory testing can easily do so, provided their work environments are connected to the internet. Acknowledgements We wish to thank Dr Michael Hautus and Dr Mat- thias Gondan for their helpful advice on data analyses. Authors' contributions MZ conceptualized the project. MZ, HS, and Limitations SR developed the study design. HS, SR, and MD collected the data. All authors contributed to the analyses. MZ and HS wrote the paper. Several limitations of the present investigation are note- Funding Open access funding provided by University of Innsbruck and worthy. First, although we found the Micro-PROMS to be Medical University of Innsbruck. psychometrically sound overall, additional studies are nec- essary to better establish its psychometric properties. For Data availability The data and materials for all studies are available at https:// osf. io/ au6m5/. Administering the Micro-PROMS does not example, our samples were relatively homogeneous, and it require a code, as it is freely accessible online. In order to request a is therefore necessary to examine the psychometric proper- PROMS research account please visit: https://musem ap. or g/r esour ces/ ties of the Micro-PROMS in samples of different ethnic and proms. None of the experiments was preregistered. educational backgrounds, and across different age groups. Code availability (software application or custom code) Code to com- The Micro-PROMS should be suitable for use in child and pute d-prime estimates for the Micro-PROMS is available at https:// older adult populations because of its brevity, but the extent osf. io/ au6m5/. to which this is the case remains to be determined. Second, it is important to keep in mind that the PROMS Declarations measures perceptual musical abilities. Although there is evi- Ethics approval The questionnaire and methodology for this study were dence to suggest that perceptual musical abilities are sub- approved by the Board for Ethical Issues of the University of Innsbruck stantially correlated with certain musical production skills, (No. 69/2021). such as tapping a tempo or rhythm (Dalla Bella et al., 2017; Georgi et al., 2023), current definitions of musicality encom- Consent to participate Informed consent was obtained from all indi- vidual participants included in the study. The statement was worded pass components such as abilities in the domains of perform- as follows: “All results will be processed in a way that guarantees your ing or creating music (Levitin, 2012). The moderately strong anonymity. Your participation is voluntary, and you may withdraw at correlations between performance on the PROMS and exter- any stage in the proceedings.” nal indicators of musical proficiency, such as being a musi- Consent for publication Participants were predominantly drawn from cian, are encouraging, but do not obviate the need for the a university mailing list which students can subscribe to if they are construction of test batteries that tap into a broader array of interested in participating in scientific studies conducted by members musical talents. 1 3 Behavior Research Methods of the University of Innsbruck. By subscribing, participants consent to Galesic, M., & Bošnjak, M. (2009). Effects of questionnaire length on the publication of anonymized data. participation and indicators of response quality in a web survey. Public Opinion Quarterly. Advance online publication. Georgi, M., Gingras, B., & Zentner, M. (2023). The Tapping-PROMS: Conflicts of interest/Competing interests The authors have no relevant A test for the assessment of sensorimotor rhythmic abilities. financial or nonfinancial interests to disclose. Frontiers in Psychology 13. https:// doi. org/ 10. 3389/ fpsyg.202 2. Open Access This article is licensed under a Creative Commons Goshorn, E. L., & Goshorn, J. D. (2001). Analysis of yes–no and con- Attribution 4.0 International License, which permits use, sharing, fidence-rating word recognition response formats. The Journal of adaptation, distribution and reproduction in any medium or format, the Acoustical Society of America, 110(5), 2706–2706. as long as you give appropriate credit to the original author(s) and the Grassi, M., Meneghetti, C., Toffalini, E., & Borella, E. (2017). Audi- source, provide a link to the Creative Commons licence, and indicate tory and cognitive performance in elderly musicians and non- if changes were made. The images or other third party material in this musicians. PLoS ONE, 12(11), e0187881. https:// doi. org/ 10. article are included in the article's Creative Commons licence, unless 1371/ journ al. pone. 01878 81 indicated otherwise in a credit line to the material. If material is not Groth-Marnat, G., Gallagher, R. E., Hale, J. B., & Kaplan, E. (2001). included in the article's Creative Commons licence and your intended The Wechsler Intelligence Scales. In A. S. Kaufman & N. L. use is not permitted by statutory regulation or exceeds the permitted Kaufman (Eds.), Cambridge child and adolescent psychiatry. use, you will need to obtain permission directly from the copyright Specific learning disabilities and difficulties in children and holder. To view a copy of this licence, visit http:// creat iveco mmons. adolescents: Psychological Assessment and Evaluation (pp. org/ licen ses/ by/4. 0/. 29–51). Cambridge University Press. Hautus, M. J. (1995). Corrections for extreme proportions and their biasing effects on estimated values of d ′. Behavior Research Methods, Instruments, & Computers, 27, 46–51. References Hautus, M., Macmillan, N. A., & Creelman, C. D. (2021). Detection Theory. A User's Guide. Routledge. Kruyen, P. M., Emons, W. H. M., & Sijtsma, K. (2013). On the Aujla, H. (2022). do′: Sensitivity at the optimal criterion location. shortcomings of shortened tests: a literature review. Interna- Behavior Research Methods, 1–27. https:// doi. or g/ 10. 3758/ tional Journal of Testing, 13(3), 223–248. https:// doi. org/ 10. s13428- 022- 01913-5 1080/ 15305 058. 2012. 703734 Bégel, V., Dalla Bella, S., Devignes, Q., Vandenbergue, M., Lemaî- Knoblauch, K. (2022). psyphy: Functions for Analyzing Psychophys- tre, M.-P., & Dellacherie, D. (2022). Rhythm as an independ- ical Data in R. https:// CRAN. Rproj ect. org/ packa ge= psyphy. ent determinant of developmental dyslexia. Developmental Kunert, R., Willems, R. M., & Hagoort, P. (2016). An Independ- Psychology, 58(2), 339–358. https:// doi. org/ 10. 1037/ dev00 ent psychometric evaluation of the PROMS Measure of Music Perception Skills. PLOS ONE, 11(7), e0159103. https://doi. or g/ Boll-Avetisyan, N., Bhatara, A., & Höhle, B. (2020). Processing of 10. 1371/ journ al. pone. 01591 03 rhythm in speech and music in adult dyslexia. Brain Sciences, Lam, H. L., Li, W. T. V., Laher, I., & Wong, R. Y. (2020). Effects of 10(5), 261. https:// doi. org/ 10. 3390/ brain sci10 050261 music therapy on patients with dementia—a systematic review. Brancatisano, O., Baird, A., & Thompson, W. F. (2020). Why is music Geriatrics, 5(4), 62. https://doi. or g/10. 3390/ g eriatr ics50400 62 therapeutic for neurological disorders? The Therapeutic Music Law, L. N. C., & Zentner, M. (2012). Assessing musical abilities Capacities Model. Neuroscience & Biobehavioral Reviews, 112, objectively: construction and validation of the profile of music 600–615. perception skills. PLoS ONE, 7(12), e52508. https://doi. or g/10. Champely, S. (2020). pwr: Basic functions for power analysis [Com- 1371/ journ al. pone. 00525 08 puter software]. https:// CRAN.R- proje ct. org/ packa ge= pwr Levitin, D. J. (2012). What does it mean to be musical? Neuron, Chetverikov, A., & Upravitelev, P. (2016). Online versus offline: The 73(4), 633–637. https:// doi. org/ 10. 1016/j. neuron. 2012. 01. 017 Web as a medium for response time data collection. Behavior LimeSurvey GmbH (n.d). LimeSurvey [Computer software]. Research Methods, 48(3), 1086–1099. https:// doi. org/ 10. 3758/ Liu, M., & Wronski, L. (2018). Examining completion rates in web s13428- 015- 0632-x surveys via over 25,000 real-world surveys. Social Science Chierchia, G., Fuhrmann, D., Knoll, L. J., Pi-Sunyer, B. P., Computer Review, 36(1), 116–124. h t t p s : / / d o i . o r g / 1 0 . 1 1 7 7 / Sakhardande, A. L., & Blakemore, S.-J. (2019). The matrix rea- 08944 39317 695581 soning item bank (MaRs-IB): novel, open-access abstract reason- Macmillan, N.A. & Creelman, C.D. (2005). Signal detection theory. ing items for adolescents and adults. Royal Society Open Science, 2nd edition. Erlbaum. 6(10), 190232. https:// doi. org/ 10. 1098/ rsos. 190232 Macmillan, N. A., & Kaplan, H. L. (1985). Detection theory analysis Correira, A. I., Vincenzi, M., Vanzella, P., Pinheiro, A. P., Lima, C. of group data: estimating sensitivity from average hit and false- F., & Schellenberg, E. G. (2022). Can musical ability be tested alarm rates. Psychological Bulletin, 98(1), 185–199. online? Behavior Research Methods, 54(2), 955–969. https://doi. Mamassian, P. (2020). Confidence forced-choice and other metaper - org/ 10. 3758/ s13428- 021- 01641-2 ceptual tasks. Perception, 49(6), 616–635. https:// doi. org/ 10. Dalla Bella, S., Farrugia, N., Benoit, C.-E., Begel, V., Verga, L., Hard- 1177/ 03010 06620 928010 ing, E., & Kotz, S. A. (2017). Baasta: Battery for the Assess- Marquez-Garcia, A. V., Magnuson, J., Morris, J., Iarocci, G., Does- ment of Auditory Sensorimotor and Timing Abilities. Behavior burg, S., & Moreno, S. (2022). Music therapy in autism spec- Research Methods, 49(3), 1128–1145. https:// doi. org/ 10. 3758/ trum disorder: a systematic review. Review Journal of Autism s13428- 016- 0773-6 and Developmental Disorders, 9(1), 91–107. https://doi. or g/10. Diedenhofen, B., & Musch, J. (2015). cocor: A comprehensive solution 1007/ s40489- 021- 00246-x for the statistical comparison of correlations. PLoS ONE, 10(4), Millisecond Software, LLC. (2015). Inquisit 4 [Computer software]. e0121945. https:// doi. org/ 10. 1371/ journ al. pone. 01219 45 https:// www. milli second. com Dunn, O. J., & Clark, V. (1969). Correlation coefficients measured on Milne, A. E., Bianco, R., Poole, K. C., Zhao, S., Oxenham, A. J., Bil- the same individuals. Journal of the American Statistical Associa- lig, A. J., & Chait, M. (2021). An online headphone screening tion, 64(325), 366. https:// doi. org/ 10. 2307/ 22837 46 1 3 Behavior Research Methods test based on dichotic pitch. Behavior Research Methods, 53(4), Talamini, F., Carretti, B., & Grassi, M. (2016). The working memory of 1551–1562. https:// doi. org/ 10. 3758/ s13428- 020- 01514-0 musicians and nonmusicians. Music Perception, 34(2), 183–191. Müllensiefen, D., Gingras, B., Musil, J., & Stewart, L. (2014). The https:// doi. org/ 10. 1525/ mp. 2016. 34.2. 183 musicality of non-musicians: an index for assessing musical Thaut, M., & Hodges, D. A. (Eds.). (2018). Oxford handbooks online. The sophistication in the general population. PLoS ONE, 9(2), Oxford handbook of music and the brain. Oxford University Press. e89642. https:// doi. org/ 10. 1371/ journ al. pone. 00896 42 Vanden Bosch der Nederlanden, C. M., Zaragoza, C., Rubio-Garcia, A., Nussenbaum, K., Scheuplein, M., Phaneuf, C. V., Evans, M. D., & Clarkson, E., & Snyder, J. S. (2020). Change detection in complex Hartley, C. A. (2020). Moving developmental research online: auditory scenes is predicted by auditory memory, pitch percep- comparing in-lab and web-based studies of model-based rein- tion, and years of musical training. Psychological Research, 84(3), forcement learning. Collabra. Psychology, 6(1), 17213. https:// 585–601. https:// doi. org/ 10. 1007/ s00426- 018- 1072-x doi. org/ 10. 1525/ colla bra. 17213 Vokey, J. R. (2016). Single-step simple ROC curve fitting via PCA. Rajan, A., Shah, A., Ingalhalikar, M., & Singh, N. C. (2021). Structural Canadian Journal of Experimental Psychology / Revue cana- connectivity predicts sequential processing differences in music dienne de psychologie expérimentale, 70(4), 301–305. https:// perception ability. European Journal of Neuroscience, 54(6), doi. org/ 10. 1037/ cep00 00095 6093–6103. https:// doi. org/ 10. 1111/ ejn. 15407 Wallentin, M., Nielsen, A. H., Friis-Olivarius, M., Vuust, C., & Vuust, Sala, G., & Gobet, F. (2020). Cognitive and academic benefits of music P. (2010). The Musical Ear Test, a new reliable test for measuring training with children: A multilevel meta-analysis. Memory & musical competence. Learning and Individual Differences, 20(3), Cognition, 48(8), 1429–1441. 188–196. https:// doi. org/ 10. 1016/j. lindif. 2010. 02. 004 Schaal, N. K., Bauer, A.-K. R., & Müllensiefen, D. (2014). Der Gold- Woods, D. L., Kishiyama, M. M., Yund, E. W., Herron, T. J., Edwards, MSI: Replikation und Validierung eines Fragebogeninstrumentes B., Poliva, O., Hink, R. F., & Reed, B. (2011). Improving digit zur Messung Musikalischer Erfahrenheit anhand einer deutschen span assessment of short-term verbal memory. Journal of Clinical Stichprobe. Musicae Scientiae, 18(4), 423–447. https:// doi. org/ and Experimental Neuropsychology, 33(1), 101–111. https:// doi. 10. 1177/ 10298 64914 541851org/ 10. 1080/ 13803 395. 2010. 493149 Shatz, I. (2017). Fast, Free, and Targeted: Reddit as a source for recruit- Zentner, M., & Gingras, B. (2019). The assessment of musical ability ing participants online. Social Science Computer Review, 35(4), and its determinants. In P. J. Rentfrow & D. J. Levitin (Eds.), 537–549. https:// doi. org/ 10. 1177/ 08944 39316 650163 Foundations in music psychology: Theory and research (pp. Smith, G. T., McCarthy, D. M., & Anderson, K. G. (2000). On the 641–683). The MIT Press. sins of short-form development. Psychological Assessment, 12(1), Zentner, M., & Strauss, H. (2017). Assessing musical ability quickly 102–111. https:// doi. org/ 10. 1037/ 1040- 3590. 12.1. 102 and objectively: Development and validation of the Short- Sun, R. R., Wang, Y., Fast, A., Dutka, C., Cadogan, K., Burton, L., PROMS and the Mini-PROMS. Annals of the New York Academy Kubay, C., & Drachenberg, D. (2021). Influence of musical back - of Sciences, 1400(1), 33–45. https://doi. or g/10. 1111/ n yas.13410 ground on surgical skills acquisition. Surgery, 170(1), 75–80. https:// doi. org/ 10. 1016/j. surg. 2021. 01. 013 Open Practices Statement The data and materials for all studies are Swaminathan, S., Kragness, H. E., & Schellenberg, E. G. (2021). The available at https:// osf. io/ au6m5/. None of the studies was preregistered. Musical Ear Test: norms and correlates from a large sample of Canadian undergraduates. Behavior Research Methods, 53(5), Publisher’s note Springer Nature remains neutral with regard to 2007–2024. https:// doi. org/ 10. 3758/ s13428- 020- 01528-8 jurisdictional claims in published maps and institutional affiliations. 1 3 http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Behavior Research Methods Springer Journals

Online assessment of musical ability in 10 minutes: Development and validation of the Micro-PROMS

Loading next page...
 
/lp/springer-journals/online-assessment-of-musical-ability-in-10-minutes-development-and-C04127PcSP

References (45)

Publisher
Springer Journals
Copyright
Copyright © The Author(s) 2023
eISSN
1554-3528
DOI
10.3758/s13428-023-02130-4
Publisher site
See Article on Publisher Site

Abstract

We describe the development and validation of a test battery to assess musical ability that taps into a broad range of music perception skills and can be administered in 10 minutes or less. In Study 1, we derived four very brief versions from the Profile of Music Perception Skills (PROMS) and examined their properties in a sample of 280 participants. In Study 2 (N = 109), we administered the version retained from Study 1—termed Micro-PROMS—with the full-length PROMS, finding a short-to-long-form correlation of r = .72. In Study 3 (N = 198), we removed redundant trials and examined test–retest reliability as well as convergent, discriminant, and criterion validity. Results showed adequate internal consistency (  = .73) and test–retest reliability (ICC = .83). Findings supported convergent validity of the Micro-PROMS (r = .59 with the MET, p < .01) as well as discriminant validity with short-term and working memory (r ≲ .20). Criterion-related validity was evidenced by significant correlations of the Micro-PROMS with external indicators of musical proficiency ( r = .37, ps < .01), and with Gold-MSI General Musical Sophistication (r = .51, p<.01). In virtue of its brevity, psychometric qualities, and suitability for online administration, the battery fills a gap in the tools available to objectively assess musical ability. Keywords Assessment · Music perception · Musicians · Musical ability · Musical aptitude · Psychometrics Introduction The growth of studies on associations between musical capacities and mental or neural functioning has revived Interest in musical ability has grown continuously over the interest in the creation of tools for the objective measure- past two decades (Zentner & Strauss, 2017). One reason ment of musical abilities (Zentner & Gingras, 2019). This for this development is the increasing corpus of studies development is driven by several factors. First, batteries indicating that musical ability is associated with a range for the assessment of musical abilities can be helpful in of nonmusical abilities. This includes not only general particularizing the musical skills involved in nonmusi- auditory skills (Grassi et al., 2017), but also reading, pho- cal ability or impairment. For example, dyslexia appears nological awareness, second language abilities, memory, to be primarily related to impairments in the perception executive functions, socio-emotional skills, and motor and reproduction of rhythm, rather than to other musi- skills (see Thaut & Hodges, 2018; Sala & Gobet, 2020, cal deficits (Bégel et  al., 2022; Boll-Avetisyan et  al., for an overview of relevant studies). Beyond their rele- 2020). Second, the ever-growing body of evidence on vance for basic research, a deeper understanding of these the benefits of music training often remains unclear as to associations also holds the potential to inform treatment whether outcomes are due to the music training itself or approaches to various conditions, such as dyslexia, demen- to preexisting individual differences in musical ability. tia, or autism spectrum disorder (e.g., Boll-Avetisyan Music ability tests can help disambiguate such findings. et al., 2020; Brancatisano et al., 2020; Lam et al., 2020; Third, even when outcomes can be attributed to musi- Marquez-Garcia et al., 2022). cal intervention, it may remain unclear as to whether the outcome was driven by an improvement in musical skills or by nonmusical features of the intervention. In all these * Marcel Zentner cases, objectively assessed musical skills can help rule marcel.zentner@uibk.ac.at out alternative explanations and enrich interpretation of Department of Psychology, University of Innsbruck, the findings. Innsbruck, Austria Vol.:(0123456789) 1 3 Behavior Research Methods Although batteries for the assessment of musical abilities Table 1 Overview of recent musical aptitude tests assessing general have been available for a century, in recent times, musical musical perception ability proficiency has more often been indirectly inferred from Name of test Year Subtest domain Trials musicianship status rather than measured directly by stand- MET 2010 M, R 104 ardized musical ability tests (Zentner & Gingras, 2019). One SMDT 2014 M, R, P 63 reason for this practice is that musical aptitude batteries cre- PROMS 2012 A, ER, L, M, P, R, TB, TE, TU 162 ated in the past century were primarily designed for use in PROMS-S 2017 A, ER, M, P, R, TB, TE, TU 68 educational contexts, such as for determining which children Mini-PROMS 2017 A, M, TE, TU 36 might be best suited for admission to music schools, learning an instrument, or playing in a band. Another reason relates to Note. Information compiled from Zentner and Gingras (2019). shortcomings of earlier batteries, such as outdated, unwieldy A Accent, ER Embedded Rhythms, L Loudness, M Melody, MET formats; deficiencies in stimulus design and control; and Musical Ear Test, Mini-PROMS abridged version of the Profile of Music Perception Skills, P Pitch, PROMS Profile of Music Perception gaps in psychometric evaluation and documentation (see Skills, PROMS-S Profile of Music Perception Skills-Short, R Rhythm, Zentner & Gingras, 2019). SMDT Swedish Musical Discrimination Test, TB Timbre, TE Tempo, In recognition of these limitations, over the past decade, TU Tuning. investigators have developed a number of musical aptitude A detailed description and illustration of the stimulus material used tests with better psychometric, acoustic, and practical prop- in the PROMS subtests is provided in Law and Zentner (2012). erties (Zentner & Gingras, 2019). These advantages have led to increased use of musical ability batteries in domains that are concerned with music, notably psychology and neurosci- especially when responses are scored and recorded auto- ence. Although this development is important in bolstering matically on the hosting platform. Finally, a test that can the credibility of relevant research findings, most current be administered online has great versatility as it can be batteries assess a relatively narrow range of musical skills, administered in such diverse environments as a school, under usually skills in melody and rhythm discrimination (see individual testing conditions in a laboratory, or on adult par- Zentner & Gingras, 2019, for an overview). Because musi- ticipants’ personal computers outside the laboratory. These cal ability encompasses aspects beyond melody or rhythm benefits became particularly obvious during the 2020–2022 perception, such batteries may have limited content valid- pandemic, when in-person or laboratory testing was difficult ity. The Profile of Music Perception Skills (PROMS) was or even impossible in many parts of the world. devised to assess a broader range of musical discrimination Online implementation and delivery of the PROMS skills, including those in the domains of timbre, tempo, tun- works through LimeSur vey (LimeSurvey GmbH n.d). ing, and nonmetric rhythm perception. LimeSurvey is a powerful, open-source survey web applica- The PROMS exists in several versions that have all been tion that provides a great deal of flexibility for customizing shown to be both valid and reliable (Law & Zentner, 2012; online assessments by embedding JavaScript. This allows Zentner & Gingras, 2019; Zentner & Strauss, 2017). Recent us, for example, to determine users’ operating systems and evidence includes demonstrations of large differences browsers in order to adapt the presentation of trials to user between musicians and nonmusicians on Mini-PROMS test specifications, which minimizes risk of technical errors scores (e.g., Sun et al., 2021; Vanden Bosch der Nederlanden and ensures a stable test delivery environment. Researchers et al., 2020), as well significant associations of PROMS-S interested in using the PROMS for research purposes receive scores with brain activation patterns involved in music pro- a research account that provides access to their own use of cessing (Rajan et al., 2021). Table 1 provides an overview the PROMS. of key characteristics of these versions, along with other Although one of the strengths of the PROMS is that it recently developed musical ability tests. A more detailed allows assessment of music perception skills that are miss- overview of musical ability tests for general and special ing from other music aptitude batteries (e.g., discrimination populations is provided in Table S1. skills in the domains of timbre, tuning, or tempo), sometimes A distinctive feature of the PROMS is that it has been optimized for online administration (Zentner & Strauss, The account offers researchers the possibility to tailor the online 2017). This development has been motivated by the advan- assessment to their needs, for example, by adding questionnaires that tages offered by web-based data collection, including the are of interest to the researcher. Researchers also have the option of ability to (a) reach more diverse samples, as well as rare or providing participants feedback on their results at the end of the test. Responses can be accessed by researchers at any time and down- specific subpopulations; (b) recruit a larger number of par - loaded in multiple formats, including csv or SPSS files. The data are ticipants, who provide higher statistical power; (c) conduct securely stored on university servers. The battery can be delivered in cross-cultural studies without significant recruiting chal- several languages. For more information, see: https:// musem ap. org/ lenges; and (d) run studies more quickly and inexpensively, resou rces/ proms/ proms- user- guide 1 3 Behavior Research Methods investigators are primarily interested in obtaining an over- Overview of studies all summative score of musical ability. This is likely to be the case, for instance, when musical aptitude is included In light of the preceding review and considerations, the as a control variable; when it is included as a secondary overarching objective of the current research was to derive variable; when changes in performance over time need to a battery from the full-length PROMS that could (a) be be assessed for musical abilities overall rather than domain administered online in no more than 10 min, (b) retain the by domain; or when musical ability needs to be assessed for broadest possible range of musical dimensions characteristic screening purposes. Furthermore, most current music apti- of the original PROMS, and (c) provide a valid and reliable tude batteries take at least 15 min to complete. When time score of overall perceptual musical ability. To this end, we with participants is limited, or when examining populations conducted three studies. In Study 1, we screened trials from with limited attentional resources such as children or special the full-length PROMS for inclusion in four very brief ver- populations, researchers may find that a battery that takes sions of the PROMS and evaluated their properties regarding even 15 min is too long. brevity, difficulty, reliability, and validity. In Study 2, the For these reasons, we sought to devise a musical test bat- version providing the best trade-off between these proper - tery that could be administered in no more than 10 min, all ties—termed Micro-PROMS—was administered with the while retaining the broad range of musical dimensions that full-length PROMS to examine short-to-long version cor- is distinctive of the PROMS. The development of short ver- relations and compare key psychometric properties of the sions of test batteries presents some challenges, however. two versions. In Study 3, we examined test–retest reliabil- First, although test trials or trials of a short version are usu- ity, convergent validity of the battery with the Musical Ear ally all included in the full-length form, one cannot assume Test, discriminant validity against short-term and working that the reliability and validity evidence of the full-length memory, and criterion validity with multiple separate indica- form automatically extends to the abbreviated form (Smith tors of musical proficiency. et al., 2000). As a consequence, it is essential to establish the reliability and validity of the new measure independently. Second, the examination of associations between the full- length and the abbreviated version requires the two versions Study 1 to be administered separately to avoid inflated estimates resulting from correlated error variance. Method Third, if the full-length version of a test has a multidimen- sional structure, the content validity of the short version’s Participants overall score is contingent on preserving the diversity of the domains of the long version. Such preservation is under- Participants were 280 students (174 female, 106 male, 0 mined, for example, if item selection for the short version is other) from the University of Innsbruck, aged 18 to 69 years one-sidedly based on statistical criteria, such as maximizing (M = 24.35, SD = 7.70, Mdn = 22). Six (2.1%) partici- internal consistency. This can lead to overrepresentation of pants considered themselves to be professional musicians, items from particular subscales that correlate more highly 43 (15.4%) semiprofessional musicians, 142 (50.7%) ama- with the overall score than do items from other subscales, teur musicians, 81 (15.4%) music-loving nonmusicians, and thereby reducing content validity (Kruyen et al., 2013; Smith eight (2.9%) nonmusicians. Of those classified as amateur et al., 2000). Thus, in shortening a test, researchers need musician or above (n = 191), 150 reported that they were to balance reliability and validity criteria, ideally attaining still practicing their instrument regularly, corresponding to a satisfactory reliability without sacrificing validity. proportion of 54.6% of musically active participants against Fourth, because of these problems of construct under- 45.4% of either nonactive amateurs or nonmusicians. or misrepresentation, extensive validation is particularly important in the case of abbreviated tests. This includes Creation of the Micro‑PROMS demonstrations of convergent validity, criterion validity, and discriminant validity. In the current case, this means that the Trials for the Micro-PROMS were taken from all subtests instrument should show high correlation with other musical of the full-length PROMS with the exception of the Loud- aptitude tests whose validity has already been established, ness subtest, which had already been removed from previ- and it should exhibit significant correlations with external ous PROMS versions because of its weak correlations with indicators of musical proficiency. At the same time, the test other subtests (Zentner & Strauss, 2017), and the Embedded should not be unduly related to generic, nonmusical skills Rhythm subtest, which would have required special instruc- of audition and cognition that a musical aptitude test might tions and practice trials, making the test longer than desired. nonetheless inadvertently tax if not carefully designed. 1 3 Behavior Research Methods The selection of trials was based on data from a sample to affect the performance. Participants are asked to indicate of 667 participants, cumulated over seven different studies whether reference and comparison are the same or different conducted both in the laboratory and remotely online. Trials by selecting one of five answer options: “Definitely same,” were retained for inclusion according to general principles of “Probably same,” “Probably different,” “ Definitely differ - item analysis, notably item difficulty, skewness, item-to-total ent,” and “I don’t know.” The distinction between “prob- correlation, and test–retest performance of individual trials. ably” and “definitely” was introduced in line with signal In this selection process, statistical criteria (e.g., high item- detection theory (Macmillan & Creelman, 2005; see also to-total correlations) were weighed against validity require- Hautus et al., 2021) to account for confidence level, whereas ments (e.g., representation of the broadest possible range of the “I don’t know” option was provided to reduce guessing. musical dimensions of the PROMS) so as to achieve a good For each correct answer, participants received 1 point for balance between reliability and validity. high confidence ratings (i.e., “Definitely same/different”) and Unlike previous PROMS versions, in which instructions 0.5 points for lower confidence ratings (i.e., “Probably same/ and practice trials are specific to each of the subtests, we different”). Wrong answers and “I don’t know” answers were aimed at including only one general instruction and one set scored with 0 points. of practice trials to make the test more time-effective. The As a commonly used paradigm in perceptual discrimi- use of a single instruction also made it possible to present nation tasks, confidence ratings allow for finer sensory trials either in fixed sequence, wherein trials belonging to judgments compared to the traditional binary forced choice the same subtest were presented in successive order and the (Mamassian, 2020). Although there is evidence suggesting order of subtests was also fixed, or in random sequence, that answer format using confidence ratings rather than a in which trials from any subtest could be preceded or fol- yes/no answer format only affects results in terms of percent lowed by trials from any other subtest. In consideration of correct responses, and not in terms of d  or association with these aspects, we created four different trial sets, described criteria (Goshorn & Goshorn, 2001), recently developed in detail in the next section. methods can fully account for confidence ratings (Aujla, 2022). One such measure is Vokey’s (2016) d , which is Measures derived from fitting receiver operating characteristic (ROC) curves via principal component analysis and will be reported Micro‑PROMS The four versions of the new PROMS had here as a more robust alternative to traditional sensitivity the following characteristics: Version 1 included four tri- measures such as d  or d . Some participants achieved hit als from five subtests of the full-length PROMS (Melody, and false-alarm rates of 0 or 1, indicating that they correctly Tuning, Accent, Rhythm, and Timbre), resulting in 20 trials identified either all or none of the stimuli of a given class that were presented in fixed order. Version 2 was identical to (i.e., same or different correct) and confidence level. In line Version 1, except that trials were presented in random order. with the specialized literature, these values were adjusted Versions 3 and 4 included three to five trials from seven using the 1/(2N) rule (Hautus, 1995; Macmillan & Kaplan, subtests of the full-length PROMS (the five subtests of Ver - 1985). sions 1 and 2, plus trials from the Pitch and Tempo subtests), We should note that throughout the analyses reported resulting in 23 trials that were presented in either fixed order in this research, using raw scores, d scores, or alternative (Version 3) or in random order (Version 4). For the fixed d-prime measures such d  (Macmillan &  Kaplan, 1985) led order versions, subtest order was balanced by alternating to very similar findings and the same conclusions. Hence, structural and sensory subtests (see Law & Zentner, 2012). we always report d in the descriptive sections of the results For Version 1, trials grouped by subtest were presented in but use raw scores for the correlational analyses for easier the following order: Melody, Timbre, Accent, Tuning, and interpretation. The code for computing sensitivity estimates Rhythm. For Version 2, the respective order was: Melody, for the Micro-PROMS can be found under the OSF link pro- Timbre, Tempo, Tuning, Rhythm, Pitch, and Accent. The vided at the end. characteristics of the four versions are summarized in the first three rows of Table  2. As with all previous PROMS versions (see Law & Zent- A small minority of participants (5.7%) only provided definitely ner, 2012; Zentner & Strauss, 2017), participants are pre- same or definitely different answers. Due to the absence of less than sented a reference stimulus twice, separated by an inter- fully confident answers, d values could not be computed and were replaced with d-prime values computed from the individual hit and stimulus interval of 1.5 sec, followed by the comparison false-alarm rates of the collapsed rating categories using the same- stimulus after an interval of 2.5 sec. The reference stimulus different paradigm from the psyphy-package (v 0.2-3; Knoblauch, was presented twice to facilitate its encoding, thereby leav- 2022; see also Hautus, 1995). Removing these participants did not ing less room for individual differences in memory capacity change the results. 1 3 Behavior Research Methods Table 2 Descriptive summaries of test performance for the four initial Micro-PROMS versions Features Version 1 2 3 4 Order of trials Fixed Random Fixed Random Number of trials 20 20 23 23 Subtests M, TU, A, R, TB M, TU, A, R, TB, P, TE Number of participants 74 72 60 74 Mean test duration in min (SD) 9.39 (2.11) 9.48 (1.77) 10.11 (1.44) 10.41 (1.31) Mean total score (SD) 14.52 (2.73) 14.86 (2.32) 15.23 (2.62) 15.39 (2.53) Mean d (SD) 1.85 (0.75) 1.94 (0.78) 1.77 (0.77) 1.77 (0.68) Mean item-to-total correlation .37 .32 .33 .32 Omega (ω) .73 .64 .63 .61 % of trials answered correctly 79.5% 81.1% 74.9% 78.9% Correlation with music training .53* .43* .43* .39* Note. PROMS Profile of Music Perception Skills, M Melody, TU Tuning, A Accent, R Rhythm, TB Timbre, P Pitch, TE Tempo. Correct answers combined regardless of confidence level. * All ps < .01. The instructions asked participants to take the test in a Music training As in earlier studies that examined the PROMS (Law & Zentner, 2012; Zentner & Strauss, 2017), quiet environment and to use headphones. After answer- ing questions about their sociodemographic background information on participants’ musical proficiency was assessed with multiple indicators: (a) self-rated level of and music training, a script randomly assigned partici- pants to one of the four Micro-PROMS versions. Each musicianship (0 = nonmusician; 1 = music-loving nonmu- sician; 2 = amateur musician; 3 = semiprofessional musi- version was preceded by a general instruction page, a sound calibration page to set the volume, and a practice cian; 4 = professional musician; (b) musical activity (0 = not playing an instrument or singing; 1 = used to play an trial to familiarize participants with the tasks. On average, the full sessions took 14.5 min to complete (SD = 4.2, instrument or sing but no longer practicing; 2 = regularly practicing an instrument or singing for 0–5 hours a week; Mdn = 13.9), whereas the duration of the battery itself was slightly under 10 min on average (Mdn = 9.8). 3 = regularly practicing an instrument or singing for 6–12 hours a week; 4 = regularly practicing an instrument or sing- In light of the exploratory and preliminary nature of the study, two criteria were used to determine sample size. For ing for 13–30 hours a week; 5 = regularly practicing an instrument or singing for more than 30 hours a week); and item analyses, we sought to achieve a case-to-trial ratio of 3:1. For preliminary evidence regarding validity, we (c) participants’ years of music training in relation to their age. Because the three variables were internally consistent determined the sample size based on previous correlations between PROMS scores and external indicators of music (ω = .83), they were z-transformed and the mean across the three variables used as a composite index reflecting individ- training, which were in the range of .35–.50 (Law & Zent- ner, 2012; Zentner & Strauss, 2017). Analyses using the uals’ extent of music training and music-making, henceforth termed “music training” for the sake of brevity. R package pwr (version 1.3-0; Champely, 2020) indicated that a total of N = 244 (n = 61 per group) would suffice for Procedure the present purposes. The test was administered autonomously online, hosted on Results and discussion LimeSurvey (version 2.64.1+; LimeSurvey GmbH n.d). By embedding a JavaScript code, users’ operating systems A summary of the main characteristics and outcomes of each of the four test versions is presented in Table 2. Regarding and browsers are recognized and test delivery is adapted accordingly. This ensures stable test delivery regardless of internal consistency, the version in which trials from five PROMS subtests were presented in fixed order performed the device or browser being used, and it also greatly reduces susceptibility to technical problems, such as delayed stimu- best (Version 1), followed by Version 2, 3, and 4. In light of our goal to retain as many subtests of the original PROMS lus delivery. The tool also allows researchers to track test- taking times for each of the test components. as possible to ensure content validity, we retained Version 3 1 3 Behavior Research Methods for further development (henceforth termed Micro-PROMS). a better balance of subtests with differing number of tri - We accepted the slightly lower internal consistency in als. Subtests were presented in the following order: Melody, view of the fact that some lesser-performing trials could Tuning, Timbre, Tempo, Rhythm, Pitch, and Accent. be replaced in subsequent studies to attain higher internal consistency. Full‑PROMS The PROMS includes nine subtests with 18 trials each for a total of 162 trials. In the original publica- tion, internal consistency of the total score was ω = .95, Study 2 test-retest was ICC = .88,  and,  on average participants had scored 109.60 points (SD = 17.88, see Law & Zentner, Study 2 had three main objectives. First, we aimed at 2012). improving the psychometric properties of the version retained from Study 1 by replacing trials that had performed Music‑Mindedness Questionnaire The Music-Mindedness least well. Second, we examined the extent to which this Questionnaire (MMQ, Zentner & Strauss, 2017) comprises very brief version would be correlated with the full-length a four-item Music Competence scale (e.g., “I can tell when 162-trial version of the PROMS (Full-PROMS). Third, we an instrument is out of tune”) and a four-item Music Appre- sought to broaden the empirical basis for evaluating the new ciation scale (e.g., “Musical experiences belong to the most instrument’s psychometric properties by gathering additional precious experiences in my life”), each rated on a five-point information about its validity. To this end, we administered scale (1 = “not at all”; 5 = “very much”). In the current a questionnaire designed to capture musical competence and study, we used a slightly longer version of the MMQ, com- music appreciation, expecting the Micro-PROMS to exhibit prising seven items per scale. Internal consistency in the higher correlations with the competence than with the appre- current study was ω = .90 for the Music Competence scale ciation part of the questionnaire. To achieve these aims, we and ω = .87 for the Music Appreciation scale. administered both versions of the PROMS and the question- naire to a new sample of listeners. Music training Questions and coding regarding the partici- pants’ music training and background were the same as those Method described in Study 1. The internal consistency of the music training composite score was ω = .86. Participants Procedure Participants were 109 psychology students of the University of Innsbruck (36 male, 73 female, 0 other) aged 18–52 years The procedure was similar to that of Study 1. Participants (M = 23.23, SD = 5.37, Mdn = 22), who completed both the were first administered the Micro-PROMS, followed by the Micro-PROMS and the Full-PROMS and received course MMQ and the Full-PROMS. Informed consent was obtained credit for participation. None of the participants considered from all participants before assessment. We expected the themselves to be professional musicians, eight (7.3%) con- correlation between the short and the long version to be r sidered themselves to be semiprofessional musicians, 34 ≳ .50 and validity correlations with external indicators of (31.2%) amateur musicians, 56 (51.4%) music-loving non- musical proficiency to be r ≳. 30 (see Table 2). Power was musicians, and 11 (10.1%) nonmusicians. Of those classified again estimated using the R package pwr (version 1.3-0; as amateur musicians or above (n = 42), 25 reported that Champely, 2020). Results showed that to detect effects of they still practiced regularly, corresponding to a proportion this size with a power of .80 (α = .05), a sample of at least of 22.9% musically active participants against 77.1% partici- 84 participants would be required. Differences in correla - pants that were either musically nonactive or nonmusicians. tion coefficients were examined using the R package cocor (Diedenhofen & Musch, 2015), using Dunn and Clark’s z Measures (1969). Sensitivity values were computed using the same approach as in Study 1. Micro‑PROMS To rectify some of the psychometric inad- equacies of the retained version, we removed eight trials because of their low difficulty (% correct > 90%) and com- paratively low item-to-total correlations. The trials were The “Loudness” subtest exhibited low correlations with all other replaced by four trials from Version 1 examined in Study 1 PROMS subtests (Law & Zentner, 2012). It was therefore removed in and by one trial taken from the original Full-PROMS, all of the short version of full-PROMS, the PROMS-Short (see Zentner & which had performed well psychometrically. Also, subtest Strauss, 2017). In the current study, it was administered for the sake order was altered slightly from the one in Study 1 to achieve of completeness, but not considered in subtest-level analyses. 1 3 Behavior Research Methods were remarkably similar to those of the longer version. Of Table 3 Side-by-side comparison of descriptive statistics and psycho- metric properties of Micro-PROMS and Full-PROMS particular note was the increase in internal reliability to ω = .75 compared to the version retained from Study 1 (ω = Key metrics Micro-PROMS Full-PROMS .63). We found neither any signic fi ant die ff rences for gender, Number of trials 20 162 nor any significant correlations between the PROMS total Mean item-to-total correlation r = .40 r = .32 scores and age. Reliability ω = .75 ω = .95 Mean test duration in min (SD) 10.92 (3.07) 60.55 (13.35) Preliminary evidence for validity Mean raw score (SD) 11.96 (3.20) 105.16 (19.09) Mean % of trials answered 68.4% 73.8% To obtain initial validity information, we examined asso- correctly ciations of the Micro-PROMS total score with the Music Mean skewness (SD) −0.47 (1.00) −0.97 (2.00) Competence and Music Appreciation subscales of the Mean kurtosis (SD) −0.46 (1.49) 3.51 (15.07) MMQ. Because the PROMS measures musical ability rather Mean d (SD) 1.23 (0.77) 1.57 (0.44) than music appreciation, we expected test scores to be more MMQ Music Competence r = .62** r = .60** strongly related to the MMQ-Competence scale than to the MMQ Music Appreciation r = .31** r = .24* MMQ-Appreciation scale. As anticipated, the correlation Music training composite r = .47** r = .42** of the Micro-PROMS with MMQ-Competence was signifi- cantly higher than with MMQ-Appreciation (r = .62 vs. r = Note. N = 109. PROMS Profile of Music Perception Skills, MMQ Music-Mindedness Questionnaire. .31, z = 3.71, p < .001). Correct answers combined regardless of confidence level. To particularize the unique variance associated with *p < .05. **p < .01. each of the scales, we ran a multiple linear regression with the Micro-PROMS score as the outcome variable and the two MMQ-scales as predictors. When both variables were Results entered simultaneously, only the competence scale main- tained a significant association with the Micro-PROMS (β Descriptive statistics and psychometric properties = 0.60, 95% CI [0.43, 0.77], p < .001), whereas the appre- ciation scale ceased to explain variance in Micro-PROMS The correlation between the Full-PROMS total score and the test scores (β = 0.04, 95% CI [−0.13, 0.21], p = .639). In further support of validity, the Micro-PROMS was strongly Micro-PROMS was r = .72, p < .001 (with Micro-PROMS items removed, r = .66, p < .001). An examination of cor- correlated with the music training composite score repre- senting external indicators of musical proficiency (r = .47, relations between the Micro-PROMS score and individual PROMS subtest correlations is also of interest, for the new p < .001). battery was conceived to represent the broadest possible range of musical dimensions that is distinctive of the full- Discussion length PROMS. This would not be the case, for instance, if the Micro-PROMS exhibited much stronger correlations Study 2 provided further evidence in support of the psycho- metric soundness of the Micro-PROMS. The high intercor- with certain subtests than with others. Thus, if the Micro- PROMS were to correlate highly with Timbre and Pitch but relation between the Micro-PROMS and the Full-PROMS indicates that the total score of the Micro-PROMS provides exhibit low correlations with Accent and Tempo, that would indicate that the Micro-PROMS primarily measures abilities a reasonable approximation to the total score of the full- length PROMS. This finding is noteworthy in light of the in timbre and pitch discrimination, and does not sufficiently account for discrimination abilities in the domains of rhythm drastic reduction of trials from 162 to 20 and the fact that the two versions of the battery were administered sepa- and tempo. Ideally, then, we should find that all subtests exhibit moderately strong correlations with the Micro- rately. Moreover, the changes to trials relative to the version retained from Study 1 led to satisfactory internal consist- PROMS total score. As shown by the individual PROMS subtest correlations with the Micro-PROMS total score, this ency. Validity correlations were similar to those that were obtained with the Full-PROMS. For example, the correla- was indeed the case: r = .48 (Embedded Rhythms), r = .51 (Pitch, Tempo), r = .54 (Tuning), r = .56 (Timbre), r = .58 tion between the music training composite and the Micro- PROMS was r = .47, which is not significantly different (Accent), r = .62 (Rhythm), r = .68 (Melody), all ps < .001. Table  3 provides additional information related to the from the r =.42 correlation found for the Full-PROMS (z = 0.73, p = .463). Finally, content validity was evidenced Micro-PROMS (left column) and the Full-PROMS (middle column), notably raw scores, d  scores, and internal consist- by similar-sized and substantial correlations between the Micro-PROMS and all of the subtests of the Full-PROMS. ency values. Overall, the key metrics of the Micro-PROMS 1 3 Behavior Research Methods This evidence is important in light of our goal to create an of 32 participants took part in a retest session conducted in instrument that would be capable of reflecting the broadest the laboratory after completing the full test battery. possible range of musical skills that can be assessed with the Overall, 14 participants (7.1%) considered themselves original full-length PROMS. as nonmusicians, 79 (39.9%) each as music-loving nonmu- Despite these encouraging results, not all trials performed sicians and amateur musicians, 25 (12.6%) as semiprofes- equally well, suggesting that some lesser-performing trials sional musicians, and 1 (0.5%) as professional musician. The could be removed to further shorten the battery. Moreover, majority of participants reported having either played an the case for validity was limited to correlations with par- instrument or sung at some earlier point in their lives (n = ticipants’ self-report of musical behavior and competence. 47, 23.7%) or that they were still practicing regularly (n = A stronger case for validity could be made if the predicted 87, 43.9%), compared to a third of participants (n = 64) that pattern of convergent and discriminant validity correlation reported having never been musically active in their lives. could be replicated on the basis of objective tests. Measures Study 3 Micro‑PROMS To improve the psychometric efficiency of the instrument, we removed three trials that had not added The goal of Study 3 was threefold—first, to replace and/ to reliability or validity. Furthermore, the Timbre subtest of or remove trials that had not worked well in Study 2 and the full-length PROMS had been represented by only one to re-examine the psychometric properties of the battery; trial in the version retained for Study 2. To ensure that each second, to evaluate the test–retest reliability of the battery; of the seven subtests of the original PROMS was represented and third, to expand the basis for evaluating the validity of by at least two trials, we added a trial from the Full-PROMS the battery. Convergent validity was examined against the Timbre subtest. As can be seen from Table 4, some musi- Musical Ear Test (MET), an established battery of musical cal dimensions were represented with two items, and some ability in melody and rhythm perception (Wallentin et al., with three items. This was done intentionally to achieve a 2010), whereas two composite indices of musical training balance of “sequential” and “sensory” components of musi- and expertise, including the Goldsmiths Musical Sophisti- cal perceptual ability, each represented with nine items. The cation Index (Gold-MSI), served as indicators of criterion distinction is based on factor analyses of original version validity. of the PROMS, which distinguished two PROMS factors, Discriminant validity was examined against the Digit with Accent, Rhythm, Melody loading on the sequential Span (DS) test, which measures both short-term and work- factor, and Pitch, Tempo, Timbre, Tuning on the sensory ing memory (Groth-Marnat et al., 2001). Furthermore, we factor (Law & Zentner, 2012). Subtest order was the same examined differential patterns, expecting stronger associa - as in Study 2. tions between the Micro-PROMS and scales tapping into music training or competence (e.g., MMQ-Competence; Music training Questions and coding regarding the partici- Gold-MSI perceptual abilities) than scales relating to music pants’ music training were the same as described in Stud- appreciation or engagement with music (e.g., MMQ-Appre- ies 1 and 2, comprising participants’ self-rated level of ciation; Gold-MSI emotions). musicianship, musical activity, and years of music training adjusted for age. Internal consistency was ω = .71. Method MusicM ‑ indedness Questionnaire (MMQ) As in Study 2, par- Participants ticipants completed the MMQ, which assesses music com- petence and music appreciation. Both scales were internally A total of 198 participants (77 male, 120 female, 1 other), consistent (ω = .85 and ω = .86, respectively). aged 15–59 years (M = 23.89, SD = 6.45, Mdn = 22), fully completed the Micro-PROMS as well as the MET. Of those Goldsmiths Musical Sophistication Index (Gold‑MSI) The participants, 196 (99.0%) also completed the MMQ, 165 Gold-MSI (Müllensiefen et al., 2014) is a multidimensional (83.3%) the DS test, and 105 (53.0%) the Gold-MSI. A total self-report questionnaire assessing participants’ musical sophistication on the basis of their active musical engage- ment, perceptual abilities, music training, singing abilities, A certain amount of attrition occurred for DS because test par- and emotional responses to music. It comprises 38 items and ticipants were asked to install special assessment software on their is sensitive to individual differences in music sophistication device. The Gold-MSI was added to the test battery at a second stage in both musicians and nonmusicians (Müllensiefen et al., of data collection to expand the basis for examining content valid- ity (see Procedure). 2014). In the current study, we used the German translation 1 3 Behavior Research Methods Table 4 Descriptive and psychometric information for each of the 18 Micro-PROMS items Item number Full-length Correct answer Mean SD% Correct RIT RIR ω if item is PROMS dropped 1 M2 D 0.91 0.27 93 .41 .35 .70 2 M11 D 0.76 0.40 81 .46 .35 .70 3 M12 D 0.62 0.42 74 .31 .15 .69 4 TU12 D 0.24 0.38 32 .32 .18 .72 5 TU17 S 0.85 0.30 93 .30 .19 .73 6 TU18 D 0.57 0.45 66 .46 .31 .71 7 TB14 D 0.39 0.44 47 .48 .34 .71 8 TB16 S 0.84 0.30 93 .23 .12 .73 9 TE5 D 0.81 0.36 86 .29 .16 .73 10 TE12 D 0.60 0.43 71 .53 .40 .68 11 R4 S 0.55 0.45 65 .32 .16 .72 12 R12 D 0.88 0.30 92 .41 .31 .70 13 R18 D 0.78 0.38 83 .40 .27 .69 14 P10 D 0.37 0.44 45 .52 .40 .68 15 P12 D 0.33 0.42 41 .39 .24 .72 16 A3 D 0.81 0.35 87 .43 .33 .70 17 A5 S 0.68 0.42 77 .35 .22 .71 18 A12 D 0.49 0.46 57 .33 .16 .71 Note. RIT = item-total correlation (correlation between item score and overall test score); RIR = item-rest correlation (correlation between item score and overall test score without the given item); M = Melody; TU = Tuning; TB = Timbre; TE = Tempo; R = Rhythm; P = Pitch; A = Accent. D = Different; S = Same. Item designation in the full-length PROMS. For example, the first item of the Micro-PROMS corresponds to Melody subtest item number 2 in the full-length PROMS, hence M2. Percentage of correct answers regardless of confidence level. by Schaal et al. (2014). Internal consistency for the subscales order (backward DS). Lists successively increase in length ranged from ω = .84 to ω = .89 and was ω = .91 for the and thus in difficulty. Assessment is stopped when partici- overall score of musical sophistication. pants fail to recall two consecutive lists of the same length. For the current study, the online version of the auditory Musical Ear Test (MET) The MET (Wallentin et al., 2010) DS test implemented in Inquisit 4 (Millisecond Software, comprises a Melody and a Rhythm subtest with 52 trials LLC, 2015) based on the procedure reported by Woods et al. each and takes approximately 20 min to complete (Cor- (2011) was administered. We report two estimates each for reira et al., 2022). Participants are asked to judge whether forward and backward recall: (1) total trial (TT)—the TT- two melodic or rhythmic phrases are identical or not. The score reflects the total number of both correct and incor - MET has been shown to have high internal consistency (α = rect trials presented prior to two consecutive errors at the .87) and to be significantly correlated with other measures same list length; (2) maximum length (ML)—the ML-score of musical expertise (Correira et al., 2022; Swaminathan reflects the maximum length of the list that was success- et al., 2021; Wallentin et al., 2010). For the present study, fully recalled. While the TT-score is similar to the widely an online version of the MET was implemented by the study used total correct score obtained from the Digit Span sub- authors using LimeSurvey (LimeSurvey GmbH n.d). Internal test of the Wechsler intelligence tests, it shows poorer test– consistency was ω = .73 for the Melody subtest, ω = .75 for retest reliability and convergent validity than the ML-score the Rhythm subtest, and ω = .85 for the total score. (Woods et al., 2011). DS test The DS test is a widely used tool for measuring both Procedure short-term and working memory (see Groth-Marnat et al., 2001). Participants are presented with a list of numbers for Similar to previous studies, instruments were administered a few seconds and are then asked to reproduce them from online via LimeSurvey (version 2.64.1+; LimeSurvey GmbH memory, either in the same order (forward DS) or in reverse n.d) and Inquisit 4 (Millisecond Software, LLC, 2015). 1 3 Behavior Research Methods Inquisit was used to administer the DS test and required was 4.00 and the highest score was 17.50 points. For the participants to install an app to enable remote assessment. MET, scores ranged between 51 and 98 points, with an aver- Participants were recruited via a university mailing list, as age score of 77.84 (SD = 9.42, Mdn = 79) out of 104 points. well as through the subreddit r/SampleSize (see Shatz, 2017) Mean scores and variances were similar to those reported and postings on the social media platforms Facebook and in a different online administration of the MET (Correira Instagram. Upon completion of the assessments, partici- et al., 2022), indicating that the current test conditions and pants received individual feedback on their performance on outcomes were comparable across the studies. the Micro-PROMS, the MET, and the DS test. Psychology The average sensitivity of the Micro-PROMS, based on students (n = 88, 44.4%) from the University of Innsbruck d (see Study 1, Method), was M = 1.39 (SD = 0.78; Mdn additionally received course credit for their participation. = 1.33) and M = 1.32 (SD = 0.80; Mdn = 1.28) for initial Participants were asked to complete the assessments in a and retest assessments, respectively (see Table 4 for detailed quiet environment and to use headphones to minimize pos- psychometric information for each of the 18 trials). Internal sible distractions. consistency was ω = .70 for the initial assessment and ω = Data collection took place in two stages (see Partici- .85 for the retest assessment. Test–retest reliability, com- pants). After agreeing to an informed consent statement, puted using a two-way mixed-effects model (single ratings, participants were asked to provide information relating to absolute agreement), was ICC = 0.83, 95% CI [0.68, 0.92]. sociodemographic and music background. After a sound This value is very close to the test–retest figures obtained for calibration test, in stage 1 of the data collection, measures the original version of the PROMS (Law & Zentner, 2012), were completed in the following order: (1) Micro-PROMS, as well as for the Short-PROMS and the Mini-PROMS (2) MMQ, (3) MET, (4) DS test. In stage 2 of the data col- (Zentner & Strauss, 2017). lection, the position of the Micro-PROMS, MET, and DS remained the same, and the Gold-MSI and MMQ were Convergent, discriminant, and criterion validity administered randomly in either second or fourth position of the sequence. After completion of the stage 2 assess- As shown in Table 5, the Micro-PROMS was highly and ments, participants were offered the opportunity to sign up significantly correlated with the MET and with MMQ- for a retest session against an allowance of 10 EUR. Thirty- Competence, lending support to its convergent validity. two individuals (16.2%) participated in the retest assess- Discriminant validity was evidenced by significant but ment, which took place in the laboratory. The time interval small correlations with DS forward, DS backward, and between the initial and the retest assessment was slightly Gold-MSI Emotions. Furthermore, convergent correlations over a week on average (M = 8.55, SD = 2.95, Mdn = 9). were significantly and markedly larger than discriminant From intercorrelations between the PROMS and other correlations (see Table S2). Correlations were almost iden- objective musical aptitude tests (Law & Zentner, 2012), tical when controlled for participants’ sex and age. With associations between the MET and the Gold-MSI (Cor- regard to criterion validity, the Micro-PROMS total score reira et al., 2022), and from findings of Studies 1 and 2, was significantly associated with both our music training we expected convergent and criterion correlations of r >. composite and the broader index of musicality provided 30. From the literature about associations between memory by the Gold-MSI total score. Figure 1 depicts the scores of capacity and musical aptitude, we expected these associa- the Micro-PROMS for participants with different levels of tions to fall within the range of r = .20–.30 (e.g., Kunert self-reported musicianship, the latter representing one of et  al., 2016; Swaminathan et al., 2021). As in the previ- the variables included in our music training composite (see ous two studies, power was estimated using the R package Measures in Study 1). pwr (version 1.3-0; Champely, 2020). Results showed that Overall, the pattern of convergent and discriminant valid- to detect correlations of this size with a power of .80 (α = ity correlations of the Micro-PROMS and the MET were .05), the current sample sizes were sufficient. Differences quite similar. For example, the Gold-MSI total score cor- in correlation coefficients were examined using the R pack - relations with the Micro-PROMS and the MET did not dif- age cocor (Diedenhofen & Musch, 2015), using Dunn and fer significantly (r = .52 vs. r = .49, z = 0.35, p = .723). Clark’s z (1969). Moreover, the Micro-PROMS and the MET total score each explained independent amounts of variance in Gold-MSI Results General Musical Sophistication when entered simultane- ously in a multiple regression (see Table 6). When the MET Descriptive statistics and reliability Melody and MET Rhythm subtests were entered separately, the Micro-PROMS was the sole significant predictor. As On average, participants scored 11.08 (SD = 2.85; Mdn = in previous studies, discriminant correlations of the Micro- 11) out of 18 points on the Micro-PROMS. The lowest score PROMS with memory capacity were somewhat lower than 1 3 Behavior Research Methods 1 3 Table 5 Means, standard deviations, and zero-order correlations of the Micro-PROMS and key validity measures Variable M SD n 1 2 3 4 5 6 7 8 9 10 11 12 13 14 1 Micro-PROMS 11.08 2.85 198 2 MET Melody score 38.58 5.80 198 .57** 3 MET Rhythm score 39.26 4.94 198 .45** .54** 4 MET total score 77.84 9.42 198 .59** .90** .85** a + 5 DS forward (2-error max length) 6.89 1.45 165 .16* .30** .15 .26** 6 DS backward (2-error max length) 6.15 1.47 165 .23** .25** .25** .29** .42** 7 MMQ Competence 2.99 0.86 196 .46** .42** .32** .43** .15 .12 + + 8 MMQ Appreciation 3.39 0.88 196 .18* .19** .12 .18* −.10 −.15 .48** 9 Music training composite −0.00 0.79 198 .26** .39** .16* .33** .11 .01 .75** .42** 10 Gold-MSI Active Engagement 3.68 1.10 105 .34** .15 .35** .28** −.08 −.10 .49** .69** .37** 11 Gold-MSI Perceptual Abilities 5.25 0.91 105 .45** .29** .38** .38** .06 .04 .76** .38** .59** .38** 12 Gold-MSI Singing Abilities 3.92 1.15 105 .46** .41** .25** .38** .09 .01 .66** .40** .54** .27** .60** 13 Gold-MSI Musical Training 3.69 1.36 105 .40** .49** .39** .51** .08 −.05 .79** .34** .79** .34** .61** .53** 14 Gold-MSI Emotions 5.41 0.93 105 .22* .12 .21* .18 −.31** −.27** .21* .62** .02 .51** .30** .22* .09 15 Gold-MSI General Musical Sophistication 3.99 0.98 105 .51** .46** .41** .49** .06 −.07 .86** .56** .77** .61** .75** .79** .82** .33** Note. DS = Digit Span; Gold-MSI = Goldsmiths Musical Sophistication Index; MET = Musical Ear Test; MMQ = Music-Mindedness Questionnaire; PROMS = Profile of Music Perception Skills. p < .10. *p < .05. **p < .01. If DS scores were computed using the Total Trial instead of the Maximum Length scoring, correlations with the Micro-PROMS were r = .15* and r = .14 ns. Correlations DS-forward DS-backward with the MET total score were r = .24** and r = .24**. MET-Subtest correlations were: MET-Melody with r = .26** and r = .21**; MET-Rhythm with DS-forrward DS-backward DS-forward DS-backward r = .14 ns., and with r = .21**. DS-forward DS-backward Behavior Research Methods Fig. 1 Distribution of Micro-PROMS total scores by different levels of self-reported musicianship status. Note. N = 198. Micro-PROMS scores differ as a function of self-reported musicianship level, F(4,193) = 4.47, p = .002 short-term memory and working memory, and against Table 6 Summary of multiple regression analyses predicting music sophistication from MET and Micro-PROMS scores self-report scales assessing emotional rather than ability components of musicality. The criterion validity correla- Gold-MSI General Musical Sophistication tions with the composite index of musical training was b SE 95% CI β p significant, if somewhat attenuated relative to the respec- [LL, UL] tive correlations reported in studies using the full-length (Intercept) 0.31 0.65 [−0.97, 1.59] .631 PROMS and the Mini-PROMS (Law & Zentner, 2012; Micro-PROMS 0.12 0.04 [0.04, 0.20] 0.33 .003 Zentner & Strauss, 2017), and those found in the present MET (Total score) 0.03 0.01 [0.01, 0.05] 0.29 .007 Studies 1 and 2. A possible explanation for the differences 2 2 Model Fit R = .310, R = .296, F(2, 102) = 22.87 is provided by the near absence of professional musicians adj (p < .001) in the current sample (see Fig. 1) and the resulting range (Intercept) 0.33 0.65 [−0.96, 1.62] .613 restriction in musicianship status. Still, the Micro-PROMS Micro-PROMS 0.12 0.04 [0.04, 0.20] 0.32 .003 explained significant and substantial amounts of variance MET (Melody score) 0.03 0.02 [−0.00, 0.07] 0.20 .069 in Gold-MSI General Musical Sophistication, even when MET (Rhythm score) 0.03 0.02 [−0.01, 0.06] 0.14 .187 controlling for MET scores. 2 2 Model Fit R = .310, R = .290, F(3, 101) = 15.13 adj (p < .001) Note. N = 105. Gold-MSI Goldsmiths Musical Sophistication Index, MET Musical Ear Test, PROMS Profile of Music Perception Skills. General discussion Across three studies involving over 580 participants, the those found for memory-to-MET correlations (Swaminathan current research introduced a test battery for the assess- ment of musical ability that has some distinctive features et al., 2021; Wallentin et al., 2010). relative to earlier batteries. First, it is capable of providing an overall assessment of musical ability in about 10 min, Discussion making it the shortest test battery of overall perceptual musical ability that we are aware of. Second, it takes a Taken together, these results offer solid evidence that the broad range of music perception skills into account. In addition to tasks relating to discrimination for melody and Micro-PROMS provides a reliable and valid assessment of musical ability, despite its short duration. Specifically, rhythm, it includes trials relating to discrimination skills in the domains of pitch, timbre, tuning, tempo, and accent. the instrument’s final 18-trial version proved internally consistent and also exhibited good test–retest reliability. Third, it has been devised for online administration that is easy for researchers and participants to use. In combining Importantly, the Micro-PROMS met all validity criteria for successful test performance: convergent validity with these features, the current measure goes an important step beyond previously existing measures toward meeting the the MET—a different, well-established battery of musi- cal ability—as well as with a self-report instrument relat- requirement for a tool that can be used online to identify musical ability when time is critical. ing to musical competence; discriminant validity against 1 3 Behavior Research Methods Nederlanden et al., 2020), than those reported for the MET Psychometric properties (Correira et al., 2022; Swaminathan et al., 2021; Wallentin et al., 2010; Zentner & Gingras, 2019). In the current study, Against the aim of providing an overall summative score of musical ability in a very short time, the Micro-PROMS the differences in correlations of the two instruments with memory outcomes were consistent with the earlier findings, met the psychometric criteria of successful test performance. Specifically, despite using less than 15% of the trials of the if somewhat less pronounced. The small associations between performance on the full-length PROMS, the total scores of the two instruments were quite highly correlated and exhibited similar psycho- Micro-PROMS and memory tasks could be due to two dis- tinctive aspects of the PROMS. First, trials assessing skills metric properties. Naturally, shortening a test battery to such an extent has costs. For example, the overall scores repre- in domains such as timbre, pitch, and tuning are shorter and somewhat less complex than trials assessing rhythm and sent different levels of granularity, with the Micro-PROMS functioning as a screening tool for general musical aptitude, melody perception, taxing memory capacity less as a result. Thus, Talamini et al. (2016) found that only the Melody whereas the Full-PROMS provides a very detailed profile of multiple music perception abilities. Thus, the Micro- subtest of the Mini-PROMS was substantially correlated with auditory working memory. Second, in all versions of PROMS offers no domain-specific results, with the con- sequence that specific strengths and weaknesses cannot be the PROMS, the reference stimulus is presented twice to facilitate its encoding, whereas in the MET and in other assessed as is the case with the longer forms of the PROMS. All the same, the diversity of the contents represented by the music aptitude batteries that we are aware of, the reference stimulus is presented only once. Both the shorter duration of subscales in the longer PROMS versions was preserved to some extent by including trials from nearly all subtests of the trials and the repetition of the reference stimulus seem to leave individual differences in memory skills little room the long version in the Micro-PROMS. Conceptually, then, the total score of the Micro-PROMS can be compared with to affect performance. the total score of the full-length PROMS. Empirically, the comparability was evaluated against a Implications and uses number of psychometric measures. Internal consistency was adequate, if somewhat lower than that of the full-length The Micro-PROMS will be particularly useful in situa- tions where time is critical and researchers are primar- PROMS. This was to be expected due to the Micro-PROMS’ small number of items capturing a wide range of musical ily interested in a summative, overall estimate of musical ability. This may be the case, for example, when musi- content. In terms of validity, the pattern of correlations related to convergent, discriminant, and criterion validity cal aptitude needs to be assessed as a secondary variable alongside several other constructs, when it is assessed as was comparable to that of the full-length PROMS. More spe- cifically, convergent and discriminant validity could be dem- a control variable, or when the target sample is a special population with limited attentional resources (e.g., chil- onstrated against both objective ability tests (i.e., MET, DS) and self-report scales (Gold-MSI, MMQ). Criterion validity dren, older adults, clinical samples). For the latter groups, the variations in trial content may be of additional help in correlations with composite indices of musical training and expertise were significant and sizeable. sustaining attention and concentration. Brevity can be crit- ical regardless of the target population, especially when A comparison between the Micro-PROMS and the MET revealed that the two instruments perform about equally investigators seek to obtain large and diverse samples. Thus, it has been found that the risk of dropout increases well in psychometric terms (see Correira et al., 2022; Swa- minathan et al., 2021). In terms of reliability, the MET has by up to 20% for each additional 10-min interval in web- based studies (Galesic & Bošnjak, 2009; see also Liu & slightly higher internal consistency than the Micro-PROMS, which was to be expected given the MET’s higher number Wronski, 2018). Like previous versions of the PROMS, the Micro- of trials and stronger homogeneity in trial content. Because the test–retest reliability of the MET remains to be exam- PROMS was specifically devised for online administration. The process of making the PROMS suitable for online test- ined, test–retest reliability comparisons could not be drawn between the two instruments. With regard to validity, the ing involved technical aspects, such as ensuring adaptability to variations in computer hardware, operating systems, and size of correlations between the two batteries and the Gold- MSI were very similar, and similar also relative to MET- types of browsers, as well as ensuring that participants can take the test in the absence of an experimenter by formulat- to-Gold-MSI correlations found in an earlier study with a large sample (Correira et al., 2022). In turn, the discrimi- ing instructions that are clear and easy to follow. Further- more, the settings allow researchers to provide automatically nant correlations of the Micro-PROMS with short-term and working memory have typically been lower, in the range generated feedback of results displayed at the end of the test, which can be an incentive for participation. of r ≈ .20 (e.g., Kunert et  al., 2016; Vanden Bosch der 1 3 Behavior Research Methods Although the limited control over the participants’ listen- Finally, although comparatively ample evidence for the ing environment is an understandable source of concern, battery’s convergent, discriminant, and criterion validity was research indicates that online and offline assessments of obtained in the current studies, the validation of any test bat- musical aptitude yield largely similar results (Correira et al., tery is a continuous process that will require different types 2022). In our own research, we found that PROMS key met- of independent studies to produce definite results. The pro- rics, such as internal reliability, trial difficulty, and validity cess might involve validation by examining associations with correlations, obtained in the laboratory (Law & Zentner, proximal indicators of musical behaviors, such as the ease 2012) and remotely online (Zentner & Strauss, 2017) were with which musical novices acquire skills in understand- very similar. This finding is consistent with replications of ing and/or producing music over time, or studies relating data from in-person testing by data acquired online (e.g., to distal criteria, i.e., nonmusical abilities that should none- Chetverikov & Upravitelev, 2016; Chierchia et al., 2019; theless be conceptually related to musical aptitude, such as Nussenbaum et al., 2020; Zentner & Strauss, 2017), includ- phonological awareness or vocal emotion recognition. Such ing in the auditory domain (Milne et al., 2021). information will be valuable but will take years to collect. Despite evidence suggesting that online assessments of Despite its limitations, the Micro-PROMS closes an musical ability are reliable, internet assessments will likely important gap in tools available for the assessment of musi- introduce a small amount of noise relative to in-person testing. cal ability. If a summative score of musical ability is all that Ultimately, the potential drawbacks of online testing need to researchers need, the Micro-PROMS represents an interest- be weighed against its advantages, such as the ease of reaching ing alternative to longer versions of the PROMS or to other diverse samples, rare or specific subpopulations, or large num- music aptitude batteries due to the broad array of music per- bers of participants who will in turn provide higher statistical ception skills covered by the test, its brevity, and the ease power. Sometimes there is simply no choice, as has been the with which it can be administered online. case in many parts of the world during the 2020–2022 pan- Supplementary Information The online version contains supplemen- demic. Researchers preferring to administer the Micro-PROMS tary material available at https://doi. or g/10. 3758/ s13428- 023- 02130-4 . via in-person or laboratory testing can easily do so, provided their work environments are connected to the internet. Acknowledgements We wish to thank Dr Michael Hautus and Dr Mat- thias Gondan for their helpful advice on data analyses. Authors' contributions MZ conceptualized the project. MZ, HS, and Limitations SR developed the study design. HS, SR, and MD collected the data. All authors contributed to the analyses. MZ and HS wrote the paper. Several limitations of the present investigation are note- Funding Open access funding provided by University of Innsbruck and worthy. First, although we found the Micro-PROMS to be Medical University of Innsbruck. psychometrically sound overall, additional studies are nec- essary to better establish its psychometric properties. For Data availability The data and materials for all studies are available at https:// osf. io/ au6m5/. Administering the Micro-PROMS does not example, our samples were relatively homogeneous, and it require a code, as it is freely accessible online. In order to request a is therefore necessary to examine the psychometric proper- PROMS research account please visit: https://musem ap. or g/r esour ces/ ties of the Micro-PROMS in samples of different ethnic and proms. None of the experiments was preregistered. educational backgrounds, and across different age groups. Code availability (software application or custom code) Code to com- The Micro-PROMS should be suitable for use in child and pute d-prime estimates for the Micro-PROMS is available at https:// older adult populations because of its brevity, but the extent osf. io/ au6m5/. to which this is the case remains to be determined. Second, it is important to keep in mind that the PROMS Declarations measures perceptual musical abilities. Although there is evi- Ethics approval The questionnaire and methodology for this study were dence to suggest that perceptual musical abilities are sub- approved by the Board for Ethical Issues of the University of Innsbruck stantially correlated with certain musical production skills, (No. 69/2021). such as tapping a tempo or rhythm (Dalla Bella et al., 2017; Georgi et al., 2023), current definitions of musicality encom- Consent to participate Informed consent was obtained from all indi- vidual participants included in the study. The statement was worded pass components such as abilities in the domains of perform- as follows: “All results will be processed in a way that guarantees your ing or creating music (Levitin, 2012). The moderately strong anonymity. Your participation is voluntary, and you may withdraw at correlations between performance on the PROMS and exter- any stage in the proceedings.” nal indicators of musical proficiency, such as being a musi- Consent for publication Participants were predominantly drawn from cian, are encouraging, but do not obviate the need for the a university mailing list which students can subscribe to if they are construction of test batteries that tap into a broader array of interested in participating in scientific studies conducted by members musical talents. 1 3 Behavior Research Methods of the University of Innsbruck. By subscribing, participants consent to Galesic, M., & Bošnjak, M. (2009). Effects of questionnaire length on the publication of anonymized data. participation and indicators of response quality in a web survey. Public Opinion Quarterly. Advance online publication. Georgi, M., Gingras, B., & Zentner, M. (2023). The Tapping-PROMS: Conflicts of interest/Competing interests The authors have no relevant A test for the assessment of sensorimotor rhythmic abilities. financial or nonfinancial interests to disclose. Frontiers in Psychology 13. https:// doi. org/ 10. 3389/ fpsyg.202 2. Open Access This article is licensed under a Creative Commons Goshorn, E. L., & Goshorn, J. D. (2001). Analysis of yes–no and con- Attribution 4.0 International License, which permits use, sharing, fidence-rating word recognition response formats. The Journal of adaptation, distribution and reproduction in any medium or format, the Acoustical Society of America, 110(5), 2706–2706. as long as you give appropriate credit to the original author(s) and the Grassi, M., Meneghetti, C., Toffalini, E., & Borella, E. (2017). Audi- source, provide a link to the Creative Commons licence, and indicate tory and cognitive performance in elderly musicians and non- if changes were made. The images or other third party material in this musicians. PLoS ONE, 12(11), e0187881. https:// doi. org/ 10. article are included in the article's Creative Commons licence, unless 1371/ journ al. pone. 01878 81 indicated otherwise in a credit line to the material. If material is not Groth-Marnat, G., Gallagher, R. E., Hale, J. B., & Kaplan, E. (2001). included in the article's Creative Commons licence and your intended The Wechsler Intelligence Scales. In A. S. Kaufman & N. L. use is not permitted by statutory regulation or exceeds the permitted Kaufman (Eds.), Cambridge child and adolescent psychiatry. use, you will need to obtain permission directly from the copyright Specific learning disabilities and difficulties in children and holder. To view a copy of this licence, visit http:// creat iveco mmons. adolescents: Psychological Assessment and Evaluation (pp. org/ licen ses/ by/4. 0/. 29–51). Cambridge University Press. Hautus, M. J. (1995). Corrections for extreme proportions and their biasing effects on estimated values of d ′. Behavior Research Methods, Instruments, & Computers, 27, 46–51. References Hautus, M., Macmillan, N. A., & Creelman, C. D. (2021). Detection Theory. A User's Guide. Routledge. Kruyen, P. M., Emons, W. H. M., & Sijtsma, K. (2013). On the Aujla, H. (2022). do′: Sensitivity at the optimal criterion location. shortcomings of shortened tests: a literature review. Interna- Behavior Research Methods, 1–27. https:// doi. or g/ 10. 3758/ tional Journal of Testing, 13(3), 223–248. https:// doi. org/ 10. s13428- 022- 01913-5 1080/ 15305 058. 2012. 703734 Bégel, V., Dalla Bella, S., Devignes, Q., Vandenbergue, M., Lemaî- Knoblauch, K. (2022). psyphy: Functions for Analyzing Psychophys- tre, M.-P., & Dellacherie, D. (2022). Rhythm as an independ- ical Data in R. https:// CRAN. Rproj ect. org/ packa ge= psyphy. ent determinant of developmental dyslexia. Developmental Kunert, R., Willems, R. M., & Hagoort, P. (2016). An Independ- Psychology, 58(2), 339–358. https:// doi. org/ 10. 1037/ dev00 ent psychometric evaluation of the PROMS Measure of Music Perception Skills. PLOS ONE, 11(7), e0159103. https://doi. or g/ Boll-Avetisyan, N., Bhatara, A., & Höhle, B. (2020). Processing of 10. 1371/ journ al. pone. 01591 03 rhythm in speech and music in adult dyslexia. Brain Sciences, Lam, H. L., Li, W. T. V., Laher, I., & Wong, R. Y. (2020). Effects of 10(5), 261. https:// doi. org/ 10. 3390/ brain sci10 050261 music therapy on patients with dementia—a systematic review. Brancatisano, O., Baird, A., & Thompson, W. F. (2020). Why is music Geriatrics, 5(4), 62. https://doi. or g/10. 3390/ g eriatr ics50400 62 therapeutic for neurological disorders? The Therapeutic Music Law, L. N. C., & Zentner, M. (2012). Assessing musical abilities Capacities Model. Neuroscience & Biobehavioral Reviews, 112, objectively: construction and validation of the profile of music 600–615. perception skills. PLoS ONE, 7(12), e52508. https://doi. or g/10. Champely, S. (2020). pwr: Basic functions for power analysis [Com- 1371/ journ al. pone. 00525 08 puter software]. https:// CRAN.R- proje ct. org/ packa ge= pwr Levitin, D. J. (2012). What does it mean to be musical? Neuron, Chetverikov, A., & Upravitelev, P. (2016). Online versus offline: The 73(4), 633–637. https:// doi. org/ 10. 1016/j. neuron. 2012. 01. 017 Web as a medium for response time data collection. Behavior LimeSurvey GmbH (n.d). LimeSurvey [Computer software]. Research Methods, 48(3), 1086–1099. https:// doi. org/ 10. 3758/ Liu, M., & Wronski, L. (2018). Examining completion rates in web s13428- 015- 0632-x surveys via over 25,000 real-world surveys. Social Science Chierchia, G., Fuhrmann, D., Knoll, L. J., Pi-Sunyer, B. P., Computer Review, 36(1), 116–124. h t t p s : / / d o i . o r g / 1 0 . 1 1 7 7 / Sakhardande, A. L., & Blakemore, S.-J. (2019). The matrix rea- 08944 39317 695581 soning item bank (MaRs-IB): novel, open-access abstract reason- Macmillan, N.A. & Creelman, C.D. (2005). Signal detection theory. ing items for adolescents and adults. Royal Society Open Science, 2nd edition. Erlbaum. 6(10), 190232. https:// doi. org/ 10. 1098/ rsos. 190232 Macmillan, N. A., & Kaplan, H. L. (1985). Detection theory analysis Correira, A. I., Vincenzi, M., Vanzella, P., Pinheiro, A. P., Lima, C. of group data: estimating sensitivity from average hit and false- F., & Schellenberg, E. G. (2022). Can musical ability be tested alarm rates. Psychological Bulletin, 98(1), 185–199. online? Behavior Research Methods, 54(2), 955–969. https://doi. Mamassian, P. (2020). Confidence forced-choice and other metaper - org/ 10. 3758/ s13428- 021- 01641-2 ceptual tasks. Perception, 49(6), 616–635. https:// doi. org/ 10. Dalla Bella, S., Farrugia, N., Benoit, C.-E., Begel, V., Verga, L., Hard- 1177/ 03010 06620 928010 ing, E., & Kotz, S. A. (2017). Baasta: Battery for the Assess- Marquez-Garcia, A. V., Magnuson, J., Morris, J., Iarocci, G., Does- ment of Auditory Sensorimotor and Timing Abilities. Behavior burg, S., & Moreno, S. (2022). Music therapy in autism spec- Research Methods, 49(3), 1128–1145. https:// doi. org/ 10. 3758/ trum disorder: a systematic review. Review Journal of Autism s13428- 016- 0773-6 and Developmental Disorders, 9(1), 91–107. https://doi. or g/10. Diedenhofen, B., & Musch, J. (2015). cocor: A comprehensive solution 1007/ s40489- 021- 00246-x for the statistical comparison of correlations. PLoS ONE, 10(4), Millisecond Software, LLC. (2015). Inquisit 4 [Computer software]. e0121945. https:// doi. org/ 10. 1371/ journ al. pone. 01219 45 https:// www. milli second. com Dunn, O. J., & Clark, V. (1969). Correlation coefficients measured on Milne, A. E., Bianco, R., Poole, K. C., Zhao, S., Oxenham, A. J., Bil- the same individuals. Journal of the American Statistical Associa- lig, A. J., & Chait, M. (2021). An online headphone screening tion, 64(325), 366. https:// doi. org/ 10. 2307/ 22837 46 1 3 Behavior Research Methods test based on dichotic pitch. Behavior Research Methods, 53(4), Talamini, F., Carretti, B., & Grassi, M. (2016). The working memory of 1551–1562. https:// doi. org/ 10. 3758/ s13428- 020- 01514-0 musicians and nonmusicians. Music Perception, 34(2), 183–191. Müllensiefen, D., Gingras, B., Musil, J., & Stewart, L. (2014). The https:// doi. org/ 10. 1525/ mp. 2016. 34.2. 183 musicality of non-musicians: an index for assessing musical Thaut, M., & Hodges, D. A. (Eds.). (2018). Oxford handbooks online. The sophistication in the general population. PLoS ONE, 9(2), Oxford handbook of music and the brain. Oxford University Press. e89642. https:// doi. org/ 10. 1371/ journ al. pone. 00896 42 Vanden Bosch der Nederlanden, C. M., Zaragoza, C., Rubio-Garcia, A., Nussenbaum, K., Scheuplein, M., Phaneuf, C. V., Evans, M. D., & Clarkson, E., & Snyder, J. S. (2020). Change detection in complex Hartley, C. A. (2020). Moving developmental research online: auditory scenes is predicted by auditory memory, pitch percep- comparing in-lab and web-based studies of model-based rein- tion, and years of musical training. Psychological Research, 84(3), forcement learning. Collabra. Psychology, 6(1), 17213. https:// 585–601. https:// doi. org/ 10. 1007/ s00426- 018- 1072-x doi. org/ 10. 1525/ colla bra. 17213 Vokey, J. R. (2016). Single-step simple ROC curve fitting via PCA. Rajan, A., Shah, A., Ingalhalikar, M., & Singh, N. C. (2021). Structural Canadian Journal of Experimental Psychology / Revue cana- connectivity predicts sequential processing differences in music dienne de psychologie expérimentale, 70(4), 301–305. https:// perception ability. European Journal of Neuroscience, 54(6), doi. org/ 10. 1037/ cep00 00095 6093–6103. https:// doi. org/ 10. 1111/ ejn. 15407 Wallentin, M., Nielsen, A. H., Friis-Olivarius, M., Vuust, C., & Vuust, Sala, G., & Gobet, F. (2020). Cognitive and academic benefits of music P. (2010). The Musical Ear Test, a new reliable test for measuring training with children: A multilevel meta-analysis. Memory & musical competence. Learning and Individual Differences, 20(3), Cognition, 48(8), 1429–1441. 188–196. https:// doi. org/ 10. 1016/j. lindif. 2010. 02. 004 Schaal, N. K., Bauer, A.-K. R., & Müllensiefen, D. (2014). Der Gold- Woods, D. L., Kishiyama, M. M., Yund, E. W., Herron, T. J., Edwards, MSI: Replikation und Validierung eines Fragebogeninstrumentes B., Poliva, O., Hink, R. F., & Reed, B. (2011). Improving digit zur Messung Musikalischer Erfahrenheit anhand einer deutschen span assessment of short-term verbal memory. Journal of Clinical Stichprobe. Musicae Scientiae, 18(4), 423–447. https:// doi. org/ and Experimental Neuropsychology, 33(1), 101–111. https:// doi. 10. 1177/ 10298 64914 541851org/ 10. 1080/ 13803 395. 2010. 493149 Shatz, I. (2017). Fast, Free, and Targeted: Reddit as a source for recruit- Zentner, M., & Gingras, B. (2019). The assessment of musical ability ing participants online. Social Science Computer Review, 35(4), and its determinants. In P. J. Rentfrow & D. J. Levitin (Eds.), 537–549. https:// doi. org/ 10. 1177/ 08944 39316 650163 Foundations in music psychology: Theory and research (pp. Smith, G. T., McCarthy, D. M., & Anderson, K. G. (2000). On the 641–683). The MIT Press. sins of short-form development. Psychological Assessment, 12(1), Zentner, M., & Strauss, H. (2017). Assessing musical ability quickly 102–111. https:// doi. org/ 10. 1037/ 1040- 3590. 12.1. 102 and objectively: Development and validation of the Short- Sun, R. R., Wang, Y., Fast, A., Dutka, C., Cadogan, K., Burton, L., PROMS and the Mini-PROMS. Annals of the New York Academy Kubay, C., & Drachenberg, D. (2021). Influence of musical back - of Sciences, 1400(1), 33–45. https://doi. or g/10. 1111/ n yas.13410 ground on surgical skills acquisition. Surgery, 170(1), 75–80. https:// doi. org/ 10. 1016/j. surg. 2021. 01. 013 Open Practices Statement The data and materials for all studies are Swaminathan, S., Kragness, H. E., & Schellenberg, E. G. (2021). The available at https:// osf. io/ au6m5/. None of the studies was preregistered. Musical Ear Test: norms and correlates from a large sample of Canadian undergraduates. Behavior Research Methods, 53(5), Publisher’s note Springer Nature remains neutral with regard to 2007–2024. https:// doi. org/ 10. 3758/ s13428- 020- 01528-8 jurisdictional claims in published maps and institutional affiliations. 1 3

Journal

Behavior Research MethodsSpringer Journals

Published: Mar 1, 2024

Keywords: Assessment; Music perception; Musicians; Musical ability; Musical aptitude; Psychometrics

There are no references for this article.