Myford, C. M., & Wolfe, E. W. (2003). Detecting and measuring rater effects using many-facet Rasch measurement: Part I. Journal of Applied Measurement, 4(4).
Huang, H.-Y. (2017). Mixture IRT model with a higher-order structure for latent traits. Educational and Psychological Measurement, 77.
Brooks, S. P., & Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics, 7.
Linacre, J. M. (1989). Many-facet Rasch measurement. MESA Press.
Snyder, W. (2000). An experimental investigation of syntactic satiation effects. Linguistic Inquiry, 31(3), 575–582. https://doi.org/10.1162/002438900554479
Anastasi, A. (1979). Fields of applied psychology (2nd ed.). McGraw-Hill.
Wang, W.-C., Liu, C.-W., & Wu, S.-L. (2013). The random-threshold generalized unfolding model and its application to computerized adaptive testing. Applied Psychological Measurement, 37.
Engelhard, G., & Wind, S. A. (2017). Invariant measurement with raters and rating scales: Rasch models for rater-mediated assessments. Routledge.
Kuijpers, R. E., Visser, I., & Molenaar, D. (2021). Testing the within-state distribution in mixture models for responses and response times. Journal of Educational and Behavioral Statistics, 46(3), 348–373. https://doi.org/10.3102/1076998620957240
Hecht, M., Weirich, S., Siegle, T., & Frey, A. (2015). Effects of design properties on parameter estimation in large-scale assessments. Educational and Psychological Measurement, 75(6), 1021–1044. https://doi.org/10.1177/0013164415573311
Cummings, S. T. (1954). The clinician as judge: Judgments of adjustment from Rorschach single-card performance. Journal of Consulting Psychology, 18(4), 243–247. https://doi.org/10.1037/h0061919
Sinharay, S., Johnson, M. S., & Stern, H. S. (2006). Posterior predictive assessment of item response theory models. Applied Psychological Measurement, 30(4), 298–321. https://doi.org/10.1177/0146621605285517
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Lawrence Erlbaum Associates.
(1996). Educational assessment of students (2nd ed.). Merrill.
Wolfe, E. W., & McVay, A. (2012). Application of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31.
Fanchini, A., Jongbloed, J., & Dirani, A. (2018). Examining the well-being and creativity of schoolchildren in France. Cambridge Journal of Education, 49.
Wind, S. A., & Guo, W. (2019). Exploring the combined effects of rater misfit and differential rater functioning in performance assessments. Educational and Psychological Measurement, 79.
Guo, W., & Wind, S. A. (2021). An iterative parametric bootstrap approach to evaluating rater fit. Applied Psychological Measurement, 45(5), 315–330. https://doi.org/10.1177/01466216211013105
Hung, L.-F., & Wang, W.-C. (2012). The generalized multilevel facets model for longitudinal data. Journal of Educational and Behavioral Statistics, 37(2), 231–255. https://doi.org/10.3102/1076998611402503
Wesolowski, B. C., Wind, S. A., & Engelhard, G. (2015). Rater fairness in music performance assessment: Evaluating model-data fit and differential rater functioning. Musicae Scientiae, 19(2), 147–170. https://doi.org/10.1177/1029864915589014
Myford, C. M., & Wolfe, E. W. (2004). Detecting and measuring rater effects using many-facet Rasch measurement: Part II. Journal of Applied Measurement, 5(2).
Sternberg, R. J. (2002). Raising the achievement of all students: Teaching for successful intelligence. Educational Psychology Review, 14.
Hung, S.-P., Chen, P.-H., & Chen, H.-C. (2012). Improving creativity performance assessment: A rater effect examination with many facet Rasch model. Creativity Research Journal, 24(4), 345–357. https://doi.org/10.1080/10400419.2012.730331
Wind, S. A., & Sebok-Syer, S. S. (2019). Examining differential rater functioning using a between-subgroup outfit approach. Journal of Educational Measurement, 56(2), 217–250. https://doi.org/10.1111/jedm.12198
Hernández-Torrano, D., & Ibrayeva, L. (2020). Creativity and education: A bibliometric mapping of the research literature (1975–2019). Thinking Skills and Creativity, 35.
(2019). The Cambridge handbook of creativity.
Bailin, S. (2002). Creativity in context.
Morse, B. J., Johanson, G. A., & Griffeth, R. W. (2012). Using the graded response model to control spurious interactions in moderated multiple regression. Applied Psychological Measurement, 36(2), 122–146. https://doi.org/10.1177/0146621612438725
Wang, J., & Engelhard, G. (2019). Conceptualizing rater judgments and rating processes for rater-mediated assessments. Journal of Educational Measurement, 56(3), 582–609. https://doi.org/10.1111/jedm.12226
Kruschke, J. K. (2014). Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan. Academic Press.
Spiegelhalter, D. J., Thomas, A., Best, N. G., & Lunn, D. (2003). WinBUGS version 1.4 [Computer program]. MRC Biostatistics Unit, Institute of Public Health.
Drave, N. (2011, November). Marker ‘fatigue’ and marking reliability in Hong Kong’s Language Proficiency Assessment for Teachers of English (LPATE). Paper presented at the 37th annual conference of the International Association for Educational Assessment. https://www.iaea.info/documents/paper_30171b739.pdf
DeCarlo, L. T., & Zhou, X. (2021). A latent class signal detection model for rater scoring with ordered perceptual distributions. Journal of Educational Measurement, 58(1), 31–53. https://doi.org/10.1111/jedm.12265
Huang, H.-Y., Wang, W.-C., Chen, P.-H., & Su, C.-M. (2013). Higher-order item response models for hierarchical latent traits. Applied Psychological Measurement, 37(8), 619–637. https://doi.org/10.1177/0146621613488819
Bejar, I. I. (2012). Rater cognition: Implications for validity. Educational Measurement: Issues and Practice, 31(3), 2–9. https://doi.org/10.1111/j.1745-3992.2012.00238.x
Ling, G., Mollaun, P., & Xi, X. (2014). A study on the impact of fatigue on human raters when scoring speaking responses. Language Testing, 31(4), 479–499. https://doi.org/10.1177/0265532214530699
Longford, N. T. (1994). Reliability of essay rating and score adjustment. Journal of Educational Statistics, 19(3), 171–200. https://doi.org/10.3102/10769986019003171
Fischer, G. H. (1995). Derivations of the Rasch model. In G. Fischer & I. Molenaar (Eds.), Rasch models: Foundations, recent developments, and applications (pp. 15–38). Springer-Verlag.
Israelski, E. W., & Lenoble, J. S. (1982). Rater fatigue in job analysis surveys. Proceedings of the Human Factors Society Annual Meeting, 26(1), 35–39. https://doi.org/10.1177/154193128202600110
Debeer, D., & Janssen, R. (2013). Modeling item-position effects within an IRT framework. Journal of Educational Measurement, 50(2), 164–185. https://doi.org/10.1111/jedm.12009
Xi, X., & Mollaun, P. (2009). How do raters from India perform in scoring the TOEFL iBT Speaking section and what kind of training helps? (TOEFL iBT Research Report No. TOEFLiBT-11). ETS.
Wang, W.-C., & Wilson, M. (2005a). Exploring local item dependence using a random-effects facet model. Applied Psychological Measurement, 29.
Weirich, S., Hecht, M., & Böhme, K. (2014). Modeling item position effects using generalized linear mixed models. Applied Psychological Measurement, 38(7), 535–548. https://doi.org/10.1177/0146621614534955
Wang, W.-C., & Liu, C.-Y. (2007). Formulation and application of the generalized multilevel facets model. Educational and Psychological Measurement, 67.
Hecht, M., Gische, C., Vogel, D., & Zitzmann, S. (2020). Integrating out nuisance parameters for computationally more efficient Bayesian estimation: An illustration and tutorial. Structural Equation Modeling: A Multidisciplinary Journal, 27.
Zitzmann, S., & Hecht, M. (2019). Going beyond convergence in Bayesian estimation: Why precision matters too and how to assess it. Structural Equation Modeling: A Multidisciplinary Journal, 26.
Starko, A. J. (1994). Creativity in the classroom.
Repp, A. C., Nieminen, G. S., Olinger, E., & Brusca, R. (1988). Direct observation: Factors affecting the accuracy of observers. Exceptional Children, 55(1), 29–36. https://doi.org/10.1177/001440298805500103
Huang, H.-Y. (2020). A mixture IRTree model for performance decline and nonignorable missing data. Educational and Psychological Measurement, 80.
van der Linden, W. J., Klein Entink, R. H., & Fox, J.-P. (2010). IRT parameter estimation with response times as collateral information. Applied Psychological Measurement, 34(5), 327–347. https://doi.org/10.1177/0146621609349800
Hopkins, K. D. (1998). Educational and psychological measurement and evaluation (8th ed.). Allyn and Bacon.
Meyers, J. L., Miller, G. E., & Way, W. D. (2009). Item position and item difficulty change in an IRT-based common item equating design. Applied Measurement in Education, 22(1), 38–60. https://doi.org/10.1080/08957340802558342
Engelhard, G. (2012). Invariant measurement: Using Rasch models in the social, behavioral, and health sciences. Routledge.
McCulloch, C. E., & Searle, S. R. (2001). Generalized, linear, and mixed models. John Wiley.
Wang, W.-C., & Wilson, M. (2005b). The Rasch testlet model. Applied Psychological Measurement, 29(2), 126–149. https://doi.org/10.1177/0146621604271053
Klein Entink, R. H., Fox, J.-P., & van der Linden, W. J. (2009). A multivariate multilevel approach to the modeling of accuracy and speed of test takers. Psychometrika, 74(1), 21–48. https://doi.org/10.1007/s11336-008-9075-y
Torrance, E. P. (2012). The Torrance Tests of Creative Thinking: Interpretive manual.
De Boeck, P., & Wilson, M. (Eds.). (2004). Explanatory item response models: A generalized linear and nonlinear approach. Springer.
Patz, R. J., Junker, B. W., Johnson, M. S., & Mariano, L. T. (2002). The hierarchical rater model for rated test items and its application to large-scale educational assessment data. Journal of Educational and Behavioral Statistics, 27(4), 341–384. https://doi.org/10.3102/10769986027004341
A. Gelman, X. Meng, H. Stern (1996)
POSTERIOR PREDICTIVE ASSESSMENT OF MODEL FITNESS VIA REALIZED DISCREPANCIES
( WolfeE. W. MoulderB. MyfordC. (2001). Detecting differential rater functioning over time (DRIFT) using a Rasch multi-faceted rating scale model. Journal of Applied Measurement, 2(3), 256–280.12011510)
WolfeE. W. MoulderB. MyfordC. (2001). Detecting differential rater functioning over time (DRIFT) using a Rasch multi-faceted rating scale model. Journal of Applied Measurement, 2(3), 256–280.12011510WolfeE. W. MoulderB. MyfordC. (2001). Detecting differential rater functioning over time (DRIFT) using a Rasch multi-faceted rating scale model. Journal of Applied Measurement, 2(3), 256–280.12011510, WolfeE. W. MoulderB. MyfordC. (2001). Detecting differential rater functioning over time (DRIFT) using a Rasch multi-faceted rating scale model. Journal of Applied Measurement, 2(3), 256–280.12011510
T. Eckes (2019)
Many-facet Rasch measurementQuantitative Data Analysis for Language Assessment Volume I
Stefanie Wind, S. Sebok-Syer (2019)
Examining Differential Rater Functioning Using a Between‐Subgroup Outfit ApproachJournal of Educational Measurement
( JinK.-Y. WangW.-C. (2018). A new facets model for rater’s centrality/extremity response style. Journal of Educational Measurement, 55(4), 543–563. 10.1111/jedm.12191)
JinK.-Y. WangW.-C. (2018). A new facets model for rater’s centrality/extremity response style. Journal of Educational Measurement, 55(4), 543–563. 10.1111/jedm.12191JinK.-Y. WangW.-C. (2018). A new facets model for rater’s centrality/extremity response style. Journal of Educational Measurement, 55(4), 543–563. 10.1111/jedm.12191, JinK.-Y. WangW.-C. (2018). A new facets model for rater’s centrality/extremity response style. Journal of Educational Measurement, 55(4), 543–563. 10.1111/jedm.12191
K. Jin, Wen Wang (2018)
A New Facets Model for Rater's Centrality/Extremity Response StyleJournal of Educational Measurement
Steffen Zitzmann, Sebastian Weirich, Martin Hecht (2021)
Using the Effective Sample Size as the Stopping Criterion in Markov Chain Monte Carlo with the Bayes Module in MplusPsych
( MyfordC. M. WolfeE. W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46(4), 371–389. 10.1111/j.1745-3984.2009.00088.x)
MyfordC. M. WolfeE. W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46(4), 371–389. 10.1111/j.1745-3984.2009.00088.xMyfordC. M. WolfeE. W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46(4), 371–389. 10.1111/j.1745-3984.2009.00088.x, MyfordC. M. WolfeE. W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46(4), 371–389. 10.1111/j.1745-3984.2009.00088.x
Brendan Morse, G. Johanson, R. Griffeth (2012)
Using the Graded Response Model to Control Spurious Interactions in Moderated Multiple RegressionApplied Psychological Measurement, 36
( NitkoA. J. (1996). Educational assessment of students (2nd ed.). Merrill.)
NitkoA. J. (1996). Educational assessment of students (2nd ed.). Merrill.NitkoA. J. (1996). Educational assessment of students (2nd ed.). Merrill., NitkoA. J. (1996). Educational assessment of students (2nd ed.). Merrill.
( EngelhardG. WindS. A. (2018). Invariant measurement with raters and rating scales: Rasch models for rater-mediated assessments. Routledge/Taylor & Francis Group.)
EngelhardG. WindS. A. (2018). Invariant measurement with raters and rating scales: Rasch models for rater-mediated assessments. Routledge/Taylor & Francis Group.EngelhardG. WindS. A. (2018). Invariant measurement with raters and rating scales: Rasch models for rater-mediated assessments. Routledge/Taylor & Francis Group., EngelhardG. WindS. A. (2018). Invariant measurement with raters and rating scales: Rasch models for rater-mediated assessments. Routledge/Taylor & Francis Group.
A. Gelman, J. Carlin, H. Stern, D. Dunson, Aki Vehtari, D. Rubin (2013)
Bayesian data analysis, third edition
L. DeCarlo, Xiaoliang Zhou (2020)
A Latent Class Signal Detection Model for Rater Scoring with Ordered Perceptual DistributionsJournal of Educational Measurement
G. Engelhard (1994)
Examining Rater Errors in the Assessment of Written Composition With a Many-Faceted Rasch ModelJournal of Educational Measurement, 31
P. Boeck, Sun-Joo Cho, M. Wilson (2004)
Explanatory Item Response Models
C. Myford, E. Wolfe (2009)
Monitoring Rater Performance Over Time: A Framework for Detecting Differential Accuracy and Differential Scale Category UseJournal of Educational Measurement, 46
Stefanie Wind, Yuan Ge (2021)
Detecting Rater Biases in Sparse Rater-Mediated Assessment NetworksEducational and Psychological Measurement, 81
K. Jin, Wen Wang (2014)
Item Response Theory Models for Performance Decline during Testing.Journal of Educational Measurement, 51
Rater effects are commonly observed in rater-mediated assessments. Under item response theory (IRT) modeling, raters can be treated as independent factors that function as instruments for measuring ratees. Most rater effects are static and can be addressed appropriately within an IRT framework; only a few models have been developed for dynamic rater effects. Operational rating projects often require human raters to score ratees continuously and repeatedly over an extended period, straining raters' cognitive processing and attention spans; the resulting judgment fatigue degrades rating quality over the course of the rating period. Consequently, ratees' scores may depend on where they fall in a rater's rating sequence, and this rating order effect should be accommodated in new IRT models. In this study, two types of many-faceted (MF) IRT models are developed to account for such dynamic rater effects, assuming that rater severity drifts either systematically or stochastically. Results from two simulation studies indicate that the parameters of the newly developed models can be estimated satisfactorily using Bayesian estimation, and that disregarding the rating order effect produces biased estimates of the model structure and ratee proficiency parameters. A creativity assessment is presented to demonstrate the application of the new models and to investigate the consequences of failing to detect a possible rating order effect in a real rater-mediated evaluation.
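The abstract does not give the models' equations, but the two drift mechanisms it names can be illustrated with a minimal simulation. The sketch below assumes a dichotomous many-facet Rasch form, P(X = 1) = logistic(theta − severity), where a rater's severity either grows linearly with rating order (systematic drift) or follows a random walk over the sequence (stochastic drift). All parameter values (drift slope, walk step size, sample sizes) are illustrative assumptions, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(0)

n_ratees, n_raters = 200, 5
theta = rng.normal(0.0, 1.0, n_ratees)          # ratee proficiency
base_severity = rng.normal(0.0, 0.5, n_raters)  # rater severity at t = 0

# Precomputed random walks for the stochastic-drift variant
# (one walk per rater, one step per position in the rating sequence).
walk = np.cumsum(rng.normal(0.0, 0.05, (n_raters, n_ratees)), axis=1)

def severity(r, t, kind, slope=0.01):
    """Rater r's severity at position t of their rating sequence."""
    if kind == "systematic":
        # severity drifts linearly with rating order
        return base_severity[r] + slope * t
    # stochastic drift: severity follows a random walk over t
    return base_severity[r] + walk[r, t]

def simulate(kind):
    """Simulate dichotomous scores under a drifting-severity Rasch model."""
    scores = np.zeros((n_ratees, n_raters), dtype=int)
    for r in range(n_raters):
        order = rng.permutation(n_ratees)  # rating sequence of rater r
        for t, j in enumerate(order):
            p = 1.0 / (1.0 + np.exp(-(theta[j] - severity(r, t, kind))))
            scores[j, r] = rng.random() < p
    return scores

sys_scores = simulate("systematic")
sto_scores = simulate("stochastic")
```

Under the systematic variant, ratees scored late in a rater's sequence face a harsher effective severity than those scored early, which is the rating order effect the paper argues must be modeled rather than ignored.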
Applied Psychological Measurement – SAGE
Published: Jun 1, 2023
Keywords: item response theory; rater effects; rating ordering; rater-mediated assessments