The Expanded Evidence-Centered Design (e-ECD) for Learning and Assessment Systems: A Framework for Incorporating Learning Goals and Processes Within Assessment Design

ORIGINAL RESEARCH published: 26 April 2019, doi: 10.3389/fpsyg.2019.00853

Meirav Arieli-Attali 1,2*, Sue Ward 3, Jay Thomas 3, Benjamin Deonovic 1 and Alina A. von Davier 1

1 ACTNext, ACT Inc., Iowa City, IA, United States; 2 Fordham University, New York City, NY, United States; 3 ACT Inc., Iowa City, IA, United States

Edited by: Frank Goldhammer, German Institute for International Educational Research (LG), Germany
Reviewed by: Russell G. Almond, Florida State University, United States; Gabriel Nagy, Christian-Albrechts-Universität zu Kiel, Germany
*Correspondence: Meirav Arieli-Attali, meirav.attali@act.org

Specialty section: This article was submitted to Quantitative Psychology and Measurement, a section of the journal Frontiers in Psychology.
Received: 10 December 2018; Accepted: 01 April 2019; Published: 26 April 2019
Citation: Arieli-Attali M, Ward S, Thomas J, Deonovic B and von Davier AA (2019) The Expanded Evidence-Centered Design (e-ECD) for Learning and Assessment Systems: A Framework for Incorporating Learning Goals and Processes Within Assessment Design. Front. Psychol. 10:853. doi: 10.3389/fpsyg.2019.00853

Evidence-centered design (ECD) is a framework for the design and development of assessments that ensures consideration and collection of validity evidence from the onset of the test design. Blending learning and assessment requires integrating aspects of learning at the same level of rigor as aspects of testing. In this paper, we describe an expansion to the ECD framework (termed e-ECD) such that it includes the specifications of the relevant aspects of learning at each of the three core models in the ECD, as well as making room for specifying the relationship between learning and assessment within the system. The framework proposed here does not assume a specific learning theory or particular learning goals; rather, it allows for their inclusion within an assessment framework, such that they can be articulated by researchers or assessment developers who wish to focus on learning.

Keywords: task design, technology-based assessment, blended assessment and learning, development framework, Evidence model

INTRODUCTION

There is a growing need for the development of assessments that are connected and relevant to learning and teaching, and several attempts have been made in recent years to focus on this topic in conferences and journals. For example, Mark Wilson's June and September 2016 presidential messages in the National Council on Measurement in Education's newsletter addressed classroom assessment, and this topic was also the conference theme for the following 2 years, 2017 and 2018. The journal Assessment in Education: Principles, Policy & Practice recently devoted a special issue to the link between assessment and learning (volume 24, issue 3, 2017). The issue focused on the developments in the two disciplines which, despite mutual influences, have taken distinctly separate paths over time. In recent years, systems that blend learning and assessment have been proposed all over the world (e.g., Razzaq et al., 2005; Shute et al., 2008; Feng et al., 2009b; Attali and Arieli-Attali, 2014; Straatemeier, 2014). While within the educational measurement field there are established standards and frameworks for the development of reliable and valid assessments, those rarely take learning aspects into account. As part of our own effort to develop a blended learning and assessment system, we identified a need for a formal framework of development that includes aspects of learning at the same level of detail and rigor as aspects of testing. This paper describes our general approach to expanding an assessment framework, with some examples from our system to better illustrate the abstract concepts.
Our approach to expanding a principled assessment design is primarily concerned with the inclusion of three dimensions: aspects of learning, such as the ability to incorporate the change over time in the skills to be measured at the conceptual level; aspects of interactive and digital instructional content, such as simulations, games, practice items, feedback, scaffolds, videos, and their associated affordances for data collection in rich logfiles; and measurement models for learning that synthesize the complexities of the digital instruction and data features. The expanded framework proposed here allows for the design of systems for learning that are principled, valid, and focused on the learner. Systems designed in this framework are intrinsically connected with the assessment of the skills over the time of instruction, as well as at the end, as summative tests, if so desired. This type of system has an embedded efficacy structure, so that additional tests can be incorporated within it. Learning and assessment developers, as well as researchers, can benefit from such a framework, as it requires articulating both the assessment and the learning goals at the start of the development process, and it then guides the process to ensure the validity of the end product. The framework proposed here does not assume a specific learning theory or particular learning goals; rather, it allows for their inclusion within the assessment framework. The measurement perspective, combined with the learning sciences perspective in the development of content, provides a new and significant shift in the modern development of learning and assessment systems.

We chose to expand the well-known evidence-centered design framework (ECD; Mislevy et al., 1999, 2003, 2006). The ECD formulates the process of test development to ensure consideration and collection of validity evidence from the onset of the test design. The ECD is built on the premise that a test is a measurement instrument with which specific claims about the test scores are associated, and that a good test is a good match between the test items and the test takers' skills. The ECD framework defines several interconnected models, three of which form the core of the framework and are relevant to our discussion: the Student model(s), Evidence model(s), and Task model(s) (the combination of the three models is also called the Conceptual Assessment Framework, CAF; see Figure 1). Note that in more recent publications of the ECD, the Student model is termed a Proficiency model (e.g., Almond et al., 2015).

The Student or Proficiency model(s) specifies the knowledge, skills, and abilities (KSAs; which are latent competencies) that are the target of the test. This model can be as simple as defining one skill (e.g., the ability θ) or a map of interconnected subskills (e.g., fraction addition, subtraction, multiplication, and division are interconnected subskills that form the map of knowing fractions). The latent competencies that are articulated and defined in this model establish the conceptual basis of the system, and they are often based on a theory or previous findings related to the goal of the assessment.
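To make the notion of a proficiency map concrete, the following minimal sketch (in Python) encodes the fractions example as a small graph of subskill nodes. The class name and the particular prerequisite links are illustrative assumptions for the sketch, not a structure prescribed by ECD.

```python
# A minimal sketch of a KSA (Student/Proficiency) map as a graph of subskills.
# The fraction subskills follow the example in the text; the specific links
# between them are illustrative assumptions, not a prescribed structure.
from dataclasses import dataclass, field


@dataclass
class KSANode:
    name: str                                           # latent competency (subskill)
    prerequisites: list = field(default_factory=list)   # links to other nodes


def build_fractions_ksa_map():
    """Return a dict of node name -> KSANode for the 'knowing fractions' map."""
    addition = KSANode("fraction addition")
    subtraction = KSANode("fraction subtraction", prerequisites=["fraction addition"])
    multiplication = KSANode("fraction multiplication")
    division = KSANode("fraction division", prerequisites=["fraction multiplication"])
    return {n.name: n for n in (addition, subtraction, multiplication, division)}


if __name__ == "__main__":
    ksa_map = build_fractions_ksa_map()
    for name, node in ksa_map.items():
        print(name, "<-", node.prerequisites)
```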
Since we cannot tap directly into the latent competencies, we need to design tasks/test items such that they will elicit behaviors that can reflect on, or indicate, the latent competencies. This is the role of the Task model(s). The Task model specifies the task features that are supposed to elicit the observables, and only them, so as to allow inferences about the latent competencies. For example, if the assessment is intended to measure "knowledge of operating with fractions," the tasks should be designed with care such that reading ability is not an obstacle to performing well on the task and expressing one's fraction knowledge.

The Evidence models then make the connection between the latent competencies [specified by the Student/Proficiency model(s)] and the observables [behaviors elicited by the Task model(s)]. In other words, the Evidence models are the connecting link. The Evidence models include the measurement model, comprised of the rubrics, the scoring method, and the statistical method for obtaining a total score(s). See Figure 1 for a diagram of the ECD and specifically the three CAF models (note that latent competencies are symbolized as circles and observables as squares, and the connections between the circles and squares are shown in the Evidence models).

FIGURE 1 | The core models within the ECD framework (from Mislevy, Almond, and Lukas, © 2003 Educational Testing Service; used with permission); note that later versions term the Student model a Proficiency model.

Two important additional models are the Assembly model and the Presentation model (see Figure 1). The Assembly model defines how the three models in the CAF (the Student/Proficiency, Task, and Evidence models) work together and specifically determines the conditions for reliability and validity of the system. As part of the Assembly model, the developers determine the number of items/tasks and their mix ("constraints") such that they provide the necessary evidence and are balanced to properly reflect the breadth and diversity of the domain being assessed. The Presentation models are concerned with different ways to present the assessment, whether as a paper-and-pencil test, a computer-based test, a hands-on activity, etc. We elaborate on and delve deeper into each of the models as part of the expansion description below; for more details on the original ECD, see Mislevy et al. (2003, 2006).
There are other alternative frameworks for the design and development of assessments that follow a principled approach, such as the Cognitive Design System (Embretson, 1998), the Assessment Engineering framework (Luecht, 2013), the Principled Design for Efficacy framework (Nichols et al., 2015), or the Principled Assessment Design framework (Nichols et al., 2016). These frameworks may be perceived as alternatives to the ECD, and one might find any of them a candidate for an expansion similar to the one we demonstrate for the ECD in this paper. The reason several assessment frameworks were developed over the years stems from the need to ensure the validity of assessment tools. Although traditional assessments were developed for about half a century without a principled approach (i.e., by following an assessment manual and specifications) and validity was verified after development, the advantage of following a principled framework such as the ECD or others is particularly evident when the goal is to assess complex competencies (e.g., problem solving, reasoning, collaborative work) and/or when using complex performance tasks (e.g., multidimensional tasks such as performance assessments, simulations, or games, on a computer or otherwise). In these cases, it is important to explicitly identify the relevant competencies and behaviors and how they are connected, because the complexity of the focal competencies and/or the rich data that the tasks provide might pose difficulties in making inferences from behaviors to competencies. ECD has also been successfully applied to address the challenges of simulation- and game-based assessment (Rupp et al., 2010a; Mislevy, 2013; Kim et al., 2016).

MOTIVATION FOR A PRINCIPLED APPROACH TO THE DESIGN AND DEVELOPMENT OF A LEARNING AND ASSESSMENT SYSTEM

Learning and assessment, although both relate to the process of determining whether or not a student has a particular knowledge, skill, or ability (KSA), differ substantially in the way they treat KSAs. The main difference between an assessment tool and a learning tool is in the assumption about the focal KSA: whether it is fixed or dynamic at the time of interacting with the tool. The Student/Proficiency model in the ECD describes a map of competencies (KSAs), and as in most psychometric models for testing, the assumption is of a latent trait that is "fixed" at the time of taking the test. The purpose of an assessment is thus to "detect" or "diagnose" that fixed latent KSA at a certain point in time, similar to any measurement tool (e.g., a scale measuring a person's weight at a particular point in time). On the other hand, the main purpose of a learning tool, such as a computer tutoring system, is to "move" the learner from one state of knowledge to another; that is, the concern is first and foremost with the change in KSAs over time, or the transition. Of course, an assessment tool per se cannot drive the desired change unless deliberate efforts are implemented in the design of the system (similar to a scale, which will not help with weight loss unless other actions are taken). Thus, systems that aim at blending assessment and learning cannot implement ECD as is, since ECD is inherently a framework for developing assessments and not learning.

Moreover, the availability of rich data collected via technology-enhanced learning and assessment systems (e.g., trial and error as part of the learning process, hint usage) poses challenges, as well as promises, for assessment design and for the decision process of which actions to allow and what to record, either to promote efficient learning or to enable reliable assessment of the learning in order to make valid inferences about KSAs. Computational Psychometrics (von Davier, 2017), an emerging discipline, blends theory-based methods and data-driven algorithms (e.g., data mining and machine learning) for measuring latent KSAs. Computational Psychometrics is a framework for analyzing large and often unstructured data, collected during the learning or performance process, on a theoretical learning and psychometric basis.
We also combine aspects of Computational Psychometrics in our expanded design framework, similar to previous accounts that integrated data mining into ECD (e.g., Mislevy et al., 2012; Ventura and Shute, 2013). Combining data-driven algorithms into ECD allows knowledge discovery and model updates from data, thereby informing the theory-based Student/Proficiency model and enriching the Evidence model.

Attempts to develop innovative assessments within games or as part of complex skills assessment and learning have also brought about variations on or expansions to ECD (e.g., Feng et al., 2009a; Conrad et al., 2014; Grover et al., 2017). One characteristic of ECD variants is a focus on the task and its connection to the Evidence model. Since game-play and the rich data from complex assessments often result in sequences of actions, not all of which are relevant to the target competencies, researchers may follow an ECD approach with an expansion with respect to the action data, to specify which actions are relevant and should be included in the Evidence model, and in what way (i.e., an expansion on the scoring rules, or on both the scoring and the Task model). Such an attempt was made by Grover et al. (2017). Grover and her colleagues expanded on the scoring rules by employing data-driven techniques (e.g., clustering, pattern recognition) in addition to theory-based hypotheses, to guide the definition of the scoring rules. Another interesting variation is the experiment-centered design by Conrad et al. (2014), which illustrated an expansion on the scoring and the Task model. This approach uses an ECD-like process to simultaneously encode actions of players in one way for game design and another way for assessment design.
Because the game design dictates feedback on actions, and subsequent game options may depend on the student's actions, the game designer needs to encode the actions differently than a researcher or an assessment designer, who is primarily interested in estimating whether a student possesses the focal skill. In this procedure, the model is first postulated around the task (experiment), and then applied separately as two models (versions), one for the game designer and one for the researcher, each focused on a different encoding of student actions. However, there is only one Evidence model for inferring KSAs, derived from the researcher's version of the task encoding (the assessment-variant scoring rule). In this way, the adaptation of the ECD allowed adding the assessment as a "layer" on top of the game design (stealth assessment), while ensuring coordination between these two layers.

Work by Feng et al. (2009a) is particularly relevant in this context. The authors examine an adaptation of the ECD for learning data (ECDL), applied retroactively to the ASSISTments data (Heffernan and Heffernan, 2014). The ECDL is an ECD with an augmented pedagogical model, which has links to all three models of the CAF (Proficiency, Evidence, and Task). The pedagogical model refers to the learning and learners' characteristics, including learning effectiveness and efficiency (e.g., reducing cognitive load, increasing difficulty gradually during presentation, adapting the presentation of content, and decomposing multistep problems into sub-steps), as well as learner engagement factors. Since ASSISTments was initially developed without ECD in mind, the analysis retroactively checks which claims can support a validity argument that an item with its hints and scaffolds serves the learning goal. This is done by identifying (within each item) the KSAs required to answer it correctly, tagging each as "focal" or "unfocal." The focal KSAs are the ones the hints/scaffolds should address. The relation between the focal and unfocal KSAs also serves as an indication of the system's efficacy [a system with a high proportion of unfocal KSAs is less efficient than a system with a low proportion, because this reflects the proportion of KSAs not taught (scaffolded)]. In sum, Feng and his colleagues demonstrated how an existing learning product can be analyzed (and potentially improved) using an ECDL framework.

Common to the various adaptations of ECD is that they were task driven. First came the tasks; then came the ECD analysis, which resulted in adapting the ECD to address the complexity and intuition that were built into the tasks, expressed as an expansion on one of the three models in the CAF. While in the first two examples, Conrad et al. (2014) and Grover et al. (2017), the revised ECD focused on how to encode the task data to feed into the Evidence model, Feng et al.'s (2009a) study goes further, suggesting a pedagogical model that feeds and is fed by all three CAF models: Proficiency, Evidence, and Task. However, this pedagogical model seems somewhat like a "black box" that retroactively includes the intuitions that specified the product design (e.g., how hints and scaffolds were determined). Additionally, it neither specifies the nature of the connections with the original ECD models nor informs how to design a learning product from scratch (i.e., a principled approach to development).

We offer a comprehensive expansion of the ECD framework, such that learning aspects are specified for each of the three models in the CAF and are determined a priori to the system design. We describe the expanded full CAF first, followed by a focus on each expanded model with examples. We then discuss the Assembly model, which allows for the specification of the relationship between assessment and learning. We conclude with ramifications of the expanded framework for the development of adaptive systems. We include examples to better illustrate the general ideas, along with directions for alternative decisions, to emphasize the generalizability of the expanded framework.
THE EXPANDED ECD MODEL

In our expanded ECD framework (e-ECD), we find it necessary to expand on all three models: Student/Proficiency, Evidence, and Task. We do so by adding a learning layer in parallel to the assessment layer. This learning layer can be viewed as a breakdown of a pedagogical model (Feng et al., 2009a) into three components: the conceptual (student/proficiency), behavioral (task), and statistical (evidence) components. Thus, each original ECD model now has an additional paired learning model, culminating in six models. We call each assessment-learning pair an expanded model (e-model), i.e., the e-Proficiency model, the e-Task model, and the e-Evidence model (see Figure 2). Note that we refer to the original Proficiency model as the KSA model (Knowledge, Skills, and Ability), which is now part of the e-Proficiency model.

FIGURE 2 | Expanded ECD (e-ECD) for learning and assessment systems.

Within each e-model, we denote an "observational" layer for the assessment aspect (these are the original ECD models with slight title changes: the KSA model, the Task model, and the Observational-Evidence model) and a "transitional" layer for the learning aspect (these are the new models that address learning). The three new learning models include the following: (1) at the conceptual latent level, and part of the e-Proficiency model, the transitional layer specifies learning processes as the latent competency that the system targets; we denote it the KSA-change model. (2) At the behavioral level, and part of the e-Task model, the transitional layer specifies principles and features of learning support that guide the design of tasks (customized feedback, scaffolds, hints, solved examples, solutions, or guidance to digital instructional content such as animations, simulations, games, and videos); we denote it the Task-support model. (3) At the statistical level, and part of the e-Evidence model, the transitional layer specifies the links between the learner's support usage and the target learning processes, to allow inferring from behaviors to latent learning (e.g., the efficiency of the support used in achieving learning); we denote it the Transitional-Evidence model. The data could be large process data and may reveal behavior patterns that were not identified by the human expert in the original e-Proficiency model. In this framework, the e-Proficiency model and the e-Evidence model are supposed to "learn" in real time (be updated) with the new knowledge inferred from the data.
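One way to keep the six models straight during development is to register each e-model as a pair of an observational (assessment) layer and a transitional (learning) layer. The sketch below is only an illustrative data structure mirroring Figure 2; the names are ours, not part of a published API.

```python
# Illustrative sketch of the e-ECD structure described above: each e-model pairs
# an assessment ("observational") layer with a learning ("transitional") layer.
from dataclasses import dataclass


@dataclass
class EModel:
    name: str                 # e.g., "e-Proficiency"
    observational_layer: str  # original ECD model (assessment aspect)
    transitional_layer: str   # new learning model (learning aspect)


E_ECD_CAF = [
    EModel("e-Proficiency", "KSA model", "KSA-change model"),
    EModel("e-Task", "Task model", "Task-support model"),
    EModel("e-Evidence", "Observational-Evidence model", "Transitional-Evidence model"),
]

if __name__ == "__main__":
    for m in E_ECD_CAF:
        print(f"{m.name}: {m.observational_layer} + {m.transitional_layer}")
```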
We also include an expansion on the Assembly model, denoted the e-Assembly model. In addition to determining the number and mix of tasks, the e-Assembly model includes the specification of the relationship between the assessment component and the learning component of the system and determines how they all work together. In other words, the assembly model determines the "structure" of the system, e.g., when and how learning materials appear, when and how assessment materials appear, and the rules for switching between the two.

Consider the following situation: a student is using a system for learning and assessment to learn and practice scientific reasoning skills. At some point, the student gets an item wrong. In a regular assessment system, another item will follow (often without any feedback about the correctness of the response); if the system is an adaptive testing system, the student will receive an easier item, but not necessarily with the same content as the item answered incorrectly. In a blended learning and assessment system, the approach is different. Detecting a "weakness" in knowledge is a trigger to foster learning. How should the system aim at facilitating learning? There are several different options, from providing customized feedback and hints on how to answer that specific item, presenting scaffolds for the steps required, or eliciting prior knowledge that is needed to answer that item, to addressing specific misconceptions that are known to be prevalent for that specific KSA node, up to re-teaching the topic and showing worked examples, and/or presenting similar items to practice the skill. In many learning products today, this process of defining the learning options is conducted by content experts according to implicit or explicit learning goals. Using a principled approach to development dictates that the definition of the options for learning be explicitly articulated at the level of the Task-support model, and that these features be in line with the explicit conceptual learning/pedagogical model that describes how to make that shift in knowledge, i.e., the KSA-change model. The links between the supports and the conceptual KSA-change are defined in the Transitional-Evidence model via statistical models, which provide the learning validity argument for the system.
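As a rough illustration of the kind of switching rule an e-Assembly model might specify for the scenario above, the sketch below routes an incorrect response to learning options rather than simply to the next (easier) item. The threshold and option labels are hypothetical design choices, not part of the e-ECD itself.

```python
# A minimal sketch of an e-Assembly "switching rule": after an incorrect
# response, the system offers learning options instead of only routing to the
# next item. The threshold and labels are hypothetical design choices.
def next_step(last_response_correct: bool, supports_already_used: int,
              max_supports_per_item: int = 2) -> str:
    """Decide whether to continue assessing or to switch to learning support."""
    if last_response_correct:
        return "present next assessment item"
    if supports_already_used < max_supports_per_item:
        # blended system: a wrong answer triggers learning, not just an easier item
        return "offer learning options (feedback, scaffold, worked example, re-teach)"
    return "route to instructional content for the targeted KSA node"


if __name__ == "__main__":
    print(next_step(last_response_correct=False, supports_already_used=0))
    print(next_step(last_response_correct=False, supports_already_used=2))
    print(next_step(last_response_correct=True, supports_already_used=0))
```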
In the development of an assessment system that blends learning, we wish to help students learn, and to validate the claim that learning occurred, or that the system indeed helped with the learning as intended. The KSA-change model specifies the types of changes (learning/transitions) the system is targeting, and based on that, the tasks and the task supports are defined. In other words, the first step is to define the "learning shifts," or how to "move" in the KSA model from one level/node to the next. The next step is to define the observables that need to be elicited and the connections between the learning shifts and the observables. We elaborate on each of the expanded models below.

Our expanded framework shows how to incorporate a learning theory or learning principles into the ECD and can be applied using different learning approaches. We illustrate this process by using examples from the Knowledge-Learning-Instruction framework (Koedinger et al., 2012), among others, but this process can be applied using other learning approaches (and we provide some directions).

Expanded Proficiency Model
In the ECD framework, the Student/Proficiency model defines the Knowledge, Skills, and Ability (KSA) that the assessment is targeting. Although in early publications of the ECD it is called a Student model, in recent contexts it is called a "Proficiency model" (e.g., Feng et al., 2009a; Almond et al., 2015) or referred to as a "Competency model" (e.g., Arieli-Attali and Cayton-Hodges, 2014; Kim et al., 2016), and it can also be perceived as a "Construct map" (Wilson, 2009). A similar notion in the field of Intelligent Tutoring Systems is a "Domain model" (Quintana et al., 2000), a "Knowledge model" (Koedinger et al., 2012; Pelánek, 2017), or a "Cognitive model" (Anderson et al., 1995). In the Intelligent Tutoring Systems literature, the term "Student model" is reserved for the specific map of skills as estimated for a particular student, which is an overlay on the domain model (aka the expert model). Within ECD, the Student/Proficiency model includes both the desired skills (that an expert would possess) and the updated level of skills for each particular student following responses to assessment items. To avoid confusion, within our expanded ECD we refer to it by the general name of a KSA model.

The KSAs are assumed to be latent, and the goal of the assessment is to infer about them from examinees' responses to test items. When the assessment tool is also intended to facilitate learning (i.e., the system provides supports when the student does not know the correct answer), the assumption is that the student's level of KSA is changing (presumably becoming higher as a result of learning). In the e-ECD, we define a "KSA-change model" that, together with the original KSA model, creates the expanded Proficiency model (e-Proficiency model). The KSA-change model specifies the latent learning processes that need to occur in order to achieve specific nodes in the KSA model. Each node in the KSA model should have a corresponding learning model in the KSA-change model, which may include prerequisite knowledge and misconceptions, and/or a progression of skills leading up to that KSA node, with the pedagogical knowledge of how to make the required knowledge shift. Some examples of learning models are learning progressions (National Research Council (NRC), 2007; e.g., Arieli-Attali et al., 2012), a Dynamic Learning Map (Kingston et al., 2017), or learning models based on the body of work on Pedagogical Content Knowledge (Posner et al., 1982; Koehler and Mishra, 2009; Furtak et al., 2012). The importance of Pedagogical Content Knowledge is in considering the interactions of content information, pedagogy, and learning theory. Another approach from the learning sciences and artificial intelligence is the Knowledge-Learning-Instruction framework (KLI; Koedinger et al., 2012), which provides a taxonomy to connect knowledge components, learning processes, and teaching options. We will illustrate our KSA-change model specification using the KLI framework, but we will define the e-Proficiency model in a general way such that any other learning theory can be applied instead.

Specifying and explicitly articulating the latent learning processes and progressions that are the target of the learning is a crucial step, since this is what will guide the specification of both the e-Task model and the e-Evidence model. In the following sections, we elaborate on and illustrate the KSA and KSA-change models that constitute the e-Proficiency model.

The Assessment Layer of the e-Proficiency Model – The KSA Model
A KSA model includes variables that are the features or attributes of competence that the assessment is targeting. The number of variables and their grain size are determined by the potential use of the assessment, and they can range from one variable (e.g., the θ in college admission tests such as the GRE, SAT, and ACT) to several subskills arranged in a map or a net (for a net example, see Mislevy et al., 1999; for a math competency map, see Arieli-Attali and Cayton-Hodges, 2014; for two versions of a game-based physics competency model, see Kim et al., 2016). These variables can be derived by conducting a cognitive task analysis of the skill by experts, by analyzing the content domain, or by relying on a theory of knowledge and research findings. The variables and their interconnections create a map in which each variable is a node connected by links with other nodes (variables). Following analysis of data from student responses (and using the statistical models), values on these variables define the level of mastery, or the probability that a particular student possesses those particular subskills (nodes); i.e., a value will be attached to each node.

As part of our development of a learning and assessment system for scientific thinking skills, called the Holistic Educational Resources & Assessment (HERA) system, we developed a KSA model for the skill of data interpretation. Figure 3 depicts part of the model. Specifically, we distinguish three main skills of data interpretation depending on the data representation (Table Reading, Graph Reading, and the skill of interpreting data from both tables and graphs), and each skill is then divided into several subskills. For example, within the Table Reading skill we distinguish between locating data points, manipulating data, identifying a trend, and interpolation and extrapolation. Note that these same subskills (albeit in a different order) also appear under the Graph Reading skill, but they entail different cognitive abilities. The skill of Tables and Graphs includes comparing, combining, and translating information from two or more different representations.

FIGURE 3 | The KSA model for the HERA system for scientific reasoning skills.
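As a minimal illustration of what attaching a value to a KSA node can look like, the sketch below updates a node's mastery probability from a single scored response using Bayes' rule with slip and guess parameters, one common simplification; the prior and parameter values are hypothetical.

```python
# Minimal illustration of attaching a mastery probability to a KSA node and
# updating it from one scored response (Bayes' rule with slip/guess parameters).
# The prior, slip, and guess values are hypothetical, chosen only for the demo.
def update_mastery(prior: float, correct: bool, slip: float = 0.1, guess: float = 0.2) -> float:
    """Posterior probability that the student masters the node, given one response."""
    if correct:
        likelihood_mastery = 1.0 - slip      # P(correct | mastery)
        likelihood_nonmastery = guess        # P(correct | non-mastery)
    else:
        likelihood_mastery = slip            # P(incorrect | mastery)
        likelihood_nonmastery = 1.0 - guess  # P(incorrect | non-mastery)
    numerator = likelihood_mastery * prior
    return numerator / (numerator + likelihood_nonmastery * (1.0 - prior))


if __name__ == "__main__":
    p = 0.5  # prior mastery for a node such as "interpolation and extrapolation (graph)"
    p = update_mastery(p, correct=True)
    print(round(p, 3))   # mastery probability rises after a correct response
    p = update_mastery(p, correct=False)
    print(round(p, 3))   # and falls after an incorrect one
```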
Although KSA models often specify the links between nodes, and may even order the skills in a semi-progression (from basic to more sophisticated skills), as in the example of the HERA model in Figure 3, a knowledge model often does not specify how to move from one node to the next, nor does it explicitly define learning processes. To that end, we add the learning layer in the e-Proficiency model: the KSA-change model.

The Learning Layer in the e-Proficiency Model – The KSA-Change Model
Defining a learning layer within the e-Proficiency model makes room for explicit articulation of the learning processes targeted by the learning and assessment system. The idea is for these specifications to be the result of purposeful planning, rather than a coincidental outcome of system creation. In the Intelligent Tutoring literature, developers consider what they call the "Learner model" (Pelánek, 2017) or the "Educational model" (Quintana et al., 2000), or, more generally, processes for knowledge acquisition (Koedinger et al., 2012). This model can also be viewed as the "pedagogical model" and can apply principles of Pedagogical Content Knowledge (Koehler and Mishra, 2009; Furtak et al., 2012). We call this model the "KSA-change model" for generalizability and to keep the connection with the original KSA model, with the emphasis on the change in KSA. Using the title "change" also makes room for negative change (aka "forgetting"), which, albeit not desirable, is possible.
A KSA-change model is the place to incorporate the specific learning theory or learning principles (or goals) that are at the basis of the system. Similar to the way a KSA map is created, the KSA-change map should specify the learning aspects of the particular skills. Here we provide a general outline for how to specify a KSA-change model, but in each system this process may take a different shape.

A KSA-change model may include variables of two types:

1. Sequences of knowledge components, features, or attributes
2. Learning processes within each sequence

These two types of variables define the learning sequences and processes that are needed to facilitate learning. The KSA-change variables are derived directly from the KSA model, such that each node/skill in the KSA model has a reference in the KSA-change model in the form of how to "move" students to learn that skill.

Given a specific skill (a node in the map), this may be done in two stages. (1) The first step is to define the (linear) sequence of prerequisites or precursors needed to learn that target skill (node). For example, Kingston and his colleagues (Kingston et al., 2017) developed Dynamic Learning Maps in which each of the target competencies is preceded by three levels of precursor pieces of knowledge (initial precursor, distal precursor, and proximal precursor) and succeeded by a successor piece of knowledge, together creating what they called "linkage levels." When defining the sequence of precursors, attention should be given to the grain size, as well as to specific features or attributes of these precursors. In KLI terminology (Koedinger et al., 2012), this would mean characterizing the Knowledge Components of the subskills. Some Knowledge Components are: fact, association, category, concept, rule, principle, plan, schema, model, and production; and whether the knowledge is verbal or non-verbal, declarative or procedural, or integrative. (2) The second step is to characterize the learning sequence by the kind of learning process that is required to achieve the learning. For example, applying the KLI taxonomy (Koedinger et al., 2012), we can assign to each precursor (knowledge component) a specific learning process that is presumed to make the desired knowledge shift. The KLI framework characterizes three kinds of learning processes: memory and fluency building, induction and refinement, and understanding and sense-making. Specifying which kind of process is needed in the particular learning sequence is necessary for subsequent decisions about the supports to be provided. For example, if the focal learning process is fluency building, this implies that the learning system should provide practice opportunities for that KSA. In contrast, if the focal learning process for a different KSA is understanding and sense-making, then the learning system should provide explanations and examples. Figure 4 illustrates a general e-Proficiency model with an artificial example of adding the learning processes to a knowledge sequence built off of three prerequisites and a successor piece.

FIGURE 4 | A general diagram of the e-Proficiency model (the orange node in the KSA model is specified in the KSA-change model for learning sequence and learning processes). Similarly, we can construct a sequence for each of the other nodes (the blue, pink, and red nodes).
Applying the above approach to the HERA learning and assessment system, let us focus on the subskill of interpolation and extrapolation from data in a graph (the last red circle in the progression of the Graph Reading skill in Figure 3). Based on our guidelines above, the first step would be to determine a sequence of subskills/precursors and to characterize them, and then, as a second step, to specify the cognitive process(es) that would make the transition from one subskill to the next. Figure 5 presents one section of the KSA-change model of the HERA system, for the subskill of interpolation and extrapolation in a graph. The model specifies the proximal, distal, and initial precursors as follows: the proximal precursor = identifying the rate of change in the dependent variable (y-variable) as the independent variable (x-variable) changes; the distal precursor = being able to locate the y-value for a certain x-value point on a graph, and to find adjacent points and compare their relative values; the initial precursor = understanding that the two variables in a graph are co-related. Now applying the KLI knowledge-components characterization, the proximal precursor (identifying rate of change) may be characterized as a "rule," the distal precursor (locate points and compare) as a "schema," and the initial precursor (two variables are co-related) as a "concept."

Next, we determine the cognitive processes that foster the transition from one subskill to the next. For example, given an understanding of the co-variation of x and y (the initial subskill), students need to practice finding the y-points for different x-points to create the mental schema and build fluency with locating points, particularly two adjacent points. However, to "jump" to the next step of identifying the trend and the rate of change requires induction and refinement to derive the rule. The last transition, from identifying the rate of change to performing interpolation and extrapolation, requires sense-making and deduction: deducing from the rule to the new situation. Given the specific learning processes, we can later define which learning supports would be most appropriate (e.g., practice for fluency building, worked examples and comparisons for induction, and explanations for sense-making and deduction). The model in Figure 5 shows the different learning processes as the transitions (arrows) required between the subskills in the sequence. This is the learning model for the specific skill in focus, and it is usually derived based on expert analysis. The model in Figure 5 also specifies particular misconceptions that students often exhibit at each level. Specifying misconceptions may also help determine which feedback and/or learning aid to provide to students. We show in the next section how to define the Task and Task-support models based on this example.

FIGURE 5 | A specification diagram of the KSA-change model for one node/skill of interpolation/extrapolation in a graph in the HERA KSA model.
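The two-stage specification just illustrated can be written down explicitly. The sketch below encodes the Figure 5 sequence for the HERA interpolation/extrapolation subskill: the three precursors with their KLI knowledge-component characterizations and the learning process assumed to drive each transition. The data-structure and field names are illustrative choices.

```python
# Sketch of one KSA-change entry, encoding the sequence described for the HERA
# subskill "interpolation and extrapolation in a graph" (Figure 5): precursors
# with their KLI knowledge-component types and the learning process assumed to
# drive each transition. Names and structure are illustrative.
from dataclasses import dataclass


@dataclass
class Precursor:
    level: str            # "initial", "distal", or "proximal"
    description: str
    kli_component: str    # KLI knowledge-component characterization


@dataclass
class Transition:
    from_to: str
    learning_process: str  # KLI: fluency building / induction / sense-making


interpolation_extrapolation_change = {
    "target_ksa": "interpolation and extrapolation from a graph",
    "precursors": [
        Precursor("initial", "two variables in a graph are co-related", "concept"),
        Precursor("distal", "locate y-values and compare adjacent points", "schema"),
        Precursor("proximal", "identify the rate of change in y as x changes", "rule"),
    ],
    "transitions": [
        Transition("initial -> distal", "memory and fluency building (practice locating points)"),
        Transition("distal -> proximal", "induction and refinement (derive the rule)"),
        Transition("proximal -> target", "understanding and sense-making (deduce to the new situation)"),
    ],
}

if __name__ == "__main__":
    for t in interpolation_extrapolation_change["transitions"]:
        print(t.from_to, "via", t.learning_process)
```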
There are several decisions to be taken as part of the model specifications. One of them is the grain size of each precursor. An alternative KSA-change model can be determined with smaller or larger grain-size subskills. Another decision is whether to adopt a three-level precursor skill structure or, alternatively, to focus on only one precursor and the different misconceptions students may have. Researchers and developers are encouraged to try different approaches.

We propose to derive the KSA-change variables by conducting a learning process analysis by experts, i.e., an analysis of the pedagogical practices in the content domain, or by relying on a theory of learning in that domain, similar to the way we illustrated above (using the KLI taxonomy). This is parallel to the way a KSA model is derived based on cognitive task analysis or domain analysis. The KSA-change model constitutes a collection of sequences (and their processes), each addressing one node in the KSA model (as illustrated in Figures 4, 5). This can also be viewed as a two-dimensional map, with the sequences as the second dimension for each node.

Similar to updating the KSA model for a student, here too, following analysis of data from student responses and student behaviors in using the learning supports, values on the KSA-change variables indicate the level or probability that a particular student has gone through a particular learning process (or that a particular knowledge shift was due to the learning support used). We will discuss this in more detail in the e-Evidence model section.

Expanded Task Model
In the original ECD framework, the Task model specifies the features of tasks that are presumed to elicit observables that allow inference on the target KSA. An important distinction introduced in ECD is between a task model design based on a Proficiency model and a task-centered design (Mislevy et al., 1999). While in task-centered design the primary emphasis is on creating the task, with the target of inference defined only implicitly as the tendency to do well on those tasks, in defining a task model based on a Proficiency (and Evidence) model we make the connections and possible inferences explicit from the start, making the design easier to communicate, easier to modify, and better suited to principled generation of tasks (Mislevy et al., 1999, p. 23). Moreover, basing a task model on the Proficiency and Evidence models allows us to consider reliability and validity aspects of task features, and particularly the cognitively or empirically based relevance of the task features. In other words, considerations of item reliability and validity guide the development of items to elicit the target observables, and only them (minimizing added "noise"). This means that at the development stage of a task, all features of the task should stand up to scrutiny regarding their relevance to the latent KSA. As mentioned above, if reading ability is not relevant as part of the mathematics KSA, items or tasks that may impede students with lower reading skills should be avoided. Thus, defining a task model based on a Proficiency model resembles the relationship between the latent trait and its manifestation in observable behavior. The more the task relates to the target KSA, the better the inference from the observable to the latent KSA.

For assessment precision purposes per se, there is no need to provide feedback to students; on the contrary, feedback can be viewed as interference in the process of assessment, and likewise scaffolds and hints introduce noise or interference into a single-point-in-time measurement. However, when the assessment tool is also intended for learning, the goal is to support learners when a weakness is identified, in order to help them gain the "missing" KSA. In the e-ECD we define a "Task-support model" that, together with the original Task model, creates the expanded Task model (e-Task model). The Task-support model specifies the learning supports that are necessary and should be provided to learners in order to achieve KSA change. Similar to basing the Task model on the KSA model, the Task-support model is based on the KSA-change model. The supports may include customized feedback, hints and scaffolds, practice options, worked examples, explanations, or guidance to further tailored instruction derived from the latent learning processes specified in the KSA-change model. In other words, the supports are determined according to the focal knowledge change. We elaborate on and illustrate the Task and Task-support models below.
The Assessment Layer Within the e-Task Model – The Task Model
The Task model provides a framework for describing the situation in which examinees are given the opportunity to exhibit their KSAs, and it includes the specifications of the stimulus materials, conditions, and affordances, as well as specifications for the work product (Mislevy et al., 1999, p. 19). The characteristics of the tasks are determined by the nature of the behaviors that provide evidence for the KSAs. Constructing a Task model from the latent KSA model involves considering the cognitive aspects of task behavior, including specifying the features of the situation, the internal representation of these features, and the connection between these representations and the problem-solving behavior the task targets. In this context, variables that affect task difficulty are essential to take into account. In addition, the Task model also includes features of task management and presentation.

Although the Task model is built off of the Proficiency model (or the KSA model in our notation), multiple Task models are possible in a given assessment, because each Task model may be employed to provide evidence in a different form, use different representational formats, or focus evidence on different aspects of proficiency. Similarly, the same Task model and work product can produce different evidence; i.e., different rules could be applied to the same work product to allow inferences on different KSAs. Thus, it is necessary to define within each Task model the specific variables to be considered in the evidence rules (i.e., the scoring rules; we elaborate on this in the next section).

Consider the abovementioned KSA from the HERA model: "Perform an extrapolation using data from a graph." As part of a scientific reasoning skills assessment, this skill is defined in a network of other skills related to understanding data representations, as seen in Figure 5. One possible Task model can be: "Given a graph with a defined range for the x-axis variable [a, b] and y-values corresponding to all x-values in the range, find the y-value for an x-value outside the range." That is, we present the learner with a graph (defined by its x- and y-axes) and a function or paired coordinates (x, y) for a limited domain. The question then asks learners to predict the y-value of an x-point that is outside the domain presented in the graph. Because extrapolation assumes the continuation of the trend based on the relationship between the variables, a required characteristic of the question is to include this assumption, explicitly or implicitly via the context (e.g., stating that other variables do not change, or that the same experimental procedure was used for a new value). Articulating the assumption is part of the Task model.
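A Task model of this kind can be read as an item template. The sketch below generates one instance of the first extrapolation Task model: paired (x, y) values over a limited range and a question about an x-value outside that range. The linear relationship and the specific numbers are arbitrary choices for the illustration.

```python
# Minimal sketch of operationalizing the first extrapolation Task model: given
# paired (x, y) coordinates over a limited range [a, b], ask for y at an x-value
# outside the range. The linear relationship and numbers are arbitrary examples.
def make_extrapolation_item(slope: float, intercept: float, x_range=(0, 4), x_query: float = 6):
    a, b = x_range
    assert x_query > b or x_query < a, "the queried x must lie outside the presented range"
    points = [(x, slope * x + intercept) for x in range(a, b + 1)]
    stem = (f"The graph shows y for x between {a} and {b}. "
            f"Assuming the trend continues, what is y when x = {x_query}?")
    key = slope * x_query + intercept  # correct response, used by the scoring rule
    return {"points": points, "stem": stem, "key": key}


if __name__ == "__main__":
    item = make_extrapolation_item(slope=2.0, intercept=1.0)
    print(item["stem"])
    print("presented data:", item["points"], "| key:", item["key"])
```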
Another option for an extrapolation Task model could be: "Given a graph with two levels of the dependent variable, both showing a linear relationship with the x-variable (i.e., the same relationship trend) but with different slopes, find the y-value for a third level of the dependent variable." That is, we present the learner with a graph with two linear relationships (two line graphs), one for level a and one for level b (for example, a and b are weights of different carts, and the linear relationship is between speed and time). The question then asks learners to predict the y-value for level c (c > a, b; a heavier cart) at an x-point for which we know the y-values of levels a and b; that is, extrapolation beyond the data presented. This Task model is more sophisticated than the first one, due to the complexity of the data representation, and thus taps into a higher level of the skill.

Another aspect is the operationalization of the Task model in a particular item. Given a Task model, the question can take the form of a direct, non-contextualized (what we may also call a "naked") question (e.g., asking about a value of y given a specific x), or it can be contextualized (or "wrapped") within the context and terminology of the graph (e.g., "Suppose the researcher decided to examine the speed of a new cart that has greater weight, and suppose the trend of the results observed is maintained; what would you expect the new result to be?"). The "naked" and "dressed" versions of the question may involve a change in the difficulty of the item; however, this change needs to be examined to determine the extent to which it is construct-relevant or irrelevant. If it is construct-relevant, then it should be included in the Task model as part of the specifications. Other factors may affect the difficulty as well: the type of graphic (bar graph, line graph, multiple lines, scatter plot); the complexity of the relationships between variables (linear, quadratic, logarithmic, increasing, decreasing, one y-variable or more); the familiarity of the context of the task (whether it is a phenomenon in electricity, projectile motion, genetics, etc.); the complexity of the context (commonly understood, or fraught with misconceptions); the response options (multiple choice or open-ended); the quality of the graph and its presentation (easy or hard to read; presented on a computer, a smartphone, or paper; presented as a static graph or an interactive one where learners can plot points); etc. These factors and others need to be considered when specifying the Task model, and their relevance to the construct should be clearly articulated.
The Learning Layer Within the e-Task Model – The Task-Support Model
Tasks for assessment and tasks for learning differ in the availability of options that support learning. When we design tasks for learning, we need to consider the type of "help" or "teaching" that the task affords, with the same level of rigor that we put into the design of the task itself. The Task-support model thus specifies the learning supports that might be necessary and should be provided to students in order to achieve the desired KSA change (i.e., an increase in KSA). Similar to basing the Task model on the KSA model, the Task-support model is based on the KSA-change model.

Making room for the specification of the task support in connection to the learning processes/goals (the focal KSA-change) is the innovative core of the proposed e-ECD and its significant contribution to the design of learning and assessment systems. Many learning systems include scaffolds or hints to accompany items and tasks, often determined by content experts or by teacher experience and/or practices. These hints and scaffolds help answer the particular item they accompany, and they may also provide "teaching" if transfer occurs to subsequent similar items. However, in the design process of the hints and scaffolds, often no explicit articulation is made regarding the intended effect of hints and scaffolds beyond the particular question, or in connection to the general learning goals. Often, the hints or scaffolds are task-specific: a breakdown of the task into smaller steps, thus decreasing the difficulty of the task. This is also reflected in the approach of assigning partial credit for an item that was answered correctly with hints, contributing less to the ability estimate (as evidence of lower ability; e.g., Wang et al., 2010). Specifying a Task-support model for each Task model dictates a standardization of the scaffolds and hints (and other supports) provided for a given task. How do we specify task supports connected to the focal KSA-change?

If, for example, we define (as part of the KSA-change model) a particular learning model similar to the one depicted in Figure 5, we may provide as a task support a "pointer" to the precursors, in the form of a hint or a scaffold. Thus, the scaffolds are not a breakdown of the question into sub-steps; rather, each scaffold points to one of the precursor pieces of knowledge (the initial, distal, or proximal precursor). In addition, since we defined the kind of knowledge change between each precursor, we can provide the corresponding support for each desired change. If the knowledge change is related to memory and fluency building, we may provide more practice examples instead of the scaffold. Similarly, if the knowledge change is related to understanding and sense-making, we may provide an explanation or reasoning, or ask the student to provide the explanation or reasoning (self-explanation was found to be beneficial in some cases; Koedinger et al., 2012). It may very well be the case that similar scaffolds will result from explicating a Task-support model following the e-ECD compared to not doing so; however, in following this procedure, the design decisions are explicit and easy to communicate, justify, modify, replicate, and apply in a principled development of scaffolds.

Similarly, other features of task support, such as feedback, visuals, and links to a video or wiki page, can be supported by the articulation of the KSA-change and the connection between the two.
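The correspondence just described (scaffolds pointing to precursors, practice for fluency building, explanation for sense-making) can be expressed as a simple selection rule. The mapping and function below are illustrative design choices, not a prescribed part of the e-ECD.

```python
# Sketch of selecting a task support from the focal knowledge change, following
# the correspondence described above. The mapping and names are illustrative.
SUPPORT_BY_LEARNING_PROCESS = {
    "memory and fluency building": "additional practice items for the precursor",
    "induction and refinement": "worked example with comparison cases",
    "understanding and sense-making": "explanation, or a self-explanation prompt",
}


def select_support(focal_precursor: str, learning_process: str) -> str:
    """Return a support that points to the precursor and matches its learning process."""
    support = SUPPORT_BY_LEARNING_PROCESS.get(
        learning_process, "scaffold pointing to the precursor")
    return f"{support} (targets precursor: {focal_precursor})"


if __name__ == "__main__":
    print(select_support("identify the rate of change in y as x changes",
                         "induction and refinement"))
    print(select_support("two variables in a graph are co-related",
                         "understanding and sense-making"))
```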
Let us illustrate specifying a Task-support model for the example item from HERA described in the previous section. Recall that the item targeted the latent KSA "Perform an extrapolation using data from a graph," and the task materials included a graph with a specified function, asking students to extrapolate a point beyond the given range (i.e., predict the value of y for a new x-value). Also, recall Figure 5, which depicts the KSA-change model for this particular subskill. Given the proximal, distal, and initial precursors, we can now specify each scaffold to address each of these three precursor skills. Alternatively, we can decide to address only the closest precursor (the proximal) with a scaffold, and if that does not help with answering the question correctly, then refer the student to "learn" the more basic material (e.g., in a different section of the system, or by presenting items/content that target the initial and distal precursor skills). These decisions depend on the system design (the e-Assembly model) and may vary from system to system.

As part of our development of the HERA system for scientific thinking skills, we developed an item model that can be used to collect evidence for both assessment and learning, termed an Assessment and Learning Personalized Interactive item (AL-PI). This item looks like a regular assessment item, and only after an incorrect response are learners given "learning options" to choose from. We offer three types of learning supports: (1) Rephrase, a rewording of the question; (2) Break-it-down, providing the first step out of the multiple steps required to answer the question; and (3) Teach-me, providing a text and/or video explanation of the background of the question. Figure 6 presents a screenshot of an AL-PI item from a task about the height restitution of a dropped ball, targeting the skill of extrapolation.

FIGURE 6 | An example of an Assessment & Learning Personalized & Interactive item (AL-PI item) from the HERA system.

Using the terminology above, the Rephrase option provides the learner with another attempt at the question, with the potential of removing the construct irrelevance that may stem from the item phrasing (for learners who did not understand what the question is asking them, due to difficulty with the wording). In this example, a Rephrase of the question is: "The question asks you to find the 'Height attained' (the y-value) for a new x-value that does not appear on the graph" (see Figure 6, upper panel). Note that the Rephrase practically "undresses" (decontextualizes) the question, pointing out the "naked" form, or making the connection between the context and the decontextualized skill.

The second learning support is Break-it-down, which takes the form of providing the first step toward answering the question. In the example in Figure 6, the Break-it-down states: "The first step to answer this question is to evaluate the rate of change in y as a function of a change in the x-variable," with additional marks and arrows on the graph to draw the learner's attention to where to look. The Break-it-down option may look like a hint, signaling to learners where to focus, and in our terminology it refers to the proximal precursor (recall: proximal precursor = identifying the rate of change in the dependent variable as the independent variable changes).

The third type of support that we offer in an AL-PI item is Teach-me. The Teach-me option in this case includes the following components: (1) a general statement about the skill, i.e., a graph presents data for a limited number of values, yet we can estimate or predict new values based on the trend in the data presented; (2) an explanation of how to identify the trend in a graph, i.e., locating adjacent points; and (3) an illustration of how, once the trend is identified, we can perform extrapolation. In our system, we provide the illustration with a different value than the one in the question, in order to avoid revealing the correct answer and to leave room for the learner to put mental effort into applying the method taught. In the Task-support model terminology, and in relation to the KSA-change model, the Teach-me option addresses all three precursors.
Expanded Evidence Model

The links made between the e-Proficiency model and the e-Task model need explication of the statistical models that allow inferences from the work products on the tasks to the latent KSAs. In the ECD framework, the Evidence model specifies the links between the task's observables (e.g., the student work product) and the latent KSAs targeted by that task (termed here the Observational-Evidence model). The Observational-Evidence model includes the evidence rules (scoring rubrics) and the statistical models. The Evidence model is the heart of the ECD, because it provides the "credible argument for how students' behaviors constitute evidence about targeted aspects of proficiency" (Mislevy et al., 1999, p. 2).

In a system designed for learning, data other than the work product are produced, i.e., the data produced from the task support (e.g., hint and scaffold usage), which may be called process data. The task support materials are created to foster learning; thus, learning systems should have a credible argument that these supports indeed promote learning. Partial evidence for that can be achieved by inferences about knowledge, or what students know and can do, from their work product in the system, following and as a result of the use of the supports, and this can be obtained by the statistical models within the Evidence model. However, the efficacy of the task supports themselves (i.e., which support helps the most in which case), and drawing inferences from scaffold and hint usage about "learning behavior" or "learning processes" (as defined in the KSA-change model), may need new kinds of models and evidence. The Transitional-Evidence model within the e-Evidence model addresses the data produced from the task support.

The Assessment Layer Within the Evidence Model – The Observational-Evidence Model

In the original ECD, the Observational-Evidence model addresses the question of how to operationalize the conceptual target competencies defined by the Proficiency model, which are essentially latent, in order to be able to validly infer from overt behaviors about those latent competencies. The Observational-Evidence model includes two parts. The first contains the scoring rules, which are ways to extract a "score," or an observable variable, from student actions. In some cases, the scoring rule is simple, as in a multiple-choice item, in which a score of 1 or 0 is obtained corresponding to a correct or incorrect response. In other cases, the scoring rule might be more complex, as in performance assessment, where student responses produce what we call "process data" (i.e., a log file of recorded actions on the task). A scoring rule for process data can take the form of grouping a sequence of actions into a "cluster" that may indicate a desired strategy, or a level on a learning progression that the test is targeting. In such an example, a scoring rule can be defined such that a score of 1 or 0 is assigned corresponding to the respective strategy employed, or the learning progression level achieved. Of course, scoring rules are not confined to dichotomous scores; they can also define scores between 0 and 1, continuous scores (particularly when the scoring rule relies on response time), or ordered categories of 1-to-m, for m categories (polytomous scores).
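The sketch below illustrates the three kinds of scoring rules just described: a dichotomous rule for a selected response, a process-data rule that looks for a desired action sequence in a log, and a partial-credit rule. The action codes and the "desired strategy" pattern are invented for illustration.

```python
# A sketch of three kinds of evidence (scoring) rules; the action codes and the
# desired-strategy pattern are made up for illustration.

def score_multiple_choice(response: str, key: str) -> int:
    """Dichotomous rule: 1 if the keyed option was selected, else 0."""
    return int(response == key)

def score_strategy(action_log: list[str]) -> int:
    """Process-data rule: 1 if the desired actions occur in order
    (inspect the graph, trace the trend, then submit), else 0."""
    desired = ["open_graph", "trace_trend", "submit_answer"]
    it = iter(action_log)                      # membership on an iterator checks order
    return int(all(step in it for step in desired))

def score_partial(steps_completed: int, total_steps: int) -> float:
    """Polytomous/continuous rule: proportion of solution steps completed."""
    return steps_completed / total_steps

print(score_multiple_choice("B", "B"),
      score_strategy(["open_graph", "trace_trend", "submit_answer"]),
      score_partial(2, 4))
```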
The second part of the Observational-Evidence model contains the statistical model. The statistical model expresses how the scores (as defined by the scoring rules) depend, probabilistically, on the latent competencies (the KSAs). This dependency is probabilistic; that is, the statistical model defines the probability of certain "scores" (observables) given specific latent competencies (a combination of values on the KSAs). In other words, at the point in time at which the student is working within the system, that student is in a "latent state" of knowledge, and given that latent state, there is a certain probability for the observable variables, which, if observed, are evidence for the latent ability. However, all we have are the student observable variables, and what we need is a psychometric model that allows us to do the reverse inference from the given observables to the latent competencies.

There are various statistical models that can be used here. Since we are talking about an assessment and learning system, let us consider a multi-dimensional latent competency, i.e., multiple skills are targeted by the system both for assessment and for learning. If we assume the latent competencies to be continuous, we can use multi-dimensional Item Response Theory models (e.g., MIRT; Reckase, 2009) or Bayes-net models (Pearl, 1988, 2014; Martin and VanLehn, 1995; Chang et al., 2006; Almond et al., 2015). In the case where the latent competencies are treated as categorical, with several increasing categories of proficiency in each (e.g., low-, medium-, and high-level proficiency, or mastery/non-mastery levels), we can use diagnostic classification models (DCM; Rupp et al., 2010b). What these models enable is to "describe" (or model) the relationship between the latent traits and the observables in a probabilistic way, such that the probability of a certain observable, given a certain latent trait, is defined and therefore allows us to make the reverse inference – to estimate the probability of a certain level of a latent trait given the observable.

In order to make the link between the items/tasks (the stimuli that collect the observables) and the latent KSAs, we can use what is called a Q-matrix (Tatsuoka, 1983). A Q-matrix is a matrix of <items × skills> (items in the rows; skills in the columns), defining for each item which skills it is targeting. The Q-matrix plays a role in the particular psychometric model, to determine the probability of answering an item correctly given the combination of skills (and whether all skills are needed, or some skills can compensate for others; a non-compensatory or compensatory model, respectively). The Q-matrix is usually determined by content experts, but it can also be learned from the data (e.g., Liu et al., 2012).
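As a minimal illustration of how a Q-matrix enters a non-compensatory model, the sketch below uses a DINA-type rule (all required skills must be mastered); the Q-matrix entries and the guess/slip values are arbitrary and are not taken from the HERA system.

```python
import numpy as np

# Minimal sketch of a Q-matrix feeding a non-compensatory (DINA-type) item model.
# Items in rows, skills in columns; guess/slip values are arbitrary.
Q = np.array([[1, 0],    # item 1 requires skill 1 only
              [1, 1],    # item 2 requires both skills
              [0, 1]])   # item 3 requires skill 2 only
guess, slip = 0.2, 0.1

def p_correct(alpha: np.ndarray) -> np.ndarray:
    """P(correct) per item given a binary skill-mastery profile alpha."""
    eta = (alpha >= Q).all(axis=1)          # 1 only if all required skills are mastered
    return np.where(eta, 1 - slip, guess)

print(p_correct(np.array([1, 0])))  # masters skill 1 only -> [0.9, 0.2, 0.2]
```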
Recent developments in the field of psychometrics have expanded the modeling approach to also include models that are data driven but informed by theory, referred to as Computational Psychometrics (von Davier, 2017). Computational Psychometrics is a framework that includes complex models such as MIRT, Bayes nets, and DCM, which allow us to make inferences about latent competencies; however, these models may not define the scoring rules a priori, but rather allow for a combination of the expert-based scoring rules with those that are learned from the data. In particular, supervised algorithms – methodologies used in machine learning (ML) – can be useful for identifying patterns in the complex logfile data. These algorithms classify the patterns by skills using a training data set that contains the correct or theory-based classification. The word supervised here means that the "correct responses" were defined by subject-matter experts and that the classification algorithm learns from these correctly classified data to extrapolate to new data points.

In a learning and assessment system, the Observational-Evidence model may also take scaffold and hint usage into account to infer about the KSA model. Since scaffolds and hints reduce the difficulty of the items/tasks, they also change the evidentiary value of the observables. This can be addressed either by using only responses without hint usage to model the KSA, or by applying a partial credit scoring rule for items that were answered correctly with hints, thus assigning them less credit as a reflection of their evidentiary value (e.g., Wang et al., 2010; Bolsinova et al., 2019a,b).

To summarize, any and all statistical models that allow us to define the connection between overt observables and latent competencies can be used in the Observational-Evidence model.

The Learning Layer Within the Evidence Model – The Transitional-Evidence Model

Similar to the way the Observational-Evidence model connects the Task model back to the KSA model, the Transitional-Evidence model uses the task-support data to infer about learning, and to link back to the KSA-change model. Recall that the KSA-change model includes pedagogical principles, which are reflected in the task supports. Similar to the assessment layer of the Evidence model, the Transitional-Evidence model also includes two parts: the scoring rules and the statistical models.

The scoring rules define the observable variables of the Transitional-Evidence model. If task supports are available by choice, student choice behavior can be modeled to make inferences about their learning strategies. The data from the task-support usage (hints, scaffolds, videos, simulations, animations, etc.), as well as the number of attempts or response time, should first be coded (according to a scoring or evidence rule) to define which of them should count and in what way. As before, scoring rules can be defined by human experts or can be learned from the data.
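One way such a coding rule for support-usage data might look is sketched below; the log fields, the partial-credit treatment of hinted responses, and the response-time threshold are assumptions for illustration, not the paper's actual coding scheme.

```python
# A sketch of an evidence rule for support-usage data in the Transitional-Evidence
# model; field names and thresholds are assumptions.

def code_support_usage(log: dict) -> dict:
    """Turn one item-interaction log into observables for the Transitional-Evidence model."""
    return {
        "attempts": log["attempts"],
        "used_hint": int(log["hints_viewed"] > 0),
        "support_chosen": log.get("support_chosen"),            # e.g., "teach_me"
        "fast_retry": int(log["retry_time_sec"] < 10),           # arbitrary threshold
        "correct_after_support": int(log["second_attempt_correct"]),
        # hinted correct responses could also feed a partial-credit observable
        "credit": 0.5 if log["hints_viewed"] and log["second_attempt_correct"] else
                  float(log["second_attempt_correct"]),
    }

log = {"attempts": 2, "hints_viewed": 1, "support_chosen": "break_it_down",
       "retry_time_sec": 42, "second_attempt_correct": True}
print(code_support_usage(log))
```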
The statistical models in the Transitional-Evidence model need to be selected such that they allow us to infer about change based on observables over time. A popular stochastic model for characterizing a changing system is a Markov model (cf. Norris, 1998). In a Markov model, the transition to the next state depends only on the current state. Because the focus here is on latent competencies, the appropriate model is a hidden Markov model (HMM; e.g., Visser et al., 2002; Visser, 2011), and specifically an input-output HMM (Bengio and Frasconi, 1995). An HMM would allow us to infer about the efficacy of the learning supports in making a change in the latent state (proficiency level). In addition, the input-output HMM will allow us to make the association between learning materials (as input) and the change in KSA (latent) based on the observables (output), to estimate the contribution (efficacy) of each particular support to the desired change in proficiency (i.e., learning). Figure 7 illustrates this model for a single latent skill (KSA at times t1 and t2), a single observation (O at times t1 and t2), and a single learning support (l at times t1 and t2). The observation's dependency on the skill (i.e., O given KSA; the arrow/link from KSA to O) is modeled by the Observational-Evidence model (the model from the original ECD), while the skill's dependency on the learning support (i.e., KSA given l; the arrow/link from l to KSA) is modeled by the Transitional-Evidence model.

FIGURE 7 | An input-output hidden Markov model (HMM).
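To make the input-output HMM idea concrete, here is a toy sketch with one binary latent skill, where the transition matrix depends on which learning support was used between attempts. All probabilities are invented for illustration and are not the parameterization used in the paper.

```python
import numpy as np

# Toy input-output HMM for one binary latent skill (0 = non-mastery, 1 = mastery).
# Transition probabilities depend on the learning support used between attempts.
TRANSITION = {                                   # rows: state at t, cols: state at t+1
    None:            np.array([[0.95, 0.05], [0.02, 0.98]]),
    "break_it_down": np.array([[0.60, 0.40], [0.02, 0.98]]),
    "teach_me":      np.array([[0.40, 0.60], [0.02, 0.98]]),
}
EMISSION = np.array([[0.80, 0.20],   # P(incorrect, correct | non-mastery)
                     [0.15, 0.85]])  # P(incorrect, correct | mastery)

def forward(observations, supports, prior=np.array([0.5, 0.5])):
    """Posterior over the latent state at the final time, given the observed
    correctness sequence and the supports used between attempts."""
    belief = prior * EMISSION[:, observations[0]]
    for obs, support in zip(observations[1:], supports):
        belief = (belief @ TRANSITION[support]) * EMISSION[:, obs]
    return belief / belief.sum()

# Incorrect first attempt, Teach-me support, correct second attempt.
print(forward(observations=[0, 1], supports=["teach_me"]))
```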
Working with the above example, let us assume a student does not know how to identify a data trend from a graph, and thus cannot extrapolate a new data point (i.e., incorrectly answers a question that requires extrapolation). Suppose a task support is provided, such that it draws the student's attention to the pattern and trend in the data. We now want to estimate the contribution of this support in helping the student learn (and compare this contribution to that of other task supports). We have the following observables: the student's incorrect answer on the first attempt, the student's use of the particular task support, and the student's revised answer on the second attempt (whether correct or not). Using an input-output HMM will allow us to estimate the probability of transitioning from the incorrect to the correct latent state (or, in other cases, from low proficiency to high proficiency), given the use of the task support. Of course, the model will be applied across questions and students in order to infer about the latent state.

The above example of a single latent skill can be extended to a map of interconnected skills using a dynamic Bayesian network (DBN; Murphy and Russell, 2002). The DBN generalizes the HMM by allowing the state space to be represented in a factored form instead of as a single discrete variable. DBNs extend Bayesian networks (BN) to deal with changing situations.

How do we link the learning materials (defined in the Task-support model) to the learning processes/goals (defined in the KSA-change model)? Similar to the Q-matrix in the Observational-Evidence model, here too we need a matrix that links the learning materials (task supports) with the associated skill changes. We can use an S-matrix (Chen et al., 2018), which is a matrix of <supports × skills> (supports in the rows; skills in the columns), defining for each support which skills/processes it can improve. In that sense, and similar to the Q-matrix, an S-matrix is a collection of "evidence" that explicates the connection between the supports and the desired learning shifts. For example, providing a worked example is a learning support that may be connected to several knowledge shifts (corresponding to subskills in the learning models), and providing opportunities for practice is another learning support that may be connected to different desired knowledge shifts (corresponding to different subskills). The S-matrix will specify these connections. The S-matrix will then play a role in the HMM, to determine the probability that a particular knowledge shift (learning process) occurred given the particular learning supports. Similar to the Q-matrix, the S-matrix should be determined by content experts, and/or learned or updated from the data.
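An S-matrix can be sketched in the same spirit as the Q-matrix example above; the support names, skill names, and entries below are illustrative assumptions only.

```python
import numpy as np

# Sketch of an S-matrix <supports x skills>: a 1 means the support is hypothesized
# to produce the corresponding knowledge shift. Entries are illustrative only.
supports = ["practice_set", "worked_example", "teach_me_video"]
skills   = ["locate_points", "identify_trend", "extrapolate"]
S = np.array([[1, 0, 0],
              [0, 1, 1],
              [1, 1, 1]])

def supports_for_skill(skill: str) -> list[str]:
    """Which supports are candidates for shifting a given skill."""
    j = skills.index(skill)
    return [supports[i] for i in range(len(supports)) if S[i, j] == 1]

print(supports_for_skill("identify_trend"))  # -> ['worked_example', 'teach_me_video']
```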
THE e-ASSEMBLY MODEL

In the original ECD, the Assembly model determines how to put it all together and specifies the conditions needed for obtaining the desired reliability and validity for the assessment. In other words, it determines the structure of the test and the number and mix of the desired items/tasks. The Assembly model is directly derived from the Proficiency model, such that it ensures, for example, the appropriate representation of all skills in the map. Going back to the HERA example and the KSA model in Figure 3, if we were to build an assessment with those target skills, we would have to ensure that we sample items/tasks for each of the skills and subskills specified on the map, and the Assembly model will specify how much of each.

For the expanded ECD, we do not create a parallel model to the Assembly model as we did for the three core models, because in a blended learning and assessment system we do not assemble the assessment and the learning separately. Rather, in the process of developing a system, after we have specified the six core models of the e-ECD, we assemble it all together in what we call the e-Assembly model.

The role of the e-Assembly model is to specify how to put it all together. It includes the specifications of the number and mix of items/tasks, but it also includes how and when to present the learning support materials. This can be seen as determining how to switch between the "assessment" mode of the system and the "learning" mode of the system.

The e-Assembly model provides an opportunity to take into account additional pedagogical principles that are relevant to the combination of items and tasks, such as the importance of reducing cognitive load for learning; focusing on one skill at a time; gradually increasing difficulty; and adaptive presentation of content, among others. Conditions to ensure the validity of the system may also specify pedagogical principles such as learning via real-world authentic tasks or learning by doing, as well as learner engagement factors, as relevant. Pedagogical Content Knowledge principles that include knowledge of student misconceptions regarding specific phenomena, if articulated as part of the KSA and KSA-change models, should also be considered here in selecting and designing tasks, such that the misconceptions are either accounted for or avoided so that the KSAs can be validly addressed.

The e-Assembly model is also the place to take into account considerations from other relevant approaches, such as the learner-centered design approach (LCD; Soloway et al., 1994; Quintana et al., 2000), which argues that student engagement and constructivist theories of learning should be at the core of a computerized learning system. Adopting such an approach will affect the combination and/or navigation through the system. For example, the system may guide students to be more active in trying out options and making choices regarding their navigation in the system.

An important aspect of systems for learning and assessment is whether, and in what way, they are adaptive to student performance. This aspect within the e-Assembly model ties directly to the e-Evidence model. The statistical models in the Evidence model are also good candidates for determining the adaptive algorithm in adaptive assessments. For example, if a 2PL IRT model is used to estimate ability, this model can also be used to select the items in a Computer Adaptive Test (CAT), as is often done in large-scale standardized tests that are adaptive (e.g., the old version of the GRE). Similarly, if a Bayes net is used to estimate the map of KSAs, then the selection of items or tasks can be done based on the Bayes-net estimates of the skills. Similarly, we can use a DCM to identify weakness in a particular skill and thus determine the next item that targets that particular weakness. This is true for any other model, including data-driven models, because the purpose of the models is to provide a valid way to estimate KSAs, and once this is done, adaptivity within the system can be determined accordingly.

The learning aspect of the system is motivated by the goal of maximizing learners' gain and thus needs a more comprehensive adaptivity, or what is often called a "recommendation model." A recommendation model does not only determine the next item to be presented; it also determines which instructional or training material to recommend or present to the learner. A good recommendation model makes full use of all available information about both the learner and the instructional materials to maximize the KSA gain for the learner. If we have a way to estimate (measure) the gain for the learner, we can feed this information to the recommendation engine to determine the adaptivity in the form of the next task support and/or training and instructional material needed. Thus, the additional layer of an evidence model for the learning materials (i.e., the statistical models for estimating the efficacy of the task supports) provides a good candidate model for the recommendation engine. Which materials were already used by the learner (and which were chosen/preferred), which supports are found to be more effective for that particular learner, which skill is currently in focus, and which supports are most effective for that particular skill (e.g., practice, an explained example, a video lecture, a simulation demonstration, providing instructional material for a prior/prerequisite skill, etc.) are some of the decisions that need to be made by a recommendation engine, and these decisions rely on the statistical models that were used to evaluate and provide evidence for the efficacy of the task supports and instructional materials.
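A minimal sketch of such a recommendation rule is shown below. It greedily targets the learner's weakest skill and picks the support with the largest estimated efficacy; the numbers and the greedy rule are assumptions for illustration, not the system's actual algorithm.

```python
# A sketch of a simple recommendation rule that combines KSA estimates with
# estimated support efficacies (e.g., from the Transitional-Evidence model).

ksa_estimate = {"locate_points": 0.9, "identify_trend": 0.4, "extrapolate": 0.3}

# P(mastery shift | support) per skill, e.g., estimated with the input-output HMM.
support_efficacy = {
    "extrapolate":    {"practice_set": 0.15, "worked_example": 0.35, "teach_me_video": 0.30},
    "identify_trend": {"practice_set": 0.20, "worked_example": 0.40, "teach_me_video": 0.25},
    "locate_points":  {"practice_set": 0.25, "worked_example": 0.10, "teach_me_video": 0.10},
}

def recommend(ksa: dict, efficacy: dict) -> tuple[str, str]:
    """Target the weakest skill and pick the support with the largest
    estimated probability of producing the desired knowledge shift."""
    target = min(ksa, key=ksa.get)
    best = max(efficacy[target], key=efficacy[target].get)
    return target, best

print(recommend(ksa_estimate, support_efficacy))  # -> ('extrapolate', 'worked_example')
```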
CONCLUSION AND FUTURE STEPS

In this paper, we propose a new way to fuse learning and assessment at the design stage. Specifically, we propose an expanded framework that we developed to aid the creation of a system for blended assessment and learning. We chose the ECD framework as a starting point because it is a comprehensive and rigorous framework for the development of assessments and underlies the development of tests for most testing organizations. Incorporating learning aspects, both learning goals and learning processes, in the ECD framework is challenging because of fundamental differences in the assumptions and approaches of learning and assessment. Nevertheless, we showed that the unique structure of the Proficiency, Task, and Evidence models lends itself to creating parallel models for consideration of the corresponding aspect of learning within each model.

We are currently applying this framework in our work. In future work, we hope to show examples of the learning and assessment system that we build following the e-ECD framework. We are also working to incorporate other elements into the framework, primarily the consideration of motivation, meta-cognition, and other non-cognitive skills. Since learners' engagement is a crucial element in a learning system, we can think of ways to incorporate elements that enhance engagement as part of the assembly of the system, for example by using a reward system or gamification in the form of points, coins, badges, etc. Adding gamification or engagement-enhancing elements into a system does not currently have a designated model within the e-ECD. We are working to find a way to incorporate these elements into the framework.

AUTHOR CONTRIBUTIONS

MA-A and AAvD contributed to the conception of the framework. MA-A contributed to the conception and specifications of the new models, and AAvD contributed to the CP component. SW and JT contributed to the e-Task model. BD contributed to the e-Evidence model. The authors would like to thank the reviewers for their substantial contribution.

FUNDING

This work has been done as part of a research initiative at ACTNext, by ACT, Inc. No external funds or grants supported this study.
REFERENCES

Almond, R. G., Mislevy, R. J., Steinberg, L. S., Yan, D., and Williamson, D. M. (2015). Bayesian networks in educational assessment. New York, NY: Springer.
Anderson, J. R., Corbett, A. T., Koedinger, K. R., and Pelletier, R. (1995). Cognitive tutors: lessons learned. J. Learn. Sci. 4, 167–207. doi: 10.1207/s15327809jls0402_2
Arieli-Attali, M., and Cayton-Hodges, G. (2014). Expanding the CBAL mathematics assessments to elementary grades: the development of a competency model and a rational number learning progression. ETS Res. Rep. Ser. 2014, 1–41. doi: 10.1002/ets2.12008
Arieli-Attali, M., Wylie, E. C., and Bauer, M. I. (2012). "The use of three learning progressions in supporting formative assessment in middle school mathematics" in Annual meeting of the American Educational Research Association (Vancouver, Canada).
Attali, Y., and Arieli-Attali, M. (2014). Gamification in assessment: do points affect test performance? Comp. Educ. 83, 57–63. doi: 10.1016/j.compedu.2014.12.012
Bengio, Y., and Frasconi, P. (1995). "An input output HMM architecture" in Advances in Neural Information Processing Systems. eds. M. I. Jordan, Y. LeCun, and S. A. Solla (Cambridge, MA: MIT Press), 427–434.
Bolsinova, M., Deonovic, B., Arieli-Attali, M., Settles, B., Hagiwara, M., von Davier, A., et al. (2019a). Hints in adaptive learning systems: consequences for measurement. Paper presented at the annual meeting of the National Council on Measurement in Education (NCME), Toronto, Canada.
Bolsinova, M., Deonovic, B., Arieli-Attali, M., Settles, B., Hagiwara, M., von Davier, A., et al. (2019b, under review). Measurement of ability in adaptive learning and assessment systems when learners use on-demand hints. Educ. Psychol. Meas.
Chang, K. M., Beck, J., Mostow, J., and Corbett, A. (2006). "A Bayes net toolkit for student modeling in intelligent tutoring systems" in International Conference on Intelligent Tutoring Systems (Berlin, Heidelberg: Springer), 104–113.
Chen, Y., Li, X., Liu, J., and Ying, Z. (2018). Recommendation system for adaptive learning. Appl. Psychol. Meas. 42, 24–41. doi: 10.1177/0146621617697959
Conrad, S., Clarke-Midura, J., and Klopfer, E. (2014). A framework for structuring learning assessment in a massively multiplayer online educational game – experiment centered design. Int. J. Game Based Learn. 4, 37–59. doi: 10.4018/IJGBL.2014010103
Embretson, S. E. (1998). A cognitive design system approach to generating valid tests: application to abstract reasoning. Psychol. Methods 3, 300–396.
Feng, M., Hansen, E. G., and Zapata-Rivera, D. (2009a). "Using evidence centered design for learning (ECDL) to examine the ASSISTments system." Paper presented at the annual meeting of the American Educational Research Association (AERA), San Diego, California.
Feng, M., Heffernan, N. T., and Koedinger, K. R. (2009b). Addressing the assessment challenge in an intelligent tutoring system that tutors as it assesses. User Model. User Adapt. Interact. 19, 243–266. doi: 10.1007/s11257-009-9063-7
Furtak, E. M., Thompson, J., Braaten, M., and Windschitl, M. (2012). "Learning progressions to support ambitious teaching practices" in Learning progressions in science: Current challenges and future directions. eds. A. C. Alonzo and A. W. Gotwals (Rotterdam: Sense Publishers), 405–433.
Grover, S., Bienkowski, M., Basu, S., Eagle, M., Diana, N., and Stamper, J. (2017). "A framework for hypothesis-driven approaches to support data-driven learning analytics in measuring computational thinking in block-based programming" in Proceedings of the Seventh International Learning Analytics & Knowledge Conference (ACM), Vancouver, BC, Canada, 530–531.
Heffernan, N., and Heffernan, C. (2014). The ASSISTments ecosystem: building a platform that brings scientists and teachers together for minimally invasive research on human learning and teaching. Int. J. Artif. Intell. Educ. 24, 470–497. doi: 10.1007/s40593-014-0024-x
Kim, Y. J., Almond, R. G., and Shute, V. J. (2016). Applying evidence-centered design for the development of game-based assessments in physics playground. Int. J. Test. 16, 142–163. doi: 10.1080/15305058.2015.1108322
Kingston, N. M., Karvonen, M., Thompson, J. R., Wehmeyer, M. L., and Shogren, K. A. (2017). Fostering inclusion of students with significant cognitive disabilities by using learning map models and map-based assessments. Inclusion 5, 110–120. doi: 10.1352/2326-6988-5.2.110
Koedinger, K. R., Corbett, A. T., and Perfetti, C. (2012). The knowledge-learning-instruction framework: bridging the science-practice chasm to enhance robust student learning. Cogn. Sci. 36, 757–798. doi: 10.1111/j.1551-6709.2012.01245.x
Koehler, M., and Mishra, P. (2009). What is Technological Pedagogical Content Knowledge (TPACK)? Contemp. Issues Technol. Teach. Educ. 9, 60–70. doi: 10.1177/002205741319300303
Liu, J., Xu, G., and Ying, Z. (2012). Data-driven learning of Q-matrix. Appl. Psychol. Meas. 36, 548–564. doi: 10.1177/0146621612456591
Luecht, R. M. (2013). Assessment engineering task model maps, task models and templates as a new way to develop and implement test specifications. J. Appl. Test. Technol. 14, 1–38. Retrieved from: http://jattjournal.com/index.php/atp/article/view/45254
Martin, J., and VanLehn, K. (1995). Student assessment using Bayesian nets. Int. J. Hum. Comput. Stud. 42, 575–591. doi: 10.1006/ijhc.1995.1025
Mislevy, R. J. (2013). Evidence-centered design for simulation-based assessment. Mil. Med. (special issue on simulation, H. O'Neil, ed.) 178, 107–114. doi: 10.7205/MILMED-D-13-00213
Mislevy, R. J., Almond, R. G., and Lukas, J. F. (2003). A brief introduction to evidence-centered design. ETS Res. Rep. Ser. 2003. Princeton, NJ. doi: 10.1002/j.2333-8504.2003.tb01908.x
Mislevy, R. J., Behrens, J. T., Dicerbo, K. E., and Levy, R. (2012). Design and discovery in educational assessment: evidence-centered design, psychometrics, and educational data mining. JEDM J. Educ. Data Min. 4, 11–48. Retrieved from: https://jedm.educationaldatamining.org/index.php/JEDM/article/view/22
Mislevy, R. J., Steinberg, L. S., and Almond, R. G. (1999). "On the roles of task model variables in assessment design." Paper presented at the conference "Generating Items for Cognitive Tests: Theory and Practice," Princeton, NJ.
Mislevy, R. J., Steinberg, L. S., Almond, R. G., and Lukas, J. F. (2006). "Concepts, terminology, and basic models of evidence-centered design" in Automated scoring of complex tasks in computer-based testing. eds. D. M. Williamson, R. J. Mislevy, and I. I. Bejar (New York, NY), 15–47.
Murphy, K. P., and Russell, S. (2002). Dynamic Bayesian networks: Representation, inference and learning. Doctoral dissertation. Berkeley: University of California. Available at: https://www.cs.ubc.ca/~murphyk/Thesis/thesis.html
National Research Council (NRC) (2007). Taking science to school: Learning and teaching science in grades K-8. Washington, DC: The National Academies Press.
Nichols, P., Ferrara, S., and Lai, E. (2015). "Principled design for efficacy: design and development for the next generation of tests" in The next generation of testing: Common core standards, smarter-balanced, PARCC, and the nationwide testing movement. ed. R. W. Lissitz (Charlotte, NC: Information Age Publishing), 228–245.
Nichols, P., Kobrin, J. L., Lai, E., and Koepfler, J. (2016). "The role of theories of learning and cognition in assessment design and development" in The handbook of cognition and assessment: Frameworks, methodologies, and applications. 1st edn. eds. A. A. Rupp and J. P. Leighton (Massachusetts, USA: John Wiley & Sons, Inc.), 15–40.
Norris, J. R. (1998). Markov chains. New York, NY: Cambridge University Press.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Francisco, CA: Morgan Kaufmann Publishers, Inc.
Pearl, J. (2014). Probabilistic reasoning in intelligent systems: Networks of plausible inference. Elsevier.
Pelánek, R. (2017). Bayesian knowledge tracing, logistic models, and beyond: an overview of learner modeling techniques. User Model. User Adap. Inter. 27, 313–350. doi: 10.1007/s11257-017-9193-2
Posner, G. J., Strike, K. A., Hewson, P. W., and Gertzog, W. A. (1982). Accommodation of a scientific conception: toward a theory of conceptual change. Sci. Educ. 66, 211–227. doi: 10.1002/sce.3730660207
Quintana, C., Krajcik, J., and Soloway, E. (2000). "Exploring a structured definition for learner-centered design" in Fourth International Conference of the Learning Sciences. eds. B. Fishman and S. O'Connor-Divelbiss (Mahwah, NJ: Erlbaum), 256–263.
Razzaq, L., Feng, M., Nuzzo-Jones, G., Heffernan, N. T., Koedinger, K. R., Junker, B., et al. (2005). "The Assistment project: blending assessment and assisting" in Proceedings of the 12th Artificial Intelligence in Education. eds. C. K. Looi, G. McCalla, B. Bredeweg, and J. Breuker (Amsterdam: IOS Press), 555–562.
Reckase, M. D. (2009). "Multidimensional item response theory models" in Multidimensional item response theory. ed. M. D. Reckase (New York, NY: Springer), 79–112.
Rupp, A. A., Gushta, M., Mislevy, R. J., and Shaffer, D. W. (2010a). Evidence-centered design of epistemic games: measurement principles for complex learning environments. J. Technol. Learn. Assess. 8. Retrieved from: https://ejournals.bc.edu/ojs/index.php/jtla/article/view/1623
Rupp, A. A., Templin, J. L., and Henson, R. A. (2010b). Diagnostic measurement: Theory, methods, and applications. New York, NY: Guilford Press.
Shute, V. J., Hansen, E. G., and Almond, R. G. (2008). You can't fatten a hog by weighing it – or can you? Evaluating an assessment for learning system called ACED. Int. J. Artif. Intell. Educ. 18, 289–316. Retrieved from: https://content.iospress.com/articles/international-journal-of-artificial-intelligence-in-education/jai18-4-02
Soloway, E., Guzdial, M., and Hay, K. (1994). Learner-centered design: the challenge for HCI in the 21st century. Interactions 1, 36–48. doi: 10.1145/174809.174813
Straatemeier, M. (2014). Math garden: A new educational and scientific instrument. PhD thesis. ISBN 9789462591257.
Tatsuoka, K. K. (1983). Rule space: an approach for dealing with misconceptions based on item response theory. J. Educ. Meas. 20, 345–354. doi: 10.1111/j.1745-3984.1983.tb00212.x
Ventura, M., and Shute, V. (2013). The validity of a game-based assessment of persistence. Comput. Hum. Behav. 29, 2568–2572. doi: 10.1016/j.chb.2013.06.033
Visser, I. (2011). Seven things to remember about hidden Markov models: a tutorial on Markovian models for time series. J. Math. Psychol. 55, 403–415. doi: 10.1016/j.jmp.2011.08.002
Visser, I., Raijmakers, M. E., and Molenaar, P. (2002). Fitting hidden Markov models to psychological data. Sci. Program. 10, 185–199. doi: 10.1155/2002/874560
von Davier, A. A. (2017). Computational psychometrics in support of collaborative educational assessments. J. Educ. Meas. 54, 3–11. doi: 10.1111/jedm.12129
Wang, Y., Heffernan, N. T., and Beck, J. E. (2010). "Representing student performance with partial credit" in Proceedings of the 3rd International Conference on Educational Data Mining. eds. R. S. J. d. Baker, A. Merceron, and P. I. Pavlik Jr. (Pittsburgh, PA, USA), 335–336.
Wilson, M. (2009). Measuring progressions: assessment structures underlying a learning progression. J. Res. Sci. Teach. 46, 716–730. doi: 10.1002/tea.20318

Conflict of Interest Statement: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Arieli-Attali, Ward, Thomas, Deonovic and von Davier. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Some examples of Tables and Graphs includes comparing, combining, and translating learning models are learning progressions (National Research information from two or more different representations. Council (NRC), 2007; e.g., Arieli-Attali et al., 2012) a Dynamic Although KSA models oen s ft pecify the links between nodes, Learning Map (Kingston et  al., 2017), or learning models and may even order the skills in a semi-progression (from based on the body of work on Pedagogical Content Knowledge basic to more sophisticated skills) as in the example of the (Posner et  al., 1982; Koehler and Mishra, 2009; Furtak et  al., HERA model in Figure 3, a knowledge model oen do ft es not 2012). The importance of Pedagogical Content Knowledge is specify how to move from one node to the next, nor does it in considering the interactions of content information, pedagogy, explicitly define learning processes. To that end we  add the and learning theory. Another approach from the learning learning layer in the e-Proficiency model – the KSA-change model. sciences and artificial intelligence is the Knowledge-Learning- Instruction framework (KLI; Koedinger et  al., 2012), which The Learning Layer in the e-Proficiency provides a taxonomy to connect knowledge components, learning processes, and teaching options. We  will illustrate our Model – The KSA-Change Model KSA-change model specification using the KLI framework, Defining a learning layer within the e-Proficiency model makes but we  will define the e-Proficiency model in a general way room for explicit articulation of the learning processes targeted such that any other learning theory can be  applied instead. by the learning and assessment system. The idea is for these Frontiers in Psychology | www.frontiersin.org 6 April 2019 | Volume 10 | Article 853 Arieli-Attali et al. Expanded Evidence-Centered Design FIGURE 3 | The KSA model for the HERA system for scientific reasoning skills. specifications to be  the result of purposeful planning, rather skill (node). For example, Kingston and his colleagues (Kingston than a coincidental outcome of system creation. In the Intelligence et  al., 2017) developed Dynamic Learning Maps in which each Tutoring literature, developers consider what they call the of the target competencies are preceded with three levels of “Learner model” (Pelánek, 2017) or the “Educational model” precursor pieces of knowledge (initial precursor, distal precursor, (Quintana et al., 2000) or more generally, processes for knowledge and proximal precursor) and succeeded by a successor piece acquisition (Koedinger et  al., 2012). This model can also of knowledge, together creating what they called “Linkage be  viewed as the “pedagogical model” and apply principles of levels.” When defining the sequence of precursors attention Pedagogical Content Knowledge (Koehler and Mishra, 2009; should be  given to the grain size, as well as to specific features Furtak et al., 2012). We call this model the “KSA-change Model” or attributes of these precursors. In KLI terminology (Koedinger for generalizability and to keep the connection with the original et  al., 2012), this would mean to characterize the Knowledge KSA model, with the emphasis on the change in KSA. Using Components of the subskills. Some Knowledge Components the title “change” makes room also for negative change (aka are: fact, association, category, concept, rule, principle, plan, “forgetting”), which albeit not desirable, is possible. 
schema, model, production; and whether it is verbal or A KSA-change model is the place to incorporate the specific non-verbal, declarative or procedural; or integrative knowledge learning theory or learning principles (or goals) that are at (2) the second step is to characterize the learning sequence the basis of the systems. Similar to the way a KSA map is by which kind of learning process is required to achieve the created, the KSA-change map should specify the learning aspects learning. For example, applying the KLI taxonomy (Koedinger of the particular skills. Here we  provide a general outline for et  al., 2012), we  can assign to each precursor (knowledge how to specify a KSA-change model, but in each system this component) a specific learning process that is presumed to process may take a different shape. make the desired knowledge shift. The KLI framework A KSA-change model may include variables of two types: characterizes three kinds of learning processes: memory and fluency building , induction and refinement , and understanding 1. Sequences of knowledge components, features or attributes and sense-making. Specifying which kind of process is needed 2. Learning processes within each sequence in the particular learning sequence is necessary for subsequent decisions about the supports to be  provided. For example, if es Th e two types of variables define the learning sequences and processes that are needed to facilitate learning. e Th the focal learning process is fluency building , this implies that the learning system should provide practice opportunities for KSA-change variables are derived directly from the KSA model, such that each node/skill in the KSA model has a reference that KSA. In contrast, if the focal learning process for a different KSA is understanding and sense making, then the learning in the KSA-change model in the form of how to “move” students to learn that skill. system should provide explanations and examples. Figure 4 illustrates a general e-Proficiency model with an artificial example Given a specific skill (node in the map), this may be  done in two stages: (1) the first step is to define the (linear) sequence of adding-on the learning processes to a knowledge sequence built off of three prerequisites and a successor piece. of pre-requisites or precursors needed to learn that target Frontiers in Psychology | www.frontiersin.org 7 April 2019 | Volume 10 | Article 853 Arieli-Attali et al. Expanded Evidence-Centered Design FIGURE 4 | A general diagram of the e-Proficiency model (the orange node in the KSA model is specified in the KSA-change model for learning sequence and learning processes). Similarly, we can construct a sequence for each of the other nodes (the blue, pink, and red nodes). Applying the above approach to the HERA learning and the subskills in the sequence. This is the learning model for assessment system, let us focus on the subskill of interpolation the specific skill in focus, and is usually derived based on and extrapolation from data in a graph (the last red circle expert analysis. The model in Figure 5 also specifies particular in the progression of Graph Reading skill in Figure 3). Based misconceptions that students oen exhi ft bit at each level. on our guidelines above, the first step would be  to determine Specifying misconceptions may also help determine which a sequence of subskills/precursors and to characterize them, feedback and/or learning aid to provide to students. 
We  show and then as a second step to specify the cognitive process(es) in the next section how to define Task and Task-support models that would make the transition from one subskill to the next. based on this example. Figure 5 presents one section of the KSA-change of the HERA er Th e are several decisions that are taken as part of the system for the subskill of interpolation and extrapolation in model specifications. One of them is the grain-size of each a graph. The model specifies the proximal, distal, and initial precursor. An alternative KSA-change model can be determined precursors as follows: the proximal precursor  =  identifying with smaller or larger grain size subskills. Another decision the rate of change in the dependent variable (y-variable) as is whether to adopt a three-level precursor skill structure, or the independent variable (x-variable) changes; distal alternatively focus on only one precursor and the different precursor  =  being able to locate the y-value for a certain misconceptions students may have. Researchers and developers x-value point on a graph, and find adjacent points and compare are encouraged to try different approaches. the relative values; initial precursor  =  understanding that the We propose to derive the KSA-change variables by conducting two variables in a graph are co-related. Now applying the a learning process analysis by experts, i.e., an analysis of the KLI knowledge components characterization, the proximal pedagogical practices in the content domain or relying on a precursor (identifying rate of change) may be  characterized theory of learning in that domain, similar to the way we illustrated as “rule”; the distal precursor (locate points and compare) above (by using the KLI taxonomy). This is also parallel to as “schema”; and the initial precursor (two variables are the way a KSA model is derived based on cognitive task analysis co-related) as a “concept.” or domain analysis. The KSA-change model constitutes a collection Next, we  determine the cognitive processes that foster the of sequences (and their processes), each addressing one node transition from one subskill to the next. For example, given in the KSA model (as illustrated in Figures 4, 5). This can an understanding of the co-variation of x and y (the initial also be  viewed as a two-dimensional map, with the sequences subskill) students need to practice finding the y-points for as the second dimension for each node. different x-points to create the mental schema and build fluency Similar to updating the KSA model for a student, here with locating points and particularly two adjacent points. too, following analysis of data from student responses and However, to “jump” to the next step of identifying the trend student behaviors in using the learning supports, values on and the rate of change requires induction and refinement to the KSA-change variables indicate level or probability that a derive the rule. The last transition from identifying rate of particular student has gone through a particular learning process change to perform interpolation & extrapolation requires sense (or that a particular knowledge shift was due to the learning making and deduction – deducing from the rule to the new support used). We  will discuss this in more detail in the situation. Given the specific learning processes, we  can later e-Evidence model section. 
define which learning supports would be  most appropriate (e.g., practice for fluency building, worked example and Expanded Task Model comparisons for induction, and explanation for sense making In the original ECD framework, the Task model specifies the and deduction). The model in Figure 5 shows the different features of tasks that are presumed to elicit observables to learning processes as the transitions (arrows) required between allow inference on the target KSA. An important distinction Frontiers in Psychology | www.frontiersin.org 8 April 2019 | Volume 10 | Article 853 Arieli-Attali et al. Expanded Evidence-Centered Design FIGURE 5 | A specification diagram of the KSA-change model for one node/skill of interpolation/extrapolation in a graph in the HERA’s KSA-model. introduced in ECD is between a task model design based on explanations, or guidance to further tailored instruction derived a Proficiency model and a task-centered design ( Mislevy et  al., from the latent learning processes specified in the KSA-change 1999). While in task-centered design, the primary emphasis model. In other words, the supports are determined according is on creating the task with the target of inference defined to the focal knowledge change. We  elaborate and illustrate only implicitly, as the tendency to do well on those tasks, in on Task and Task-support models below. defining a task model based on a Proficiency (and Evidence) model, we make the connections and possible inferences explicit The Assessment Layer Within the e-Task from the start, making the design easier to communicate, easier Model – The Task Model to modify, and better suited to principled generation of tasks e T Th ask model provides a framework for describing the (Mislevy et  al., 1999, p.  23). Moreover, basing a task model situation in which examinees are given the opportunity to on Proficiency and Evidence models allows us to consider exhibit their KSAs, and includes the specifications of the stimulus reliability and validity aspects of task features, and particularly materials, conditions and affordances , as well as specifications the cognitively or empirically based relevance of the task for the work product (Mislevy et  al., 1999, p.  19). The features. In other words, considerations of item reliability and characteristics of the tasks are determined by the nature of validity guide the development of items to elicit the target the behaviors that provide evidence for the KSAs. Constructing observables and only them (minimizing added “noise”). This a Task model from the latent KSA model involves considering means that at the development stage of a task, all features of the cognitive aspect of task behavior, including specifying the the task should stand to scrutiny regarding relevance to the features of the situation, the internal representation of these latent KSA. As mentioned above, if reading ability is not features, and the connection between these representations and relevant as part of the mathematics KSA, items or tasks that the problem-solving behavior the task targets. In this context, may impede students with lower reading skills should be avoided. variables that aeff ct task difficulty are essential to take into uTh s, defining a task model based on a Proficiency model account. In addition, the Task model also includes features of resembles the relationship between the latent trait and its task management and presentation. manifestation in observable behavior. 
The more the task relates Although the Task model is built off of the Proficiency to the target KSA, the better the inference from the observable model (or the KSA model in our notation), multiple Task to the latent KSA. models are possible in a given assessment, because each For assessment precision purposes per-se, there is no need Task model may be employed to provide evidence in a different to provide feedback to students; on the contrary, feedback form, use different representational formats, or focus evidence can be  viewed as interference in the process of assessment, on different aspects of proficiency. Similarly, the same Task and likewise scaffolds and hints introduce noise or interference model and work product can produce different evidence; i.e., to a single-point-in-time measurement. However, when the different rules could be  applied to the same work product, assessment tool is also intended for learning, the goal is to to allow inferences on different KSAs. uTh s, it is necessary support learners when a weakness was identified, in order to define within each Task model the specific variables to to help them gain the “missing” KSA. In the e-ECD we define be  considered in the evidence rules (i.e., scoring rules; a “Task-support model” that together with the original Task we  elaborate on this in the next section). model creates the expanded-Task model (e-Task model). The Consider the abovementioned KSA from the HERA model: Task-support model specifies the learning supports that are “Perform an extrapolation using data from a graph.” As part necessary and should be  provided to learners in order to of a scientific reasoning skills assessment, this skill is defined achieve KSA change. Similar to basing the Task model on in a network of other skills related to understanding data the KSA model, the Task-support model is based on the representations, as seen in Figure 5. One possible Task model KSA-change model. The supports may include customized can be: “Given a graph with a defined range for the x-axis feedback, hints and scao ff lds, practice options, worked examples, variable [a,b] and y values corresponding to all x values in Frontiers in Psychology | www.frontiersin.org 9 April 2019 | Volume 10 | Article 853 Arieli-Attali et al. Expanded Evidence-Centered Design the range, find the y-value for an x-value outside the range.” The Learning Layer Within the e-Task aTh t is, we  present the learner with a graph (defined by its Model – The Task-Support Model x- and y- axes) and a function or paired coordinates (x, y) Tasks for assessment and tasks for learning differ in the for a limited domain. The question then asks learners to predict availability of options that support learning. When we  design the y-value of an x point which is outside the domain presented tasks for learning, we  need to consider the type of “help” or in the graph. Because extrapolation assumes the continuation “teaching” that the task ao ff rds, with the same level of rigor of the trend based on the relationship between variables, a that we  put into the design of the task itself. The Task-support required characteristic of the question is to include this model thus specifies the learning supports that might assumption, explicitly or implicitly via the context (e.g. stating be  necessary and should be  provided to students in order to other variables do not change, or the same experimental achieve the desired KSA-change (i.e., increase in KSA). Similar procedure was used for a new value). 
Articulating the assumption to basing the task model on the KSA model, the Task-support is part of the Task model. Another option for an extrapolation model is based on the KSA-change model. Task model could be: “Given a graph with two levels of the Making room for the specification of the task support dependent variable, both showing a linear relationship with in  connection to the learning processes/goals (the focal the x-variable (i.e., same relationship trend) but with different KSA-change) is the innovative core of the proposed e-ECD slopes, find the y-value for a third level of the dependent and its significant contribution to the design of learning and variable.” That is, we  present the learner with a graph with assessment systems. Many learning systems include scaoff lds two linear relationships (two line-graphs), one for level a and or hints to accompany items and tasks, oen det ft ermined by one for level b (for example, a, b are levels of weight of content experts or teacher experience and/or practices. These different carts, and the linear relationship is between speed hints and scao ff lds help answer the particular item they accompany, and time). The question then asks learners to predict the and may also provide “teaching,” if transfer occurs to subsequent y-value for level c (c  >  a, b; larger weight car) for an x- point similar items. However, in the design process of the hints and for which we  know the y-values of level a and b; that is, scao ff lds, oen ft no explicit articulation is made regarding the extrapolation beyond the data presented. This Task model is intended effect of hints and scaoff lds beyond the particular more sophisticated than the first one, due to the complexity question, or in connection to the general learning goals. Often, of the data representation, and thus is tapping into a higher the hints or scao ff lds are task-specific; a breakdown of the task level of the skill. into smaller steps, thus decreasing the difficulty of the task. Another aspect is the operationalization of the Task model This is also reflected in the approach to assigning partial credit in a particular item. Given a Task model, the question can for an item that was answered correctly with hints, contributing take the form of a direct non-contextualized (what we  may less to the ability estimate (as evidence of lower ability; e.g., also call a “naked”) question, (e.g., asking about a value of y Wang et  al., 2010). Specifying a Task-support model per each given a specific x), or it can be  contextualized (or “wrapped”) Task model dictates a standardization of the scao ff lds and hints within the context and terminology of the graph (e.g., “suppose (and other supports) provided for a given task. How do we specify the researcher decided to examine the speed of a new cart task supports connected to the focal KSA-change? that has greater weight, and suppose the trend of the results If for example, we  define a particular (as part of the observed is maintained, what would you  expect the new result KSA-change model) learning model similar to the one depicted to be?”). The “naked” and “dressed” versions of the question in Figure 5, we  may provide as a task support a “pointer” to may involve change in the difficulty of the item; however, the precursors, in the form of a hint or a scao ff ld. u Th s, the this change needs to be  examined, to the extent that it is scaoff lds are not a breakdown of the question to sub-steps, construct- relevant or irrelevant. 
If it is construct-relevant, but rather each scao ff ld points to one of the precursor pieces then it should be  included in the Task model as part of the of knowledge (initial, distal, or proximal precursor). In addition, specifications. Other factors may aeff ct the difficulty as well since we  defined the kind of knowledge change between each – the type of graphic (bar-graph, line-graph, multiple lines, precursor, we  can provide the corresponding support per each scatter plot) and the complexity of the relationships between desired change. If the knowledge change is related to memory variables (linear, quadratic, logarithmic, increasing, decreasing, and uen fl cy-building, we  may provide more practice examples one y-variable or more), the familiarity of the context of the instead of the scao ff ld. Similarly, if the knowledge change is task (whether this is a phenomenon in electricity, projectile related to understanding and sense-making, we  may provide motion, genetics, etc.), the complexity of the context (commonly an explanation or reasoning, or ask the student to provide understood, or fraught with misconceptions), the response the explanation or reasoning (self-explanation was found to options (multiple choice, or open-ended), the quality of the be  beneficial in some case, Koedinger et  al., 2012). It may graph and its presentation (easy or hard to read, presented very well be  the case that similar scaoff lds will result from on a computer, smartphone or a paper, presented as a static explicating a Task-support model following an e-ECD compared graph or interactive where learners can plot points), etc. These to not doing so, however in following this procedure, the factors and others need to be  considered when specifying the design decisions are explicit and easy to communicate, justify, Task model, and their relevance to the construct should modify, replicate, and apply in a principled development be  clearly articulated. of scao ff lds. Frontiers in Psychology | www.frontiersin.org 10 April 2019 | Volume 10 | Article 853 Arieli-Attali et al. Expanded Evidence-Centered Design Similarly, other features of task support, such as feedback, terminology, it refers to the proximal precursor (recall: proximal visuals, and links to a video or wiki page, can be  supported precursor  =  identifying the rate of the change in the dependent by the articulation of the KSA-change and the connection variable as the independent variable changes). between the two. e t Th hird type of support that we  oer in a ff n AL-PI item Let us illustrate specifying a Task-support model for the is Teach-me. The Teach-me option in this case includes the example item from HERA described in the previous section. following components: (1) a general statement about the skill; Recall that the item targeted the latent KSA “Perform an i.e., a graph presents data for a limited number of values, yet extrapolation using data from a graph,” and the task materials we  can estimate or predict about new values based on the trend included a graph with a specified function, asking students in the data presented; (2) an explanation of how to identify to extrapolate a point beyond the given range (i.e., predict the trend in a graph, i.e., locating adjacent points; and (3) an the value of y for a new x-value). Also, recall Figure 5 that illustration of how once the trend was identified, we  can depicts the KSA-change model for this particular subskill. Given perform extrapolation. 
the proximal, distal, and initial precursors, we  can now specify In our system we  provide an illustration on a different each scao ff ld to address each of these three precursor skills. value than the one in the question in order to avoid revealing Alternatively, we can decide to address only the closest precursor the correct answer and leaving room for the learner to put (the proximal) as a scao ff ld, and if that does not help with mental effort into applying the method taught. In the Task- answering the question correctly, then refer the student to support model terminology and in relation to the KSA-change “learn” the more basic material (e.g., in a different section of model, the Teach-me option addresses all three precursors. the system, or by presenting items/content that target the initial Specifying the task support based on the learning goal and and distal precursor skills). These decisions depend on the the desired change in KSA gives direction but does not limit system design (e-Assembly model) and may vary from system the options. On the contrary, it enriches the space of the to system. decision and opens-up new options. In addition, constructing As part of our development of the HERA system for scientific task support by following the e-ECD framework gives rise to thinking skills, we  developed an item model that can be  used the hypothesis that this way of structuring scao ff lds may enhance to collect evidence for both assessment and learning, termed transfer, because the scaoff lds do not address the particular an Assessment and Learning Personalized Interactive item question, but rather address the latent skill and its precursor (AL-PI). This item looks like a regular assessment item, and skills. Empirical evidence of transfer is of course needed to only aer a ft n incorrect response, the learners are given “learning examine this hypothesis. options” to choose from. We  oer ff three types of learning supports: (1) Rephrase – rewording of the question; (2) Break- Expanded Evidence Model it-down – providing the first step out of the multi-steps required e lin Th ks made between the e-Proficiency model and the e-Task to answer the question; and (3) Teach-me – providing a text model need explication of the statistical models that allow and/or video explanation of the background of the question. inferences from the work products on the tasks to the latent Figure 6 presents a screenshot of an AL-PI item from a task KSAs. In the ECD framework, the Evidence model specifies about height-restitution of a dropped-ball, targeting the skill the links between the task’s observables (e.g., student work of extrapolation. product) and the latent KSAs targeted by that task (termed Using the terminology above, the Rephrase-option provides here as Observational-Evidence model). The Observational- the learner with another attempt at the question, with the Evidence model includes the evidence rules (scoring rubrics) potential of removing the construct irrelevance that may stem and the statistical models. The Evidence model is the heart of from the item-phrasing (for learners who did not understand the ECD, because it provides the “credible argument for how what the question is asking them, due to difficulty with the students’ behaviors constitute evidence about targeted aspects wording). In this example, a Rephrase of the question is: “e Th of proficiency” ( Mislevy et  al., 1999, p.  2). 
question asks you  to find the “Height attained” (the y-value) In a system designed for learning, data other than the work for a new x-value that does not appear on the graph” (see product is produced, i.e., the data produced out of the task Figure 6 upper panel). Note that the Rephrase is practically support (e.g., hints and scao ff lds usage), which may be  called “undressing” (decontextualizing) the question, pointing out the process data. The task support materials are created to foster “naked” form, or making the connection between the context learning; thus, learning systems should have a credible argument and the decontextualized skill. that these supports indeed promote learning. Partial evidence e s Th econd learning support is Break-it-down which takes for that can be  achieved by inferences about knowledge or the form of providing the first step to answer the question. what students know and can do from their work product in In the example in Figure 6 the Break-it-down states: “e Th the system, following and as a result of the use of the supports, first step to answer this question is to evaluate the rate of and this can be  obtained by the statistical models within the change in y as a function of a change in the x-variable” with Evidence model. However, the efficacy of the task supports additional marks and arrows on the graph to draw the leaner’s themselves (i.e., which support helps the most in which case), attention where to look. The Break-it-down option may look and drawing inferences from scao ff lds and hint usage about like a hint, signaling to learners where to focus, and in our “learning behavior” or “learning processes” (as defined in the Frontiers in Psychology | www.frontiersin.org 11 April 2019 | Volume 10 | Article 853 Arieli-Attali et al. Expanded Evidence-Centered Design FIGURE 6 | An example of an Assessment & Learning Personalized & Interactive item (AL-PI item) from the HERA system. KSA-change model) may need new kind of models and evidence. which a score of 1 or 0 is obtained corresponding to a correct e T Th ransitional-Evidence model within the e-Evidence model or incorrect response. In other cases, the scoring rule might addresses the data produced from the task support. be  more complex, as in performance assessment where student responses produce what we  call “process data” (i.e., a log file The Assessment Layer Within the Evidence of recorded actions on the task). A scoring rule for process Model – The Observational-Evidence Model data can take the form of grouping a sequence of actions into In the original ECD, the Observational-Evidence model addresses a “cluster” that may indicate a desired strategy, or a level on the question of how to operationalize the conceptual target a learning progression that the test is targeting. In such an competencies defined by the Proficiency model, which are example, a scoring rule can be  defined such that a score of essentially latent, in order to be  able to validly infer from 1 or 0 is assigned corresponding to the respective strategy overt behaviors about those latent competencies. The employed, or the learning progression level achieved. Of course, Observational-Evidence model includes two parts. The first scoring rules are not confined to dichotomous scores and they contains the scoring rules, which are ways to extract a “score” can also define scores between 0 and 1, continuous (particularly or an observable variable from student actions. 
In some cases, when the scoring rules relies on response time) or ordered the scoring rule is simple, as in a multiple-choice item, in categories of 1-to-m, for m categories (polytomous scores). Frontiers in Psychology | www.frontiersin.org 12 April 2019 | Volume 10 | Article 853 Arieli-Attali et al. Expanded Evidence-Centered Design e s Th econd part of the Observational-Evidence model contains rules with those that are learned from the data. In particular, the statistical model. The statistical model expresses how the the supervised algorithms – methodologies used in machine scores (as defined by the scoring rules) depend, probabilistically, learning (ML) – can be  useful for identifying patterns in the on the latent competencies (the KSAs). This dependency is complex logfile data. These algorithms classify the patterns by probabilistic, that is, the statistical model defines the probability skills using a training data set that contained the correct or of certain “scores” (observables) given specific latent competencies theory-based classification. The word supervised here means (combination of values on the KSAs). In other words, at the that the “correct responses” were defined by subject-matter point in time at which the student is working within the experts and that the classification algorithm learns from these system, that student is in a “latent state” of knowledge, and data that were correctly classified to extrapolate to new given that latent state, there is a certain probability for the data points. observable variables, which if observed, are evidence for the In a learning and assessment system, the Observational- latent ability. However, all we  have are the student observable Evidence model may also take into account the scao ff lds and variables, and what we  need is a psychometric model that hints usage to infer about the KSA model. Since the scao ff lds allows us to do the reverse inference from the given observables and hints reduce the difficulty of the items/tasks, they also to the latent competencies. change their evidentiary value of the observables. This can There are various statistical models that can be  used here. be  done via either using only responses without hint usage Since we are talking about an assessment and learning system, to model KSA or applying a partial credit scoring rule for let us consider a multi-dimensional latent competency, i.e., items that were answered correctly with hints, thus assigning multiple skills are targeted by the system both for assessment them less credit as a reflection of their evidentiary value (e.g., and learning. If we  assume the latent competencies to Wang et  al., 2010; Bolsinova et al., 2019a,b). be continuous, we can use a multi-dimensional Item Response To summarize, any and all statistical models that allow us Theory models (e.g., MIRT; Reckase, 2009) or Bayes-net to define the connection between overt observables and latent models (Pearl, 1988, 2014; Martin and VanLehn, 1995; Chang competencies can be used in the Observational-Evidence model. et  al., 2006; Almond et  al., 2015). 
In the case where the The Learning Layer Within the Evidence latent competencies are treated as categorical with several increasingly categories of proficiency in each (e.g., low-, Model – The Transitional-Evidence Model medium-, and high-level proficiency, or mastery/non-mastery Similar to the way the Observational-Evidence model connects levels), we  can use diagnostic classification models (DCM; the Task model back to the KSA model, the Transitional- Rupp et  al., 2010b). What these models enable is to “describe” Evidence model uses the task supports data to infer about (or model) the relationship between the latent traits and the learning, and to link back to the KSA-change model. Recall observables in a probabilistic way, such that the probability that the KSA-change model includes pedagogical principles of a certain observable, given a certain latent trait, is defined which are reflected in the task supports. Similar to the assessment and therefore allow us to make the reverse inference – to layer of the Evidence model, the Transitional-Evidence model estimate the probability of a certain level of a latent trait also includes two parts: the scoring rules and the given the observable. statistical models. In order to make the link between the items/tasks (the e s Th coring rules define the observable variables of the stimuli to collect observables) and the latent KSAs, we  can Transitional-Evidence model. If task supports are available by use what is called a Q-matrix (Tatsuoka, 1983). A Q-matrix choice, student choice behavior can be  modeled to make is a matrix of <items  ×  skills> (items in the rows; skills in inferences about their learning strategies. The data from the the columns), defining for each item which skills it is targeting. task supports usage (hints, scaoff lds, videos, simulations, e Th Q-matrix plays a role in the particular psychometric model, animations, etc.) as well as number of attempts or response to determine the probability of answering an item correctly time, should first be  coded (according to a scoring or evidence given the combination of skills (and whether all skills are rule) to define which of them should count and in what way. needed, or some skill can compensate for others; As before, scoring rules can be  defined by human experts or non-compensatory or compensatory model, respectively). The can be  learned from the data. Q-matrix is usually determined by content experts, but it can The statistical models in the Transitional-Evidence model also be  learned from the data (e.g., Liu et  al., 2012). need to be  selected, such that they allow us to infer about Recent developments in the field of psychometrics have change based on observables over time. A popular stochastic expanded the modeling approach to also include models that model for characterizing a changing system is a Markov model are data driven, but informed by theory, and is referred to (cf. Norris, 1998). In a Markov model, transition to the next as Computational Psychometrics (von Davier, 2017). state depends only on the current state. Because the focus Computational Psychometrics is a framework that includes here is on latent competencies, the appropriate model is then complex models such as MIRT, Bayes-net and DCM, which a hidden Markov model (HMM; e.g., Visser et al., 2002; Visser, allow us to make inferences about latent competencies; however, 2011), and specifically an input-output HMM ( Bengio and these models may not define a priori the scoring rules, but Frasconi, 1995). 
A HMM would allow us to infer about the rather allow for a combination of the expert-based scoring efficacy of the learning supports in making a change in the Frontiers in Psychology | www.frontiersin.org 13 April 2019 | Volume 10 | Article 853 Arieli-Attali et al. Expanded Evidence-Centered Design FIGURE 7 | An input-output hidden Markov model (HMM). latent state (proficiency level). In addition, the input-output How do we  link the learning materials (defined in the Task- HMM will allow us to make the association between learning support model) to the learning processes/goals (defined in the materials (as input) and the change in KSA (latent) based KSA-change model)? Similar to the Q-matrix in the on the observables (output), to estimate the contribution Observational-Evidence model, here too we  need a matrix that (efficacy) of each particular support to the desired change in links the learning materials (task supports) with the associated proficiency (i.e., learning). Figure 7 illustrates this model for skills-change. We  can use an S-matrix (Chen et  al., 2018), a single latent skill (KSA at time t1 and t2), a single observation which is a matrix of <supports  ×  skills> (supports in the (O at time t1 and t2) and a single learning support (l at rows; skills in the columns), defining for each support which time t1 and t2). The observation dependency on the skill skills/process it can improve. In that sense, and similar to the (i.e., O given KSA; the arrow/link from KSA to O) is modeled Q-matrix, an S-matrix is a collection of “evidence” that explicate by the Observational-Evidence model (the model from the the connection between the supports and the desired learning original ECD), while the skill dependency on the learning shifts. For example, providing a worked example is a learning support (i.e., KSA given l; the arrow/link from l to KSA) is support that may be  connected to several knowledge shifts modeled by the Transitional-Evidence model. (corresponding to subskills in the learning models), and providing Working with the above example, let us assume a student opportunities for practice is another learning support that may does not know how to identify a data trend from a graph, be connected to different desired knowledge shifts (corresponding and thus cannot extrapolate a new data point (incorrectly to different subskills). The S-matrix will specify these connections. answers a question that requires extrapolation). Suppose a e S-m Th atrix will then play a role in the HMM, to determine task support is provided, such that it draws the student’s the probability that a particular knowledge shift (learning attention to the pattern and trend in the data. We  now want process) occurred given the particular learning supports. Similar to estimate the contribution of this support in helping the to the Q-matrix, the S-matrix should be  determined by content student learn (and compare this contribution to other task experts, and/or learned or updated from the data. supports). We  have the following observables: the student’s incorrect answer in the first attempt, the student’s use of the particular task support, and the student’s revised answer in THE e-ASSEMBLY MODEL the second attempt (whether correct or not). 
Using an input- In the original ECD, the Assembly model determines how to output HMM will allow us to estimate the probability of transitioning from the incorrect to the correct latent state put it all together and specifies the conditions needed for obtaining the desired reliability and validity for the assessment. (or in other cases from low proficiency to high proficiency), given the use of the task support. Of course, the model will In other words, it determines the structure of the test, the number and the mix of the desired items/tasks. The Assembly be  applied across questions and students in order to infer about latent state. model is directly derived from the Proficiency model, such that it ensures, for example, the appropriate representation of e a Th bove example of a single latent skill can be  extended to a map of interconnected skills using dynamic Bayesian all skills in the map. Going back to the HERA example and the KSA-model in Figure 3, if we  were to build an assessment network (DBN; Murphy and Russell, 2002). DBN generalizes HMM by allowing the state space to be  represented in a with those target skills, we would have to ensure that we sample items/tasks for each of the skills and subskills specified on the factored form instead of as a single discrete variable. DBN extends Bayesian networks (BN) to deal with changing situations. map, and the Assembly model will specify how much of each. Frontiers in Psychology | www.frontiersin.org 14 April 2019 | Volume 10 | Article 853 Arieli-Attali et al. Expanded Evidence-Centered Design For the expanded ECD, we  do not create a parallel model The learning aspect of the system is motivated by the to the Assembly model as we  did for the three core models, goal to maximize learners’ gain and thus needs a more because in a blended learning and assessment system we  do comprehensive adaptivity, or what is often called not assemble the assessment separately and the learning separately. “recommendation model.” A recommendation model does not Rather, in the process of developing a system, aer w ft e specified only determine the next item to be  presented but it also the six core models of the e-ECD, we  assemble it all together determines which instructional or training material to in what we  call the e-Assembly model. recommend or present to the learner. A good recommendation e Th role of e-Assembly model is to specify how to put it model makes full use of all available information about both all together. It will include the specifications of number and the learner and the instructional materials to maximize the mix of items/tasks, but it will also include how and when to KSA gain for the learner. If we  have a way to estimate present the learning support materials. This can be  seen as (measure) the gain for the learner, we can feed this information determining how to switch between the “assessment” mode to the recommendation engine to determine the adaptivity of the system and the “learning” mode of the system. in the form of the next task support and/or training and e Th e-Assembly model provides an opportunity to take into instructional material needed. 
Thus, the additional layer of account additional pedagogical principles that are relevant to an evidence model for the learning materials (i.e., the statistical the combination of items and tasks, such as the importance models for estimating the efficacy of the task supports) provides of reducing cognitive load for learning; focusing on one skill a good candidate model for the recommendation engine. at a time; gradual increased difficulty presentation; adaptive Which materials were already used by the learner (which presentation of content, among others. Conditions to ensure ones were chosen/preferred), which supports are found more the validity of the system may also specify pedagogical principles effective for that particular learner, which skill is currently such as learning via real-world authentic tasks or learning by in focus and which supports are most effective for that doing, as well as learner engagement factors, as relevant. particular skill (e.g., practice, explained example, video lecture, Pedagogical Content Knowledge principles that include simulation demonstration, providing instructional material for knowledge of student misconceptions regarding specific a prior/prerequisite skill, etc.) are some of the decisions needed phenomena, if articulated as part of the KSA and KSA-change to be  made by a recommendation engine, and these decisions model, should be also considered here in selecting and designing rely on the statistical models that were used to evaluate and tasks, such that the misconceptions are either accounted for provide evidence for the efficacy of the task support and or avoided so the KSAs can be  validly addressed. instructional materials. e e-A Th ssembly model is also the place to take into account considerations from other relevant approaches, such as the learner-centered design approach (LCD; Soloway et  al., 1994; CONCLUSION AND FUTURE STEPS Quintana et  al., 2000), which argue that student engagement and constructivist theories of learning should be  at the core In this paper, we  propose a new way to fuse learning and of a computerized learning system. Adopting such an approach assessment at the design stage. Specifically, we  propose an will aeff ct the combination and/or navigation through the expanded framework we  developed to aid with the creation of system. For example, the system may guide students to be more a system for blended assessment and learning. We  chose the active in trying out options and making choices regarding their ECD framework as a starting point because this is a comprehensive navigation in the system. and rigorous framework for the development of assessments An important aspect of systems for learning and assessment and underlies the development of tests for most testing is whether they are adaptive to student performance and in organizations. Incorporating learning aspects, both learning goals what way. This aspect within the e-Assembly model ties directly and learning processes, in the ECD framework is challenging, to the e-Evidence model. The statistical models in the Evidence because of fundamental differences in the assumptions and model are also good candidates for determining the adaptive approaches of learning and assessment. Nevertheless, we showed algorithm in adaptive assessments. 
For example, if a 2PL IRT that the unique structure of Proficiency, Task, and Evidence model is used to estimate ability; this model can also be  used models lends itself to creating parallel models for consideration to select the items in a Computer Adaptive Test (CAT), as of the corresponding aspect of learning within each model. is oen do ft ne in large-scale standardized tests that are adaptive We are currently applying this framework in our work. (e.g., the old version of the GRE). Similarly, if a Bayes-net is In future work, we  hope to show examples of the learning used to estimate the map of KSAs, then the selection of items and assessment system that we  build following the e-ECD or tasks can be  done based on the Bayes-net estimates of framework. We are also working to incorporate other elements skills. Similarly, we  can use the DCM to identify weakness into the framework, primarily the consideration of motivation, in a particular skill and thus determine the next item that meta-cognition, and other non-cognitive skills. Since learners’ targets that particular weakness. This is true for any other engagement is a crucial element in a learning system, we  can model, also including data-driven models, because the purpose think of a way to incorporate elements that enhance engagement of the models is to provide a valid way to estimate KSAs, as part of the assembly of the system, by using reward system and once this is done, adaptivity within the system can or gamification in the form of points, coins, badges, etc. be  determined accordingly. Adding gamification or engagement-enhancing elements into Frontiers in Psychology | www.frontiersin.org 15 April 2019 | Volume 10 | Article 853 Arieli-Attali et al. Expanded Evidence-Centered Design a system does not currently have a designated model within SW and JT contributed to the e-Task model. BD contributed the e-ECD. We  are working to find a way to incorporate to the e-Evidence model. The authors would like to thank the these elements into the framework. reviewers for substantial contribution. AUTHOR CONTRIBUTIONS FUNDING MA-A and AAvD contributed to the conception of the framework. This work has been done as part of a research initiative at MA-A contributed to the conception and specifications of the ACTNext, by ACT, Inc. No external funds or grants supported new models, and AAvD contributed to the CP component. this study. REFERENCES analytics in measuring computational thinking in block-based programming” in Proceedings of the Seventh International Learning Analytics & Knowledge Almond, R. G., Mislevy, R. J., Steinberg, L. S., Yan, D., and Williamson, Conference (ACM). Vancouver, BC, Canada, 530–531. D. M. (2015). Bayesian networks in educational assessment. (New York, NY: Heffernan, N., and Heffernan, C. (2014). The ASSISTments ecosystem: building Springer). a platform that brings scientists and teachers together for minimally invasive Anderson, J. R., Corbett, A. T., Koedinger, K. R., and Pelletier, R. (1995). research on human learning and teaching. Int. J. Artif. Intell. Educ. 24, Cognitive tutors: lessons learned. J. Learn. Sci. 4, 167–207. doi: 10.1207/ 470–497. doi: 10.1007/s40593-014-0024-x s15327809jls0402_2 Kim, Y. J., Almond, R. G., and Shute, V. J. (2016). Applying evidence-centered Arieli-Attali, M., and Cayton-Hodges, G. (2014). Expanding the CBAL design for the development of game-based assessments in physics playground. mathematics assessments to elementary grades: the development of a Int. J. Test. 16, 142–163. 
Almond, R. G., Mislevy, R. J., Steinberg, L. S., Yan, D., and Williamson, D. M. (2015). Bayesian networks in educational assessment. New York, NY: Springer.

Anderson, J. R., Corbett, A. T., Koedinger, K. R., and Pelletier, R. (1995). Cognitive tutors: lessons learned. J. Learn. Sci. 4, 167–207. doi: 10.1207/s15327809jls0402_2

Arieli-Attali, M., and Cayton-Hodges, G. (2014). Expanding the CBAL mathematics assessments to elementary grades: the development of a competency model and a rational number learning progression. ETS Res. Rep. Ser. 2014, 1–41. doi: 10.1002/ets2.12008

Arieli-Attali, M., Wylie, E. C., and Bauer, M. I. (2012). "The use of three learning progressions in supporting formative assessment in middle school mathematics" in Annual meeting of the American Educational Research Association (Vancouver, Canada).

Attali, Y., and Arieli-Attali, M. (2014). Gamification in assessment: do points affect test performance? Comp. Educ. 83, 57–63. doi: 10.1016/j.compedu.2014.12.012

Bengio, Y., and Frasconi, P. (1995). "An input output HMM architecture" in Advances in Neural Information Processing Systems. eds. M. I. Jordan, Y. LeCun, and S. A. Solla (Cambridge, MA, USA: MIT Press), 427–434.

Bolsinova, M., Deonovic, B., Arieli-Attali, M., Settles, B., Hagiwara, M., von Davier, A., et al. (2019a). Hints in adaptive learning systems: consequences for measurement. Paper presented at the annual meeting of the National Council of Measurement in Education (NCME), Toronto, Canada.

Bolsinova, M., Deonovic, B., Arieli-Attali, M., Settles, B., Hagiwara, M., von Davier, A., et al. (2019b, under review). Measurement of ability in adaptive learning and assessment systems when learners use on-demand hints. Educ. Psychol. Meas.

Chang, K. M., Beck, J., Mostow, J., and Corbett, A. (2006). "A Bayes net toolkit for student modeling in intelligent tutoring systems" in International Conference on Intelligent Tutoring Systems (Berlin, Heidelberg: Springer), 104–113.

Chen, Y., Li, X., Liu, J., and Ying, Z. (2018). Recommendation system for adaptive learning. Appl. Psychol. Meas. 42, 24–41. doi: 10.1177/0146621617697959

Conrad, S., Clarke-Midura, J., and Klopfer, E. (2014). A framework for structuring learning assessment in an educational massively multiplayer online educational game – experiment centered design. Int. J. Game Based Learn. 4, 37–59. doi: 10.4018/IJGBL.2014010103

Embretson, S. E. (1998). A cognitive design system approach to generating valid tests: application to abstract reasoning. Psychol. Methods 3, 300–396.

Feng, M., Hansen, E. G., and Zapata-Rivera, D. (2009a). "Using evidence centered design for learning (ECDL) to examine the ASSISTments system" in Paper presented at the annual meeting of the American Educational Research Association (AERA) (San Diego, California).

Feng, M., Heffernan, N. T., and Koedinger, K. R. (2009b). Addressing the assessment challenge in an intelligent tutoring system that tutors as it assesses. User Model. User Adapt. Interact. 19, 243–266. doi: 10.1007/s11257-009-9063-7

Furtak, E. M., Thompson, J., Braaten, M., and Windschitl, M. (2012). "Learning progressions to support ambitious teaching practices" in Learning progressions in science: Current challenges and future directions. eds. A. C. Alonzo and A. W. Gotwals (Rotterdam: Sense Publishers), 405–433.

Grover, S., Bienkowski, M., Basu, S., Eagle, M., Diana, N., and Stamper, J. (2017). "A framework for hypothesis-driven approaches to support data-driven learning analytics in measuring computational thinking in block-based programming" in Proceedings of the Seventh International Learning Analytics & Knowledge Conference (ACM), Vancouver, BC, Canada, 530–531.

Heffernan, N., and Heffernan, C. (2014). The ASSISTments ecosystem: building a platform that brings scientists and teachers together for minimally invasive research on human learning and teaching. Int. J. Artif. Intell. Educ. 24, 470–497. doi: 10.1007/s40593-014-0024-x

Kim, Y. J., Almond, R. G., and Shute, V. J. (2016). Applying evidence-centered design for the development of game-based assessments in physics playground. Int. J. Test. 16, 142–163. doi: 10.1080/15305058.2015.1108322

Kingston, N. M., Karvonen, M., Thompson, J. R., Wehmeyer, M. L., and Shogren, K. A. (2017). Fostering inclusion of students with significant cognitive disabilities by using learning map models and map-based assessments. Inclusion 5, 110–120. doi: 10.1352/2326-6988-5.2.110

Koedinger, K. R., Corbett, A. T., and Perfetti, C. (2012). The knowledge-learning-instruction framework: bridging the science-practice chasm to enhance robust student learning. Cogn. Sci. 36, 757–798. doi: 10.1111/j.1551-6709.2012.01245.x

Koehler, M., and Mishra, P. (2009). What is Technological Pedagogical Content Knowledge (TPACK)? Contemp. Issues Technol. Teach. Educ. 9, 60–70. doi: 10.1177/002205741319300303

Liu, J., Xu, G., and Ying, Z. (2012). Data-driven learning of Q-matrix. Appl. Psychol. Meas. 36, 548–564. doi: 10.1177/0146621612456591

Luecht, R. M. (2013). Assessment engineering task model maps, task models and templates as a new way to develop and implement test specifications. J. Appl. Test. Technol. 14, 1–38. Retrieved from: http://jattjournal.com/index.php/atp/article/view/45254

Martin, J., and VanLehn, K. (1995). Student assessment using Bayesian nets. Int. J. Hum. Comput. Stud. 42, 575–591. doi: 10.1006/ijhc.1995.1025

Mislevy, R. J. (2013). Evidence-centered design for simulation-based assessment. Mil. Med. (special issue on simulation, H. O'Neil, Ed.) 178, 107–114. doi: 10.7205/MILMED-D-13-00213

Mislevy, R. J., Almond, R. G., and Lukas, J. F. (2003). A brief introduction to evidence-centered design. ETS Res. Rep. Ser. 2003. Princeton, NJ. doi: 10.1002/j.2333-8504.2003.tb01908.x

Mislevy, R. J., Behrens, J. T., Dicerbo, K. E., and Levy, R. (2012). Design and discovery in educational assessment: evidence-centered design, psychometrics, and educational data mining. JEDM J. Educ. Data Min. 4, 11–48. Retrieved from: https://jedm.educationaldatamining.org/index.php/JEDM/article/view/22

Mislevy, R. J., Steinberg, L. S., and Almond, R. G. (1999). "On the roles of task model variables in assessment design" in Paper presented at the Conference "Generating Items for Cognitive Tests: Theory and Practice" (Princeton, NJ).

Mislevy, R. J., Steinberg, L. S., Almond, R. G., and Lukas, J. F. (2006). "Concepts, terminology, and basic models of evidence-centered design" in Automated scoring of complex tasks in computer-based testing. eds. D. M. Williamson, R. J. Mislevy, and I. I. Bejar (New York, NY), 15–47.

Murphy, K. P., and Russell, S. (2002). Dynamic Bayesian networks: Representation, inference and learning. Doctoral dissertation. Berkeley: University of California. Available at: https://www.cs.ubc.ca/~murphyk/Thesis/thesis.html (Accessed November 4, 2019).

National Research Council (NRC) (2007). Taking science to school: Learning and teaching science in grades K-8. Washington, DC: The National Academies Press.

Nichols, P., Ferrara, S., and Lai, E. (2015). "Principled design for efficacy: design and development for the next generation tests" in The next generation of testing: Common core standards, smarter-balanced, PARCC, and the nationwide testing movement. ed. R. W. Lissitz (Charlotte, NC: Information Age Publishing), 228–245.

Nichols, P., Kobrin, J. L., Lai, E., and Koepfler, J. (2016). "The role of theories of learning and cognition in assessment design and development" in The handbook of cognition and assessment: Frameworks, methodologies, and applications. 1st edn. eds. A. A. Rupp and J. P. Leighton (Massachusetts, USA: John Wiley & Sons, Inc.), 15–40.

Norris, J. R. (1998). Markov chains. New York, NY: Cambridge University Press.

Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Francisco, CA: Morgan Kaufmann Publishers, Inc.

Pearl, J. (2014). Probabilistic reasoning in intelligent systems: Networks of plausible inference. Elsevier.

Pelánek, R. (2017). Bayesian knowledge tracing, logistic models, and beyond: an overview of learner modeling techniques. User Model. User Adap. Inter. 27, 313–350. doi: 10.1007/s11257-017-9193-2

Posner, G. J., Strike, K. A., Hewson, P. W., and Gertzog, W. A. (1982). Accommodation of a scientific conception: toward a theory of conceptual change. Sci. Educ. 66, 211–227. doi: 10.1002/sce.3730660207

Quintana, C., Krajcik, J., and Soloway, E. (2000). "Exploring a structured definition for learner-centered design" in Fourth International Conference of the Learning Sciences. eds. B. Fishman and S. O'Connor-Divelbiss (Mahwah, NJ: Erlbaum), 256–263.

Razzaq, L., Feng, M., Nuzzo-Jones, G., Heffernan, N. T., Koedinger, K. R., Junker, B., et al. (2005). "The Assistment project: blending assessment and assisting" in Proceedings of the 12th Artificial Intelligence in Education. eds. C. K. Looi, G. McCalla, B. Bredeweg, and J. Breuker (Amsterdam: ISO Press), 555–562.

Reckase, M. D. (2009). "Multidimensional item response theory models" in Multidimensional item response theory. ed. M. D. Reckase (New York, NY: Springer), 79–112.

Rupp, A. A., Gushta, M., Mislevy, R. J., and Shaffer, D. W. (2010a). Evidence-centered design of epistemic games: measurement principles for complex learning environments. J. Technol. Learn. Assess. 8. Retrieved from: https://ejournals.bc.edu/ojs/index.php/jtla/article/view/1623

Rupp, A. A., Templin, J. L., and Henson, R. A. (2010b). Diagnostic measurement: Theory, methods, and applications. New York, NY: Guilford Press.

Shute, V. J., Hansen, E. G., and Almond, R. G. (2008). You can't fatten a hog by weighing it – or can you? Evaluating an assessment for learning system called ACED. Int. J. Artif. Intell. Educ. 18, 289–316. Retrieved from: https://content.iospress.com/articles/international-journal-of-artificial-intelligence-in-education/jai18-4-02

Soloway, E., Guzdial, M., and Hay, K. (1994). Learner-centered design: the challenge for HCI in the 21st century. Interactions 1, 36–48. doi: 10.1145/174809.174813

Straatemeier, M. (2014). Math garden: a new educational and scientific instrument. Education 57, 1813–1824. PhD thesis. ISBN 9789462591257.

Tatsuoka, K. K. (1983). Rule space: an approach for dealing with misconceptions based on item response theory. J. Educ. Meas. 20, 345–354. doi: 10.1111/j.1745-3984.1983.tb00212.x

Ventura, M., and Shute, V. (2013). The validity of a game-based assessment of persistence. Comput. Hum. Behav. 29, 2568–2572. doi: 10.1016/j.chb.2013.06.033

Visser, I. (2011). Seven things to remember about hidden Markov models: a tutorial on Markovian models for time series. J. Math. Psychol. 55, 403–415. doi: 10.1016/j.jmp.2011.08.002

Visser, I., Raijmakers, M. E., and Molenaar, P. (2002). Fitting hidden Markov models to psychological data. Sci. Program. 10, 185–199. doi: 10.1155/2002/874560

von Davier, A. A. (2017). Computational psychometrics in support of collaborative educational assessments. J. Educ. Meas. 54, 3–11. doi: 10.1111/jedm.12129

Wang, Y., Heffernan, N. T., and Beck, J. E. (2010). "Representing student performance with partial credit" in Proceedings of the 3rd International Conference on Educational Data Mining. eds. R. S. J. d. Baker, A. Merceron, and P. I. Pavlik Jr. (Pittsburgh, PA, USA), 335–336.

Wilson, M. (2009). Measuring progressions: assessment structures underlying a learning progression. J. Res. Sci. Teach. 46, 716–730. doi: 10.1002/tea.20318

Conflict of Interest Statement: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Arieli-Attali, Ward, Thomas, Deonovic and von Davier. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
