LTRC 2018: Symposia (3)

Repeated test-taking and longitudinal analysis of L2 test score data

William Bonk (Chair); Alistair Van Moere, Tony Green, Yeonsuk Cho, Ian Blood, Sean Hanlon


The measurement and tracking of learning gains over time using standardized assessment data is an under-researched area in second language testing. As Barkaoui (2013) points out, many validation studies examine factors that contribute to variability in single assessment instances, but few investigate variability longitudinally by examining score changes over time. Yet this area requires more attention if we are to understand not only sources of score variability, but also learners’ development of language skills over time.

Various researchers have investigated repeated test-taking under pre/post conditions. Ling, Powers and Adler (2014) analyzed score data from 111 learners who took two TOEFL iBT practice tests, with English programs in the intervening 6-month period. They determined that learners improved their proficiency at least moderately, and that more hours spent studying were associated with greater score gains. Learners made differential gains on the skills emphasized during their course of study: those based in China gained more in receptive skills such as reading, while those studying abroad in the US increased their speaking skills more. Sawaki (2017) investigated whether instructional feedback on reading tasks would improve TOEFL iBT reading scores. Although the 193 participants increased their scores on average, there were no effects associated with the amount of feedback they received during the intervention. However, familiarity with the test itself was identified as a positive predictor of test score gain.

Other repeated-measures studies did not involve experimental pre/post conditions. Green (2005) compared the writing scores of 15,380 candidates who each took IELTS twice over approximately three years. He found that the writing score on test occasion 1 was a better predictor of the writing score on test occasion 2 than was the interval of time between the two tests; learners with lower scores on occasion 1 made more rapid gains than learners with higher scores on occasion 1. Similarly, Zhang (2008) investigated the scores of 12,385 candidates who repeated the TOEFL iBT at least once within a 30-day period. Scores improved slightly on the second testing occasion; reading scores improved the most, while speaking scores improved the least.

These studies highlight the challenge of understanding which factors lead to score changes, and how large those changes are. Practice effects, complexity-accuracy-fluency trade-offs, differential growth in skills, and time elapsed between tests may all play a role. Furthermore, if exams are used to quantify learning gains over a course of study, as, for example, in the case of APTIS (British Council) or Progress (Pearson), then apparent proficiency gains must be teased apart from measurement error.

This symposium further addresses patterns of change in test performance, and individual differences in change patterns over time. These questions can best be answered when at least three repeated measures are available for each learner. When such data are available, multilevel modelling can be employed, which allows for within-person dependency. Thus, rather than treating each of a learner's scores as an independent snapshot, the learner's repeated measures are nested within that individual. This method can be used to investigate patterns in longitudinal data over sustained periods of time.
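
As an illustration only, the nesting described above is often written as a two-level linear growth model. The notation here is an assumption made for this sketch and is not taken from the symposium papers: Y_{ti} is the score of learner i on test occasion t, and TIME_{ti} is the time elapsed since that learner's first test.

```latex
% Level 1 (within learner): each learner has an individual growth line
Y_{ti} = \pi_{0i} + \pi_{1i}\,\mathrm{TIME}_{ti} + e_{ti},
\qquad e_{ti} \sim N(0, \sigma^{2})

% Level 2 (between learners): individual intercepts and slopes vary
% around the population averages \beta_{00} and \beta_{10}
\pi_{0i} = \beta_{00} + r_{0i}, \qquad
\pi_{1i} = \beta_{10} + r_{1i}
```

In this sketch, the learner-specific terms r_{0i} and r_{1i} are what carry the within-person dependency. The residual e_{ti} absorbs occasion-level noise such as measurement error. With only two occasions per learner, an individual straight line fits that learner's two scores exactly, so slope variation cannot be separated from residual error; this is one reason at least three repeated measures are preferred.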

