ILTA 2010 Best Article Award

Chapelle, Carol, Chung, Yoo-Ree, Hegelheimer, Volker, Pendar, Nick & Xu, Jing. 2010. Towards a computer-delivered test of productive grammatical ability. Language Testing 27(4), 443–469.

All authors are from Iowa State University in the USA.

The study by Chapelle, Chung, Hegelheimer, Pendar and Xu explored the utility of findings from research on second language acquisition, more precisely from research on grammatical development, for the design of a test of productive grammatical ability in English as a second language. Findings on the development of morphosyntactic, syntactic, and functional knowledge were synthesized into a framework of grammatical features, which was then used as the basis for designing items measuring productive grammatical ability. The study employed paper-and-pencil versions of the items, but the authors’ aim is to create a computerized version of the test, which would ultimately be used together with an essay test to make placement decisions about incoming ESL students.

Five different sentence-level task types and one paragraph-level task type were created; all tasks require production, rather than selection, of different grammatical forms.

The study was conducted as a series of trials over four semesters in 2007 and 2008 with several groups of students from a range of language backgrounds and proficiency levels at Iowa State University. In total, there were about 700 learners in the study.

The research questions that the study sought to answer were as follows:

1) Does the test constructed on the basis of research on grammatical development produce scores with acceptable reliability?

2) Do the means of the items correspond to what would be predicted by their respective positions in the test framework?

3) Do students’ scores on other tests of language development correlate positively with their scores on the grammar test, as would be expected?

4) Do students at three different levels of language development perform significantly differently on the test, in accordance with their proficiency levels?

The authors frame their study in terms of an interpretive validity argument based on the work of Kane and others, an approach that has been used increasingly in the language testing literature in recent years.

As the authors of the article say, “Results indicated promise for developing test items based on findings from research on L2 grammatical development…” The test versions trialed in the study demonstrated acceptable reliability for such relatively short instruments; the plan, however, is to lengthen the test somewhat to ensure adequate reliability for the envisaged computerized version. In the most recent version of the test, the empirical difficulty of the items corresponded to the predictions drawn from the test framework. The correlations between the new test of productive grammar and the other measures of language proficiency (a writing test and the TOEFL iBT) were also in the expected range. Importantly, the test was able to distinguish clearly among three learner groups known to be at different levels of proficiency, thus providing backing for extrapolation of the construct measured by the test.

Overall, the results thus provide backing for the inferences in the validity argument presented by the authors. Making the test practical for regular use in placement would require delivery and scoring by computer, and the authors conclude their article with a discussion of the steps and requirements of such a development.

What follows is a brief description of what the ILTA 2010 Best Article Award Committee found to be the specific merits of the Chapelle, Chung, Hegelheimer, Pendar and Xu article:

There is much call for linking the findings of SLA research with the development of language assessments, but little actual work in that area. This study moves beyond simply engaging in rhetoric about the need to interface the two fields to actually bridging the gap between them. This is accomplished through a careful review of the literature and explication of the specific findings from SLA research that form the theoretical basis for the construct of the test being developed, followed by systematic development and trialing of items (including some innovative item formats).

The interpretive argument for the test of productive grammar being developed is carefully laid out, and results provide backing for the assumptions underlying the warrants supporting the five inferences of the interpretive argument. Thus, the article provides an illustration of the process of building an interpretive argument for a test, a model others in the field of language testing can learn from. Language testers are also likely to benefit from this example of how SLA theory can not only inform test design but also be operationalized through the construction of carefully designed test items.

The article is also likely to be highly relevant to SLA researchers, and generating testable hypotheses structured around the interpretive argument may be novel to at least some SLA researchers. Using the findings from this study as the basis for the development of a computer-mediated test of grammatical ability is likely to be of interest to both groups of researchers and adds another dimension to the paper.

The article is very clearly written, particularly the outline of the interpretive argument for the test of productive grammar, the presentation of the phases of the study, and the ways in which the empirical findings provide backing for the assumptions.

The ILTA 2010 Best Article Award Committee congratulates Carol Chapelle, Yoo-Ree Chung, Volker Hegelheimer, Nick Pendar and Jing Xu for their important and timely contribution to the field of language testing and wishes them all the best in their further work on this topic.

© 2014 International Language Testing Association