Corpus-based development and validation of language tests: using corpora of and for language testing


Darren Perrett, Cambridge Assessment English
Brigita Séguis, Cambridge Assessment English

Intro/Rationale: Corpora are increasingly being used in language testing for a wide range of purposes
and in a variety of ways. While there is general consensus that a corpus can be defined as a
collection of written or spoken materials, there is less understanding of the specific properties a
collection of texts must have to be classified as a corpus, of the various functions that corpora can
fulfil in language testing specifically, and of the advantages and limitations of the various tools
being developed for corpus analysis.
In this workshop, we will explore two major connections between corpus linguistics and language
testing: using corpora for language testing, and creating corpora of language testing data.
Prerequisites: Attendees are asked to bring their own laptops. As part of the workshop,
participants will be given access to L1 corpora, as well as temporary access to the full Cambridge
Learner Corpus (CLC) via the corpus analysis tool Sketch Engine.


  • We will provide participants with the necessary skills for accessing and navigating the
    existing corpora, and demonstrate how they can be used to perform a range of activities
    relevant to the development and validation of language tests, such as item writing and
    identification of criterial features at different proficiency levels;
  • We will also equip participants with the practical tools to create their own corpora by
    providing an overview of the available corpus interfaces and of tagging, and by explaining
    the importance of high-quality metadata.

Workshop Agenda

Part 1: In this introductory part, we will look at existing corpora of both L1 English (the Corpus
of Contemporary American English, COCA, and the British National Corpus, BNC) and L2 English (CLC),
and at the methods and tools that currently exist to support item production and validation in
language assessment. We will consider the part-of-speech tagging process and the strengths and
weaknesses of each analytical method available within a corpus, including collocations, n-grams,
concordance lines and basic word frequencies.
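To make the Part 1 methods concrete, the following minimal Python sketch computes word frequencies, n-grams, concordance (key-word-in-context) lines and collocates on a short made-up text. It is an illustration under simplifying assumptions, not how Sketch Engine or the CLC compute these: the tokenizer is a crude regular expression, n-grams are allowed to cross sentence boundaries, and collocates are raw co-occurrence counts within a fixed window rather than an association score such as MI or logDice. The sample sentence and the function names are hypothetical.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Crude tokenizer: lower-case, then keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(tokens: list[str]) -> Counter:
    # Raw frequency of each word form (no lemmatization).
    return Counter(tokens)

def ngrams(tokens: list[str], n: int) -> Counter:
    # Frequency of every contiguous sequence of n tokens.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def concordance(tokens: list[str], node: str, width: int = 3) -> list[str]:
    # Key-word-in-context lines with `width` tokens of context on each side.
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines

def collocates(tokens: list[str], node: str, span: int = 2) -> Counter:
    # Raw counts of words occurring within `span` tokens either side of `node`.
    counts: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[max(0, i - span):i])
            counts.update(tokens[i + 1:i + 1 + span])
    return counts

SAMPLE = "We make a decision. They make a mistake. You make a decision quickly."
tokens = tokenize(SAMPLE)

if __name__ == "__main__":
    print(word_frequencies(tokens).most_common(3))
    print(ngrams(tokens, 2).most_common(2))
    print("\n".join(concordance(tokens, "make")))
    print(collocates(tokens, "make").most_common(3))
```

Even on this toy text, the raw collocate counts for "make" put the article "a" at the top, which is why corpus tools usually rank collocates by an association measure rather than by frequency alone.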

Part 2: This part reviews the importance of metadata and the role it plays when creating a corpus for
or of language assessment. Participants will be given a series of practical tasks to discover variation
across candidate responses within the Cambridge Learner Corpus (CLC), highlighting the importance
of detailed metadata, and will then learn how to combine metadata with the texts to be uploaded
into a corpus.
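One way to combine metadata with texts before upload can be sketched as follows. Many corpus tools, Sketch Engine included, can read XML-style structure tags and treat their attributes as metadata (text types) for filtering and comparison; the sketch assumes that convention and wraps each response in a `<doc>` element. The `Response` fields, the attribute names (`id`, `level`, `l1`, `task`) and the sample responses are all hypothetical, and the exact format a given tool expects should be checked in its documentation.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape, quoteattr

@dataclass
class Response:
    # Hypothetical metadata fields for one candidate's written response.
    candidate_id: str
    cefr_level: str
    first_language: str
    task: str
    text: str

def to_doc(resp: Response) -> str:
    # Wrap one response in a <doc> element whose attributes carry its metadata.
    attrs = {
        "id": resp.candidate_id,
        "level": resp.cefr_level,
        "l1": resp.first_language,
        "task": resp.task,
    }
    # Refuse incomplete records: a response without metadata cannot be filtered later.
    missing = [name for name, value in attrs.items() if not value]
    if missing:
        raise ValueError(f"response {resp.candidate_id!r} is missing {missing}")
    attr_text = " ".join(f"{name}={quoteattr(value)}" for name, value in attrs.items())
    # Escape &, < and > so the response text cannot break the markup.
    return f"<doc {attr_text}>\n{escape(resp.text)}\n</doc>"

def build_corpus_file(responses: list[Response]) -> str:
    # One <doc> per response, concatenated into a single file for upload.
    return "\n".join(to_doc(r) for r in responses) + "\n"

responses = [
    Response("c001", "B1", "Spanish", "email", "Dear Tom, thanks for the photos & the letter."),
    Response("c002", "C1", "Polish", "essay", "Public transport should be free for students."),
]

if __name__ == "__main__":
    print(build_corpus_file(responses), end="")
```

The check for empty fields reflects the point about detailed metadata: a response with no level or L1 recorded can still be searched, but it cannot be filtered or compared by those variables.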

Part 3: In this part, participants will have the opportunity to create their own corpus using temporary
logins to Sketch Engine. They will be supplied with authentic written responses from Cambridge
Assessment exams, together with candidate metadata, covering five proficiency levels on the CEFR
scale (A1–C1). Applying what they learned in Part 2, participants will upload the texts to create a new
corpus, on which they will then run the lexical analyses discussed in Part 1. Finally, we will
undertake a series of practical tasks on the candidates' written responses, followed by a
group discussion.
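Once each response carries a level attribute, a first lexical comparison across levels can be sketched by pooling the tokens of each CEFR level and reporting token and type counts, type-token ratio (TTR) and mean word length. These measures, the function name `profile_by_level` and the two toy responses are illustrative choices, not the analyses prescribed in the workshop. Note that TTR falls as texts get longer, so pools of very different sizes should not be compared directly.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    # Crude tokenizer: lower-case, then keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def profile_by_level(responses: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    # Pool the tokens of all responses that share a CEFR level.
    pooled: dict[str, list[str]] = defaultdict(list)
    for level, text in responses:
        pooled[level].extend(tokenize(text))
    profile = {}
    for level in sorted(pooled):  # "A1" < "A2" < "B1" ... sorts in CEFR order
        toks = pooled[level]
        n = len(toks)
        profile[level] = {
            "tokens": n,
            "types": len(set(toks)),
            # Type-token ratio; it drops as n grows, so compare pools of similar size.
            "ttr": len(set(toks)) / n if n else 0.0,
            "mean_word_length": sum(len(t) for t in toks) / n if n else 0.0,
        }
    return profile

SAMPLE = [
    ("A2", "I like my school. I like my teacher."),
    ("B2", "Although the course was demanding, I found it genuinely rewarding."),
]

if __name__ == "__main__":
    for level, stats in profile_by_level(SAMPLE).items():
        print(level, stats)
```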

Part 4: This part focuses on receptive skills and gives participants an evidence-based approach to item
production using L1 corpora. Participants will be given authentic source texts used for Reading task
types at various proficiency levels to upload to their new corpus. Drawing on the methods discussed
in earlier parts, they will then run parallel analyses against L1 corpora to judge whether the lexis is
suitable for the target level. In the follow-up discussion, we will consider the implications of this
approach for identifying criterial features of texts at a particular proficiency level, and its impact on
test construction and validation.
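The check on whether a source text's lexis suits a target level can be approximated by lexical coverage: the share of the text's tokens that fall within the N most frequent words of an L1 reference corpus, together with a list of the words outside that band. The sketch assumes a word-frequency list exported from a reference corpus such as COCA or the BNC; the tiny `REFERENCE` dictionary and the cut-off of 8 are hypothetical stand-ins for a real list and a real band, which would normally run to thousands of words. It also works on word forms, whereas a real profile would usually use lemmas or word families.

```python
import re

def tokenize(text: str) -> list[str]:
    # Crude tokenizer: lower-case, then keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def rank_map(reference_counts: dict[str, int]) -> dict[str, int]:
    # Rank 1 = most frequent word in the reference corpus; ties broken alphabetically.
    ordered = sorted(reference_counts, key=lambda w: (-reference_counts[w], w))
    return {word: rank for rank, word in enumerate(ordered, start=1)}

def coverage(tokens: list[str], ranks: dict[str, int], top_n: int) -> float:
    # Share of tokens whose reference rank is within the top_n band
    # (words missing from the reference list count as outside it).
    if not tokens:
        return 0.0
    inside = sum(1 for t in tokens if ranks.get(t, top_n + 1) <= top_n)
    return inside / len(tokens)

def beyond_band(tokens: list[str], ranks: dict[str, int], top_n: int) -> list[str]:
    # Distinct words outside the band: candidates for closer review by the item writer.
    return sorted({t for t in tokens if ranks.get(t, top_n + 1) > top_n})

# Hypothetical stand-in for a frequency list exported from an L1 corpus.
REFERENCE = {"the": 100, "a": 90, "is": 80, "of": 70, "in": 60,
             "city": 40, "river": 30, "old": 25, "bridge": 10, "medieval": 2}
RANKS = rank_map(REFERENCE)
TOKENS = tokenize("The old bridge in the city is medieval.")

if __name__ == "__main__":
    print(f"coverage at top 8: {coverage(TOKENS, RANKS, 8):.0%}")
    print("outside band:", beyond_band(TOKENS, RANKS, 8))
```

Comparing the same measure across texts already accepted at different levels is one way to turn a single coverage figure into a level judgement.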
