ILTA: Draft Code of Practice: Version 3

Introduction

On behalf of the ILTA Code of Practice Committee [Alan Davies (Chair), Charles Alderson, Glenn Fulcher, Randy Thrasher, Liz Hamp-Lyons, Antony Kunnan, Charles Stansfield], I am sending this Version 3 to the President of ILTA with the recommendation that the document should now be circulated as a draft CoP to ILTA members so that they all have a chance to read it before the Ottawa meeting. At the …

Alan Davies
Part 1
Code of Good Testing Practice (Based on JLTA)

A. Basic Considerations for good testing practice in all situations

1. The test developer's understanding of just what the test, and each sub-part of it, is supposed to measure (its construct) must be clearly stated.

2. All tests, regardless of their purpose or use, must provide information which allows valid inferences to be made. Validity refers to the accuracy of the inferences and uses that are made on the basis of the test's scores. If, for example, the test purports to measure the ability to use English in business communication, the test score inference or interpretation is valid to the degree that the test does in fact measure that ability. However, the ability to use English in business communication is a construct, and the test developer must spell out just what that construct is or what it consists of. The test score inference or interpretation can be valid only if the test construct offers as accurate a picture as possible of the skill or ability the test is supposed to measure.

3. All tests, regardless of their purpose or use, must be reliable. Reliability refers to the consistency of the test results: the extent to which they are generalizable and therefore comparable across time and across settings.

1. Test design should include a determination and explicit statement of the test's intended purpose(s).
2. A test designer must decide on the construct to be measured and state explicitly how that construct is to be operationalized.
3. The specifications of the test and the test tasks should be spelled out in detail.
4. The work of the task and item writers needs to be edited before pretesting. If pretesting is not possible, the tasks and items should be analysed after the test has been administered but before the results are reported. Malfunctioning or misfitting tasks and items should not be included in the calculation of individual test takers' reported scores.
5. Information guides on scoring (also known as grading or marking schemes) must be prepared for test tasks requiring hand scoring. These guides must be tried out to demonstrate that they permit reliable evaluation of the test takers' performance.
6. Those doing the scoring should be trained for the task, and both inter-rater and intra-rater reliability should be calculated and published.
7. Test materials should be kept in a safe place and handled in such a way that no test taker is allowed to gain an unfair advantage over the other test takers.
8. Care must be taken to ensure that all test takers are treated in the same way in the administration of the test.
9. Scoring procedures must be carefully followed and score-processing routines checked to make certain that no mistakes have been made.
10. Reports of the test results should be presented in such a way that they can be easily understood by the test takers and other stakeholders.

Institutions (schools, companies, certification bodies, etc.) developing and administering entrance, certification, or other high-stakes examinations must use test designers and task and item writers who are well versed in current language testing theory and practice and who have native or near-native competence in the language being tested. Items written by non-native speakers of the language being tested must be checked by competent native speakers of the language.

Responsibilities to test takers and related stakeholders:

(Before the test is administered) The institution should provide all potential test takers with adequate information about the purpose of the test, the construct (or constructs) the test is attempting to measure, and the extent to which that has been achieved. Information should also be provided as to how the scores/grades will be allocated and how the results will be reported.

(At the time of administration) The institution shall provide facilities for the administration of the test that do not disadvantage any test taker.
Test administration materials should be carefully prepared, and proctors trained and supervised, so that each administration of the test is uniform, ensuring that all test takers receive the same instructions, the same time to do the test, and the same access to any permitted aids. If something occurs that calls into question the uniformity of the administration of the test, the problem should be identified, and any remedial action to be taken to offset the negative impact on the affected test takers should be promptly announced.

In the case of speaking tests, the facilities shall be capable of proper invigilation and oversight, providing a safe and secure environment in professional surroundings for both the rater(s)/interlocutors and the test takers.

(At the time of scoring) The institution shall take the steps necessary to see that each test taker's test paper is scored/graded accurately and that the result is correctly entered in the database used in the assessment. There should be ongoing quality control checks to ensure that the scoring process is working as intended.

(Other considerations) If a decision must be made on candidates who did not all take the same test or the same form of a test, care must be taken to ensure that the different measures used are in fact comparable. If more than one form of the test is used, inter-form reliability estimates should be published as soon as they are available.

They should:
1. Make a clear statement as to which groups the test is appropriate for and which groups it is not appropriate for.
2. Make a clear statement, in terms a layperson can understand, of the construct the test is designed to measure.
3. Publish validity and reliability estimates and bias reports for the test, along with sufficient explanation to allow potential test takers and test users to decide whether the test is suitable in their situation.
4. Report the results in a form that will allow test users to draw the correct inferences from them.
5. Refrain from making any false or misleading claims about the test.
6. Publish a test takers' handbook which:
   1. Explains the relevant measurement concepts so that they can be understood by …
   2. Reports evidence of the reliability and validity of the test for the purpose for …
   3. Describes the scoring procedure and, if multiple forms exist, the steps taken to …
   4. Explains the proper interpretation of test results and any limitations on their …

Persons who use test results for decision making must:
1. Use results from a test that is sufficiently reliable and valid to allow fair decisions to be made.
2. Make certain that the test construct is relevant to the decision to be made.
3. Clearly understand the limitations of the test results on which they will base their decision.
4. Take into consideration the standard error of measurement (SEM) of the device that provides the data for their decision.
5. Be prepared to explain and provide evidence of the fairness and accuracy of their decision-making process.

In norm-referenced testing
The characteristics of the population on which the test was normed must be reported so that test users can determine whether this group is an appropriate standard against which their test takers can be compared.

In criterion-referenced testing
The appropriateness of the criterion must be confirmed by experts in the area being tested. Since correlation is not a suitable way of determining the reliability and validity of criterion-referenced tests, methods appropriate for such test data must be used.

In computer adaptive testing
The sample sizes must be large enough to ensure the stability of the IRT estimates. Test takers and other stakeholders must be informed of the rationale for computer adaptive testing and of the differences between paper-and-pencil tests and computer adaptive tests.

Part 2
As a test taker, you have the responsibility to: