LTRC 2
San Francisco 1980


The Participants
Colloquium on the Construct Validation of
Oral Proficiency Tests (Colloquium #109)
TESOL '80, March 4-5
San Francisco

19 February 1980

Dear Participant:

We have heard from many of you and are now able to put together a final schedule for the colloquium on the Construct Validation of Oral Proficiency Tests.

The colloquium is scheduled for Tuesday and Wednesday, March 4-5. We assume that the times will be the same as for last year (9:30-12:30 and 1:30-4:30 each day). We have not received a final schedule from TESOL, so we cannot be absolutely sure of the above times. Nor can we tell you which room the colloquium will be in--again because TESOL has not told us. This information will be available at the registration desk, however.

Assuming the times given above are correct, we have scheduled the papers as follows:

TUESDAY MORNING

9:30-10:00
INTRODUCTION & DISTRIBUTION OF LATE PAPERS
Lyle Bachman
Adrian Palmer

10:15-11:30
DISCUSSION AND CRITIQUE OF THE PILOT STUDY
Lyle Bachman & Adrian Palmer.
"The Construct Validation of the Constructs 'Communicative Competence in Speaking' and 'Communicative Competence in Reading': A Pilot Study"

Jack Upshur.
"Critique of Bachman & Palmer's Pilot Study"

11:45-12:30
DISCUSSION OF PAPERS: RESEARCH IN CONSTRUCT VALIDATION
Ryuichi Yorozuya & John Oller.
"Oral Proficiency Scales: Construct Validity and the Halo Effect"

Pardee Lowe & Ray Clifford.
"Development of an Indirect Measure of Overall Oral Proficiency (ROPE)"

TUESDAY AFTERNOON

1:30-2:30
DISCUSSION OF PAPERS

Susanna Brutsch.
"Convergent-Discriminant Validation of Proficiency in Oral and Written Production of French"

Harold Madsen & Randy Jones.
"A Survey of Oral Proficiency Tests: Phase II"

2:45-4:30
DISCUSSION OF PAPERS

Meredith Pike.
"An Investigation of the Interviewer's Role in Oral Proficiency Testing"

Brendan Carroll.
"Measuring the Communicative Value of an Oral Performance"

Arthur Hughes.
"A Closer Look at Conversational Cloze"

WEDNESDAY MORNING

9:30-11:00
DISCUSSION OF PAPERS NOT CIRCULATED IN ADVANCE

Marianne Adams.
"The Distance Between the S-Levels"

Elana Shohamy.
"The Construct Validity of the Oral Interview Rating Scale"

Douglas Stevenson.
To be announced.

11:45-12:30
DISCUSSION OF THE COMPATIBILITY OF MODELS OF COMMUNICATIVE COMPETENCE WITH RESEARCH FINDINGS

WEDNESDAY AFTERNOON

1:30-2:30
DISCUSSION OF DESIGNS FOR FOLLOW-UP STUDIES: HYPOTHESES AND METHOD

2:45-4:30
DISCUSSION OF PRACTICAL CONSIDERATIONS IN THE IMPLEMENTATION OF FOLLOW-UP STUDIES

We would like to ask you to bring extra copies of your paper to the colloquium. If you mailed out your paper in advance, 10 extra copies should be adequate for those participants who failed to receive your paper in time. If you did not mail out your paper in advance, please bring 35 copies.


REPORT ON THE TWO-DAY COLLOQUIUM ON THE CONSTRUCT VALIDATION OF ORAL PROFICIENCY TESTS

Adrian S. Palmer
University of Utah

For the second year in a row, a small group of researchers in language testing has met for two days during the TESOL national convention. The first colloquium was held in Boston during TESOL '79 and was chaired by Peter J.M. Groot (Institute of Applied Linguistics, University of Utrecht, Wilhelminapark 11, Utrecht, Holland) and Adrian Palmer. It was organized to bring researchers together to discuss not only the problems of testing oral proficiency but also test validation procedures in general and construct validation procedures in particular. One outcome of this colloquium was the design (based on a model proposed by Campbell and Fiske) of a pilot multitrait-multimethod convergent-discriminant construct validation study of tests of the traits "communicative competence in speaking" and "communicative competence in reading." Another outcome was the preparation of a volume of papers on the validation of oral proficiency tests (Palmer and Groot, editors, forthcoming). A third outcome was the decision to continue meeting to consider the results of the pilot study as well as other advances in oral testing.

The second colloquium is the one reported on here. Organized by Adrian Palmer and Lyle F. Bachman (Division of ESL, 3070 Foreign Languages Bldg., University of Illinois, Urbana, Ill. 61801), the colloquium met March 4-5, 1980 at the TESOL national convention in San Francisco. Papers were prepared and circulated in advance to the 20 participants. The colloquium itself was devoted to brief summaries of the papers followed by extensive discussion. The final working session was devoted to planning coordinated follow-up studies.

The Papers

Lyle Bachman and Adrian Palmer presented a paper entitled "The Construct Validation of the Constructs 'Communicative Competence in Speaking' and 'Communicative Competence in Reading': A Pilot Study." Competence in speaking and competence in reading were each tested by three methods: interview, translation, and self-ratings. The subjects were 75 native speakers of Mandarin Chinese at the University of Illinois. Using correlational analysis and analysis of variance, the researchers found strong evidence for the convergent validity and weak evidence for the discriminant validity of the tests of speaking and reading. Subsequent to the colloquium, they used confirmatory factor analysis and found that a two-trait model (speaking and reading) accounted for the data significantly better than a one-trait (unitary language factor) model.

Jack Upshur was asked prior to the colloquium to critique the pilot (Bachman-Palmer) study. Because Upshur had not been directly involved in planning the pilot study, it was felt that he could provide an unbiased evaluation of its design, implementation, and conclusions. In his paper "Critique of Bachman-Palmer Pilot Study on Communicative Competence in Speaking and Reading," Upshur pointed out that the number of traits (2) and methods (3) used resulted in an under-identified model. This, in conjunction with the subject selection procedure, which may have involved choosing subjects who had been affected by a highly similar set of variables, may have made discriminant validity very difficult to demonstrate. He also expressed reservations about drawing conclusions about convergent and discriminant validity from satisfaction of the Campbell-Fiske criteria.

John Oller (Department of Linguistics, University of New Mexico) presented a paper (co-authored with Ryuichi Yorozuya) entitled "Oral Proficiency Scales: Construct Validity and the Halo Effect." He investigated the construct validity of four 10-point scales of oral proficiency (grammar, vocabulary, pronunciation, and fluency) in a study in which interviews with 10 foreign students were evaluated by 15 native speakers of English. Oller found no unique reliable variance that could be attributed to the separate constructs of grammar, vocabulary, pronunciation, and fluency.

Pardee Lowe and Ray Clifford (CIA Language School) described a recorded oral proficiency test developed at the CIA for administration under conditions where a face-to-face interview was impractical. The test incorporates many of the features of the live interview used at the CIA, such as the systematic use of Lowe's question types to elicit performance at different levels.

Susanna Brutsch (University of Minnesota, Minneapolis) presented a paper entitled "Convergent-Discriminant Validation of Proficiency in Oral and Written Production of French." In this study of 82 subjects, using two traits and three methods, she found evidence for convergent validity but not for discriminant validity.

Harold Madsen and Randy Jones (Departments of Linguistics and German, Brigham Young University) presented a paper entitled "A Survey of Oral Proficiency Tests: Phase II." This was an updated report on their large-scale study of more than 180 different oral tests that they have collected over the last two years. They summarized their findings on test description and scoring procedures.

Meredith Pike (Center for Developing English Language Teaching, Faculty of Education, Ain Shams University, Cairo, Egypt) presented a paper entitled "An Investigation of the Interviewer's Role in Oral Proficiency Testing." She analyzed one interviewer's behavior in a series of oral interviews with foreign graduate students at UCLA. The behavior was analyzed on the basis of form and function, and these analyses contributed to a preliminary definition of interviewer consistency.

Brendan Carroll (English Teaching Information Centre, British Council, London, England) presented a paper entitled "Measuring the Communicative Value of an Oral Performance." He examined the adequacy of current criteria for assessing oral performance and outlined a model which incorporates communicative factors from the Munby model and which subsumes the language aspects (pronunciation, vocabulary, grammar, and fluency) commonly used as the basis for descriptions of spoken performance.

Arthur Hughes (Department of Linguistic Science, University of Reading, England) presented a paper entitled "A Closer Look at Conversational Cloze." He described the features which distinguish the conversational cloze from the prose cloze and attempted to discover how the presence of any or all of them accounted for the greater success of conversational cloze in predicting oral ability.

Marianne Adams (Testing and Publications Office, School of Language Studies, Foreign Service Institute, Department of State, Washington, DC) presented a paper entitled "The FSI Oral Interview: Test/Conversation." She described some of the basic features of the FSI oral interview (one of the tests used in the Bachman-Palmer study) and explained how the test is used at the FSI.

Elana Shohamy (School of Education, Stanford University) presented a paper entitled "The Construct Validity of the Oral Interview Rating Scale." She evaluated the extent to which ratings of speaking proficiency by teachers or linguists trained in using the FSI oral interview rating scales correlated with ratings of speaking proficiency by lay native speakers using their own evaluation criteria. She found high inter-rater reliability for both lay and trained raters and a high correlation between the ratings assigned by the two groups.

Helmut Vollmer (Universität Osnabrück, Federal Republic of Germany) presented a paper entitled "On the Psycholinguistic Construct of an Internalized Expectancy Grammar." He reviewed some of the evidence for a general language proficiency (one-factor) model and criticized it as a methodological artifact. He argued against the use of principal components factor analysis and proposed the use of designs that would allow investigators to apply confirmatory factor analytic techniques.

Andrew Cohen (The Hebrew University of Jerusalem, Israel) presented a paper entitled "Developing a Rating Scale for Testing Functional Speaking Ability." He described the development of a rating scale for assessing sociocultural competence based on contrastive analysis of sociocultural patterns.

Discussion of Directions for Future Research

In addition to the presentation and discussion of the papers, an entire afternoon was devoted to a discussion of possible directions for follow-up studies. The participants seemed to agree that there was sufficient evidence for the discriminant validity of tests of communicative competence in speaking and reading to warrant continued investigation into the internal structure of the construct "communicative competence." It was decided that the primary focus of the follow-up study should be to assess whether raters could reliably distinguish among aspects of communicative competence such as those proposed by Canale and Swain in their framework. It was also suggested that the tests actually used in the study should be as "natural" as possible, in the hope that the results would then be of greater interest to researchers in Europe who are interested in functional language tests. Finally, it was suggested that future colloquia allow for the discussion not only of construct validation studies but also of criterion-related validation.

Other Participants in the Colloquia

In addition to the individuals cited above, a number of other researchers have contributed to the construct validation project, both by presenting papers at the first colloquium in Boston and by participating in the discussion sessions at the colloquia. These individuals are named below.

Dr. Michael Canale
Project Director, FSL Project
The Ontario Institute for Studies in Education

Dr. Francis Cartier

Dr. John L.D. Clark
Educational Testing Service

Dr. Alan Davies
Department of Linguistics
University of Edinburgh

Dr. Frances B. Hinofotis
Department of English ESL
UCLA

Donna Ilyin
Placement Officer
San Francisco Community College

Dr. Dale Lange
University of Minnesota

Stephen B. Ross
American Language Program
California State University

George E. Scholz
Director of Studies
American Language Institute
Portugal

Dr. Charles Stansfield
Dept. of Spanish and Portuguese
University of Colorado

Dr. Douglas K. Stevenson
Gesamthochschule Essen
West Germany


The Participants
Oral Proficiency Test Construct Validation Project
Boston and San Francisco, 1979-1980

Dear Participant:

With the recent conclusion of the Second Colloquium on the Construct Validation of Oral Proficiency Tests, we would like to take this opportunity to summarize the results achieved to date and to indicate possible future directions for the project. Before we do so, however, we would like to thank those of you who participated in the San Francisco colloquium. The papers were focused, and the discussion (facilitated by having the papers circulated in advance) was enlightening. Evidence of the success of the project as a whole is our progress in three areas: the use of increasingly sophisticated research methodology, the development of new tests of communicative competence, and the completion of a number of empirical construct validation studies.

We would like briefly to summarize the results of the colloquium and then suggest some possibilities for the future of the project.

The first step in the construct validation of tests of "communicative competence in speaking," as specified in our letter of March 18, 1978, was to determine whether there is evidence that this trait can be measured independently of the trait "communicative competence in reading." The 2-trait x 3-method matrix reported in the pilot study was under-identified, so we were not able to present strong evidence either for or against discriminant validity at the colloquium. Following the colloquium, Lyle Bachman reanalyzed the data treating the different raters as different methods, which resulted in an over-identified model. He has run a number of confirmatory factor analyses of the data and has found a significantly better fit with a two-trait model than with a one-trait model. These results, along with a cleaner correlation matrix and ANOVA (the result of our finding several reversed scales), are included with this letter.

A second objective of the colloquium was to bring researchers with different research interests into closer contact. This was most noticeable in the contact between researchers primarily interested in functional language testing (as represented by Carroll, Hughes, and major U.S. testing institutions such as FSI, the CIA, the Army Language School in Monterey, and ETS) and those primarily interested in construct validation. There seemed to be a general feeling that research in criterion-related validation should be included in any follow-up colloquia in order to give appropriate consideration to functional language testing, and that each group of researchers could benefit from interaction with the other.

A third objective of the colloquium was to determine directions for future research. At the Boston Colloquium, we agreed to carry out the construct validation project in two stages. In stage one, we would investigate the construct validity of tests of two language-use skills differing both in direction (production versus reception) and in channel (oral versus visual). If we found evidence of both convergent and discriminant validity, we would, as a next step, investigate tests of the components of communicative competence. We believe that the results of the pilot study (particularly the updated results included in this letter) support a two-trait model and, therefore, warrant moving to the second stage of the project. We have funding for a follow-up study as well as an indication that several of you will be conducting parallel studies.

For those of you not able to attend the San Francisco colloquium, we have appended a more complete report on the colloquium prepared for the next issue of SLANT.

We feel we are now at a stage in this project at which a number of important decisions need to be made. We will list the issues below for your consideration.

1. The nature of future colloquia

a) Closed. Restricted to a small number of active researchers--as in the preceding two colloquia.

b) Open. Open to all interested persons.

2. Time of future colloquia

a) Two days prior to the beginning of the TESOL convention (Sunday and Monday). Several of us attended the two-day discussion of a framework for evaluating communicative competence that Michael Canale and Merrill Swain organized prior to the Boston TESOL convention. We felt that this schedule and format offered a number of advantages--no conflict with other TESOL functions, more time for socializing, no need to follow the TESOL time schedule, and the avoidance of problems of closed participation.

b) During the first two days of the TESOL convention, as with the past colloquia. The main advantage of this option is less time spent away from our home institutions.

3. Participation.

a) Restricted to active researchers. This policy has allowed us to present highly technical papers with little introduction and has allowed us to focus our discussion on a small number of issues. It has also allowed us to circulate papers in advance, freeing more colloquium time for discussion.

b) Open to general participation. TESOL reported considerable interest in our colloquium from outside the group of invited participants. In order to avoid an unnecessarily exclusive position, we believe an effort should be made to broaden the audience. We propose the following. To accommodate the needs of the relatively small group of active researchers, we should continue to hold a closed session. In addition, we should hold a half-day panel discussion during the second day of the regular convention (Wednesday). The members of the panel, drawn from the closed session, would summarize the results of the colloquium in a form comprehensible to a general audience and would answer questions from the audience.

4. Location of the closed session in 1981

a) Toronto. We have talked with Michael Canale, who suggested the possibility of holding the colloquium at OISE in Toronto. OISE has excellent facilities, and members of the staff have been actively involved in developing frameworks and tests for evaluating communicative competence. Toronto is about 300 miles east of Detroit, and transportation to Detroit would be easy to arrange.

b) Ann Arbor. We have also talked with Jack Upshur, who suggested Ann Arbor as a possible site. The University of Michigan also has excellent facilities for hosting such a conference. Ann Arbor is about 40 miles west of Detroit.

5. Sponsorship of the closed session

a) Co-sponsorship by TESOL and the host institution. Some of the participants in the colloquia, particularly those from Europe, have asked for official letters of invitation to facilitate obtaining travel funding. If future colloquia are to be held outside of the regular TESOL convention, perhaps it would be useful to arrange for joint sponsorship by TESOL and the host institution.

b) Involvement of the AILA Commission on Language Tests and Testing. Some involvement of AILA would, perhaps, make it easier for us to involve our colleagues from Europe in the project.

Jack Upshur will be contacting each of you shortly regarding the future of the limited-participation colloquium. Lyle and I will continue to be involved in the follow-up study, and we will also prepare a proposal for TESOL '81 for a half-day panel discussion of the outcomes of the closed session.

This has been a most rewarding project for Peter, Lyle, and me. We have received a number of letters from you which we would like to answer personally, but time prevents us from doing so. Let us, once again, thank you all for your participation and support over the last two and a half years. We are looking forward to working together in the future.

Sincerely,

Adrian S. Palmer
University of Utah

Lyle F. Bachman
University of Illinois

Peter J. M. Groot
University of Utrecht

Enclosures:
Report on San Francisco Colloquium
Master mailing list for oral test validation project
Updated results of pilot study