VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF LANGUAGES AND INTERNATIONAL STUDIES
******
NGUYỄN THỊ QUỲNH YẾN
DOCTORAL DISSERTATION
AN INVESTIGATION INTO THE CUT-SCORE VALIDITY
OF THE VSTEP.3-5 LISTENING TEST
MAJOR: ENGLISH LANGUAGE TEACHING METHODOLOGY
CODE: 9140231.01
HANOI, 2018
VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF LANGUAGES AND INTERNATIONAL STUDIES
******
NGUYỄN THỊ QUỲNH YẾN
DOCTORAL DISSERTATION
AN INVESTIGATION INTO THE CUT-SCORE VALIDITY
OF THE VSTEP.3-5 LISTENING TEST
(Nghiên cứu xác trị các điểm cắt của kết quả bài thi Nghe
Đánh giá năng lực tiếng Anh từ bậc 3 đến bậc 5 theo
Khung năng lực Ngoại ngữ 6 bậc dành cho Việt Nam)
MAJOR: ENGLISH LANGUAGE TEACHING METHODOLOGY
CODE: 9140231.01
SUPERVISORS:
1. PROF. NGUYỄN HÒA
2. PROF. FRED DAVIDSON
HANOI, 2018
This dissertation was completed at the University of Languages and
International Studies, Vietnam National University, Hanoi.
This dissertation was defended on 10 May 2018.
This dissertation can be found at:
- National Library of Vietnam
- Library and Information Center, Vietnam National University, Hanoi
DECLARATION OF AUTHORSHIP
I hereby certify that the thesis I am submitting is entirely my own original
work except where otherwise indicated. I am aware of the University's
regulations concerning plagiarism, including those regulations concerning
disciplinary actions that may result from plagiarism. Any use of the works of
any other author, in any form, is properly acknowledged at their point of use.
Date of submission:
_____________________________
Ph.D. Candidate’s Signature:
_____________________________
I certify that I have read this dissertation and that, in my opinion, it
is fully adequate in scope and quality as a dissertation for the degree
of Doctor of Philosophy.
_____________________________________________
Prof. Nguyễn Hòa
(Supervisor)
I certify that I have read this dissertation and that, in my opinion, it
is fully adequate in scope and quality as a dissertation for the degree
of Doctor of Philosophy.
_____________________________________________
Prof. Fred Davidson
(Co-supervisor)
TABLE OF CONTENTS
LIST OF FIGURES ……… viii
LIST OF TABLES ……… ix
LIST OF KEY TERMS ……… xiii
ABSTRACT ……… xvii
ACKNOWLEDGMENTS ……… xix
CHAPTER I: INTRODUCTION ……… 1
1. Statement of the problem ……… 1
2. Objectives of the study ……… 4
3. Significance of the study ……… 4
4. Scope of the study ……… 4
5. Statement of research questions ……… 5
6. Organization of the study ……… 5
CHAPTER II: LITERATURE REVIEW ……… 7
1. Validation in language testing ……… 7
1.1. The evolution of the concept of validity ……… 7
1.2. Aspects of validity ……… 9
1.3. Argument-based approach to validation ……… 11
2. Standard setting for an English proficiency test ……… 15
2.1. Definition of standard setting ……… 15
2.2. Overview of standard-setting methods ……… 17
2.3. Common elements in standard setting ……… 21
2.3.1. Selecting a standard-setting method ……… 21
2.3.2. Choosing a standard-setting panel ……… 23
2.3.3. Preparing performance-level descriptors ……… 24
2.3.4. Training panelists ……… 24
2.3.5. Providing feedback to panelists ……… 26
2.3.6. Compiling ratings and obtaining cut scores ……… 27
2.3.7. Evaluating standard setting ……… 27
2.4. Evaluating standard setting ……… 28
2.4.1. Procedural evidence ……… 30
2.4.2. Internal evidence ……… 32
2.4.3. External evidence ……… 32
2.4.3.1. Comparisons to other standard-setting methods ……… 33
2.4.3.2. Comparisons to other sources of information ……… 33
2.4.3.3. Reasonableness of cut scores ……… 34
3. Testing listening ……… 34
3.1. Communicative language testing ……… 34
3.2. Listening construct ……… 36
4. Statistical analysis for a language test ……… 42
4.1. Statistical analysis of multiple-choice (MC) items ……… 42
4.2. Investigating reliability of a language test ……… 46
5. Review of validation studies ……… 49
5.1. Review of validation studies on standard setting ……… 49
5.2. Review of studies employing the argument-based approach in validating language tests ……… 52
6. Summary ……… 60
CHAPTER III: METHODOLOGY ……… 61
1. Context of the study ……… 61
1.1. About the VSTEP.3-5 test ……… 61
1.1.1. The development history of the VSTEP.3-5 test ……… 61
1.1.2. The administration of the VSTEP.3-5 test in Vietnam ……… 62
1.1.3. Test takers ……… 62
1.1.4. Test structure and scoring rubrics ……… 62
1.1.5. The establishment of the cut scores ……… 63
1.2. About the VSTEP.3-5 listening test ……… 64
1.2.1. Test purpose ……… 64
1.2.2. Test format ……… 64
1.2.3. Performance standards ……… 64
1.2.4. The establishment of the cut scores of the VSTEP.3-5 listening test ……… 68
2. Building an interpretive argument for the VSTEP.3-5 listening test ……… 68
3. Methodology ……… 70
3.1. Research questions ……… 70
3.2. Description of methods of the study ……… 71
3.2.1. Analysis of the test tasks and test items ……… 72
3.2.1.1. Analysis of test tasks ……… 72
3.2.1.2. Analysis of test items ……… 73
3.2.2. Analysis of test reliability ……… 75
3.2.3. Validation of cut scores ……… 76
3.2.3.1. Procedural ……… 76
3.2.3.2. Internal ……… 76
3.2.3.3. External ……… 77
3.3. Description of Bookmark standard-setting procedures ……… 78
3.4. Selection of participants of the study ……… 81
3.4.1. Test takers of the early 2017 administration ……… 81
3.4.2. Participants for the Bookmark standard-setting method ……… 82
3.5. Descriptions of tools for data analysis ……… 83
3.5.1. Text-analyzing tools ……… 83
3.5.1.1. English Profile ……… 83
3.5.1.2. Readable.io ……… 84
3.5.2. Speech-rate analyzing tool ……… 84
3.5.3. Statistical analyzing tools ……… 85
3.5.3.1. WINSTEPS (3.92.1) ……… 85
3.5.3.2. Iteman 4.3 ……… 86
4. Summary ……… 87
CHAPTER IV: DATA ANALYSIS ……… 89
1. Analysis of the test tasks and test items ……… 89
1.1. Analysis of the test tasks ……… 89
1.1.1. Characteristics of the test rubric ……… 89
1.1.2. Characteristics of the input ……… 94
1.1.3. Relationship between the input and response ……… 102
1.2. Analysis of the test items ……… 102
1.2.1. Overall statistics of item difficulty and item discrimination ……… 102
1.2.2. Item analysis ……… 107
2. Analysis of the test reliability ……… 128
3. Analysis of the cut scores ……… 130
3.1. Procedural evidence ……… 130
3.2. Internal evidence ……… 131
3.3. External evidence ……… 132
CHAPTER V: FINDINGS AND DISCUSSION ……… 145
1. The characteristics of the test tasks and test items ……… 145
2. The reliability of the VSTEP.3-5 listening test ……… 151
3. The accuracy of the cut scores of the VSTEP.3-5 listening test ……… 151
CHAPTER VI: CONCLUSION ……… 154
1. Overview of the thesis ……… 154
2. Contributions of the study ……… 157
3. Limitations of the study ……… 158
4. Implications of the study ……… 158
5. Suggestions for further research ……… 159
LIST OF THESIS-RELATED PUBLICATIONS ……… 161
REFERENCES ……… 162
APPENDIX 1: Structure of the VSTEP.3-5 test ……… 172
APPENDIX 2: Summary of the directness and interactiveness between the texts and the questions of the VSTEP.3-5 listening test ……… 174
APPENDIX 3: Consent form (workshops) ……… 177
APPENDIX 4: Agenda for the Bookmark standard-setting procedure ……… 179
APPENDIX 5: Panelist recording form ……… 180
APPENDIX 6: Evaluation form for standard-setting participants ……… 181
APPENDIX 7: Control file for WINSTEPS ……… 183
APPENDIX 8: Timeline of the VSTEP.3-5 test administration ……… 185
APPENDIX 9: List of the VSTEP.3-5 developers ……… 186
LIST OF FIGURES
Figure 2.1: Model of Toulmin’s argument structure (1958, 2003) ……… 12
Figure 2.2: Sources of variance in test scores (Bachman, 1990) ……… 47
Figure 2.3: Overview of the interpretive argument for ESL writing course placements ……… 57
Figure 4.1: Item map of the VSTEP.3-5 listening test ……… 105
Figure 4.2: Graph for item 2 ……… 108
Figure 4.3: Graph for item 3 ……… 110
Figure 4.4: Graph for item 6 ……… 112
Figure 4.5: Graph for item 13 ……… 115
Figure 4.6: Graph for item 14 ……… 117
Figure 4.7: Graph for item 15 ……… 119
Figure 4.8: Graph for item 19 ……… 121
Figure 4.9: Graph for item 20 ……… 123
Figure 4.10: Graph for item 28 ……… 125
Figure 4.11: Graph for item 34 ……… 126
Figure 4.12: Total score for the scored items ……… 129
LIST OF TABLES
Table 2.1: Review of standard-setting methods (Hambleton & Pitoniak, 2006) ……… 21
Table 2.2: Standard-setting evaluation elements (Cizek & Bunch, 2007) ……… 30
Table 2.3: Common steps required for standard setting (Cizek & Bunch, 2007) ……… 32
Table 2.4: A framework for defining listening task characteristics (Buck, 2001) ……… 38
Table 2.5: Criteria for item selection and interpretation of the item difficulty index ……… 44
Table 2.6: Criteria for item selection and interpretation of the item discrimination index ……… 46
Table 2.7: General guideline for interpreting test reliability (Bachman, 2004) ……… 48
Table 2.8: Number of proficiency levels & test reliability ……… 48
Table 2.9: Summary of the warrants and assumptions associated with each inference in the TOEFL interpretive argument (Chapelle et al., 2008) ……… 56
Table 3.1: Structure of the VSTEP.3-5 test ……… 63
Table 3.2: The cut scores of the VSTEP.3-5 test ……… 63
Table 3.3: Performance standard of Overall Listening Comprehension (CEFR: learning, teaching, assessment) ……… 65
Table 3.4: Performance standard of Understanding conversation between native speakers (CEFR: learning, teaching, assessment) ……… 66
Table 3.5: Performance standard of Listening as a member of a live audience (CEFR: learning, teaching, assessment) ……… 66
Table 3.6: Performance standard of Listening to announcements and instructions (CEFR: learning, teaching, assessment) ……… 67
Table 3.7: Performance standard of Listening to audio media and recordings (CEFR: learning, teaching, assessment) ……… 67
Table 3.8: The cut scores of the VSTEP.3-5 test ……… 68
Table 3.9: Criteria for item selection and interpretation of the item difficulty index ……… 74
Table 3.10: Criteria for item selection and interpretation of the item discrimination index ……… 75
Table 3.11: Number of proficiency levels & test reliability ……… 76
Table 3.12: The venue for the Angoff and Bookmark standard-setting methods ……… 77
Table 3.13: Comparison between the Flesch-Kincaid readability analysis and the CEFR and IELTS grading systems ……… 85
Table 3.14: Summary of the interpretative argument for the interpretation and use of the VSTEP.3-5 listening cut scores ……… 88
Table 4.1: General instruction of the VSTEP.3-5 listening test ……… 90
Table 4.2: Instruction for Part 1 ……… 91
Table 4.3: Instruction for Part 2 ……… 92
Table 4.4: Instruction for Part 3 ……… 93
Table 4.5: Information provided in the specifications for the VSTEP.3-5 listening test ……… 94
Table 4.6: Summary of the texts for items 1-8 ……… 96
Table 4.7: Description of language levels for texts of items 1-8 in the specifications ……… 97
Table 4.8: Summary of the texts for items 9-20 ……… 98
Table 4.9: Description of language levels for texts of items 9-20 in the specifications ……… 99
Table 4.10: Summary of the texts for items 21-35 ……… 100
Table 4.11: Description of language levels for texts of items 21-35 in the specifications ……… 101
Table 4.12: Summary of item discrimination and item difficulty ……… 104
Table 4.13: Summary statistics for the flagged items ……… 106
Table 4.14: Information for item 2 ……… 108
Table 4.15: Item statistics for item 2 ……… 109
Table 4.16: Option statistics for item 2 ……… 109
Table 4.17: Quantile plot data for item 2 ……… 109
Table 4.18: Information for item 3 ……… 110
Table 4.19: Item statistics for item 3 ……… 110
Table 4.20: Option statistics for item 3 ……… 111
Table 4.21: Quantile plot data for item 3 ……… 111
Table 4.22: Information for item 6 ……… 112
Table 4.23: Item statistics for item 6 ……… 112
Table 4.24: Option statistics for item 6 ……… 113
Table 4.25: Quantile plot data for item 6 ……… 113
Table 4.26: Information for item 13 ……… 115
Table 4.27: Item statistics for item 13 ……… 115
Table 4.28: Option statistics for item 13 ……… 116
Table 4.29: Quantile plot data for item 13 ……… 116
Table 4.30: Information for item 14 ……… 118
Table 4.31: Item statistics for item 14 ……… 118
Table 4.32: Option statistics for item 14 ……… 118
Table 4.33: Quantile plot data for item 14 ……… 118
Table 4.34: Information for item 15 ……… 120
Table 4.35: Item statistics for item 15 ……… 120
Table 4.36: Option statistics for item 15 ……… 120
Table 4.37: Quantile plot data for item 15 ……… 120
Table 4.38: Information for item 19 ……… 121
Table 4.39: Item statistics for item 19 ……… 121
Table 4.40: Option statistics for item 19 ……… 122
Table 4.41: Quantile plot data for item 19 ……… 122
Table 4.42: Information for item 20 ……… 123
Table 4.43: Item statistics for item 20 ……… 123
Table 4.44: Option statistics for item 20 ……… 124
Table 4.45: Quantile plot data for item 20 ……… 124
Table 4.46: Information for item 28 ……… 125
Table 4.47: Item statistics for item 28 ……… 125
Table 4.48: Option statistics for item 28 ……… 125
Table 4.49: Quantile plot data for item 28 ……… 126
Table 4.50: Information for item 34 ……… 127
Table 4.51: Item statistics for item 34 ……… 127
Table 4.52: Option statistics for item 34 ……… 127
Table 4.53: Quantile plot data for item 34 ……… 127
Table 4.54: Summary of statistics ……… 129
Table 4.55: Test reliability ……… 129
Table 4.56: The person reliability and item reliability of the test ……… 130
Table 4.57: Number of proficiency levels and test reliability ……… 131
Table 4.58: The test reliability of the VSTEP.3-5 listening test ……… 132
Table 4.59: Order of items in the booklet ……… 133
Table 4.60: Summary of output from Round 1 of the Bookmark standard-setting procedure ……… 135
Table 4.61: Conversion table ……… 136
Table 4.62: Summary of statistics in raw score metric for Round 1 ……… 137
Table 4.63: Summary of output from Round 2 of the Bookmark standard-setting procedure ……… 139
Table 4.64: Round 3 feedback for the Bookmark standard-setting procedure ……… 141
Table 4.65: Summary of output from Round 3 of the Bookmark standard-setting procedure ……… 143
Table 4.66: The cut scores set for the VSTEP.3-5 listening test by the Bookmark method ……… 144
Table 4.67: The cut scores set for the VSTEP.3-5 listening test by the Angoff method ……… 144
Table 4.68: Comparison between the results of the two standard-setting methods ……… 144
LIST OF KEY TERMS
Construct: A construct refers to the knowledge, skill, or ability that is being tested.
In a more technical and specific sense, it refers to a hypothesized ability or mental
trait which cannot necessarily be directly observed or measured, for example,
listening ability. Language tests attempt to measure the different constructs which
underlie language ability.
Cut score: A score that represents achievement of the criterion, the line between
success and failure, mastery and non-mastery.
Descriptor: A brief description accompanying a band on a rating scale, which
summarizes the degree of proficiency or type of performance expected for a test
taker to achieve that particular score.
Distractor: The incorrect options in multiple-choice items.
Expert panel: A group of target language experts or subject matter experts who
provide comments about a test.
High-stakes test: A high-stakes test is any test used to make important decisions
about test takers.
Inference: A conclusion that is drawn about something based on evidence and
reasoning.
Input: Input material provided in a test task for the test taker to use in order to
produce an appropriate response.
Interpretive argument: Statements that specify the interpretation and use of the
test performances in terms of the inferences and assumptions used to get from a
person’s test performance to the conclusions and decisions based on the test results.
Item (also, test item): Each testing point in a test which is given a separate score or
scores. Examples are: one gap in a cloze test; one multiple-choice question with
three or four options; one sentence for grammatical transformation; one question to
which a sentence-length response is expected.
Key: The correct option or response to a test item.
Multiple-choice item: A type of test item which consists of a question or
incomplete sentence (stem), with a choice of answers or ways of completing the
sentence (options). The test taker’s task is to choose the correct option (key) from a
set of possibilities. There may be any number of incorrect possibilities (distractors).
Options: The range of possibilities in a multiple-choice item or matching tasks
from which the correct one (key) must be selected.
Panelist: A target language expert or subject matter expert who provides comments
about a test.
Performance level description: Brief operational definitions of the specific
knowledge, skills, or abilities that are expected of examinees whose performance on
a test results in their classification into a certain performance; elaborations of the
achievement expectations connoted by performance level labels.
Performance level label: A hierarchical group of single words or short phrases that
are used to label the two or more performance categories created by the application
of cut scores to examinee performance on a test.
Performance standard: The abstract conceptualization of the minimum level of
performance distinguishing examinees who possess an acceptable level of
knowledge, skill, or ability judged necessary to be assigned to a category, or for
some other specific purpose, and those who do not possess that level. This term is
sometimes used interchangeably with cut score.
Proficiency test: A test which measures how much of a language someone has
learned. Proficiency tests are designed to measure the language ability of examinees
regardless of how, when, why, or under what circumstances they may have
experienced the language.
Readability: Readability is the ease with which a reader can understand a written
text. The readability of text depends on its content (the complexity of its vocabulary
and syntax) and its presentation (such as typographic aspects like font size, line
height, and line length).
Reliability: The reliability of a test is concerned with the consistency of scoring and
the accuracy of the administration procedures of the test.
Response probability (RP) criterion: In the context of Bookmark and similar
item-mapping standard-setting procedures, the criterion used to operationalize
participants’ judgment regarding the probability of a correct response (for
dichotomously scored items) or the probability of achieving a given score point or
higher (for polytomously scored items). In practical applications, two RP criteria
appear to be used most frequently (RP50 and RP67); other RP criteria have also
been used, though considerably less frequently.
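To illustrate the idea behind an RP criterion, the sketch below assumes a simple Rasch model, under which the probability that an examinee of ability θ answers an item of difficulty b correctly is 1/(1 + e^−(θ−b)). The function name and the numbers are illustrative only, not drawn from this dissertation or from WINSTEPS:

```python
import math

def rp_location(b: float, rp: float) -> float:
    """Ability (theta, in logits) at which a Rasch item of
    difficulty b is answered correctly with probability rp."""
    return b + math.log(rp / (1.0 - rp))

# For an item of difficulty b = 1.0 logits:
# under RP50 the mapped location equals the item difficulty itself;
# under RP67 it shifts up the scale by ln(2), roughly 0.69 logits.
print(round(rp_location(1.0, 0.50), 2))  # 1.0
print(round(rp_location(1.0, 2 / 3), 2))  # 1.69
```

Because Bookmark panelists page through an ordered item booklet, the choice of RP criterion shifts every item's mapped location by the same constant and therefore directly affects where the bookmarks, and hence the cut scores, fall.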
Rubric: A set of instructions or guidelines on an exam paper.
Selected-response: An item format in which the test taker must choose the correct
answer from alternatives provided.
Specifications (also, test specifications): A description of the characteristics of a
test, including what is tested, how it is tested, and details such as number and length
of forms, item types used.
Standard setting: A measurement activity in which a procedure is applied to
systematically gather and analyze human judgment for the purpose of deriving one
or more cut scores for a test.
Standardized test: A standardized test is any form of test that (1) requires all test
takers to answer the same questions, or a selection of questions from a common
bank of questions, in the same way, and that (2) is scored in a “standard” or
consistent manner, which makes it possible to compare the relative performance of
individual students or groups of students.
Test form: Test forms refer to different versions of tests that are designed in the
same format and used for different administrations.
Validation: An action of checking or proving the validity or accuracy of something.
The validity of a test can only be established through a process of validation.
Validity: The degree to which a test measures what it is supposed to measure, or
can be used successfully for the purpose for which it is intended. A number of
different statistical procedures can be applied to a test to estimate its validity. Such
procedures generally seek to determine what the test measures, and how well it does
so.
Validity argument: A set of statements that provide a critical evaluation of the
interpretive argument.
Warrant: The underlying connection between the claim and evidence in an
interpretive argument.
* These key terms are taken from the glossary provided by Cizek & Michael (2007)
and from the glossary on the website of Second Language Testing, Inc
(https://www.2lti.com/glossary/).
ABSTRACT
Standard setting is an important phase in the development of an examination
program, especially for a high-stakes test. Standard-setting studies are designed to
identify reasonable cut scores and to provide backing for the choice of those cut
scores. This study investigated the validity of the cut scores established for a
VSTEP.3-5 listening test administered in early 2017 to 1,562 test takers by one
institution permitted by the Ministry of Education and Training, Vietnam, to design
and administer the VSTEP.3-5 tests. The study adopted the current argument-based
validation approach, focusing on the three main inferences that construct the
validity argument: (1) test tasks and items, (2) test reliability, and (3) cut scores.
The argument is that for the cut scores of the VSTEP.3-5 listening test to be valid,
the test tasks and test items must first be designed in accordance with the
characteristics stated in the test specifications; second, the listening test scores
should be sufficiently reliable to reflect test takers’ listening proficiency; and third,
the cut scores should be reasonably established for the VSTEP.3-5 listening test.
In this study, qualitative and quantitative methods were combined and structured to
gather backing for and against the assumptions underlying each of these three
inferences. With regard to the first and second inferences, an analysis of the test
tasks and test items was conducted, and the test reliability was examined to
determine whether it fell within the acceptable range. For the third inference,
concerning the cut scores of the VSTEP.3-5 listening test, the Bookmark
standard-setting method was implemented and its results were compared with the
cut scores currently applied to the test. This study offers contributions in three
areas. First, it supports the widely held notion of validity as a unitary concept and
of validation as the process of building an interpretive argument and collecting
evidence in support of that argument. Second, it raises awareness of the importance
of evaluating the cut scores of high-stakes language tests in Vietnam so that
fairness can be ensured for all test takers. Third, it contributes to the construction
of a systematic, transparent, and defensible validity argument for the VSTEP.3-5
test in general and its listening component in particular. The results of this study
provide informative feedback on the establishment of the cut scores for the
VSTEP.3-5 listening test, the test specifications, and the test development process.
Positive results strengthen the reasonableness of the cut scores, the specifications,
and the quality of the VSTEP.3-5 listening test; negative results suggest changes or
improvements to the cut scores, the specifications, and the design of the VSTEP.3-5
listening test.