VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF LANGUAGES AND INTERNATIONAL STUDIES
FACULTY OF ENGLISH LANGUAGE TEACHER EDUCATION
GRADUATION PAPER
AN INVESTIGATION INTO STUDENTS’
VARIATION OF SPEAKING SCORES WITH
DIFFERENT RATERS AND TOPICS
A CASE STUDY: SECOND-YEAR ENGLISH-MAJORED STUDENTS AT A LANGUAGE
UNIVERSITY
Supervisor: Dương Thu Mai
Student: Ngô Phương Nga
Course: QH2014.F1.E6
HÀ NỘI – 2018
ACCEPTANCE PAGE
I hereby state that I, Ngo Phuong Nga, QH2014.F1.E6SP, being a candidate
for the degree of Bachelor of Arts (English Language Teacher Education), accept
the requirements of the College relating to the retention and use of Bachelor’s
Graduation Papers deposited in the library.
In terms of these conditions, I agree that the original of my paper deposited in
the library should be accessible for the purposes of study and research, in
accordance with the normal conditions established by the librarian for the
care, loan or reproduction of the paper.
Signature
………………………………
May 2018
ACKNOWLEDGEMENTS
Firstly, I would like to express my sincere gratitude to my supervisor,
Ms. Duong Thu Mai, Ph.D., for her patience, profound knowledge, and
whole-hearted assistance throughout my research and writing. I could not
imagine having a better supervisor for my thesis.
I would also like to take this opportunity to thank the two
teachers at FELTE, ULIS, VNU for their enthusiastic participation in my study,
and for the encouragement they offered throughout the completion of this paper.
Besides my supervisor and teachers, I would like to thank
the sophomores at FELTE, ULIS, VNU for their zeal in participating in the
research.
Last but not least, my deepest and most sincere thanks go to my friend and my
family, especially my grandmother, for their tremendous support and
encouragement. Without them, this research could not have been completed.
ABSTRACT
This study investigated the influence of topics and raters on the speaking
scores of 20 sophomores at FELTE, ULIS, VNU, as well as whether
there are differences between the two raters in the scoring process. The
speaking topics were derived from an English test preparation course that these
students are currently taking, and the test format was the same as Part 3 of the
IELTS speaking test. A survey and interviews were used to collect the data,
which were then analysed using paired-samples t-tests and content-based analysis,
respectively. The analysis of the results revealed no significant differences
between the two raters’ scores for the candidates. However, the
change in topics might influence the scores of the candidates. In addition,
interviews with the raters indicated that there are large differences between the
two raters in their rating process, yet, surprisingly, these differences did not
seem to exert any significant impact on the scores.
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
ABSTRACT
TABLE OF CONTENTS
LIST OF ABBREVIATIONS
LIST OF FIGURES
LIST OF TABLES
PART A: INTRODUCTION
1. Rationale
2. Statement of research problem & questions
3. Scope of the research
4. Significance
5. Organization of the study
PART B: DEVELOPMENT
CHAPTER I. LITERATURE REVIEW
1. Performance-based assessment
1.1. Definitions of performance-based assessment
1.2. Characteristics of performance-based assessment
2. Assessing the speaking skills
2.1. Spoken versus written language
2.2. Model of oral assessment
2.3. Oral assessment process
3. Topics in oral assessment
3.1. Definition of topics
3.2. The importance of topics in oral assessment
3.3. Related studies on topic influence in oral assessment
4. Raters in oral assessment
4.1. The raters factor in second language performance-based assessment
4.2. Definition of rater reliability in language assessment
4.3. Factors that affect rating operation
4.4. Rater effects
4.5. Related studies on inter-rater reliability and rater effects in assessment
5. Chapter summary
CHAPTER II. METHODOLOGY
1. Research questions
2. Research participants and the selection of participants
3. Data collection instruments
4. Data collection procedure
5. Data analysis procedure
6. Chapter summary
CHAPTER III. FINDINGS AND DISCUSSION
1. Research question 1: The students’ speaking competence
2. Research question 2: The variation of the students’ scores under the influence of different topics
3. Research question 3: The variation of the students’ scores under the influence of different raters
5. Research Question 4: Differences between raters in the rating operation
6. Chapter summary
PART C: CONCLUSION
1. Summary of the findings and discussion
2. Implications
3. Limitation of the study
4. Suggestions for further research
REFERENCES
APPENDICES
APPENDIX 1: Questions for the speaking test
APPENDIX 2: Speaking scores of the test-takers
APPENDIX 3: INTERVIEW FORM
LIST OF ABBREVIATIONS
CEFR: The Common European Framework of Reference for Languages
IELTS: International English Language Testing System
FELTE: Faculty of English Language Teacher Education
ULIS: University of Languages and International Studies
VNU: Vietnam National University, Hanoi
LIST OF FIGURES
Figure 1. A conceptual framework for performance testing (Milanovic & Saville, 1996)
Figure 2. An expanded model of speaking test performance (Fulcher, 2003)
Figure 3. A framework for describing the construct definition for a test of second language speaking (Fulcher, 2003)
Figure 4. Traditional fixed response assessment and assessment involving judgement (McNamara, 1996)
Figure 5. IELTS and the CEFR ("Common European Framework", n.d.)
Figure 6. Descriptive statistics for the scores of topic 1 by rater 1
Figure 7. Descriptive statistics for the scores of topic 2 by rater 1
Figure 8. Descriptive statistics for the scores of topic 1 by rater 2
LIST OF TABLES
Table 1. Interpretation of Correlation Coefficient (Zou, Tuncali, & Silverman, 2003)
Table 2. Descriptive statistics for students’ speaking competence
Table 3. Paired Samples Statistics for two topics by Rater 1
Table 4. Paired Samples Test Result for two topics by Rater 1
Table 5. Paired Samples Statistics for two topics by Rater 2
Table 6. Paired Samples Test Result for two topics by Rater 2
Table 7. Pearson correlation for topic 1
Table 8. Pearson correlation for topic 2
Table 9. Paired Samples Statistics for topic 1 by two raters
Table 10. Paired Samples Test Result for topic 1 by two raters
Table 11. Paired Samples Statistics for topic 2 by two raters
Table 12. Paired Samples Test Result for topic 2 by two raters
PART A: INTRODUCTION
This initial part of the study covers the background of the study,
as well as its scope and significance. The four research questions of the
study are also presented. The part ends with the organization of the study to
give readers a clearer orientation to and understanding of its structure.
1. Rationale
Performance-based assessment has gained popularity in the English
teaching and learning community over the last few decades, as it represents a
set of strategies for acquiring knowledge and skills through
meaningful task performance (Hibbard, 1996). The importance of such
assessment was also demonstrated by Baker, O’Neil, & Linn (1993), who
claimed that this type of test format had been considered the
“centerpiece of effort” or “the focal point” of some large-scale reforms. At the
same time, McNamara (1996) made this clearer by pointing out the
dominance of performance-based assessment over traditional forms of assessment.
Oral interviews are a prime method of performance-based assessment.
Because of their usefulness in reinforcing the importance of communicative
language teaching and assessment, they are now being used in a number of
classrooms all over the world. Bachman & Palmer (1996) emphasized the high
degree of content and face validity of such assessment.
A great number of factors can potentially affect the results of oral
interviews. First, although different topics in the same speaking test are
assumed to be of equivalent difficulty, a number of studies have pointed
out topic-related problems, including the topics themselves, the interest
of the examinees, their opinions about the topics, and their prior knowledge of
the topics, which could give certain groups of test-takers unfair advantages or
disadvantages in scoring (Jennings et al., 1999). Bachman (1990)
stated that topic was one of the elements of the testing environment facet
that could affect performance on a language test. In another study by Nguyen
& Tran (2015), which involved two hundred and three students and ten English
teachers, topical knowledge was also identified as an important factor
influencing the performance of test-takers.
Papajohn (2002) stated that not only the performances of the examinees
on the test but also the interpretation of the raters could have an impact on the
final score of the examinees. Rater effects are another variable in the rating
operation (Myford & Wolfe, 2015). As oral interviews are graded by
examiners, a certain degree of subjectivity is involved. Cronbach (1990)
mentioned the “complex and error-prone cognitive process” that every
rater undergoes during the scoring procedure. There is no guarantee that
different raters will grade the same examinees similarly. This problem raises
the question of inter-rater reliability: whether the raters assign the same score
to the same candidate.
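To make the notion of inter-rater reliability more concrete, the following is a minimal sketch in Python of two simple checks on a pair of raters: the Pearson correlation between their scores and the proportion of candidates who received identical scores. It is not the procedure used in this study. The function names (pearson_r, exact_agreement) and the band scores are hypothetical and were invented purely for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / math.sqrt(sxx * syy)


def exact_agreement(x, y):
    """Proportion of candidates to whom both raters gave the same score."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty lists of equal length")
    return sum(1 for a, b in zip(x, y) if a == b) / len(x)


# Hypothetical band scores for five candidates (illustration only, not study data).
rater_1 = [6.0, 6.5, 5.5, 7.0, 6.0]
rater_2 = [6.0, 6.0, 5.5, 7.0, 6.5]

print(f"Pearson r       = {pearson_r(rater_1, rater_2):.3f}")
print(f"Exact agreement = {exact_agreement(rater_1, rater_2):.2f}")
```

A high correlation shows only that the two raters rank candidates similarly; it does not by itself show that they award the same scores, which is why the exact-agreement proportion is reported alongside it in this sketch.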
In Vietnam, there is a common perception that many Vietnamese students who
perform well in written English tests still underperform
in speaking tests. A significant number of them find
presenting orally very challenging. This situation occurs even in some leading
foreign language institutions such as the University of Languages and
International Studies (ULIS), where the average speaking band score of students
is quite disappointing in comparison with their reading, listening and writing
scores. Although many studies of this problem have been conducted
worldwide, few have addressed students in Vietnam, and none has
focused on ULIS students.
All the aforementioned reasons prompted me to conduct this research with
the title: “An investigation into students’ variation of speaking scores with
different raters and topics. A case study: Second-year English-majored
students at a language university”.
This study reviews the abovementioned literature
to define the research questions and focuses on two main factors, topics
and raters, to discover whether the scores of some sophomores at a language
university actually vary under their influence.
2. Statement of research problem & questions
This research aims to observe and analyze the variation in the scores of
ULIS sophomores, which has not been studied intensively, in a speaking task
with different topics and different raters. In doing so, this paper aspires to raise
awareness of the importance of topics and raters in speaking tests, thereby helping
teachers to improve their understanding and encouraging more training. In brief, the
study sought to address the following questions:
(1) What is the students’ speaking competence in this test?
(2) To what extent do the students’ scores vary with different topics?
(3) To what extent do the students’ scores vary with different raters?
(4) Are there differences between raters in the rating operation?
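Questions (2) and (3) each compare two sets of scores awarded to the same students, which is the situation a paired-samples t-test is designed for. The sketch below, written in Python, shows how the t statistic for such a comparison can be computed from the per-student score differences. It is only an illustration under stated assumptions: the function name paired_t is hypothetical, and the band scores were invented and are not data from this study.

```python
import math
import statistics


def paired_t(first, second):
    """Return (t statistic, degrees of freedom) for a paired-samples t-test."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    # Work on the per-student differences between the two conditions.
    diffs = [a - b for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    standard_error = sd_diff / math.sqrt(len(diffs))
    return mean_diff / standard_error, len(diffs) - 1


# Hypothetical band scores for six students (illustration only, not study data).
topic_1 = [6.0, 6.5, 5.5, 7.0, 6.0, 6.5]
topic_2 = [5.5, 6.5, 5.0, 6.5, 6.0, 6.0]

t_stat, dof = paired_t(topic_1, topic_2)
print(f"t = {t_stat:.3f}, df = {dof}")
```

The resulting t value would then be compared against a t distribution with n - 1 degrees of freedom to judge significance; a statistics package would normally report the corresponding p-value directly.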
3. Scope of the research
First, this research focuses mainly on the variation in students’ scores
in relation to different topics and raters. The underlying reason is
that, although a number of elements affect students’ scores,
topic and rater factors are, according to Eckes (2011), the two most
prominent elements influencing the reliability of a test. Another reason
lies in the limited scale of a Bachelor’s thesis and the time constraints.
In addition, it is noteworthy that the sample of the study was restricted
to second-year students majoring in English language teacher education at
ULIS, VNU, who are expected to reach the B2 level of English proficiency in the
CEFR after their second year. Nevertheless, the results are intended to be as
representative as possible of all second-year students at ULIS, VNU.
4. Significance
This research contributes to the body of research on factors
affecting scoring in the Vietnamese context. Teachers who are aware of the
importance of topics and raters in grading students’ speaking performances
may place greater emphasis on a careful choice of topics for speaking tests;
otherwise, the final results of the students may suffer. At the same time, it is
anticipated that the findings of this study will encourage more teacher
training sessions with a view to minimizing any effects that teachers and topics
exert on speaking assessment.
In addition, the research is expected not only to serve as a useful
reference for teachers and students at ULIS but also to lay the foundation
for further research on the same topic.
5. Organization of the study
The study is composed of three main parts:
PART A. INTRODUCTION
This part presents basic information such as the
statement of the problem, rationale, scope, aims and objectives, as well as the
organization of the study.
PART B. DEVELOPMENT
Chapter I. Literature review
This chapter establishes the conceptual framework of the study by discussing
the literature relating to performance-based assessment, oral assessment, and
topic and rater factors in oral assessment.
Chapter II. Methodology
This chapter presents the context and the methodology of the study,
including sampling, the data collection instruments, the data collection
procedure and the data analysis.
Chapter III. Findings and discussion
This chapter focuses on the analysis of the data and discusses the results
of the study.
PART C. CONCLUSION
This part summarizes the findings, acknowledges some limitations and gives
recommendations for further study.
PART B: DEVELOPMENT
CHAPTER I. LITERATURE REVIEW
This chapter establishes the theoretical background of
performance-based assessment in general and oral assessment in particular.
Then, the key concepts of topics and raters in oral assessment, as well as the factors
that affect the rating procedure, are reviewed before some related studies
conducted worldwide and in Vietnam are discussed.
1. Performance-based assessment
1.1. Definitions of performance-based assessment
Many researchers have defined performance-based assessment.
Chalhoub-Deville (1995) claimed that performance-based assessment required students to
produce complex responses integrating a range of skills and
knowledge. At the same time, it is also of great importance for them to apply
such skills to real-life situations. Fitzpatrick and Morrison (1971) took a
similar view, in that some criteria in performance-based assessment
were made to simulate real conditions more closely than in the traditional
paper-and-pencil test. Exploring the distinction between the performance-based
test and other types of assessment, they also claimed that, apart from being
closer to reality, the performance-based test showed almost no
absolute differences from its counterparts.
For Haertel (1992), there are two definitions of performance-based
assessment: the narrow (or strong) one and the broad (or weak) one.
Referring to the narrow definition, which focuses mainly on the task completion
of the examinees, Haertel noted that a performance test was “any test in which
the stimuli presented or the response elicited emulate some aspects of the
non-test settings”. On the other hand, in the broad definition, it is the real
language ability of the examinees, not task completion, that is of greater
importance.
The way performance-based assessment was defined by Haertel
covers the ideas of the aforementioned researchers; thus, it is adopted in this
research.
1.2. Characteristics of performance-based assessment
A distinctive characteristic of performance-based assessment is that,
instead of recalling abstract knowledge as in the traditional
paper-and-pencil test, learners are actually required to perform some relevant tasks
(McNamara, 1996). Apart from that, while traditional forms of assessment
emphasize the relationship between the test-takers and the test instruments, one
more element, the raters, who grade the performances of the students based
on rating scales, is added to the performance-based assessment process.
McNamara (1996) also stated that this element made the interaction
among all elements much more complicated.
2. Assessing the speaking skills
2.1. Spoken versus written language
Speaking can be considered one of the most important skills
in the language acquisition process. Although the notion of speaking appears to
be familiar to most people, different researchers have defined it in
different ways. Bygate (1987, as cited in Mazouzi, 2013) contended that oral
language was used to deliver the intended messages of the speakers to
the listeners. Such messages could be “ideas, intention, thoughts and feelings”.
Later on, Fulcher (2003) defined speaking as “the verbal use of language to
communicate with others”, whereas Hedge (2000, as cited in Mazouzi, 2013)
suggested that speaking was a skill by which people are
judged when first impressions are formed. In other words, speaking skills are of
vital importance in not only the mother tongue but also second and foreign
languages.
In line with Hedge, Byrne (1987) held that speaking was a “two-way
process” which not only included speakers and listeners but also involved the
use of both a productive skill (speaking) and a receptive skill (listening). In other
words, speaking is an interactive process involving both speakers and
listeners. This notion was also shared by Thornbury (2005), who stated that
because most speaking took the form of face-to-face dialogue, interaction was certain
to ensue in both monologic speaking and conversation. However, according to
him, two more aspects, apart from interaction, needed to be
considered in managing talk: turn-taking and paralinguistics. In
terms of turn-taking, speakers take turns to hold the “floor”, which can be
defined as the right to speak. This reflects the rule that no two speakers should
speak at once, at least not for any sustained period of time. In addition,
Thornbury (2005) mentioned paralinguistics, the interactional use of eye
gaze and gestures. According to him, in communicating with others, people
often use eye contact, facial expressions, body language, pauses, tempo and
pitch variation to express their emotions and ideas. In other words, speaking is
an interactive and multi-sensory activity performed by both speakers and
listeners.
A number of studies have examined the distinction between speaking and
writing. Nevertheless, Akinnaso (1982) believed that there was no agreement
on the exact differences. Hatch (1992) distinguished spoken from written discourse
production in three main aspects: planning, contextualization, and
formality. Hatch concluded that speech was more unplanned, more highly
contextualized and more informal than writing. That is, simple words, phrases or
fillers such as “you know” and “kind of” are commonly used in
conversation. Disagreeing with Hatch, Mazouzi (2013) claimed that spoken
language differed from written language in terms of durability.
Specifically, when people communicate, their words last only a few seconds,
whereas anything written down can last much longer.
It is possible to conclude that the significant differences between
speaking and writing may lie in planning, contextualization, formality
and durability. Under no circumstances should spoken language be
underestimated, as it deserves to be regarded as a pivotal element of language