ASSESSMENT OF TERTIARY ENGLISH-MAJOR STUDENTS’ WRITING: VIETNAMESE
TEACHERS’ PERSPECTIVES
By
NGUYEN TRA MY
Bachelor of Arts
Hanoi University of Foreign Studies
Hanoi, Vietnam
1998
A thesis submitted in partial fulfilment of the requirements for the degree of
Master of Education (TESOL - International), Faculty of Education, Monash
University, Melbourne, Australia
December, 2003
TABLE OF CONTENTS
Abstract
Acknowledgements
Declaration
List of Tables
CHAPTER ONE: INTRODUCTION
1.1 Background of the research
1.2 Research aims
1.3 Outline of the thesis
CHAPTER TWO: REVIEW OF THE LITERATURE
2.1 Communicative competence
2.2 Principles of Communicative Language Testing
2.2.1 Validity
2.2.2 Reliability
2.2.3 Practicality
2.3 Issues in testing writing skills
2.3.1 Aspects in testing writing
2.3.2 Marking scheme
2.3.3 Difficulties in assessing writing
2.4 Summary of the chapter
CHAPTER THREE: METHODOLOGY
3.1 A qualitative approach
3.2 Selection of participants
3.3 Methods for data collection
3.4 Methods for data analysis
CHAPTER FOUR: FINDINGS AND DISCUSSION
4.1 An overview of assessment practices and writing syllabus in some Vietnamese English major universities
4.1.1 Assessment practices
4.1.2 Writing syllabus
4.2 Teachers’ perspectives on assessment criteria
4.2.1 Process of developing criteria checklist
4.2.2 Process of scoring
4.3 Characteristics of a “good” argument essay
4.3.1 Purpose
4.3.2 Thesis
4.3.3 Evidence
4.3.4 Refutation
4.3.5 Persona
4.4 Factors affecting teachers’ assessment judgements
4.4.1 The influence of raters’ culturally-based perspectives and norms
4.4.2 The influence of the marking scheme
4.5 Assessment criteria
4.6 The usefulness of the process of discussion of assessment criteria
CHAPTER FIVE: CONCLUSION AND RECOMMENDATIONS
5.1 Summary of the study
5.2 Recommendations
5.3 Limitations
5.4 Directions for future research
REFERENCES
Appendix 1: Anderson’s marking scheme
Appendix 2: Jacobs et al.’s scoring profile
Appendix 3: A comparison of two approaches
Appendix 4: Students’ writing texts
Appendix 5: Reflection questions
Appendix 6: Reflection answers
Appendix 7: English translation of interviews
Appendix 8: Vietnamese transcripts of interviews
Abstract
Direct tests seem to be becoming increasingly popular in Vietnam, as getting students
to write is the best way to test their writing ability (Hughes, 2003). One of the
most significant challenges in assessing writing is the subjectivity of judgements
and the difficulty of ensuring that these judgements are consistent. Unfair decisions
may affect individuals’ lives (Hughes, 1989). For this reason, the research was
carried out in order to explore how teacher raters make their scoring judgements, to
develop collaboratively a set of criteria in a checklist through which teachers’
assumptions about ‘good’ writing were revealed, and to gather teachers’ perspectives
on the usefulness of the process for reliable and valid scoring.
Five Vietnamese teachers, who are pursuing their Master’s degrees in Melbourne,
Australia, and who teach in universities with English-major courses, were
involved in this study. A criteria checklist was first developed by two experienced
teachers. A workshop was then held in which the five teachers applied
the checklist to score a sample essay. Changes were made after the discussion of
the marking and a new criteria checklist was established. The agreed-upon criteria
checklist was used to rate four essays. The participants were then asked to provide
their reflections in writing on the usefulness of this process of training for their
work as raters.
The findings showed that inconsistency among raters in scores existed even when
there was a shared criteria checklist. There was a change in consistency across
raters (inter-rater reliability) after the discussion of the marking of the sample
essay. Nonetheless, the question of how much raters scale up or down in their
grading is still challenging. Raters’ cultural perspectives (the norm, Western style
or Oriental style, that raters favour) and the rating scheme (holistic or
analytical scoring) also influence teachers’ judgements. All of these would be
improved through on-going rater training and moderation and the development of
a more detailed criteria checklist. Also, the characteristics of a ‘good’ argument
essay (the writing genre that was assessed in this study) and the usefulness of the
discussion workshop were presented through the teachers’ perspectives. Finally,
in response to the findings, a holistic criteria checklist was developed with the
Vietnamese ten-point scale and level descriptors.
ACKNOWLEDGEMENTS
I would like to express my deepest gratitude to my supervisor, Mrs. Rosemary Viete,
for her whole-hearted assistance during the development of this research thesis,
without which it could not have been completed.
I am especially indebted to my family and my husband for their constant encouragement
during the course of my study.
I am also very grateful to the teacher participants, without whom this thesis
would have been impossible.
I sincerely express my great thanks to Mr. Le Thanh Dzung, Dean of the English
Department, Hanoi University of Foreign Studies, for his kind assistance.
My special thanks also go to Dr. Sophie Arkoudis, Department of Language, Literacy
& Arts Education, University of Melbourne, for her valuable help.
Finally, I would also like to convey my gratitude to my colleagues who have helped
me in various ways.
Declaration
This research thesis contains no material which has been accepted for the award of
any other degree or diploma in any university or tertiary institution, and to the best of
my knowledge and belief, neither does it contain material previously published or
written by another person, except where due acknowledgement is made in the text.
Signed
Full name: Nguyen Tra My
The plan for this research was approved by the Standing Committee on Ethics in
Research Involving Humans on 6 August, 2003 (Reference 2003/524).
List of Tables
Table 2.1. A sample holistic scale
Table 2.2. A sample analytical scale
Table 4.1. The calculation of students’ final results
Table 4.2. Classifying system
Table 4.3. Criteria checklist set up by two experienced teacher participants
Table 4.4. Scores on essay 4
Table 4.5. The newly amended criteria checklist
Table 4.6. The scores
Table 4.7. Summary of agreements and disagreements in the discussion
Table 4.8. Scores on essay 3
Table 4.9. Scores on essay 1
Table 4.10. A holistic marking scheme
CHAPTER ONE: INTRODUCTION
1.1 Background of the research
The major language of international communication for the Socialist Republic of
Vietnam was Russian from 1954 until the recent political changes in Eastern and
Central Europe. For the South of Vietnam, French was the first foreign language (this
area was under French occupation) till 1954, and then English (due to the US
involvement in the Vietnam War) until the reunification of the country in 1975. After
reunification, Russian was the first national foreign language for a number of years,
and little attention was paid to the teaching of either English or French (Do, 1999;
Nguyen and Crabbe, 1999). In the context of political renovation and the open-door
policy pursued by the Vietnamese government in the past decade, English has become
the first foreign language. In recent years, Vietnam has extended its political,
diplomatic and economic relationships with other countries and, consequently, has
witnessed an explosion in the demand for English (Brogan and Nguyen, 1999). With
the move to a market economy by the Vietnamese government and the growth of
international business, as well as an increasing number of foreign tourists, knowledge
of English has become the passport to a better-paid job not only in the tourism and
hospitality industries, but also in many other enterprises (Nguyen and Crabbe, 1999).
The spread of English as a global means of communication has had much impact on
English language teaching and learning in Vietnam. Language testing, which often
goes hand in hand with language teaching and learning, is of high importance. It works
as a motivation for teaching and learning processes, measures learners’ levels and has an
influence on the curriculum, since designers might revise program goals and
objectives in the on-going development of the curriculum (Brown, 1995). A number
of studies have been conducted in the field of testing in general and in testing writing
in particular (Freedman, 1979; Hamp-Lyons, 1991; Vaughan, 1991; O’Loughlin,
1992; O’Hagan, 1999; Lumley, 2002; Weigle, 2002). Writing assessment in the
context of Vietnam, however, seems to be largely unexplored. This fact has prompted
me to conduct research on the assessment of writing, a relatively subjective
form of assessment, in Vietnamese universities.
Chapter One
1
Because of their subjectivity, direct tests were discouraged and avoided in the past,
when reliability dominated language testing. This was true in the world in the 1950s and
1960s (McNamara, 2000: 38) and still is in some Vietnamese universities, as I have
observed. Grammatical structures and knowledge of vocabulary were assessed instead
of writing skills. However, in many universities nowadays direct tests are becoming
common. This gives rise to issues of reliability and validity. The problem of
subjectivity has increasingly been recognised to be “something that had to be faced
and managed” in direct tests (McNamara, 2000: 38). This strengthens my wish to do
research in this field.
I wish to understand which criteria Vietnamese teacher raters have employed in
assessing students’ writing, how much weight they give to each criterion and to what
extent their perspectives on assessment criteria are similar or different, which might
explain degrees of disagreement and discrepancy in the final scores for a piece of
writing among raters.
Also, my interest lies in the factors that affect teachers’ assessment judgements. I am
keen to know which norms Vietnamese teacher raters favour in their assessment,
since both Western writing style and Oriental writing style, linear and circular
respectively according to views expressed in Liddicoat (1997), might well be
observed in students’ writing. My study, in addition, involves identifying which
marking schemes (holistic or analytical marking) teachers favour and how these
reflect students’ best abilities.
Finally, teachers’ perspectives on the usefulness of discussion of assessment criteria, a
kind of moderation, are explored. The fact that we young teachers are often not given
assessment guidelines and training prompted me to offer and investigate teachers’
perceptions of this process of training-cum-moderation.
1.2 Research aims
The purpose of this research is to: 1) find out the assumptions English teachers in
Vietnamese universities share about ‘good writing’ for an argument essay; 2) have
participants collaboratively develop a set of criteria for scoring such writing; 3) identify
the basis on which teachers make scoring judgements against these criteria; and 4) find
out teachers’ perceptions of the usefulness of the process as a tool for more reliable and
valid training and scoring.
1.3 Outline of the thesis
This thesis consists of five chapters. Chapter One presents the introduction and the
research aims. Chapter Two reviews the literature on communicative language testing and
the testing of writing. Chapter Three presents the qualitative methodology used for the
research, with the focus on in-depth interviews and open-ended questionnaires.
Chapter Four deals with the discussion of the findings. The summary of the findings
and the recommendations for teachers to be better supported in assessing writing
performance are presented in Chapter Five. Following the chapters are References and
Appendices.
CHAPTER TWO: REVIEW OF THE LITERATURE
In this part, I will deal with the notion of communicative competence, which is
considered the framework for communicative language testing, the principles of
which will then be presented with three fundamental criteria: validity, reliability and
practicality. Other issues in language testing will also be looked at. Lastly, I will
mention issues in testing writing skills, which are relevant to the focus of my research.
2.1 Communicative competence
Communicative language teaching (CLT) was devised in the late 1960s to satisfy the
new demands of using English (Soler and Guzman, 2000). Communicative
competence, a principal concept of this approach, has generated a number of
discussions around its definition. Many authors have mentioned the distinction
between “competence” and “performance”. Savignon (1983: 9) argued that
“competence is what one knows. Performance is what one does”. Kempson (1977,
cited in Canale and Swain, 1980) claimed that competence is identified as the
language users’ knowledge and performance is the study of the use of that knowledge.
Canale and Swain (1983) later developed a framework of communicative competence
consisting of four aspects:
1. Grammatical competence includes those competences involved in language
use, i.e. the knowledge of such linguistic aspects as lexicology, morphology,
syntax, phonetics and phonology;
2. Sociolinguistic competence refers to control of the conventions of language
use that are determined by the features of the specific language use context;
3. Discourse competence means the mastery of how to combine grammatical
forms and meanings to achieve unity of a spoken or written text in different
genres; and
4. Strategic competence is defined as the mastery of verbal and non-verbal
communication strategies used to compensate for breakdowns in
communication, and to enhance the effect of utterances.
(Adapted from Savignon, 1983; Bachman, 1990; Berns, 1990 and Shaw, 1992)
Chapter Two
4
Canale and Swain (1980) also demonstrated implications for a communicative testing
programme as follows:
communicative testing must be devoted not only to what the learner knows
about the second language and about how to use it (competence) but also to
what extent the learner is able to actually demonstrate this knowledge in a
meaningful communicative situation (performance). (34)
The notion of communicative competence can be taken into account by test designers
in terms of test content and test methods. It is also useful in working out the criteria
and the marking scheme.
2.2 Principles of Communicative Language Testing
Three basic considerations in language testing that are mentioned by a number of
researchers (Hughes, 1989; Bachman, 1990; Weir, 1993; McNamara, 2000) are
validity, reliability and practicality.
2.2.1 Validity
Validity concerns whether a test measures what it is meant to measure (Weir,
1990). Weir (1990) described five sub-components of validity: construct validity,
content validity, face validity, washback validity and criterion-related validity. Other
kinds of validity have also been mentioned, such as concurrent validity and predictive
validity (Bachman, 1990; Davies, 1990), operational validity (Viete, 1992) and
consequential validity (McNamara, 2000). All of these types of validity will be
discussed below.
Bachman (1990: 255) argued that construct validity deals with “the extent to which
performance on tests is consistent with predictions that we make on the basis of a
theory of abilities”. Hughes (1989), Davies (1990) and Weir (1990) share this view
of construct validity.
Content validity, according to Anastasi (1982: 131, cited in Weir, 1990: 25), is
defined as “the systematic examination of the test content to determine whether it
covers a representative sample of the behaviour domain to be measured”. The
relevance of the test content is discussed by Bachman (1990) and
McNamara (2000). The former holds the view that content validity involves content
relevance and content coverage, as agreed by Davies (1990). The latter argues that
“judgements as to the relevance of content are often quite complex, and the validation
effort is accordingly elaborate” (51).
McNamara (2000: 133) defined face validity as “the extent to which a test meets the
expectations of those involved in its use, e.g. administrators, teachers, candidates and
test score users”. Weir (1990) argued that students would not perform at their best in
the absence of face validity. Face validity, however, must be the first to be neglected
if there exists a conflict between it and any of the other validities (Davies, 1990).
Another type of validity is washback validity, which refers to the influence of the
test “on the teaching and learning that precedes it” (Weir, 1990: 27). It is recognised
that if language teachers equip students with skills relevant to present and future
needs and the test is designed to reflect these, the relationship between the test and the
teaching that precedes it will become closer.
Criterion-related validity concerns the relationship between test scores and a
suitable criterion of performance (Bachman, 1990; Weir, 1990). Concurrent validity
(examining the correlation between test scores and another measure of performance,
usually an older established test) and predictive validity (concerning whether test
scores can predict future performance) are two types of criterion-related validity
(Bachman, 1990; Weir, 1990).
Operational validity describes “the relationship between the ‘real world’ performance
and the performance measured by the test” (Viete, 1992: 122). In other words, only by
observation of the candidate functioning in the real world and comparison of this with
performance on the test can operational validity be established (Viete, 1992).
Consequential validity concerns changes that occur as a consequence of a test’s
introduction and “...may in turn have an impact on what is being measured by the
test, in such a way that the fairness of inferences about candidates is called into
question” (McNamara, 2000: 53).
Among the different aspects of validity, construct validity is regarded as the most
important. As Cumming (1996) argued:
Rather than enumerating various types of validity..., the concept of construct
validity has been widely agreed upon as the single, fundamental principle that
subsumes various other aspects of validation, relegating their status to research
strategies or categories of empirical evidence by which construct validity might
be assessed or asserted. (5)
What is more, Gipps (1994: 61) stressed that “construct validity is needed not only to
support test interpretation, but also to justify test use”. Bachman and Palmer (1996)
added that construct validity helps to interpret scores from language assessment as
indicators of learners’ language ability. A crucial question emerged: “To what extent
can we justify these interpretations?” (Bachman and Palmer, 1996: 21). In writing
assessment, the issue of construct validity underlying concerns about reliability in
scoring has been investigated (Hamp-Lyons, 1990). Several aspects of such research
involved the decisions and criteria that raters employ to form their judgements and the
empirical validation of scales and criteria used for scoring (Hamp-Lyons, 1990). In
this research, the construct validity underpinning the basis on which teacher raters
made their judgements of students’ writing texts and established the criteria checklist
was investigated.
2.2.2 Reliability
McNamara (2000: 136) referred to reliability as “consistency of measurement of
individuals by a test”. Davies (1990: 21) put forward a similar definition:
“reliability [is] the consistency of test judgements and results”. Two
main groups of factors affecting the reliability of tests are test-related factors and
scorer-related factors (Viete, 1992). Several aspects of reliability should therefore
be taken into account.
Test-related factors consist of the testing environment (familiarity, personnel involved
in the test, timing, physical conditions), test rubric (time organization, time allocation
and instructions), the input (format, nature of language), the expected response
(format, nature of language, restrictions on response), and the relationship between
input and response (reciprocal, nonreciprocal, adaptive) (Bachman, 1990).
Scorer-related factors refer to the format and nature of the assessment criteria,
criterion-referenced scoring methods (holistic or analytical scoring), degree of
experience and training of scorers, conditions for scorers, number of scorers (multiple
scoring is preferred), sequence and number of performances scored, degree of
independence of scorers, existence of moderation procedures and anonymity of tests
(Viete, 1992). According to my observation, in the majority of Vietnamese English
major universities or departments, scoring criteria are not always available, training of
scorers appears to be absent, and raters get tired from marking too many writing tasks
within a short time due to the large number of students and limited staff and time.
Moderation procedures, which ensure that individual scorers use all criteria and
procedures consistently and assist in making final decisions about scores where major
discrepancies occur amongst scorers (Hughes, 1989 and Walker, 1990, quoted in
Viete, 1992), usually do not exist. Other scholars described scorer-related factors
as including the consistency of scoring among different raters [“inter-marker
reliability” (Bachman, 1990: 180; Weir, 1990: 32)] and the consistency of each
individual rater [“intra-marker reliability” (Bachman, 1990: 179; Weir, 1990: 32)].
In my research, inter-marker reliability was considered.
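As a purely illustrative aside, consistency among raters can be made concrete by looking at how far the scores that different raters give to the same essay spread apart. The short Python sketch below assumes a ten-point scale; the essay labels, the scores and the function name `score_spread` are invented for illustration and are not data or procedures from this study. It reports only the mean and the range of scores per essay, which is a rough indicator rather than a formal reliability coefficient.

```python
from statistics import mean

# Hypothetical scores on a ten-point scale given by five raters to two essays.
# These numbers are invented for illustration; they are not data from the study.
scores = {
    "essay_A": [6.0, 7.0, 6.5, 5.0, 7.0],
    "essay_B": [8.0, 8.0, 7.5, 8.0, 8.5],
}


def score_spread(ratings):
    """Return (mean, range) of one essay's scores across raters."""
    return mean(ratings), max(ratings) - min(ratings)


for essay, ratings in scores.items():
    average, spread = score_spread(ratings)
    # A larger range suggests weaker agreement among raters on that essay.
    print(f"{essay}: mean={average:.2f}, range={spread:.1f}")
```

A small range across raters would suggest closer agreement on that essay; a large one would flag a script worth discussing in moderation.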
Validity and reliability are interrelated because a valid test must be a reliable one and
a test which is a reliable measure of something other than what we intend to measure
(not valid) is useless (Hughes, 1989; Weir, 1990). Weir (1990: 33) argued that “it is
sometimes essential to sacrifice a degree of reliability in order to enhance validity”.
Later, he agreed with Guilford (1965: 481, cited in Weir, 1990: 33) that “if a choice
has to be made, validity ‘after all, is more important’”. A compromise between the
two, however, should be looked for depending on the purpose of the test.
A number of sources of variability in raters’ judgements have been identified in
numerous studies. Among others, these include raters’ cultural or disciplinary
background (O’Loughlin, 1992; Cumming et al., 2002), raters’ training and
moderation (Hamp-Lyons, 1991; Weir, 1993; Weigle, 1994; Alderson, 1995; Bachman
and Palmer, 1996; Lumley, 2002; Hughes, 2003) and different interpretations of
assessment criteria (Gipps, 1994). There have been increasing attempts to enhance
test reliability. Assessment criteria, for instance, have been developed to provide
raters with a basis on which to form their judgements (Cumming et al., 2002).
What is more, rater training and moderation have been carried out to help raters to
reach a degree of agreement about assessment criteria and rating scales or, in other
words, to “help bring raters to a temporary agreement on a set of common
standards” (Weigle, 2002: 72). Nevertheless, the issue of reliability will undeniably
persist, since “raters will never be in complete agreement on writing scores” (Weigle,
2002: 72) and complete elimination of inconsistencies would be an unrealistic goal, as
Bachman and Palmer (1996) pointed out.
2.2.3 Practicality
Practicality or “test efficiency” involves the “financial viability” of the test design,
administration and scoring (Weir, 1990: 34-35). It is almost impossible to maintain
high validity and reliability in a test that is not too costly and does not require a lot of
people, time and materials (Davies, 1990). Bachman (1996) argued that a given test
could not be said to be more or less practical than another since it depends on a
specific testing situation where resources required vary. Compromise is necessary to
maintain the balance among validity, reliability and practicality of the test (Bachman,
1996).
2.3 Issues in testing writing skills
2.3.1 Aspects in testing writing
Weir (1993) described three aspects that should be taken into account when
testing written production: conditions, operations and quality of output. The
literature suggests that text types, topic and time allowance are different conditions
that impact on the reliability and validity of writing tests. It is argued that having
more than one writing task (text types) to perform increases reliability and validity
because it is relatively difficult to know about the candidate’s general writing ability
through one writing task (Hughes, 1989; Hamp-Lyons, 1990-1991, cited in Weir,
1993). Test practicality, however, will be affected by the time such variety takes.
Writing topics should be relevant to students’ background knowledge to
ensure that they are able to write something on the topics (Weir, 1993). Offering a
choice of topics could affect test reliability because “too much uncontrolled variance”
will appear in the test (Weir, 1993: 135). In regard to the appropriate time allowed for the
completion of writing tasks, it is necessary to provide sufficient time for candidates to
produce texts that are long enough to be marked reliably (Weir, 1993).
Two different approaches for assessing writing ability described by Hamp-Lyons
(1991, cited in Weir, 1993) are the indirect method and the direct method. The former
deals with a discrete point framework covering grammar, vocabulary, spelling, etc.,
and these elements can be tested separately by the use of objective tests (Weir, 1993). It
would be difficult to make statements about how well candidates write from
discrete item tests. The latter refers to “more direct extended writing tasks” which
involve “the production of continuous texts” in which writers can raise their own
ideas (Weir, 1993: 133).
In communicative testing and process-oriented curricula, direct tests seem to be more
suitable, though the process in the writing examination does not usually reflect the
process of writing, including brainstorming, outlining, writing and rewriting, editing
and revising (Veit et al., 1994). The fact that in the Vietnamese educational context
there is a large number of students, limited staff and a heavily exam-oriented
curriculum explains this. Nonetheless, in the present study, the test length of 50 minutes
is expected to give students enough time to carry out these steps to produce a
250-word writing test, achieving a relative balance between practicality and validity.
2.3.2 Marking scheme
Two basic approaches to scoring, analytical scoring and holistic scoring, are
discussed by a number of authors (Hughes, 1989; Hamp-Lyons, 1991; Weir, 1993;
O’Malley and Pierce, 1996; McNamara, 2000; Weigle, 2002). In the following
subsections, the definitions of the two marking approaches and the arguments for
which approach to adopt are presented.
2.3.2.1 Analytical marking
Analytical marking is the method in which each aspect of a performance, e.g. content,
grammar, organization, etc., is rated separately and the final score is the total of these
individual ones (Weir, 1993). Other writers share similar definitions, though the
wording differs slightly, for example “analytic scales separate the features of a
composition into components that are each scored separately” (O’Malley and Pierce,
1996: 144) or “scripts are rated on several aspects of writing or criteria rather than
given a single score” (Weigle, 2002: 114).
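The arithmetic behind this definition can be sketched briefly. The Python example below is only an illustration: the component names, the marks and the optional `weights` argument are assumptions made for the sketch, not the marking scheme used in this study. With no weights, the final score is simply the total of the component scores, as in Weir’s (1993) definition.

```python
def analytic_total(component_scores, weights=None):
    """Sum separately rated components into one final score.

    `weights` is a hypothetical option; when omitted, every component
    counts equally and the result is the plain total.
    """
    if weights is None:
        weights = {name: 1 for name in component_scores}
    return sum(score * weights[name] for name, score in component_scores.items())


# Invented marks for one script; not data from the study.
script = {"content": 3, "organization": 2, "grammar": 2, "vocabulary": 1}

print(analytic_total(script))  # plain total of the four components: 8
print(analytic_total(script, {"content": 2, "organization": 1,
                              "grammar": 1, "vocabulary": 1}))  # content doubled: 11
```

The weighted call shows how the same component marks can lead to a different final score once raters or institutions decide that one component should count for more.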
Analytical scoring has a number of advantages. First, it provides more detailed
feedback regarding specific aspects of students’ writing performance and diagnostic
information for teachers in planning instruction (Perkins, 1983, cited in
O’Malley and Pierce, 1996; Bachman and Palmer, 1996; Weigle, 2002). Moreover,
the components of writing in which students have progressed most rapidly can be seen
through analytical scoring (Hamp-Lyons, 1991) and the problem of uneven development of
subskills can be revealed (Hughes, 1989). Another advantage in terms of scorers and
the scoring process is that every aspect of writing skill that might otherwise be ignored
has to be looked at, and giving a score for each component can result in more reliable
scores (Hughes, 1989). In addition, explicit concern reflected in teachers’ feedback,
particularly teachers’ praise for the positive aspects of students’ writing, makes
students feel motivated, encouraged and invited to write, as is shown by Tran (2002)
in relation to Vietnamese students of English in writing.
This kind of marking scheme also has limitations. First, it is obvious that
analytical marking takes longer than holistic schemes, an issue of practicality, since
a separate score is required for each component (Hughes, 1989; O’Malley and
Pierce, 1996; Weigle, 2002). The major problem as seen by Hughes (1989), Weir
(1993) and Weigle (2002) is whether scorers can judge each aspect independently of the
others [the failure to do so is called a “halo effect” by Hughes (1989: 103) and Weir
(1993: 163)]. In other words, “rating of one criterion might have a knock-on effect in
the rating of the next” (Weir, 1993: 164) since every component in a piece of writing is
integrated. Madsen (1983) and O’Malley and Pierce (1996) also raised another issue:
teacher raters may not agree with the weight given to each component (O’Malley and
Pierce, 1996) or may not know how to weigh each error (Madsen, 1983). It might even be
the case that experienced raters use the analytical scoring scheme but rate more
holistically to come to a single score (Weigle, 2002).
In analytic marking schemes, each aspect such as organization, vocabulary and
grammar might be equally weighted, as in Anderson’s scheme (cited in Hughes,
1989: 101-102, see Appendix 1), which consists of five scales, each divided into six
levels with score points ranging from 1 to 6, where the final score is the total of all
the equally weighted scales. A noteworthy point in this scheme is “the conjunction of
frequency of error and the effect of errors on communication” (Hughes, 1989: 103). Put
in a different way, a small number of grammatical errors of one kind can have a more
serious effect on communication than a series of errors of another kind (Hughes, 1989).
A different scheme can be seen in Jacobs et al.’s scoring profile (1981 cited in
Hughes, 1989, see Appendix 2). It is apparent from this scheme that the more
significant one aspect is, the more weight it receives (Hughes, 1989; Weigle, 2002).
Five components of writing, namely content, language use, organization, vocabulary and
mechanics, receive 30, 25, 20, 20 and 5 points respectively, reflecting their different
emphasis. The weightings can vary according to students’ levels (Hughes, 1989). The
association of each score with its descriptors helps raters to grant scores in
accordance with students’ levels (Hughes, 1989).
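The arithmetic behind the two kinds of analytic scheme described above can be illustrated with a short sketch. It is only an illustration under stated assumptions, not the published scoring procedures: the function names are hypothetical, the two unnamed scales in the equally weighted example are placeholders because the text names only three of the five aspects, and the lower bound of 0 for each weighted component is assumed for simplicity.

```python
# Maximum points per component, as reported above for Jacobs et al.'s profile.
JACOBS_MAX_POINTS = {
    "content": 30,
    "language use": 25,
    "organization": 20,
    "vocabulary": 20,
    "mechanics": 5,
}


def equal_weight_total(ratings, levels=6):
    """Sum ratings on equally weighted scales, each rated from 1 to `levels`."""
    for scale, rating in ratings.items():
        if not 1 <= rating <= levels:
            raise ValueError(f"{scale}: rating {rating} is outside 1..{levels}")
    return sum(ratings.values())


def weighted_total(scores, max_points=JACOBS_MAX_POINTS):
    """Sum component scores where each component has its own maximum.

    Assumption: a component score may range from 0 up to its maximum.
    """
    if set(scores) != set(max_points):
        raise ValueError("scores must cover exactly the defined components")
    for component, score in scores.items():
        if not 0 <= score <= max_points[component]:
            raise ValueError(f"{component}: score {score} is out of range")
    return sum(scores.values())


# Hypothetical ratings for one script (illustrative values only).
equal_example = equal_weight_total(
    {"grammar": 4, "vocabulary": 5, "organization": 3, "scale_4": 4, "scale_5": 6}
)
weighted_example = weighted_total(
    {"content": 24, "language use": 18, "organization": 15,
     "vocabulary": 14, "mechanics": 4}
)
```

In the equally weighted case every scale contributes at most 6 points, whereas in the weighted case content alone can contribute up to 30 of the 100 available points, which is how the relative emphasis of each component is expressed.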
2.3.2.2 Holistic marking
Holistic marking (referred to as global marking by Weir, 1993; Bachman and Palmer,
1996 or general impression marking by Weir, 1990; Weigle, 2002 or
“impressionistic” scoring by Hughes, 1989) refers to the rating of a performance as a
whole (McNamara, 2000). In this approach, scores are not required for each
component in the criteria.
Hamp-Lyons (1991) and Weigle (2002) shared a common view that holistic scoring
has become prevalent in writing assessment in the past 25 years. A number of positive
features explain this trend. Apparently, this approach to scoring is faster and
consequently less expensive than any other approach (Hughes, 1989; Hamp-Lyons,
1991; Weir, 1993; Weigle, 2002). It takes experienced scorers a couple of minutes
(Hughes, 1989) or even one minute or less (Hamp-Lyons, 1991) to assign a one-page
text a score. For this reason, the use of more than one rater is encouraged (Weir,
1993) “to compensate for interrater unreliability” (P. Cooper, 1984: 243 cited in
Hamp-Lyons, 1991). This notion is also shared by Hughes (1989) and Hamp-Lyons
(1991) as it is argued that scores given by multiple raters are more reliable than those
given by a single one. This is, however, true
only if the markers are equally consistent in their own marking. If this is not
the case the reliability of the more consistent marker on his own might be
better than the combined reliability estimate for two markers who exhibit
unequal consistencies. (Weir, 1993: 165)
Another advantage of this kind of scoring is that it is intended to focus the reader’s
attention on the strengths of the writing, as White (1985, cited in Hamp-Lyons, 1991)
claimed. Readers’ attention can concentrate on certain aspects of writing and therefore
can provide appropriate information in an efficient way (Weigle, 2002). What is
more, holistic scoring best reflects the authentic and personal reaction of a reader to a
text (White, 1984, cited in Weigle, 2002) and “reinforces the vision of reading and
writing as intensely individual activities involving the full self” (White, 1985: 33,
quoted in Hamp-Lyons, 1991).
Holistic marking, on the other hand, presents several weaknesses. First, a person’s
writing ability cannot be seen through the single score since diagnostic information
through scores for each component of a writing task, such as organization, content and
vocabulary, is not provided (Weir, 1993; Bachman and Palmer, 1996; Weigle, 2002).
One writer might have a good command of grammar but not be very good at organizing
ideas. Others might have abundant ideas organized in a logical way but be poor at
sentence structure (Weigle, 2002). A profile of student writers including a description
of language ability (errors) or a prescription for treatment is expected, but holistic
scoring fails to provide one (Bachman and Palmer, 1996).
A major problem of holistic scoring is the employment of multiple “hidden”
components of language ability when arriving at the final score, as Bachman and
Palmer (1996) and Weigle (2002) demonstrated. It is difficult to interpret the score
since different raters do not necessarily use the same criteria and, if they do, different
components might be weighted differently (Bachman, 1996; Weigle, 2002).
“Superficial characteristics” (Bachman and Palmer, 1996: 144), namely length,
handwriting (Markham, 1976; Sloan and McGinnis, 1982 cited in Weigle, 2002),
word choice and spelling errors (Charney, 1984, cited in Vaughan, 1991), which are