
Document: Assessment of tertiary English major students’ writing: Vietnamese teachers’ perspectives. A thesis submitted in partial fulfilment of the requirements


Description:

ASSESSMENT OF TERTIARY ENGLISH MAJOR STUDENTS’ WRITING: VIETNAMESE TEACHERS’ PERSPECTIVES

By NGUYEN TRA MY
Bachelor of Arts, Hanoi University of Foreign Studies, Hanoi, Vietnam, 1998

A thesis submitted in partial fulfilment of the requirements for the degree of Master of Education (TESOL - International), Faculty of Education, Monash University, Melbourne, Australia

December, 2003

TABLE OF CONTENTS

Abstract
Acknowledgements
Declaration
List of Tables

CHAPTER ONE: INTRODUCTION
1.1 Background of the research
1.2 Research aims
1.3 Outline of the thesis

CHAPTER TWO: REVIEW OF THE LITERATURE
2.1 Communicative competence
2.2 Principles of Communicative Language Testing
2.2.1 Validity
2.2.2 Reliability
2.2.3 Practicality
2.3 Issues in testing writing skills
2.3.1 Aspects in testing writing
2.3.2 Marking scheme
2.3.3 Difficulties in assessing writing
2.4 Summary of the chapter

CHAPTER THREE: METHODOLOGY
3.1 A qualitative approach
3.2 Selection of participants
3.3 Methods for data collection
3.4 Methods for data analysis

CHAPTER FOUR: FINDINGS AND DISCUSSION
4.1 An overview of assessment practices and writing syllabus in some Vietnamese English major universities
4.1.1 Assessment practices
4.1.2 Writing syllabus
4.2 Teachers’ perspectives on assessment criteria
4.2.1 Process of developing criteria checklist
4.2.2 Process of scoring
4.3 Characteristics of a “good” argument essay
4.3.1 Purpose
4.3.2 Thesis
4.3.3 Evidence
4.3.4 Refutation
4.3.5 Persona
4.4 Factors affecting teachers’ assessment judgements
4.4.1 The influence of raters’ culturally-based perspectives and norms
4.4.2 The influence of the marking scheme
4.5 Assessment criteria
4.6 The usefulness of the process of discussion of assessment criteria

CHAPTER FIVE: CONCLUSION AND RECOMMENDATIONS
5.1 Summary of the study
5.2 Recommendations
5.3 Limitations
5.4 Directions for future research

REFERENCES

Appendix 1: Anderson’s marking scheme
Appendix 2: Jacobs et al.’s scoring profile
Appendix 3: A comparison of two approaches
Appendix 4: Students’ writing texts
Appendix 5: Reflection questions
Appendix 6: Reflection answers
Appendix 7: English translation of interviews
Appendix 8: Vietnamese transcripts of interviews

Abstract

Direct tests seem to be becoming increasingly popular in Vietnam, as getting students to write is the best way to test their writing ability (Hughes, 2003). One of the most significant challenges in assessing writing is managing the subjectivity of judgements and ensuring that these judgements are consistent. Unfair decisions may affect individuals’ lives (Hughes, 1989). For this reason, the research was carried out in order to explore how teacher raters make their scoring judgements, to develop collaboratively a set of criteria in a checklist through which teachers’ assumptions about ‘good’ writing were revealed, and to gather teachers’ perspectives on the usefulness of the process for reliable and valid scoring.

Five Vietnamese teachers, who are pursuing their Master’s degrees in Melbourne, Australia, and who teach in universities with English-major courses, were involved in this study. A criteria checklist was first developed by two experienced teachers. A workshop was then held in which the five teachers applied the checklist to score a sample essay. Changes were made after the discussion of the marking and a new criteria checklist was established. The agreed-upon criteria checklist was used to rate four essays. The participants were then asked to provide their reflections in writing on the usefulness of this process of training for their work as raters.

The findings showed that inconsistency among raters in scores exists even when there is a shared criteria checklist. There was a change in consistency across raters (inter-rater reliability) after the discussion of the marking of the sample essay.
Nonetheless, the question of how much raters scale up or down in their grading is still challenging. Raters’ cultural perspectives (the norm, Western style or Oriental style, that raters favour) and the rating scheme (holistic or analytical scoring) also influence teachers’ judgements. All of these would be improved through on-going rater training and moderation and the development of a more detailed criteria checklist. In addition, the characteristics of a ‘good’ argument essay (the writing genre assessed in this study) and the usefulness of the discussion workshop were presented through the teachers’ perspectives. Finally, in response to the findings, a holistic criteria checklist was developed with the Vietnamese ten-point scale and level descriptors.

ACKNOWLEDGEMENTS

I would like to express my deepest gratitude to my supervisor, Mrs. Rosemary Viete, for her whole-hearted assistance during the development of this research thesis, which could not have been completed without her supervision.

I am especially indebted to my family and my husband for their constant encouragement during the course of my study.

I am also very grateful to the teacher participants, without whom this thesis would have been impossible.

I sincerely express my great thanks to Mr. Le Thanh Dzung, Dean of the English Department, Hanoi University of Foreign Studies, for his kind assistance. My special thanks also go to Dr. Sophie Arkoudis, Department of Language, Literacy & Arts Education, University of Melbourne, for her valuable help.

Finally, I would also like to convey my gratitude to my colleagues who have helped me in various ways.
Declaration

This research thesis contains no material which has been accepted for the award of any other degree or diploma in any university or tertiary institution, and to the best of my knowledge and belief, neither does it contain material previously published or written by another person, except where due acknowledgement is made in the text.

Signed
Full name: Nguyen Tra My

The plan for this research was approved by the Standing Committee on Ethics in Research Involving Humans on 6 August, 2003 (Reference 2003/524).

List of Tables

• Table 2.1. A sample holistic scale
• Table 2.2. A sample analytical scale
• Table 4.1. The calculation of students’ final results
• Table 4.2. Classifying system
• Table 4.3. Criteria checklist set up by two experienced teacher participants
• Table 4.4. Scores on essay 4
• Table 4.5. The newly amended criteria checklist
• Table 4.6. The scores
• Table 4.7. Summary of agreements and disagreements in the discussion
• Table 4.8. Scores on essay 3
• Table 4.9. Scores on essay 1
• Table 4.10. A holistic marking scheme

CHAPTER ONE: INTRODUCTION

1.1 Background of the research

The major language of international communication for the Socialist Republic of Vietnam was Russian from 1954 until the recent political changes in Eastern and Central Europe. In the South of Vietnam, French was the first foreign language until 1954 (this area was under French occupation), followed by English (owing to US involvement in the Vietnam War) until the reunification of the country in 1975. After reunification, Russian was the first national foreign language for a number of years, and little attention was paid to the teaching of either English or French (Do, 1999; Nguyen and Crabbe, 1999). In the context of political renovation and the open-door policy pursued by the Vietnamese government in the past decade, English has become the first foreign language.
In recent years, Vietnam has extended its political, diplomatic and economic relationships with other countries and has consequently witnessed an explosion in the demand for English (Brogan and Nguyen, 1999). With the Vietnamese government’s move to a market economy, the growth of international business and an increasing number of foreign tourists, knowledge of English has become the passport to a better-paid job, not only in the tourism and hospitality industries but also in many other enterprises (Nguyen and Crabbe, 1999). The spread of English as a global means of communication has had much impact on English language teaching and learning in Vietnam.

Language testing, which often goes hand in hand with language teaching and learning, is of high importance. It motivates teaching and learning, measures learners’ levels and influences the curriculum, since designers might revise program goals and objectives in the on-going development of the curriculum (Brown, 1995). A number of studies have been conducted in the field of testing in general and the testing of writing in particular (Freedman, 1979; Hamp-Lyons, 1991; Vaughan, 1991; O’Loughlin, 1992; O’Hagan, 1999; Lumley, 2002; Weigle, 2002). Writing assessment in the context of Vietnam, however, seems to be largely unexplored. This fact has prompted me to conduct research on the assessment of writing, a relatively subjective form of assessment, in Vietnamese universities.

Because of their subjectivity, direct tests were discouraged and avoided in the past, when reliability dominated language testing. This was true around the world in the 1950s and 1960s (McNamara, 2000: 38) and, as I have observed, is still true in some Vietnamese universities. Grammatical structures and knowledge of vocabulary were assessed instead of writing skills. However, in many universities nowadays direct tests are becoming common. This gives rise to issues of reliability and validity.
The problem of subjectivity has increasingly been recognised to be “something that had to be faced and managed” in direct tests (McNamara, 2000: 38). This strengthens my wish to do research in this field. I wish to understand which criteria Vietnamese teacher raters employ in assessing students’ writing, how much weight they give to each criterion and to what extent their perspectives on assessment criteria are similar or different, which might explain the degree of disagreement and discrepancy among raters in the final scores for a piece of writing. My interest also lies in the factors that affect teachers’ assessment judgements. I would like to know which norms Vietnamese teacher raters favour in their assessment, since both Western and Oriental writing styles, linear and circular respectively according to views expressed in Liddicoat (1997), might well be observed in students’ writing. My study also involves identifying which marking schemes (holistic or analytical marking) teachers favour and how well these reflect students’ best abilities. Finally, teachers’ perspectives on the usefulness of discussion of assessment criteria, a kind of moderation, are explored. The fact that we young teachers are often not given assessment guidelines and training prompted me to offer this process of training cum moderation and to investigate teachers’ perceptions of it.

1.2 Research aims

The purpose of this research is to:
1) find out the assumptions English teachers in Vietnamese universities share about ‘good writing’ for an argument essay;
2) have participants collaboratively develop a set of criteria for scoring such writing;
3) identify the basis on which teachers make scoring judgements against these criteria; and
4) find out teachers’ perceptions of the usefulness of the process as a tool for more reliable and valid training and scoring.

1.3 Outline of the thesis

This thesis consists of five chapters.
Chapter One comprises the introduction and the research aims. Chapter Two reviews the literature on communicative language testing and the testing of writing. Chapter Three presents the qualitative methodology used for the research, with a focus on in-depth interviews and open-ended questionnaires. Chapter Four deals with the discussion of the findings. The summary of the findings and the recommendations for better supporting teachers in assessing writing performance are presented in Chapter Five. Following the chapters are the References and Appendices.

CHAPTER TWO: REVIEW OF THE LITERATURE

In this chapter, I will deal with the notion of communicative competence, which is considered the framework for communicative language testing. The principles of communicative language testing will then be presented through three fundamental criteria: validity, reliability and practicality. Other issues in language testing will also be looked at. Lastly, I will address issues in testing writing skills, which are relevant to the focus of my research.

2.1 Communicative competence

Communicative language teaching (CLT) was devised in the late 1960s to satisfy the new demands of using English (Soler and Guzman, 2000). Communicative competence, a principal concept of this approach, has generated considerable discussion around its definition. Many authors have drawn a distinction between “competence” and “performance”. Savignon (1983: 9) argued that “competence is what one knows. Performance is what one does”. Kempson (1977, cited in Canale and Swain, 1980) claimed that competence is identified with the language user’s knowledge, while performance concerns the use of that knowledge.

Canale and Swain (1983) later developed a framework of communicative competence consisting of four aspects:
1. Grammatical competence includes those competences involved in language use, i.e.
the knowledge of such linguistic aspects as lexicology, morphology, syntax, phonetics and phonology;
2. Sociolinguistic competence refers to control of the conventions of language use that are determined by the features of the specific language use context;
3. Discourse competence means the mastery of how to combine grammatical forms and meanings to achieve unity of a spoken or written text in different genres; and
4. Strategic competence is defined as the mastery of verbal and non-verbal communication strategies used to compensate for breakdowns in communication and to enhance the effect of utterances.
(Adapted from Savignon, 1983; Bachman, 1990; Berns, 1990; and Shaw, 1992)

Canale and Swain (1980) also set out the implications for a communicative testing programme as follows:

“communicative testing must be devoted not only to what the learner knows about the second language and about how to use it (competence) but also to what extent the learner is able to actually demonstrate this knowledge in a meaningful communicative situation (performance)” (Canale and Swain, 1980: 34).

The notion of communicative competence can be taken into account by test designers in terms of test content and test methods. It is also useful in working out the criteria and the marking scheme.

2.2 Principles of Communicative Language Testing

Three basic considerations in language testing mentioned by a number of researchers (Hughes, 1989; Bachman, 1990; Weir, 1993; McNamara, 2000) are validity, reliability and practicality.

2.2.1 Validity

Validity concerns whether a test measures what it is meant to measure (Weir, 1990). Weir (1990) identified five sub-components of validity: construct validity, content validity, face validity, washback validity and criterion-related validity. Other kinds of validity have also been mentioned, such as concurrent validity and predictive validity (Bachman, 1990; Davies, 1990), operational validity (Viete, 1992) and consequential validity (McNamara, 2000).
All of these types of validity are discussed below.

Bachman (1990: 255) argued that construct validity deals with “the extent to which performance on tests is consistent with predictions that we make on the basis of a theory of abilities”. Hughes (1989), Davies (1990) and Weir (1990) share the same view of the notion of construct validity.

Content validity, according to Anastasi (1982: 131, cited in Weir, 1990: 25), is defined as “the systematic examination of the test content to determine whether it covers a representative sample of the behaviour domain to be measured”. The relevance of test content is discussed by Bachman (1990) and McNamara (2000). The former holds the view that content validity involves content relevance and content coverage, a view shared by Davies (1990). The latter argues that “judgements as to the relevance of content are often quite complex, and the validation effort is accordingly elaborate” (McNamara, 2000: 51).

McNamara (2000: 133) defined face validity as “the extent to which a test meets the expectations of those involved in its use, e.g. administrators, teachers, candidates and test score users”. Weir (1990) argued that students would not perform at their best in the absence of face validity. Face validity, however, must be the first to be sacrificed if it conflicts with any of the other validities (Davies, 1990).

Another type of validity is washback validity, which refers to the influence of the test “on the teaching and learning that precedes it” (Weir, 1990: 27). If language teachers equip students with skills relevant to present and future needs and the test is designed to reflect these, the relationship between the test and the teaching that precedes it will become closer.

Criterion-related validity concerns the relationship between test scores and a suitable criterion of performance (Bachman, 1990; Weir, 1990).
Concurrent validity (examining the correlation between test scores and another measure of performance, usually an older established test) and predictive validity (concerning whether test scores can predict future performance) are two types of criterion-related validity (Bachman, 1990; Weir, 1990).

Operational validity describes “the relationship between the ‘real world’ performance and the performance measured by the test” (Viete, 1992: 122). In other words, operational validity can be established only by observing the candidate functioning in the real world and comparing this with performance on the test (Viete, 1992).

Consequential validity concerns changes that occur as a consequence of a test’s introduction and that “...may in turn have an impact on what is being measured by the test, in such a way that the fairness of inferences about candidates is called into question” (McNamara, 2000: 53).

Among the different aspects of validity, construct validity is regarded as the most important. As Cumming (1996) argued:

“Rather than enumerating various types of validity..., the concept of construct validity has been widely agreed upon as the single, fundamental principle that subsumes various other aspects of validation, relegating their status to research strategies or categories of empirical evidence by which construct validity might be assessed or asserted” (Cumming, 1996: 5).

What is more, Gipps (1994: 61) stressed that “construct validity is needed not only to support test interpretation, but also to justify test use”. Bachman and Palmer (1996) added that construct validity helps to interpret scores from language assessment as indicators of learners’ language ability. A crucial question emerges: “To what extent can we justify these interpretations?” (Bachman and Palmer, 1996: 21).

In writing assessment, the issue of construct validity underlying concerns about reliability in scoring has been investigated (Hamp-Lyons, 1990).
Several aspects of such research involved the decisions and criteria that raters employ to form their judgements and the empirical validation of scales and criteria used for scoring (Hamp-Lyons, 1990). In this research, the construct validity underpinning the basis on which teacher raters made their judgements of students’ written texts and established the criteria checklist was investigated.

2.2.2 Reliability

McNamara (2000: 136) referred to reliability as “consistency of measurement of individuals by a test”. Davies (1990: 21) offered a similar definition: “reliability [is] the consistency of test judgements and results”. The two main groups of factors affecting the reliability of tests are test-related factors and scorer-related factors (Viete, 1992). Several aspects of reliability therefore need to be taken into account.

Test-related factors consist of the testing environment (familiarity, personnel involved in the test, timing, physical conditions), the test rubric (time organization, time allocation and instructions), the input (format, nature of language), the expected response (format, nature of language, restrictions on response), and the relationship between input and response (reciprocal, nonreciprocal, adaptive) (Bachman, 1990).

Scorer-related factors refer to the format and nature of the assessment criteria, criterion-referenced scoring methods (holistic or analytical scoring), the degree of experience and training of scorers, conditions for scorers, the number of scorers (multiple scoring is preferred), the sequence and number of performances scored, the degree of independence of scorers, the existence of moderation procedures and the anonymity of tests (Viete, 1992).
According to my observation, in the majority of Vietnamese English major universities or departments, scoring criteria are not always available, training of scorers appears to be absent, and raters get tired from marking too many writing tasks within a short time because of the large number of students and the limited staff and time. Moderation procedures, which ensure that individual scorers use all criteria and procedures consistently and assist in making final decisions about scores where major discrepancies occur amongst scorers (Hughes, 1989 and Walker, 1990, quoted in Viete, 1992), usually do not exist.

Other scholars have described scorer-related factors as including the consistency of scoring among different raters [“inter-marker reliability” (Bachman, 1990: 180; Weir, 1990: 32)] and the consistency of each individual rater [“intra-marker reliability” (Bachman, 1990: 179; Weir, 1990: 32)]. In my research, inter-marker reliability was considered.

Validity and reliability are interrelated, because a valid test must be a reliable one, and a test that is a reliable measure of something other than what we intend to measure (that is, not valid) is useless (Hughes, 1989; Weir, 1990). Weir (1990: 33) argued that “it is sometimes essential to sacrifice a degree of reliability in order to enhance validity”. Later, he agreed with Guilford (1965: 481, cited in Weir, 1990: 33) that “if a choice has to be made, validity ‘after all, is more important’”. A compromise between the two, however, should be sought depending on the purpose of the test.

A number of sources of variability in raters’ judgements have been identified in numerous studies. Among others, these include raters’ cultural or disciplinary background (O’Loughlin, 1992; Cumming et al., 2002), raters’ training and moderation (Hamp-Lyons, 1991; Weir, 1993; Weigle, 1994; Alderson, 1995; Bachman and Palmer, 1996; Lumley, 2002; Hughes, 2003) and different interpretations of assessment criteria (Gipps, 1994).
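
Inter-marker reliability, as described above, concerns how closely different raters’ scores for the same script agree. One simple way to make that idea concrete is sketched below in Python. It is only an illustration: the function names, the choice of score range and mean pairwise gap as indicators, and the marks themselves are assumptions introduced here, not the procedure or data of this study.

```python
from itertools import combinations
from statistics import mean


def score_range(scores):
    """Gap between the most generous and the most severe rater for one essay."""
    return max(scores) - min(scores)


def mean_pairwise_gap(scores):
    """Average absolute difference over every pair of raters for one essay."""
    return mean(abs(a - b) for a, b in combinations(scores, 2))


# Hypothetical marks from five raters on a ten-point scale
# (illustrative only, not data from this study).
before_discussion = [5, 7, 6, 8, 6]
after_discussion = [6, 7, 6, 7, 6]

for label, scores in [("before", before_discussion), ("after", after_discussion)]:
    print(label, score_range(scores), round(mean_pairwise_gap(scores), 2))
# prints:
# before 3 1.4
# after 1 0.6
```

Smaller values on either indicator suggest closer agreement among raters. Neither indicator says anything about whether the agreed score is a valid one, which is why validity has to be considered separately.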

There have been increasing attempts to enhance test reliability. Assessment criteria, for instance, have been developed to provide raters with a basis on which to form their judgements (Cumming et al., 2002). What is more, rater training and moderation have been carried out to help raters reach a degree of agreement about assessment criteria and rating scales, or in other words to “help bring raters to a temporary agreement on a set of common standards” (Weigle, 2002: 72). Nevertheless, the issue of reliability will undeniably persist, since “raters will never be in complete agreement on writing scores” (Weigle, 2002: 72) and complete elimination of inconsistencies would be an unrealistic goal, as Bachman and Palmer (1996) demonstrated.

2.2.3 Practicality

Practicality or “test efficiency” involves the “financial viability” of the test design, administration and scoring (Weir, 1990: 34-35). It is almost impossible to maintain high validity and reliability in a test that is not too costly and does not require a lot of people, time and materials (Davies, 1990). Bachman (1996) argued that a given test cannot be said to be more or less practical than another, since practicality depends on the specific testing situation, where the resources required vary. Compromise is necessary to maintain the balance among the validity, reliability and practicality of the test (Bachman, 1996).

2.3 Issues in testing writing skills

2.3.1 Aspects in testing writing

Weir (1993) identified three aspects that should be taken into account when testing written production: conditions, operations and quality of output. The literature suggests that text types, topic and time allowance are conditions that affect the reliability and validity of writing tests.
It is argued that having more than one writing task (text type) to perform increases reliability and validity, because it is relatively difficult to gauge a candidate’s general writing ability through one writing task (Hughes, 1989; Hamp-Lyons, 1990-1991, cited in Weir, 1993). Test practicality, however, will be affected by the time such variety takes. Writing topics should be relevant to students’ background knowledge to ensure that they are able to write something on the topics (Weir, 1993). A choice of topics could affect test reliability because “too much uncontrolled variance” will appear in the test (Weir, 1993: 135). As for the time allowed for the completion of writing tasks, it is necessary to provide sufficient time for candidates to produce texts that are long enough to be marked reliably (Weir, 1993).

Two different approaches to assessing writing ability described by Hamp-Lyons (1991, cited in Weir, 1993) are the indirect method and the direct method. The former works within a discrete-point framework, in which elements such as grammar, vocabulary and spelling are tested separately by means of objective tests (Weir, 1993). It would be difficult to make statements about how well candidates write on the basis of discrete-item tests. The latter refers to “more direct extended writing tasks” which involve “the production of continuous texts” in which writers can raise their own ideas (Weir, 1993: 133). In communicative testing and process-oriented curricula, direct tests seem to be more suitable, though the process in a writing examination does not usually reflect the process of writing, which includes brainstorming, outlining, writing and rewriting, editing and revising (Veit et al., 1994). The large number of students, the limited staff and the heavily exam-oriented curriculum in the Vietnamese educational context explain this.
Nonetheless, in the present study, the test length of 50 minutes is expected to give students enough time to carry out these steps and produce a 250-word text, achieving a relative balance between practicality and validity.

2.3.2 Marking scheme

Two basic approaches to scoring, analytical scoring and holistic scoring, are discussed by a number of authors (Hughes, 1989; Hamp-Lyons, 1991; Weir, 1993; O’Malley and Pierce, 1996; McNamara, 2000; Weigle, 2002). In the following subsections, the definitions of the two marking approaches and the arguments for adopting each are presented.

2.3.2.1 Analytical marking

Analytical marking is the method in which each aspect of a performance, e.g. content, grammar or organization, is rated separately and the final score is the total of these individual ratings (Weir, 1993). Other writers offer similar definitions with slightly different wording, such as “analytic scales separate the features of a composition into components that are each scored separately” (O’Malley and Pierce, 1996: 144) or “scripts are rated on several aspects of writing or criteria rather than given a single score” (Weigle, 2002: 114).

Analytical scoring has a number of advantages. First, it provides more detailed feedback on specific aspects of students’ writing performance, as well as diagnostic information that helps teachers in planning instruction (Perkins, 1983, cited in O’Malley and Pierce, 1996; Bachman and Palmer, 1996; Weigle, 2002). Moreover, analytical scoring shows which components of writing students have progressed in most rapidly (Hamp-Lyons, 1991) and can reveal the problem of uneven development of subskills (Hughes, 1989). Another advantage, in terms of scorers and the scoring process, is that aspects of writing skill that might otherwise be ignored have to be looked at, and the larger number of scores given, one for each component, can result in more reliable scoring (Hughes, 1989).
In addition, the explicit concern reflected in teachers’ feedback, particularly teachers’ praise for the positive aspects of students’ writing, makes students feel motivated, encouraged and invited to write, as shown by Tran (2002) in relation to Vietnamese students of English.

This kind of marking scheme also has limitations. First, analytical marking clearly takes longer than holistic marking, an issue of practicality, since a separate score is required for each component (Hughes, 1989; O’Malley and Pierce, 1996; Weigle, 2002). The major problem, as seen by Hughes (1989), Weir (1993) and Weigle (2002), is whether scorers really judge each aspect separately from the others [the failure to do so is called a “halo effect” by Hughes (1989: 103) and Weir (1993: 163)]. In other words, “rating of one criterion might have a knock-on effect in the rating of the next” (Weir, 1993: 164), since all the components of a piece of writing are integrated. Madsen (1983) and O’Malley and Pierce (1996) raised another issue: teacher raters may not agree with the weight given to each component (O’Malley and Pierce, 1996) or may not know how to weigh each error (Madsen, 1983). It might even be the case that experienced raters use the analytical scoring scheme but rate more holistically to arrive at a single score (Weigle, 2002).

In analytic marking schemes, each aspect, such as organization, vocabulary and grammar, might be equally weighted, as in Anderson’s scheme (cited in Hughes, 1989: 101-102, see Appendix 1), which consists of five scales, each divided into six levels with score points ranging from 1 to 6, and in which the final score is the total of all the equally weighted scales. A noteworthy point in this scheme is “the conjunction of frequency of error and the effect of errors on communication” (Hughes, 1989: 103).
Put differently, a small number of grammatical errors of one kind can have a more serious effect on communication than a large number of another kind (Hughes, 1989).

A different scheme can be seen in Jacobs et al.’s scoring profile (1981 cited in Hughes, 1989, see Appendix 2). What is apparent from this scheme is that the more significant an aspect is, the more weight it receives (Hughes, 1989; Weigle, 2002). The five components of writing, namely content, language use, organization, vocabulary and mechanics, receive 30, 25, 20, 20 and 5 points respectively, reflecting their different emphasis. The weightings can vary according to students’ levels (Hughes, 1989). The association of each score with its descriptors helps raters to award scores in accordance with students’ levels (Hughes, 1989).

2.3.2.2 Holistic marking

Holistic marking (referred to as global marking by Weir, 1993 and Bachman and Palmer, 1996, as general impression marking by Weir, 1990 and Weigle, 2002, or as “impressionistic” scoring by Hughes, 1989) refers to the rating of a performance as a whole (McNamara, 2000). In this approach, scores are not required for each component in the criteria.

Hamp-Lyons (1991) and Weigle (2002) shared the view that holistic scoring has become prevalent in writing assessment in the past 25 years. A number of positive features explain this trend. This approach to scoring is faster and consequently less expensive than any other approach (Hughes, 1989; Hamp-Lyons, 1991; Weir, 1993; Weigle, 2002). It takes experienced scorers a couple of minutes (Hughes, 1989) or even one minute or less (Hamp-Lyons, 1991) to assign a score to a one-page text. For this reason, the use of more than one rater is encouraged (Weir, 1993) “to compensate for interrater unreliability” (P. Cooper, 1984: 243 cited in Hamp-Lyons, 1991).
This notion is also shared by Hughes (1989) and Hamp-Lyons (1991), who argue that scores given by multiple raters are more reliable than those given by a single one.

“This is, however, true only if the markers are equally consistent in their own marking. If this is not the case the reliability of the more consistent marker on his own might be better than the combined reliability estimate for two markers who exhibit unequal consistencies.” (Weir, 1993: 165)

Another advantage of this kind of scoring is that it is intended to focus the reader’s attention on the strengths of the writing, as White (1985, cited in Hamp-Lyons, 1991) claimed. Readers can concentrate on certain aspects of writing and can therefore provide appropriate information in an efficient way (Weigle, 2002). What is more, holistic scoring best reflects the authentic and personal reaction of a reader to a text (White, 1984, cited in Weigle, 2002) and “reinforces the vision of reading and writing as intensely individual activities involving the full self” (White, 1985: 33, quoted in Hamp-Lyons, 1991).

Holistic marking, on the other hand, has several weaknesses. First, a person’s writing ability cannot be seen in detail through a single score, since it provides no diagnostic information in the form of scores for each component of a writing task, such as organization, content and vocabulary (Weir, 1993; Bachman and Palmer, 1996; Weigle, 2002). One writer might have a good command of grammar but not be very good at organizing ideas. Others might have abundant ideas organized in a logical way but be poor at sentence structure (Weigle, 2002). A profile of student writers including a description of language ability (errors) or a prescription for treatment is expected, but holistic scoring fails to provide one (Bachman and Palmer, 1996).
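The loss of diagnostic information described above can be illustrated with a small sketch. It assumes the component maxima listed for Jacobs et al.’s profile (30, 25, 20, 20 and 5 points) and two invented student profiles; the student scores and the function names are hypothetical and are not data from this or any other study. The sketch simply shows that two writers with different strengths can receive the same overall total, which is the only information a single score reports.

```python
# Maximum points per component, as listed for Jacobs et al.'s profile
# (1981, cited in Hughes, 1989).
MAX_POINTS = {
    "content": 30,
    "language use": 25,
    "organization": 20,
    "vocabulary": 20,
    "mechanics": 5,
}


def total_score(profile):
    """Return the sum of the component scores after validating them."""
    if set(profile) != set(MAX_POINTS):
        raise ValueError("profile must contain exactly the five components")
    for component, score in profile.items():
        if not 0 <= score <= MAX_POINTS[component]:
            raise ValueError(f"score out of range for {component}")
    return sum(profile.values())


def weakest_component(profile):
    """Return the component with the lowest share of its maximum points."""
    return min(profile, key=lambda c: profile[c] / MAX_POINTS[c])


# Hypothetical writer: strong grammar, weak organization of ideas.
student_a = {"content": 18, "language use": 23, "organization": 10,
             "vocabulary": 16, "mechanics": 4}

# Hypothetical writer: well-organized ideas, weak sentence structure.
student_b = {"content": 26, "language use": 13, "organization": 18,
             "vocabulary": 11, "mechanics": 3}

for name, profile in (("A", student_a), ("B", student_b)):
    print(name, total_score(profile), weakest_component(profile))
```

Both invented profiles add up to the same total, yet the component scores point to different areas for instruction. A holistic score is not actually computed by summing components; the sum is used here only as a stand-in for any single overall score.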
A major problem of holistic scoring is the use of multiple ‘hidden’ components of language ability in arriving at the final score, as Bachman and Palmer (1996) and Weigle (2002) demonstrated. It is difficult to interpret the score, since different raters do not necessarily use the same criteria and, even if they do, different components might be weighted differently (Bachman, 1996; Weigle, 2002). “Superficial characteristics” (Bachman and Palmer, 1996: 144), namely length, handwriting (Markham, 1976; Sloan and McGinnis, 1982 cited in Weigle, 2002), word choice and spelling errors (Charney, 1984, cited in Vaughan, 1991), which are