
A Comparative Study of Teacher Ratings of Emergent Literacy Skills and Student Performance on a Standardized Measure

Posted on: Monday, 3 October 2005, 12:01 CDT

By Beswick, J F; Willms, J Douglas; Sloat, E A

In this article, we compared outcomes derived from teacher rating scales and standardized tests to determine if there were systematic discrepancies between kindergarten teachers' ratings of literacy skills and results derived from direct assessment of the emergent literacy skills of kindergarten students. We assessed the emergent reading skills of 205 kindergarten students using the Teacher Rating Scale - Literacy, and a standardized measure, the Wechsler Individual Achievement Test - Word Reading. Although the instruments measured the same precursors to reading, teacher ratings were more negative than would be expected from students' performance on the direct measures. Regression analyses indicated that teacher ratings were most closely associated with child variables such as gender and behaviour, and family variables such as maternal education. These findings suggest that the construct validity of teacher rating scales may be compromised by the influence of extraneous variables. Findings are discussed with respect to the use of contextual measures such as teacher rating scales and checklists, the impact of extraneous variables on assessment practice, and the role of teacher education in enhancing the validity of teachers' contextual assessment of emergent literacy skills.

Introduction

Failure to meet grade-level expectations in reading is the most cited reason for retention in the early grades (Snow, Burns & Griffin, 1998). Reading difficulties are pervasive and persistent. They occur across all ethnic and socioeconomic strata, and research indicates that students who read poorly at the end of first grade are likely to be well below grade level in reading after three additional years of instruction (Juel, 1988). This deficit also tends to be cumulative since students with poor reading skills read less, have reduced access to the curriculum, learn less, and fall farther behind same-grade peers (Jackson, Paratore, Chard & Garnick, 1999). Indeed, the deleterious curricular and social consequences of late detection of reading difficulties have been well documented (Coleman & Vaughn, 2000; Jackson, Paratore, Chard, & Garnick, 1999; Vernon-Feagans, Hammer, Miccio & Manlove, 2003). However, there is now converging research evidence which attests to the mutability of reading trajectories and the effectiveness of early intervention in preventing reading failure (Agostin & Bain, 1997; Blachman, Ball, Black & Tangel, 2000; Dickson & Bursuck, 1999; Leppanen, Niemi, Aunola & Nurmi, 2004; Lyon et al., 2001; Notary-Syverson, O'Connor & Vadasy, 1998; Phillips, Morris, Osmond, & Maynard, 2002; Shaywitz, 2003; Snow, Burns & Griffin, 1997; Torgeson, 2001; Vadasy, 2000). The onus, then, is on educators to identify young students who are encountering literacy learning difficulties and provide empirically validated intervention to prevent reading failure.

The first step in preventive early intervention is assessment. In practice, early literacy assessment is largely a context-based, informal process during which teachers observe emergent literacy skills and make judgements about each child's skill and ability, developmental rate, and responsiveness to instruction. These contextual observations are often supported by the use of developmental continua and rubrics in the form of checklists or rating scales. Based on an analysis of teachers' early literacy assessment practices, Meisels and Parker (2000) determined that teachers used observation to assess 70 percent of early literacy skills, and frequently used checklists to support those observations. However, there are concerns about the psychometric adequacy and objectivity of those observational measures. Meisels and Parker (2000) found that these contextual assessment measures had limited evidence of psychometric adequacy: only 14 percent had good reliability data and fewer still had evidence of concurrent or predictive validity. As well, these measures provided limited information to guide teacher observations, record keeping, and interpretation (Meisels & Parker, 2000; Paris & Hoffman, 2004).

In addition to concern about the limitations in available psychometric data and administrative information, there is also concern that teachers' informal assessments may be influenced by extraneous factors. Research demonstrates, for instance, that teachers' values and cultural expectations affect their perceptions and, ultimately, their assessments of students (Hosp & Reschly, 2003; Shaywitz, 2003). In view of the current emphasis on teachers' contextual assessment of emergent literacy skills, we need to determine the degree to which such assessments are valid judgements about children's early literacy skill development. One approach to this issue would be to examine closely the relationship between teacher ratings of literacy skills and student performance on individually-administered standardized measures with prior evidence of construct validity. While cognizant of concerns regarding standardized measures and their use with young children (Goodwin & Goodwin, 1997; National Education Goals Panel, 1998), we feel such scrutiny would assist us in determining whether irrelevant variables affect the valid use of contextual measures and thereby mitigate the efficacy of early identification procedures (Ebel & Frisbie, 1991; Popham, 2000). In this study, we examine this issue by investigating the discrepancy between assessment information derived from teacher ratings and from formal tests, and by analyzing the variables that influence teacher judgments and their contextual assessments of kindergarten students' emergent literacy skills.

Literature Review

Numerous researchers have investigated the relative efficacy of teacher ratings and formal tests in the early identification of learning difficulties. As well, they have documented the influence of extraneous variables on both teacher referrals and their contextual assessment of students (Fletcher & Satz, 1984; Goodwin & Goodwin, 1997; Gresham & MacMillan, 1997; Hecht & Greenfield, 2001; Shaywitz, 2003; Shepard, 1994; Wolf, 1997). A review of the research literature on the predictive validity of teacher ratings and direct measures reveals mixed reviews, with some studies finding teacher ratings equivalent, or superior to, formal tests (Gresham & MacMillan, 1997; Hecht & Greenfield, 2001; Quay & Steele, 1998), and other studies determining that formal tests are better predictors of student outcomes (Flynn & Rahbar, 1998; Fletcher & Satz, 1984; Shaywitz, 2003).

Teacher rating scales are the second most frequently used measure in education, second only to teacher-made tests (Wolf, 1997). Ideally, these scales should enhance the predictive validity of teacher assessment by systematically focussing attention on salient developmental characteristics and providing contextual data based on observation of a child's interaction with the curriculum. By contrast, formal tests are direct measures that assess performance on a series of contrived tasks within a defined time period. Since group-administered tests have questionable validity with children younger than eight years of age, the preferred approach is individual assessment by trained examiners (Goodwin & Goodwin, 1997).

At issue here, however, is concern that direct measures are more expensive to administer than teacher rating scales, and are generally considered a limited representation of a child's emerging skills since they sample performance at only one point in time. Additionally, formal assessments are decontextualized, and they may not be reflective of particular curricula. As well, there are concerns about the compatibility of test requirements with the response capabilities of young children (Goodwin & Goodwin, 1997; Shepard, 1994). However, proponents of formal tests argue that valid reliable measures, when used by psychometrically literate examiners, yield meaningful information that can assist in diagnosis and instruction (Goodwin & Goodwin, 1997). Hence, test proponents suggest that well standardized measures, when administered to the intended population and interpreted as recommended, are superior to teacher ratings in that they provide evidence of construct validity, and have the added advantage of standardized administration and interpretation procedures which offer safeguards against bias. Although discussion of the relative efficacy of the two types of measures is ongoing, both standardized measures and teacher rating scales are used in early literacy assessment. Proponents of standardized tests contend that although there is no guarantee that measures will be administered and interpreted as directed, well standardized measures provide evidence of technical adequacy, and include detailed procedures for administration and scoring that enhance consistency and mitigate against examiner bias. In contrast to standardized measures, teacher rating scales and checklists are generally accompanied by limited procedural information and by little or no evidence of psychometric adequacy (Meisels & Piker, 2000; Paris & Hoffman, 2004). 
There is also concern that the validity of observational measures may be compromised by the presence of bias (Ebel & Frisbie, 1991; Popham, 2000). In view of the frequency with which checklists and rating scales are used to guide teacher observations, there is concern that teacher ratings may be affected by extraneous factors that can bias the assessment process. Such influences exacerbate measurement error, threaten construct validity, and have potentially pernicious outcomes for subgroups of students (Popham, 2000).

Extraneous influences can result in outcome differences for students that are not directly attributable to variation in performance on the construct being assessed. Since teacher expectations are based on what they deem to be appropriate, these expectations influence their perceptions of students. For example, research suggests that cultural differences such as dialectical variation, and a sauntering walking style referred to as the stroll, can affect teacher ratings and lead to underestimation of student achievement, overestimation of behavioural difficulties, and increased referrals for special services (Hosp & Reschly, 2003; Neal, McGray, Welk-Johnson, & Bridgest, 2003; Serwatka, Dove & Hodge, 1986). Such negative bias in teacher ratings has potentially deleterious consequences for students, including academic disengagement, inappropriate referrals for special services, misclassification, and educational segregation (Hosp & Reschly, 2003; Winograd, Flores-Duenas, & Arrington, 2003).

Extraneous factors can also influence teachers' assessment of students' literacy skills. In the Connecticut Longitudinal Study, for example, teachers referred four times as many boys as girls for investigation of significant reading difficulty (Shaywitz, 2003). While suspicion of reading difficulty was the espoused reason for referral, individual assessment revealed an equivalent number of boys and girls with reading difficulties. Indeed, Shaywitz (2003) has suggested that teachers referred more boys because classroom behavioural norms for young children are more in keeping with expectations for girls. Hence, teachers' ratings of boys' literacy skills were negatively biased, and their overreferral of boys reflected behavioural, not literacy concerns (Shaywitz, 2003). Referral bias has been the subject of several recent investigations (Knotek, 2003; Neal et al., 2003; Hosp & Reschly, 2003). Referral bias denotes the tendency of teachers to refer a larger number of certain subgroups of students based on factors such as gender and behavioural status, rather than on objective data related to the reason for referral. Following teacher referrals, multidisciplinary teams may engage in confirmatory bias when they are influenced by factors such as the social capital of the student's family and the team's desire to support the teacher (Knotek, 2003). Research suggests that initial referral bias is more likely to be confirmed when behavioural descriptions elicit emotive social support, when students are poor and perceived as trouble-makers, and when the teacher is influenced by culturally-conditioned behaviours (Knotek, 2003; Neal et al., 2003). However, referral bias may also disadvantage well-behaved students, including girls who, although experiencing reading difficulties, may not be referred for special services because they are assumed to be doing fine (Phillips et al., 2002).

The process of misdiagnosis, or failure to diagnose, begins with classroom assessment when teacher ratings and referrals are influenced by variables extraneous to the construct being assessed. Regardless of the quality of the rating scale or checklist, construct validity is compromised. Construct validity denotes the extent to which the results obtained on a measure or instrument can be interpreted as clearly indicating the skill or trait intended to be measured (Bachman & Palmer, 1996). For example, evidence of construct validity for a teacher rating scale of early literacy skills may be inferred based on the degree to which results agree with theoretically-related and psychometrically-sound external measures of emergent reading skills (Sattler, 1992). However, instruments are only valid to the extent that they are used for their intended purpose. When teacher ratings of emergent literacy skills are influenced by extraneous variables such as behaviour, construct validity is compromised.

The literature contains numerous recommendations to enhance the validity of teacher ratings. These include: a) using research-based, skill-specific teacher rating scales (Flynn & Rahbar, 1997); b) using teacher ratings to augment test results (Shepard, 1994; Sattler, 1992), or as the first step in a two-step identification process (Gredler, 1997; Flynn & Rahbar, 1998; Teisl, Mazzocco & Myers, 2001); and, c) encouraging teachers to 'rate by trait', a procedure designed to mitigate against bias by requiring teachers to rate all students on one skill category before proceeding to rate all students on the next category (Ebel & Frisbie, 1991).

The Present Study

There is consensus that no single measure or type of assessment constitutes best practice for all literacy assessment purposes (Paris & Hoffman, 2004). Rather, it is the informed use of a variety of measures, both contextual and direct, that ensures quality practice appropriate to the specific assessment purpose (Winograd et al., 2003). Indeed, successful K-3 teachers report that they use a variety of measures, including observations, checklists, inventories, rating scales, anecdotal evidence, and formal tests (Paris & Hoffman, 2004). However, although teachers can choose from a broad array of commercial and non-commercial assessments, limited information exists regarding the reliability and validity of many of these measures, and this evidence is most likely to be absent for non-commercial measures (Meisels & Parker, 2000). Since early identification of reading difficulties is contingent on early assessment, and the efficacy of preventive early intervention for students with reading difficulties is well accepted, the validity of early assessment measures is germane to preventive early identification (Gallagher, 1999; Gredler, 1999; LaParo & Pianta, 2000; Meisels, 1999). Teacher ratings and checklists are frequently used in early literacy assessment since they are contextual, inexpensive, and easy to use (Wolf, 1997). The validity of these measures depends on the degree to which they assess the specified constructs, and the extent to which they are used as intended. Because of the vital contribution of teachers' contextual assessment to the early identification process, and the widespread use of teacher rating scales and checklists in early assessment, we must provide evidence of the psychometric adequacy of such informal assessment measures by gathering evidence of their validity in relation to measures with prior evidence of construct validity, and identifying construct-irrelevant variables that may influence contextual assessment and diminish validity.

In this study, we compared outcomes derived from teacher ratings of emergent literacy skills with those obtained on standardized direct measures with multiple sources of validity evidence. We also examined the influence of contextual variables in order to address the following research questions:

1. Is there a discrepancy between kindergarten teachers' ratings of students' emergent literacy skills and students' performance on a standardized measure of early literacy skills with extensive prior evidence of construct validity?

2. If a discrepancy exists, is it related to characteristics of the child such as gender, characteristics of the family such as socioeconomic status, or behavioural features of the child in the school setting?

3. What are the implications of these findings for practice and policy in early literacy assessment?

Method

To address these research questions, we collaborated with teachers in nine different schools to assess the emergent literacy skills of 205 kindergarten students. We used two assessment measures, the Teacher Rating Scale - Literacy (Flynn, 1997), and the Word Reading subtest of the Wechsler Individual Achievement Test - Second Edition (Psychological Corporation, 2002). Both measures assessed similar precursors to reading, including visual discrimination of letter forms, letter naming, phonological awareness, letter-sound knowledge, and word identification. We employed several strategies to diminish bias and enhance the validity of teacher ratings. We used a research-based teacher rating scale, asked teachers to 'rate by trait' rather than by student, and arranged for substitutes so that participating teachers would have uninterrupted time to reflect upon and peruse students' literacy portfolios before assigning ratings. We then analysed the data to: a) estimate correlations among the measures; b) determine whether discrepancies existed between results derived from teacher rating scales and those derived from standardized tests; and, c) investigate the variables influencing those discrepancies. These analyses allowed us to determine the extent to which results obtained using the teacher rating scale were valid in relation to outcomes obtained on the standardized measure, the extent to which they were discrepant, and the variables that influenced the discrepancy between results obtained on the two measures.

Instruments

The instruments used to assess students' literacy development were a teacher rating scale of early literacy skills, the Teacher Rating Scale-Literacy (TRS), and a standardized reading test, the Word Reading subtest of the Wechsler Individual Achievement Test - Second Edition (WIAT-II). Additionally, in view of research findings regarding the impact of behavioural variables on teacher ratings of literacy skills (Shaywitz, 2003), teachers completed the Conners' Teacher Rating Scale (Conners), to measure behaviour.

The Teacher Rating Scale-Literacy (TRS) is a skill-specific teacher rating scale of emergent literacy skills (Flynn, 1997). Although, as with most teacher rating scales, there is limited psychometric data on the TRS, it was chosen for this study because the constructs assessed are based on current literacy research (Flynn & Rahbar, 1998), and are analogous to the items assessed on the WIAT-II Word Reading subtest. Additionally, the items are written in clear behavioural statements, and there is some evidence that the TRS enhances the predictive value of teacher ratings of students' risk status. Flynn and Rahbar (1998) reported that when using the TRS, kindergarten teachers identified 64 percent of students who would experience reading failure in grades one, two, or three. Although below the 80 percent identification rate obtained using a screening battery, the TRS resulted in a sizeable improvement over the 30 percent valid positive rate obtained when teachers were asked to rate students' risk of reading failure using a five-point scale based on descriptive criteria such as "significantly at risk".

The TRS is comprised of behavioural descriptions for these literacy precursors: knowledge of letter names; visual discrimination of letter forms; knowledge of letter-sound correspondence; phonological awareness; decoding skill; and word identification. Characteristics are rated from low to high on a scale of 1 to 10, with a behavioural statement provided for the lowest and the highest score in each of the skill categories. Scores are summed to produce a total score, with a minimum of 10 and a maximum of 100.

The Wechsler Individual Achievement Test - Second Edition (WIAT-II) is a recent revision of the original Wechsler Individual Achievement Test (WIAT). The WIAT-II is an individually administered comprehensive norm-referenced achievement battery designed to assess pre-academic and academic skills of individuals aged 4 years through adulthood. It is also designed to compare individual performance against same-age or same-grade peers (Psychological Corporation, 2002). The Word Reading subtest of the WIAT-II is appropriate for assessing emergent literacy skills of students at the kindergarten level. Items in this subtest directly assess the constructs assessed by the TRS: alphabetic knowledge; visual discrimination; phonological awareness; knowledge of letter-sound correspondence; decoding; and word identification (Smith, 2001).

The WIAT-II was standardized during the 1999-2000 and 2000-2001 school years using a stratified random sample of 5,586 individuals. Hence, it allows normative interpretation based on a contemporary sample (Smith, 2001). For five- and six-year-old students, the interscorer reliability of the WIAT-II Word Reading test is 0.99, and the accumulated data from multiple studies suggest that the WIAT-II possesses good psychometric properties and measures the constructs it was designed to measure (Smith, 2001).

The Conners' Teacher Rating Scale Revised: Short Form (Conners) is a norm-referenced teacher rating scale designed to measure internalizing behaviours such as anxiety as well as externalizing behaviours such as aggression and conduct problems. The Conners is suitable for use with children and youth aged 3 to 17 years (Conners, 2001). It is one of six Conners' Rating Scales-Revised which include both long and short versions to be completed by either teachers, parents, or adolescents. The scale contains 28 items grouped into four subscales: a) Oppositional; b) Cognitive Problems/Inattention; c) Hyperactivity; and, d) an ADHD Index. The items included in the short form of the Conners are those with the highest factor loadings on the longer version of the Conners' scales (Volpe & DuPaul, 2001). Although validation is ongoing, research to date suggests that the Conners' rating scale possesses good psychometric properties when used as directed and in accordance with the purpose for which it was designed (Volpe & DuPaul, 2001).

Participants

Students. A selection of 205 students enrolled in the first year of formal education in nine schools in one large school district in Atlantic Canada comprised the study sample. Recruitment materials were sent to the parents of all eligible children attending those schools. Permission was obtained from parents of 205 students out of a total population of 236, a compliance rate of 87%. The mean age of participants was 73.04 months (SD = 4.06); ages ranged from 64 months to 87 months. One hundred participants (48.8%) were female, and 105 (51.2%) were male. Seven students (3%) had been retained the previous year and were repeating the kindergarten program. Participants resided in small towns and rural areas served by regional elementary schools. Socioeconomic status ranged from low to middle income. Ethnicity was primarily northern European, predominantly English, Irish and Scottish.

Teachers. Twelve provincially certified teachers, with experience ranging from eight to 32 years and a mean teaching experience of 20.33 years, participated in this study. All teachers were female, were of northern European heritage, and were currently employed as full time kindergarten teachers in nine schools.

Variables

Since one purpose of this study was investigation of the contextual variables that affect kindergarten teachers' assessment of students' emergent literacy skills, student and family variables were examined.

In addition to student behaviour as measured by the Conners, other student variables used were gender, chronological age, and repeater status. Student behaviour was chosen because it is associated both with the decision to refer for investigation of reading difficulties and the decision to retain (Hallahan & Kaufman, 2003; Jimmerson & Kaufman, 2003; Shaywitz, 2003). Gender and chronological age were chosen because more boys than girls are referred for reading difficulties (Shaywitz, 2003), and more boys are retained, thereby becoming older for grade level (Jimmerson & Kaufman, 2003). Additionally, research suggests that the most cited reason for retention in the early grades is the inability to read at grade level (Snow, Burns & Griffin, 1998).

Dummy coding was used to categorize the gender variable with females assigned a value of "1" and males assigned a value of "0". Since the mean age of students in this study was 73.04 months, students with a chronological age below the mean were categorized as younger; those with a chronological age at or above the mean were categorized as older. Seven students in the study sample were repeating kindergarten, and were thus categorized as repeater, while first-year kindergarten students were classified as nonrepeater. All repeaters had a chronological age above 79 months.
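The coding rules described above can be sketched in a few lines of Python. This is only an illustration: the function name, field names, and input format are hypothetical and do not come from the study's own analysis; the 73.04-month cut-off is the sample mean reported in the text.

```python
# Hypothetical sketch of the child-variable coding described in the text.
MEAN_AGE_MONTHS = 73.04  # sample mean age reported in the article


def code_student(gender: str, age_months: float, repeating: bool) -> dict:
    """Return the coded child variables for one student.

    gender: "female" or "male" (assumed input labels)
    age_months: chronological age in months
    repeating: True if the student is repeating kindergarten
    """
    if gender not in ("female", "male"):
        raise ValueError("gender must be 'female' or 'male'")
    return {
        # Dummy coding: females = 1, males = 0
        "female": 1 if gender == "female" else 0,
        # Below the sample mean = younger; at or above the mean = older
        "age_group": "younger" if age_months < MEAN_AGE_MONTHS else "older",
        "status": "repeater" if repeating else "nonrepeater",
    }


print(code_student("female", 70.0, False))
print(code_student("male", 80.0, True))
```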

Family background variables included parents' educational level, their occupational status, and family structure. Those variables were chosen because family socioeconomic status is associated with children's outcomes on both cognitive and behavioural assessments (Willms, 2002). Additionally, research suggests that lone parent households have traditionally been headed by single mothers with lower levels of education and lower socioeconomic status (LeFebvre & Merrigan, 1998; Lipps & Yiptong-Avila, 1999).

Four classifications were used to describe parental education: a) less than high school (10 years of school); b) high school completion (12 years of schooling); c) some post-secondary education (14 years of schooling); and, d) post-secondary degree or diploma (16 years of schooling). For the purpose of data analysis, two classifications were used to describe parental occupation: working class and middle class. These were derived from six employment categories describing father's occupation, which are used in several studies to distinguish between working and middle class occupations (Goldthorpe & Hope, 1974). The six original classifications ranged from unskilled to professional; we categorized classifications one to three as working class, and classifications four to six as middle class. We designated the family structure variable as a one-parent family if the child normally resided with one adult, and as a two-parent family if the child lived in a home which normally included two adults, regardless of marital status.
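As a rough illustration of how these family variables might be recoded for analysis, the sketch below maps the four education classifications to years of schooling and collapses the six occupational categories into two classes. The dictionary keys and function name are hypothetical labels chosen for this example, not the authors' own coding scheme.

```python
# Hypothetical recoding of the family background variables described above.
EDUCATION_YEARS = {
    "less than high school": 10,
    "high school completion": 12,
    "some post-secondary": 14,
    "post-secondary degree or diploma": 16,
}


def occupation_class(category: int) -> str:
    """Collapse the six employment categories into two classes.

    Categories 1-3 are treated as working class and 4-6 as middle class,
    following the split stated in the text.
    """
    if not 1 <= category <= 6:
        raise ValueError("category must be between 1 and 6")
    return "working class" if category <= 3 else "middle class"


print(EDUCATION_YEARS["some post-secondary"])
print(occupation_class(2), "/", occupation_class(5))
```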

Data Collection

Prior to data collection, we held a meeting with school principals to explain the purpose of the study and to provide a written synopsis of the research proposal and data collection procedures. All principals discussed the information with their kindergarten teachers and then contacted the researchers to confirm that their school would participate. A preliminary meeting was then held with each teacher to discuss the teacher rating scales to be completed, and to request that teachers bring their class registers and students' literacy portfolios to the scheduled meeting with a member of the research team.

Data collection occurred during April and May of the kindergarten year, and involved two simultaneously conducted steps:

Step 1. A researcher met individually with each kindergarten teacher in a private room away from the classroom setting. The purpose, components, and response scale of the TRS were reviewed, and training was provided on completing the rating scale. Next, a semi-structured interview was conducted about each student to elicit information on child and family variables such as gender, chronological age, repeater status, parental education, and parental occupational status. The teacher was then asked to complete the TRS for all students using the "rate by trait" procedure to mitigate against bias in teachers' ratings (Ebel & Frisbie, 1991).

Training was then provided on the Conners' rating scale, and teachers completed a Conners for each student.

Step 2. For each class, while the teacher was meeting with one of the researchers to complete rating scales, two experienced educators with M.Ed. degrees and prior professional experience in assessing young children were introduced to the kindergarten class. Children were prepared for this meeting and for their substitute teacher since teachers had talked to their classes about the procedure the previous day. On meeting the children, the educators who conducted the testing explained to the children that they would meet with them individually to do some reading activities in a nearby room. They then administered the standardized measure, the WIAT-II Word Reading subtest.

Data Analysis and Results

To compare scores on the teacher rating scales with those obtained on standardized direct measures (Mean = 100, SD = 15), we converted scores from the TRS and the Conners to standard scores with a mean of 100 and a standard deviation of 15. We then estimated Pearson product-moment correlations among the three measures for the entire sample (N = 205).
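A minimal sketch of this kind of rescaling and correlation is given below. It assumes the conversion was a linear z-score transformation based on the sample mean and the population-formula standard deviation; the article does not state which formula was used, so that choice, along with the function names, is an assumption made for illustration.

```python
from math import sqrt
from statistics import fmean, pstdev


def to_standard_scores(raw, mean=100.0, sd=15.0):
    """Linearly rescale raw scores to the given mean and standard deviation.

    Assumption: z-scores are computed from the sample itself using the
    population-formula SD; the article does not specify this detail.
    """
    m = fmean(raw)
    s = pstdev(raw)
    if s == 0:
        raise ValueError("scores have zero variance")
    return [mean + sd * (x - m) / s for x in raw]


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(xs) != len(ys):
        raise ValueError("lists must have the same length")
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up ratings, for illustration only
ratings = [10, 20, 30, 40]
scaled = to_standard_scores(ratings)
print(round(fmean(scaled), 6), round(pstdev(scaled), 6))
```

Because the rescaling is linear, it changes the units of the scores but leaves correlations with other measures unchanged.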

Correlations. Table 1 shows the means and standard deviations for the measures, their relationship to the national norm of 100 and the correlations between the scores.

Table 1

Means, Standard Deviations, and Pearson Product-Moment Correlation Matrix for Direct Assessment of Reading and for Teacher Rating Scales of Literacy and Behaviour (N=205)

Mean scores for the standardized direct measure of reading and for all four of the Conners' behaviour ratings were significantly greater than the national norm of 100. There was a moderately strong positive correlation, r = 0.67, p < .01, between the rating scale, the TRS, and the standardized WIAT-II reading subtest. There was a statistically significant but low negative correlation between the standardized measure of reading and all four Conners' subscales: Oppositional r = -0.14, p < .05; Cognitive Problems-Inattention r = -0.29, p < .01; Hyperactive r = -0.18, p < .01; ADHD Index r = -0.22, p < .01. There was also a statistically significant negative relationship between the TRS and each of the four Conners' ratings: Oppositional r = -0.32, p < .01; Cognitive Problems-Inattention r = -0.63, p < .01; Hyperactive r = -0.36, p < .01; and, ADHD Index r = -0.44, p < .01. These negative relationships were expected since scores on the Conners increase with increased behaviour problems. Finally, there were moderate to strong correlations among the four Conners' subscales.

Difference Scores. To determine if there were systematic discrepancies between teacher ratings and the results of direct assessment, it was necessary to first quantify the difference between results obtained on the two analogous measures, the TRS and the WIAT-II Word Reading subtest. We thus calculated difference scores to discern the difference between the mean score on the TRS and the mean score on the WIAT-II Word Reading subtest, a direct measure of the same early literacy constructs. When this score is positive, it indicates that teacher ratings are more positive than results obtained on direct assessment; when negative, teacher ratings are more severe than would be expected based on direct assessment results. We computed difference scores by subtracting standardized raw scores on the direct assessment WIAT-II Word Reading subtest from standardized raw scores on the teacher-administered TRS. We used standardized raw scores instead of age-normed standard scores to reflect more accurately real world performance differences since age-normed standard scores would be lower for older students than for younger students with similar raw scores. Table 2 shows the relationship between child characteristics, age and repeater status, and the discrepancy between standardized raw scores on the TRS-Literacy and the WIAT-II Word Reading subtest.
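The subtraction described above can be sketched as follows. The function names and the sample values are hypothetical and serve only to show the sign convention: TRS minus WIAT-II, so a negative value means the teacher rating is lower than the direct assessment result.

```python
from statistics import fmean


def difference_scores(trs_std, wiat_std):
    """Per-student difference: standardized TRS minus standardized WIAT-II."""
    if len(trs_std) != len(wiat_std):
        raise ValueError("both lists must cover the same students")
    return [t - w for t, w in zip(trs_std, wiat_std)]


def mean_difference_by_group(diffs, labels):
    """Average the difference scores within each group label."""
    groups = {}
    for d, label in zip(diffs, labels):
        groups.setdefault(label, []).append(d)
    return {label: fmean(vals) for label, vals in groups.items()}


# Made-up standardized scores for three students (illustration only)
trs = [90.0, 100.0, 110.0]
wiat = [100.0, 100.0, 100.0]
status = ["repeater", "nonrepeater", "nonrepeater"]

diffs = difference_scores(trs, wiat)
print(diffs)
print(mean_difference_by_group(diffs, status))
```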

Table 2

Relationship between Child Age and Repeater Status and the Discrepancy (Difference Score) between Standardized Raw Scores on the TRS-Literacy and the WIAT-II Reading Subtest (N=205)

Age. There was little discrepancy between the mean scores of younger and older students on the TRS and the WIAT-II. Difference scores revealed only slight variation in scores for younger (M = -3.53, SD = 9.88) versus older students (M = -4.86, SD = 10.78).

Repeater Status. With respect to the second child characteristic, repeater status, the mean TRS and WIAT-II Word Reading scores of repeaters were lower than the mean scores of nonrepeaters on both measures. Difference scores were also larger in magnitude, that is, more negative, for students who were repeating kindergarten (M = -15.84, SD = 6.44) than for students who were nonrepeaters (M = -3.74, SD = 10.19).

Other Child and Family Characteristics. We then calculated the means and standard deviations of the TRS and WIAT-II Word Reading subtest, as well as the difference scores, for the remaining child and family background variables. Since age and repeater status had been investigated previously, repeaters were eliminated from this analysis, leaving a sample of 198 nonrepeaters. A series of one-way analyses of variance (ANOVA) was performed, each with one of the child or family background variables as the independent variable and difference scores as the dependent variable. The independent variables were: child gender, mother's education, father's education, mother's work, and father's work, and family structure.
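
For readers unfamiliar with the technique, the F statistic of a one-way ANOVA can be computed by hand as the ratio of between-group to within-group mean squares. The sketch below is illustrative only; the function name and the two small groups of difference scores are hypothetical and are not data from the study.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k = len(groups)          # number of groups
    n = len(all_scores)      # total number of observations

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical difference scores for two groups of three students each.
group_a = [-6, -4, -8]
group_b = [-1, -3, -2]
f_stat, df_b, df_w = one_way_anova([group_a, group_b])
print(f_stat, df_b, df_w)
```

With two groups and 198 observations, the same formula gives degrees of freedom of 1 and 196, which is the form in which a two-group comparison on the nonrepeater sample would be reported.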

Table 3

Relationship between Difference Scores and Child and Family Characteristics (N=198)

Table 3 shows the means and standard deviations of the TRS, the WIAT-II Word Reading subtest, and the difference scores for each category of each independent variable.

Observed F values for each independent variable are reported in the last column, with group differences that are statistically significant appearing in boldface. There were significant differences in group mean difference scores for the following independent variables: gender F(1, 196) = 9.10, p < .01; mother's education F(3, 194) = 4.70, p < .01; father's education F(3, 194) = 4.09, p < .01; and mother's work F(1, 195) = 8.63, p < .01. Table 3 also shows a more negative difference score for males than for females, a consistent reduction in negative difference scores as mother's education increases as well as a general pattern of reduction as father's education increases, and a reduction in negative difference scores when maternal work is categorized as middle class in comparison with scores when maternal work is categorized as working class. These results suggest that there is systematic discrepancy between teacher ratings and performance on the direct measure, and that, for students in the first year of kindergarten, this discrepancy is most closely related to three predictors: child gender, parental education, and mother's work.

We then performed a standard multiple regression analysis to regress the difference scores on the significant independent variables: child gender, mother's education, father's education, and mother's work. Subsequent regression analyses were conducted to estimate the variance in difference scores attributable to the four behaviour ratings on the Conners.

Table 4

Regression Coefficients (and Standard Errors) for Regression Models Predicting Difference Scores from Child Gender, Family Background Variables, and Behaviour Ratings on the Conners' Teacher Rating Scales (N=198)

Table 4 displays the unstandardized regression coefficients and standard errors for regression models predicting difference scores from child gender, and from family background variables: mother's education, father's education, and mother's work. The regression equation with the four predictors was significant. Together, the independent variables accounted for 12% of the variance in difference scores, R² = .12, F(4, 192) = 6.29, p < .001. In terms of the individual relationships between the independent variables and the difference scores, gender (t = 3.30, p < .01) and mother's education (t = 2.09, p < .05) were predictive (Model I), and 11% of the variance in difference scores was attributable to gender and mother's education, R² = .11, F(2, 194) = 12.30, p < .001 (Model II).
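
The overall F test of a regression model is tied to its R² by a standard identity: F equals the explained variance per predictor divided by the unexplained variance per residual degree of freedom. The sketch below applies that identity to the rounded values reported for the four-predictor model. The function name is hypothetical, and because the published R² is rounded to two decimals, the recomputed F only approximates the reported one.

```python
def f_from_r2(r2, n_predictors, df_residual):
    """Overall regression F statistic implied by R-squared.

    F = (R2 / k) / ((1 - R2) / df_residual)
    """
    if not 0 <= r2 < 1:
        raise ValueError("r2 must be in [0, 1)")
    explained_per_predictor = r2 / n_predictors
    unexplained_per_df = (1 - r2) / df_residual
    return explained_per_predictor / unexplained_per_df


# Rounded values taken from the text: R2 = .12 with F(4, 192).
approx_f = f_from_r2(0.12, n_predictors=4, df_residual=192)
print(round(approx_f, 2))
```

The result is roughly 6.5, in the neighbourhood of the reported F of 6.29; a gap of this size is what one would expect if the unrounded R² were slightly below .12.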

Table 4 also displays the results of subsequent regression analyses conducted to evaluate whether the Conners' behaviour ratings predicted teacher difference scores over and above gender and mother's education. Because the intercorrelations of the four Conners' subscales are in the moderate to strong range, each subscale was entered separately in combination with gender and mother's education. Regression results show that each of the Conners' subscales contributes to the prediction of difference scores, over and above gender and mother's education:

a) Oppositional subscale, R² = .20, F(3, 194) = 16.12, p < .001 (Model III);

b) Cognitive Problems-Inattention subscale, R² = .40, F(3, 194) = 42.31, p < .001 (Model IV);

c) Hyperactive subscale, R² = .20, F(3, 194) = 16.05, p < .001 (Model V);

d) ADHD Index, R² = .27, F(3, 194) = 23.60, p < .001 (Model VI).

These results suggest that, in addition to the gender of the child and education of the mother, student behaviour significantly influenced the variability in teacher ratings of early literacy skills. Outcomes on the Conners' Cognitive Problems-Inattention subscale contributed an additional 29 percent of this variance; the ADHD index contributed an additional 16 percent; and ratings on the Oppositional and Hyperactive behaviour subscales each contributed an additional 9 percent.
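
The additional percentages quoted above follow from subtracting the R² of the baseline model (gender and mother's education, Model II) from the R² of each model that adds one Conners' subscale. A short sketch of that arithmetic, using the rounded R² values reported in the text (the variable names are illustrative):

```python
# R-squared of the baseline model with gender and mother's education (Model II).
baseline_r2 = 0.11

# R-squared of each model that adds one Conners' subscale (Models III to VI).
model_r2 = {
    "Oppositional": 0.20,
    "Cognitive Problems-Inattention": 0.40,
    "Hyperactive": 0.20,
    "ADHD Index": 0.27,
}

# Incremental variance explained by each subscale over the baseline.
increments = {
    name: round(r2 - baseline_r2, 2) for name, r2 in model_r2.items()
}

for name, delta in increments.items():
    print(f"{name}: +{delta:.2f}")
```

Because each subscale was entered in a separate model, these increments are not additive across subscales.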

Discussion

Present findings reveal systematic discrepancy between kindergarten teachers' ratings of students' emergent literacy skills and students' performance on a standardized direct measure with prior evidence of construct validity. Although the measures assess the same constructs, teachers' ratings of emergent literacy skills are severe in comparison with results derived from direct assessment. The discrepancy between teacher ratings and standardized results is most closely associated with child, family, and behavioural factors. In terms of student characteristics, teachers are more negative in their ratings of literacy skills when the students are repeating kindergarten, are male, have mothers with low education, and exhibit behavioural difficulties in the classroom. When family and behavioural factors are considered, ratings become consistently more positive as maternal education increases, and as negative behaviours in the school setting decrease.

Results thus suggest that the use of teacher rating scales and checklists in early literacy assessment may have both positive and negative implications. On the positive side, there is a moderately strong positive correlation between the teacher rating scale of literacy skills, the TRS, and a psychometrically sound measure of emergent literacy with prior validity evidence, the WIAT-II Word Reading subtest. On the negative side, TRS ratings are affected by child and family characteristics despite best efforts to circumvent bias and diminish measurement error. The most influential child variables are gender and behaviour. Teacher ratings are more positive for females than for males, a finding consistent with the literature and with concern regarding referral bias since more boys than girls are referred for special services, and more boys are diagnosed with learning and behavioural difficulties, including reading difficulties (Hallahan & Kauffman, 2003; Shaywitz, 2003). The present results support earlier findings that the classroom behaviour of young boys precipitates a disproportionate number of referrals for special services (Boggiano & Barrett, 1992; Phillips et al., 2002; Shaywitz, 2003). Additionally, as suggested by Cooper and Farran (1988), it appears that behavioural and academic variables interact to influence teacher ratings such that ratings of academic proficiency are influenced by classroom behaviour.

Maternal education is also associated with teacher ratings of emergent literacy skills. Research emanating from Canada's National Longitudinal Survey of Children and Youth suggests that, traditionally, single parent mothers have had lower levels of education and lower socioeconomic status, and that this low socioeconomic status negatively affects children's outcomes on cognitive assessment (Lefebvre & Merrigan, 1998; Lipps & Yiptong-Avila, 1999). Indeed, Willms (2002) estimated that the odds of a child from a low socioeconomic status family having behavioural or cognitive difficulties are about one-third higher than for a child from a family of average income. Additionally, he found that maternal education is the variable that most directly influences outcomes for preschool and early school age children because education is closely associated with maternal occupation and socioeconomic status, two variables that directly affect the resources available to support children.

Our results suggest that teachers' knowledge of behavioural and global contextual factors seems to influence their ratings of students' literacy skills. Indeed, it may be unrealistic to expect teachers to rate students on a complex construct such as emergent literacy development as if it were a discrete attribute detached from contextual influences. Thus, although teacher rating scales are an enticing measurement option, the present findings suggest that they may be prone to systematic error which diminishes validity. Validity depends both on the accuracy with which rating scales represent well-defined constructs and on their valid use in accordance with the intended measurement purpose. Confident interpretation of results is conditional on the measure being used as directed and for the intended purpose (American Educational Research Association, 1999; Huck, 2000). When skill-specific ratings in the academic-cognitive domain, such as ratings of emergent literacy skills on the TRS, are influenced by variables in the social-behavioural domain and by global contextual factors, the instrument is used neither as directed nor as intended. Thus, even if the rating scale or checklist does adequately represent the construct being assessed, systematic error may occur because of invalid use. Hence, the present findings have implications for policy and practice with respect to the use of rating scales and checklists in early literacy assessment, the need to be aware of the variables that affect teachers' contextual assessment of students, and the role of teacher education in enhancing the valid use of assessment measures.

Implications for Policy and Practice

Teacher Rating Scales in Early Assessment

Users of measurement instruments have ultimate responsibility for their appropriate use and interpretation (American Educational Research Association, 1999). Hence, teachers of young children must be vigilant when using and interpreting all assessment measures, including teacher rating scales and checklists. Further, these measures should not be the sole criterion that guides decision making or instructional planning for children. As suggested by other researchers, teacher rating scales should be used as one of multiple assessment measures, or as the first step in a multi-tiered identification procedure (Berninger, Stage, Smith & Hildebrand, 2001; Flynn, 2001; Gredler, 2000; Meisels, 1999; Teisl et al., 2001). The use of multiple measures and multiple sources of information diminishes the likelihood that important educational decisions will be made based on inadequate or biased information. This is especially important in the inclusive setting, where, in addition to heterogeneous classes, teachers may also have several students with exceptional learning needs. The concern is that the exigencies of the classroom milieu may negatively affect the objectivity of teacher ratings, introduce rater bias, and further diminish validity. For these reasons, neither a teacher rating scale nor a standardized measure should be considered the sole criterion by which educators make decisions about crucial educational issues.

Variables That May Influence Contextual Assessment of Students

The beliefs and values of teachers impact directly on the daily lives of children as well as their future outcomes in both the academic-cognitive and social-behavioural domains (Hosp & Reschly, 2003; Knotek, 2003; Neal et al., 2003). Children whose families have higher education, higher income, and more resources enter school at a distinct advantage, and that advantage increases the longer they are in school (Alexander & Entwisle, 1999). Teacher perceptions of behavioural and academic skills influence teacher-child relationships at the outset of kindergarten, with implications for students' socialization, adjustment, and academic achievement (Pianta, Steinberg & Rollins, 1995). Indeed, at the kindergarten to grade three level, teacher beliefs have a more significant effect on variations in classroom practices than do contextual variables such as class size (Maxwell, McWilliam, Ault, & Schuster, 2001). Teacher beliefs, then, shape teacher practices (Minor, Onwuegbuzie, Witcher, & James, 2002), and influence their perceptions and ultimately their assessment of students.

The impact of teacher beliefs on the practice of early assessment is germane to the mandate of teachers' professional organizations, school boards and departments of education. These agencies must provide sustained professional development to ensure that teachers have the knowledge and skills to employ best practices in early literacy assessment (Paris & Hoffman, 2004). They must also demonstrate leadership in initiating dialogue on issues which affect the teaching-learning process, including the influence of teacher beliefs on their assessment of students. This would not be an outcomes-driven dialogue with an idealistic but unrealistic goal such as the eradication of bias. Rather, as suggested by Huet (2003), it would be a process-driven dialogue undertaken for the purposes of building respect, creating "a common language and a community of values" (p. 38), and fostering a commitment to equity, a fundamental tenet of inclusive educational practice.

Teacher Education

In addition to the role of professional organizations in fostering equity in all educational practices, teacher education programs have an integral role at the preservice level since teacher education is positively associated with developmentally appropriate practice during the early school years (Minor et al., 2002). It follows that the predictive value of teacher ratings is more likely to be enhanced by changing characteristics of the rater than by changing characteristics of the rating scale. Therefore, teacher educators must strive to ensure that teachers are knowledgeable about assessment measures and well-prepared to conduct literacy assessment (Winograd et al., 2003). They must provide high-quality preparatory programs that are firmly grounded in empirically validated best practices, and ensure that preservice teachers of early elementary students have thorough and current knowledge of child development, literacy, differentiated instruction, assessment, and the fundamentals of educational measurement. As well, teacher educators should provide opportunities for reflection and discussion so that preservice teachers are aware of their own beliefs and values, and are familiar with the customs and belief systems of varied sociocultural groups. The influence of beliefs on teacher practice and the role of beliefs as a determinant of teacher change must be addressed since research clearly shows that the attitudes and beliefs preservice teachers bring with them upon entry into teacher education programs strongly influence their teaching practices (Minor et al., 2002). Once preservice teachers are enrolled in a teacher education program, their beliefs are influenced by their practice teaching experiences, and by the provision of opportunities to reflect on these teaching experiences (Minor et al., 2002). Therefore, it is crucial that teacher educators identify issues such as gender, culture, and socioeconomic status as potential sources of bias, and encourage reflective dialogue during the course of preparatory programs (Minor et al., 2002). Support for reflective practice, beginning at the preservice level, would encourage teachers to become cognizant of their personal belief systems and of the potential impact of those beliefs on the contextual assessment of students. Although knowledge of the psychometric properties of assessment measures is vitally important to valid early literacy assessment, teacher preparation must go beyond that focus and attend as well to the social milieu in which assessment occurs, the beliefs and biases of those conducting assessment, and the consequential validity of assessment practices (Paris & Hoffman, 2004).

Summary and Directions for Future Research

In the present study, we compared results derived from teacher ratings of literacy skill development with those obtained using an individually-administered standardized measure with prior evidence of construct validity. We found a moderate correlation between teacher ratings of emergent literacy skills and results obtained on the standardized measure. These findings indicate that the teacher rating scale has practical value in guiding teachers' contextual assessment and early identification efforts. However, results also suggest that teacher ratings were affected by student gender and behaviour, as well as by demographic variables such as parental education and socioeconomic status. This finding indicates that extraneous variables may have influenced teachers' contextual assessment of students' literacy skill development. Since early literacy assessment is a precursor to preventive early intervention, it has both academic and social consequences for children. Thus, we must ensure that teachers are knowledgeable assessment practitioners who are cognizant of these variables, aware of their capacity to influence assessment results, and informed as to their potentially negative impact on the academic and social trajectories of students.

Our findings, and those of other researchers, point to the need for further study of the influence of extraneous variables on teachers' contextual assessment of emergent literacy skills.

Since the present study was cross-sectional with data collection on only two instruments, we propose longitudinal research using analogous contextual and direct measures that assess the same literacy constructs. Tracking students' literacy growth trajectories over time would provide a clearer picture of the relationship between skill-specific, demographic, and behavioural variables, and allow us to better evaluate the influence of demographic and behavioural factors on teachers' contextual assessment. Additionally, teachers in the present study taught in rural communities and small towns in Atlantic Canada, and it is likely that they were more familiar with the total context of students' lives than teachers in urban areas. For this reason, we believe that future studies should include teachers and students in varied geographical areas, and in both rural and urban settings.

Note: The authors are appreciative of support received from the Canadian Institute for Advanced Research, and of funding from the Social Sciences and Humanities Research Council for its support of the collaborative research program, Raising and Leveling the Bar in Children's Cognitive, Behavioural and Health Outcomes (Grant Number 512 - 2003 - 1016). They also wish to acknowledge the Applied Research Branch, Human Resources Development Canada, for its support of the Understanding the Early Years initiative.

References

Agostin, T. & Bain, S. (1997). Predicting early school success with developmental and social skills screeners. Psychology in the Schools, 34(3), 219-228.

Alexander, K. & Entwisle, D. (1999). Early schooling and social stratification. In R. Pianta & M. Cox (Eds.). The transition to kindergarten. Baltimore: Paul H. Brookes Publishing.

American Educational Research Association (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Berninger, V., Stage, S., Smith, D., & Hildebrand, D. (2001). Assessment for reading and writing intervention: A three-tier model for prevention and remediation. In J. Andrews, D. Saklofske & H. Janzen (Eds.). Handbook of psychoeducational assessment (pp. 198-219). San Diego: Academic Press.

Bachman, L. & Palmer, A. (1996). Language testing in practice: Designing and developing useful language tests. Oxford, UK: Oxford University Press.

Blachman, B., Ball, E., Black, R. & Tangel, D. (2000). Road to the code: a phonological awareness program for young children. Baltimore: Brookes.

Boggiano, A.K. & Barrett, M. (1992). Gender differences in depression in children as a function of motivational orientation. Sex Roles: A Journal of Research, 44, 11-17.

Coleman, J. & Vaughn, S. (2000). Reading intervention for students with emotional/behavioural disorders. Behavioural Disorders, 25, 93-104.

Conners, C. K. (1997). Conners' rating scales - revised: Technical manual. Toronto, Ontario, Canada: Multi-Health Systems.

Conners, C. K. (2001). Conners' rating scales revised: Technical manual. Toronto: Multi-Health Systems, Inc.

Cooper, D. H. & Farran, D.C. (1988). Behavioral risk factors in kindergarten. Early Childhood Research Quarterly, 3, 1-19.

Dickson, S. & Bursuck, W. (1999). Implementation of a model for preventing reading failure: a report from the field. Learning Disabilities Research and Practice, 14, 191-202.

Ebel, R. & Frisbie, D. (1991). Essentials of educational measurement (pp. 250-253). New Jersey: Prentice-Hall.

Fletcher, J. & Satz, P. (1984). Test-based versus teacher-based predictions of academic achievement: a three year longitudinal follow-up. Journal of Pediatric Psychology 9(2), 193-201.

Flynn, J. (1997). Teacher Rating Scale. LaCrosse, WI: LaCrosse Area Dyslexia Research Institute, Inc.

Flynn, J. (2001). From identification to intervention: Improving kindergarten screening for risk of reading failure. In N. Badian (Ed.). Prediction and prevention of reading failure, (pp.133-152). Timonium, MD: York Press.

Flynn, J. & Rahbar, M. (1998). Improving teacher prediction of children at risk for reading failure. Psychology in the Schools, 35(2), 163-172.

Forness, S. & Kavale, K. (2001). Reflections on the future of prevention. Preventing School Failure, 45(2), 75-81.

Goldthorpe, J. H., & Hope, K. (1974). The social grading of occupations: A new approach and scale. London: Oxford University Press.

Gallagher, J. (1999). Policy and the transition process. In R. Pianta & M. Cox (Eds.), The transition to kindergarten. Baltimore: Paul H. Brookes Publishing.

Goodwin, W. & Goodwin, L. (1997). Using standardized measures for evaluating young children's learning. In B. Spodek & O. Saracho (Eds.), Issues in early childhood educational assessment and evaluation. New York: Teachers College Press.

Gredler, G.R. (1997). Issues in early childhood screening and assessment. Psychology in the Schools, 34(2), 99-106.

Gredler, G. R. (2000). Early childhood education - assessment and intervention: what the future holds. Psychology in the Schools, 37(1), 73-79.

Gresham, F. M. & MacMillan, D. L. (1997). Teachers as tests: Differential validity of teacher judgments in identifying students at-risk for learning difficulties. School Psychology Review, 26(1), 47-61.

Hallahan, D. & Kauffman, J. (2003). Exceptional learners (9th edition). Boston: Allyn & Bacon.

Hecht, S. A. & Greenfield, D. B. (2001). Comparing the predictive validity of first grade teacher ratings and reading-related tests on third grade levels of reading skills in young children exposed to poverty. School Psychology Review, 30, 50-69.

Hosp, J. L. & Reschly, D. J. (2003). Referral rates for intervention or assessment: a meta-analysis of racial differences. Journal of Special Education, 37(2), 67-80.

Huet, S. (2003). Climbing to higher ground. Education Canada, 43(2), 37-39.

Huck, S. (2000). Reading statistics and research. New York: Addison Wesley Longman, Inc.

Janus, M. & Offord, D. (2000). Readiness to learn at school. Canadian Journal of Policy Research. 1(2), 71-76.

Jackson, J. B., Paratore, J., Chard, D., & Garnick, S. (1999). An early intervention supporting the literacy learning of children experiencing substantial difficulty. Learning Disabilities Research and Practice, 14, 254-267.

Jimmerson, S. & Kaufman, A. (2003). Reading, writing and retention: A primer on grade retention research. The Reading Teacher, 56(7), 622-635.

Juel, C. (1988). Learning to read and write: a longitudinal study of 54 children from first through fourth grades. Journal of Educational Psychology, 80, 437-444.

Knotek, S. (2003). Bias in problem solving and the social process of student study teams: A qualitative investigation. Journal of Special Education, 37(1), 2-14.

LaParo, K. & Pianta, R. (2000). Predicting children's competence in the early school years: a meta-analytic review. Review of Educational Research, 70(4), 443-484.

Lefebvre, P. & Merrigan, P. (1998). Family background, family income, maternal work and child development. Hull, Quebec: Applied Research Branch, Human Resources Development Canada.

Leppanen, U., Niemi, P., Aunola, K. & Nurmi, J. (2004). Development of reading skills among preschool and primary school pupils. Reading Research Quarterly, 39(1), 72-93.

Lipps, G. & Yiptong-Avila, J. (1999). From home to school - how Canadian children cope. Statistics Canada: Centre for Educational Statistics.

Lyon, G. R. (1996). Learning disabilities. The Future of Children: Special Education for Students with Disabilities, 6, 54-76.

Lyon, G., Fletcher, J., Shaywitz, S., Shaywitz, B., Torgesen, J., Wood, F., Schulte, A., & Olson, R. (2001). Rethinking learning disabilities. In C. Finn, A. Rotherham & C. Hokanson (Eds.). Rethinking special education for a new century (pp. 259-287). Washington DC: The Thomas B. Fordham Foundation.

Maxwell, K., McWilliam, R., Ault, M., & Schuster, J. (2001). Predictors of developmentally appropriate practices in kindergarten through third grade. Early Childhood Research Quarterly, 16, 431-452.

Meisels, S. (1999). Assessing readiness. In R. Pianta & M. Cox (Eds.), The transition to kindergarten (pp. 39-66). Baltimore: Paul H. Brookes Publishing Company.

Meisels, S. & Piker, R. (2000). An analysis of early literacy assessments used for instruction. (Technical Report No. 3-002). Ann Arbor: University of Michigan, Center for the Improvement of Early Reading Instruction.

Minor, L., Onwuegbuzie, A., Witcher, A., & James, T. (2002). Preservice teachers' educational beliefs and their perceptions of characteristics of effective teachers. Journal of Educational Research, 96(2), 116-127.

Montague, M. & Rinaldi, C. (2001). Classroom dynamics and children at risk: A follow-up. Learning Disability Quarterly, 24, 75-83.

National Education Goals Panel (1998). Principles and recommendations for early childhood assessments. Washington DC: National Education Goals Panel.

Neal, L. I., McCray, A. D., Webb-Johnson, G. & Bridgest, S. T. (2003). The effects of African-American movement styles on teachers' perceptions and reactions. Journal of Special Education, 37(1), 49-57.

Notary-Syverson, A., O'Connor, R. & Vadasy, P. (1998). Ladders to literacy: A kindergarten activity book. Baltimore: Brookes.

O'Connor, R. (1999). Teachers learning 'Ladders to Literacy'. Learning Disabilities Research and Practice, 14, 203-214.

Paris, S. & Hoffman, J. (2004). Reading assessments in kindergarten through third grade: Findings from the Center for the Improvement of Early Reading Achievement. Elementary School Journal, 105(2), 199-217.

Phillips, L., Norris, S., Osmond, W. & Maynard, A. (2002). Relative reading achievement: A longitudinal study of 187 children from first through sixth grade. Journal of Educational Psychology, 94(1), 3-13.

Pianta, R., Steinberg, M. & Rollins, K. (1995). The first two years of school: Teacher-child relationships and deflections in children's classroom adjustment. Development and Psychopathology, 7, 295-312.

Popham, W. J. (2000). Modern educational measurement: Practical guidelines for educational leaders, (pp.145-165). Boston: Allyn and Bacon.

Psychological Corporation. (2002). Wechsler individual achievement test: Second edition. San Antonio, TX: Psychological Corporation.

Quay, L.C. & Steele, D.C. (1998). Predicting children's achievement from teacher judgements: An alternative to standardized testing. Early Education and Development, 9, 207-217.

Sattler, J. (1992). Assessment of Children. San Diego: Jerome M. Sattler Publisher, Inc.

Serwatka, T. S., Dove, T., & Hodge, W. (1986). Black students in special education: Issues and implications for community involvement. Negro Education Review, 37, 17-26.

Shaywitz, S. (2003). Overcoming dyslexia. New York: Alfred A. Knopf.

Shepard, L. (1994). The challenge of assessing young children appropriately. Phi Delta Kappa, 78(3), 206-216.

Smith, D. (2001). Wechsler individual achievement test. In J. Andrews, D. Saklofske & H. Janzen (Eds.). Handbook of psychoeducational assessment (pp. 169-191). San Diego: Academic Press.

Snow, C. E., Burns, M. S. & Griffin, P. (1998). Preventing reading difficulties in young children. Washington DC: National Academy Press.

Teisl, J., Mazzocco, M., & Myers, G. (2001). The utility of kindergarten teacher ratings for predicting low academic achievement in first grade. Journal of Learning Disabilities, 34(3), 286-293.

Torgesen, J. K. (2000). Individual differences in response to early intervention in reading: The lingering problem of treatment resisters. Learning Disabilities Research and Practice, 15, 55-64.

Torgesen, J. K. (2001). Intensive remedial instruction for children with severe reading disabilities: Immediate and long term outcomes from two instructional approaches. Journal of Learning Disabilities, 34.

Vadasy, P. (2000). Sound partners: implementation manual for supervisors. Seattle, WA: Washington Research Institute.

Vernon-Feagans, L., Hammer, C., Miccio, A., & Manlove, E. (2003). Early language and literacy skills in low-income African-American and Hispanic children. In S. Neuman & D. Dickinson (Eds.). Handbook of early literacy research (pp. 192-210). New York: Guilford Press.

Volpe, R. & DuPaul, G. (2001). Assessment with brief behavior rating scales. In J. Andrews, D. Saklofske & H. Janzen (Eds.). Handbook of psychoeducational assessment (pp. 357-384). San Diego: Academic Press.

Willms, J. D. (2002). The prevalence of vulnerable children. In J. D. Willms (Ed.), Vulnerable children. Edmonton, Alberta: University of Alberta Press.

Winograd, P., Flores-Duenas, L., & Arrington, H. (2003). Best practices in literacy assessment. In L. Morrow, L. Gambrell, & M. Pressley (Eds.). Best practices in literacy instruction: Second edition. New York: Guilford Press.

Wolf, R. M. (1997). Rating scales. In J. P. Keeves (Ed.), Educational research, methodology and measurement: an international handbook, 2nd ed. (pp. 958-965). Adelaide: Pergamon.

J. F. BESWICK, J. DOUGLAS WILLMS & E. A. SLOAT

Canadian Research Institute for Social Policy

University of New Brunswick

Suite 300, Keirstead Hall

Fredericton, New Brunswick

Canada E3B 5A3

Copyright Project Innovation Fall 2005


Source: Education
