
The Challenge For Aspiring Leaders Of High-stakes Testing With All Year Groups, Arising From Educational Reforms

An assessment of the current state of education and the pressures imposed upon teachers and students

Date : 23/09/2017

Author Information


Uploaded by : Rock
Uploaded on : 23/09/2017
Subject : English


Educational changes are rife in the contemporary political climate. From the introduction of a Labour academy policy designed to improve “educational outcomes in deprived areas” (Machin and Vernoit, 2010, p.19) and its extension under the coalition government encouraging schools to electively convert themselves, to the Conservatives’ radical National Curriculum overhaul, removing assessment levels whilst maintaining “there needs to be more assessment” (Gibb, 2015a), reform has been constant. Aspiring educational leaders should be enthused by the resulting possibilities, but are becoming trapped in what I see as an ever-more unforgiving system of accountability.

This is an accountability which, as Acquah (2013) explains, makes “extensive use of examination data as the mechanism through which schools are held to account” (p.1). Before the instigation of a National Curriculum, teaching within Britain was said to “not provide consistent quality of education” (HoC, 2008, p.10). Subsequently, national testing provided a standardised validation of attainment, utilising performance indicators to drive up standards (ibid.). The HoC (2008) found “educators accept that accountability of schools is a necessary feature of a modern education system and that national testing has an important part to play” (p.12): testing is a permanent fixture of the curriculum, in whatever guise it manifests. Yet its true purpose, and its associated measures and targets, construct a very real and troubling challenge.

Gove (2009) explained each government’s intention as “the principal goal of academic achievement” (pp.19-20), later regarding academisation and assessment reform, or “Life without Levels”, as providing greater autonomy for schools to realise that achievement. Gibb (2015b) sought change because “the orthodoxy that governs how schools are run and how lessons are taught, has not been good enough”. Their changes create a unique challenge for contemporary aspiring leaders because they constitute a triumvirate of educational reforms which conspire to teach, per Hutchings (2017), “shallow learning for” tests (p.5) in a “narrower curriculum” (p.40).

Context and Methodology

I have been working as a Secondary English teacher in an academy converted, in 2013, under the government’s sponsorship scheme. My school became part of a Multi-Academy Trust (MAT) – which manages 60 primaries, secondaries and independent schools – because it was considered a failing school. The MAT centralised several policies, including branding, financial management, and human resources as well as establishing subject-policy teams, providing guidance and training on examination preparation, internal moderation and our assessment models, all in response to the government’s abolition of levels.

I will identify how the three reforms of academisation, lapsed-levels and accountability have engineered a new bureaucratic management model which poses a significant threat to the autonomy of aspiring educational leaders, despite claims to the contrary. These reforms will be assessed through the lens of my own experience in a MAT and the pressures I have faced to engage in what Acquah (2015) observes is “strategic behaviour” (p.11), so that the “high stakes testing [of all year groups] results in an improvement in test scores because [I have focussed my…] teaching very closely on the test” (Hutchings, 2017, p.4).

I will begin with an assumed basis for the three reforms, in education’s globalisation, following which I will discuss the challenges posed by each, before exploring their cumulative effect of creating an inappropriate environment of high-stakes testing for all students. Part of this report comprises data from an extended conversation with our subject policy-lead, explaining the rationale behind the MAT’s employment of high-stakes testing.

A Globalised Education System

Arguably, education’s globalisation and the government’s fixation on “International comparisons” (DfE, 2016, p.98) is the greatest driver of reform: the DfE (2016) considers British students “a long way behind [and…] outstripped” by other nations. The OECD (2017) established the PISA test to “evaluate education systems worldwide”, which, Crehan (2016) reports, became “an excuse for reform by governments around the world” (p.6). Currently, 72 countries take part; consecutive British governments have sought to adopt policies from the most efficient educational systems, with Gibb (2015a), presently, fixated on urgency: “fast-improving countries around the world do not use levels”.

Per PISA data, British teachers have been locked in an ineffective system. Crehan (2016) cites “17 per cent of British 15–16-year-olds […] did not attain the [PISA 2012] baseline proficiency level in reading (Level 2)” (p.10). The OECD (2011) identify this failure as highlighting that these students “lack the essential skills needed to participate effectively and productively in society” (p.11). Gove (2010) understood the ramifications, claiming “thousands of children […] leave school unable to compose a proper sentence, ignorant of basic grammar, incapable of writing a clear and accurate letter”. The old-style management of schools by Local Education Authorities and the pre-existing system of levels, for example, have both been deemed hallmarks of this ineffectiveness.

The initial challenge posed by MATs

The Conservatives (2015) pledged to “turn every failing and coasting secondary school into an academy” (p.34) wherever a school was “judged by Ofsted to be requiring improvement” (ibid.). Academisation should, as Gibb (2015b) affirms, create “innovation through autonomy”, enabling a headteacher to regain command of both administrative issues and curriculum management, thereby allowing aspiring leaders to engage with innovation on their own professional terms. Autonomy has been a prime motivator in the extension of the academy programme: its guiding principle alleges that autonomous schools make use of “freedoms more radically than any other school” (Gibb, 2016) to innovate, experiment and “improve their performance” (NFER, 2015, p.3).

The academisation reforms have sought to strip away bureaucracy, freeing schools from the political shadow of the overarching formal management model (FMM) prescribed by former governments. Bush (2011) considers the FMM as respecting the “authority legitimized by [the…] formal positions” (p.40) held by headteachers. All teachers operate within their headteacher’s strict boundaries, but before the Conservatives’ most recent reforms, that FMM existed as a descending hierarchy, reaching down from government, to local authority, to school. To sever that hierarchy’s head, encouraging autonomy and experimentation, Gibb (2017a), for example, “scrapped 20,000 pages of unnecessary central guidance”.

The role of the MAT obscures that vision, particularly for the aspiring leader. Of the two types of academisation available, sponsorship, to which my school is subject, is the least autonomous: the school’s failing status situates it within the remit of the Conservatives’ (2015) enforcement plans (p.34). The sponsored academy’s autonomy becomes subsumed by that of its MAT, the “academy sponsor who will take them over” (NSN, 2015, p.3): much of the MAT’s policy and curriculum becomes the school’s own.

In my context, our MAT emphasised cooperation: shared innovation and the development of best practices informed by research. Subject heads from schools across the MAT met periodically to discuss approaches, but they also took instruction from their subject-policy teams. These provided guidance, instruction and training in preparation for GCSE examinations. They also provided, as per Mansell (2016), a much reduced “autonomy over all aspects of policy at the school level” (p.9): not freedom from a layer of bureaucracy, then, but its replacement with bureaucracy in a different form.

The MAT creates a stretched FMM that leaves much local leadership to the headteacher, but holds greater control over, for example, curricular aspects. My contextual example, above, suggests Bush’s (2011) “’restricted’ collegiality” model (p.72) between subject heads and their policy team. This model assumes “that organizations determine policy and make decisions through a process of discussion leading to consensus” (ibid.). Taking the MAT as the organization, however, it is clear that, despite the MAT’s alleged shared perspective, issues of accountability diminish this model to a “centralized agenda” (ibid.) which, bureaucratically, “symbolizes the dominance of the formal organizational structure” (Bush, 2011, p.57).

The challenge posed by “life without levels”

Assessment levels provided what Kempa and L’Odiaga (1984) described as criterion-referenced assessment: a “measurement of students’ performances against […] criteria specifying educational attainments and ability levels” (p.56). Mapped against national, expected standards for each key stage (KS), levels alleged an understanding of student progression. Many leaders maintained this system’s unreliability, with the NFER (2013a) questioning the reliability and consistency of “judgements confirming that a pupil is working at a certain level” (p.2). The DfE (2013) finally conceded, “this system is complicated and difficult to understand”.

The government’s conversion championed mastery-led curriculums, with Gibb (2015a) certain this would “ensure deep, secure knowledge and understanding”. Gibb (2015a) correctly identified the old challenge to leaders that levels had posed, insisting they were “pushing pupils on to new material […] when they had […] serious gaps in their knowledge”. Chasms of understanding divided classrooms and, Torrance (2009) recounts, aspiring leaders found the system “narrowly focused on a small number of tests in a small number of subjects” (p.220). Shallow in knowledge and future application, the old levels failed to match the new curriculum.

Fullan (2003) recognised the limitations of prescribed levels, arguing that to reach “deeper developments we need creative energies and ownership of the reforms by the teaching force” (p.5). Gibb’s (2015a) abolition of levels symbolises this realisation: another “professional autonomy for schools”, a desire many educational leaders have held since the 1970s, which saw them constrained when “an excess of teacher autonomy had led to [an unacceptably declining…] educational quality” (Hoyle and Wallace, 2006, p.169).

Nationally, the autonomy of abolished-levels led to urgent and ad-hoc curriculum changes during the academic year. Gibb (2015a) relished “removing levels [and requiring] schools to develop their own assessment schemes”. The expectation that leaders could create or adopt a new and untested assessment system to drive every facet of this new curriculum represents a serious risk to their professional integrity. Hoyle and Wallace (2006) warn that “professionals are found wanting when the general public wants them to demonstrate command of certain knowledge” (p.182), and the public has: parents expected us to have a clearly defined picture of how we will assess, what that will look like, and what the outcomes will show.

We have largely been unable to answer their questions. Still in its infancy, life without levels remains an unquantifiable frontier: two years into its implementation, struggling teachers continue to guess at what it should look like and how it should be assessed.

The DfE (2013) had questioned every school’s ability with its ‘universal truth’ that “outstanding schools […] have an opportunity to take the lead in developing and sharing curriculum and assessment systems which meet the needs of their pupils”. However, that challenge did not acknowledge the difficult requirement that school leaders need to create a consistent flow of assessment that would not only bridge the gap between KS2 and 4, but reveal student progression trajectories.

In my context, our MAT assumed most of the challenge presented by this change. The subject policy-teams took responsibility for the development: a benefit of our FMM. Bush (2011) remarks that “managerial decisions are made [within this model] through a rational process” (p.42), and although the hierarchical organisation “represents a means of control” (ibid.), our subject-policy teams took a balanced view of assessment. Our subject-lead reported that the MAT considered “what [students] should know” at each stage of schooling. In English, they regarded the primacy of the skills they wanted students to attain. Contrasting with the government’s desire for a knowledge-favouring cognitive approach, the MAT adopted Key Performance Indicators (KPIs), numbering the skills, for example in English, such as: 1) being “able to make detailed inferences”, 4) supporting “ideas with a range of appropriate evidence”, or 13b) accurately using “embedded clauses” (Griffin, 2016, p.3).

Our subject-lead reported that KPIs were “partially informed by the National Curriculum”, but that they did not – allegedly – fall into the same hole as levels, because the “weaknesses that would flatten a student’s level, now reveal where that weakness lies”. The assessment of students against whether these KPIs are not-met, partially-met, met or exceeded, provides a clearer picture of formative feedback that helps guide leaders’ planning.

Our replacement system was prescribed through the bureaucratic pyramid, which Bush (2011) identifies as a “hierarchical authority structure”, despite its ironic implementation forcing leaders to bear its uncertainty ourselves. Whether we find that this system works or not, our bureaucratic model, disregarding the government’s autonomous expectations, prevents aspiring leaders from trialling alternatives. The innovation cannot be our own.

However, what remains is the legacy of consistency in marking, a challenge which Bew (2011) argues is “the most significant […] criticism” (p.60) levelled at examination. NFER (2013b) reports the “lowest levels of agreement [between teacher marks] tended to be found in examinations that placed most dependence on essay-type questions” (p.23), such as English, while Meadows and Billington (2005) list several potential biases at play between markers, from the influences of background and personal traits to ideologies surrounding what best reflects subject expectations (pp.30-35). That challenge remains, specifically because the FMM in play between MAT and academy places an “emphasis on the accountability of the organization to its sponsoring body” (Bush, 2011, p.42), to the MAT.

The challenge posed by accountability

The historical and political evolution of the British educational system has, since the 1980s, sought “to use private-sector and market incentives in state education” (Mattei, 2012, p.248), drawing language “from the business world” (Pring, 2012, p.748), such as market incentives: pay awards based on specified objectives. One objective, from my context, includes: “Improving the impact of teaching […] including examination outcomes” (UL, 2014, p.3). Teachers are judged and scrutinised against examination results and effectively remunerated or punished because of their students’ achievements or failures.

Acquah (2013) argues that this “market accountability is […] a prominent part of the Government’s plans for the education system” (p.5): students and parents become consumers, provided with data pertinent to informing school application decisions. This “market operates in the shadow of a hierarchy” (ibid.), however, which represents the latter of two rising standards: accountability.

The National Curriculum has held schools accountable for student progress through the “measurement of pupils’ progress at regular intervals” (HoC, 2008, p.9). Intended for the end of each KS, national testing measures student progress against assumed targets, with accountability growing from a desire for standardisation and the assertion that “parents wanted access to test result data” (ibid.).

Gove’s (2012) curriculum, with “regular, demanding, rigorous examinations”, is essential to both the market and the future control provided to MATs. With Leo, Galloway and Hearne’s (2010) acknowledgement that the government placed pressure on academies “to achieve improvement quickly” (p.35), and “to demonstrate innovation” and success (p.128), the burden to ensure success across their portfolio of academies continues to increase. Accountability, according to the HoC (2017), has revealed “significant number[s of MATs…] are failing to improve year on year” (p.26) alongside punitive measures that challenge MATs to work harder, by outlining how consistently failing academies may be transferred to a new sponsor. The challenge for aspiring leaders to ensure achievement is significant, lest the MAT lose a school and its associated funding, and a teacher’s pay rise or position be jeopardised.

The DfE’s (2016b) new Attainment 8 measures were introduced to better judge schools by matching students with the “achievements of other pupils with the same prior attainment” (p.5). Floor standards (the “minimum standard for pupil achievement” (p.8)) now calculate a school’s student-average across subjects. The average differences between GCSE grades and targets provide the ultimate accountability, where a “school may come under scrutiny through [Ofsted] inspection” (ibid.) when its average difference between grades and targets is below -0.5.
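
The floor-standard arithmetic described above can be sketched in a few lines. This is only an illustration of the calculation as this essay describes it (the average of each student's achieved grade minus their target, compared against -0.5), not the DfE's full methodology. The function names and the four-student cohort are hypothetical, invented purely for the example.

```python
from statistics import mean

# Threshold quoted in the text: scrutiny may follow when the average falls below -0.5.
FLOOR_THRESHOLD = -0.5


def average_difference(results):
    """Mean of (achieved grade - target grade) across a cohort.

    `results` is a list of (achieved, target) pairs; the pairs used
    below are made-up numbers, not real school data.
    """
    return mean(achieved - target for achieved, target in results)


def below_floor(results, threshold=FLOOR_THRESHOLD):
    """True when the cohort average sits below the floor threshold."""
    return average_difference(results) < threshold


# Hypothetical cohort: three students miss their target by one grade, one meets it.
cohort = [(5.0, 6.0), (4.0, 5.0), (6.0, 6.0), (3.0, 4.0)]
print(average_difference(cohort))  # -0.75
print(below_floor(cohort))         # True
```

Because the measure is an average, a handful of students who fall a full grade short of their targets can pull an entire cohort below the threshold, even when others meet theirs exactly.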

How do leaders ensure students achieve their targets and thereby, across the averaging of students, avoid triggering an Ofsted inspection? This challenge arises from the fact, as Smith (2016) cites, that students arriving with high KS2 scores automatically jeopardise a school because they lack the probability of achieving their targets at GCSE that a low-scoring student may more easily achieve (p.8).

The pressure has always been on teachers to ensure student achievement. Similarly, KS2 scores have always been “[in-]sufficiently reliable [indicators…] to assess and predict the achievement of individual pupils over time” (Doyle and Godfrey, 2005, p.42). Wiliam (2001) had previously considered the unreliability of national testing as a measurement of any individual pupil. Nonetheless, with Attainment 8’s accountability measures, the challenge to achieve on the targets projected from those unreliable KS2 scores has been tightened.

The cumulative challenge of these reforms: high-stakes testing

In the Victorian era, White (1886) argued that high-stakes testing “perverted the best efforts of teachers, [narrowed…] their instruction [and…] permitted a mechanical method of school supervision” (pp.199-200). Wiggins (2016) describes the modern system as “essentially the same”, with Tidd (2017) believing the “assessment edifice is crumbling [and…] rotten at its very core”.

Hutchings (2017) blames high-stakes testing for generating “pressure to ‘deliver’ […] high scores in tests […] and a management style involving target-setting and close oversight of practice” (p.10). This is increasingly crucial for a MAT in its desire to demonstrate innovative success and maintain control of its portfolio of academies; testing, therefore, is not going away, though I question its validity at KS3.

Gove and Gibb’s reforms, seeking rigour and regular examinations, originate from Dunlosky et al (2013), who assert: “testing… improves learning” (p.29). Gove (2014), anxious that “performance dips and students suffer” during KS3, favours the “Common Entrance […] exams designed for 13-year-olds”. His opinion set a precedent that has been expounded by MATs in ignorance of what Christodoulou (2016) saw as Ofsted’s and the government’s distortion of assessment for learning (AfL) (p.22).

Black and Wiliam (2001) described AfL’s feedback provision, modifying “teaching and learning activities […] to adapt the teaching work to meet [pupil] needs” (p.2). As the antithesis of the government’s accountability, it helps students rather than measuring them. AfL respects the development of student skills through the deliberate employment of strategies and methods. It is not, as pursued by my MAT, the employment of high-stakes testing for all, seeking to weigh them and their teachers.

Our subject-lead claimed, firstly, that our KS3 “KPIs were [considered] too easy”; secondly, that KPI-focussed assessments were “insufficient come the end of the academic year”. Assisted by its examination board, AQA, the MAT prepared assessments for all its KS3 and 4 students: not Common Entrance examinations, but assessments using the GCSE mark scheme. The MAT shares Meadows and Billington’s (2005) view that “tightly defined mark schemes and standardisation of examiners removes” marker inconsistency (p.30); allegedly, resultant marks will more accurately exhibit each student’s current trajectory, revealing whether they are on target, or not. Though this is an undeniable benefit to teachers, younger students are “being required to learn things for which they are not ready” (Hutchings, 2017, p.5): AQA (2016) provided a full suite of KS3 test packs, differentiating them from final GCSE examination papers only by extract reading ages; all five questions – pitched at sixteen-year-olds – are the GCSE-level questions.

As a sponsor to 60 schools, it is within the MAT’s interest to work with the examination board to prepare its students. Unfortunately, the examination board shows no interest in developing age-appropriate examinations. The bureaucratic management model that would otherwise govern internal school practices has bifurcated, opening out but simultaneously stripping away the voice of aspiring leaders: leaders work for their headteacher, but on behalf of their MAT’s subject-policy teams, who govern and maintain policies externally whilst seemingly beholden to the examination board. It is important to understand that accountability has steered the bureaucratic system in this way, as bureaucracy typically “emphasizes the goal orientation” (Bush, 2011, p.57) of the MAT, which is: assured success for all.

However, significant pressure has been applied to my department, with all teachers expected to prepare students for GCSE-level examinations. Although our subject-lead emphasised we should not “just be teaching to the test”, claiming that a mere two weeks of preparation was appropriate, every marked task undertaken by all ages was a GCSE question; two terms were given over to drilling; the year-ten language course was solely drilling. And, while our subject-lead asserted that “KS3 should be a text-based curriculum”, he implicitly opposed Ofsted’s (2012) findings which state there is “inappropriate attention at too early a stage to the skills needed for external tests and examinations” (p.15). The troubling reality he is avoiding is, as Ofsted (2012) find, “GCSE skills of analysis [are taught] at the expense of personal response” (p.16). Consequently, aspiring leaders fail their students if they, as I did, teach to the test, despite being coerced to do so by their fear of accountability: in my context, the results will judge those children and their teachers.

Gove’s promotion of high-stakes testing, despite Gibb’s (2015) avowal that testing should be “not less - but not centrally determined and not high stakes”, conflicts with the further claims of Dunlosky et al (2013), who advocate practice testing (low- and no-stakes), because it has been shown to “enhance… learning and retention” (p.29). High-stakes testing does not have anywhere near the learning benefits of low-stakes testing, as it can often involve “a dull drilling of facts” (Willingham, 2012) and, as more importantly identified by Christodoulou (2016), it is not a reliable, and therefore not a valid, way to generate “formative inference about a pupil’s understanding” (p.71).

The MAT expected teachers to use high-stakes outcomes for both rank ordering (rearranging classes from best to worst scores) and individualised feedback. Koretz (2008) explains that “one can more easily ascertain which specific skills contribute to students’ weaknesses” (p.49) if they are broken down into small pieces in a way not possible with high-stakes, because they lack sensitivity (Christodoulou, 2016, p.118): students’ cognitive function is impaired, with “superficial and shallow progress […] rewarded” (ibid.), because they “suffer stress and worry” (Bousted, cited in Davies, 2017).

Aspiring leaders must recognise that any internal testing must reveal what the students do and do not know, what they can and cannot do. To provide valuable support effectively, the application of marks, percentages or grades must be limited. Whilst neither the MAT nor the examination board has any apparent incentive for a deep curriculum, aspiring leaders must. We must teach what Christodoulou (2016) calls “domain-specific” content (p.37) to avoid the flattening of our curriculums, simply because the stakes are unbelievably high, and forgo our own fears of penalty for failure.


Day et al (2011) regard “professional values, ethics and educational ideals” (p.64) as intrinsic features of effective leaders, reflected by the “organizational values and practices of more effective and improving schools” (ibid.). Autonomy remains vital in ensuring effective leaders are the fundamental component in innovation. Autonomy in the application of KPIs in my school was not, however, possible. Autonomy in the drive to prepare all students for GCSE examination was not possible. Autonomy in the exploration of teaching and learning opportunities was not possible.

These findings remain deeply troubling: the bureaucracy of government should be giving way so that aspiring academy leaders can explore activities which “will lead to long-term learning” (Christodoulou, 2016, p.129), not have them stolen away by their MATs. Aspiring leaders, like myself, have found themselves increasingly pressured to counsel their students through examinations, regardless of age or ability, and thenceforth teach to the test. In my context that has meant all marked tasks given over to examination-style questions rather than age-appropriate tasks designed to coax student learning. We must understand that this will only “lead to superficial improvements” (ibid.) and strive instead to develop pedagogical means for deep learning in deep curriculums.

Gibb’s (2017b) desire for a “broad knowledge-rich, academic and high-status curriculum [giving…] pupils a deeper appreciation of their culture” has been endangered by the government’s imprudence, in enabling what Adams and Mansell (2014) identify as “a regulatory black hole [where…] the government lacks powers to intervene in [an academy’s…] running”. Aspiring leaders are discovering their former curriculum freedoms are diminished, their desire for depth curtailed. Wilkins (2015) declares that “there is a real and present danger of a regulatory gap” (p.187): we are no longer bound by the statutes of a compulsory National Curriculum but by that enforced by our MAT. Aspiring leaders continue, therefore, to be challenged by a top-down management structure that offers an occasional ear, but overrules with policies of short-term self-interest: policies ignorant of best practices and cognitive science.

Our subject-lead stated, “We are in a continued process of refinement”; with a system freed from the difficulties of levels, running a separate system of measurement, such as KPIs, only perpetuates complexity. It is hard to distinguish student progression, with no sense of whether the challenges are equal across subjects. Our subject-lead agreed that the right way to know if students are on track is to engage with them in lessons, but, simply put, the “test at GCSE is a different measurement”.

I conceded that running two systems was ineffective: employing a single mark scheme increases teacher marking and feedback proficiency. However, I disagreed that presenting twelve-year-olds with a GCSE paper was in any way appropriate, or that all students should have to sit the paper because, as our subject-lead argued, “some are achieving way up there”!

The government’s hasty reforms, emphasising its perpetual appreciation for continual educational improvement, have bent the British education system into a prime example of Hargreaves’ (1989) metaphor that assessment is “the tail that wags the curriculum dog” (p.12).

I feel it is neither an effective use of time or curriculum, nor an appropriate stressor for KS3: just because some are capable does not mean it should be employed against everyone. I therefore assert that these cumulative reforms in honour of autonomy have come both too fast and without adequate precautions, not least because their origination failed to acknowledge the many differences between international education systems, from culture to student aspirations. The fact, as evidenced, is that although both government and Ofsted agree on precepts of what constitutes appropriate curricula, questioning and assessment, their vision is an insufficient defence when a MAT can impose inappropriate testing measures against its students without the safeguards of that old bureaucratic system.

Wiliam (2011) demonstrates that it “takes years for even the most capable [leaders…] to develop” (p.120) a comprehensive understanding of the small steps essential to the learning process. The overarching challenge for aspiring leaders who are still developing their pedagogy and awareness of these small steps, even in the confines of an accountability-centric MAT, is to grow student resilience in the development of specific skills, to explore and build knowledge in domain-specific contexts, and to not allow examinations to overshadow the KS3 curriculum.

Hutchings’ (2017) realisation that “high stakes testing and accountability measures discourage creative teaching” (p.46) must remain foremost in our minds because, ultimately, our only challenge should remain: how do we engage our students so that they ask the questions and they make the choices and they become future champions of education?
