The Tarot of Lesson Observation
The new Ofsted proposals being discussed at the moment are concerning, particularly the separation of curriculum and instruction. The problems any inspectorate faces in shifting away from single-word judgements are well established in a literature going back 50 years, but it seems little has been learned. Clearly there needs to be a system of accountability, but we are in danger of going back 15-20 years, when classroom observations were essentially tarot card reading, with key judgements made on the flimsiest of evidence. Here are some key issues to consider:
Context blindness
Firstly, classroom observation scores are strongly influenced by the types of students teachers work with: those teaching higher-achieving students often receive better ratings, and this is especially true for subject specialists compared to generalists. Teachers are often assigned to classes in ways that favour this pattern, making observation scores less fair. Scores tend to reflect how well a teacher manages the classroom or creates a positive environment, but they are far less sensitive to the teaching strategies that lead to long-term gains. Observation scores also vary a lot from year to year because they depend so much on class dynamics.
Rater expertise
Evaluators often lack the subject-matter knowledge needed to make informed judgements about discipline-specific instructional practices. Even within a discipline, there is often a lack of knowledge about what is being taught: an ex-Maths teacher doesn't always have the content knowledge to truly evaluate every Maths lesson. This gap in expertise can lead to superficial evaluations that miss the covert, often invisible aspects of effective instruction.
Generic observation instruments
Generic observation frameworks, designed to be broadly applicable across subjects, actually fail to capture the distinct pedagogical practices that different disciplines require. Decoupling curriculum from instruction is a major step backward for the same reason. A very good example of this comes from Christine Counsell: senior leaders using verbs like “describe,” “explain,” and “evaluate” as some kind of indicator of effective learning processes, when these generic verbs sit at odds with, for example, the disciplinary focus on causal explanation in history. This disconnect reflects a broader clash in education between subject-specific curricula and generic aims focused on perceived utility.
Perverse incentives
Lessons that were considered 'outstanding' 10-15 years ago were often all-singing, all-dancing, Cirque du Soleil-style lessons with students running around the room and writing on posters on the wall. These lessons were rolled out for inspections, never to be seen again. We now know that engagement doesn't always mean learning and that being cognitively active doesn't have to mean being physically active. The idea that learning is an observable phenomenon has some support, such as precision teaching and the work of Ogden Lindsley, but this is a very specific methodology which is rare in most classrooms. In my experience, most of the time when observers (inspectors or leadership) have cited evidence for learning, they have cited performance, not learning.
So what is a better way of thinking seriously about lesson observations and the judgement of instructional quality?
In our book 'How Teaching Happens', Paul A. Kirschner, Jim Heal and I wanted to consider this problem of teacher effectiveness, so we focussed on David Berliner's body of work and his seminal paper 'Learning About and Learning From Expert Teachers'. In terms of judging teacher quality, let's start with what we mean by teacher expertise: if we want to identify good teaching, then we need to be explicit about what that actually means. Here, Glaser is useful:
▶ Expertise is domain specific, takes countless hours to achieve, and continues to develop throughout one’s career.
▶ Expertise development is not linear; it makes jumps and stagnates on plateaus.
▶ Experts’ knowledge is better structured than novices’.
▶ Experts’ representations of problems are deeper and richer than novices’.
▶ Experts recognise meaningful patterns faster than novices do.
▶ Experts are more flexible, more opportunistic planners, and can change representations faster than novices.
▶ While experts may start solving a problem slower than a novice, they’re – in the long run – faster problem solvers.
▶ Experts have automatised many behaviours, allowing easier and quicker processing of more complex information.
As stated above, most of these are unobservable, hard to infer, and would be very difficult to identify by observing someone teach for 30 minutes. But are good teachers born, not made? In making judgements about teacher quality, are we simply identifying talent rather than developing expertise?
In his article, Berliner addresses the role of talent which he defines as “individual differences in abilities and skills that appear to be innate or ‘hardwired’ into individuals.” He concludes that while the debate over the significance of talent may be theoretically interesting, it has little practical relevance when it comes to pedagogical expertise. For Berliner, teaching talent is better understood as: “an extremely complicated interaction of many human characteristics [including] sociability, persuasiveness, trustworthiness, nurturent [sic] style, ability to provide logical and coherent stories and explanations, ability to do more than one thing at a time, physical stamina, the chance to ‘play teacher’ with a younger sibling or playmate, and so forth” (p. 465).
Another reason the discussion of talent with respect to teaching might be moot, according to Berliner, and one which echoes the points made earlier, is that context is so powerful. Teachers are more or less productive depending on, for example, workplace conditions, the policies of principals, superintendents, and school boards, the expectations of the community, and so on. As Rich (1993) determined, expertise isn't simply a characteristic of a person, but rather an interaction of the person and the environment in which they find themselves.
In terms of practical paths forward, I think this paper by Strong et al. is an important one; among the advice it offers:
1. Subject-Specific Observation Tools:
• Tools should focus on instructional quality within specific subject areas rather than relying on generic observation frameworks.
• Emphasize how teachers connect lessons to prior knowledge, adapt instruction for diverse learners, and engage students in higher-order thinking.
2. Evidence-Based Evaluation Frameworks:
• Develop observation tools that are validated by their ability to predict student achievement.
• Use frameworks that explicitly measure instructional practices linked to learning outcomes.
3. Multiple Observation Perspectives:
• Include evaluations from multiple observers to reduce individual biases.
• Use video recordings of lessons to allow for repeated and collaborative evaluations.
4. Cognitive Bias Mitigation:
• Incorporate training programs for evaluators to address cognitive biases, such as halo effects or confirmation bias.
• Introduce structured rubrics with clear, objective criteria for scoring teacher performance.
5. Integrated Data Sources:
• Combine observational data with other indicators of teacher effectiveness, such as student progress data, teacher reflections, and peer reviews.
6. Longer-Term Evaluations:
• Conduct multiple observations across different contexts and times to capture a more reliable picture of teacher practices (the sketch after this list illustrates why averaging helps).
7. Automated and Technology-Enhanced Tools:
• Explore the use of AI and machine learning to analyze instructional practices systematically, potentially reducing human bias in observation data. (This last point is a long shot, I think, but the development of LLMs in the last five years has surprised even me and I would not rule it out.)
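Points 3 and 6 above are, at heart, claims from classical measurement theory: a single observation confounds a teacher's underlying quality with observer bias and lesson-to-lesson noise, and averaging across independent observers and occasions strips some of that noise out. Below is a minimal sketch of that logic in Python; the variance figures are purely illustrative assumptions, not empirical estimates from any study.

```python
# Minimal simulation of observation reliability (illustrative numbers only).
# Model: observed score = teacher's "true" quality + observer bias + occasion noise.
import random
import statistics

random.seed(42)

N_TEACHERS = 200
TRUE_SD = 1.0      # spread of genuine teacher quality (assumed)
OBSERVER_SD = 0.8  # rater-attributable bias, e.g. halo effects (assumed)
OCCASION_SD = 1.2  # lesson-to-lesson noise: class, topic, time of day (assumed)

def observe(true_quality: float) -> float:
    """One observation: true quality plus observer bias plus occasion noise."""
    return (true_quality
            + random.gauss(0, OBSERVER_SD)
            + random.gauss(0, OCCASION_SD))

def correlation(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

true_scores = [random.gauss(0, TRUE_SD) for _ in range(N_TEACHERS)]

for k in (1, 2, 4, 8):
    # Average k independent observations (different observers and occasions).
    averaged = [statistics.mean(observe(t) for _ in range(k)) for t in true_scores]
    r = correlation(true_scores, averaged)
    print(f"{k} observation(s): correlation with true quality = {r:.2f}")
```

On these toy numbers, a single observation correlates with underlying quality at roughly 0.6, while eight observations averaged together push that towards 0.9. Empirical estimates of single-lesson reliability tend to be lower still, which only strengthens the case for multiple observers across multiple occasions.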
When I think about lesson observations, I always return to Rob Coe's work from 10 years ago, which is still hugely important; I cannot improve upon his advice. I would also recommend Craig Barton's and Adam Boxer's excellent frameworks.
So there is a lot more to be said about this, but this is my initial reaction to the Ofsted news, and I think we need to think much more critically about the thorny problem of observing teacher quality, lest we return to the damaging dark days when lesson observation was a form of fortune telling and tarot card reading.