Monday 17 January 2022

Calibrating Assessments

Pupils have now taken our new KS3 assessments in at least one unit. In response to the Ofsted Research Review, we have concentrated on making sure that the assessments test pupils on what they have been learning. This sounds obvious, but I think our old assessments (and GCSE Listening/Reading) were actually testing pupils on their literacy, their confidence, their ingenuity and their cultural knowledge. We ended up labelling the pupils with strong literacy as "good at languages" and, more perniciously, labelling pupils with weaker literacy as "not doing well in languages".

We also made sure we used the test as an opportunity to get some feedback from the pupils on their learning, and to engage them in thinking about their progress.

The tests were designed to give feedback to the pupils about their progress on an intellectual level, but also on an affective level: helping them see the progress they are making and feel positive about their learning.

Rather than use tests to get a broad spread of marks in order to categorise pupils, we were looking for evidence that most pupils were successfully learning.


Given these changes, how do we calibrate the tests? How do we decide what score is acceptable, what is great, and what requires action to be taken?

Further questions: if the tests are based on what pupils have been learning, should the expectation be that all pupils get all of it right? Should the expectation be the same for all? Should we grade pupils at all, if the idea was to make them feel successful? What marks are reported, and how are these explained?

The question of how marks are reported is critical. Parents and pupils are told whether the pupil is "on track", "above" or "below" expected. Here "on track" does not mean compared to the national average or to other pupils; it means on track for a pupil of similar prior attainment. This is based on broad-brush groupings of pupils by KS2 results (Upper, Middle, Lower), which act as a proxy for predicted GCSE grades. If this is the information parents are receiving, then our tests need to provide us with evidence towards making these judgements.

We did the first round of tests without establishing thresholds. This meant we were able to then look at the data the tests gave us alongside other information. For a start, most pupils, as hoped, scored relatively highly, reflecting the fact that the tests were meant to be a recognition of what they had been studying. Secondly, pupils of similar prior attainment did seem to score within a similar range of marks. Looking at this range of marks, we were able to assign numerical values to what could be considered "on track" for pupils of different starting points. The fact that these were very similar for Year 7, Year 8 and Year 9 was also reassuring.
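As a rough sketch of the kind of analysis this involved, the snippet below groups scores by KS2 prior-attainment band and takes the lower quartile of each band as a tentative "on track" threshold, so that most pupils in a band clear it. The band names, scores and the choice of quartile are all invented for illustration; they are not our actual data or our actual rule.

```python
from statistics import quantiles

# Hypothetical unit-test scores (out of 40), grouped by KS2 band.
scores_by_band = {
    "Upper":  [36, 38, 35, 39, 34, 37, 36, 38],
    "Middle": [30, 32, 29, 33, 31, 28, 32, 30],
    "Lower":  [24, 26, 22, 27, 25, 23, 26, 24],
}

def on_track_threshold(scores):
    """Lower quartile of a band's scores: a tentative cut-off
    below which a pupil's result prompts a closer look."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, _ = quantiles(scores, n=4)
    return q1

for band, scores in scores_by_band.items():
    print(band, on_track_threshold(scores))
```

With invented data like this, each band gets its own threshold, and the gap between bands reflects the observation above that pupils of similar prior attainment scored within a similar range.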

Where pupils fell above or below the mark of most other pupils with a similar KS2 profile, we were able to look at how this matched up with other information the teacher had: performance in Speaking and Writing tests, participation in class, absence...

The tests and the tentative thresholds did give us information that helped towards a bigger picture of individual pupils' performance.

Of course, this helps us judge the pupils' performance. But we also have to judge whether, as a department, our planning and teaching is effective. As a school, we are given a data breakdown of how different departments compare: the percentage of pupils "on track", "above" and "below" in all subjects in each year group, and the same data by subgroups such as gender, PPG, EAL or SEND. This enabled us to see very quickly that we were in line with other departments, and that where we varied from the mean, it was towards having slightly more pupils "on track" than other departments. As the new tests were designed to be a positive experience, this doesn't seem to be a problem, although we will have to remain vigilant to make sure we pick up where pupils might be under-achieving.
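The subgroup breakdown itself is simple arithmetic. As a minimal sketch (the pupil records and subgroup labels here are made up for illustration, not the school's actual data format):

```python
# Hypothetical pupil records with subgroup flags and a judgement.
pupils = [
    {"gender": "F", "ppg": True,  "status": "on track"},
    {"gender": "M", "ppg": False, "status": "above"},
    {"gender": "F", "ppg": False, "status": "below"},
    {"gender": "M", "ppg": True,  "status": "on track"},
]

def percent_at_least_on_track(records):
    """Percentage of records judged 'on track' or 'above'."""
    hits = sum(r["status"] in ("on track", "above") for r in records)
    return 100 * hits / len(records)

# Whole cohort, then one subgroup (here Pupil Premium, PPG).
whole = percent_at_least_on_track(pupils)
ppg = percent_at_least_on_track([p for p in pupils if p["ppg"]])
print(whole, ppg)
```

Comparing the subgroup figure against the whole-cohort figure (and against other departments) is what flags possible under-achievement in a group.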

Next time, we are taking a bit of a leap in the dark and seeing if we can keep the same numerical threshold scores for the next unit. The tests will be different, but of a similar format. It will give us a starting point for similar discussions of how pupils' scores in different tests match up with how they are doing in class and help the teacher build up a picture of evidence.





