
A note on interpreting the figures in this report

The figures in this report show the average attainment of learners by year group, across a range of year groups and academic years. The vertical axis shows year groups; for the ethnicity section, it instead shows academic years. Colours are used to differentiate each academic year. The horizontal axis shows the average attainment achieved by all learners (or, in some cases, all learners in a specific demographic group, e.g. all female learners) in that year group in each academic year, expressed in units of months.

The attainment metric underlying the personalised assessments (PAs) is ‘IRT scores’ – Item Response Theory scores which are explained in detail in the technical information section below. In order to convert attainment into more readily understandable units of months, we used an approach based on the level of attainment learners in each year group achieved in a ‘reference year’.

For each year group, we considered the level of attainment achieved by learners in that year group, and the year group above and below, in the reference year.  We then determined how much difference there was between the attainment of learners in these year groups in IRT score units using a statistical model.  This model is explained in more detail in the technical information section below, but we describe the approach in general terms here. 

Because we know that these learners are 1 or 2 year groups apart, we can convert the difference in IRT scores into units that relate to how much longer the learners have been in school; 12 or 24 months.  An example of this is shown in the graph below for Year 3 for Numeracy (Procedural).

For this release, we continue to use 2022/23 as a consistent year of reference across all subjects, both for calculation of an average month’s progress and as a benchmark of the difference between years. This creates a common reference point to examine demographic differences by subject, an important factor when considering the pre/post-pandemic introduction of the different assessments.

Figure A1: Calculating attainment in months


Description of Figure A1: A line chart showing an example of how to calculate average attainment in months. The data in this chart is for illustrative purposes only and does not relate to any actual data in the release.

In this example, we determine the average attainment of Year 2, Year 3, and Year 4 learners in our reference year (2022/23) via the dark blue points.  The difference between Year 2 and 4 learners’ attainment, as shown by the dark blue arrow, converts to 24 months (as the Year 4 learners have been in school 24 months longer than the Year 2s).

Having established this baseline of attainment in months, we can then use it to determine how many months lower or higher the attainment of learners in this year group was in other academic years. In the example above, the light blue point shows the average attainment of Year 3 learners in 2018/19. We work out the difference in attainment between this point and the equivalent point for the reference year (shown by the light blue arrow) and convert this difference into months by comparing its size to our 24-month difference (i.e. comparing the size of the light blue arrow with the size of the dark blue arrow). In this example, the light blue arrow is around 25% the size of the dark blue arrow, showing that Year 3s in 2018/19 have average attainment around 6 months higher than Year 3s in 2022/23.
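The conversion described above can be sketched numerically. The values below are illustrative, mirroring the hypothetical example rather than any real data in the release:

```python
def irt_diff_to_months(diff, reference_gap, gap_months=24):
    """Convert an IRT score difference into months, using the
    reference-year Year 2 to Year 4 gap as a 24-month yardstick."""
    return diff / reference_gap * gap_months

# Illustrative IRT scores (not real data)
year2_ref, year4_ref = -1.2, 1.2       # 2022/23 reference-year averages
reference_gap = year4_ref - year2_ref  # corresponds to 24 months in school

year3_ref = 0.0    # Year 3 average in 2022/23
year3_1819 = 0.6   # Year 3 average in 2018/19 (25% of the reference gap)

months = irt_diff_to_months(year3_1819 - year3_ref, reference_gap)
print(months)  # 6.0: Year 3 attainment in 2018/19 is around 6 months higher
```

As in the worked example, an arrow 25% the size of the 24-month reference gap converts to 6 months.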

When interpreting the figures in this report, it is vital to remember that younger learners make more progress, in absolute terms, than older learners. In other words, the average learner differs more in what they know and can do between Year 2 and Year 3 than between Year 8 and Year 9, where learners' skills and knowledge are more similar. Because this report deals with months' progress, it is crucial to bear in mind that 12 months' progress for younger learners equates to a larger absolute change in attainment than it does for older learners. In many figures in this report, there is a greater months' difference between demographic groups for older learners, but the underlying absolute attainment difference is not as pronounced as the months figure alone would suggest.

In addition to showing the average attainment of learners by year group, the figures in this report also show differences by sex and by eligibility for free school meals (FSM). The example below shows how the months' progress difference is computed to compare females and males. This process is exactly the same as that used to calculate the difference by FSM eligibility in this release.

Figure A2: Example of how attainment differences by sex are worked out


Description of Figure A2: A bar chart showing an example of how to calculate average attainment for males and females.

Average attainment is presented as months higher or lower than the attainment of the same year group in 2022/23. Green triangles represent the average attainment of males, and red circles the average attainment of females. The data in this chart is for illustrative purposes only and does not relate to any actual data in the release.

The chart has additional points overlaid to show where the average attainment of males and females would sit.  As is to be expected, these points sit either side of the overall average shown by the relevant bar, as there are roughly half females and half males in each year group.

The yellow arrow quantifies the difference in progress in months between males and females.  For example, for Year 3 in 2020/21, the difference is 4 months.  These differences, as represented by the yellow arrows, are what is plotted on Figure 5 in the statistical release to quantify the gap.

Updating the item difficulty scale

The National Personalised Assessments in Wales rely on a Rasch Model where item (assessment question) difficulties and learner abilities are placed on the same scale. Each learner’s ability estimates are based on the difficulty of items they get right or wrong.
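Under the Rasch model, the probability of answering an item correctly depends only on the gap between the learner's ability and the item's difficulty, both measured on the same logit scale. A minimal sketch of that relationship, with illustrative values rather than the PA implementation:

```python
import math

def rasch_p_correct(ability, difficulty):
    """Rasch model: P(correct) = 1 / (1 + exp(-(ability - difficulty)))."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# A learner whose ability matches the item's difficulty has a 50% chance
print(rasch_p_correct(0.5, 0.5))   # 0.5
# An easier item (lower difficulty) gives a higher chance of success
print(rasch_p_correct(0.5, -1.0))  # ~0.82
```

Ability estimation then amounts to finding the ability value that best explains the learner's pattern of right and wrong answers across the items they saw.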

From the 2023/24 academic year, updated item difficulty estimates were implemented in the PAs for the first time. These were calculated, or calibrated, based on item responses from learners in the 2021/22 academic year. This was done to ensure that item difficulties were as representative and accurate as possible and used a common year baseline across all assessments. Prior to this point, items were calibrated at the point each subject was rolled out, which happened over the course of several years, some subjects before and some after the COVID-19 pandemic. 2021/22 was chosen because it was the first academic year following the COVID-19 pandemic to be unaffected by school lockdowns; it also meant that the updated values would be based on responses from almost the entire population of learners in Wales, PAs being statutory assessments.

Because this report compares progress over the entire operation of the PAs, a consistent item difficulty and person ability scale is needed. In other words, for the academic years’ data preceding 2023/24 to be directly comparable with 2023/24 data, the preceding years’ item difficulty values needed to be recalculated so they were on the 2023/24 scale.

This was done using a technique called concurrent calibration, utilising the large overlap in items that appear in multiple academic years as an ‘anchor’ to equate the two scales in use.  This is accomplished by fixing the ‘anchor’ items’ difficulty values to their 2023/24 values when calibrating the data for previous academic years, provided they still fitted the Rasch Model appropriately.

Items that drifted in their item difficulty between 2023/24 and previous academic years were removed as anchors. This can happen, for example, when there are changes to the curriculum and items that assess a certain curriculum element are taught to a greater or lesser extent. Specifically, items that differed by more than 0.5 standard deviations (of the retrospectively recalculated item difficulty scale) from their 2023/24 value were removed from the analysis.
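The drift screen described above can be sketched as follows. The item difficulties are hypothetical; the 0.5 standard deviation threshold is the one stated in the text:

```python
import statistics

def stable_anchors(diff_2324, diff_previous, threshold_sd=0.5):
    """Keep as anchors only items whose recalculated difficulty stays
    within `threshold_sd` standard deviations (of the recalculated
    scale) of their 2023/24 value."""
    sd = statistics.stdev(diff_previous.values())
    return {
        item for item in diff_2324
        if item in diff_previous
        and abs(diff_previous[item] - diff_2324[item]) <= threshold_sd * sd
    }

# Hypothetical item difficulties on the two scales
d_2324 = {"q1": -0.4, "q2": 0.1, "q3": 1.2}
d_prev = {"q1": -0.5, "q2": 0.2, "q3": 2.5}  # q3 has drifted
print(sorted(stable_anchors(d_2324, d_prev)))  # ['q1', 'q2']
```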

Once a stable set of retrospective item difficulties was recalculated, learner abilities could be retrospectively recalculated from them for academic years prior to 2023/24. This placed them on a scale comparable to the 2023/24 data and therefore made them valid for use in the longitudinal analysis in this report.

See tables 1 to 12 in the accompanying spreadsheet for the difference in average (mean) outcomes for all subjects and year groups. In each of these tables, the column "2022/23 months progress" refers to the previous release, which included all years up to the 2022/23 academic year; the column "2023/24 months progress" refers to the figures on the 2023/24 scale used in this report; and the column "difference" refers to the difference between the two. Tables A1 to A4 below illustrate the change for the main data in this release.

Table A1: Difference in months progress, Numeracy (Procedural), all pupils
Year group | Academic year | Data published in June 2024 | Data published in June 2025 | Difference
3 | 2018/19 | 6.2 | 7.6 | 1.4
6 | 2018/19 | 3.6 | 13.9 | 10.4
9 | 2018/19 | -2.3 | 6.6 | 8.9
3 | 2020/21 | -0.1 | 0.1 | 0.2
6 | 2020/21 | 1.2 | 1.3 | 0.1
9 | 2020/21 | 2.2 | 0.6 | -1.6
3 | 2021/22 | -0.2 | -2.1 | -1.9
6 | 2021/22 | 0.7 | -1.2 | -2.0
9 | 2021/22 | 1.3 | -1.4 | -2.7
Table A2: Difference in months progress, Welsh Reading, all pupils
Year group | Academic year | Data published in June 2024 | Data published in June 2025 | Difference
3 | 2020/21 | 6.9 | 7.1 | 0.3
6 | 2020/21 | 10.5 | 12.5 | 1.9
9 | 2020/21 | 17.2 | 24.2 | 7.0
3 | 2021/22 | 2.5 | 4.3 | 1.8
6 | 2021/22 | 6.9 | 7.9 | 1.0
9 | 2021/22 | 12.4 | 11.9 | -0.5
Table A3: Difference in months progress, English Reading, all pupils
Year group | Academic year | Data published in June 2024 | Data published in June 2025 | Difference
3 | 2020/21 | 3.6 | 3.2 | -0.5
6 | 2020/21 | 2.0 | 5.6 | 3.6
9 | 2020/21 | 5.6 | 11.1 | 5.5
3 | 2021/22 | 3.3 | 3.0 | -0.4
6 | 2021/22 | 1.7 | 1.6 | -0.1
9 | 2021/22 | 4.7 | 4.2 | -0.4
Table A4: Difference in months progress, Numeracy (Reasoning), all pupils
Year group | Academic year | Data published in June 2024 | Data published in June 2025 | Difference
3 | 2021/22 | -2.0 | -0.2 | 1.8
6 | 2021/22 | -10.0 | -1.1 | 8.9
9 | 2021/22 | -4.3 | 0.6 | 5.0

Ethnicity difference filtering

Some ethnic groups in Wales have a very small number of learners. To protect confidentiality, we present such data using 19 major ethnic groups, which are amalgamations of the smaller groups. To narrow down the results presented in the Ethnicity section and to make the charts more readable, we include only the ethnic groups that showed a statistically significant difference from White – British learners; that is, a difference unlikely to have occurred through natural year-on-year sampling variation. Specifically, t-tests were calculated for each group compared to White – British learners in 2022/23 and Bonferroni corrected for multiple comparisons; only differences with a one per cent or less chance of occurring by chance were considered for reporting. The average difference across the data combined across all academic years was used as the criterion for inclusion, to ensure that only consistent trends across time are presented in the main section of the report.
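The filtering logic can be sketched with hypothetical p-values. A real implementation would obtain these from two-sample t-tests against White – British learners (e.g. with `scipy.stats.ttest_ind`); here the correction and threshold step is shown on its own:

```python
def groups_to_report(p_values, alpha=0.01):
    """Apply a Bonferroni correction for multiple comparisons and keep
    only groups significant at the 1% level after correction."""
    n_tests = len(p_values)
    return [group for group, p in p_values.items() if p * n_tests < alpha]

# Hypothetical uncorrected p-values for three comparison groups
p = {"Group A": 0.0001, "Group B": 0.02, "Group C": 0.004}
print(groups_to_report(p))  # ['Group A']: only one survives correction
```

Group C is significant at 1% before correction (0.004) but not after multiplying by the number of tests.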

Technical information

The personalised assessments are adaptive in nature. Each learner receives a different set of questions to their peers, and each learner’s assessment is dynamically tailored - if they get questions right, they will receive harder questions next, and if they answer questions incorrectly, they will receive easier questions next. This means that it is not valid to compare raw scores that learners achieve on their assessments, as each learner sees different questions of varying levels of difficulty.
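The adaptive routing described above can be sketched as a simple rule: after each response, the next item is pitched harder or easier. This is a toy illustration only, not the PA routing algorithm:

```python
def next_difficulty(current_difficulty, answered_correctly, step=0.5):
    """Toy adaptive rule: step the target item difficulty up after a
    correct answer and down after an incorrect one."""
    if answered_correctly:
        return current_difficulty + step
    return current_difficulty - step

# A learner answers right, right, then wrong
d = 0.0
for correct in [True, True, False]:
    d = next_difficulty(d, correct)
print(d)  # 0.5: up, up, then back down
```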

Therefore, the personalised assessments make use of Item Response Theory (IRT) to work out how learners have done in their assessments. IRT is a statistical approach that allows us to allocate a difficulty rating to each question based on how learners across the year groups responded to it.  This means that this approach can account for the level of challenge of the different questions and produce IRT scores that are comparable no matter which questions each learner answered. The ‘months’ attainment metric reported in the figures above is therefore based on IRT scores (otherwise known as ability estimates).

The IRT scores used for the analysis in this report are not the same as the scores teachers, learners and parents see on personalised assessment reports. IRT scores are the ‘internal’ or underlying scores that are used to produce age-standardised scores and to calculate progress points on learner reports. The main reason for not using progress or age-standardised scores in this report is that they cannot be compared across year groups, whilst IRT scores can.

In addition to being adaptive, the personalised assessments follow an on-demand model, meaning schools can schedule assessments at any point during the academic year. Learners taking assessments early in a given academic year tend to achieve at a slightly lower level than those who take them later on.  Therefore, when aiming to evaluate whether learner attainment in one dataset differs from that in another, it is important to control for the impact of ‘learning time’.  This report therefore subsets the data to the period when the majority of assessments were taken - March to July. This mitigates (but does not entirely remove) the risk that the effects observed are due to learners taking assessments earlier or later in one year than in another.

The entire national cohort did not take the personalised assessments in the period March to July. This means that, for the purposes of this analysis, there is a risk that the learners who do take assessments in this period are not fully representative of the whole population. We have therefore applied weightings to the average scores reported in this paper to account for this possibility, based on demographic data available. This mitigates the risk that the effects observed are due to systematic differences in which learners take assessments in this time period in each year.

Schools have the option to run each assessment twice within each academic year. This has also been accounted for within the weighting, as otherwise some learners would be ‘double counted’ in the analysis. If a learner took an assessment twice within the March-July period analysed in the same academic year, each one is allocated half the weight it would otherwise have had.
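The half-weighting of repeated sittings can be sketched as follows. The records are hypothetical, and real weights would also incorporate the demographic adjustment described above:

```python
from collections import Counter

def sitting_weights(records):
    """Give each assessment sitting weight 1, halved when the same
    learner sat the same assessment twice in the analysis window."""
    counts = Counter((r["learner"], r["assessment"]) for r in records)
    return [1.0 / counts[(r["learner"], r["assessment"])] for r in records]

records = [
    {"learner": "A", "assessment": "numeracy"},
    {"learner": "A", "assessment": "numeracy"},  # second sitting
    {"learner": "B", "assessment": "numeracy"},
]
print(sitting_weights(records))  # [0.5, 0.5, 1.0]
```

Each learner then contributes a total weight of 1 to the averages, regardless of how many times they sat the assessment.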

In terms of how attainment differences over time are contextualised, regression models (which are used to establish the relationship between any two things) were used to determine the trend in average attainment across each year group. Eight separate linear regression models were fitted for each subject - one per year group – for all academic years of delivery. Each year group’s regression used 2 to 3 year groups’ data; that from learners in the year group in question and those in the year groups immediately above or below it.  Each regression’s formula was [IRT score ~ year group], meaning the resulting coefficient for year group could be interpreted as the amount of IRT score attainment by which we expect learners in two different year groups to differ (on average).
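The per-year-group regression can be sketched with a simple ordinary least squares fit. The data below are illustrative; the real models were fitted to learner-level IRT scores:

```python
def slope(xs, ys):
    """OLS slope for y ~ x: here, the expected change in IRT score
    per additional year group."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

# Hypothetical learner-level data for Years 2-4: (year group, IRT score)
year_groups = [2, 2, 3, 3, 4, 4]
irt_scores = [-1.3, -1.1, -0.1, 0.1, 1.1, 1.3]
per_year = slope(year_groups, irt_scores)
print(per_year)  # 1.2 IRT units per year group, i.e. per 12 months
```

The fitted coefficient is the yardstick used to convert IRT score differences into months.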

This allowed the reporting of score differences to be contextualised in units of months, as outlined in the body of the paper. In this report, the reference year for attainment for each assessment is 2022/23.

Links to international evidence

The patterns over time seen in this release mirror the patterns seen elsewhere internationally as individual countries and education systems recover from the impact of the global pandemic. A few examples are listed below but this is not an exhaustive list.

National Literacy Trust article summarising multiple UK studies on the impact of COVID-19 on literacy, and with reference to the attainment gap between more advantaged and less advantaged learners.

Article by Harvard Graduate School of Education (May 2023) summarising research into Covid impact on attainment in several states in the USA, noting the negative impacts of COVID-19 and socio-economic differences.

Study on learning loss due to school closures during the COVID-19 pandemic, based on data from primary school children in the Netherlands, published in Proceedings of the National Academy of Sciences of the United States of America, April 2021.

Paper published in December 2022 by the World Bank Group on the educational and economic impacts of COVID-19 school closures in Poland.

An international study on learning loss following the pandemic and the educational inequalities (University of Oxford) between children from different socio-economic backgrounds published in Nature, January 2023.

Report from the National Audit Office on education recovery in schools in England following the pandemic, published February 2023.

Study by the National Foundation for Educational Research on the impact of COVID-19 on educational attainment in England, published March 2022.

The difference in favour of females for reading and in favour of males for numeracy is seen internationally:

Working paper published by the OECD (Organisation for Economic Co-operation and Development) on the evolution of gender gaps in numeracy and literacy between childhood and adulthood, October 2018.

The attainment gap in favour of learners not eligible for FSM is a long-established trend across the UK. See for example:

Deprivation and Education: The evidence on learners in England, Foundation Stage to Key Stage 4, published by the Department for Children Schools and Families, 2009.

Closing the attainment gap, publication by the Education Endowment Foundation, 2018.

Contact details

Statistician: Steve Hughes
Email: school.stats@gov.wales
 
Media: 0300 025 8099

SFR: 50/2025