Background

The Welsh Index of Multiple Deprivation (WIMD) is the official measure of relative deprivation for small areas in Wales. It identifies areas with the highest concentrations of several different types of deprivation. WIMD ranks all small areas in Wales from 1 (most deprived) to 1,917 (least deprived). It is an accredited official statistic produced by statisticians at the Welsh Government, working under the Code of Practice for Statistics.

WIMD is calculated from 8 different domains (or types) of deprivation, each compiled from a range of different indicators. This technical report describes how WIMD 2025 was constructed and contains a full list of indicators and information about the indicators. Our WIMD guidance document provides more information on the definition of deprivation and on how to interpret and use WIMD. Our WIMD results report also provides examples of applications of WIMD.

How WIMD is constructed

There are 3 main components of the index: 

  • the 54 underlying indicator datasets
  • ranks for the 8 separate domains (or types) of deprivation, created by combining relevant indicators within each domain
  • overall WIMD ranks, created by combining the domain ranks 

All these components are calculated for each of the small areas (Lower layer Super Output Areas or LSOAs) in Wales and published. A full list of the indicators can be found in the results report. 

The way the indicators and domains are combined is designed to reliably distinguish between areas at the most deprived end of the distribution, but not at the least deprived end. This means that differences between the least deprived areas in Wales are less well defined than differences between the more deprived areas.

Changes for WIMD 2025

The methodology is broadly the same as for previous indices, with the same 8 domains or types of deprivation captured. However, some new datasets, methodologies and geographies have been used to produce WIMD 2025, meaning outputs are not directly comparable to previous indices. 

Details of changes to indicators and methodologies are provided in this technical report, and a summary of changes is available in the results report. 

Indicators

The domains are built up from sets of indicators. These are measurable quantities which capture the concept of deprivation for each domain (e.g. the percentage of working age people in receipt of employment-related benefits for the employment domain, and a measure of educational attainment (KS4 average points score) in the education domain). 

All WIMD 2025 indicators have to meet the same criteria as for WIMD 2019 and its predecessors, as listed in the report on our proposed indicators. Indicators must be robust at the small area level and consistent across Wales. In practice, this means that the index is based largely on administrative data, with a limited number of census or modelled variables where appropriate administrative data are not available.

Weightings for indicators within domains are derived in several different ways, summarised as follows and explained in detail in each domain technical report chapter: 

  • for the income and employment domains, only one composite indicator exists, so no weighting is required within the domain
  • for the housing, physical environment and access to services domains, indicators are grouped into 2 or more sub-domains, then the sub-domain ranks are exponentially transformed (see annex 1.1), before being combined in a weighted sum
  • for the education, health, access to services (physical access sub-domain) and community safety domains, the statistical technique of factor analysis is used to calculate factor weights for the indicators (see annex 1.2)

As well as ranks for the overall domains, we will publish ranks for each sub-domain on StatsWales.

Some of our indicators use the ONS Small Area Population Estimates (SAPE) in the denominators. We used the SAPE data published in November 2024 (estimates up to mid-2022), which was available for the WIMD 2025 data processing period in October 2025. The ONS has since published updates and revisions to some of this data, on 7 November 2025. We have reviewed these and assessed the impact of the changes at small area level to be small.

Domain scores

The overall index and domain ranks are the main output for WIMD. As part of the process for calculating WIMD ranks, deprivation scores (domain and overall) are produced. See the guidance report for further advice on interpreting scores.

The individual domain ranks, calculated by ranking the weighted sum of domain indicators, are then exponentially transformed (see annex 1.1) to produce domain scores. Areas have scores (transformed ranks) ranging between 0 (least deprived) and 100 (most deprived) on each domain. The scores increase exponentially so that the most deprived areas have more prominence. This reduces the extent to which deprivation in some domains can be cancelled by lack of deprivation in others.

The sets of domain scores are then weighted according to the respective domain weight and added together to produce the overall WIMD score, which is, in turn, ranked to provide the overall WIMD ranks. 

Domain weights

Domain weights control the relative contribution of each domain to overall deprivation. Their values are based upon expert advice and the quality of the indicators available. If a domain has a higher weight, changes in that domain will have a bigger impact on the overall index. 

Table 1: domain weights for WIMD 2025 alongside the weights used in 2019
WIMD domain             WIMD 2025 domain weight    WIMD 2019 domain weight
Income                  22%                        22%
Employment              20%                        22%
Health                  15%                        15%
Education               14%                        14%
Access to services      10%                        10%
Housing                 9%                         7%
Community safety        5%                         5%
Physical environment    5%                         5%

The addition of 2 new indicators in the housing domain has led to a small increase in its weight from 7% to 9%. To allow for this, the weight for the employment domain has been reduced slightly from 22% to 20%, but this remains the second highest weighted domain because it is a strong determinant of deprivation.
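As a rough illustration of how the weights in Table 1 feed into the overall index, the sketch below combines domain scores into an overall score and ranks a handful of areas. Only the weights come from Table 1; the area names, the domain scores and the function names are made up for illustration, and this is not the production code used for WIMD.

```python
# WIMD 2025 domain weights from Table 1, expressed as proportions.
DOMAIN_WEIGHTS = {
    "income": 0.22,
    "employment": 0.20,
    "health": 0.15,
    "education": 0.14,
    "access_to_services": 0.10,
    "housing": 0.09,
    "community_safety": 0.05,
    "physical_environment": 0.05,
}


def overall_score(domain_scores):
    """Weighted sum of the 8 domain scores (each on the 0 to 100 scale)."""
    return sum(DOMAIN_WEIGHTS[d] * domain_scores[d] for d in DOMAIN_WEIGHTS)


def overall_ranks(scores_by_area):
    """Rank areas so that rank 1 has the highest (most deprived) overall score."""
    ordered = sorted(
        scores_by_area,
        key=lambda area: overall_score(scores_by_area[area]),
        reverse=True,
    )
    return {area: rank for rank, area in enumerate(ordered, start=1)}


# Made-up example: three hypothetical areas with illustrative domain scores.
example_scores = {
    "area_A": dict.fromkeys(DOMAIN_WEIGHTS, 10.0),
    "area_B": dict.fromkeys(DOMAIN_WEIGHTS, 60.0),
    "area_C": {**dict.fromkeys(DOMAIN_WEIGHTS, 10.0), "income": 100.0},
}
example_ranks = overall_ranks(example_scores)
```

In this made-up case, area_C differs from area_A only in its income score, and the 22% income weight is enough to move it above area_A in the ranking.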

WIMD geographies

Super output areas

Following the 2001 Census, the ONS developed a geographic hierarchy called Super Output Areas (SOAs). They were designed to improve the reporting of small area statistics in England and Wales. The areas were reviewed, and some changes made, following the 2021 Census (ONS). Where possible, official statistics are published at the SOA geography.

There are three layers of SOA: Lower layer, Middle layer, and Upper layer. This is because disclosure requirements mean that some sets of data can be released for much smaller areas than others. To support a range of potential data requirements, it was decided to create these three SOA layers.

  • A Lower layer SOA (known as an LSOA) must have a minimum population of around 1,000.
  • The mean size of all the LSOAs is around 1,600.
  • LSOAs are built from groups of Census Output Areas (usually between 4 and 6).
  • A Middle layer SOA (MSOA) must have a minimum population of around 5,000.
  • The mean size of all the MSOAs is around 8,200.

Geographic unit for WIMD

The geographic areas used in the calculation of WIMD 2025 are the 1,917 LSOAs in Wales. LSOAs were used as the geographic unit in WIMD 2005, 2008, 2011, 2014 and 2019. The ONS reviewed LSOA boundaries after the release of Census 2021 data, and there are now 1,917 LSOAs instead of the previous 1,909 for WIMD 2019.

Although the overall WIMD ranks are only calculated at LSOA level, we will make deprivation profiles for larger areas (like local authorities, local health boards, MSOAs and Senedd Constituency areas) available on StatsWales. These look at the proportion of small areas within a larger area that are very deprived. Individual indicator data will also be published at a range of geographies on StatsWales. For most domains, indicator data are allocated to an LSOA by the data suppliers as part of the collection process. However, data is provided at a lower geographical level for some indicators in the access to services, education and community safety domains. An explanation of how data were allocated to LSOAs for these domains is provided in annex 1.3.
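A deprivation profile of the kind described above can be sketched as follows. The 10% cut-off, the LSOA labels, the area labels and the function name are all assumptions made for illustration; the published profiles may use different groupings.

```python
from collections import defaultdict

N_LSOAS = 1917  # number of LSOAs in Wales for WIMD 2025


def deprivation_profile(rank_by_lsoa, area_by_lsoa, share=0.10):
    """Proportion of each larger area's LSOAs ranked in the most deprived `share`.

    Rank 1 is the most deprived LSOA. The default 10% cut-off is an assumed example.
    """
    cutoff = share * N_LSOAS
    totals = defaultdict(int)
    deprived = defaultdict(int)
    for lsoa, rank in rank_by_lsoa.items():
        area = area_by_lsoa[lsoa]
        totals[area] += 1
        if rank <= cutoff:
            deprived[area] += 1
    return {area: deprived[area] / totals[area] for area in totals}


# Made-up ranks and area memberships, for illustration only.
profile_ranks = {"L1": 5, "L2": 300, "L3": 900, "L4": 1800, "L5": 100, "L6": 1000}
profile_areas = {"L1": "X", "L2": "X", "L3": "X", "L4": "X", "L5": "Y", "L6": "Y"}
profile = deprivation_profile(profile_ranks, profile_areas)
```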

Acknowledgements

We are grateful for the contributions of many people and organisations who have provided data and advice for WIMD 2025. We especially wish to thank the following for the development work or tailored support they provided: 

  • Building Research Establishment (BRE), who produced an updated version of the poor quality housing indicator
  • Care Inspectorate Wales, who provided location data for childcare providers
  • Caroline Thomas and Mark Corbin, CGI, who developed travel times calculations
  • Department for Education, who provided data for the education domain
  • Deprivation.org, Oxford Consultants for Social Inclusion (OCSI), and the Ministry of Housing, Communities & Local Government, who provided several indicator datasets
  • Digital Health and Care Wales, who provided data on GP-recorded health conditions
  • Noise Consultants Ltd, who produced noise exposure data
  • Office for National Statistics, who provided data for the health domain and population estimates
  • Professor Glen Bramley, Heriot-Watt University, who produced the housing affordability indicator
  • Professor Rich Fry and Oliver Thwaites, Population Data Science, Swansea University, who provided data on ambient greenness
  • Public Health Wales, who provided data from cancer registrations and the Child Measurement Programme
  • Welsh Police Forces, who contributed to the community safety domain

Annex 1.1: exponential transformation of the domain ranks

The exponential transformation of ranks reduces the extent to which deprivation in some domains can be cancelled by lack of deprivation in others. The transformation 'draws out' the ranks of the most deprived areas, introducing gaps between areas that reflect the actual distributions and emphasising the most deprived 'tail' of the distribution.

The precise transformation involved is as follows. For any LSOA, denote its rank on the domain, scaled to the range (0,1], by R (with R=1/1917 for the least deprived, R=1917/1917=1 for the most deprived). The transformed domain score equals:

\[ \text{score} = -23 \times \log\{1 - R \times [1 - \exp(-100/23)]\} \]

where log denotes natural logarithm and exp the exponential or antilog transformation. This formula is straightforward to calculate and simpler than the commonly-used transformation to a normal curve, which requires the use of a look-up table. 
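The formula above can be sketched in a few lines of Python. The function names are hypothetical, and the conversion from a published rank (1 = most deprived) to the scaled rank R is an assumption based on the definitions of R given above.

```python
import math

N_LSOAS = 1917  # number of LSOAs in Wales for WIMD 2025
K = 23.0        # constant from the transformation formula


def scaled_rank(wimd_rank, n=N_LSOAS):
    """Convert a published rank (1 = most deprived) to R in (0, 1].

    Assumes R = 1 for the most deprived area and R = 1/n for the least deprived.
    """
    return (n - wimd_rank + 1) / n


def transformed_score(r):
    """Exponentially transformed score: -23 * log(1 - R * (1 - exp(-100/23)))."""
    return -K * math.log(1.0 - r * (1.0 - math.exp(-100.0 / K)))


most_deprived = transformed_score(scaled_rank(1))         # R = 1
least_deprived = transformed_score(scaled_rank(N_LSOAS))  # R = 1/1917
```

With these definitions the most deprived LSOA scores 100 and the least deprived LSOA scores approximately 0.01.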

Figure 1.1: histogram of a transformed domain

Description of figure 1.1: this shows the distribution of the transformed ranks, called scores. Each transformed domain has a range of 0 to 100, with a score of 100 for the most deprived LSOA. The least deprived LSOA (R = 1/1917) has a domain score of approximately 0.01. 

The constant -23 gives a 10% cancellation property. This means that approximately the most deprived 10% of LSOAs have a score higher than 50, and the remaining 90% of LSOAs have scores between 0 and 50. When transformed scores from different domains are combined by averaging them, the skewness of the distribution reduces the extent to which deprivation in one domain can be cancelled by lack of deprivation in another. For example, if the transformed scores on two domains are simply averaged, with equal weights, a (hypothetical) LSOA that scored 100 on one domain and 0 on the other would have a combined score of 50 and would thus be ranked at the 90th percentile. Averaging the untransformed ranks, or averaging after transformation to a normal distribution, would result in such an LSOA being ranked at the 50th percentile; the high deprivation in one domain would have been fully cancelled by the low deprivation in the other.
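The 10% figure can be checked directly from the transformation formula in this annex. The short derivation below is a worked check rather than part of the published methodology; it solves for the scaled rank R at which the transformed score equals 50.

```latex
\begin{align*}
-23 \log\{1 - R\,[1 - \exp(-100/23)]\} &= 50 \\
1 - R\,[1 - \exp(-100/23)] &= \exp(-50/23) \\
R &= \frac{1 - \exp(-50/23)}{1 - \exp(-100/23)}
   = \frac{1}{1 + \exp(-50/23)} \approx 0.898
\end{align*}
```

LSOAs with R above about 0.898, which is roughly the most deprived 10%, therefore have scores above 50.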

This means the index methodology is designed to reliably distinguish between areas at the most deprived end of the distribution, but differences between the least deprived areas in Wales are less well defined.

As well as being used to combine domain ranks to produce overall index ranks, this transformation is applied to sub-domain ranks to produce domain ranks for the access to services, housing and physical environment domains.

While each LSOA was given a distinct rank for most WIMD 2025 indicators, a small number of indicators had a large number of tied “0-values” at the least deprived tail of the distribution. For example, within the access to services domain there were 955 LSOAs with 0% of properties unable to receive superfast broadband. This pattern was observed in the digital access sub-domain within access to services and the flood risk sub-domain within physical environment. For the purposes of the exponential transformation, these LSOAs were assigned sub-domain ranks of 1,917 (least deprived) to minimise their contributions to overall domain scores.
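The treatment of tied 0-values can be illustrated with a small sketch. The data and the function name are made up, and the way ties among non-zero values are broken (by area label) is an assumption, since the text does not specify it.

```python
def ranks_with_zero_ties(values, n_areas=None):
    """Rank areas from 1 (highest value, most deprived) downwards.

    Areas with a value of exactly 0 all receive the last rank (n_areas),
    mirroring the treatment of tied 0-values described above. Ties among
    non-zero values are broken by area label, which is an assumption.
    """
    n = n_areas if n_areas is not None else len(values)
    nonzero = sorted(
        (area for area in values if values[area] > 0),
        key=lambda area: (-values[area], area),
    )
    ranks = {area: i for i, area in enumerate(nonzero, start=1)}
    ranks.update({area: n for area in values if values[area] == 0})
    return ranks


# Made-up percentages of properties unable to receive superfast broadband.
pct_no_broadband = {"L1": 12.5, "L2": 0.0, "L3": 3.0, "L4": 0.0, "L5": 40.0}
tie_ranks = ranks_with_zero_ties(pct_no_broadband)
```

In the real data the last rank would be 1,917; here it is 5 because the made-up example has only 5 areas.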

Annex 1.2: factor analysis

Factor analysis overview

Factor analysis is a method for assessing the extent to which a set of indicators may be measuring the same underlying construct or factor. The premise behind a one-common-factor model is that the underlying factor is imperfectly measured by each of the indicators in the dataset but that indicators that are most highly correlated with the underlying factor will also be highly correlated with each other. By analysing the correlation between indicators it is therefore possible to make inferences about the common factor and as a result estimate a ‘factor score’ for each LSOA. This score is derived from a set of weights for each of the indicators in the data set that is generated by the process of factor analysis. This factor score can then be used as the domain index. 

Factor analysis has only been applied to four domains: health, education, access to services and community safety. Within the access to services domain, factor analysis is used to calculate weights within the physical access sub-domain. The main reasons why factor analysis has been used are: 

  • the indicators are on different metrics and have different levels of accuracy and so cannot simply be summed
  • to ascertain the factor that underlies the indicators within the domain
  • to help take into account the problem of ‘double counting’ within a domain

In the employment and income domains, we can identify individuals who are or are not deprived in terms of the domain definition. The number of deprived people can then simply be summed and divided by a suitable denominator to create an area rate. 

This is not possible in the other 6 domains, where forms of deprivation tend to present themselves in different ways at different times. For example, an individual is ‘health deprived’ if they die prematurely or are long-term sick. While the long-term sick may be more likely to die prematurely than others, these events do not occur to the same people at the same time. 

Typically, such domains include data on people at different ages and stages. For example, in the education domain, lack of qualifications in the adult population as well as poor results at school level were assessed. 

We hypothesise that there is an underlying factor at the local area level (e.g. health deprivation) that makes these different states likely to exist together in the same area. This underlying factor cannot be measured directly but can be identified through its effects on specific individual measures (e.g. premature death, long-term illness, low birth-weight children etc.). 

We have, therefore, collected several indicators that measure, with different levels of accuracy, the effects of this underlying factor. By looking at the relationship between all these indicators the underlying factor can be identified and quantified. 

Factor analysis also takes some account of the problem of ‘double-counting’ within domains that potentially contain indicators that overlap with each other. For example, in the health domain, it is possible for an individual to have had cancer and be included in the limiting long-term illness indicator. Combining data using other methods such as ‘z scores’ more directly double-weights these cases by taking them all into account. Factor analysis, however, takes some account of this overlap because an indicator may have a lower weight if the contribution it makes has already been taken into account. 

Maximum likelihood estimation 

WIMD 2025 follows the methodology of recent iterations and that applied by Oxford University for WIMD 2000, as well as the indices for the other 3 UK countries. 

Maximum Likelihood (ML) Factor Analysis was chosen as the method of estimation because it: 

  • is scale-invariant (unlike Principal Factoring)
  • accounts for measurement error (unlike Principal Components Analysis)
  • treats data as sampled from a super-population, which is consistent with project assumptions

Communality

Communality is the proportion of a variable's variance explained by a factor structure. A variable's communality must be estimated before performing a factor analysis, but not before performing a principal component analysis. Communality estimates are estimates of the proportion of common variance in a variable, and prior communality estimates are those made before the factor analysis. Common methods of prior communality estimation include: 

  • an independent reliability estimate
  • the squared multiple correlation between each variable and the other variables
  • the highest off-diagonal correlation for each variable
  • iteration by performing a sequence of factor analyses using the final communality estimates from one analysis as prior communality estimates for the next analysis 

Final communality estimates are the sum of squared loadings for a variable in an orthogonal factor matrix. 

The default setting for communality prior estimates, Squared Multiple Correlation, was used for WIMD 2025 calculations.

Calculation process

The indicators were first transformed to the standard normal distribution. The transformed indicators were then entered into a ‘one common factor Maximum Likelihood factor analysis’ using the fa function from the R package psych (Rdocumentation.org).

The weights component of the resultant factor analysis was extracted and used to produce the domain scores for the community safety, education, and health domains and the physical access sub-domain score within the access to services domain.
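The two steps around the factor analysis, transforming indicators to normal scores and applying the resulting weights, can be sketched as below. This sketch does not estimate the weights: in WIMD they come from the fa function in R, whereas the weights here are hypothetical placeholders. The rank-based plotting position (rank - 0.5) / n is also an assumed choice, as the exact normal transformation is not specified in this report.

```python
from statistics import NormalDist


def normal_scores(values):
    """Rank-based transformation of an indicator to standard normal scores.

    Uses the (rank - 0.5) / n plotting position, which is an assumed choice.
    Ties are not handled specially in this sketch.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for position, i in enumerate(order, start=1):
        scores[i] = NormalDist().inv_cdf((position - 0.5) / n)
    return scores


def factor_scores(indicators, weights):
    """Weighted sum of the transformed indicators for each area."""
    transformed = {name: normal_scores(vals) for name, vals in indicators.items()}
    n = len(next(iter(indicators.values())))
    return [
        sum(weights[name] * transformed[name][i] for name in indicators)
        for i in range(n)
    ]


# Made-up indicator values for 5 areas and hypothetical factor weights.
example_indicators = {
    "ind_a": [3.0, 9.0, 1.0, 7.0, 5.0],
    "ind_b": [20.0, 80.0, 10.0, 60.0, 40.0],
}
example_weights = {"ind_a": 0.6, "ind_b": 0.4}
example_factor_scores = factor_scores(example_indicators, example_weights)
```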

Annex 1.3: allocation of data to Lower layer Super Output Areas (LSOAs)

For most domains, indicator data are allocated to LSOAs by the data suppliers as part of the collection process. However, data is provided at a lower geographical level for some indicators in the access to services domain and for most indicators in the education and community safety domains. An explanation of how indicator data in these domains were allocated to LSOAs is provided below.

Education domain

Except for the indicator on adults with no or low qualifications, all new and updated indicator data in the education domain was provided at the postcode level. As many postcodes do not sit wholly within one LSOA, a method of apportioning postcode-level data to multiple LSOAs was used. 

In this apportionment method, data are weighted by the proportion of the dwellings in a postcode that sit within each intersecting LSOA. It is assumed that the proportions of dwellings in each intersecting LSOA will be broadly equivalent to the proportions of the postcode population living in each intersecting LSOA. Finally, where postcodes do sit wholly within one LSOA, data are given a weighting of 1. 

To apply this method, a postcode-to-LSOA lookup (with calculated dwelling weights) was developed by the Welsh Government Data and Geography team and initially used to match as many postcodes to LSOAs as possible.

Any non-matched postcodes were then matched using an ONS lookup that assigned postcodes to LSOAs based on the nearest geographical centre of an LSOA. This lookup was not weighted, so a default weight of 1 was assigned to all matched data. 

Any data still unmatched at this point was excluded from indicator calculations.
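The apportionment and matching steps above can be sketched as follows. The lookup structures, postcode labels and function name are hypothetical; the actual lookups developed by the Welsh Government and the ONS are not reproduced here.

```python
from collections import defaultdict


def apportion_to_lsoas(postcode_values, weighted_lookup, fallback_lookup):
    """Allocate postcode-level values to LSOAs.

    weighted_lookup: postcode -> {lsoa: share of the postcode's dwellings}
    fallback_lookup: postcode -> single lsoa (unweighted, so weight 1)
    Postcodes found in neither lookup are excluded and returned separately.
    """
    totals = defaultdict(float)
    unmatched = []
    for postcode, value in postcode_values.items():
        if postcode in weighted_lookup:
            # First pass: split the value by dwelling share.
            for lsoa, share in weighted_lookup[postcode].items():
                totals[lsoa] += value * share
        elif postcode in fallback_lookup:
            # Second pass: nearest-centre lookup, default weight of 1.
            totals[fallback_lookup[postcode]] += value
        else:
            unmatched.append(postcode)
    return dict(totals), unmatched


# Made-up postcode labels, values and lookups, for illustration only.
values = {"PC1": 10.0, "PC2": 4.0, "PC3": 2.0, "PC4": 5.0}
dwelling_weights = {"PC1": {"A": 0.75, "B": 0.25}, "PC2": {"B": 1.0}}
nearest_centre = {"PC3": "A"}
lsoa_totals, excluded = apportion_to_lsoas(values, dwelling_weights, nearest_centre)
```

Here PC1 straddles two LSOAs and is split 75:25, PC2 sits wholly within one LSOA, PC3 is matched only by the fallback lookup, and PC4 is excluded.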

Access to services domain

Service point location information (such as for schools and post offices) used in the travel time indicators was geocoded and allocated to LSOAs using a Geographic Information System (GIS). 

Ofcom data on ability to receive superfast broadband is publicly available at Output Area (OA) level from Ofcom’s Connected Nations reports. The spring 2025 dataset was aggregated to LSOA using an ONS geography lookup.

Community safety domain

Data on recorded crimes and incidents of anti-social behaviour were made available at microdata level by the four police forces in Wales. These datasets included information on the geographical location of occurrence (postcode and/or grid reference) which were assigned to LSOAs using a spatial smoothing method. Specifically, a 10m buffer was drawn around a crime/incident location and that crime/incident was shared equally by all LSOAs intersecting with the 10m buffer. This process was undertaken separately for every crime and incident record in the base microdata to account for known issues with the police data geocoding.
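The equal-sharing step of this smoothing method can be sketched as below. The sketch assumes the list of LSOAs intersecting each incident's 10m buffer has already been produced by a GIS; the incident labels, LSOA labels and function name are made up for illustration.

```python
from collections import defaultdict


def share_incidents(intersecting_lsoas_by_incident):
    """Share each incident equally across the LSOAs its buffer intersects."""
    counts = defaultdict(float)
    for lsoas in intersecting_lsoas_by_incident.values():
        if not lsoas:
            continue  # no intersecting LSOA found; skipped in this sketch
        for lsoa in lsoas:
            counts[lsoa] += 1.0 / len(lsoas)
    return dict(counts)


# Made-up incidents with the LSOAs intersected by each 10m buffer.
buffers = {
    "incident_1": ["A"],                 # buffer wholly within one LSOA
    "incident_2": ["A", "B"],            # buffer straddles a boundary
    "incident_3": ["A", "B", "C", "D"],  # buffer near a four-way junction
}
shared_counts = share_incidents(buffers)
```

The shares for each incident sum to 1, so the total count across LSOAs equals the number of incidents allocated.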

Data on the grid references of fire incidents were sourced from the Incident Recording System (IRS) and mapped to LSOAs by the Welsh Government Data and Geography team.