Scotland's Census 2022 - Linking Census to Administrative Data: Veteran and Age Checks
Background
National Records of Scotland (NRS) is responsible for Scotland’s Census. The census takes place every ten years and provides information on all people and households in Scotland. All households in Scotland are required to complete a census return for all usually resident persons.
The cleansing phase of the processing ensures that the quality of census data received is as high as possible. For various reasons, data items may not be accurate or complete. Respondents could have missed a question (deliberately or in error), or there could be scanning errors with paper forms. Unsubmitted returns from the Online Collection Instrument may be incomplete and not contain information on various questions. Although data from administrative sources is not added to the census, administrative data can be used to guide the handling of such problems.
1.1 Age Check
Date of birth is used to derive age at census day, one of the most important census variables and one that is used in the majority of census outputs. However, not all records in the 2022 dataset contained a date of birth, and some contained a date of birth that was inconsistent with other information provided on the census return. This could be due to scanning errors, deliberate non-response, or unsubmitted returns.
This paper is an update to the original External Methodology Assurance Panel (EMAPs) paper, which was published before data processing started. The Date of Birth check was carried out as described in the EMAPs Paper.
1.2 Veteran Check
Scotland’s Census 2022 asked a question on whether respondents had previously served in the UK Armed Forces. This information is needed to help users and service providers support veterans and their families, in line with the Armed Forces Covenant.
People who are currently serving should answer ‘No’ to the question ‘Have you previously served in the UK Armed Forces?’. However, testing showed that some currently serving members would answer ‘Yes’ to this question, as they see themselves as having previously served. Therefore, to obtain accurate information on individuals who had previously served but were not serving on census day, the responses of those who were serving on census day needed to be changed to ‘No’.
In order to do this, an extract of the Joint Personnel Administration (JPA) dataset was provided by the Ministry of Defence (MoD). This dataset included information on every currently serving member of the UK Armed Forces. This included name, date of birth and sex, but did not include any location information. This dataset was then linked to the census data. Any person who indicated that they had previously served, but also appeared on the currently serving data, would then have their response corrected from ‘Yes’ to ‘No’.
NRS did test a question that included an option for ‘Currently Serving’, and liaised with the MoD on this. The MoD concluded that this option posed a potential security risk as it would have identified currently serving personnel, so it was not included in the final question.
Age Check
The age check was introduced to improve the Edit and Imputation process.
It is vital that all census records contain date of birth and age, and that these are as accurate as possible. Date of birth may be missing due to deliberate non-response or the question being skipped by mistake. Date of birth may also be incorrect in the census dataset due to scanning errors, or to respondents making mistakes when inputting data, leading to inconsistent or implausible data.
The age check linked the census dataset to the National Health Service Central Register (NHSCR) using name, postcode and sex, and compared the census date of birth (where present) with the NHSCR date of birth. If the census date of birth was missing, or the age derived from it differed from the administrative age, the administrative age was added to the census dataset for use in Edit and Imputation. The administrative age was not used to populate the main census age variable; instead, it was used during Edit and Imputation to help select a similar donor.
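The decision rule in the paragraph above can be sketched in Python. This is a minimal illustration, not the NRS implementation: it assumes the NHSCR link has already been made, takes census day as 20 March 2022, and, following the description above, compares ages derived from the two dates of birth rather than the dates themselves. The function names are hypothetical.

```python
from datetime import date
from typing import Optional

# Census day for Scotland's Census 2022 (20 March 2022).
CENSUS_DAY = date(2022, 3, 20)


def age_at(dob: date, on: date = CENSUS_DAY) -> int:
    """Age in completed years on the given date."""
    had_birthday = (on.month, on.day) >= (dob.month, dob.day)
    return on.year - dob.year - (0 if had_birthday else 1)


def admin_age_for_edit(census_dob: Optional[date], nhscr_dob: date) -> Optional[int]:
    """Return the NHSCR-derived age to attach for Edit and Imputation.

    Returns None when the census age already agrees with the NHSCR age.
    The main census age variable is never overwritten by this function.
    """
    admin_age = age_at(nhscr_dob)
    if census_dob is None:
        return admin_age  # date of birth missing on the census return
    if age_at(census_dob) != admin_age:
        return admin_age  # census and administrative ages disagree
    return None


# A census date of birth of 1908 (e.g. a scanning error) against an NHSCR date of 1980.
print(admin_age_for_edit(date(1908, 6, 1), date(1980, 6, 1)))  # prints 41
```

Because the result is stored alongside, not in place of, the census age, the donor search in Edit and Imputation can use it while the respondent's own answer is left untouched.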
If name, postcode and sex matched exactly, the link was automatically accepted. A small proportion of links that were strong but not exact matches were sent to clerical review, where a reviewer determined whether the two records represented the same person. Non-exact matches were generally due to a person using a nickname on the census form (e.g. Joe rather than Joseph), differences in spelling, or a middle name being included in one dataset but not the other.
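A minimal sketch of the split between automatic acceptance and clerical review for the age-check links. The nickname table, the 0.8 similarity cut-off and the use of Python's `difflib.SequenceMatcher` are all assumptions for illustration; the paper does not say how near matches were scored. For simplicity, this sketch also requires postcode and sex to agree before a pair is considered at all.

```python
from difflib import SequenceMatcher
from typing import NamedTuple

# Hypothetical nickname lookup; a production list would be much larger.
NICKNAMES = {"joe": "joseph", "jim": "james", "kate": "katherine"}

# Hypothetical cut-off for sending a near match to clerical review.
REVIEW_THRESHOLD = 0.8


class Person(NamedTuple):
    forename: str
    surname: str
    postcode: str
    sex: str


def _first_forename(forename: str) -> str:
    """First forename only (so a middle name is ignored), nickname-expanded."""
    parts = forename.lower().split()
    first = parts[0] if parts else ""
    return NICKNAMES.get(first, first)


def classify_age_link(census: Person, nhscr: Person) -> str:
    """Return 'accept', 'clerical' or 'reject' for a census/NHSCR pair."""
    same_postcode = (census.postcode.replace(" ", "").upper()
                     == nhscr.postcode.replace(" ", "").upper())
    if not same_postcode or census.sex != nhscr.sex:
        return "reject"
    if (census.forename.lower().strip() == nhscr.forename.lower().strip()
            and census.surname.lower().strip() == nhscr.surname.lower().strip()):
        return "accept"  # exact match on name, postcode and sex
    # Near match: compare normalised names to catch nicknames, spelling
    # differences and middle names; a reviewer makes the final decision.
    a = f"{_first_forename(census.forename)} {census.surname.lower().strip()}"
    b = f"{_first_forename(nhscr.forename)} {nhscr.surname.lower().strip()}"
    if SequenceMatcher(None, a, b).ratio() >= REVIEW_THRESHOLD:
        return "clerical"
    return "reject"


print(classify_age_link(Person("Joe", "Smith", "EH1 1AA", "M"),
                        Person("Joseph", "Smith", "EH11AA", "M")))  # clerical
```

Note that "Joe" and "Joseph" normalise to the same name but are still sent to review rather than accepted, mirroring the rule that only exact matches were accepted automatically.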
Veteran Linking
The linking method was based on the one used for Census–Census linking. More information on that linking methodology can be found at: Scotland's Census 2022 - Census–Census Linking and Overcount Correction – Final Methodology
Only individual census records that indicated they had previously served, or that did not answer the ex-service question, were linked to the MoD dataset. Individuals who answered ‘No’ to the ex-service question were not included in the linking process. Individuals aged under 16 on census day were also excluded from the linking.
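The eligibility rule for the veteran linking can be expressed as a simple filter. The record structure below is hypothetical and assumes that age on census day has already been derived for each record.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class CensusRecord:
    record_id: str
    age: int                   # age on census day
    ex_service: Optional[str]  # "Yes", "No", or None if unanswered


def veteran_linking_subset(records: Iterable[CensusRecord]) -> List[CensusRecord]:
    """Keep only the records eligible for linking to the MoD extract."""
    return [
        r for r in records
        if r.ex_service != "No"  # answered 'Yes' or left the question blank
        and r.age >= 16          # aged 16 or over on census day
    ]


records = [
    CensusRecord("c1", 34, "Yes"),
    CensusRecord("c2", 52, None),
    CensusRecord("c3", 41, "No"),
    CensusRecord("c4", 15, None),
]
print([r.record_id for r in veteran_linking_subset(records)])  # ['c1', 'c2']
```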
This subset of the census data was then linked to the MoD data to identify any currently serving members of the armed forces among those who had reported previous service or left the question blank. Records were linked on name and date of birth, and each candidate pair was given a score reflecting how likely it was that the two records belonged to the same person. This score was then used to determine whether to accept or reject the link. If name and date of birth matched exactly between the census and MoD data, the link was automatically accepted.
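The scoring step can be illustrated with a toy score. This is not the NRS method, which follows the Census–Census linking methodology referenced above; here a weighted combination of `difflib.SequenceMatcher` name similarity and partial date-of-birth agreement stands in for it, and the weights and review threshold are hypothetical. Exact matches on name and date of birth are accepted, as described; other pairs go to clerical review or are rejected depending on the score.

```python
from datetime import date
from difflib import SequenceMatcher

# Hypothetical weights and cut-off, for illustration only.
NAME_WEIGHT = 0.6
DOB_WEIGHT = 0.4
REVIEW_THRESHOLD = 0.75


def dob_agreement(a: date, b: date) -> float:
    """1.0 if identical, 0.5 if exactly one of day/month/year differs, else 0.0."""
    differing = (a.day != b.day) + (a.month != b.month) + (a.year != b.year)
    return {0: 1.0, 1: 0.5}.get(differing, 0.0)


def link_score(census_name: str, census_dob: date, mod_name: str, mod_dob: date) -> float:
    """Weighted similarity between a census record and an MoD record."""
    name_sim = SequenceMatcher(None, census_name.lower(), mod_name.lower()).ratio()
    return NAME_WEIGHT * name_sim + DOB_WEIGHT * dob_agreement(census_dob, mod_dob)


def link_decision(census_name: str, census_dob: date, mod_name: str, mod_dob: date) -> str:
    """Return 'accept', 'clerical' or 'reject' for a candidate pair."""
    if census_name.lower() == mod_name.lower() and census_dob == mod_dob:
        return "accept"  # exact match on name and date of birth
    if link_score(census_name, census_dob, mod_name, mod_dob) >= REVIEW_THRESHOLD:
        return "clerical"  # plausible but not certain: refer to a reviewer
    return "reject"


dob = date(1985, 4, 12)
print(link_decision("Alan Smith", dob, "Allan Smith", dob))  # clerical
```

A real implementation would typically tune the thresholds against clerically reviewed samples; the values here are placeholders.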
Some census records linked to multiple MoD records. The ex-service status of these records was still updated: whichever MoD record was the correct link, the census record matched a currently serving person, so the end result would be the same.
Records that linked but were not automatically accepted or rejected were sent to clerical review, where a reviewer decided whether the two records appeared to be the same person or different people.
Once the linking and clerical review had taken place, all census records that linked to the MoD data had their ex-service status updated to ‘No’, so that currently serving personnel would not be grouped with those who had previously served in the analysis. The original responses to this question were deleted from the census dataset, so they cannot be used to infer who is currently serving.
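The final update can be sketched as follows, assuming the accepted links (including those confirmed at clerical review) are available as hypothetical (census ID, MoD ID) pairs. Collecting the linked census IDs into a set means a census record linked to several MoD records is simply set to ‘No’ once, consistent with the point that it does not matter which MoD record was the correct link.

```python
from typing import Dict, Iterable, Optional, Tuple


def apply_ex_service_update(
    ex_service: Dict[str, Optional[str]],
    accepted_links: Iterable[Tuple[str, str]],
) -> Dict[str, Optional[str]]:
    """Return a new mapping in which every census record with at least one
    accepted link to the MoD extract has its ex-service status set to 'No'.

    The input mapping is not modified; deleting the original responses,
    as described above, is left to the caller.
    """
    # A census ID linked to several MoD records appears only once here.
    linked = {census_id for census_id, _mod_id in accepted_links}
    return {
        rid: ("No" if rid in linked else answer)
        for rid, answer in ex_service.items()
    }


ex_service = {"c1": "Yes", "c2": None, "c3": "Yes"}
links = [("c1", "m10"), ("c1", "m11"), ("c2", "m20")]
print(apply_ex_service_update(ex_service, links))
# {'c1': 'No', 'c2': 'No', 'c3': 'Yes'}
```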
Conclusion
The Date of Birth check ran successfully, and updated approximately 80,000 census records to include administrative age where date of birth was missing, incomplete, or differed from the administrative date of birth. Including administrative age on these records meant that imputed ages are more likely to represent the true age of an individual, improving data quality.
The census data has also been updated so that currently serving personnel are not included in the ex-service outputs. This will provide users and service providers with an accurate measure of the number of people who have previously served in the UK Armed Forces but were not serving on census day.