This step is applied to census data to address specific errors and to prepare the data for later statistical processes.
The four main data cleansing processes are:
- Remove False Persons (RFP)
- Resolve Multiple Responses (RMR)
- Filter Rules
- Name Re-ordering – on paper questionnaires only
Remove False Persons
Remove False Persons, or RFP, looks for possible false person records in the census data. This is done using two methods:
- The 2 of 6 Rule
- Name Check Rule
The 2 of 6 Rule identifies and removes false records from the census dataset. Under this rule, a record is treated as a genuine person only if at least two of the following six questions have been answered, and at least one of those two is a name or date of birth question:
1. Name in the individual form
2. Name in the relationship matrix or household form
3. Date of Birth
4. Relationship to others in the household
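As an illustration, the 2 of 6 Rule can be sketched in Python. This is a minimal sketch, not the NRS implementation: the question identifiers are hypothetical, a blank or absent answer is assumed to count as unanswered, and because only four of the six questions are listed above, the example models just those four.

```python
# Hypothetical identifiers for the questions listed above. The rule counts six
# questions; this sketch only models the four named in the list.
NAME_INDIVIDUAL = "name_individual_form"
NAME_HOUSEHOLD = "name_household_form"
DATE_OF_BIRTH = "date_of_birth"
RELATIONSHIP = "relationship"

# Questions that satisfy the "name or date of birth" condition.
KEY_QUESTIONS = {NAME_INDIVIDUAL, NAME_HOUSEHOLD, DATE_OF_BIRTH}


def is_answered(value):
    """Treat None and blank strings as unanswered (an assumption)."""
    return value is not None and str(value).strip() != ""


def passes_two_of_six(record, rule_questions, key_questions=KEY_QUESTIONS):
    """Return True if the record meets the 2 of 6 Rule.

    record: dict mapping question identifier -> captured answer.
    rule_questions: the set of questions counted by the rule.
    """
    answered = {q for q in rule_questions if is_answered(record.get(q))}
    # At least two answers, and at least one of them a name or date of birth.
    return len(answered) >= 2 and bool(answered & key_questions)


RULE_QUESTIONS = {NAME_INDIVIDUAL, NAME_HOUSEHOLD, DATE_OF_BIRTH, RELATIONSHIP}
records = [
    {NAME_INDIVIDUAL: "Jane Smith", DATE_OF_BIRTH: "1980-04-02"},  # kept
    {RELATIONSHIP: "Son", NAME_HOUSEHOLD: ""},  # one answer only: removed
]
kept = [r for r in records if passes_two_of_six(r, RULE_QUESTIONS)]
```

Passing the counted questions in as a parameter keeps the full six-question set out of the code, so it can be supplied from the published specification rather than guessed.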
The Name Check Rule filters for obviously false names (such as ‘Anonymous’ or ‘No one’) and other indicators that a record does not represent a real person. False names are changed to ‘missing’, and the record then goes through the 2 of 6 process.
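A minimal Python sketch of the Name Check Rule follows. It assumes hypothetical name-field identifiers and uses only the two example false names given above; the full list of false names and other indicators is not reproduced here. ‘Missing’ is represented as `None`, and the cleaned record would then be passed to the 2 of 6 check.

```python
# Only the two example false names from the text; a real list would be longer.
FALSE_NAMES = {"anonymous", "no one"}

# Hypothetical identifiers for the name fields on a record.
NAME_FIELDS = ("name_individual_form", "name_household_form")


def normalise(name):
    """Lower-case and collapse whitespace so 'No  One' matches 'no one'."""
    return " ".join(name.lower().split())


def apply_name_check(record, false_names=FALSE_NAMES, name_fields=NAME_FIELDS):
    """Return a copy of record with obviously false names set to missing (None)."""
    cleaned = dict(record)
    for field in name_fields:
        value = cleaned.get(field)
        if isinstance(value, str) and normalise(value) in false_names:
            cleaned[field] = None  # "missing"; the record then goes through 2 of 6
    return cleaned


record = {"name_individual_form": "No One", "date_of_birth": "1975-01-01"}
cleaned = apply_name_check(record)
```

The function returns a copy rather than editing the record in place, so the original captured answer remains available for checking.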
More information on RFP can be found in the External Methodology Assurance Panel document PMP011: Remove False Persons (RFP).
Resolve Multiple Responses
The Resolve Multiple Responses (RMR) process attempts to identify and merge duplicate records in the census dataset. These can be duplicate records of communal establishments, households or people.
This process involves matching the census dataset to itself, based on key variables, such as name and date of birth. We can then identify potential duplicate records within a given postcode. These records are reviewed and are either merged into one record (if thought to be the same person) or both records are kept (if thought to be different people).
This process helps to reduce the potential overcount of the population and therefore the error in the census dataset.
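The self-matching step in RMR can be sketched as follows. This is a simplified illustration under stated assumptions: records are compared only within the same postcode, and two records are flagged as a candidate pair when their normalised name and date of birth agree exactly. The field names and example records are made up, and the matching rules actually used by NRS may differ (see PMP015). The sketch only flags candidates; the decision to merge is left to the review step.

```python
from collections import defaultdict
from itertools import combinations


def match_key(record):
    """Match key: normalised name plus date of birth (an assumption)."""
    name = " ".join(record["name"].lower().split())
    return (name, record["dob"])


def candidate_duplicates(records):
    """Self-match the dataset within each postcode and return candidate pairs.

    Returns (id_a, id_b) pairs whose name and date of birth agree. Whether a
    pair is merged or kept as two people is decided in a separate review.
    """
    by_postcode = defaultdict(list)
    for rec in records:
        by_postcode[rec["postcode"]].append(rec)

    pairs = []
    for group in by_postcode.values():
        # Compare every pair of records that share a postcode.
        for a, b in combinations(group, 2):
            if match_key(a) == match_key(b):
                pairs.append((a["id"], b["id"]))
    return pairs


records = [
    {"id": 1, "postcode": "XX1 1AA", "name": "Jane Smith", "dob": "1980-04-02"},
    {"id": 2, "postcode": "XX1 1AA", "name": "jane  smith", "dob": "1980-04-02"},
    {"id": 3, "postcode": "YY1 1AA", "name": "Jane Smith", "dob": "1980-04-02"},
]
pairs = candidate_duplicates(records)
```

Record 3 is not compared with the others because it has a different postcode, which is how restricting the search to a postcode keeps the number of comparisons manageable.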
More information on RMR can be found in the External Methodology Assurance Panel document PMP015: Resolve Multiple Responses.
Filter Rules
Filter Rules is a process which resolves issues in the answering path (routing) of a questionnaire. In 2022, this will be required for paper questionnaires.
When a census paper questionnaire is received and scanned, each question is captured and coded separately. Filter Rules are used to resolve routing errors or issues in a record once coded answers are combined again.
Not everyone is required to answer every question. Where a question does not apply, the questionnaire directs the respondent to the next appropriate question. However, people sometimes overlook this guidance or choose to answer the question anyway.
When the data is coded, blank questions are automatically marked as “Missing”. Filter Rules check which related questions have been answered, then change any responses that do not agree with the routing rules to “No Code Required”.
For example, the landlord and tenure filter rule resolves routing issues between the answers to question H12 (Does your household own or rent this accommodation?) and H13 (Who is your landlord?).
If either “Owns outright” or “Owns with a mortgage or loan” is ticked, question H13 should be skipped. The rule resolves cases where a respondent ticks one of these two options but goes on to answer Who is your landlord?
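The landlord and tenure rule can be sketched in Python as a simple check on each household record. The two owner options and the “No Code Required” label come from the text; the record layout (a dict keyed by question number) and the non-owner example answers are assumptions. The sketch also assumes that H12 holds a single coded answer, and it treats any value held for H13 by an owner, whether an answer or “Missing”, as not agreeing with the routing.

```python
# Answer labels from the text; the dict-per-household layout is an assumption.
OWNER_OPTIONS = {"Owns outright", "Owns with a mortgage or loan"}
NO_CODE_REQUIRED = "No Code Required"


def landlord_tenure_rule(record):
    """Apply the landlord and tenure filter rule to one household record.

    Owners should skip H13 (Who is your landlord?), so whatever is held for
    H13, an answer or "Missing", is replaced with "No Code Required".
    """
    cleaned = dict(record)
    if cleaned.get("H12") in OWNER_OPTIONS:
        cleaned["H13"] = NO_CODE_REQUIRED
    return cleaned


# Illustrative records; "Rents" and "Private landlord" are made-up values.
households = [
    {"H12": "Owns outright", "H13": "Private landlord"},  # answered H13 anyway
    {"H12": "Owns with a mortgage or loan", "H13": "Missing"},  # skipped H13
    {"H12": "Rents", "H13": "Missing"},  # renter left H13 blank: unchanged
]
cleaned = [landlord_tenure_rule(h) for h in households]
```

The renter's blank H13 stays “Missing” because the routing required an answer there, so it is a genuine gap rather than a routing issue.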
Name Re-ordering
This process compares the names on the household form to the names on the individual forms and tries to make sure they match up correctly.
On a census return, respondents are asked to list people on the household form in the same order in which they complete their individual forms. However, this does not always happen.
The Name Re-ordering process was developed to identify and correct cases where names do not appear in the same order on the household and individual forms. This improves data quality for later processing steps.
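Name Re-ordering can be sketched as matching each name on the household form to an individual form and rebuilding the list in household-form order. This sketch assumes exact matching on lower-cased, whitespace-normalised names and a hypothetical `name` field on each form; real responses would need to cope with spelling variations, and the approach set out in PMP005 may differ. Forms that cannot be matched are kept at the end rather than dropped.

```python
def normalise(name):
    """Lower-case and collapse whitespace before comparing names."""
    return " ".join(name.lower().split())


def reorder_individual_forms(household_names, individual_forms):
    """Reorder individual forms so they follow the household-form order.

    household_names: names in the order listed on the household form.
    individual_forms: list of dicts, each with a hypothetical "name" key.
    Unmatched forms are appended at the end in their original order.
    """
    remaining = list(individual_forms)
    ordered = []
    for name in household_names:
        target = normalise(name)
        for i, form in enumerate(remaining):
            if normalise(form["name"]) == target:
                # Take the first matching form and stop searching for this name.
                ordered.append(remaining.pop(i))
                break
    return ordered + remaining


household = ["Jane Smith", "Tom Smith", "Amy Smith"]
forms = [{"name": "Amy Smith"}, {"name": "jane smith"}, {"name": "Tom Smith"}]
result = reorder_individual_forms(household, forms)
```

Keeping unmatched forms, rather than discarding them, leaves them available for the later processing steps that the re-ordering is meant to support.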
More information on Name Re-ordering can be found in the External Methodology Assurance Panel document PMP005: Name Reordering Methodology.
Statistical Methodology Rehearsal 2020
From April to June 2020, National Records of Scotland (NRS) carried out a rehearsal of several statistical methodologies for Scotland’s Census 2022. The data cleansing methodologies were tested during this rehearsal.
The evaluation reports from this rehearsal will be published on the Statistical Methodology Rehearsal 2020 web page in September 2020.
Statistical Quality Assurance
NRS has published a Statistical Quality Assurance Strategy (PDF 1MB) which provides more information on how these data cleansing processes, and others, will be quality assured.
NRS aims to harmonise statistical methodologies with other UK census offices as much as possible. We share ideas and provide feedback on methodologies through harmonisation working groups with the Office for National Statistics (ONS) and the Northern Ireland Statistics and Research Agency (NISRA).
For more information on how NRS work with the other UK censuses to harmonise our statistical methodology for UK data users, as well as share best practice and lessons learned, please see the UK Census Data tab of this website.
In February 2020, NRS ran Statistical Methodology Stakeholder events aimed at the general public and data users. These events gave attendees a high-level overview of what happens to census data, from when NRS receives census responses through to producing the outputs.
During these events, we sought feedback to help to further develop our plans to ensure the highest quality of outputs for our users. Slides from the event (PDF 4.5MB) are published on the event page of the Scotland’s Census 2022 website.
NRS are seeking feedback on the content provided on these pages to help us provide the information most relevant to you. To share your views and information needs, please contact firstname.lastname@example.org.