Data cleansing prepares census data so it is suitable for later statistical processes. It targets specific errors in the data.
There are four main data cleansing processes:
- remove false persons (RFP)
- resolve multiple responses (RMR)
- filter rules
- name reordering
Remove false persons
RFP looks for possible false person records in the census data. It does this using two methods.
The 2 of 6 rule
The 2 of 6 rule finds and removes false records from the census. It considers a person to be false if they do not answer at least 2 of 6 questions:
- Name in the individual form
- Name in the relationship matrix or household form
- Date of birth
- Relationship to others in household
- Marital status
At least one of the 2 answered questions must be a name or the date of birth.
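The rule above can be sketched in a few lines of Python. This is only an illustration, not the production method. The field names are hypothetical, and the sketch uses only the questions listed above, so any further question would need to be added to `OTHER_FIELDS`. It also assumes that either name field counts as "the name".

```python
# Hypothetical field names for the questions listed in the text.
KEY_FIELDS = ("name_individual", "name_household", "date_of_birth")
OTHER_FIELDS = ("relationship", "marital_status")


def is_answered(value):
    """Treat None, blank strings and the placeholder 'missing' as unanswered."""
    if value is None:
        return False
    text = str(value).strip()
    return text != "" and text.lower() != "missing"


def is_false_person(record, fields=KEY_FIELDS + OTHER_FIELDS):
    """Return True if the record fails the rule.

    A record passes when at least 2 questions are answered and at least
    one of the answered questions is a name or the date of birth.
    """
    answered = [f for f in fields if is_answered(record.get(f))]
    has_key_answer = any(f in KEY_FIELDS for f in answered)
    return not (len(answered) >= 2 and has_key_answer)
```

For example, a record with only a relationship and a marital status would be flagged as false, because neither answer is a name or a date of birth.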
Name check rule
The name check rule filters for obviously false names, like ‘Anonymous’ or ‘No one’.
We’ll change these names to ‘missing’ and put the census record through the 2 of 6 process.
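A minimal sketch of the name check could look like this. The list of false names holds only the two examples given above, and the field names are hypothetical. A real list would be much longer and the matching is likely to be more forgiving than the exact comparison assumed here.

```python
# Illustrative list only: the two examples given in the text.
FALSE_NAMES = {"anonymous", "no one"}


def apply_name_check(record, name_fields=("name_individual", "name_household")):
    """Return a copy of the record with obviously false names set to 'missing'."""
    cleaned = dict(record)
    for field in name_fields:
        value = cleaned.get(field)
        if value is None:
            continue
        # Ignore case and extra spaces when comparing against the list.
        normalised = " ".join(str(value).lower().split())
        if normalised in FALSE_NAMES:
            cleaned[field] = "missing"
    return cleaned
```

The cleaned record would then go through the 2 of 6 rule, where a name of ‘missing’ should count as unanswered.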
Resolve multiple responses
The RMR process finds and merges duplicate records. These could be records of:
- communal establishments
RMR involves matching census records against each other based on answers like name and date of birth. This lets us spot any potential duplicates within a postcode.
True duplicate records are merged into one record. We’ll keep both records if they look like different people.
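The matching and merging steps can be sketched as follows. This is a simplified illustration under stated assumptions: records are plain dictionaries with hypothetical keys (`id`, `name`, `date_of_birth`, `postcode`), a pair counts as a duplicate only when name and date of birth match exactly after tidying, and merging simply fills blanks in the first record from the second. The real process may use different matching and merging rules.

```python
from collections import defaultdict
from itertools import combinations


def normalise(text):
    """Lower-case and collapse whitespace so trivial differences do not block a match."""
    return " ".join(str(text or "").lower().split())


def find_duplicate_pairs(records):
    """Return (id, id) pairs that share a postcode, name and date of birth."""
    by_postcode = defaultdict(list)
    for rec in records:
        by_postcode[normalise(rec.get("postcode"))].append(rec)

    pairs = []
    for group in by_postcode.values():
        # Compare every record with every other record in the same postcode.
        for a, b in combinations(group, 2):
            name_a, name_b = normalise(a.get("name")), normalise(b.get("name"))
            dob_a, dob_b = a.get("date_of_birth"), b.get("date_of_birth")
            if name_a and dob_a and name_a == name_b and dob_a == dob_b:
                pairs.append((a["id"], b["id"]))
    return pairs


def merge_records(a, b):
    """Keep record a and fill any blank answers from record b."""
    merged = dict(a)
    for key, value in b.items():
        if merged.get(key) in (None, ""):
            merged[key] = value
    return merged
```

Pairs that are not returned by `find_duplicate_pairs` are treated as different people and both records are kept.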
Filter rules
Filter rules apply only to paper questionnaires. They fix problems with the answering path a person takes through the questionnaire.
Not everyone needs to answer each question. When a question does not apply, the questionnaire directs the person to the next question they need to answer. But sometimes people answer questions they do not need to.
For example, a person who says they own their home outright should skip the question that asks ‘Who is your landlord?’.
Filter rules check which questions have been answered. If an answer does not agree with the routing rules through the census questionnaire, we change it to ‘No Code Required’.
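The check can be sketched with a small table of routing rules. The rule shown is based on the landlord example above. The question names, the answer text and the rule format are all hypothetical, and the sketch only changes questions that were actually answered.

```python
NO_CODE_REQUIRED = "No Code Required"

# Each hypothetical rule: (routing question, answer that triggers the skip,
# questions that should then be skipped).
ROUTING_RULES = [
    ("tenure", "owns outright", ["landlord"]),
]


def apply_filter_rules(responses, rules=ROUTING_RULES):
    """Return a copy of the responses with off-route answers replaced."""
    cleaned = dict(responses)
    for question, skip_answer, skipped_questions in rules:
        if cleaned.get(question) != skip_answer:
            continue
        for skipped in skipped_questions:
            # Only overwrite questions the person answered when they should not have.
            if cleaned.get(skipped) not in (None, ""):
                cleaned[skipped] = NO_CODE_REQUIRED
    return cleaned
```

A person who rents keeps their landlord answer, because the routing rule is only triggered by the ‘owns outright’ answer.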
Name reordering
Name reordering applies only to paper questionnaires. It makes sure the names on a household form match the names on individual forms.
The census asks a person to add people to the household form in the same order that they will complete their individual forms. For example, person 1 in the household form should complete individual form 1.
Name reordering finds and corrects census questionnaires where the order of names on the household form does not match the order of the individual forms. It improves data quality for later processing.
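As a rough illustration only, the sketch below reorders individual forms to follow the household form. It assumes the household form order is the reference and that names match exactly once case and spacing are tidied. Neither assumption comes from the methodology itself, and the `name` key is hypothetical.

```python
def normalise(name):
    """Lower-case and collapse whitespace before comparing names."""
    return " ".join(str(name or "").lower().split())


def reorder_individual_forms(household_names, individual_forms):
    """Return the individual forms in the order the names appear on the household form.

    Forms that cannot be matched to a listed name keep their original
    relative order and are placed at the end.
    """
    remaining = list(individual_forms)
    ordered = []
    for listed_name in household_names:
        target = normalise(listed_name)
        match = next(
            (form for form in remaining if normalise(form.get("name")) == target),
            None,
        )
        if match is not None:
            ordered.append(match)
            remaining.remove(match)
    return ordered + remaining
```

After reordering, the first form in the returned list belongs to person 1 on the household form, where a match was found.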
Find out more
Find out more about name reordering in the External Methodology Assurance Panel document PMP005: Name Reordering Methodology.