National Challenge - Hiding Wally!
Sponsored by the Queensland Government on behalf of the:
- Office of the Information Commissioner Queensland
- Queensland Health
- Queensland Government Chief Information Office, Department of Science, Information Technology and Innovation
'Hiding Wally' – creating a simple-to-use universal dataset de-identification tool
There are privacy concerns associated with publishing datasets that contain the data of identifiable individuals. Datasets can be 'de-identified', but there are risks on both sides: if the de-identification process is too shallow, the data can easily be 're-identified'; if it is too thorough, it can devalue the dataset.
Australian privacy law strikes a balance between these two extremes: the de-identification process need only ensure that individuals' identities are not 'reasonably ascertainable'.
De-identification methodologies can be laborious or complex. At present, there is no simple-to-use universal tool available for the Australian jurisdiction that can be applied to a dataset to de-identify it.
Ideally, GovHackers will create a program that can 'wash' any given dataset of identifying attributes while retaining, to the greatest practical extent, the integrity of the remaining data.
There are no privacy issues when the identity of an individual is 'not apparent' from the dataset or where the identity 'cannot reasonably be ascertained' from the dataset.
For further guidance on this area see:
OIC dataset publication and privacy
OIC dataset publication and de-identification techniques
OIC dataset publication and risk assessment
NSS confidentiality principles
NSS confidentiality and privacy
OAIC de-identification of data and information
A useful overview of de-identification is the US publication: De-identification of Personal Information
The program must be able to:
- Wash any dataset supplied in at least one commonly used file format; and
- Be used by a novice
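
As a rough illustration of the two requirements above, the following is a minimal sketch of a 'wash' step for CSV, one commonly used file format. It is not a complete de-identification method. The column names in DIRECT_IDENTIFIERS, the "age" column and the helper names are all hypothetical, chosen only for this example. Dropping direct identifiers and banding ages does not by itself guarantee that identity cannot reasonably be ascertained, since combinations of the remaining columns (for example postcode and diagnosis) may still single someone out.

```python
import csv
import io

# Hypothetical list of column names treated as direct identifiers.
# A real tool would need a far more careful, configurable approach.
DIRECT_IDENTIFIERS = {"name", "email", "phone", "address"}


def generalise_age(value, width=10):
    """Replace an exact age with a band such as '30-39'.

    Values that are missing or not whole numbers become an empty string.
    """
    try:
        age = int(value)
    except (TypeError, ValueError):
        return ""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def wash_csv(text, direct_identifiers=DIRECT_IDENTIFIERS, age_column="age"):
    """Return CSV text with identifier columns dropped and ages banded."""
    reader = csv.DictReader(io.StringIO(text))
    # Keep only the columns whose (normalised) name is not a direct identifier.
    kept = [
        column
        for column in (reader.fieldnames or [])
        if column.strip().lower() not in direct_identifiers
    ]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=kept, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        washed = {column: row[column] for column in kept}
        for column in kept:
            if column.strip().lower() == age_column:
                washed[column] = generalise_age(washed[column])
        writer.writerow(washed)
    return out.getvalue()


sample = (
    "name,age,postcode,diagnosis\n"
    "Wally Smith,34,4000,asthma\n"
    "Jane Doe,unknown,4101,flu\n"
)
print(wash_csv(sample))
```

For a novice user, a function like this would typically sit behind a simple file picker or a single command that takes an input file and writes a washed copy, with sensible defaults so that no configuration is needed for a first run.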