


  • Masking
    Masking generally describes any technique that replaces the values of a given attribute without any consideration of the original value. This ensures that no information about the protected data can be inferred after masking. There are a number of ways to perform masking, with substitution and nulling out being the most common. Substitution masking replaces the values of an attribute with values selected at random from a predetermined substitution table. For example, when protecting dates of birth, a random selection of dates can be used as the substitution table. This technique preserves the data type, making it particularly useful when the protected data is used to test applications. Nulling out, sometimes referred to as masking out, is another powerful protection technique in which the attribute value is replaced with a fixed value (e.g. turning a credit card number into xxxx-xxxx-xxxx-xxxx). Again, this method allows length or type to be maintained, and it clearly signals to any data user that the data has been protected. It can also be applied to only part of a value, for example leaving the last four digits of a credit card number visible (xxxx-xxxx-xxxx-1234). Example data: We can also improve the privacy of this data set by nulling out the ID values.
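The two masking techniques above can be sketched as follows. This is a minimal illustration, not a production implementation; the records, IDs and substitution dates are all invented for the example.

```python
import random

# Hypothetical example records (invented for illustration).
records = [
    {"id": "A123", "name": "Rose", "dob": "1990-04-12"},
    {"id": "B456", "name": "Julia", "dob": "1985-11-30"},
]

# Predetermined substitution table of valid dates for substitution masking.
substitution_table = ["1970-01-01", "1980-06-15", "1995-09-23", "2001-03-07"]

def mask_records(rows, rng=random.Random(42)):
    masked = []
    for row in rows:
        masked.append({
            "id": "xxxx",                          # nulling out: fixed value
            "name": row["name"],
            "dob": rng.choice(substitution_table), # substitution: random valid date
        })
    return masked

masked = mask_records(records)
```

Note that the masked dates remain valid dates, so the data type survives for application testing, while the fixed "xxxx" ID signals that the field has been protected.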
  • Local suppression
    Local suppression is the process of applying masking to particular values to increase privacy without having to mask every value of that attribute. It is usually applied to rare values in a data set that increase the risk of re-identification. For example, when looking at people's income, very high earners and very low earners could have their income masked, as these values are less common. Example data: If we further consider the data set with the direct identifiers nulled out, we can see that there are a few outliers in terms of earnings and age. This puts these individuals at increased risk of re-identification. We can remove these values using local suppression.
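As a sketch of local suppression on the income example above (the salaries and the outlier thresholds are assumptions for illustration):

```python
# Hypothetical salary records; Gwen is a rare high earner.
people = [
    {"name": "Ann", "salary": 32000},
    {"name": "Bob", "salary": 35000},
    {"name": "Gwen", "salary": 250000},  # outlier: increased re-identification risk
]

def suppress_outliers(rows, attr, low, high, token=None):
    """Mask only values outside [low, high], leaving common values intact."""
    out = []
    for row in rows:
        row = dict(row)
        if not (low <= row[attr] <= high):
            row[attr] = token
        out.append(row)
    return out

suppressed = suppress_outliers(people, "salary", 20000, 60000)
```

Only the rare value is suppressed; the common salaries keep their full utility.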
  • Record suppression
    Record suppression involves masking all the values for particular individuals within a data set, for example because they have opted out of having their data used. Example data: All the people, with the exception of Gwen and Peter, may have given permission for their data to be used. Record suppression can then be used to remove those individuals' entries.
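The opt-out example above can be sketched as a simple filter (the consent flags are invented for illustration):

```python
# Hypothetical consent flags; Gwen and Peter have opted out, as in the example.
people = [
    {"name": "Rose", "consented": True},
    {"name": "Gwen", "consented": False},
    {"name": "Julia", "consented": True},
    {"name": "Peter", "consented": False},
]

# Record suppression: drop every record for individuals who have not consented.
released = [row for row in people if row["consented"]]
```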
  • Anatomisation
    The process of anatomisation separates an attribute from the main data. This removes the link between a sensitive attribute and data that could be used to identify the person it relates to. This method can offer both high utility and privacy but does mean the link between an attribute and other information in the data set is lost. Example data: We can consider an example where a data scientist is interested in investigating gender pay gaps. The information required for this analysis allows anatomisation to be used to isolate the attributes that are of interest.
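The gender-pay-gap example above can be sketched by splitting one table into two unlinked views (the table contents are invented for illustration):

```python
# Hypothetical table; the analyst only needs gender and salary.
table = [
    {"name": "Rose", "gender": "F", "salary": 41000},
    {"name": "Peter", "gender": "M", "salary": 45000},
]

# Anatomisation: separate the attributes of interest from the identifying
# attributes, discarding the row-level link between the two views.
analysis_view = [{"gender": r["gender"], "salary": r["salary"]} for r in table]
identifier_view = sorted(r["name"] for r in table)  # sorted: no positional link
```

The analysis view retains full utility for the pay-gap question, while the identifiers can no longer be joined back to the sensitive values.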
  • Generalisation
    When considering a data set, privacy can often be improved, and utility retained, by reducing the specificity of the data. Generalisation achieves this by replacing a value with a range (e.g. age ranges). While it is most easily applied to numerical data types, generalisation can also be applied to other data; for example, towns can be generalised to countries. The purpose of techniques like this is to make any combination of identifying features more common, thereby reducing the risk of re-identification. Example data: After nulling out the name attribute, privacy can be further increased by reducing the specificity of the age QID attribute. In this example a 10-year age range is used. By increasing or decreasing the age range, the specificity of the data can be further controlled.
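The 10-year age banding above can be sketched as follows; the ages are invented, and the band width is the tunable privacy/utility parameter:

```python
def generalise_age(age, band=10):
    """Replace an exact age with a range, e.g. 34 -> '30-39' for band=10."""
    low = (age // band) * band
    return f"{low}-{low + band - 1}"

# Hypothetical ages; widening `band` trades utility for privacy.
ages = [23, 34, 39, 61]
generalised = [generalise_age(a) for a in ages]
```

After generalisation, the two people in their thirties share the value "30-39", making that combination of identifying features more common.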
  • Pseudonymisation
    Pseudonymisation replaces values with pseudonyms. A key differentiator of this technique is that it is reversible: the link between the pseudonyms and the original values is kept in a separate location, which can later be used to reverse the pseudonymisation process. A data set that has been pseudonymised cannot, however, be considered anonymised, and as such requires different management. Pseudonymisation can be applied with different policies, resulting in different levels of privacy and utility:
    Deterministic: Each value is mapped to a pseudonym and replaced by that pseudonym every time it appears in the data set. This retains information such as the distribution of values. While this can offer analysts insight into the data, the same distribution information can also be used to re-identify values.
    Document randomised: In this policy pseudonyms are assigned randomly, so the same value in the data will not always map to the same pseudonym. If the same data set is pseudonymised again using this policy, however, the output will look the same.
    Fully randomised: This policy behaves much like the document randomised policy, with the key difference that each time a data set is pseudonymised it will generate a different output. While this further increases privacy, it prevents any single value from being tracked, which may be required for tasks such as application testing.
    There are a number of ways to implement pseudonymisation, depending on the use case:
    Encryption: Encryption-based pseudonymisation creates a pseudonym for each value by encrypting it. The behaviour can be set to deterministic, document randomised or fully randomised by using a different salt before encryption. It is important to note that the use of deterministic encryption might not meet the privacy requirements of all regulatory bodies.
    Counter: While this is a simple technique, it can be highly effective. Each value is replaced by a count (1, 2, 3, etc.). The nature of this technique means it is best suited to either the deterministic or the document randomised policy.
    Random number generator: This technique is better suited to the fully randomised policy and replaces each value with a randomly generated number.
    Example data: Deterministic counter: Here we can still see that Julia and Rose appear twice within our data set, as Rose is pseudonymised by ‘3’ and Julia by ‘2’. Random number
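The deterministic counter policy can be sketched as follows. The name column is invented for illustration; the key points are that repeats map to the same pseudonym (preserving the distribution) and that the mapping table must be stored separately so the process can be reversed.

```python
from itertools import count

def counter_pseudonymise(values):
    """Deterministic counter policy: each distinct value gets the next counter
    value; repeated values reuse the same pseudonym."""
    counter = count(1)
    table = {}        # pseudonym mapping: keep in a separate, secured location
    pseudonyms = []
    for v in values:
        if v not in table:
            table[v] = next(counter)
        pseudonyms.append(table[v])
    return pseudonyms, table

# Hypothetical name column; repeated names reveal the preserved distribution.
names = ["Adam", "Julia", "Rose", "Julia", "Rose"]
pseudonyms, mapping = counter_pseudonymise(names)
```

Because the policy is deterministic, an analyst can still see that two names each appear twice, and the stored mapping allows the original values to be recovered later.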

