AI Development: a data protection toolkit
As artificial intelligence (AI) technology continues to advance, those involved in producing or developing AI increasingly need to understand how data protection law applies to their work. Google’s recent launch of PaLM 2, its next-generation large language model designed to excel at tasks such as common-sense reasoning, mathematics, logic, and even coding, signals a transformative shift in AI capabilities. Google is also developing Med-PaLM, a large language model (LLM) designed to provide high-quality answers to medical questions. These new AI developments rely heavily on personal data, so it is essential for developers of such technology to operate within the boundaries of the UK GDPR to avoid large fines.
How does the AI data protection toolkit work?
This toolkit is designed to help you spot the risk areas where AI development could put you in breach of the UK GDPR. It draws on the guidelines published by the Information Commissioner’s Office (ICO) to provide an initial overview of the data protection risks associated with operating AI models, divided into the four main stages of AI development. In each of the four stages below, we have referred to the pressure points highlighted by the ICO for breaching the UK GDPR. We have also suggested practical steps to reduce those risks.
Who is this toolkit for?
The toolkit has been curated to assist anyone operating in the AI industry. Whether you are creating your own LLMs or making use of existing ones, the toolkit provides you with some initial guidance on the aspects you must account for to comply with data protection law.
Stage 1: Business requirements and design
The design stage in AI development, which involves detailing business needs and system architecture, is critical to the AI model’s ultimate commercial success. This phase gives the AI its required ‘intelligence’ by aligning the model with business objectives and functionalities. The aim is that, by uncovering issues and adjusting the AI design to suit the business context, the resulting system effectively addresses the organisation’s unique needs before going to market.
Issue | Data protection risk | Risk reduction measure |
---|---|---|
AI systems produce unfair outcomes for individuals | Individuals suffer from unfair treatment and/or discrimination. You risk being in breach of the UK GDPR’s principle of fairness | |
Excessive collection of personal data | Individuals suffer from unlawful processing of their personal data by the AI system. You risk being in breach of the data minimisation and storage limitation provisions of the UK GDPR | |
Tokenistic human review of AI system outputs | Individuals may suffer from prohibited processing or unfair decisions being made about them due to solely automated decision-making by the AI system. You risk being in breach of the individual’s right to object to automated individual decision-making under the UK GDPR | |
Stage 2: Data acquisition and preparation
Data acquisition and preparation is a crucial stage in developing AI systems. It involves collecting and curating relevant data to train the AI models. This stage is the foundation for what the AI model will be capable of achieving once deployed as it will rely on this information to produce valid responses. However, careful consideration is required to ensure the data is accurate, adequate, and relevant to the system’s purpose.
Issue | Data protection risk | Risk reduction measure |
---|---|---|
Failing to choose an appropriate lawful basis for the AI system’s data processing | Unlawful collection of personal data leading to loss of trust and unfair processing. You risk being in breach of the requirements for lawful processing and transparency under the UK GDPR | |
Bias in the AI system output | Bias in the AI system’s decision making caused by insufficient bias analysis, resulting in individuals suffering from discriminatory outcomes. You risk being in breach of the UK GDPR’s principle of fairness | |
Collection of too much personal data to train the AI system | Individuals suffer from unlawful processing of their personal data by the AI system. You risk being in breach of the data minimisation provisions of the UK GDPR | |
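To make the data minimisation point concrete, here is a minimal sketch of a pre-processing step that keeps only the fields a model actually needs and replaces a direct identifier with a salted pseudonym. All field names, the salt handling, and the record structure are illustrative assumptions, not a prescribed approach; real pipelines would manage keys securely and document the choice of fields in a data protection impact assessment.

```python
import hashlib

# Hypothetical raw training records: only 'age_band' and 'outcome' are
# needed to train the model; direct identifiers are dropped, and the user
# id is replaced with a salted hash so records can still be linked if a
# data subject exercises their rights. Field names are illustrative.
RAW_RECORDS = [
    {"user_id": "u001", "name": "A. Smith", "email": "a@example.com",
     "age_band": "30-39", "outcome": 1},
    {"user_id": "u002", "name": "B. Jones", "email": "b@example.com",
     "age_band": "40-49", "outcome": 0},
]

NEEDED_FIELDS = {"age_band", "outcome"}  # the minimum the model requires
SECRET_SALT = b"rotate-me"  # placeholder; in practice use a managed key

def pseudonymise(user_id: str) -> str:
    """Replace a direct identifier with a truncated salted hash."""
    return hashlib.sha256(SECRET_SALT + user_id.encode()).hexdigest()[:16]

def minimise(record: dict) -> dict:
    """Keep only the fields needed for training, plus a pseudonym."""
    out = {k: v for k, v in record.items() if k in NEEDED_FIELDS}
    out["pseudo_id"] = pseudonymise(record["user_id"])
    return out

training_set = [minimise(r) for r in RAW_RECORDS]
```

The design choice here is to minimise at the point of ingestion, so the training pipeline never holds names or email addresses at all, rather than filtering them out later.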
Stage 3: Training and testing
This stage of developing AI is key to creating a commercially successful model. Studies have shown that many AI projects fail, and this is often down to shortcomings at this stage of development. Training gives the AI the ‘intelligence’ it needs to perform as intended, while testing reveals any latent issues, which can then be addressed before the product goes to market.
Issue | Data protection risk | Risk reduction measure |
---|---|---|
Function creep by learning algorithm (where the functionality diverges from the intended purpose or use) | The purpose for which you originally collected the personal data is no longer the same as the way the data is now used. You risk being in breach of the purpose limitation restrictions in the UK GDPR | |
Overfitting (where the learning algorithm becomes too focused on the training data and is not able to generalise and adapt to other situations) | If you do not take steps to ensure an appropriate level of security, individuals may suffer from personal data breaches. You risk being in breach of the UK GDPR’s requirements around ensuring the security of the personal data you process | |
Security (insufficient security tests) | If you do not take steps to ensure an appropriate level of security, individuals may suffer from personal data breaches. You risk being in breach of the UK GDPR’s requirements around ensuring the security of the personal data you process | |
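As a practical illustration of spotting overfitting at the testing stage, a simple and widely used symptom is a large gap between accuracy on the training data and accuracy on held-out validation data. The sketch below assumes you already record both figures for each training run; the 0.10 threshold is an illustrative choice, not a rule.

```python
# A minimal overfitting check. A model that scores far better on its
# training data than on unseen data has likely memorised the training
# set, which (for models trained on personal data) can also create a
# security concern, e.g. the risk of training data being extracted.

def overfitting_gap(train_acc: float, val_acc: float) -> float:
    """Return the train/validation accuracy gap."""
    return train_acc - val_acc

def is_overfitting(train_acc: float, val_acc: float,
                   threshold: float = 0.10) -> bool:
    """Flag runs where training accuracy far exceeds validation accuracy."""
    return overfitting_gap(train_acc, val_acc) > threshold

# Usage: 99% on training data but only 71% on unseen data would be flagged
# for remediation (e.g. regularisation or more diverse training data),
# while 88% vs 85% would not.
```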
Stage 4: Deployment and monitoring
Deployment is when you integrate the learning algorithm into the real world so that it can be used by others. However, deployment should not be the final step in this process. Unlike conventional software, where the code is deterministic and (should) continue to run as written, AI is dynamic and susceptible to changes in the real world. This means you need to keep monitoring your AI after it is deployed to avoid, among other risks, breaching data protection law.
Issue | Data protection risk | Risk reduction measure |
---|---|---|
Undetected model drift (deterioration of the algorithm’s performance over time which goes unnoticed) | If you do not take steps to avoid undetected model drift, the algorithm may make unfair decisions about individuals, in contravention of the UK GDPR’s principle of fairness | |
Individuals being subject to decisions which cannot be explained to them due to the complexity of the algorithm | There is a risk of breaching the UK GDPR’s principle of transparency, as individuals may not understand the decisions being made about them by the AI | |
Lack of human review (where there is no meaningful review by humans to interpret the output from the algorithm) | Individuals have the right under the UK GDPR not to be subject to decisions based solely on automated processing (in certain circumstances) | |
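The ongoing monitoring described above can be as simple as comparing the model’s live outputs against a baseline recorded at validation time. The sketch below flags a shift in the mean model score; the baseline figures, threshold, and shift-in-mean test are all illustrative assumptions, and production systems typically use richer statistics (such as the population stability index or a Kolmogorov–Smirnov test).

```python
import statistics

# A minimal post-deployment drift monitor, assuming you keep a baseline
# sample of model scores from validation and periodically compare live
# scores against it. All numbers here are made up for illustration.
BASELINE_SCORES = [0.42, 0.47, 0.50, 0.45, 0.48, 0.44, 0.46, 0.49]

def drift_alert(live_scores, baseline=BASELINE_SCORES,
                max_shift=0.15) -> bool:
    """Return True when the mean live score drifts beyond max_shift,
    signalling that the model's outputs need human review."""
    shift = abs(statistics.mean(live_scores) - statistics.mean(baseline))
    return shift > max_shift

# Usage: live scores close to the baseline raise no alert, whereas a
# sharp upward drift triggers one, prompting investigation before the
# model continues making decisions about individuals.
```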
Thoughts to leave you with
With new technology comes new ways to meet (and breach) data protection law. As innovative and increasingly intelligent algorithms are developed and come to market, it is important to ensure that you are not breaching data protection law through the AI you are developing. We have set out some of the key risks above, but this is not an exhaustive list.
Moreover, you will need to manage competing interests when assessing these risks. Some sit within the data protection sphere: for example, you will need to balance the UK GDPR’s principle of accuracy against the principle of data minimisation, ensuring you have enough data to train an accurate AI without collecting more than you need. Others span the data protection and commercial spheres: for example, there is research suggesting that making AI more explainable (as the UK GDPR’s transparency principle requires) can increase security risks. It is important to note that competing commercial interests are not an excuse for failing to comply with your data protection obligations.
If you would like advice on navigating the data protection laws which apply to your learning algorithm, we are here to help. Please contact Daisy Fulton or Ludo Lugnani for further assistance.