AI Development: a data protection toolkit
As artificial intelligence (AI) technology continues to advance, those involved in producing or developing AI increasingly need to understand how data protection law applies to their work. Google’s recent launch of PaLM 2, its next-generation large language model designed to excel at tasks such as common-sense reasoning, mathematics, logic, and even coding, signals a transformative shift in AI capabilities. Google is also developing Med-PaLM, a large language model (LLM) designed to provide high-quality answers to medical questions. These new AI developments rely heavily on personal data, so it is essential for developers of such technology to operate within the boundaries of the UK GDPR to avoid large fines.
How does the AI data protection toolkit work?
This toolkit is designed to help you spot the risk areas where AI development could put you in breach of the UK GDPR. It draws on the guidelines published by the Information Commissioner’s Office (ICO) to provide an initial overview of the data protection risks associated with operating AI models, divided into the four main stages of AI development. In each of the four stages below, we have referred to the pressure points highlighted by the ICO for breaching the UK GDPR. We have also suggested practical steps to reduce those risks.
Who is this toolkit for?
The toolkit has been curated to assist anyone operating in the AI industry. Whether you are creating your own LLMs or making use of existing ones, the toolkit provides you with some initial guidance on the aspects you must account for to comply with data protection law.
Stage 1: Business requirements and design
The design stage in AI development, which involves detailing business needs and system architecture, is critical to the AI model’s ultimate commercial success. This phase gives the AI its required ‘intelligence’ by aligning the model with business objectives and functionalities. The aim is that, by uncovering issues and adjusting the AI design to suit the business context, the resulting system effectively addresses the organisation’s unique needs before going to market.
Issue | Data protection risk | Risk reduction measure |
---|---|---|
AI systems produce unfair outcomes for individuals | Individuals suffer from unfair treatment and/or discrimination. You risk being in breach of the UK GDPR’s principle of fairness | |
Excessive collection of personal data | Individuals suffer from unlawful processing of their personal data by the AI system. You risk being in breach of the data minimisation and storage limitation provisions of the UK GDPR | |
Tokenistic human review of AI system outputs | Individuals may suffer from prohibited processing or unfair decisions being made about them due to solely automated decision-making by the AI system. You risk being in breach of the individual’s right to object to automated individual decision-making under the UK GDPR | |
Stage 2: Data acquisition and preparation
Data acquisition and preparation is a crucial stage in developing AI systems. It involves collecting and curating relevant data to train the AI models. This stage is the foundation for what the AI model will be capable of achieving once deployed as it will rely on this information to produce valid responses. However, careful consideration is required to ensure the data is accurate, adequate, and relevant to the system’s purpose.
Issue | Data protection risk | Risk reduction measure |
---|---|---|
Failing to choose an appropriate lawful basis for the AI system’s data processing | Unlawful collection of personal data leading to loss of trust and unfair processing. You risk being in breach of the requirements for lawful processing and transparency under the UK GDPR | |
Bias in the AI system output | Bias in the AI system’s decision making caused by insufficient bias analysis, resulting in individuals suffering from discriminatory outcomes. You risk being in breach of the UK GDPR’s principle of fairness | |
Collection of too much personal data to train the AI system | Individuals suffer from unlawful processing of their personal data by the AI system. You risk being in breach of the data minimisation provisions of the UK GDPR | |
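To make the data minimisation point concrete, here is a minimal sketch of a pre-processing step that keeps only the fields a model actually needs and replaces a direct identifier with a salted pseudonym. All field names, the salt handling, and the record structure are illustrative assumptions, not a prescribed approach; real pipelines would manage keys securely and document the choice of fields in a data protection impact assessment.

```python
import hashlib

# Hypothetical raw training records: only 'age_band' and 'outcome' are
# needed to train the model; direct identifiers are dropped, and the user
# id is replaced with a salted hash so records can still be linked if a
# data subject exercises their rights. Field names are illustrative.
RAW_RECORDS = [
    {"user_id": "u001", "name": "A. Smith", "email": "a@example.com",
     "age_band": "30-39", "outcome": 1},
    {"user_id": "u002", "name": "B. Jones", "email": "b@example.com",
     "age_band": "40-49", "outcome": 0},
]

NEEDED_FIELDS = {"age_band", "outcome"}  # the minimum the model requires
SECRET_SALT = b"rotate-me"  # placeholder; in practice use a managed key

def pseudonymise(user_id: str) -> str:
    """Replace a direct identifier with a truncated salted hash."""
    return hashlib.sha256(SECRET_SALT + user_id.encode()).hexdigest()[:16]

def minimise(record: dict) -> dict:
    """Keep only the fields needed for training, plus a pseudonym."""
    out = {k: v for k, v in record.items() if k in NEEDED_FIELDS}
    out["pseudo_id"] = pseudonymise(record["user_id"])
    return out

training_set = [minimise(r) for r in RAW_RECORDS]
```

The design choice here is to minimise at the point of ingestion, so the training pipeline never holds names or email addresses at all, rather than filtering them out later.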
Stage 3: Training and testing
This stage of developing AI is key to creating a commercially successful model. Studies have shown that many AI projects fail, and this is often down to shortcomings at this stage of development. Training gives the AI the ‘intelligence’ it needs to perform as intended, while testing reveals any latent issues, which can then be addressed before the product goes to market.
Issue | Data protection risk | Risk reduction measure |
---|---|---|
Function creep by learning algorithm (where the functionality diverges from the intended purpose or use) | The purpose for which you originally collected the personal data is no longer the same as the way the data is now used. You risk being in breach of the purpose limitation restrictions in the UK GDPR | |
Overfitting (where the learning algorithm becomes too focused on the training data and is not able to generalise and adapt to other situations) | If you do not take steps to ensure an appropriate level of security, individuals may suffer from personal data breaches. You risk being in breach of the UK GDPR’s requirements around ensuring the security of the personal data you process | |
Security (insufficient security tests) | If you do not take steps to ensure an appropriate level of security, individuals may suffer from personal data breaches. You risk being in breach of the UK GDPR’s requirements around ensuring the security of the personal data you process | |
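As a practical illustration of spotting overfitting at the testing stage, a simple and widely used symptom is a large gap between accuracy on the training data and accuracy on held-out validation data. The sketch below assumes you already record both figures for each training run; the 0.10 threshold is an illustrative choice, not a rule.

```python
# A minimal overfitting check. A model that scores far better on its
# training data than on unseen data has likely memorised the training
# set, which (for models trained on personal data) can also create a
# security concern, e.g. the risk of training data being extracted.

def overfitting_gap(train_acc: float, val_acc: float) -> float:
    """Return the train/validation accuracy gap."""
    return train_acc - val_acc

def is_overfitting(train_acc: float, val_acc: float,
                   threshold: float = 0.10) -> bool:
    """Flag runs where training accuracy far exceeds validation accuracy."""
    return overfitting_gap(train_acc, val_acc) > threshold

# Usage: 99% on training data but only 71% on unseen data would be flagged
# for remediation (e.g. regularisation or more diverse training data),
# while 88% vs 85% would not.
```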
Stage 4: Deployment and monitoring
Deployment is when you integrate the learning algorithm into the real world so that it can be used by others. However, deployment should not be the final step in this process. Unlike conventional software, where the code is deterministic and (should) continue to run as written, AI is dynamic and susceptible to changes in the real world. This means you need to keep monitoring your AI after it is deployed to avoid, among other risks, breaching data protection law.
Issue | Data protection risk | Risk reduction measure |
---|---|---|
Undetected model drift (deterioration of the algorithm’s performance over time which goes unnoticed) | If you do not take steps to avoid undetected model drift, the algorithm may make unfair decisions about individuals, in contravention of the UK GDPR’s principle of fairness | |
Individuals being subject to decisions which cannot be explained to them due to the complexity of the algorithm | There is a risk of breaching the UK GDPR’s principle of transparency, as individuals may not understand the decisions being made about them by the AI | |
Lack of human review (where there is no meaningful review by humans to interpret the output from the algorithm) | Individuals have the right under the UK GDPR not to be subject to decisions based solely on automated processing (in certain circumstances) | |
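The ongoing monitoring described above can be as simple as comparing the model’s live outputs against a baseline recorded at validation time. The sketch below flags a shift in the mean model score; the baseline figures, threshold, and shift-in-mean test are all illustrative assumptions, and production systems typically use richer statistics (such as the population stability index or a Kolmogorov–Smirnov test).

```python
import statistics

# A minimal post-deployment drift monitor, assuming you keep a baseline
# sample of model scores from validation and periodically compare live
# scores against it. All numbers here are made up for illustration.
BASELINE_SCORES = [0.42, 0.47, 0.50, 0.45, 0.48, 0.44, 0.46, 0.49]

def drift_alert(live_scores, baseline=BASELINE_SCORES,
                max_shift=0.15) -> bool:
    """Return True when the mean live score drifts beyond max_shift,
    signalling that the model's outputs need human review."""
    shift = abs(statistics.mean(live_scores) - statistics.mean(baseline))
    return shift > max_shift

# Usage: live scores close to the baseline raise no alert, whereas a
# sharp upward drift triggers one, prompting investigation before the
# model continues making decisions about individuals.
```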
Thoughts to leave you with
With new technology comes new ways to meet (and breach) data protection law. As innovative and increasingly intelligent algorithms are developed and come to market, it is important to ensure that you are not breaching data protection law through the AI you are developing. We have set out some of the key risks above, but this is not an exhaustive list.
Moreover, you will need to manage competing interests when assessing these risks. Some sit within the data protection sphere: for example, you will need to balance the UK GDPR’s principle of accuracy against the principle of data minimisation, ensuring you have enough data to train an accurate AI without collecting more than you need. Others span the data protection and commercial spheres: for example, there is research suggesting that making AI more explainable (as the UK GDPR’s transparency principle requires) can increase security risks. It is important to note that competing commercial interests are not an excuse for failing to comply with your data protection obligations.
If you would like advice on navigating the data protection laws which apply to your learning algorithm, we are here to help. Please contact Daisy Fulton or Ludo Lugnani for further assistance.