Regulator consults on web scraping to train generative AI

The UK’s data protection regulator has launched a consultation on generative AI. The Information Commissioner’s Office will have a key role in regulating AI in the UK, so this consultation is a useful signpost for the direction the ICO is heading, as well as an opportunity to influence it.

The first phase of the consultation focuses on the use of web scraping to train generative AI models. AI models are often trained on data scraped from publicly accessible web pages. The developers either scrape the data directly or use databases compiled by third parties who have used the same method.

There are ongoing legal claims relating to AI and web scraping, but they have mostly focused on copyright infringement. Getty Images’ ongoing claim against Stability AI, the developer of Stable Diffusion, is one example.

The ICO’s consultation is about data protection compliance and the issues raised when personal data is scraped from the web for AI training purposes. The ICO shares its initial thoughts about whether there is a lawful basis under UK data protection laws to use personal data in that way.

The ICO’s initial conclusion is that yes, there is potentially a lawful basis. However, the ICO seems to have some serious reservations about that.

Developers will need to show that they have a ‘legitimate interest’ in using the personal data to train the model. Significantly, the ICO seems to think wanting to build a model is not a legitimate interest in itself; whether the interest is legitimate might depend on what the model is intended to be used for.

The consultation repeatedly highlights the risks of individuals losing control of their data once it is used to train an AI model, as well as the potential risks to those individuals from subsequent use of the model. Those risks have to be balanced against the interest in training the model.

The consultation also highlights that the risks to individuals may differ depending on how a developer makes the model available to third parties:

  • Developers who make their models available via API can potentially exercise greater technical control over how the model is deployed. For example, the API could be designed to prevent the model from responding to the types of queries most likely to cause data protection risks.
  • Developers who release their models on an open-source basis have few options for controlling how they are used. If developers simply rely on their contracts with customers, the ICO will apparently expect to see evidence that the contracts are complied with.
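
As an illustration of the API point above, the following is a minimal sketch of how a developer might screen queries before they reach a model. It is not taken from the consultation. The patterns, the refusal message and the names `is_risky`, `guarded_endpoint` and `echo_model` are all hypothetical, and a real deployment would need far more robust screening than keyword matching.

```python
import re
from typing import Callable

# Hypothetical examples of query types a developer might treat as
# high-risk from a data protection perspective (assumption, not ICO guidance).
BLOCKED_PATTERNS = [
    re.compile(r"\bhome address\b", re.IGNORECASE),
    re.compile(r"\b(phone|mobile) number\b", re.IGNORECASE),
    re.compile(r"\bdate of birth\b", re.IGNORECASE),
]

REFUSAL = "This request cannot be answered because it may disclose personal data."


def is_risky(query: str) -> bool:
    """Return True if the query matches any blocked pattern."""
    return any(pattern.search(query) for pattern in BLOCKED_PATTERNS)


def guarded_endpoint(query: str, model: Callable[[str], str]) -> str:
    """Pass the query to the model only if it clears the screening step."""
    if is_risky(query):
        return REFUSAL
    return model(query)


def echo_model(query: str) -> str:
    """Stand-in for a real generative model (hypothetical)."""
    return f"model output for: {query}"
```

The design point is simply that the check sits in the developer's own API layer, so it applies to every caller. A developer who releases the model itself has no equivalent place to put such a check.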

The consultation concludes that developers using personal data from web scraping will need to:

  • Evidence and identify a valid and clear legitimate interest.
  • Consider the potential impact on individuals’ rights particularly carefully when they do not or cannot exercise meaningful control over the use of the model.
  • Demonstrate how the interest they have identified will be realised and how the risks to individuals will be meaningfully mitigated, including their access to their information rights.

You can read the consultation and submit your views here.


Our Offices

London
One Bartholomew Close
London
EC1A 7BL

Cambridge
50/60 Station Road
Cambridge
CB1 2JH

Reading
The Anchorage, 34 Bridge Street
Reading RG1 2LU

Southampton
Grosvenor House, Grosvenor Square
Southampton SO15 2BE

 



© BDB Pitmans 2024. One Bartholomew Close, London EC1A 7BL - T +44 (0)345 222 9222
