AI Bias: What is it and how can it be avoided?
The risk of bias in AI systems is widely recognised. Systems trained on biased data can reproduce that bias in their outputs, creating unfair outcomes when those systems are used to make decisions.
Most approaches to testing for bias rely on access to demographic data about individual users. For example, to test whether a recruitment system unfairly disadvantages applicants in certain age groups, you would need to know applicants' ages and check whether age correlates with unfavourable outcomes.
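To make that concrete, a basic audit of this kind compares favourable-outcome rates across groups. The sketch below is a minimal, illustrative Python version: the group labels are assumptions, and the "four-fifths" threshold mentioned in the comment is a common US rule of thumb rather than anything prescribed by the CDEI report.

```python
from collections import defaultdict

def selection_rates(applicants):
    """Favourable-outcome rate per demographic group.

    `applicants` is a list of (group, favourable) pairs,
    e.g. ("under_40", True). Returns {group: rate}.
    """
    totals = defaultdict(int)
    favourable = defaultdict(int)
    for group, outcome in applicants:
        totals[group] += 1
        if outcome:
            favourable[group] += 1
    return {g: favourable[g] / totals[g] for g in totals}

def disparate_impact_ratio(rates):
    """Ratio of the lowest to the highest group selection rate.

    A ratio well below 1.0 suggests one group is being
    disadvantaged (the US "four-fifths rule" treats 0.8
    as a rough warning threshold).
    """
    return min(rates.values()) / max(rates.values())

# Illustrative data: (age group, whether the applicant was shortlisted).
applicants = [
    ("under_40", True), ("under_40", True), ("under_40", False),
    ("over_40", True), ("over_40", False), ("over_40", False),
]
rates = selection_rates(applicants)
ratio = disparate_impact_ratio(rates)
```

The point of the sketch is what it needs as input: the audit cannot run without the `group` column, which is exactly the demographic data that is often hard to obtain.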
Accessing demographic data can be a challenge for developers. A recent report from the Centre for Data Ethics and Innovation (CDEI) highlights some innovative approaches, and the potential for a new ecosystem of solution providers to tackle that challenge in the UK. These include:
Data intermediaries
Data intermediaries are organisations that facilitate the sharing of demographic data. The CDEI report focuses on intermediaries that act as stewards of demographic data. These intermediaries could facilitate bias audits in a couple of ways:
- they could provide a developer with access to demographic data about individuals in a controlled way, so it is only used for the bias audit; and
- they could store users’ demographic data and conduct the bias audit themselves so that the AI developer only receives the results of the audit and never has access to the underlying demographic data. This provides an additional layer of privacy protection for individuals.
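The second model can be sketched as a simple protocol: the developer submits outcomes keyed only by user ID, and the intermediary, which holds the demographic records, returns aggregate rates and nothing else. All names and data below are illustrative assumptions, not anything specified in the CDEI report.

```python
# Held by the intermediary; never shared with the developer.
DEMOGRAPHICS = {
    "u1": "group_a", "u2": "group_a",
    "u3": "group_b", "u4": "group_b",
}

def run_bias_audit(outcomes):
    """Audit run on the intermediary's side.

    `outcomes` is what the developer submits: {user_id: favourable?}.
    Only per-group rates leave the intermediary; the underlying
    demographic data never does.
    """
    totals, favourable = {}, {}
    for user_id, ok in outcomes.items():
        group = DEMOGRAPHICS.get(user_id)
        if group is None:
            continue  # no demographic record held for this user
        totals[group] = totals.get(group, 0) + 1
        if ok:
            favourable[group] = favourable.get(group, 0) + 1
    return {g: favourable.get(g, 0) / totals[g] for g in totals}

# Developer side: outcomes are keyed by user ID only.
result = run_bias_audit({"u1": True, "u2": False, "u3": False, "u4": False})
```

The design choice worth noting is the boundary: the join between outcomes and demographics happens inside the intermediary, which is what provides the additional layer of privacy protection described above.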
The CDEI report identifies some real-world examples of intermediaries along these lines, such as the ONS Secure Research Service and the US National Institute of Standards and Technology's Face Recognition Vendor Test programme.
However, the report also notes a lack of organisations offering these kinds of data intermediary services. A market for these services has not yet developed in the UK, although the UK government has committed to support the development of an intermediary ecosystem.
Proxies for demographic data
When AI developers cannot access demographic data, one solution is to use proxies instead. In short, this means using existing data to infer demographic attributes. To give a basic example, a developer might use forename as a proxy for gender.
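In code, the forename example amounts to little more than a lookup. The sketch below is deliberately toy-sized: the name-to-gender mappings are assumptions made for illustration, and real proxy methods rely on much larger reference datasets and report a confidence level rather than a bare label.

```python
# Illustrative only: a tiny forename-to-gender lookup table.
# The mappings are assumptions for this sketch, not a real dataset.
NAME_GENDER = {
    "alice": "female", "mary": "female",
    "james": "male", "omar": "male",
}

def infer_gender(forename):
    """Return an inferred gender, or None when the proxy cannot decide.

    Abstaining on unknown or ambiguous names is usually better than
    guessing, since wrong inferences corrupt the bias audit itself.
    """
    return NAME_GENDER.get(forename.strip().lower())

inferred = [infer_gender(n) for n in ["Alice", "James", "Robin"]]
# "Robin" is not in the lookup, so the proxy abstains rather than guess.
```

Even this toy version surfaces the accuracy concern discussed below: every wrong or missing inference feeds directly into the audit results.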
Inferring data raises its own ethical, legal and practical challenges. Accuracy of inferences is a particular concern. The CDEI report explores these challenges in detail but suggests that in certain circumstances, proxy data might be a viable solution for bias detection so long as robust safeguards and risk mitigations are in place. The report suggests that developers using proxy data should:
- establish a strong use case for the use of proxies as opposed to direct demographic data;
- select an appropriate proxy method, considering guidance from the Information Commissioner’s Office, the risk of model drift, and the feasibility of testing the accuracy of the method; and
- implement robust safeguards and mitigations, for example informing individuals about how their data will be used and implementing privacy-enhancing techniques.
The CDEI report notes that many of the most popular proxy methods and tools have been developed in the US. UK developers thinking about using those tools will need to factor in the different approach to data protection law here when evaluating them.
As AI systems proliferate, the need to test for and mitigate their biases will only grow. Access to demographic data is a significant obstacle to that, but one that could be overcome if the UK creates the right regulatory and commercial environment. The CDEI report suggests that government recognises the opportunity to create an ecosystem of data intermediaries and similar services to meet this growing need.
Until that ecosystem is more developed, collecting demographic data directly from individual users will often remain the best option. Data protection laws do not necessarily prevent that either; with the right compliance approach, the law acts more like safety rails than a barrier to innovation.