Will We Ever Be Able to Prevent Sexist Credit Cards or Racist Chatbots?
AI-based systems coupled with machine learning pose a challenge for quality assurance. On the one hand, we are talking about lots of data from different sources, analysed with complex (and very often secret) algorithms that then influence other algorithms.
On the other hand, these very data sources serve as input for the learning mechanisms of those algorithms, and we have all seen the news about sexist credit cards (gender bias in determining creditworthiness), racist chatbots (chatbots taught to spew out anything) and moral machines for autonomous cars (or any other decision-making engine).
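To make "testing for bias" concrete, one widely used starting point is a demographic parity check: compare a model's approval rates across groups and flag large gaps. The sketch below is a minimal, hypothetical illustration (the function name, data and threshold are invented for this example), not a complete fairness audit:

```python
# Hypothetical sketch of a demographic parity check for a credit-scoring model.
# Demographic parity asks: are approval rates roughly equal across groups?

def demographic_parity_gap(decisions, groups):
    """Largest difference in approval rate between any two groups.

    decisions: list of 0/1 approval decisions from the model
    groups:    list of group labels (e.g. gender), same length as decisions
    """
    counts = {}
    for decision, group in zip(decisions, groups):
        total, approved = counts.get(group, (0, 0))
        counts[group] = (total + 1, approved + decision)
    rates = {g: approved / total for g, (total, approved) in counts.items()}
    return max(rates.values()) - min(rates.values())

# Toy data: group "A" is approved 3 out of 4 times, group "B" only 1 out of 4.
decisions = [1, 0, 1, 1, 0, 0, 1, 0]
groups    = ["A", "A", "A", "A", "B", "B", "B", "B"]
gap = demographic_parity_gap(decisions, groups)  # 0.75 - 0.25 = 0.5
assert gap == 0.5
```

In a real QA pipeline such a metric would be computed on held-out data and gated against an agreed threshold; demographic parity is only one of several competing fairness definitions, and which one applies is itself a design decision.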
From micro-biases and unintentional slights to outright ill intentions – all play a role when developing (and testing) these systems. Add the need to test for black-swan events, and it’s clear that we are not just talking about marriage-ending recommendations by Netflix (“Honey, I really can’t explain why Netflix would suggest that…”) or ads you’d rather not see, but about systems with the potential to ruin a reputation, finances or even health.
Which solutions or concepts exist today to test AI-based systems? What do we need in the future to be ready for this? And what is the standards world doing to tackle these issues?