Blog Post 8: Bias Blog 2

Published on:


Website:

Understanding Potential Sources of Harm throughout the Machine Learning Life Cycle


What is the goal of this case study?

This case study discusses the biases and sources of harm that can arise in ML systems, and begins the discussion of how to mitigate them. It aims to provide a framework for discussing and addressing these biases, with the hope that future works will be able to name the specific types of bias they address.


Discussion:

Questions:
  1. How might these sources of harm manifest in a project you are currently working on, or have worked on in the past?
    • I worked on creating an ML algorithm and model to predict whether messages sent by autonomous vehicles are malicious. I used data generated by a simulation of Luxembourg traffic to produce the messages, and based all of my training on it. Since I would be applying this in the US, there would be some representation bias present: I would be deploying the model in a very different environment from the one where the data was simulated, so the target population does not reflect the use population. There would also be measurement bias, since our dataset came from simulation rather than real-world measurements, and from only one set of simulations, so we probably chose features that were significant in that environment but not in general.
  2. Can you think of other sources of harm that aren’t captured in this case study?
    • I think a large source of harm with ML techniques is that the numbers reported for their accuracy and effectiveness are misleading. We see 99% or something similar and decide that we can trust the system fully. Those small error percentages lead us to cut back the abilities and facilities to contest decisions. Combined with the biases above, this can result in unfair treatment of minorities without any way for that treatment to be contested or disputed. With humans, we all accept that people make mistakes, but when we see the high accuracy numbers of AI models, we think we don't have to worry about them making mistakes, because they barely ever do. We still need facilities for when they inevitably do.
  3. Think about a specific ML-based project or product (something that you’ve worked on, used, or are familiar with) and think about how you might conduct each step of the data collection, development, and deployment process while being conscious of potential sources of harm.
    • Suppose I am designing an AI to perform facial recognition. I would collect data equally representative of the population of the earth, and make sure to include an equally representative subset in the testing dataset. I would ensure that the model performs adequately on every subset of the population, so there are no oversights there. Then, I would have an independent party validate these tests, either through a peer-reviewed paper or a third-party company. Finally, upon deployment, I would make sure it is placed into environments that treat it as a nice-to-have, like remembering your passcode on your laptop: situations where a failure slightly inconveniences you, but other options remain in place, so you never have to rely on facial recognition in the first place and miss out on nothing when it doesn't work. This ensures that if any problems or biases occur, the resulting harm will be minimal, and we can address it without fully excluding a subset of the population.
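The representation bias described in answer 1 can be caught before deployment with a simple sanity check. The sketch below, using made-up feature values, compares a feature's distribution in the simulated training data against samples from the deployment environment; the feature name and numbers are purely hypothetical.

```python
import numpy as np

# Hypothetical feature: message inter-arrival times (seconds) from the
# Luxembourg simulation (training) vs. hypothetical US field data (deployment).
rng = np.random.default_rng(0)
sim_feature = rng.normal(loc=0.10, scale=0.02, size=1000)    # simulated traffic
field_feature = rng.normal(loc=0.25, scale=0.08, size=1000)  # denser real traffic

def standardized_mean_shift(a, b):
    """Rough effect-size check: how many pooled std devs apart are the means?"""
    pooled_std = np.sqrt((a.std(ddof=1) ** 2 + b.std(ddof=1) ** 2) / 2)
    return abs(a.mean() - b.mean()) / pooled_std

shift = standardized_mean_shift(sim_feature, field_feature)
print(f"standardized mean shift: {shift:.2f}")
# A large shift flags that the training distribution may not represent
# the deployment population -- representation bias.
```

A check like this doesn't fix the bias, but it turns "the target population does not reflect the use population" from a hunch into a measurable gap.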
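The concern in answer 2, that a headline "99% accurate" hides who the errors fall on, can be shown with toy arithmetic. The counts below are invented for illustration:

```python
# Toy illustration (made-up numbers): a model that is ~99% accurate overall
# can still fail badly on a small subgroup, which is why contestability matters.
majority_n, majority_correct = 9900, 9850   # 99.5% accurate on the majority
minority_n, minority_correct = 100, 60      # only 60% accurate on the minority

overall_acc = (majority_correct + minority_correct) / (majority_n + minority_n)
minority_acc = minority_correct / minority_n

print(f"overall accuracy:  {overall_acc:.1%}")   # 99.1%
print(f"minority accuracy: {minority_acc:.1%}")  # 60.0%
```

The headline number stays above 99% even though two in five minority-group decisions are wrong, so the aggregate metric alone can't justify removing avenues for appeal.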
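The per-subgroup testing step in answer 3 can be sketched as a small helper that reports accuracy separately for each subgroup rather than one overall number. The subgroup labels and records here are hypothetical placeholders:

```python
from collections import defaultdict

# Hypothetical evaluation records: (subgroup label, prediction correct?)
results = [
    ("group_a", True), ("group_a", True), ("group_a", False),
    ("group_b", True), ("group_b", False), ("group_b", False),
]

def accuracy_by_subgroup(records):
    """Compute accuracy separately for each subgroup, not just overall."""
    counts = defaultdict(lambda: [0, 0])  # subgroup -> [correct, total]
    for group, correct in records:
        counts[group][0] += int(correct)
        counts[group][1] += 1
    return {g: c / t for g, (c, t) in counts.items()}

print(accuracy_by_subgroup(results))
# group_a ~ 0.67, group_b ~ 0.33 -- a gap worth investigating before deployment
```

Running this on every demographic slice of the test set is what makes "performs adequately on every subset of the population" an auditable claim rather than an intention.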

My Question:

Which of these sources of harm do you think is the most important? Which should we aim to address first? Why?

Why?

It is unrealistic to try to solve every problem in one fell swoop, so it would be valuable to have an ordered list for addressing these concerns. That way, we could send the list to companies and organizations, and they could work through it to have the greatest effect, solving the biggest problems first.


Reflection:

After watching the movie, I was struck by just how imperfect these systems are, even though we think of them as perfect mathematical devices. Reading this case study, I see that even more clearly. There are so many ways that developers unknowingly imbue ML algorithms with their beliefs and biases, propagating those biases further. That is what struck me most about this reading, and I greatly appreciate it.