Blog Post 7: Bias Blog 1
Published on:
Website:
AI’s Regimes of Representation: A Community-Centered Study of Text-to-Image Models in South Asia
What is the goal of this case study?
This case study aims to highlight the disparity and lack of representation of South Asian cultures in AI text-to-image models, and to gather the perspectives of South Asians on this issue. Participants focus on the misrepresentations they see when they prompt these models for elements of their culture, and on how those misrepresentations make them feel.
Discussion:
- What does cultural representation mean to you, and how might this definition and your experience with past representation impact how you would evaluate representation of your identity in generative AI models? What aspects of your identity do you think you would center when evaluating representations in AI model output?
- Cultural representation, to me, means showing the things you can’t see at a glance: the traditions, foods, voices, and perspectives unique to a community. It is hard to capture all of that briefly; full cultural representation takes time and energy spent learning and experiencing the culture. To evaluate how well AI represents my identity, I would look at the small things that have a major impact on who I am and on what my culture means. I would look at the foods and get-togethers AI tries to recreate and see how much of my own experience I recognize in them. In particular, I would center how and when get-togethers happen in my culture, and what they look like.
- What do you think is the role of small-scale qualitative evaluations for building more ethical generative AI models? How do they compare to larger, quantitative, benchmark-style evaluations?
- Large-scale tests are almost always created by Western companies on Western datasets, which means accuracy on Western culture is the accuracy that ends up mattering most. If a model is always wrong about one culture, but that culture only makes up 1% of the dataset, the model still appears to be 99% accurate (see the short sketch after this discussion). Small-scale evaluations, on the other hand, can have a much deeper impact: they can bring to light the problems hidden in the larger datasets and help ensure equity for everyone who might use the product. A colleague of mine this summer built a training dataset for his face-detection software, and it performed quite well on white subjects. It also performed well on a standard large test set, but further, disaggregated testing revealed it was much more accurate on white subjects than on others, showing why these qualitative evaluations are necessary to ensure equal and equitable representation.
- Participants in this study shared “aspirations” for generative AI to be more inclusive, but also noted the tensions around making it more inclusive. Do you think AI can be made more globally inclusive?
- It would be amazing if we could simply shift our datasets to be much more neutral and inclusive, but most of the data we have comes from Western markets. The easiest way to balance the proportions is to shrink the over-represented portion of the dataset, but then we run into the problem that the dataset as a whole is smaller. Realistically, I think AI can be made more globally inclusive, but reaching that goal will take a while as we find other ways to expand the inclusivity and suitability of these models.
- What mitigations do you think developers could explore to respond to some of the concerns raised by participants in this study?
- I think developers could focus training on more South Asian works and on descriptions written by South Asians, rather than on an outside Western perspective. Similarly, developers should place more weight on the small details of cultures and add protections against portraying cultures in a worse, more rustic light than the reality. Because human art takes time and carries the artist’s personality, there is intention behind it, and that intention lets the artist make sure the work portrays what they want it to. Developers could add similar checks to the imagery a model generates, instead of letting a black box spit out an image and calling that the end of it.
- As mentioned in this case study, representation is not static; it changes over time, is culturally situated, and varies greatly from person to person. How can we “encode” this contextual and changing nature of representation into models, datasets, and algorithms? Is the idea of encoding at odds with the dynamism and fluidity of representation?
- I think encoding is definitely at odds with representation. Because training a model takes a great deal of data and time, the data that is actually available often falls far behind how cultures and representation change, and it can’t catch up. To recover some of this variability, we need to start thinking about some kind of expiration date for data, similar to how an LSTM gradually forgets older information (a rough sketch of this decay idea follows the discussion below). That said, reducing the amount of data will always hurt the general quality of the model. With our current technique for making models better, which is mostly throwing more data at the problem, we can’t make much more headway on the flexibility of representation. If we actually improve the techniques themselves, we could reach a point where we get good results without a giant, static dataset costing the model its dynamic ability.
- How can we learn from the history of technology and media to build more responsible and representative AI models?
- We have run into privacy and accessibility issues with AI, and with every technology before it, as each has developed. When we look at content moderation and the laws that grew up around social media, we see lessons we can carry forward to AI as well. Introducing rules that clearly require proper representation and ethically sound AI models, and that allow action to be taken when those rules are not followed, would mirror what has helped before. We’ve seen this work for social media and other earlier technologies, so I believe we will also see success if we put similar rules in place for AI.
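To make the 1%-versus-99% point from the evaluation answer above concrete, here is a minimal, hypothetical Python sketch. The group names, sizes, and accuracies are invented for illustration; the point is only that an aggregate benchmark score can completely hide a subgroup the model always gets wrong, which is exactly what disaggregated, small-scale evaluations catch.

```python
# Hypothetical numbers: a model that is perfect on the majority group and
# always wrong on a 1% minority group still reports 99% overall accuracy.
group_sizes = {"majority_culture": 9900, "minority_culture": 100}
group_accuracy = {"majority_culture": 1.00, "minority_culture": 0.00}

total = sum(group_sizes.values())
correct = sum(group_sizes[g] * group_accuracy[g] for g in group_sizes)

print(f"Aggregate accuracy: {correct / total:.2%}")  # 99.00%
for g in group_sizes:
    print(f"  {g}: {group_accuracy[g]:.2%} accuracy on {group_sizes[g]} examples")
```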
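And here is a rough sketch of the "expiration date for data" idea from the encoding answer. It assumes a hypothetical dataset where each example records the year it was collected, and applies an exponential decay weight so older cultural data counts less during sampling or training; the half-life value and field names are my own illustrative choices, not anything proposed in the case study.

```python
# Hypothetical recency weighting: down-weight older cultural data with an
# exponential decay instead of discarding it outright.
HALF_LIFE_YEARS = 5.0  # illustrative choice, not from the case study

def recency_weight(collected_year: int, current_year: int) -> float:
    """Weight halves every HALF_LIFE_YEARS years of age."""
    age = max(0, current_year - collected_year)
    return 0.5 ** (age / HALF_LIFE_YEARS)

examples = [
    {"caption": "a wedding celebration", "year": 2012},
    {"caption": "a street market", "year": 2023},
]
for ex in examples:
    w = recency_weight(ex["year"], current_year=2024)
    print(f"{ex['caption']}: weight {w:.3f}")
```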
My Question:
What limitations do you see in this work? How would you build on this study?
Why?
The study had only 36 participants, and the main requirement for participating was simply some cultural knowledge of one's country, which could mean participants lacked the experience to notice certain subtleties in the cultural prompts; 36 participants could also be limiting on its own. I want to know what others think about these limitations and what they would improve.
Reflection:
This was a really interesting case study to read. Working through it made me realize that I take my own representation, and the accuracy with which AI recreates my culture, for granted, because AI has disproportionately been developed with people like me in mind. These models have fewer elements of other cultures in their datasets, and you can clearly see the effects.
