Blog Post 5: Society & Tech 1

Published on: September 24, 2025

Blog Post:

Podcast:

Databite No. 161: Red Teaming Generative AI Harm

I just listened to the Databite podcast on Red Teaming GenAI models. Red Teaming is a cybersecurity term that means trying to break software by pretending to be a bad actor. In cybersecurity, this generally means trying to hack into your own software to test the defenses you have created. In AI, it is the broader concept of trying to get the AI to behave in a way you don’t want it to be able to. On the podcast, host Borhane Blili-Hamelin had guests Tarleton Gillespie, Camille Francois, Brianna Vecchione, and Lama Ahmad. Tarleton is a critical sociologist working for Microsoft Research who has been researching and questioning what happens when these media companies take on more humanitarian responsibilities, and how the expansion of AI into more spaces becomes a problem for developers. Camille Francois, a professor at Columbia University, has given several talks and done research on how we can use cybersecurity methods, such as Red Teaming, to protect against AI harm. Also on the panel was Brianna Vecchione, a tech researcher investigating auditing and accountability with regard to social justice, engagement, and policy. The last guest on this episode was Lama Ahmad, a tech program manager at OpenAI, who started using Red Teaming before the introduction of ChatGPT and focuses on the human side of Red Teaming. Most of their discussion revolved around the cultural aspects and effects of Red Teaming, but they also discussed the political aspects of the policies around it.

The main discussion that stood out to me as a listener was the variance in what companies are doing with Red Teaming, and the different methods in place. They described three main groups doing Red Teaming: the AI labs, who do Red Teaming internally within their companies; the vendors, outside companies and organizations that the AI companies hire to perform the Red Teaming for them; and the enthusiasts, hobbyists who try to break models for fun and to help improve the security of their community. Each of these groups uses different techniques, from automated tests and algorithms to actual humans intent on breaking the models. An interesting problem came up during this discussion that really got me thinking - on top of all of the semi-seasonal jobs labeling AI data, where workers are treated terribly and the majority are doing it solely because they need a job, Red Teaming jobs sometimes fit the same pattern. Because there is no standardization of these jobs, we have no clue what these companies are doing, or what the different implementations of these jobs are like. Tarleton was asked about the labor side of Red Teaming, and he mentioned that there are mental health risks to these activities, connecting it to content moderation on social media platforms. In both cases, we are promised success without any transparency about how it gets done. According to Tarleton, this makes it hard to reach any agreement on the accountability of the platform doing the regulation, or in this case the Red Teaming. On top of that, he discussed how easy it is to politicize these unknown methods - if something bad gets through the moderation and onto someone’s feed, they can simply blame it on the company - even if it isn’t their fault, no one knows enough about how it works for the public to know or care.

This is one of the big reasons I think this needs to be discussed now. We are finally getting larger AI models that can do more, and our employers and societies are making it more and more the norm to use them. All at once, the theoretical problems we discussed before are becoming real problems people face. The minority groups that AI has been biased against are now starting to see actual damage come from it. Similarly, every potentially harmful output is now more public than ever, and reducing these responses is at the forefront of every AI lab’s mind.

The culmination of the discussion of these different Red Teaming techniques, not surprisingly, is the need for, and lack of, a standard way to do it. While there was a presidential order that provided some broad guidelines on model regulation in general, it has since been walked back, and there are no standards or guidelines at all anymore. They discussed some of the advantages of this: the disparate techniques that emerge as a result should lead to faster and more creative innovation in testing methods, and theoretically allow the field to converge naturally on the optimal method. I can definitely see this as an advantage, but I think the wide variety of methods will lead to many companies doing nothing, or engaging in inhumane practices to reduce cost. Because of this, I believe regulation and standardization are necessary, and that leads me to the question I would ask them: What would standardization guidelines look like for Red Teaming and model testing? I honestly don’t know if they would even be able to answer - there was nothing in the podcast about the nitty-gritty of what they are doing, and I can imagine everyone there, and everyone working on this testing, would have very different ideas of what the best practices are. That being said, I think it is crucially important to keep these companies in check, especially as these AI models get better and better.

This podcast was particularly important to me because, even before these AI models started doing well, my sister has been fascinated with AI, and with training and aligning these AI models in particular. She has spent a long time worrying about what will happen with them, and what happens when we get it wrong. Because of her fear and worry, and the amount of time she has spent researching this, I have always been concerned as well. This podcast helped ease my worry a bit, and also helped me learn more about the state of things now and what is being done.

This was a long one, but it was quite a good podcast, so I had a lot to say. It was one of the more fun blog posts I have done, and I think this post reflects that. I also wrote this in Obsidian, the note-taking app, and I am really liking it. If I didn’t have to handwrite most of my notes, it would be perfect for me.


Will Dirkswager
