What even is AI justice? Proposing a Theory of Justice at FAccT 24


In 2022, Politico reported that Crisis Text Line (CTL), a non-profit SMS suicide hotline, used one-on-one crisis conversations to train a for-profit customer service chatbot. In response, CTL maintained that they did not violate reasonable expectations: hotline users “consented” to a lengthy Terms of Service (TOS), which specified that CTL could use data for business purposes.

This uninformed consent procedure and irresponsible use of data is an obviously egregious move in many research circles, such as my own. However, in the eyes of the law, CTL, and many others, CTL didn’t violate any reasonable user expectations. The consensus was that, while TOS consent procedures are imperfect, they are the cost of having free technology services. In my paper, I ask: are we still okay with paying this price?

The price here is justice—our ability to give everyone their due right. In offline settings, injustice evokes the image of human rights violations and massive inequities. However, the picture is less clear in AI settings. As we all contribute our data, knowledge, and time to AI systems, what do we deserve as a matter of justice? To answer this question, we formulated a precise theory of justice that captures current tensions between users and tech companies in AI/ML settings.

To quote John Rawls, 

“A theory however elegant and economical must be rejected or revised if it is untrue; likewise laws and institutions no matter how efficient and well-arranged must be reformed or abolished if they are unjust.”

In other words, a theory of justice is an attempt to appropriately represent the state of the world, articulate societal values, and inform future change. When we have a true theory of justice, we do not need to rely on subjective moral intuitions. Rather, we can agree upon an ethical compass for building our laws and institutions.

Our paper proposes data agency theory (DAT). Data agency is one’s capacity to shape action around the data they create. For example, individual privacy settings on Google increase one’s data agency by allowing users to opt into (i.e., consent to) data sharing with third-party advertisers. Data agency theory rests on two premises. First, consent procedures outline data agency systematically and, therefore, are institutional. Second, following justice scholars such as Rawls and Young, we argue that this institutional way of outlining data agency is a matter of justice. In sum, justice in a predictive system demands considering how institutional routines (i.e., consent procedures and terms of service) transform agency at a group level. Put concisely, data agency is a contributor to justice and a product of consent policies in a predictive system.

DAT is a data-centric theory of justice that directly translates to next steps for achieving ML & AI ethics goals. As ML/AI systems need larger datasets, ethicists in the field have made numerous calls for better data management throughout the pipeline, involving questions of consent, data storage, and responsible use of predictive outcomes. However, these calls have been mostly unanswered; many social media sites still use dense Terms of Service agreements that have been criticized for over a decade. Here, we argue that this lack of action to increase a user’s data agency is not just a moral imperfection; it is an injustice.

By raising the stakes of problematic consent procedures, we hope to catalyze action. In the paper, we reimagine consent procedures in two salient ML/AI data contexts: (1) social media sites and (2) human subjects research projects. For example, we imagine affirmative consent on social media sites, sustained efforts by researchers to reaffirm consent, and the ability to withdraw one’s data from benchmark datasets. AI justice demands consent procedures that proactively solve systemic information and power gaps around one’s data. This paradigm shift is crucial to evaluating current consent procedures and generating better ones.

What does a just world with AI look like? How can we evaluate the justice of our AI systems before they cause material harm? Check out our paper for discussions of these questions, or come to my talk at FAccT during the Towards Better Data Practices session at 3:45pm (UTC-3) on June 4th.

PhD Lessons from Running and Escaping Rooms


The PhD students at GroupLens have a variety of hobbies! From knitting to playing video games, we all have non-research activities that contribute to our lives. In this article, we asked two PhD students, Leah Ajmani and Alexis Tarter, how their hobbies have helped them become more successful researchers. What do distance running and working at an escape room have to do with research? Read below to find out!

Leah and her dog, Yogi, finishing their first half-marathon

Lessons from Running

As a very unathletic kid, I didn’t pick up running until I was a PhD student. I’ve run numerous community races in the past two years, including a half marathon! Here’s what I’ve learned:

There’s no such thing as “junk miles”

Sometimes our runs suck. Similarly, sometimes our writing sucks, our research is going slow, or we have to miss a deadline. However, there is no such thing as junk mileage in research. All research, even the research that doesn’t end up in papers, builds our capacity to do research. In that sense, it is useful!

You have lots of different paces; the key is to switch between them

You may have heard “it’s a marathon, not a sprint” about your PhD. The advice here is to go slow and not burn out. Running has taught me to extend the metaphor one step further. I have a marathon pace, but I also have a sprinting pace. I even have a 5k and 10k pace for things in the middle! The key to not burning out is to use the right pace at the right moment.

Think about how you would run a marathon. You may have a “marathon pace,” a goal for each mile to run a certain marathon finish time. For example, my goal is to run a half-marathon in under 2 hrs and 15 min. In theory, I would need to run a 10-min mile 13.1 times. In practice, though, my first few miles would be >10 minutes so I can get into a rhythm. Then, each mile gets progressively faster so I can “ramp up.” The idea is not to just run slow. It’s to run slow at the beginning so that you can go ham and truly race in those last few miles. In a PhD, paper deadlines require you to have enough energy left in your tank to race those last few miles, so be judicious with your pacing.

Your mind will want to quit early and often

In running, we say, “Your mind will want to quit before your body does.” Obviously, if you’re injured or battling physical limitations, you should STOP RUNNING. But if the reason you want to stop running is because your mind is telling you to quit, it’s probably best to keep going.

In research, I use the “quit-three rule.” If I’m reading a paper, writing, or even running, I tell myself that I have to think, “I should quit doing this,” three distinct times before I’m allowed to give up on the task at hand. This rule gives me the ability to pivot off of things that simply will not happen at the moment while still building the resilience to do the things I want to. It’s not perfect! Sometimes, I’m phoning in the task, but it’s a good way to practice focus. 

Alexis and friends finishing an escape room!

Lessons from Escape Rooms

I once worked at an escape room, and it turns out it was helpful for my PhD (and it was more entertaining than watching TV). 

Ask for help earlier than you think you need to.

Often, groups would come to complete an escape room and be overconfident about their abilities. Maybe they had completed plenty of rooms in the past. Maybe they really enjoyed solving puzzles. Maybe they just thought they were particularly smart. But more often than not, those groups would perform worse than others. Why? Because they didn’t ask for help early and burned precious time on simple problems. 

The same flawed thinking can occur in a PhD program. Rather than admitting to your advisor or peers that you are stuck, you may be tempted to battle against a problem all by yourself. Don’t be like those overconfident escape room groups! Asking for help and being vulnerable with others can help you tackle a problem and connect with those around you. 

Answers can come from unlikely places.

“Wait…I think I’m Janet!” said my colleague. Turns out a fantastic group had been calling the escape room employee “Janet”, an all-knowing being from the show The Good Place. While, unfortunately, none of us can ask a not-a-girl, not-a-robot character the answer to any question in the universe, we can find answers in places we least expect them. 

The same is true during a PhD program. While courses and your advisors are key sources for support, engaging with experiences that bring you joy is also vital. Maybe the family member you are trying to describe your project to can help you find a way to frame your research question. Maybe the crafting event on campus introduces you to another student with whom you can collaborate. Maybe an intramural pickleball game clears your mind, and you discover the next direction for your dissertation topic. A PhD program is a time to explore not only intellectually but also personally.

Communication is key.

Escape rooms are all about effective communication, whether it is joining hands to close a circuit, yelling out numbers from around a corner, or describing what’s inside a hidden room. It is astounding how many times I’ve seen a teammate find the key to solving a puzzle and put it silently in their pocket. As one of my favorite characters, Stede Bonnet of Our Flag Means Death, states, you have to “talk it through, as a crew.”

And, as I am sure you’ve noticed by now, the same applies to a PhD program! Unfortunately, some lab and campus cultures discourage meaningful connections and collaborations. It is crucial in those situations to find people you can communicate with, such as a support office or friends and loved ones. However, in all situations, it is how we talk to ourselves and with others that often determines how successful we can be.

Whether it’s running, escaping, or even knitting, inspiration is everywhere for being a successful researcher! Which hobbies have helped you as a PhD student?

The Reddit Blackout: An Initial Exploration with Support-Seeking Subreddits


Reddit is no stranger to conflict between its users, but in its most recent controversy, the company found itself playing the antagonist. In June 2023, Reddit made headlines as the subject of one of the largest-scale protests by users of a site: the Reddit blackout. However, not much is known about the effects of the event on Reddit’s communities. This semester, I began exploring how the culture of support-seeking subreddits was impacted by the blackout.

Image from the Wikimedia Commons

The Reddit Blackout

On June 12, 2023, more than 7,000 communities on Reddit went private, making them inaccessible to non-subscribers. The collective disabling and restricting of subreddits is known as a Reddit “blackout.” The decision to organize this blackout was largely made in protest of the company’s decision to charge for API access. Moderators fulfill their duties mainly by relying upon third-party apps that were built using Reddit’s API. By fixing a price for the API that popular third-party developers could not possibly afford, Reddit was essentially shutting down these apps. Moreover, by shutting down these apps, Reddit was ignoring the needs of moderators, arguably its most important users.

To remind Reddit of their importance, moderators came together to organize a blackout of unprecedented scale. Subreddits went private for 48 hours. The hope was that the company would recognize how much it needed its moderators and give in to their demand to reduce the price of API access.

Unfortunately, this objective was not met. Reddit was determined to remain faithful to its business decision and wait out the blackout. Moderators were also unwilling to back down. After the originally planned 48-hour protest period was over, many subreddits remained private. The company then began to antagonize moderators by threatening to remove them and forcing subreddits to reopen. 

While some subreddits remain private even today, the blackout largely came to an unsuccessful end. The company was able to force many subreddits back into some form of normalcy, but community sentiment towards management has never been lower. In a post about the blackout, a moderator said, “I believe that Reddit administration has demonstrated an unsurprising but none-the-less disappointing lack of foresight and understanding of how their website operates… I believe [they do] not understand the value that their unpaid moderators bring to the website” (“[Modpost] Reddit Blackout – What’s Happening Next,” 2023).  Moderators feel that Reddit’s actions have made it abundantly clear how little the company cares about their users’ perspectives and moderator labor. 

This semester, I wanted to understand how the Reddit Blackout affected (1) Reddit as a community nurtured by volunteers and (2) Reddit as a dataset. The following questions guided my work:

  • How do people use Reddit to seek support?
  • What does current research say about the struggles of Reddit moderators?
  • Now that API access is gone, is running a large-scale data analysis about the blackout feasible?

Support Seeking on Reddit

Social support is the receipt of help from others by an individual (Zou 2024). Reddit is a social media platform structured into subject-specific communities (called subreddits) where users can post and interact. This topic-specificity makes Reddit a convenient venue for seeking social support. In fact, it has been praised for hosting certain support-seeking communities, particularly those serving people attempting sobriety (Sowles 2017). Subreddits that support drug recovery are just one of many support-giving communities. Redditors can receive emotional, informational, and tangible social support through the platform (Zou 2024). 

Moderator Labor

Volunteer moderators are an integral part of the culture and maintenance of Reddit. However, moderators’ important labor is often underappreciated due to misconceptions regarding what they do. These misconceptions largely stem from two main problems: (1) the lack of visibility around much of moderators’ contributions (Li 2022) and (2) a heightened focus on controversial tasks that they are seldom directly responsible for (Gilbert 2020). 

Reddit is designed such that comment removal is highly conspicuous (comments removed by moderators are replaced with the text “[deleted]”), promoting the idea that moderators’ main service is censoring users (Gilbert 2020). However, the majority of comment and post removal is actually performed by bots (Li 2022). So while moderators do find and implement technical workarounds, such as bots, they do not typically perform removals themselves. The misconception that moderators censor users results in community backlash and undue emotional burden on moderators (Gilbert 2020). Moderators also have over 64 other non-removal actions they are responsible for (Li 2022). Unfortunately, a recent study found that approximately 43% of their extensive labor is essentially “invisible” (Li 2022). 

The misunderstood and unseen labor of moderators complicates their relationship with Reddit as a company. The value and legitimacy of labor is typically correlated with its level of visibility (Gilbert 2020), so moderators are in a position of disadvantage at the negotiation table with the company (Li 2022). The company’s misconception of the labor of their moderators allows them to neglect the volunteers that keep their platform usable and prioritize investing in what they think will maximize revenue generation. This has been the root cause behind the major “blackouts” that the platform has experienced (Matias 2016). 

Data Exploration and Struggles

Given Reddit’s decision to charge for API access, obtaining subreddit data is not as straightforward as it once was. Fortunately, I was able to find a post download tool that enabled me to retrieve data from several support-seeking subreddits.

The subreddit I initially chose to focus on was r/Depression. I analyzed a total of 115,093 r/Depression posts from April 2023 to February 2024. As I was looking through r/Depression posts from the original 48-hour period of the blackout, I realized that no one was mentioning the blackout. The subreddit hadn’t participated in the blackout, but I was still surprised that there were zero references to it. Both during and in the week after the blackout, the most frequently used words were the placeholders “[removed]” and “[deleted].” I wasn’t sure if this meant that discussion around the blackout had been expunged or that there had never been any discussion around it at all. This made it very difficult to find patterns in the posts and sentiments of r/Depression members during the blackout.

(Left) Plot of r/Depression posts by day, in which June 12th, the first day of the blackout, is circled in red. It had the most posts of the month. (Right) Top 10 most frequently used words in posts during the blackout. The placeholders [deleted] and [removed] were most common.
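A word-frequency count like the one described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the post records, their field names (`date`, `text`), and the tokenization rule are hypothetical and are not taken from the post download tool or from the actual analysis.

```python
import re
from collections import Counter
from datetime import date

# Assumption: placeholder markers are kept as single tokens;
# everything else is split into lowercase words.
TOKEN_PATTERN = re.compile(r"\[(?:removed|deleted)\]|[a-z']+")


def top_words(posts, start, end, n=10):
    """Return the n most common tokens in posts dated start..end (inclusive)."""
    counts = Counter()
    for post in posts:
        if start <= post["date"] <= end:
            counts.update(TOKEN_PATTERN.findall(post["text"].lower()))
    return counts.most_common(n)


# Toy records standing in for downloaded posts (hypothetical field names).
sample_posts = [
    {"date": date(2023, 6, 12), "text": "[removed]"},
    {"date": date(2023, 6, 12), "text": "[removed]"},
    {"date": date(2023, 6, 13), "text": "[deleted]"},
    {"date": date(2023, 6, 13), "text": "[removed]"},
    {"date": date(2023, 6, 13), "text": "[deleted]"},
    {"date": date(2023, 6, 13), "text": "I feel tired today"},
    {"date": date(2023, 7, 1), "text": "A post outside the window"},
]

blackout_top = top_words(
    sample_posts, date(2023, 6, 12), date(2023, 6, 14), n=2
)
```

Keeping the bracketed markers as whole tokens is a deliberate choice in this sketch: it makes removed and deleted posts show up in the ranking rather than being split into ordinary words.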

I then chose to redirect my attention to a subreddit that actually participated in the blackout, hoping this would make a lack of discussion around the blackout unlikely. I looked at 29,229 r/SocialAnxiety posts from January to December 2023. There were no posts on the subreddit from the blackout dates. I was not sure if that meant they had all been deleted, or if posts from when a subreddit was private are not accessible via the post download tool. Either way, it was clear I needed to adopt a new approach to understand the blackout’s impact.

Planned Quasi-Causal Analysis

After conducting some preliminary exploration of Reddit data from the blackout period, I became interested in determining the effects of the blackout on the culture of support-seeking subreddits. Specifically, I want to look at the blackout as an intervention and perform a comparative analysis on the culture of these subreddits before and after the blackout. To do so, I am going to use Regression Discontinuity in Time (RDiT), which we have seen applied successfully in similar work on Wikipedia (Hill 2021).

RDiT is a quasi-causal method that compares regressions before and after an intervention date in order to identify causal effects. The intervention dates will be the initial 48-hour blackout period from June 12, 2023 to June 14, 2023. RDiT suits this data because its running variable is time relative to the intervention date rather than a conventional assignment score. RDiT also accounts for trends that occur organically over time, so the comparison focuses on the change at the intervention. This semester, I selected 12 candidate subreddits for analysis.
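To make the idea concrete, here is a minimal sketch of a linear RDiT estimate in plain Python. It is not the planned analysis: the outcome (daily post counts), the single cutoff day, the synthetic data, and the function names `fit_line` and `rdit_jump` are all assumptions made for illustration. The sketch fits one straight line to the days before the cutoff and another to the days on or after it, with time centered on the cutoff, and reports the gap between the two fitted lines at the cutoff.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rdit_jump(days, outcome, cutoff_day):
    """Estimate the discontinuity in `outcome` at `cutoff_day`.

    Time is centered on the cutoff, so each fitted intercept is the
    predicted outcome at the cutoff itself. The estimate is the
    post-cutoff intercept minus the pre-cutoff intercept.
    """
    pre = [(d - cutoff_day, y) for d, y in zip(days, outcome) if d < cutoff_day]
    post = [(d - cutoff_day, y) for d, y in zip(days, outcome) if d >= cutoff_day]
    _, pre_intercept = fit_line(*zip(*pre))
    _, post_intercept = fit_line(*zip(*post))
    return post_intercept - pre_intercept


# Synthetic example (not real Reddit data): a steady upward trend of
# 0.5 posts per day with a drop of 20 posts starting on day 30.
days = list(range(60))
cutoff = 30
daily_posts = [
    100 + 0.5 * (d - cutoff) - (20 if d >= cutoff else 0) for d in days
]
jump = rdit_jump(days, daily_posts, cutoff)
```

On this noise-free toy series the estimate recovers the drop of 20 posts that was built into the data. A real analysis would also need to choose a window around the cutoff, decide how to treat the 48-hour blackout itself, and report uncertainty, none of which this sketch attempts.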

Selecting Subreddits for Analysis

I want to use data from support-seeking subreddits for my quasi-causal analysis. In particular, I want to analyze support-seeking subreddits across four categories: mental health subreddits that participated in the blackout, mental health subreddits that did not participate in the blackout, non mental health subreddits that participated in the blackout, and non mental health subreddits that did not participate in the blackout. I want to understand whether the intervention had an effect on participating subreddits by looking at how it affected both participating and non participating subreddits. I also want to grasp how the cultures of mental health support-seeking subreddits specifically were impacted by decisions to become private or restricted. After all, these subreddits can be especially critical for people in crises, making their shutdowns potentially more damaging than others’ for consumers of Reddit. 

Table of the 12 subreddits selected for the quasi-causal analysis.

Selecting subreddits within the four categories posed three main challenges: each subreddit had to be support-seeking, have around the same number of members as its counterparts, and be available for research and download. Furthermore, each subreddit had to be investigated to determine whether it participated in the blackout or not. Initially, I consulted a post on the r/ModCoord subreddit that listed all blackout-participating subreddits. From there, after manual inspection of member numbers, I selected the following 6 subreddits that participated in the blackout: r/BPD, r/Autism, and r/SocialAnxiety (3 mental health subreddits), and r/Confidence, r/LearnMath, and r/MaleHairAdvice (3 support-seeking but not mental health-related subreddits). Manual inspection of subreddits by size on the “Top Communities” pages of Reddit yielded 6 more subreddits. These 6 subreddits did not participate in the blackout: r/CPTSD, r/Lonely, and r/MentalHealth (3 mental health subreddits), and r/FreeFood, r/NeedAFriend, and r/Texts (3 support-seeking but not mental health-related subreddits). Together, these 12 support-seeking subreddits form the dataset upon which I will be conducting my quasi-causal analysis.
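For readers who want to work with this two-by-two design in code, the grouping above could be written down as a small lookup table. The subreddit names come from the selection described above; the dictionary layout, the label strings, and the helper name `group_of` are illustrative assumptions rather than part of the actual analysis code.

```python
# Keys are (topic, blackout participation); values are subreddit names.
STUDY_GROUPS = {
    ("mental_health", "participated"): ["BPD", "Autism", "SocialAnxiety"],
    ("mental_health", "did_not_participate"): ["CPTSD", "Lonely", "MentalHealth"],
    ("other_support", "participated"): ["Confidence", "LearnMath", "MaleHairAdvice"],
    ("other_support", "did_not_participate"): ["FreeFood", "NeedAFriend", "Texts"],
}


def group_of(subreddit):
    """Return the (topic, participation) pair for a selected subreddit."""
    for group, names in STUDY_GROUPS.items():
        if subreddit in names:
            return group
    raise KeyError(f"r/{subreddit} is not one of the selected subreddits")


all_subreddits = [name for names in STUDY_GROUPS.values() for name in names]
```

For example, `group_of("LearnMath")` returns `("other_support", "participated")`, which makes it easy to label each post with its comparison group before running any before-and-after analysis.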

Closing Remarks

The Reddit blackout remains one of the largest and best documented online protests. Although the disruption appeared to have little impact on Reddit’s business decision, its consequences for the people who rely on Reddit’s communities for social support are unexplored. Next semester, I look forward to better understanding how online collective action changes support-seeking communities long term. 

References

Gilbert, S. A. (2020). “I run the world’s largest historical outreach project and it’s on a cesspool of a website.” Moderating a Public Scholarship Site on Reddit: A Case Study of r/AskHistorians. Proceedings of the ACM on Human-Computer Interaction, 4(CSCW1), 1–27. https://doi.org/10.1145/3392822

Hill, B. M., & Shaw, A. (2021). The Hidden Costs of Requiring Accounts: Quasi-Experimental Evidence From Peer Production. Communication Research, 48(6), 771-795. https://doi.org/10.1177/0093650220910345

Li, H., Hecht, B., & Chancellor, S. (2022). All That’s Happening Behind the Scenes: Putting the Spotlight on Volunteer Moderator Labor in Reddit. Proceedings of the International AAAI Conference on Web and Social Media, 16(1), 584-595. https://doi.org/10.1609/icwsm.v16i1.19317

Matias, J. N. (2016). Going Dark: Social Factors in Collective Action Against Platform Operators in the Reddit Blackout. Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, 1138-1151. https://doi.org/10.1145/2858036.2858391

Morrison, S. (2023, June 20). Reddit blackout: What is it and why are subreddits going dark? Vox. https://www.vox.com/technology/2023/6/14/23760738/reddit-blackout-explained-subreddit-apollo-third-party-apps

Peters, J. (2023, June 30). How Reddit crushed the biggest protest in its history. The Verge. https://www.theverge.com/23779477/reddit-protest-blackouts-crushed

R/3d6 on Reddit: [Modpost] Reddit Blackout – What’s Happening Next? Reddit. (n.d.). https://www.reddit.com/r/3d6/comments/14bbp9d/modpost_reddit_blackout_whats_happening_next/ 

Sowles, S. J., Krauss, M. J., Gebremedhn, L., & Cavazos-Rehg, P. A. (2017). “I Feel Like I’ve Hit the Bottom and have no Idea what to Do”: Supportive Social Networking on Reddit for Individuals with a Desire to Quit Cannabis Use. Substance Abuse, 38(4), 477–482. https://doi.org/10.1080/08897077.2017.1354956

Zou, W., Tang, L., Zhou, M., & Zhang, X. (2024). Self-disclosure and received social support among women experiencing infertility on reddit: A natural language processing approach. Computers in Human Behavior, 154, 108159. https://doi.org/10.1016/j.chb.2024.108159

How do relationship conflicts look from the other side? Here are answers from body-swapping in VR


Most people have ample experience with personal conflicts, whether it be a disagreement with your significant other, your mom, or just a really close friend. And most would agree that they are extra tricky to deal with: as seen in the 4-panel comic above, the real issue in this couple’s argument is not actually about the pizza. Likewise, arguments over who does the dishes at home are usually not just about the dishes. Personal conflicts can involve differences in perspective that run deeper in the relationship and are hard to resolve via surface-level conversation.

To really enable a change in perspective for those stuck in personal conflict, we propose and evaluate an autobiographically-accurate retrospective embodied perspective-taking system based in VR that enables users to immersively re-experience a past conflict interaction as their partner, essentially “body swapping.”

We conducted a mixed-methods controlled study with 26 couples to compare the types of insights and changes in conflict behavior evoked by our “body swapping” approach to the current industry practice of video recall—rewatching footage of both partners in a conversation.

We found that the experience of retrospective embodied perspective-taking led individuals who were in conflict with their significant other to develop transformative insights constituting major changes in opinion about their partner, themselves, and even the issues of conflict. One woman mentioned how the experience changed a negative view she had of her husband which had persisted throughout 10 years of their marriage prior to the study:

“I found a lot of value in watching his hands. My husband does a lot of repetitive hand movements when he’s nervous, and it tends to frustrate me, and make me feel like he is uncomfortable with what I’m saying. Watching him do it from his perspective, I felt uncomfortable vs. frustrated. Seeing myself talk to him the way I did, I can now understand why he would make those kinds of gestures because even ‘I’ was nervous with how absolute and sure I was when speaking to him.

“I think my biggest realization is that I thought my husband was the major reason that we had trouble communicating. And while he might not like conflict, I spend a lot of time saying what he’s doing, versus what I’m doing. I have taken this approach to this conversation so many times, and hearing/watching myself from this point of view makes me think about how many times my partner has been on the receiving end of me pointing out things and for me, doing that it felt like, here we go again, but not from my standpoint, from his standpoint, of like, here she goes again.”

Our findings showed that addressing personal conflicts isn’t always about talking through the details of an issue. VR-enabled body swapping can help people understand what others are actually thinking and experiencing, which gets at the personal perspectives at the core of conflict with close others.

Want to see the full story on how embodied perspective-taking impacts conflict in close relationships? Check out our paper, or come watch my in-person talk on May 13, 2024 at 4:30pm Hawaii time!

Seraphina Yong, Leo Cui, Evan Suma Rosenberg, and Svetlana Yarosh. 2024. A Change of Scenery: Transformative Insights from Retrospective VR Embodied Perspective-Taking of Conflict With a Close Other. In Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI ’24), May 11–16, 2024, Honolulu, HI, USA. ACM, New York, NY, USA, 18 pages. https://doi.org/10.1145/3613904.3642146

Reflecting on Consent at Scale


image by Freepik

In the era of internet research, everyone is a participant. Picture this…

A PhD stood at the front of a crowded conference hall. They’d just presented their paper on social capital in distributed online communities. As the applause settled, an audience member scuttled to the microphone, eager to ask the first question.

A professor from University College. Thank you for the great talk. It was refreshing to attend a talk with such rigorous methods. You scraped data from so many different subreddits and made such a compelling argument for how these results will generalize to other online spaces. My question is less about the research and more about your experiences with data contributors. How did the various subreddit community members react when you talked to them about this exciting work?

What kind of question is this? the PhD thought to themself. It’s not feasible to get consent from every user. We got an IRB exemption, got approval from subreddit moderators, and followed all the API terms of use and regulations for researcher access. Do other researchers really ask for consent at scale? Did I get consent…?

You may be in a similar situation now! Using social media data for research is a common method that has massive potential for large-scale analyses in both quantitative and qualitative research. However, it can be frustrating to simultaneously hold individual, affirmative consent as the gold standard and recognize its limitations as a viable option for many researchers. To that end, we’ve made a reading list about getting individual consent at scale, particularly in research settings. We hope this reading list serves as a provocation for discussion rather than a list of solutions to this problem.

Normative Papers

1. The “Ought-Is” Problem: An Implementation Science Framework for Translating Ethical Norms into Practice. Our resident ethicist (Leah Ajmani) loves this paper so much! It basically uses informed consent as a case to describe the larger translational effort needed to move from normative prescriptions to actual implementation.

2. Yes: Affirmative Consent as a Theoretical Framework for Understanding and Imagining Social Platforms. A contemporary classic in CHI, this paper does a really good job of describing affirmative consent as the ideal situation and then using the “ideal” for explanatory and generative purposes. There is merit to having an ideal, even if it is not perfectly attainable!

HCML Papers

We’re obviously biased because she’s a GroupLenser, but Stevie Chancellor does a great job at describing consent at scale as an ethical tension rather than a “must-have.” It is something researchers need to navigate with justified reasoning.

1. A Taxonomy of Ethical Tensions in Inferring Mental Health States from Social Media

2. Toward Practices for Human-Centered Machine Learning

Design Papers

These papers are both critical of current consent design and do a great job of discussing alternatives, even when those alternatives fall outside of a research context.

1. (Un)informed Consent: Studying GDPR Consent Notices in the Field

2. Limits of Individual Consent and Models of Distributed Consent in Online Social Networks

From grappling with moral nuance to designing better consent procedures, these readings can take our discussions of individual consent at scale from a theoretical ideal to an operationalizable goal. So, let’s embrace difficult discourse about how to move forward and continue to traverse the space between the idyllic and the feasible. Comment or tweet which papers you would add to this list!