Michael Calore: So the arguments in both of these cases have already been made. The court has heard them. They’re not going to release the decisions for months, many months. We do know how the arguments were made by the lawyers, and we know what questions the justices asked. So is there any way to foreshadow or predict whether the rulings will be drastic? No big deal? Somewhere in between?
Jonathan Stray: So from the questions the justices were asking on the first case, Gonzalez v. Google, which is the one about Section 230 specifically, I think they’re going to shy away from making a broad ruling. I think it was Kagan who had this line, “We’re not the nine greatest experts on the internet,” which got a big laugh, by the way. And what she means by that is, it was part of a discussion where she was asking, “Well, shouldn’t Congress sort this out?” I think that’s honestly the answer here. In fact, there are a bunch of proposed laws in Congress right now that would modify Section 230 in various ways, and we can talk about which of those I think make sense and which don’t. But I think the court would like to punt this to Congress, and so it’s going to try to find a way to dodge the question entirely, which it might do. If the justices answer no on the second case, the Taamneh case, and say, “Even if the platforms aren’t immune under Section 230, they’re not liable if they were trying to remove terrorist content and didn’t get it all,” then that would allow them to just not rule on the Section 230 question. I think that’s a reasonably likely outcome. I think they would like to find some way to do that, but who knows.
Lauren Goode: All right, Jonathan, this has been super helpful background. We’re going to take a quick break and then come back with more about recommendation systems.
[Break]
Lauren Goode: So, Jonathan, you’ve been researching recommendation systems for years, and obviously this is a space that evolves a lot. It’s a relatively new area of tech. We’ve maybe only been experiencing these systems for 20 years or so, and a lot of research has been done, but recently a new paper argued that some of the previous work on extreme content on platforms like YouTube and TikTok might have been “junk,” because the methodology in that research was problematic. Can you explain this? And also, does this mean that our worries about extreme content are all over and we can just go back to the internet being a happy place?
Jonathan Stray: Right.
Lauren Goode: That was a hyperbolic question, yeah.
Jonathan Stray: Right. OK. Well, I may have been a little hyperbolic with “junk,” but OK. So I’m an academic, which means I have the luxury of not needing to root for a particular side in this debate, and I can take weirdly nuanced positions on this stuff. Basically the problem is this: There are all kinds of things that could be bad effects of social media. It’s been linked to depression, eating disorders, polarization, radicalization, all of this stuff. The problem is, it’s pretty hard to get solid evidence for what the actual effects of these systems are. And one of the types of evidence that people have been relying on is a kind of study that basically goes like this: You program a bot to watch one video, let’s say on YouTube, though you can do this on TikTok or whatever. You get a bunch of recommendations on the side, up next, and the bot randomly clicks one of those, watches that video, and then randomly clicks one of the recommendations after that. So you get what they call a “random walk” through the space of recommendations. What these kinds of studies showed is that a fair number of these bots, when you do this, end up at material that’s extreme in some way: extreme right, extreme left, or terrorist material, although the really intense terrorist material is mostly not on the platforms, because it’s been removed. OK. So this has been cited as evidence over the years that these systems push people toward extreme views. What this paper that came out last week showed, and this is a paper called “The Amplification Paradox in Recommender Systems,” by Ribeiro, Veselovsky, and West, is that when you do a random walk like this, you overestimate the amount of extreme content that is actually consumed, basically because most users don’t like extreme content. They don’t click randomly; they click on the more extreme stuff less often than a random walk would. So as an academic and a methodologist, this is very dear to my heart, and I’m like, “This way of looking at the effects doesn’t work.” Now, I don’t think that means there isn’t a problem. I think there are other kinds of evidence suggesting that we do have an issue. In particular, there’s a whole bunch of work showing that more extreme content, or more outrageous or more moralizing content, or content that speaks negatively of the outgroup, whatever that may mean for you, is more likely to be clicked on and shared and so forth. And recommender algorithms look at these signals, which we normally call “engagement,” to decide what to show people. I think that’s a problem, and I think there’s other evidence that this is incentivizing media producers to be more extreme. So it’s not that everything is fine now; it’s that the methods we’ve been using to assess the effects of these systems aren’t really going to tell us what we want to know.
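To make the methodological point concrete, here is a minimal sketch in Python of the kind of random-walk audit Stray describes, run against an entirely made-up toy recommendation graph. The RECS table, the “mainstream”/“extreme” labels, and the extreme_click_bias parameter are illustrative assumptions, not anything measured from a real platform or taken from the paper; the sketch only shows how a bot that clicks uniformly at random can report far more exposure to extreme content than a simulated user who, as Stray notes, clicks on extreme items less often than chance.

```python
import random

# Toy recommendation graph: each item recommends a mix of "mainstream"
# and "extreme" items. Entirely made up for illustration.
RECS = {
    "mainstream": ["mainstream"] * 8 + ["extreme"] * 2,
    "extreme":    ["mainstream"] * 4 + ["extreme"] * 6,
}

def random_walk_bot(steps=20):
    """Audit-style bot: clicks a recommendation uniformly at random."""
    item, extreme_views = "mainstream", 0
    for _ in range(steps):
        item = random.choice(RECS[item])
        extreme_views += (item == "extreme")
    return extreme_views / steps

def modeled_user(steps=20, extreme_click_bias=0.2):
    """Simulated user: sees the same recommendations but is less likely
    than chance to click an extreme item (the bias value is an
    illustrative assumption, not an estimate)."""
    item, extreme_views = "mainstream", 0
    for _ in range(steps):
        slate = RECS[item]
        weights = [extreme_click_bias if r == "extreme" else 1.0 for r in slate]
        item = random.choices(slate, weights=weights, k=1)[0]
        extreme_views += (item == "extreme")
    return extreme_views / steps

def average(fn, runs=5000):
    """Average the share of extreme views over many simulated sessions."""
    return sum(fn() for _ in range(runs)) / runs

if __name__ == "__main__":
    print(f"random-walk bot: {average(random_walk_bot):.1%} extreme views")
    print(f"modeled user:    {average(modeled_user):.1%} extreme views")
```

On this toy graph the uniform bot ends up on the “extreme” item several times more often than the biased user does, which is the kind of overestimation Stray says the random-walk studies suffer from once real click behavior is taken into account.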