Social Safety in Games
Moderating voice chat in the metaverse.
Long before the word “metaverse” started to be thrown around, video game developers were realizing a core truth: Games—which had started decades before as unique, curated experiences—were becoming customizable, continuously evolving locations where friends chose to gather and socialize.
This transformation of games from activities to social environments has been good for users who took advantage of these venues when COVID forced us to meet virtually. It has been good for the industry, which has seen continued rapid growth in recent years to record-breaking sizes. And it has been good for the individual studios that see retention, playtime, and engagement reach unprecedented levels, thanks to the strength of the communities forming in and around these games.
That is not to say these changes do not come with challenges, though. Many players fear participating in the wider social ecosystem due to a persistent specter: toxicity.
Let’s be clear: Toxicity is not the same as friendly trash talk. It is perfectly fine, when gaming with your friends, to rag on each other, toss a few swear words around, or otherwise poke fun. What is not okay is when someone tries to groom a child for sexual exploitation, to radicalize a disaffected teen, or to jump in with a group of strangers and begin spewing racial epithets while barely even pausing to take a breath. It is this latter type of behavior we mean when we say “toxicity.”
This kind of severe toxicity is, thankfully, not the norm. Various studios and platforms consistently find that only 2% to 5% of players persistently engage in toxic harassment or hate speech, and only a tiny fraction of a percent of users are pedophiles or extremists. Unfortunately, this tiny minority manages to impact a much larger share of players. Roughly 77% of adult players encounter severe toxicity in games, according to the Anti-Defamation League, with women, children, and other members of underserved demographics often bearing the brunt of this harmful behavior.
Why is this problem so extreme in the online world? Many folks argue that the game content itself must be to blame—surely the violence of, say, Call of Duty begets violence in other forms? However, this has been summarily disproven in the literature [Elson and Ferguson 2013]. Others note that the norms in these online spaces, and games specifically, are not well-defined, and there is little to no consistency across gaming spaces in terms of clear codes of conduct or terms of service. The pseudonymity of the online world is surely also a factor, though anyone who spends time online today knows that little is truly pseudonymous if it is worth the time of authorities to investigate.
But the most salient argument to me is the simple fact that bad behavior is rarely penalized online due to a lack of moderation. Shouting racial slurs in the middle of nearly any public place will get you tossed out; do it inside your favorite video game and, unless you have been matched with one of the rare players who submit actionable offense reports, what is really going to happen to you?
The story gets more complicated when you factor in voice chat during games. In text chat, we have long had powerful tools to analyze what you are saying and, if necessary, filter it. But these filters also teach users how to circumvent them. For instance, if you try to type the n-word and it does not get through, just replace the “i” with a “1” and keep iterating until something slips past.
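To make that concrete, here is a minimal sketch of the kind of blocklist filter many text-chat systems rely on, and how easily a single character swap defeats it. The blocklist entry is a placeholder standing in for a real slur, and real filters are far more elaborate, but the failure mode is the same:

```python
import re

# A toy blocklist filter. Real systems use far larger lists and fuzzier
# matching, but the core idea is the same: match known bad strings.
BLOCKLIST = {"slurword"}  # placeholder token standing in for a real slur

def is_blocked(message: str) -> bool:
    tokens = re.findall(r"[a-z0-9]+", message.lower())
    return any(token in BLOCKLIST for token in tokens)

print(is_blocked("you slurword"))   # True  -- caught by the filter
print(is_blocked("you slurw0rd"))   # False -- one character swap evades it
print(is_blocked("s l u r word"))   # False -- spacing evades it, too
```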
Voice moderation is the opposite. It is much more difficult to implement than text filters, but once it has been implemented it is substantially more robust, because it is significantly harder to distort voice chat to “fool” the filters without distorting the audio until it is impossible to understand in the first place. This fact, along with the fact that voice is simply more primal and emotionally powerful for most people than text, is what makes voice moderation so crucially important. (I run a voice moderation company, though I would argue we only work on this problem because the need is so compelling.)
So briefly, let us talk about how one could build a voice moderation solution.
The first idea is to transcribe everything. Put aside the fact that, in doing this, you would lose all the emotion and nuance, which is a crippling problem in its own right; the biggest problem is that it is expensive. Transcription vendors typically charge over $1 per hour of audio, and top game platforms might see tens or even hundreds of millions of hours of voice chat each month.
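As a rough back-of-the-envelope check, using the figures above (both inputs are illustrative, not real vendor quotes or platform numbers):

```python
# Back-of-the-envelope cost of transcribing everything, using the rough
# figures cited above. Both inputs are illustrative, not real quotes.
price_per_hour_usd = 1.00            # transcription pricing of roughly $1/hour
voice_hours_per_month = 50_000_000   # a large platform: tens of millions of hours

monthly_cost = price_per_hour_usd * voice_hours_per_month
print(f"${monthly_cost:,.0f} per month")   # prints: $50,000,000 per month
```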
Maybe you try to limit your attention to only a small portion of the audio. Most platforms choose to rely on player reports here, and indeed, this is an important part of any moderation solution. If your players have a bad experience, they need a channel to report it. But it is not on them to clean up the platform, and only 5–10% of users ever hit that report button. On top of that, few people report the more insidious harms like child grooming or radicalization, because those bad actors choose targets who will not even realize what is happening to them.
So here is how I believe you solve the problem.
Imagine bringing your kid to the playground. You’ll probably join the other parents on the sidewalk, chatting about other things, but still watching your kid out of the corner of your eye. This is just enough to see a suspicious adult approaching them, spot them falling off the monkey bars, or notice that the other kids are excluding them. Only after you realize that something is going wrong do you get closer, investigating the situation more deeply and taking appropriate action.
This is how we should moderate voice chat. By analyzing a combination of emotion, prosody, speech patterns, interruptiveness, speaker profiles and history with each other, and a variety of other key signals, it is possible to, in effect, “spot out of the corner of your eye” when a conversation is beginning to take a problematic turn. That is the cue to dig in closer and kick off more expensive analysis, including transcription. At the end of the day, you thus have a detailed, sophisticated understanding of each actual incident, without spending the money, or imposing on player privacy, that analyzing everything would require.
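As a very rough sketch of what such a two-stage triage loop might look like in code: the signal names, weights, 0.6 threshold, and escalation step below are all hypothetical illustrations of the idea, not a description of any production system.

```python
from dataclasses import dataclass

@dataclass
class TriageSignals:
    """Cheap, always-on signals estimated from the audio stream."""
    anger_score: float        # emotion/prosody estimate, 0..1
    interruption_rate: float  # interruptions per minute
    prior_offenses: int       # from the speaker's moderation history

def risk_score(s: TriageSignals) -> float:
    # Hypothetical weights; a real system would learn these from labeled data.
    return (0.5 * s.anger_score
            + 0.3 * min(s.interruption_rate / 10.0, 1.0)
            + 0.2 * min(s.prior_offenses / 5.0, 1.0))

def triage(clip_id: str, signals: TriageSignals, threshold: float = 0.6) -> None:
    if risk_score(signals) < threshold:
        return  # most conversations never leave this cheap first stage
    # Only now pay for the expensive step: transcription and deeper review.
    print(f"escalating {clip_id}: transcribe and queue for moderator review")

# Example: a heated match with a repeat offender trips the threshold.
triage("match-123/channel-4",
       TriageSignals(anger_score=0.8, interruption_rate=6.0, prior_offenses=3))
```

The point of the structure is that the expensive step only runs on the small slice of conversations the cheap signals flag, which is what keeps both cost and privacy exposure down.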
It is simultaneously a simple recipe and a tricky-as-heck process to actually perfect—especially when taking into account the privacy risks inherent in recording any user audio. But with patience and care, it can be done, and by taking on this responsibility, platforms only position themselves to be more successful. Even more fundamentally, we should never forget what games are—they are promises of a magical new environment, unique opportunities for creativity and passion, experiences that would be impossible in the physical world, and, most of all, community. Players play online games because they want to spend time with each other and learn about themselves, and game developers make games because they want to create environments they would like to spend time in.
Safety is not just a responsibility for platforms, it is a core part of the vision.
Mike Pappas is CEO of Modulate, which offers ToxMod, a machine-learning driven voice moderation tool providing the kinds of features described in this piece.
Elson, M. and Ferguson, C.J. 2013. Twenty-five years of research on violence in digital games and aggression. Eur. Psychol. 19, 1 (2013).