Research Article

Does algorithmic content moderation promote democratic discourse? Radical democratic critique of toxic language AI

Dayei Oh & John Downey
Received 14 Aug 2023, Accepted 19 Mar 2024, Published online: 13 May 2024

ABSTRACT

Algorithmic content moderation is becoming a common practice employed by many social media platforms to regulate ‘toxic’ language and to promote democratic public conversations. This paper provides a normative critique of the politically liberal assumption of civility embedded in algorithmic moderation, illustrated by Google’s Perspective API. From a radical democratic standpoint, this paper normatively and empirically distinguishes between incivility and intolerance because they have different implications for democratic discourse. The paper recognises the potential political, expressive, and symbolic values of incivility, especially for the socially marginalised. We, therefore, argue against the regulation of incivility using AI. There are, however, good reasons to regulate hate speech, but it is incumbent upon the users of AI moderation to show that this can be done reliably. The paper emphasises the importance of detecting diverse forms of hate speech that convey intolerant and exclusionary ideas without using explicitly hateful or extremely emotional wording. The paper then empirically evaluates the performance of current algorithmic moderation to see whether it can discern incivility from intolerance and whether it can detect diverse forms of intolerance. Empirical findings reveal that current algorithmic moderation does not promote democratic discourse, but rather deters it by silencing the uncivil but pro-democratic voices of the marginalised and by failing to detect intolerant messages whose meanings are embedded in nuance and rhetoric. New algorithmic moderation should focus on the reliable and transparent identification of hate speech and be in line with feminist, anti-racist, and critical theories of democratic discourse.

Introduction

Social media are deeply intertwined with everyday political behaviour. A large proportion of people consume news and are exposed to political advertising on social media, and online forum discussions are becoming a common practice (Gil de Zúñiga & Diehl, Citation2019). However, there are growing concerns about the proliferation of online harm, abuse, and hate speech, which are claimed to threaten the maintenance of the Internet as a healthy, inclusive, and safe place for all kinds of users (Gagliardone, Citation2019).

Against this backdrop, many social media platforms are partaking in diverse forms of content moderation (Gorwa et al., Citation2020). Following Gorwa et al. (Citation2020), this paper defines content moderation as ‘systems that classify user-generated content based on either matching or prediction, leading to a decision and governance outcome’ (p. 3) and focuses in particular on hard moderation, including the removal, blocking, and takedown of posts and/or accounts. Recently, platforms have taken a more pre-emptive and automated approach to scale up their content moderation, using machine learning artificial intelligence (AI) to detect supposedly harmful, dangerous, and ‘toxic’ language online. Google’s Counter Abuse team and Jigsaw developed the Perspective Application Programming Interface (API) to regulate ‘toxic language,’ defined as ‘a rude, disrespectful, or unreasonable comment that is likely to make someone leave a discussion’ (Dixon et al., Citation2018, p. 68). Online news businesses like the New York Times and forums like Disqus and Reddit use Perspective API to automatically moderate user communications at a big data scale.Footnote1

There have been numerous critiques of the technical limitations of algorithmic content moderation (referred to as algorithmic moderation for brevity in the following sections). Studies have found that algorithmic moderation is susceptible to various forms of bias and misclassification. Messages containing minority identity markers (e.g., Arabs, Black, LGBT+), messages with explicit swearwords, and messages in specific racial and ethnic dialects (e.g., Black-aligned English) have a higher probability of being misclassified as toxic regardless of their communicative contexts (Gorwa et al., Citation2020; Thiago et al., Citation2021; Zhou, Citation2021). AI communities are researching ways to mitigate such biases, treating these problems as ‘technical glitches’: minor, incidental errors that can be patched up through the machinery itself. For instance, de-biasing methods recommend using alternative methods for dividing training and testing datasets, hiring coders from diverse backgrounds, and mitigating spurious correlations with additional mathematical methods (Zhou, Citation2021).

However, the fundamental problems with algorithmic moderation lie much deeper than incidental ‘technical glitches’ that can be easily identified and solved with quick fixes. The problems are embedded in the very logic of content moderation: that certain forms of language are ‘toxic’ and must be censored through AI to promote democratic and productive discourse. The problem grows as soon as we notice that the tech communities’ notion of ‘toxic language’ is loose and inconsistent. Perspective API itself defines toxic language as a hodgepodge of diverse socially undesirable behaviours, ranging from rude and insulting to hateful messages. Perspective API thereby assumes that uncivil messages are toxic. We argue, by contrast, that some uncivil messages may be essential for strengthening and deepening democracy. The uncritical definition of ‘toxic language’ can result in many misclassifications, failing to differentiate incivility and intolerance and their respective implications for democratic discourse.

This paper provides a radical democratic critique of algorithmic moderation, drawing both on normative theories of inclusive and progressive public spheres and on empirical evidence. The paper consists of five sections. First, we challenge the politically liberal view of civility embedded in the logic of AI for toxic language moderation and argue against regulating incivility. Second, we theoretically discern incivility from intolerance and argue that, if at all, AI moderation should focus on hate speech, but only if certain conditions are met. We then introduce the methods and sample used to empirically evaluate the ability of toxic language AI to detect and regulate incivility and intolerance, followed by the results. We end the paper with concluding remarks about the risks and limitations of current algorithmic moderation.

Against regulating incivility

Many commentators and, indeed, academics contend that there has been a rise in incivility in recent years, most obviously in political discourse and discourse on social media but also extending to workplace cultures, including universities, and to social life more generally. There is a tendency for those on both the political right and left to accuse their opponents of lacking civility and thereby to put the very idea of a shared democratic discourse in question. This is, of course, an empirical matter, and there is a clear danger of assuming that there was a time, a ‘golden age’ or a ‘gilded age,’ when politicians, neighbours, strangers, and colleagues were, in fact, more civil to each other. But it is also a normative question, in that incivility is often thought to wreck the quality of democracy and indeed other domains of our lives (Grinspan, Citation2021). The twin claims that there is a lot more incivility around now than in the past and that it is a bad thing for democracy generally serve to legitimate the employment of AI content moderation such as Perspective API. Here we leave aside the question of whether there is more incivility about in general in favour of tackling the normative question, as this is the more fundamental one that determines whether AI content moderation, or indeed human content moderation, of incivility is justified.

Civility is seen as an essential component of the liberal tradition. In this tradition, we are very unlikely to agree on matters of fundamental importance; we are destined to compete with each other while often not sharing the same frame of reference, and so with little prospect of consensus, compromise, reconciliation, and so on. Our values and beliefs may simply be incommensurable with one another. In such circumstances, civility as a form of politeness and self-control acts as a sort of social glue, calming tempers and encouraging us to believe, despite everything, that we belong to the same political community at the same time as belonging to different ones. It helps us to keep disputes within certain bounds, discouraging us from fighting each other. In this liberal tradition, which emphasises ultimate incommensurability over the contents of the good life, civility therefore plays an essential role in keeping us and society in check. We may not be able to agree about things of fundamental importance, but at least we can talk nicely to each other. There is something intuitively compelling in this because it speaks to the sense that, no matter how long we spend trying to persuade each other in friendly and respectful tones, we will come up against incommensurable differences rather than achieve rational consensus. What we have to fall back on at ground level is the way we actually behave toward one another. Civility in this tradition is a way of keeping people in the conversation, rather than serving to exclude them with the consequent dangers of such exclusion. Rawls argues for a duty of civility based on the idea of reciprocity and the practice of public reason: listening to others, fairmindedness, and making reasonable accommodations to the views of others (Citation1996, p. 217). For Rawls, this duty of civility should provide a moral bedrock.

In contrast to this approach to civility, which sees it as fostering and maintaining inclusion despite fundamental differences between individuals and communities, there is another, almost diametrically opposed tradition which sees civility as deeply exclusionary. The essential text of this tradition is Elias’s (Citation2000) The Civilizing Process. While Elias shares with liberals the idea that civility was developed from the fifteenth century onwards to repress violence, he shows how it has become a way of excluding people. Economic and political elites come to regard themselves as the bearers of civility, and others who are not able to adopt the conventions of civility, or who do not wish to adopt them, are seen as morally inferior and excluded from the conversation. Instead of civility being about inclusion, therefore, it becomes about the exclusion of the already marginalised and the legitimation of that exclusion. In this Elias-influenced tradition, therefore, the maintenance of civility stands in tension with the pursuit of democracy. The problem with civility in this tradition (in which we place ourselves) is that it serves to prevent the extension of reciprocity to all.

Feminist scholar Linda Zerilli (Citation2014) makes this connection between incivility and the broader questions of democracy:

Uncivil public behaviour is symptomatic of a more general democratic deficit of public space in which grievances can legitimately be raised and meaningfully addressed by fellow citizens and their elected representatives. If some citizens are more prone to shout, that may well be because those in power are not listening. (p.112)

Incivility here is at once symptomatic of a lack of democracy or inclusion and a legitimate way to try to get one’s voice heard in a context of economic, social, and political inequalities, and when those promoting the importance of civility both consider themselves morally superior and overlook the structural barriers that prevent people from engaging in conversations, civil or otherwise: ‘Not only the charge of incivility but also the practice of civility itself has often times worked to mask relations of power in a veneer of politeness, however sincere’ (Zerilli, Citation2014, pp. 116–117). Zerilli’s main example is the white progressive paternalist reaction to the perceived incivility of the civil rights movement in the US in the 1960s. Zerilli quotes Bickford’s (Citation2011) assertion that civility ‘can make people less likely to perceive actual injustice and oppression’ (p. 1032). If people are able to engage in a civil conversation, then this must mean (according to this line of argument) that, at least for some participants, injustice cannot be so bad; this not only obscures the injustice but also discourages the development of a motivation to enact radical change. Zerilli also supports the argument of Meyers (Citation2018) that when people have been exposed to extreme injustice, and when the consequence is that they do not wish or are not able to behave civilly, this anger enables them to identify injustice more acutely than those able to engage in civil discourse. Zerilli argues that had African Americans and women played by the rules of civility, it would be ‘hard if not impossible to understand’ how the gains made by people of colour and women could have been achieved. For Zerilli (Citation2014), then, uncivil political action is the prerequisite for radical change that ultimately extends democracy.

Derek Edyvane (Citation2020) has taken up the question of the democratic effectiveness of incivility, addressing the argument that incivility can be counter-productive and lead to a democratically troubling ‘vortex’ (Flinders, Citation2017). Edyvane argues that incivility can have instrumental value in that it disturbs the status quo in a way that civility does not, and this on occasion might lead to radical change for the better (it might not, of course, but Edyvane sees this as an open question rather than seeing coercion as the only response to incivility). In addition to this instrumental or political value, however, there is also an expressive or symbolic value. Incivility ‘sends a message’ to those in economic and political power and expresses a sense of injustice. Edyvane argues that uncivil behaviour is unique in the sense that ‘At its core, incivility expresses a kind of demand, a demand for recognition of one’s stake in the joint commitment to society’ (Citation2020, p. 103): an attempt to compel recognition or at least a response. Even if such incivility does not provoke a response from the powerful, it may still communicate to other oppressed people and promote solidarity and association. Failing that (i.e., if no one is listening, neither the powerful nor the powerless), such expression may be an important source of self-respect, of having simply resisted oppression in some way. Incivility, then, even if we cannot guarantee it will have instrumental democratic value, will at least have some expressive individual value that may ultimately lead to a better democracy.

Deliberative theorists take a more systemic approach when assessing the value of incivility, attending to the deliberative system as a whole rather than to individual instances. Instead of focusing on individual instances of a perceived violation of civility and ‘rational argument,’ the quality and success of public deliberation are assessed at a systemic level (Mansbridge et al., Citation2012; Parkinson, Citation2012). What might be considered low-quality, non-deliberative deliberation (e.g., disruptive and distracting means of communication) in an individual instance might still be beneficial to the overall deliberative system to the extent that it increases the pool of perspectives, claims, reasons, and arguments available to the formal decision-making system (Mansbridge et al., Citation2012; Parkinson, Citation2012; Young, Citation2000).

Habermas (Citation1985) also defends the legitimacy of non-deliberative means in certain circumstances, such as when the public resorts to illegal civil disobedience. When formal deliberation is prematurely closed, or when the oppressed lack the privileges and means for influence, civil disobedience and non-deliberative means can be a way to appeal to the public capacity for reason and sense of justice and to open up further deliberation (Habermas, Citation1985). Uncivil communications can be utilised by citizens and activists to raise awareness about, and to bring attention to, important socio-political issues (Young, Citation2000). Such uncivil actions can still aim for communication.

For regulating hate speech in principle if not in practice

So far we have put forward an argument that questions the moderation of uncivil discourse on the grounds that it may be a good thing for democracy, in that it may lead to extending the conditions of democracy to previously excluded groups. Arguing on the same radical grounds of the desire to extend democracy, we will now argue that there are occasions where speech can be justifiably curtailed and that AI moderation should focus on identifying those conditions. If AI moderation is unsuccessful in doing so (as we argue is currently the case), it brings AI moderation into question as a whole.

If civility is to do with speech style (e.g., impolite, rude, emotional), tolerance is to do with moral-political attitudes to others. Tolerance as moral-political respect means that the public regard themselves and others as citizens – majority and minorities – with equal legal and political rights and status (Forst, Citation2003). Marcuse (Citation1969) argues that the practice of tolerance is to counter the pervasive inequality of freedom and to strengthen the freedom and political participation of those oppressed and disadvantaged. Intolerance is then a violation of tolerance – it seeks to legitimise inequality and a lack of parity between the majority and minorities. Distinguishing incivility and intolerance is therefore crucial, as they have different roles and implications in public deliberation from a systemic point of view. Unless it successfully differentiates incivility and intolerance, algorithmic moderation has a predilection to misclassify and silence the uncivil (but pro-democratic) voices of the marginalised. Thiago and colleagues (Citation2021) find that Perspective API’s understanding of toxicity gives higher toxicity scores to drag queens’ tweets (due to their use of swearwords and their reclaiming of homophobic slurs) than to white supremacists’ and neo-Nazis’ tweets (due to the lack of explicit slurs, their hateful ideas being embedded in rhetoric and contextual nuance).

There is, however, a lack of clarity about what precisely constitutes intolerance and hate speech. Gelber (Citation2021) goes a considerable way towards clarifying this issue, and thus specifying the conditions under which speech should be regulated, by outlining an approach based on systemic discrimination (Citation2021, p. 394). We follow her argument in distinguishing between intolerant speech and hate speech – treating hate speech as a subset of intolerant speech – and in favouring the regulation of hate speech only, at least in principle. This opens up the possibility of AI moderation if, but only if, it can detect hate speech reliably.

Drawing on Austin’s speech act theory, Gelber argues that speech can harm both consequentially, in that it may lead to other harms (e.g., those caused through violence), and constitutively, i.e., in the act of saying something. For there to be hate speech, however, the constitutive and consequential harm needs to be directed at a marginalised minority that is systemically discriminated against (Citation2021, p. 402). Gelber argues that for the harm of hate speech to be sufficient to warrant moderation, ‘it needs to be publicly directed at a member of a group that is identifiable as being subjected to systemic discrimination in the context within which the speech act occurs’ (Citation2021, p. 407, emphasis in original). Moreover, it needs to constitute ‘an act of subordination’ that marks targets as inferior or unequal, thus threatening their ability to participate equally in public discourse. According to Gelber, such an approach to hate speech means that it does not rely on the detection of an emotion of hate or of ‘vituperative’ speech (they are not, in other words, necessary conditions of hate speech, although both may be and often are present). This means that ‘some hate speech comes across as a moderate attempt to intervene in a policy debate, or even as a “joke”’ (Citation2021, p. 408). Intolerant and discriminatory ideas online (e.g., Islamophobia, xenophobia, homophobia) are often delivered in quasi-civil and pseudo-academic language that eschews explicit slurs (Krzyżanowski & Ledin, Citation2017; Thiago et al., Citation2021). On this approach, hate speech can only denote harmful speech aimed at systemically discriminated-against groups, and such speech is itself seen as part of a system of oppression linked to other forms of discrimination. Other types of speech, even those that may be regarded as offensive or intolerant by some (e.g., arguments about ‘hate speech’ against white conservative voices), ought not to be considered validly regulable on free speech grounds. It also has implications for imagining the possibility of institutionally supported counter-speech that challenges what is sayable.

The implications of the above are that any form of content moderation, human or otherwise, needs to distinguish between uncivil, intolerant, and hate speech, because while uncivil behaviour may on occasion be very good for extending democracy, hate speech always seeks to curtail it by contributing constitutively and consequentially to the systemic discrimination that limits individuals’ abilities to participate in public discourse. It also follows that, because of this uncertainty about the democratic value of incivility, incivility should not be moderated either by hand or by machine, but hate speech, because it is bad for democracy, should be.

If we now have a definition of hate speech and an argument that details when speech may be justifiably curtailed in order to promote democratic participation, we can raise the question of the ability of AI content moderation to identify such hate speech reliably. Without confidence that it can do so, there are no grounds for AI content moderation, as the democratic harms of moderating legitimate speech may far outweigh the democratic gains of moderating hate speech. We now turn to Perspective API, an AI content moderation tool that moderates both uncivil and intolerant discourse, to analyse how these issues play out in practice and whether this tool is able to identify both uncivil speech and all forms of hate speech. What evidence is there to suggest that Perspective API silences marginalised voices in regulating out incivility? Does it do a good job of regulating out hate speech, thus promoting the participation of marginalised groups in the public sphere?

Method and sample

To computationally evaluate whether AI can distinguish incivility, intolerance, and hate speech in public discussions at a big data scale, we collected tweets about abortion laws and policies in the American and Irish Twittersphere. Due to the emotional and polarised nature of the abortion issue (Ferree et al., Citation2002), abortion is a suitable topic for collecting data about uncivil and intolerant communications. The US and Ireland were chosen as case-study countries because abortion debates have been among their most important mainstream political priorities in the past few years.

In the US, Roe v. Wade (1973) was a landmark decision in which the American Supreme Court ruled that the American Constitution protects pregnant people’s liberty to terminate a pregnancy without excessive state restrictions (Ferree et al., Citation2002). Since Roe, American anti-abortion movements have tried to introduce creative ways to legislate abortion restrictions or even to reverse Roe. With the revival of Christian nationalism, reactionary backlash, and the presidential election of Donald Trump, American anti-abortion movements gained new momentum (Norris & Inglehart, Citation2019). In 2020, there were heated public discussions about the precarious future of Roe and of Planned Parenthood (the American reproductive healthcare provider), as well as debates around the implications of abortion bans. The conservative Supreme Court Justices nominated and appointed during the Trump administration, such as Justice Brett M. Kavanaugh and Justice Amy Coney Barrett, ignited public discussions about the overturning of Roe. The new conservative-majority Supreme Court overturned Roe in June 2022.

In Ireland, a historically Catholic country, abortion was banned for a long time. In 1983, the 8th amendment of the Irish Constitution enshrined the right to life of the unborn (Field, Citation2018). Several further referendums decided additional aspects of abortion law, such as the threat of suicide as a ground for legal abortion (12th and 25th referendums) and pregnant people’s rights to travel for abortion and to access information about abortion clinics abroad (13th and 14th referendums) (Field, Citation2018). In May 2018, Ireland held a referendum on the 36th amendment of the constitution to repeal the 8th, 13th, and 14th amendments. The repeal side won by a margin of 66.4% to 33.6% (Field, Citation2018). In the same year, the President of Ireland signed the Health (Regulation of Termination of Pregnancy) Act, defining the circumstances and processes within which abortion is legally performed in Ireland.

Our American dataset contains 6,305,107 tweets collected between 4 March and 20 October 2020 from the Twitter stream API. The five search keywords for the data collection are Roe v. Wade, abortion ban, Planned Parenthood, pro-choice, and pro-life. For the Irish dataset, we rehydrated 1,842,370 tweets from a publicly shared dataset from Littman (Citation2018), collected between 13 April and 4 June 2018 from the Twitter filter stream API, via 53 hashtags: 32 pro-choice (e.g., #togetherforyes); 14 pro-life (e.g., #savethe8th, #lovebothvoteno); and 7 hashtags that are neutral or ambiguous in relation to abortion (e.g., #8thref, #hometovote).
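For illustration, the keyword-based collection of the American dataset could be reproduced along the lines of the following sketch, assuming the now-retired Twitter v1.1 filtered stream endpoint and the tweepy 3.x library; the credentials and output path are placeholders rather than those used in the study.

```python
# Minimal sketch of keyword-based stream collection. Assumes the retired
# Twitter v1.1 filtered stream and tweepy 3.x; credentials are placeholders.
import json

import tweepy

KEYWORDS = ["Roe v. Wade", "abortion ban", "Planned Parenthood",
            "pro-choice", "pro-life"]  # the five search terms named above


class AbortionStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        # Append each incoming tweet as one JSON line for later filtering.
        with open("us_abortion_tweets.jsonl", "a") as f:
            f.write(json.dumps(status._json) + "\n")


auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
stream = tweepy.Stream(auth=auth, listener=AbortionStreamListener())
stream.filter(track=KEYWORDS, languages=["en"])
```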

We filtered out possible spam tweets by removing tweets with nine or more hashtags (Chen et al., Citation2015). We also removed tweets containing spam words (e.g., PS4, iPhone Pro, giveaway) that we identified through manual qualitative coding of small random samples (Jashinsky et al., Citation2014). Retweets, text duplicates, and non-English tweets were also excluded from our analysis.
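These spam-filtering rules might be implemented roughly as follows; the example tweets and the spam-word list are illustrative only, not the full set used in the study.

```python
import pandas as pd

SPAM_WORDS = {"ps4", "iphone pro", "giveaway"}  # example terms only


def is_spam(text: str) -> bool:
    lowered = text.lower()
    too_many_hashtags = text.count("#") >= 9          # Chen et al. (2015) heuristic
    has_spam_word = any(word in lowered for word in SPAM_WORDS)
    return too_many_hashtags or has_spam_word


tweets = pd.DataFrame({"text": [
    "Repeal the 8th #togetherforyes",
    "Win a free PS4 in our giveaway!! #prolife #prochoice #roe",
]})
tweets = tweets[~tweets["text"].apply(is_spam)]       # drop spam-like tweets
tweets = tweets.drop_duplicates(subset="text")        # drop text duplicates
```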

We then conducted computer-assisted content analysis to label the characteristics of each tweet in the datasets: its abortion issue position, the gender of its author, and the presence of incivility and intolerance. Abortion issue position is classified into three categories: pro-abortion rights, anti-abortion rights, and neutral/ambiguous. Since our American and Irish datasets were collected through different methods, two different tactics were used. For the American dataset, we classified abortion issue positions using the tf-idf scores of the top 5,000 predictive n-grams for pro- and anti-abortion rights tweets. Since the Irish dataset (Littman, Citation2018) was collected through 53 key hashtags with clear ideological indications, we classified the Irish dataset into anti-abortion, abortion rights, and neutral/ambiguous positions following the ideology of the hashtags. Our abortion issue position classification achieves, when tested on the testing dataset, 71.8% accuracy for the American data and 84% for the Irish data, which is comparable to machine learning classifier results (Sharma et al., Citation2017).
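As a rough sketch of the n-gram approach used for the American data, one might weight unigrams to trigrams by tf-idf (capped at 5,000 features) and fit a simple linear classifier on a manually labelled subset; the toy texts and the choice of logistic regression below are assumptions for illustration, not the exact specification used in the study.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-in for a manually labelled subset used to learn predictive n-grams.
texts = [
    "repeal the 8th, trust women #togetherforyes",
    "abortion is healthcare, protect roe",
    "save the unborn #prolife vote no",
    "abortion is murder, defund planned parenthood",
]
labels = ["pro", "pro", "anti", "anti"]

# Tf-idf weighted unigrams to trigrams, capped at 5,000 features.
vectoriser = TfidfVectorizer(ngram_range=(1, 3), max_features=5000)
X = vectoriser.fit_transform(texts)

clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(vectoriser.transform(["keep roe, abortion is healthcare"])))
```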

We classified the gender of Twitter users based on the first names in their profiles, using the automatic gender recognition (AGR) package gender (Mullen, Citation2021). The package predicts gender by name from large historical data, typically population data gathered in American Census and Social Security records (Mullen, Citation2021). It automatically allocated the Twitter users in our datasets into three nominal categories: women, men, and N/A (gender unidentifiable; e.g., pseudonyms and anonyms like ‘Old Glory’). However, the use of automatic gender recognition should be read with caution. It is impossible to verify the ‘real’ gender of an account holder, with the plausible exception of blue-ticked, verified accounts. Nonetheless, predicting the gender of online users from their first names has been used in diverse research projects involving big online datasets (King & Frederickson, Citation2021). By using this method, we are not assuming that all individuals are correctly gendered in the results, but it provides insight into gender’s effects in aggregate. Furthermore, while implementing and interpreting the AGR results, their inherent limitations must be acknowledged. AGR is unable to see beyond the state-imposed gender binary and excludes individuals who do not conform to the societal concept of gender as binary (Mihaljević et al., Citation2019; Mullen, Citation2021). We use AGR while acknowledging these limitations in order to gain still valuable insights into the gendered aspects of uncivil and intolerant language moderation.
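A Python analogue of this name-based assignment is sketched below; the study itself used the R gender package, so the lookup table here is a hypothetical stand-in for the historical Census and Social Security name data on which that package relies.

```python
# Hypothetical stand-in for historical name data; the study used the R
# 'gender' package drawing on US Census and Social Security records.
NAME_TABLE = {"mary": "women", "susan": "women", "john": "men", "brett": "men"}


def classify_gender(display_name: str) -> str:
    tokens = display_name.strip().split()
    first_name = tokens[0].lower() if tokens else ""
    # Names not found in the table fall into the N/A category
    # (pseudonyms, anonyms, and non-listed names).
    return NAME_TABLE.get(first_name, "N/A")


print(classify_gender("Mary Smith"))  # women
print(classify_gender("Old Glory"))   # N/A
```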

Finally, we also classified the presence of incivility and intolerance in our datasets by applying a lexicon-based classifier. Due to the differences in how incivility and intolerance manifest in language (i.e., explicit swearwords and rude language versus hateful ideas embedded in nuance and coded language), our incivility classifier consists mostly of unigrams whereas the intolerance classifier includes longer phrases such as trigrams and 4+ grams – which results in lower accuracy, precision, and recall for intolerance compared to incivility during our training and testing processes.Footnote2 Our incivility classifier includes word stems instead of words to capture the variations of uncivil words (e.g., fuck.* to capture the variants of the uncivil word fuck: fucked, fucking, fucker, etc.). Muddiman et al.’s (Citation2019) work on building ‘manually validated organic dictionaries’ of incivility was a key reference for our classifier-building process. Manual labelling and validation enable the construction of incivility and intolerance classifiers that are theoretically derived and context-dependent (Muddiman et al., Citation2019). Furthermore, this semi-supervised, lexicon-based approach eliminates the chances of algorithmic biases known in many machine-learning algorithms and hence provides a level of transparency (Bender et al., Citation2021). For each tweet, the coder labelled its incivility and intolerance using a coding scheme inspired by Ferree et al. (Citation2002) as well as Coe et al. (Citation2014) and Rossini (Citation2020). Table 1 provides examples of incivility and intolerance in abortion discourse.

Table 1. Examples of incivility and intolerance.

In this paper, incivility and intolerance are separate variables with binary values. Treating them as separate variables is important because some hate speech can contain intolerance without being explicitly hostile, aggressive, or insulting: some hate speech is presented in a seemingly civil and pseudo-rational form. By separating the two, we can see whether the AI’s understanding of toxic language is influenced more by the presence of incivility (a rude speech style) or of intolerance (the essence of hateful, intolerant ideologies).
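A minimal sketch of this lexicon-based, binary labelling step is given below; the example stems and phrases echo the patterns described above (e.g., fuck.*), whereas the dictionaries used in the study were far larger and manually validated.

```python
import re

# Example entries only; the study's dictionaries were far larger and
# manually validated against hand-coded tweets.
INCIVILITY_STEMS = [r"\bfuck\w*", r"\bshit\w*", r"\bassholes?\b"]
INTOLERANCE_PHRASES = [r"\bwiped from the face of the earth\b",
                       r"\bcharged with murder\b"]


def label(text: str, patterns: list[str]) -> int:
    """Return 1 if any pattern matches the tweet, else 0 (binary variable)."""
    lowered = text.lower()
    return int(any(re.search(pattern, lowered) for pattern in patterns))


tweet = "Find a better hobby, assholes"
print(label(tweet, INCIVILITY_STEMS))     # 1 -> uncivil
print(label(tweet, INTOLERANCE_PHRASES))  # 0 -> not intolerant
```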

To build incivility and intolerance classifiers, the authors manually coded 6,000 tweets from the American dataset and 5,700 tweets from the Irish dataset to create dictionaries of uncivil and intolerant words and phrases. We then checked the performance of the classifiers and added (or removed) words and phrases to enhance the performance. The final version of the American incivility classifier achieved a mean accuracy of 85.8% with a mean precision of 91% and a mean recall of 73.9%. The American intolerance classifier yielded a mean accuracy of 86.5% with a mean precision of 82% and a mean recall of 67.8%. The mean accuracy of the Irish incivility classifier was 93.7% with a mean precision of 86% and a mean recall of 61.8%. The mean performance of the Irish intolerance classifier was 98.3% accuracy, 99.2% precision, and 70.8% recall. Recall performances were lower than accuracy or precision as it was hardly possible to identify every varying form and expression of incivility and intolerance in big data.
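The reported accuracy, precision, and recall figures follow from comparing the classifier output against the hand-coded test tweets, for example with scikit-learn; the label arrays below are toy placeholders.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Toy placeholders: 1 = uncivil, 0 = civil, for hand-coded vs. classifier labels.
y_true = [1, 1, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 0, 0, 1]

print("accuracy: ", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
```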

The classification identified 231,989 uncivil and/or intolerant tweets in the American data and 29,272 tweets in the Irish data. We then ran Perspective API on these uncivil or intolerant tweets to analyse in depth the AI’s assessment of ‘toxic language’ in comparison to the authors’ manual, theory-driven distinction between incivility and intolerance. Figure 1 illustrates these research steps.

Figure 1. Research steps.

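Scoring tweets with Perspective API can be done through Google’s comment analyzer endpoint. The sketch below uses the google-api-python-client library and requests only the TOXICITY attribute analysed in this paper; the API key is a placeholder, and the client-library route is one possible way of querying the service rather than a description of our exact pipeline.

```python
from googleapiclient import discovery

API_KEY = "YOUR_PERSPECTIVE_API_KEY"  # placeholder

client = discovery.build(
    "commentanalyzer",
    "v1alpha1",
    developerKey=API_KEY,
    discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
    static_discovery=False,
)


def toxicity_score(text: str) -> float:
    """Return Perspective API's TOXICITY probability (0-1) for one tweet."""
    request = {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = client.comments().analyze(body=request).execute()
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


print(toxicity_score("Grow the fuck up you gobshite."))
```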

Results

Our results indicate that Perspective API’s understanding of ‘toxic language’ is biased towards incivility rather than intolerance. The mean and median toxicity scores indicate that Perspective API gives higher toxicity scores to uncivil tweets than to intolerant tweets in both the American and Irish samples (Table 2). This result is concerning from our theoretical perspective, as we argue that the true threat to democracy is intolerant communication, not uncivil communication. Equating incivility with toxic language can have a chilling effect on democracy while disregarding its expressive and symbolic values (Edyvane, Citation2020; Zerilli, Citation2014).

Table 2. Mean, median, and standard deviations of toxicity scores of uncivil and intolerant tweets

We also compare the mean and median toxicity scores of intolerant tweets depending on whether the intolerance is expressed with explicit hate words and slurs or embedded in rhetoric and context without explicit slurs. We use the list of hate words compiled by Hatebase.Footnote3 The Hatebase database provides lists of hate words in different languages regarding race, ethnicity, religion, gender, sexuality, class, and disability. We use 944 English hate words, ranging from mildly to extremely offensive, to automatically classify intolerant tweets into those with and without explicit hate words. Table 3 demonstrates that Perspective API’s understanding of toxicity is related to explicit hate words and slurs, but that it tends to miss intolerant tweets whose meanings are embedded in context and nuanced expression. Since much intolerant and discriminatory online discourse avoids explicitly hateful expressions (Krzyżanowski & Ledin, Citation2017; Thiago et al., Citation2021), this indicates that many intolerant ideas and expressions can go unnoticed under current content moderation AI.

Table 3. Mean, median, and standard deviations of toxicity scores of intolerant tweets with and without explicit hate words and slurs.
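The split between intolerant tweets with and without explicit hate words amounts to dictionary matching against the Hatebase list; in the sketch below the entries are neutral placeholders rather than actual slurs from the 944-term list used in the analysis.

```python
import re

# Neutral placeholders; the study matched against 944 English terms from Hatebase.
HATE_WORDS = {"slur_a", "slur_b", "slur_c"}
HATE_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, HATE_WORDS)) + r")\b")


def has_explicit_hate_word(text: str) -> bool:
    return bool(HATE_PATTERN.search(text.lower()))


# Partition manually labelled intolerant tweets into the two groups of Table 3.
intolerant_tweets = ["example intolerant tweet without slurs",
                     "example intolerant tweet containing slur_a"]
with_slurs = [t for t in intolerant_tweets if has_explicit_hate_word(t)]
without_slurs = [t for t in intolerant_tweets if not has_explicit_hate_word(t)]
print(len(with_slurs), len(without_slurs))
```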

Figure 2 shows the odds ratios and 95% confidence intervals (CIs) of the multiple regression coefficients to facilitate interpretation. The odds ratios substantiate the descriptive observations reported above. Our multiple regression analysis indicates that Perspective API’s toxicity scores are biased towards incivility, but not towards intolerance. Not only are the odds of higher toxicity scores increased by 9% and 5% for uncivil tweets in the American and Irish samples respectively, but the odds are decreased by 4% and 1% respectively for intolerant tweets in the American and Irish samples. The negative association between intolerance and toxicity scores indicates that many intolerant online communications involve subtlety and deceptiveness and do not build their claims in an ostensibly hateful manner (Krzyżanowski & Ledin, Citation2017; Thiago et al., Citation2021), and that Perspective API’s reliance on incivility to detect ‘toxic language’ fails to capture such forms of intolerant and hateful speech.

Figure 2. Odds ratio and 95% confidence intervals of toxicity scores of the American and Irish abortion-related tweets.

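The odds ratios and confidence intervals in Figure 2 can be obtained by exponentiating regression coefficients. The sketch below assumes a fractional-logit specification (a binomial GLM with a logit link on the 0–1 toxicity score) estimated with statsmodels on simulated data; this is one plausible way of producing such estimates, not the exact model reported in the paper.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Simulated stand-in data: one row per tweet, with the Perspective toxicity
# score (0-1) and illustrative binary predictors.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "toxicity":   rng.beta(2, 5, 500),
    "uncivil":    rng.binomial(1, 0.3, 500),
    "intolerant": rng.binomial(1, 0.2, 500),
    "pro_choice": rng.binomial(1, 0.5, 500),
    "woman":      rng.binomial(1, 0.4, 500),
})

# Fractional logit: a binomial GLM with a logit link on the 0-1 toxicity score.
model = smf.glm("toxicity ~ uncivil + intolerant + pro_choice + woman",
                data=df, family=sm.families.Binomial()).fit()

odds_ratios = np.exp(model.params)      # exponentiated coefficients
conf_int = np.exp(model.conf_int())     # 95% confidence intervals
print(pd.concat([odds_ratios, conf_int], axis=1))
```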

The results also show that tweets posted by certain social groups have higher odds of receiving higher toxicity scores. The pro-abortion rights position is related to higher toxicity scores in both countries: the odds of abortion rights tweets receiving higher toxicity scores are 3% higher in the American data and 6% higher in the Irish data in contrast to anti-abortion tweets. The odds of receiving higher toxicity scores are also 2% higher for women’s tweets in comparison to men’s tweets in the American sample. However, as acknowledged in the method section, this empirical finding about the gendered moderation of toxic language should be taken with caution due to the limitations of our automatic gender recognition, which infers users’ gender from their first names.

Given that Perspective API’s toxicity scores are influenced more by incivility than by intolerance, we can infer that the higher odds of toxicity for pro-abortion rights tweets (and also women’s tweets in the US) are attributable to the uncivil rhetoric of their activism. However, as we argued, incivility is symptomatic of dysfunction in democratic decision-making processes (Zerilli, Citation2014). When those in power are not listening to the voice of the marginalised, those minorities are more prone to rely on disruptive, disorderly, and annoying communications to get their voices heard (Young, Citation2000; Zerilli, Citation2014). But the important thing is that uncivil rhetoric can still aim for public communication by appealing to society’s sense of justice and solidarity (Habermas, Citation1985; Young, Citation2000). It is more than hasty to treat uncivil public speech as ‘toxic,’ anti-democratic, and deserving of hard moderation such as take-down.

An in-depth reading of ‘highly toxic’ tweets

We conducted an in-depth reading and analysis of the American and Irish tweets with toxicity scores above 75% and 70% respectively. 70% is the threshold above which Perspective API itself deems a comment very likely to be toxic. We chose a slightly higher threshold (75%) for the American sample because its larger size produced an error when converting the dataframe to a document-term matrix for the word clouds (discussed later in this paper), due to the limited memory (RAM) capacity of the authors’ machine.

There are 24,205 tweets with toxicity scores above 75% in the American data and 2,671 tweets with scores above 70% in the Irish data. Table 4 shows the descriptive distribution of ‘toxic’ tweets across abortion issue positions and the gender of the Twitter users. A noteworthy finding is that in the Irish dataset, abortion rights tweets are substantially more likely to be flagged as ‘highly toxic’ by Perspective API than anti-abortion tweets. A similar pattern occurs in the American sample, where women’s tweets are flagged as ‘highly toxic’ more often than men’s tweets. In connection with our theoretical claim that the socially marginalised and disadvantaged (e.g., women and pro-choice activists confronting the Catholic state in Ireland or the conservative-majority American Supreme Court) need to rely on uncivil, unruly, and disruptive tactics of communication to express their claims effectively (Edyvane, Citation2020; Zerilli, Citation2014), this skewed distribution of ‘highly toxic’ language detection raises a question about the fairness of AI for political speech moderation.

Table 4. The distribution of ‘toxic’ tweets (75%+ and 70%+ toxicity scores).

Frequent words in the ‘highly toxic’ tweets in the American and Irish samples are visualised in the word clouds in Figures 3 and 4. Frequent words captured in the word clouds are common swearwords such as ‘fuck,’ ‘fucking,’ and ‘shit,’ while other words are connected to ideological positions in abortion politics (e.g., pro, life, choice, Planned Parenthood, Trump, TogetherForYes) and to certain marginalised identity markers such as women and black. This indicates that Perspective API’s current moderation of ‘toxic’ language is primarily based on incivility – which can amount to mere tone-policing, sanitising uncivil but meaningful public deliberation. It also highlights the potential bias of AI whereby discussions involving certain identities (women and Black) tend to be silenced more (Thiago et al., Citation2021; Zhou, Citation2021).

Figure 3. Word cloud of the American tweets with 75%+ toxicity scores.


Figure 4. Word cloud of the Irish tweets with 70%+ toxicity scores.

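The word clouds in Figures 3 and 4 summarise term frequencies in the flagged tweets; an equivalent visualisation could be produced along the following lines, assuming the wordcloud package and its built-in English stop-word list. The two example tweets are placeholders for the full set of tweets above the toxicity thresholds.

```python
import re
from collections import Counter

from wordcloud import STOPWORDS, WordCloud

# Placeholder tweets standing in for the full set above the toxicity threshold.
toxic_tweets = ["Grow the fuck up you gobshite",
                "Shove it up your arse and the rest of the evangelical right"]

tokens = []
for tweet in toxic_tweets:
    tokens.extend(word for word in re.findall(r"[a-z']+", tweet.lower())
                  if word not in STOPWORDS)

frequencies = Counter(tokens)
cloud = WordCloud(width=800, height=400).generate_from_frequencies(frequencies)
cloud.to_file("toxic_wordcloud.png")
```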

Finally, a close reading of a random sample of ‘highly toxic’ tweets posted by women and abortion rights activists further demonstrates our issue with Perspective API. The cited tweets were slightly paraphrased for ethical reasons, to prevent the Twitter users from being traced back and exposed to potential harm and threats. While paraphrasing, we tried to change the text as little as possible to preserve the nuanced meaning and wording of the original tweets. The four example tweets below show cases in which women and abortion rights users use strong uncivil language, but with the instrumental and symbolic values discussed above.

  • Grow the fuck up you gobshite. Abortion is the reason why I don’t have an unloved child in care. I was broke, abandoned by my family, and had terrible mental health issues. Sorry for the crude example but abortion is not a choice. It’s a grim necessity for women in a shitty situation (Ireland) (Toxicity 89.16%)

  • We are in the middle of a pandemic and these fuckers are still sitting outside abortion clinics with your fake and hypocritical ‘religious’ values. If Jesus loves all his children, then that counts the young women who want AUTONOMY to their own bodies. Find a better hobby, assholes (US) (Toxicity 83.93%)

  • Shove it up your arse @VP @BetsyDeVos @ScottPruittOK and the rest of the evangelical right here and abroad. We will not back down! #HomeToVote (Ireland) (Toxicity 76.63%)

  • @TomCottonAR Senator if you don’t want women to have reproductive healthcare you can go fuck yourself. Who the hell are you to pass judgement? (US) (Toxicity 73.82%)

When uttered by the marginalised in society, these ‘toxic’ comments may not convey malice but come from a place of solidarity: to boost their armour, to call out existing injustice and inequalities, and to fight against those in power who were pushing reactionary anti-abortion policies. The key is to understand that the underlying message of such ‘toxic’ language amongst feminist women and abortion rights activists is not necessarily to promote hate and discrimination, but to redeploy ‘toxic’ words for self-empowerment (Thiago et al., Citation2021, p. 719). Regulating these activists’ uncivil rhetoric as ‘toxic’ is to fail democracy on the basis of a naïve liberal assumption about civility and democracy. Young (Citation2000) argues, ‘Because disadvantaged and excluded sectors cannot wait for the process to become fair, because there are often so many contending interests and issues, oppressed and disadvantaged groups have no alternative but to struggle for greater justice under conditions of inequality’ (p. 50). In their struggles for social change, the marginalised and activists resort to incivility to communicate (Zerilli, Citation2014). Regulating incivility shuts down such communicative attempts.

Our close reading suggests that when intolerant tweets contain explicitly hateful words and slurs, Perspective API detects intolerance and hate speech quite well. The following five tweets contain explicit misogynistic words (e.g., whore, slut, bimbo, cunt, bitches) directed at female politicians, activists, or the general public.

  • Chuck Schumer is a lying sob! Nancy Pelosi is a power-hungry drunken whore! Democrats suck! Baby killers! We Americans are watching you! #maga2020 @nancypelosi2020 @chuckschumer (US) (Toxicity 90.61%)

  • This Nazi slut whore-mocrat Michigan governor Gretchen Whitmer is wearing a Planned Parenthood hat to celebrate killing babies (US) (Toxicity 90.61%)

  • She’s so fucking dumb pulling on god-fearing bullshit. Remember that this ‘pro-life’ bimbo said all satanic pregnancies should miscarry. Her rant has no concern or solace, it is just fearmongering. #trumpistheworstpresidentever #trumpismdiesease (US) (Toxicity 89.24%)

  • @john_mcguirk Maria Steen is an evil cunt from Satan’s bum hole and both of you will be sobbing into your holy water on Saturday. Kisses! #repealthe8th #together4yes (Ireland) (Toxicity 78.57%)

  • @labour party lets these bitches travel to #murder their babies they have no compassion they are not mothers and must be charged with murder upon arriving back to Ireland. #voteno #savethe8th ##abortionnever #abortionismurder (Ireland) (Toxicity 75.11%)

However, our close reading also finds that many tweets manually labelled as ‘intolerant’ by the authors are not flagged as ‘highly toxic’ by Perspective API, substantiating again the significant flaw in Perspective API’s capability to detect intolerance and hate embedded in rhetoric and nuance:

  • @anonymised @PPact I’d be okay with targeted drone missile strikes on all Planned Parenthood sites. This is war, it’s time to start acting like war. You just have to eradicate them. Otherwise, like rats and roaches, they will just reproduce and regenerate. @PPact must be wiped from the face of the earth (US) (Toxicity 57.03%)

  • Can we destroy planned parenthood since the founder was a white supremacist who wanted to eradicate black ppl (aka weeds). Who wants to have a bonfire? (US) (Toxicity 55.36%)

  • Black people should take a stand against the promotion and overexposure of homosexuality and transsexualism to our children (US) (Toxicity 44.3%)

  • Biden seriously scares me. He will hand over womanhood directly to the trans cult (US) (Toxicity 27.13%)

These four examples illustrate that, despite their exclusionary ideas, such tweets receive lower toxicity scores than the four uncivil (but pro-democratic) tweets cited above. In our datasets, there are numerous discriminatory and exclusionary expressions that dehumanise the other side, insinuate violence, or pseudo-intellectualise exclusionary ideas while eschewing explicit slurs and other hateful vocabulary (Gelber, Citation2021; Krzyżanowski & Ledin, Citation2017). AI falls short when it comes to detecting intolerant, exclusionary, and discriminatory messages whose exclusionary ideologies are concealed in rhetoric and context (Thiago et al., Citation2021). Content moderation AI like Perspective API seems to give lower toxicity scores to this type of quasi-civil (lacking explicit toxic words) but intolerant expression.

Concluding discussion: limitations and future of algorithmic content moderation

As we have argued, algorithmic moderation tends implicitly to draw on liberal notions of the value of civility (e.g., Rawls, Citation1996). At the very least such moderation should make such reliance explicit and reflect upon such assumptions. Our position is that the use of algorithmic moderation to regulate incivility as ‘toxic language’ clearly has undemocratic consequences in that it serves to further exclude the already marginalised from political debate and discussion (Zerilli, Citation2014). Consequently, algorithmic moderation should shift its focus away from detecting and censoring incivility. It should focus instead on detecting and moderating instances of hate speech defined according to Gelber (Citation2021), which are anti-democratic. The issue here is that algorithmic moderation is at present unreliable in detecting hate speech and this effectively brings the legitimacy of such moderation into question.

Our empirical results sustain our theoretical concern that Perspective API’s understanding of toxicity does not promote democracy but rather undermines it by aggravating participatory inequalities between the social majority and minorities. Our analysis demonstrates that Perspective API can overregulate the uncivil but pro-democratic speech of the marginalised and of activists by flagging their uncivil tweets as ‘toxic.’ Algorithmic moderation fails to acknowledge the expressive and symbolic values of incivility and of disruptive, disorderly rhetoric – which is not meant to convey malice or hate, but to express political demands and appeal to the public sense of justice and equality (Edyvane, Citation2020; Young, Citation2000; Zerilli, Citation2014). Our analysis also highlights that tweets with marginalised identity markers (e.g., women, black) are overrepresented among the tweets flagged as highly toxic. Moreover, Perspective API’s liberal understanding of civility and democracy makes it unreliable at capturing diverse forms of intolerant and hate speech whose exclusionary ideas are concealed and disguised in quasi-civil and pseudo-rational rhetoric. Our statistical analysis and close reading reveal that Perspective API can only detect intolerant tweets when they contain explicit hate words and slurs. Future developments in algorithmic moderation could usefully focus on how to detect such concealed, polite forms of intolerance and hate speech.

What makes it difficult for algorithmic moderation to discern incivility, intolerance, and hate speech is that they manifest in language differently. Uncivil rhetoric often includes visible swearwords and rude, aggressive language that are more easily detectable with pre-trained AI, whereas intolerant and hateful messages can be hidden in embedded nuance without explicit hateful slurs and emotional words (Gelber, Citation2021; Thiago et al., Citation2021). Furthermore, slurs and offensive language can have starkly different implications depending on who the speaker is: slurs used by marginalised communities to reclaim the word and empower themselves carry the opposite implications to the use of the same terms by the marginalising (e.g., homophobic slurs used by homophobes versus by LGBT+ communities themselves in Thiago et al., Citation2021).

What is required is a highly context-sensitive AI that is capable of taking into account an online speaker’s identity in its assessment of the ‘toxicity’ of political opinion expressions. We have emphasised throughout the manuscript that the expressive and pro-democratic nature of uncivil language depends on the position of the speaker within existing power inequalities and injustice.

The development of AI with contextual awareness is not a straightforward, easy technical solution to rectify a ‘glitch,’ and it raises a diverse set of ethical and political questions. For instance, how much access to user data should we allow content moderation AIs that are, in many cases, developed and used by private business enterprises? For context-sensitive AIs to regulate intolerance and hate speech, the tool would require contextual metadata beyond the written texts to infer the identity of the speaker, perhaps by accessing and evaluating their names, profile pictures, other tweets they have written, and posts they have shared and liked, so as to infer their race, ethnicity, gender, sexuality, religion, and so on. This ultimately touches upon many other sensitive subjects, including privacy and data exploitation, which need in-depth discussion in their own right.

The argument and findings of this paper contribute to the theory of the democratic public sphere, civility, and social platforms by providing an empirically validated critique of the liberal assumptions embedded in algorithmic moderation models. We recommend that the concept of ‘toxic’ language be avoided, as it unhelpfully brings together both potentially pro- and anti-democratic discourse. Algorithmic moderation should not regulate uncivil speech, as it may be pro-democratic and such moderation would serve to further exclude already marginalised groups. At present, algorithmic moderation does not reliably identify either intolerant or hate speech. Consequently, we recommend that very careful consideration be given to implementing moderation, as the democratic harms may well outweigh the democratic benefits of doing so. We recommend that platforms consider new algorithmic moderation in line with the theories of democracy and democratic public spheres informed by anti-racist, feminist, and other critical theorists (e.g., Edyvane, Citation2020; Gelber, Citation2021; Young, Citation2000; Zerilli, Citation2014) rather than moderation based on unexamined liberal assumptions. We also recommend that the development of algorithmic moderation focus on the reliable and transparent identification of hate speech. If it is unable to do so, there can be no democratic grounds for adopting such a system of automatic moderation.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Notes on contributors

Dayei Oh

Dr Dayei Oh is a postdoctoral researcher at the Helsinki Institute for Social Sciences and Humanities at the University of Helsinki. Her research interests include the intersections of digital technologies, public spheres, and democratic discourse in relation to feminism and politics. Currently, she is investigating epistemic instability and contestations in datafied society [email: [email protected]].

John Downey

Professor John Downey is Professor of Comparative Media Analysis at Loughborough University. His research interests include digital media, comparative media analysis, and political communication [email: [email protected]].

Notes

2 The final version of the American incivility classifier achieved the mean accuracy of 85.8% with the mean precision of 91% and mean recall of 73.9%. The American intolerance classifier yielded the mean accuracy of 86.5% with the mean precision of 82% and the mean recall of 67.8%. The mean accuracy of the Irish incivility classifier was 93.7% with the mean precision of 86% and mean recall of 61.8%. The mean performance of the Irish intolerance classifier was 98.3% accuracy, 99.2% precision, and 70.8% recall. Recall performances were lower than accuracy or precision as it was difficult to identify every creative and nuanced form of incivility and intolerance in big data. Minimising false-positives provides a reliable standard for assessing the performance of natural language classifiers (Muddiman et al., Citation2019).

References

  • Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? In Conference on fairness, accountability, and transparency (FAccT ’21), March 3–10, 2021, Virtual Event, Canada (pp. 14). ACM. https://doi.org/10.1145/3442188.3445922
  • Bickford, S. (2011). Emotion talk and political judgment. The Journal of Politics, 73(4), 1025–1037. https://doi.org/10.1017/S0022381611000740
  • Chen, C., Zhang, J., Chen, X., Xiang, Y., & Zhou, W. (2015). 6 million spam tweets: A large ground truth for timely Twitter spam detection. 2015 IEEE International Conference on Communications (ICC) (pp. 7065–7070). IEEE.
  • Coe, K., Kenski, K., & Rains, S. A. (2014). Online and uncivil? Patterns and determinants of incivility in newspaper website comments. Journal of Communication, 64(4), 658–679. https://doi.org/10.1111/jcom.12104
  • Dixon, L., Li, J., Sorensen, J., Thain, N., & Vasserman, L. (2018). Measuring and mitigating unintended bias in text classification. Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, 67–73. https://doi.org/10.1145/3278721.3278729
  • Edyvane, D. (2020). Incivility as dissent. Political Studies, 68(1), 93–109. https://doi.org/10.1177/0032321719831983
  • Elias, N. (2000). The civilizing process: Sociogenetic and psychogenetic investigations (Revised ed.). Blackwell.
  • Ferree, M. M., Gamson, W. A., & Rucht, D. (2002). Shaping abortion discourse: Democracy and the public sphere in Germany and the United States. Cambridge University Press.
  • Field, L. (2018). The abortion referendum of 2018 and a timeline of abortion politics in Ireland to date. Irish Political Studies, 33(4), 608–628. https://doi.org/10.1080/07907184.2018.1500461
  • Flinders, M. (2017). The Culture of Nastiness and the Paradox of Civility. OUPblog. Retrieved August 9, 2023, from https://blog.oup.com/2017/03/culture-nastiness-paradox-civility/
  • Forst, R. (2003). Toleration, justice and reason. In C. McKinnon, & D. Castiglione (Eds.), The culture of toleration in diverse societies (pp. 71–85). Manchester University Press.
  • Gagliardone, I. (2019). Extreme speech| defining online hate and its “public lives”: what is the place for “extreme speech”? International Journal of Communication, 13, 20.
  • Gelber, K. (2021). Differentiating hate speech: A systemic discrimination approach. Critical Review of International Social and Political Philosophy, 24(4), 393–414. https://doi.org/10.1080/13698230.2019.1576006
  • Gil de Zúñiga, H., & Diehl, T. (2019). News finds me perception and democracy: Effects on political knowledge, political interest, and voting. New Media & Society, 21(6), 1253–1271. https://doi.org/10.1177/1461444818817548
  • Gorwa, R., Binns, R., & Katzenbach, C. (2020). Algorithmic content moderation: Technical and political challenges in the automation of platform governance. Big Data & Society, 7(1), 1–15. https://doi.org/10.1177/2053951719897945
  • Grinspan, J. (2021). The age of acrimony: How Americans fought to fix their democracy, 1865–1915. Bloomsbury.
  • Habermas, J. (1985). Civil disobedience: Litmus test for the democratic constitutional state, J. Torpey (trans.). Berkeley Journal of Sociology, 30, 95–116.
  • Jashinsky, J., Burton, S. H., Hanson, C. L., West, J., Giraud-Carrier, C., Barnes, M. D., & Argyle, T. (2014). Tracking suicide risk factors through Twitter in the US. Crisis, 35(1), 51–59. https://doi.org/10.1027/0227-5910/a000234
  • King, M. M., & Frederickson, M. E. (2021). The pandemic penalty: The gendered effects of COVID-19 on scientific productivity. Socius, 7, 1–24.
  • Krzyżanowski, M., & Ledin, P. (2017). Uncivility on the web. Journal of Language and Politics, 16(4), 566–581. https://doi.org/10.1075/jlp.17028.krz
  • Littman, J. (2018). Ireland 8th tweet Ids. Harvard Dataverse, V1. https://doi.org/10.7910/DVN/PYCLPE
  • Mansbridge, J., Bohman, J., Chambers, S., Christiano, T., Fung, A., Parkinson, J., … Warren, M. E. (2012). A systemic approach to deliberative democracy. Deliberative Systems: Deliberative Democracy at the Large Scale, 1–26.
  • Marcuse, H. (1969). Repressive tolerance. In R. P. Wolff, B. Moore Jr, & H. Marcuse (Eds.), A critique of pure tolerance (pp. 93–138). Jonathan Cape.
  • Meyers, D. T. (2018). Feminists rethink the self. Routledge.
  • Mihaljević, H., Tullney, M., Santamaría, L., & Steinfeldt, C. (2019). Reflections on gender analyses of bibliographic corpora. Frontiers in Big Data, 2, 29. https://doi.org/10.3389/fdata.2019.00029
  • Muddiman, A., McGregor, S. C., & Stroud, N. J. (2019). (Re)claiming our expertise: Parsing large text corpora with manually validated and organic dictionaries. Political Communication, 36(2), 214–226. https://doi.org/10.1080/10584609.2018.1517843
  • Mullen, L. (2021). Gender: Predict gender from names using historical data (R package version 0.5.4.1000). https://github.com/ropensci/gender/
  • Norris, P., & Inglehart, R. (2019). Cultural backlash: Trump, Brexit, and authoritarian populism. Cambridge University Press. https://doi.org/10.1093/ia/iiz097
  • Parkinson, J. (2012). Democratizing deliberative systems. Deliberative Systems: Deliberative Democracy at the Large Scale, 151–172. https://doi.org/10.1017/CBO9781139178914.008
  • Rawls, J. (1996). Political liberalism. Harvard University Press.
  • Rossini, P. (2020). Beyond incivility: Understanding patterns of uncivil and intolerant discourse in online political talk. Communication Research, 399–425.
  • Sharma, E., Saha, K., Ernala, S. K., Ghoshal, S., & De Choudhury, M. (2017). Analyzing ideological discourse on social media: A case study of the abortion debate. In Proceedings of the 2017 International Conference of the Computational Social Science Society of the Americas (pp. 1–8). Association for Computing Machinery.
  • Thiago, D. O., Marcelo, A. D., & Gomes, A. (2021). Fighting hate speech, silencing drag queens? Artificial intelligence in content moderation and risks to LGBTQ voices online. Sexuality & Culture, 25(2), 700–732. https://doi.org/10.1007/s12119-020-09790-w
  • Young, I. M. (2000). Inclusion and democracy. Oxford University Press.
  • Zerilli, L. M. G. (2014). Against civility: A feminist perspective. Civility, Legality, and Justice in America, Winter 2010, 107–131. https://doi.org/10.1017/CBO9781107479852.005
  • Zhou, X. (2021). Challenges in automated debiasing for toxic language detection. University of Washington.