Abstract
Online forums that allow participatory engagement between users have been transformative for public discussion of many important issues. However, such online conversations can sometimes escalate into full-blown exchanges of hate and misinformation. Understanding these online conversations, which often involve humans and AI in centralized and decentralized social networks, is an important yet difficult problem with unique challenges. Online conversations typically unfold over a series of posts and replies to those posts, forming a tree structure within which individual posts are usually short and may implicitly refer to the semantic context of other posts in the tree. Such semantic cross-referencing makes it difficult to understand a single post by itself.
In the first part, I propose two general-purpose frameworks to discover appropriate conversational context for online posts in social media conversations. First, I propose GraphNLI, a graph-based natural language inference model that uses probabilistic graph walks to incorporate the wider context of a conversation in a principled manner. I evaluate it on two downstream NLP tasks: polarity prediction and hate speech detection. GraphNLI with a biased root-seeking random walk performs 3 and 10 percentage points better in macro-F1 score than baselines for polarity prediction and hate speech detection, respectively. Second, I propose GASCOM, a Graph-based Attentive Semantic Context Modeling framework for online conversation understanding. Specifically, I design two algorithms that utilize both the graph structure of the conversation and the semantic information from individual comments to retrieve relevant context nodes from the whole conversation. GASCOM also contains a token-level multi-head graph attention mechanism that attends differently to tokens from the selected context utterances for fine-grained conversational context modeling. GASCOM further improves macro-F1 scores by 4.5% for polarity prediction and by 5% for hate speech detection. This demonstrates the potential of context-aware models in understanding online conversations.
In the second part, I design two families of Conversation Kernels, a flexible general-purpose mechanism that explores different parts of a post's neighborhood through different kernel shapes in the conversation tree and thereby builds relevant conversational context appropriate for a specific task. I apply this method to conversations crawled from slashdot.org, where posts carry highly diverse labels such as 'insightful', 'interesting', and 'funny'. I perform extensive experiments and find that context-augmented conversation kernels significantly outperform transformer-based baselines, with absolute improvements of up to 20% in accuracy and up to 19% in macro-F1 score.
In the third part, I study online harms such as hate speech and toxic content in centralized social media conversations. Supervised machine learning approaches to detecting hate speech often rely on a single "ground truth" label. However, obtaining one label through majority voting discards important subjectivity information. In the first sub-part, I develop AnnoBERT, a deep learning architecture that integrates annotator characteristics into a transformer-based hate speech detection model, with unique representations of each annotator derived via Collaborative Topic Regression. This approach shows advantages in detecting hate speech, especially under large class imbalance and on edge cases with annotator disagreement, suggesting its practical value in identifying real-world hate. In the second sub-part, I investigate a relatively new but proactive approach to tackling hate speech: suggesting a rephrasing of hateful content even before the post is made. I show that large language models (LLMs) with a few-shot prompt generate high-quality rephrasings of hateful content that remain semantically similar to the original text.
In the fourth part, I study and characterize toxic content in decentralized conversations. Decentralized and interoperable social networks (such as the "fediverse") create new challenges for moderating toxic content: millions of posts generated on one server can easily spread to other servers in a non-synchronized fashion, and each server may have only a partial view of an entire conversation. To address this, I propose a decentralized, conversation-aware content moderation approach that employs a GraphNLI model trained locally on each server. It is evaluated on data from Pleroma, a major decentralized micro-blogging network, containing 2 million conversations. The model effectively detects toxicity on larger instances using their local post information (0.8837 macro-F1). For small instances with insufficient data, I strategically retrieve information (posts or model parameters) from other servers to improve toxicity detection, attaining a macro-F1 of 0.8826.
In the fifth part, I shift focus to conversations involving humans and AI and study online harms, specifically hallucinations, in AI conversations with LLMs. As LLM-powered chatbots become popular for health-related queries, laypeople risk receiving hallucinated health advice. I study hallucinations in LLM-generated responses to real-world healthcare queries. I introduce MedHalu, a medical hallucination benchmark featuring hallucinated responses from LLMs, annotated with hallucination types and text spans. I also propose MedHaluDetect, a framework to evaluate LLMs' ability to detect hallucinations. I study the vulnerability to medical hallucinations of three groups, medical experts, LLMs, and laypeople, and find that LLMs significantly underperform experts and, in some cases, even laypeople. To improve hallucination detection, I propose an expert-in-the-loop approach that integrates expert reasoning into LLM inputs.
In summary, this dissertation contributes various context-aware models and general-purpose frameworks, along with large-scale datasets, for understanding online conversations and tackling online harms across social networks involving humans and AI.