The paper tackles the challenge of identifying controversial content in online political discussions, particularly on platforms like Reddit. Detecting such content matters in many settings, for example during national elections, but it presents significant technical hurdles.
A key problem highlighted by the research is data imbalance. In real-world online forums, unlike in artificially balanced benchmarks, controversial posts often make up only a small fraction of all posts. The authors demonstrate that many existing controversy detection methods, which are often benchmarked on artificially balanced datasets, struggle when faced with this imbalance, which limits their practical applicability. To address this, the authors introduce new features derived from Topological Data Analysis (TDA) that capture complex, evolving structural patterns in user interactions indicative of controversy. They also introduce a new dataset that reflects real-world imbalance and a metric for evaluating model robustness in imbalanced scenarios.
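To make the imbalance problem concrete, the following minimal sketch compares two hypothetical classifiers on a synthetic label set. The 5% controversial rate, the sample size, and both classifiers' error counts are assumptions chosen for illustration, not figures from the paper. The metrics shown (accuracy, positive-class F1, balanced accuracy) are standard ones and are not the robustness metric the authors propose.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 as 'controversial'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_positive(y_true, y_pred):
    """F1 score of the controversial class; 0.0 if nothing can be scored."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def balanced_accuracy(y_true, y_pred):
    """Mean of the recall on each class (assumes both classes are present)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


# Assumed synthetic data: 1000 posts, 50 of them controversial (5%).
y_true = [1] * 50 + [0] * 950

# Hypothetical classifier A: always predicts "not controversial".
always_negative = [0] * 1000

# Hypothetical classifier B: finds 30 of the 50 controversial posts
# and raises 40 false alarms among the 950 non-controversial posts.
detector = [1] * 30 + [0] * 20 + [1] * 40 + [0] * 910

for name, y_pred in [("always_negative", always_negative), ("detector", detector)]:
    print(
        name,
        round(accuracy(y_true, y_pred), 3),
        round(f1_positive(y_true, y_pred), 3),
        round(balanced_accuracy(y_true, y_pred), 3),
    )
```

Under these assumed numbers, the classifier that never flags anything has the higher accuracy (0.95 versus 0.94) despite detecting no controversial posts, while its F1 on the controversial class is 0 and its balanced accuracy is 0.5. This is the kind of gap that makes results on artificially balanced benchmarks hard to transfer to imbalanced data.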
Arun, Arvindh*, Karuna K. Chandra*, Akshit Sinha, Balakumar Velayutham, Jashn Arora, Manish Jain, and Ponnurangam Kumaraguru. "Topo Goes Political: TDA-Based Controversy Detection in Imbalanced Reddit Political Data." https://arxiv.org/abs/2503.03500