[Dataset] Harvesting social media and using large language models to analyse online discourse: Developing methodology to explore the challenges faced by Sub-Saharan African women in livestock farming
Dataset   Open access

Georgina Tarrant, Taranpreet Singh Rai, Luke Boyden, Kennedy Mwacalimba, Raymond Tiernan, Peter Kimeli, Travis Lee Street, Alasdair J. C. Cook and Kevin Wells
Zenodo
2025

Abstract

Keywords: social listening; animal health; gender; large language models; livestock; Sub-Saharan Africa
xlsx
WiAf-allthemes-anonymised (5.31 MB)
Social media listening (SML) data were collected from X (formerly Twitter), blogs, forums, Reddit and Facebook pages using Pulsar Platform™. SML posts were collected if they met the bespoke keyword search criteria and originated in or mentioned one or more of ten Sub-Saharan African countries: Ethiopia, Ghana, Côte d'Ivoire (Ivory Coast), Kenya, Nigeria, Senegal, Tanzania, Uganda, Zambia and Zimbabwe. The searches were grouped into four themes: (1) Women in livestock and farming; (2) Challenges faced by women in livestock and farming; (3) Perceptions of disease health and control measures; (4) Women’s training, education and interventions in livestock.

All SML posts matching the searches were extracted from Pulsar Platform™, visually inspected for data quality, and processed using natural language processing (NLP) techniques. Duplicate posts, advertisements, and irrelevant content were removed through automated filters and manual review.

The post content (title and body text) was concatenated prior to topic modelling using non-negative matrix factorisation (NMF).

To mitigate data privacy issues, the dataset was anonymised and personal data were redacted using Microsoft Presidio. For more information, please see the data dictionary in the data file.

CC BY-SA V4.0 Open Access
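The preprocessing steps described above (concatenating title and body text, keeping posts that mention one of the ten countries, and removing duplicates) might look roughly like the following sketch. It is an illustration under stated assumptions, not the authors' pipeline: the field names `title` and `body`, the helper names, the whitespace and case normalisation, and the simple name matching are all hypothetical, and the sketch covers only the "mentioned" criterion, not where a post originated.

```python
import re

# The ten countries named in the dataset description, plus the alternative
# name given there for Côte d'Ivoire.
COUNTRIES = [
    "Ethiopia", "Ghana", "Côte d'Ivoire", "Ivory Coast", "Kenya", "Nigeria",
    "Senegal", "Tanzania", "Uganda", "Zambia", "Zimbabwe",
]
COUNTRY_RE = re.compile("|".join(re.escape(c) for c in COUNTRIES), re.IGNORECASE)


def concatenate(post):
    """Join title and body into one text field (hypothetical field names)."""
    parts = [post.get("title") or "", post.get("body") or ""]
    return " ".join(p.strip() for p in parts if p.strip())


def normalise(text):
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def preprocess(posts):
    """Return concatenated texts that mention a listed country, minus exact duplicates."""
    seen = set()
    kept = []
    for post in posts:
        text = concatenate(post)
        if not COUNTRY_RE.search(text):
            continue  # no listed country mentioned (assumed filter rule)
        key = normalise(text)
        if key in seen:
            continue  # duplicate after normalisation
        seen.add(key)
        kept.append(text)
    return kept


# Made-up example posts, for illustration only.
posts = [
    {"title": "Goat farming", "body": "Women in Kenya face limited access to vets."},
    {"title": "Goat farming", "body": "Women in  Kenya face limited access to vets."},
    {"title": "Weather", "body": "Rain expected this weekend."},
    {"title": None, "body": "Poultry training for women in Ghana."},
]
texts = preprocess(posts)
```

Here `texts` holds the two posts that survive: the second post is dropped as a duplicate of the first, and the third mentions none of the listed countries. The resulting texts would then be the input to a topic model such as NMF.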
