Abstract
Social media listening (SML) data were collected from X (formerly Twitter), blogs, forums, Reddit and Facebook pages using Pulsar Platform™. SML posts were collected if they met the bespoke keyword search criteria and originated in or mentioned one or more of ten Sub-Saharan African countries: Ethiopia, Ghana, Côte d'Ivoire (Ivory Coast), Kenya, Nigeria, Senegal, Tanzania, Uganda, Zambia, Zimbabwe.
The searches were grouped into four themes: (1) Women in livestock and farming; (2) Challenges faced by women in livestock and farming; (3) Perceptions of disease health and control measures; (4) Women’s training, education and interventions in livestock.
All SML posts matching the searches were extracted from Pulsar Platform™, visually inspected for data quality, and processed using natural language processing (NLP) techniques. Duplicate posts, advertisements, and irrelevant content were removed using a combination of automated filters and manual review.
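As an illustration only, an automated filter of the kind described above might look like the following sketch. The normalisation rules, the `AD_MARKERS` phrases, the function names and the sample posts are all hypothetical; the filters actually applied to this dataset are not specified here.

```python
import re

# Hypothetical phrases used to flag advertisements; not the authors' actual list.
AD_MARKERS = ("buy now", "discount", "promo code")


def normalise(text):
    """Lower-case the text, drop URLs and collapse runs of whitespace."""
    text = re.sub(r"https?://\S+", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def filter_posts(posts):
    """Keep the first copy of each post and drop empty or advert-like posts."""
    seen = set()
    kept = []
    for post in posts:
        key = normalise(post)
        if not key or key in seen:
            continue  # empty after normalisation, or a duplicate
        seen.add(key)
        if any(marker in key for marker in AD_MARKERS):
            continue  # looks like an advertisement
        kept.append(post)
    return kept


# Hypothetical example posts, for illustration only.
sample = [
    "Women farmers in Ghana vaccinate their goats.",
    "women farmers in ghana   vaccinate their goats. https://example.com/a",
    "Buy now! Discount cattle feed, promo code FARM10",
]
cleaned = filter_posts(sample)
```

A rule-based filter like this only catches exact and near-exact repeats, which is one reason a manual review step remains useful.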
The title and body text of each post were concatenated prior to topic modelling using non-negative matrix factorisation (NMF).
To mitigate data privacy risks, the dataset was anonymised and personal data were redacted using Microsoft Presidio. For more information, please see the data dictionary in the data file.