Abstract
The "Green Revolution" has been widely cited as a key enabler of the world's ability to feed its still-growing population. However, it is not without its problems. These are environmental (for example, high levels of nitrogen and phosphorus run-off from fertilizers), health-related, and socio-economic. In particular, farmer suicides, at times reaching epidemic levels, remain pervasive across rural communities in India: at least 270,940 Indian farmers have taken their lives since 1995. India is a country where 60% of the population depends directly or indirectly on agriculture for employment, and in recent years farmer suicides have accounted for 11.2% of all suicides in the country.
Activists and academic scholars have attributed the high suicide rates primarily to the debt induced by the high costs of fertilizers and seeds following the Green Revolution. However, a deeper analysis indicates that the problem is more complex than this. Meanwhile, increasingly powerful machine learning methods make it possible to extract large amounts of information from text that is publicly available online. The use of machine learning to model farmer suicides has not so far been covered in the literature. Using machine learning, this thesis therefore presents three novel contributions.
The first contribution of this thesis is a novel dataset containing the text of articles that describe farmer suicides. The dataset was collected manually from publicly available online sources, including academic papers, activist blogs and newspaper articles.
Our second contribution is a novel framework that provides a pipeline for extracting, modelling and interpreting causal factors from the suicide-related text corpus. For this purpose, we developed classifiers that distinguish causal sentences from non-causal ones. The extracted causal sentences are then processed to identify their cause and effect terms.
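As a rough illustration of the two pipeline stages described above, the following Python sketch replaces the trained classifiers with a hypothetical cue-phrase rule: a sentence is flagged as causal if it contains a connective such as "due to" or "led to", and is then split at that connective into cause and effect terms. The cue list, the function names `is_causal` and `extract_cause_effect`, and the rule-based approach itself are assumptions made for illustration only; they are not the classifiers developed in this thesis.

```python
import re
from typing import Optional, Tuple

# Hypothetical cue phrases (illustration only; the thesis uses trained classifiers).
# "forward": cause precedes the cue; "backward": cause follows the cue.
CAUSAL_CUES = {
    "led to": "forward",
    "resulted in": "forward",
    "caused": "forward",
    "due to": "backward",
    "because of": "backward",
    "owing to": "backward",
}

# Longest cues are tried first so overlapping alternatives resolve predictably.
_CUE_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(c) for c in sorted(CAUSAL_CUES, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def is_causal(sentence: str) -> bool:
    """Stage 1 stand-in: flag a sentence as causal if it contains any cue phrase."""
    return _CUE_PATTERN.search(sentence) is not None


def extract_cause_effect(sentence: str) -> Optional[Tuple[str, str]]:
    """Stage 2 stand-in: split a sentence at its first cue into (cause, effect)."""
    match = _CUE_PATTERN.search(sentence)
    if match is None:
        return None
    left = sentence[: match.start()].strip(" ,.")
    right = sentence[match.end():].strip(" ,.")
    if CAUSAL_CUES[match.group(1).lower()] == "forward":
        return left, right
    return right, left


if __name__ == "__main__":
    for s in ["Crop failure led to mounting debt.",
              "The farmer took his life due to unpaid loans.",
              "Rainfall was average this year."]:
        print(is_causal(s), extract_cause_effect(s))
```

Fixed cue lists miss implicitly stated causes and can misfire on non-causal uses of words such as "caused", which is one reason to prefer trained classifiers for the first stage.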
The final contribution of this thesis is the modelling of the extracted cause-and-effect relationships with Bayesian networks, in particular to analyse the probabilities associated with suicide risk. Although Bayesian networks have been used extensively in the literature to analyse suicide risk, their application to our dataset makes this analysis a novel contribution in its own right.
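To make the idea of querying suicide-risk probabilities concrete, here is a minimal Python sketch of exact inference by enumeration in a three-node discrete Bayesian network, in which crop failure influences debt and both influence high risk. The network structure, the variable names and every probability value are hypothetical placeholders chosen for illustration; they are not parameters learned from, or reported for, our dataset.

```python
from itertools import product
from typing import Dict

# Hypothetical placeholder probabilities, NOT estimates from the thesis dataset.
P_CROP_FAILURE = 0.2                              # P(C = True)
P_DEBT_GIVEN_CROP = {True: 0.7, False: 0.3}       # P(D = True | C)
P_RISK_GIVEN_CROP_DEBT = {                        # P(R = True | C, D)
    (True, True): 0.4,
    (True, False): 0.15,
    (False, True): 0.2,
    (False, False): 0.05,
}


def joint(crop: bool, debt: bool, risk: bool) -> float:
    """Chain-rule factorisation P(C) * P(D | C) * P(R | C, D) for the toy network."""
    p_c = P_CROP_FAILURE if crop else 1 - P_CROP_FAILURE
    p_d = P_DEBT_GIVEN_CROP[crop] if debt else 1 - P_DEBT_GIVEN_CROP[crop]
    p_r_true = P_RISK_GIVEN_CROP_DEBT[(crop, debt)]
    p_r = p_r_true if risk else 1 - p_r_true
    return p_c * p_d * p_r


def prob_high_risk(evidence: Dict[str, bool]) -> float:
    """P(risk = True | evidence), summing the joint over all consistent assignments."""
    numerator = 0.0
    denominator = 0.0
    for crop, debt, risk in product([True, False], repeat=3):
        assignment = {"crop": crop, "debt": debt, "risk": risk}
        # Skip assignments that contradict the observed evidence.
        if any(assignment[k] != v for k, v in evidence.items()):
            continue
        p = joint(crop, debt, risk)
        denominator += p
        if risk:
            numerator += p
    return numerator / denominator


if __name__ == "__main__":
    print(prob_high_risk({}))
    print(prob_high_risk({"crop": True}))
    print(prob_high_risk({"debt": True}))
```

With these placeholder values, the prior probability of high risk is 0.141; observing crop failure raises it to 0.325, and observing debt alone gives about 0.274. Enumeration grows exponentially with the number of variables, so a realistically sized network would rely on a dedicated library using variable elimination or sampling.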
Our original plan was then to evaluate our machine learning models in the state of Telangana. However, travel restrictions imposed during the global pandemic prevented this. Instead, we carried out a geographic analysis of the state. This provided a qualitative framework for evaluating whether our risk model captures the key factors that affect suicide risk. It should be emphasised that the model's parameters serve only as an initialisation of the risk model. A next step would be to work with members of the community to trial the model in practice on real-world data, and to begin updating the model as case data are obtained. We expect that the model could support agricultural officers working at various levels: from local officers identifying individuals at high risk in their community, through regional officers identifying high-risk communities within their region, to government officers exploring interventions to reduce risk at the population level.