Previous research has shown that machine learning models trained on social media data from a single platform (e.g., Facebook or Twitter) can distinguish individuals with a diagnosed mental illness, or those experiencing an adverse outcome, from healthy volunteers. However, the performance of these models on data from social media platforms unseen during training (e.g., Instagram, TikTok) has not been investigated.
Our project examines whether it is feasible to build machine learning classifiers that can effectively predict an upcoming psychiatric hospitalization from social media data on platforms unseen in the classifiers' training data, despite preliminary evidence of identity fragmentation between Facebook, Twitter, and Instagram. It also aims to explain any performance discrepancies found between intra- and inter-platform classification.
We gathered windowed timeline data from three platforms, Facebook (N = 254), Twitter (N = 54), and Instagram (N = 124), from patients with a diagnosis of Schizophrenia Spectrum Disorder (SSD) before a known hospitalization event and from healthy controls. We then used a 3 x 3 combinatorial binary classification design to test each model's performance on testing data from all available platforms. We further compared results from models in intra-platform experiments (i.e., training and testing data come from the same platform) with results from models in inter-platform experiments (i.e., training and testing data come from different platforms). Finally, we used SHapley Additive exPlanations (SHAP) to extract the top predictive features and explain the underlying constructs that predict hospitalization on each platform.
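To make the 3 x 3 design concrete, the following is a minimal sketch of a train-on-one-platform, test-on-every-platform grid. Everything in it is an assumption made for illustration: the data are synthetic, the three-dimensional feature vectors and the helper names (`make_synthetic`, `fit_centroids`, `run_grid`) are hypothetical, and a nearest-centroid classifier stands in for whatever models the study actually used. The synthetic signal is deliberately placed in a different feature on each platform so that the grid shows an intra- versus inter-platform gap.

```python
import random
from itertools import product

PLATFORMS = ["facebook", "twitter", "instagram"]

def make_synthetic(platform_idx, n, rng):
    """Hypothetical data: the positive class shifts one platform-specific feature."""
    rows = []
    for i in range(n):
        label = i % 2  # balanced classes
        x = [rng.gauss(0.0, 1.0) for _ in range(len(PLATFORMS))]
        x[platform_idx] += 3.0 * label  # signal sits in a different feature per platform
        rows.append((x, label))
    return rows

def fit_centroids(rows):
    """Stand-in classifier (assumption): one mean vector per class."""
    dim = len(rows[0][0])
    sums = {0: [0.0] * dim, 1: [0.0] * dim}
    counts = {0: 0, 1: 0}
    for x, y in rows:
        counts[y] += 1
        for j, value in enumerate(x):
            sums[y][j] += value
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def predict(centroids, x):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))

def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def run_grid(seed=0, n=200):
    """Train on each platform's training split, test on every platform's held-out split."""
    rng = random.Random(seed)
    splits = {}
    for idx, name in enumerate(PLATFORMS):
        rows = make_synthetic(idx, n, rng)
        rng.shuffle(rows)
        cut = n // 2
        splits[name] = (rows[:cut], rows[cut:])  # (train, test)
    grid = {}
    for train_p, test_p in product(PLATFORMS, PLATFORMS):
        model = fit_centroids(splits[train_p][0])
        test_rows = splits[test_p][1]
        preds = [predict(model, x) for x, _ in test_rows]
        grid[(train_p, test_p)] = f1_score([y for _, y in test_rows], preds)
    return grid

grid = run_grid()
intra = [f1 for (train_p, test_p), f1 in grid.items() if train_p == test_p]
inter = [f1 for (train_p, test_p), f1 in grid.items() if train_p != test_p]
print("mean intra-platform F1:", sum(intra) / len(intra))
print("mean inter-platform F1:", sum(inter) / len(inter))
```

The three diagonal cells of the grid are the intra-platform experiments and the six off-diagonal cells are the inter-platform experiments. Holding out a test split on every platform keeps the two kinds of cell comparable, since no model is ever scored on rows it was trained on.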
We found that models in intra-platform experiments achieved an average F1-score of 0.72 in detecting a psychiatric hospitalization due to SSD, which is 68% higher than the average F1-score of 0.428 for models in inter-platform experiments. When investigating the key drivers of this divergence in construct validity between models, an analysis of top features for the intra-platform models showed both low overlap in predictive features between the platforms and low pairwise rank correlation (< 0.1) between the platforms' top feature rankings. For instance, 'anger' is a unique top feature for Facebook, while 'sad' is a unique top feature for Instagram. Further, the average cosine similarity of data between platforms within participants was low compared with the same measurement on data within platforms between participants, which points to identity fragmentation of participants across platforms.
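The quantities reported above can be illustrated with a short sketch. The relative gain uses the two reported average F1-scores. The other two functions show one common way to compute a rank correlation and a cosine similarity. The abstract does not say which rank correlation was used, so Spearman's coefficient (in its tie-free form) is an assumption here, and the function names and example rankings are hypothetical.

```python
import math

def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`, as a fraction."""
    return (new - baseline) / baseline

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def spearman_from_ranks(ranks_a, ranks_b):
    """Spearman rank correlation for two rankings of the same items, assuming no ties."""
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Reported averages: intra-platform F1 = 0.72, inter-platform F1 = 0.428.
gain = relative_gain(0.72, 0.428)
print(f"{gain:.1%}")  # 68.2%

# Hypothetical rankings of the same four features on two platforms.
same_order = spearman_from_ranks([1, 2, 3, 4], [1, 2, 3, 4])      # 1.0
reversed_order = spearman_from_ranks([1, 2, 3, 4], [4, 3, 2, 1])  # -1.0
```

A rank correlation below 0.1, as reported, means that the platforms order their top features almost independently of one another, which is much closer to no relationship than to the identical-order case.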
Overall, we demonstrated that models built on one platform's data to predict critical mental health treatment outcomes, such as a hospitalization, may not generalize to another platform, because each platform consistently reflects different segments of participants' identities. However, combining data from multiple platforms may offer a more comprehensive view of a patient's state and situation, and may therefore fare better in hospitalization prediction. Given the changing ecosystem of social media use among different demographic groups, and as online identities continue to fragment across platforms, further research on holistic approaches to harnessing these diverse data sources is required.
The SocWeB Lab's mission is to develop novel computational techniques, and technologies powered by these techniques, to responsibly and ethically employ social media in quantifying, understanding, and improving our mental health and well-being.