Machine Learning Models for Predicting, Understanding and Influencing Health Perception

August 11, 2021

Ada Aka and Sudeep Bhatia

Ada Aka is a doctoral student in the Marketing and Psychology departments at the Wharton School of the University of Pennsylvania. She is a 2019 and 2020 Russell Ackoff Doctoral Student Fellow and received grant funding for this research. Ada studies fundamental cognitive phenomena such as how we remember things and make everyday judgments and decisions. She investigates her research questions with a combination of behavioral studies, computational models, and machine learning approaches. Her papers have been published in Journal of Experimental Psychology: General and Learning, Memory, and Cognition.

In 2020 and 2021, COVID-19 disrupted lives all around the world. Medical authorities have stressed the importance of face masks as a precautionary measure for reducing the spread of the disease. However, many people are rejecting this advice, aggravating a major health crisis and endangering the lives of others. Although there are many determinants of face mask avoidance and related risky behaviors, one key factor is the perceived severity of COVID-19. Understanding such health perceptions is necessary for influencing and improving behavior during the crisis.

Of course, the importance of health perception in preventative health behavior extends beyond COVID-19. There is a close relationship between perceptions of health outcomes (such as COVID-19 or lung cancer), preventative health decisions (such as wearing a mask or exercising regularly), and the likelihood of engaging in risky behaviors (such as participating in large group gatherings or smoking).

Considerable research in psychology and marketing has found that people are not good at evaluating the severity of different health conditions. Rather, their judgments rely on memory as well as emotional, linguistic, social, and other psychological cues that deviate from objective measures. These psychological factors cause variables such as perceived pain, disability, and physical distress to be particularly strong predictors of health perceptions. In order to facilitate better decision making and preventative behavior, it is essential to predict and understand how people perceive different diseases and other relevant health-related outcomes.

Our Methodology

Because the internet has become an important health information resource for millions of people, we used information communicated through the internet to model health perceptions for hundreds of common diseases. For this purpose, we obtained textual information from the National Health Service (NHS) website, one of the main online sources of health information in the United Kingdom. Using embedding methods to quantify the informational content of health descriptions on the NHS website, we built a machine learning model capable of predicting health perceptions.

To test our approach, we collected health perceptions for a large set of medical conditions and diseases in an online experiment run on a crowdsourcing website. In the experiment, participants were asked to read NHS summaries for ten randomly selected health conditions, ranging from acne to brain aneurysm, and to imagine that they had been diagnosed with each one. Their task was to rate each disease or health condition on a scale ranging from 0 (the worst health imaginable) to 100 (the best health imaginable).
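As a rough illustration of how ratings from such an experiment might be prepared for modeling, the sketch below averages the 0 to 100 ratings for each condition. The function name and the response values are hypothetical placeholders, not data from the study, and the article does not say how the ratings were actually aggregated.

```python
from collections import defaultdict
from statistics import mean


def mean_rating_per_condition(responses):
    """Average the 0-100 health ratings given to each condition.

    `responses` is a list of (condition, rating) pairs, one per judgment.
    """
    by_condition = defaultdict(list)
    for condition, rating in responses:
        if not 0 <= rating <= 100:
            raise ValueError(f"rating out of range: {rating}")
        by_condition[condition].append(rating)
    return {condition: mean(ratings) for condition, ratings in by_condition.items()}


# Placeholder responses, invented purely for illustration.
responses = [
    ("acne", 80),
    ("acne", 70),
    ("brain aneurysm", 20),
    ("brain aneurysm", 10),
]
average_ratings = mean_rating_per_condition(responses)
print(average_ratings)
```

The per-condition averages are the kind of target values a predictive model could then be trained on.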

We used state-of-the-art language models to transform sentences from each health condition overview into high-dimensional vectors. Our machine learning model mapped these vector representations onto human judgments of health, accurately predicting how participants perceived different diseases and health conditions. Results showed that our model performed better than other competing models that rely only on objective statistics like mortality rates or more simplistic features extracted from the text data (such as text length, concreteness, and sentiment).
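The pipeline described above (sentences to vectors, vectors to ratings) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not our actual implementation: `embed_sentence` is a hypothetical stand-in that hashes words into a small vector where a pretrained language model would normally be used, ridge regression is one plausible choice of mapping, and the overviews and ratings are invented placeholders.

```python
import hashlib
import math

DIM = 16  # toy size; pretrained sentence encoders output hundreds of dimensions


def embed_sentence(sentence):
    """Hypothetical stand-in for a pretrained encoder: a hashed bag of words."""
    vec = [0.0] * DIM
    for token in sentence.lower().split():
        word = token.strip(".,;:!?()")
        if not word:
            continue
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        sign = 1.0 if digest[1] % 2 == 0 else -1.0
        vec[digest[0] % DIM] += sign
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def embed_overview(text):
    """Average the sentence vectors of one condition overview."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    vectors = [embed_sentence(s) for s in sentences]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def predict(weights, intercept, x):
    """Linear prediction of a health rating from an overview vector."""
    return intercept + sum(w * v for w, v in zip(weights, x))


def fit_ridge(X, y, alpha=0.1, lr=0.1, steps=2000):
    """Ridge regression fitted by plain gradient descent."""
    n, dim = len(X), len(X[0])
    weights, intercept = [0.0] * dim, sum(y) / n
    for _ in range(steps):
        errors = [predict(weights, intercept, x) - t for x, t in zip(X, y)]
        grads = [
            sum(e * x[j] for e, x in zip(errors, X)) / n + alpha * weights[j]
            for j in range(dim)
        ]
        weights = [w - lr * g for w, g in zip(weights, grads)]
        intercept -= lr * sum(errors) / n
    return weights, intercept


# Invented overviews and ratings, for illustration only (not NHS text or study data).
overviews = [
    "Spots appear on the skin. They usually clear up over time.",
    "A mild cough and runny nose. Most people recover within a week.",
    "A bulge in a blood vessel in the brain. A rupture can cause death.",
    "A serious illness of the lung. It can cause pain and disability.",
]
y = [80.0, 75.0, 15.0, 20.0]

X = [embed_overview(text) for text in overviews]
weights, intercept = fit_ridge(X, y)
for text, rating in zip(overviews, y):
    print(round(predict(weights, intercept, embed_overview(text)), 1), rating)
```

In practice the model would be evaluated on conditions held out from training, which is what makes it applicable to diseases without participant data.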

We then used our approach to explore which concepts and constructs are most associated with high (and low) health ratings. Using an established dictionary and other participant-generated keywords as inputs into our model, we found that health conditions which contained text related to the constructs of “death,” “pain,” “disability,” “risk,” and “money” were more likely to be perceived as bad health conditions. In contrast, those with text related to other constructs such as present-focused (words like “feel,” “looks,” and “work”) and negation words (words like “shouldn’t” and “don’t”) were more likely to be rated as good health conditions.

In a separate analysis, we took a list of the 500 most frequently used words and identified those that received the highest and lowest predictions from our embeddings model. Here, we found that words signaling some uncertainty such as “occasionally,” “tend,” or “sometimes” lead to higher health ratings (Panel A). By contrast, organ names such as “lung” and “heart” as well as death-related keywords such as “died” and “fatality” are more associated with low health judgments (Panel B).
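One way to picture this word-level analysis: pass each word's vector through the trained model and sort the words by predicted rating. The sketch below assumes a linear model over word vectors. The two-dimensional vectors, weights, and intercept are made-up placeholders chosen only to make the example run; they are not values from our model.

```python
def predict_rating(vector, weights, intercept):
    """Score one word vector with a linear model."""
    return intercept + sum(w * v for w, v in zip(weights, vector))


def rank_words(word_vectors, weights, intercept, k=2):
    """Return the k lowest-scoring and k highest-scoring words."""
    ordered = sorted(
        word_vectors,
        key=lambda word: predict_rating(word_vectors[word], weights, intercept),
    )
    return ordered[:k], ordered[-k:][::-1]


# Placeholder vectors and parameters, invented for illustration.
word_vectors = {
    "sometimes": [0.9, 0.1],
    "occasionally": [0.8, 0.2],
    "feel": [0.6, 0.3],
    "lung": [0.2, 0.7],
    "fatality": [0.1, 0.9],
}
weights, intercept = [20.0, -30.0], 50.0

lowest, highest = rank_words(word_vectors, weights, intercept)
print("lowest predicted ratings:", lowest)
print("highest predicted ratings:", highest)
```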

Figure 1. Frequently used words that received the lowest (red) and highest (green) health predictions using our model

How can we influence people’s health perceptions for many diseases and health conditions?

We next used our model to predict how different descriptions of the same disease can lead to different perceptions. We identified ten health conditions that have overviews on both the NHS website and other health-related websites. We asked a separate group of participants to complete an initial health ratings survey after reading a summary of one of the ten health conditions. We then compared our model predictions with participants' actual average ratings. Results showed that our model accurately quantified the effects of social media communications.
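A comparison between model predictions and participants' average ratings is commonly summarized with a correlation coefficient and an error measure. The sketch below shows one way to compute both. The article does not state which metrics were used, so the choice of Pearson correlation and mean absolute error is an assumption, and the numbers are placeholders rather than study results.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


def mean_absolute_error(predicted, observed):
    """Average absolute gap between predictions and observed ratings."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


# Placeholder values, invented for illustration only.
predicted = [72.0, 35.0, 55.0, 20.0]
observed = [70.0, 40.0, 50.0, 25.0]

print("correlation:", round(pearson(predicted, observed), 3))
print("mean absolute error:", mean_absolute_error(predicted, observed))
```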

In summary, our novel machine learning approach can be used to inform policy insights and behavioral interventions for better health communication. This technique is not only unique in predicting health judgments, but also offers significant time and cost efficiencies, as it can be easily applied to diseases without participant data. Policymakers and researchers can use our method to quantify people’s perceptions of different health conditions and to better understand the psychological cues used to make health judgments. Because the language used to describe a disease shapes how it is assessed, changing that language can also improve health assessments.