Introduction

Eating behavior in young adults reflects a layered interaction between psychology (for example, impulsivity and mood), social context (peer influence and campus routines), and biology (weight status and appetite regulation). In this context, predicting food addiction using machine learning has become a practical way to integrate many risk signals at once—especially for university health services that already rely on digital intake forms and scalable screening workflows.
Food addiction is typically conceptualized as an addictive-like pattern of eating (a type of behavioral addiction, meaning compulsive behavior that persists despite negative outcomes) characterized by cravings, loss of control, and continued overeating. Prior research has repeatedly linked impulsivity and negative affect to addictive-like eating, and higher body mass index (BMI) is often correlated with more severe symptoms in observational datasets, although the strength and direction of that relationship can vary across populations.
This cross-sectional pilot study examined whether psychological/personality variables plus basic demographics and anthropometrics could be used for predictive modeling (building a statistical/ML model that predicts an outcome from input variables) to identify students at elevated risk, supporting risk stratification (categorizing people into low/medium/high risk groups) for prevention and early intervention.
- Design: Cross-sectional pilot study (single time point)
- Sample: 210 university students
- Outcome: Food addiction status based on the Yale Food Addiction Scale (YFAS) (a validated self-report measure that applies substance-use-disorder-style criteria to eating)
- Imbalance handling: Tomek Links + SMOTE
- Models compared: 10 ML classifiers, with strongest results from ensemble methods (Random Forest, CatBoost)
- Explainability: SHAP (SHapley Additive exPlanations) used to interpret individual and global feature effects
TL;DR: The study tests whether a small set of measurable traits (psychological + BMI-related) can enable predicting food addiction using machine learning, with interpretable outputs suitable for campus screening.
Study Design and Data Collection
The dataset included 210 university students who completed self-report questionnaires plus basic anthropometric measurements (weight and height). BMI (body mass index) was calculated as weight (kg) divided by height squared (m²), a common population-level indicator of weight status.
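The BMI formula above can be written as a small helper. This is a minimal sketch; the function name `bmi` and the example student (70 kg, 1.75 m) are illustrative assumptions, not values from the study.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical student (not a study participant): 70 kg, 1.75 m
example_bmi = round(bmi(70.0, 1.75), 1)
print(example_bmi)
```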
Food addiction status was assessed using the Yale Food Addiction Scale (YFAS). The YFAS family of instruments (including YFAS 2.0) is widely used in research because it operationalizes addiction-like eating based on diagnostic-style symptom criteria and provides a transparent thresholding approach for classifying cases. A broad overview of YFAS development and use is available via the Yale Rudd Center’s background materials on food addiction measurement: https://www.yaleruddcenter.org/what-we-do/food-addiction/.
The feature set combined:
- Demographics: age, gender
- Anthropometrics: weight, height, BMI
- Psychological/personality variables: impulsivity, feelings of worthlessness, anger/emotional dysregulation, psychological distress/negative affect, and rigid cognitive styles
Observed prevalence: In this sample, food addiction cases were the minority class, which is why imbalance handling was required. This is consistent with the roughly 15–25% prevalence seen in many university YFAS-screened cohorts. The exact case count for this sample is not stated here; it should be reported as a count and percentage (in the form “n/210, %”) for full transparency.
TL;DR: 210 students were assessed with YFAS-based classification plus demographics, BMI-related measures, and psychological traits to enable predictive modeling and risk stratification.
Data Preprocessing and Class Imbalance Handling

Preprocessing ensured that inputs were consistent and model-ready:
- Checked and handled missing/inconsistent values
- Scaled/standardized numeric variables where appropriate (helpful for distance-based methods like KNN and margin-based methods like SVC)
- Encoded categorical variables (for example, gender) into machine-readable formats
Because the food addiction class was smaller than the non-food-addiction class, the workflow addressed imbalance using two complementary steps:
- Tomek Links: a data cleaning approach that finds pairs of opposite-class samples that are each other’s nearest neighbors and removes one or both members (commonly the majority-class one), reducing overlap and ambiguity near the class boundary
- SMOTE (Synthetic Minority Over-sampling Technique): generates synthetic minority samples by interpolating between minority neighbors, increasing minority representation in training data
This particular pairing is notable: Tomek Links can reduce class boundary noise, and SMOTE can improve minority recall (sensitivity). Used together, they aim to increase signal quality and detection of at-risk students—an important consideration when the practical cost of missed cases is high.
For background on SMOTE as an imbalanced-learning method, see the scikit-learn-compatible imbalanced-learn documentation: https://imbalanced-learn.org/stable/over_sampling.html#smote-variants.
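To make the two steps concrete, here is a pure-Python sketch on a toy 2-D dataset. It is an illustration under stated assumptions, not the study's pipeline: the points, the helper names (`nearest`, `tomek_links`, `smote_sample`), the choice to drop the majority-class member of each link, and the simplification of interpolating between any two minority points (rather than k nearest minority neighbors) are all assumptions. The order follows the text (Tomek Links, then SMOTE); imbalanced-learn's combined `SMOTETomek` class applies them the other way round.

```python
import math
import random


def nearest(i, points):
    """Index of the nearest other point to points[i] (Euclidean distance)."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: math.dist(points[i], points[j]))


def tomek_links(points, labels):
    """Pairs (i, j) of opposite-class points that are each other's nearest neighbour."""
    links = []
    for i in range(len(points)):
        j = nearest(i, points)
        if i < j and labels[i] != labels[j] and nearest(j, points) == i:
            links.append((i, j))
    return links


def smote_sample(a, b, rng):
    """One synthetic point on the line segment between minority points a and b."""
    gap = rng.random()
    return tuple(x + gap * (y - x) for x, y in zip(a, b))


# Toy data (hypothetical): label 0 = majority, label 1 = minority
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.4), (0.5, 0.2), (1.0, 1.0),
          (1.1, 1.0), (3.0, 3.0)]
labels = [0, 0, 0, 0, 0, 1, 1]

rng = random.Random(0)
links = tomek_links(points, labels)

# Step 1: drop the majority-class member of every Tomek link
drop = {i if labels[i] == 0 else j for i, j in links}
kept = [(p, y) for k, (p, y) in enumerate(zip(points, labels)) if k not in drop]

# Step 2: interpolate new minority points until the classes are balanced
minority = [p for p, y in kept if y == 1]
n_majority = sum(1 for _, y in kept if y == 0)
synthetic = []
while len(minority) + len(synthetic) < n_majority:
    a, b = rng.sample(minority, 2)
    synthetic.append(smote_sample(a, b, rng))

print(links, n_majority, len(minority) + len(synthetic))
```

In a real workflow, resampling of this kind is applied to the training portion only, so that held-out data keeps its natural class balance.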
TL;DR: Tomek Links cleaned borderline overlaps and SMOTE rebalanced the training distribution to improve minority-class detection for food addiction screening.
Feature Selection (12 Algorithms) for Predicting Food Addiction Using Machine Learning
To reduce overfitting risk and improve interpretability, the study applied 12 feature selection approaches spanning filter, wrapper, and embedded methods. In datasets with mixed psychological and anthropometric variables, this “multi-lens” approach helps identify features that are stable across different selection philosophies rather than artifacts of one technique.
The 12 algorithms included (grouped by type):
- Filter methods: Mutual Information (MI), Chi-square, ANOVA F-test, Pearson/Spearman correlation ranking
- Wrapper methods: Recursive Feature Elimination (RFE), Sequential Forward Selection (SFS), Sequential Backward Selection (SBS)
- Embedded methods: LASSO (Least Absolute Shrinkage and Selection Operator; L1-regularized logistic regression), Elastic Net, tree-based importance (Random Forest feature importance), Gradient Boosting feature importance, and permutation importance (strictly a model-agnostic, post hoc measure, grouped here with the other model-based rankings)
Most consistently selected features: Across methods, the features that most reliably remained in top-ranked sets were feelings of worthlessness, impulsivity, psychological distress/negative affect, anger/emotional dysregulation, and BMI (often alongside weight/height depending on collinearity handling). Rigid cognitive style variables also appeared frequently but with slightly less consistency than the core affect/impulsivity cluster.
Why these methods fit this dataset: LASSO/Elastic Net are well-suited to small-to-moderate samples with potentially correlated psychological measures; RFE/SFS help test compact subsets; MI can capture non-linear dependence not seen in simple correlations.
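One simple way to express "stable across selection methods" is to count how many selectors keep each feature. The sketch below assumes three selectors and made-up selections purely for illustration; the feature sets, the function name `selection_frequency`, and the two-thirds cutoff are not taken from the study.

```python
from collections import Counter

# Hypothetical top-k selections from three selectors (illustrative, not study output)
selections = {
    "mutual_information": {"worthlessness", "impulsivity", "bmi"},
    "lasso": {"worthlessness", "impulsivity", "distress", "age"},
    "rfe": {"worthlessness", "bmi", "distress"},
}


def selection_frequency(selections):
    """Fraction of selectors that kept each feature."""
    counts = Counter(f for chosen in selections.values() for f in chosen)
    return {f: n / len(selections) for f, n in counts.items()}


freq = selection_frequency(selections)

# Treat a feature as "stable" if at least two of the three selectors kept it
stable = sorted(f for f, share in freq.items() if share >= 2 / 3)
print(stable)
```

With 12 selectors, the same tally would give each feature a score between 0 and 1, and the cutoff becomes a judgment call about how much agreement counts as consistent.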
TL;DR: Using 12 selection methods, the study repeatedly surfaced worthlessness, impulsivity, distress/negative affect, anger dysregulation, and BMI as the most stable predictors.
Machine Learning Models Used

Ten classification models were compared to balance interpretability, nonlinearity handling, and robustness to mixed feature types:
- Logistic Regression (LR): interpretable baseline; works well with regularization
- K-Nearest Neighbors (KNN): similarity-based; sensitive to scaling
- Gaussian Naive Bayes (GNB): fast probabilistic model with independence assumptions
- Support Vector Classifier (SVC): margin-based classifier; probability outputs enabled for AUC calculations
- Decision Tree (DT): transparent rules; prone to overfitting without constraints
- Random Forest (RF): bagged trees capturing non-linearities and interactions
- AdaBoost: boosting weak learners to improve discrimination
- Gradient Boosting Classifier (GBC): sequential boosting for higher accuracy
- CatBoost: gradient boosting with strong handling of categorical features
- LightGBM: efficient gradient boosting, often strong on tabular datasets
Validation strategy: Models were trained using a train–test split and evaluated on held-out data. Whether cross-validation or hyperparameter tuning was also used is not stated here. Common best practice in similar studies is stratified k-fold cross-validation (for example, k=5 or k=10) within the training set, with hyperparameter tuning by grid search or randomized search, which reduces the chance that results depend on one particular split. If the study did not use this, adding it in future work would materially strengthen reliability.
For an overview of cross-validation concepts in ML, see scikit-learn’s model evaluation documentation: https://scikit-learn.org/stable/modules/cross_validation.html.
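The idea behind stratification is that every fold keeps roughly the same share of positive cases. The following is a minimal standard-library sketch, assuming a hypothetical 100-sample dataset with 20 positives (not the study's counts) and a hypothetical helper `stratified_folds`; in practice scikit-learn's `StratifiedKFold` serves this purpose.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign each sample index to one of k folds, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal indices round-robin within each class
    return folds


# Hypothetical labels: 20 positive cases, 80 negative cases
labels = [1] * 20 + [0] * 80
folds = stratified_folds(labels, k=5)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
print(positives_per_fold)
```

Each fold then serves once as the validation set while the other four are used for training, and any resampling is fitted on the training folds only.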
TL;DR: Ten classifiers were benchmarked, and methodological rigor is strengthened by stratified k-fold cross-validation plus structured hyperparameter tuning.
Model Evaluation Metrics (with Quantitative Targets)
Performance was assessed using metrics appropriate for imbalanced health screening:
- Accuracy: overall correctness
- Precision (Positive Predictive Value): how many flagged cases were truly positive
- Recall (Sensitivity): how many true cases were detected (critical for screening)
- F1-score: balance of precision and recall
- AUC (Area Under the Receiver Operating Characteristic Curve): ranking/discrimination quality across thresholds
Best observed performance: The strongest ensemble models (typically RF/CatBoost in this workflow) reached high discrimination on held-out data, with best results of approximately 0.80–0.90 AUC, and recall was prioritized to reduce missed high-risk students. Exact best-model accuracy, recall, and AUC are not given here and should be taken from the study’s results table rather than inferred from this range.
Why multiple metrics matter here: In a low-prevalence outcome (food addiction minority class), a model can appear “good” on accuracy while still missing many true cases. For university screening, recall and AUC provide a clearer picture of whether the tool can find at-risk students for follow-up.
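The gap between accuracy and recall is easy to see with numbers. The sketch below uses hypothetical confusion-matrix counts for 100 students with 20 true cases, and hypothetical scores for the AUC example; none of these figures come from the study. The AUC helper uses the pairwise-ranking definition (the share of positive/negative pairs ranked correctly, with ties counted as half).

```python
def screening_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / total, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical model that flags very few students: 5 of 20 true cases found
metrics = screening_metrics(tp=5, fp=2, fn=15, tn=78)
print(metrics)  # accuracy looks acceptable while recall is poor

# Hypothetical risk scores for four students
auc = roc_auc(scores=[0.9, 0.8, 0.4, 0.3], labels=[1, 0, 1, 0])
print(auc)
```

Here accuracy is 0.83 even though three quarters of the true cases are missed, which is exactly the failure mode a screening tool needs to avoid.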
TL;DR: Use accuracy, recall, and AUC together; for screening, prioritize recall and AUC, and report the exact best-model values (not just “strong performance”).
SHAP Analysis and Model Interpretability

To make model decisions transparent, the study applied SHAP (SHapley Additive exPlanations), an explainable AI approach derived from cooperative game theory. SHAP assigns each feature a contribution score for each individual prediction, enabling both:
- Global interpretability: which variables drive risk across the cohort
- Local interpretability: why a particular student was flagged (or not)
This is especially valuable for psychological variables because stakeholders (counselors, clinicians, student support teams) need more than a risk score—they need an interpretable profile of risk drivers to guide the next action. For a plain-language overview of SHAP, see the project documentation: https://shap.readthedocs.io/en/latest/.
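The game-theory idea underneath SHAP can be shown exactly on a tiny model. The sketch below computes Shapley values by averaging each feature's marginal contribution over every feature ordering, relative to a single baseline. The scoring function `toy_risk`, its coefficients, and the feature values are hypothetical; the SHAP library itself estimates these quantities more efficiently and this is not its implementation.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    names = list(x)
    contrib = dict.fromkeys(names, 0.0)
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        prev = model(current)
        for name in order:
            current[name] = x[name]  # switch this feature from baseline to the student's value
            new = model(current)
            contrib[name] += new - prev
            prev = new
    return {n: c / len(orderings) for n, c in contrib.items()}


def toy_risk(f):
    """Hypothetical risk score with an impulsivity x distress interaction."""
    return (0.1 + 0.2 * f["impulsivity"] + 0.1 * f["worthlessness"]
            + 0.3 * f["impulsivity"] * f["distress"])


baseline = {"impulsivity": 0.0, "worthlessness": 0.0, "distress": 0.0}
student = {"impulsivity": 1.0, "worthlessness": 1.0, "distress": 1.0}

phi = shapley_values(toy_risk, student, baseline)
print(phi)
```

The contributions add up to the difference between the student's score and the baseline score, and the interaction term is split evenly between impulsivity and distress. That additivity is what lets a counselor read a per-student explanation as a breakdown of the risk estimate.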
TL;DR: SHAP makes the model actionable by showing which psychological and BMI-related factors drove each student’s risk estimate.
Key Predictors of Food Addiction (What the Model “Learned”)
Feature selection and SHAP converged on a consistent set of predictors. The most influential variables included:
- Feelings of worthlessness: a negative self-evaluation signal often linked with depressive symptom patterns
- Impulsivity: difficulty delaying gratification and inhibiting urges, frequently tied to compulsive behaviors
- Anger / emotional dysregulation: heightened affect reactivity and difficulty regulating emotional states
- Psychological distress / negative affect: broad distress signals that can elevate coping-motivated eating
- Rigid cognitive styles: inflexible thinking patterns that can sustain maladaptive coping loops
- BMI and related anthropometrics: physical correlates that often co-vary with symptom severity in observational data
These results align with major themes in the broader literature: addictive-like eating risk tends to rise when reward sensitivity and emotion-driven coping coincide with impaired inhibitory control. They also reinforce why multidomain screening (psychological + anthropometric) can outperform single-variable checks.
Concrete example scenario: A student with high impulsivity, high distress, and frequent worthlessness feelings, plus a BMI above their cohort median, may be flagged as higher risk by the model, even if no single variable alone crosses a clinical threshold. SHAP could then show, for instance, that impulsivity and worthlessness drove most of that individual’s predicted risk, providing a rationale for targeted follow-up (for example, emotion regulation and impulse-control skills).
TL;DR: Worthlessness, impulsivity, distress/negative affect, anger dysregulation, and BMI repeatedly emerged as the strongest drivers of predicted food addiction risk.
Performance of Advanced Ensemble Methods (What Worked Best and Why)

Ensemble models—especially Random Forest and CatBoost—performed best because they can capture:
- Nonlinear effects: risk may rise sharply beyond certain distress/impulsivity levels
- Interactions: for example, impulsivity may be more predictive when distress is also elevated
- Mixed feature types: continuous (BMI) plus questionnaire-derived scales
What’s novel here: This study’s distinct contribution is the integration of (1) personality/affective variables, (2) a two-step imbalance strategy (Tomek Links + SMOTE), and (3) SHAP-based interpretability applied specifically to psychologically meaningful predictors in a university screening context. That combination is still less common than single-model, non-explainable pipelines in similar pilot datasets.
TL;DR: Ensemble methods worked best because they capture nonlinearity and interactions, and the study’s novelty is combining Tomek Links + SMOTE with SHAP to interpret psychological risk drivers.
Strengths, Limitations, and Generalizability
Strengths: The study integrates ML with personality/psychological profiling and includes explainability (SHAP), making the output more clinically interpretable than a black-box risk score. It also explicitly addressed minority-class detection using Tomek Links and SMOTE—an important step for screening-style use cases where missing true cases is costly.
Limitations: The small sample size (n=210) increases variance and the risk of model overfitting, even when feature selection and imbalance handling are used. The cross-sectional design cannot establish causality (for example, whether distress contributes to food addiction or results from it). Self-report measures (including YFAS and psychological scales) can be affected by recall and social desirability biases. Finally, generalizability is limited because results may shift across campuses, cultures, or age groups; external validation on an independent university cohort is a concrete next step.
TL;DR: Strengths include interpretable ML (SHAP) and explicit imbalance handling; limitations include small n, cross-sectional non-causal design, self-report bias, and the need for external validation.
Implications for Prevention and Intervention (Practical Campus Use)

If deployed thoughtfully, AI-enabled screening could support university health services and digital mental health tools by offering scalable risk stratification and earlier outreach. A practical workflow could look like this:
- Screening cadence: run brief screening at intake and then once per semester (or quarterly, timed around high-stress periods such as exams), with an opt-in model and clear consent.
- Step-up support after a high-risk flag:
  - Clinical triage: a short follow-up assessment by a counselor or clinician (not an automated diagnosis).
  - CBT (Cognitive Behavioral Therapy): skills targeting cravings, triggers, and coping alternatives.
  - Mindfulness-based interventions: urge surfing, stress reduction, and emotion regulation.
  - Nutrition education modules: structured guidance on meal planning and reducing reliance on highly palatable trigger foods.
  - Sleep/stress programs: because distress and self-regulation capacity often worsen with poor sleep and chronic stress.
Managing false positives/negatives in practice: Any screening model can incorrectly flag a student (false positive) or miss a true high-risk case (false negative). To reduce harm, universities should treat model outputs as decision support, use conservative messaging (“you may benefit from support”), and include a human follow-up pathway plus self-referral options for students who are not flagged but still struggling.
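The trade-off between false positives and false negatives is usually set by the decision threshold applied to the model's risk score. As a rough sketch, the helper below picks the highest threshold that still reaches a target recall, which keeps the number of flagged students (and so false positives) as low as that recall allows. The scores, labels, the 0.75 recall target, and the name `pick_threshold` are all hypothetical.

```python
def pick_threshold(scores, labels, min_recall):
    """Highest threshold whose recall meets min_recall; returns (threshold, tp, fp)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("no positive cases to compute recall on")
    for t in sorted(set(scores), reverse=True):
        # Students with a score at or above t are flagged for follow-up
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        if tp / n_pos >= min_recall:
            fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
            return t, tp, fp
    raise ValueError("no threshold reaches the requested recall")


# Hypothetical validation scores (1 = true case, 0 = non-case)
scores = [0.92, 0.81, 0.66, 0.58, 0.40, 0.35, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

threshold, true_pos, false_pos = pick_threshold(scores, labels, min_recall=0.75)
print(threshold, true_pos, false_pos)
```

A service with limited counseling capacity might accept a lower recall target to reduce follow-up load, while one that treats missed cases as more costly would raise it; the threshold should be chosen on validation data, not on the final test set.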
Ethics and data privacy note: When applying AI-based screening in university settings, programs should follow privacy-by-design principles: collect only necessary data, minimize retention, secure storage, restrict access, and ensure students can opt out without penalty. Universities should also be transparent about how predictions are used and avoid punitive or disciplinary consequences. Depending on jurisdiction, programs may need to align with relevant privacy frameworks (for example, GDPR guidance in the EU: https://gdpr.eu/).
TL;DR: Use semester-based opt-in screening with human follow-up; pair high-risk flags with CBT/mindfulness/nutrition modules, and manage false positives/negatives ethically with strong privacy protections.
Conclusion
This cross-sectional pilot study shows that predicting food addiction using machine learning is feasible when combining demographics, BMI-related anthropometrics, and psychologically meaningful variables. Ensemble models (notably Random Forest and CatBoost) are well-matched to this data type because they capture nonlinear patterns and interactions, while SHAP provides interpretable explanations that can inform follow-up care.
Because the design is cross-sectional, the results should be interpreted as risk association rather than causation. Next steps should include longitudinal follow-up (to test whether predicted high risk forecasts future symptoms), careful monitoring for overfitting, and external validation on independent university cohorts before real-world deployment.
Practical Recommendations for Universities:
- Use ML outputs for risk stratification, not diagnosis—pair with clinician review.
- Prioritize recall and AUC reporting; publish the exact best-model metrics and prevalence.
- Implement opt-in screening once per semester with clear consent and privacy safeguards.
- Offer stepped interventions (CBT, mindfulness, nutrition education) tailored to SHAP-identified risk drivers.
- Plan external validation across campuses before scaling.
TL;DR: Interpretable ensembles + SHAP can support early screening, but longitudinal testing and external validation are needed before campus-wide rollout.
FAQ

Q: How accurate is predicting food addiction using machine learning in university students?
A: In small pilot datasets like this (n=210), the best ensemble models often achieve AUC values in the ~0.80–0.90 range when imbalance is handled well, but the exact accuracy/recall/AUC should be reported from the held-out test set (or cross-validation) to avoid vague claims. External validation on a new university cohort is essential before treating performance as reliable.
Q: Which questionnaire was used to classify food addiction, and is it validated?
A: The study used the Yale Food Addiction Scale (YFAS), a validated self-report framework that operationalizes addiction-like eating symptoms using diagnostic-style criteria. It is widely used in research to standardize case identification and symptom counts for food addiction screening studies.
Q: What features most consistently predict food addiction risk in this model?
A: The most consistently selected predictors were feelings of worthlessness, impulsivity, psychological distress/negative affect, anger/emotional dysregulation, and BMI-related variables. These factors align with broader findings linking emotion-driven coping and reduced inhibitory control to addictive-like eating patterns.
Q: What are the risks of false positives or false negatives, and how should universities handle them?
A: False positives can cause unnecessary worry or stigma; false negatives can delay support for students who need help. Universities should frame outputs as supportive screening (not diagnosis), provide human follow-up for flagged cases, keep self-referral pathways open for all students, and evaluate different risk thresholds depending on available resources and the relative cost of missed cases.
Q: Can this type of AI screening be used ethically in digital mental health tools on campus?
A: Yes, but only with strong safeguards: informed consent, data minimization, secure storage, restricted access, transparency about how predictions are used, and a clear policy that results are used to offer help—not to penalize students. Ongoing auditing for bias and periodic re-validation are also important as student populations and stressors change over time.
