| Towards Outcome-Driven Patient Subgroups: A Machine Learning Analysis Across Six Depression Treatment Studies | 10.15154/1528714 | Importance: Major depressive disorder (MDD) is a heterogeneous condition; multiple underlying neurobiological substrates could be associated with treatment response variability. Understanding the sources of this variability and predicting outcomes has been elusive. Machine learning (ML) has shown promise in predicting treatment response in MDD, but one limitation has been the lack of clinical interpretability of machine learning models, limiting clinician confidence in model results.
Objective: To develop a machine learning model to derive treatment-relevant patient profiles using clinical and demographic information.
Design: We analyzed data from six clinical trials of pharmacological treatment for depression (total n = 5438) using the Differential Prototypes Neural Network (DPNN), a neural network model that derives patient prototypes which can be used to derive treatment-relevant patient clusters while learning to generate probabilities for differential treatment response. A model classifying remission and outputting individual remission probabilities for five first-line monotherapies and three combination treatments was trained using clinical and demographic data.
Setting: Previously-conducted clinical trials of antidepressant medications.
Participants: Patients with MDD.
Main outcomes and measures: Model validity and clinical utility were measured based on area under the curve (AUC) and expected improvement in sample remission rate with model-guided treatment, respectively. Post-hoc analyses yielded clusters (subgroups) based on patient prototypes learned during training. Prototypes were evaluated for interpretability by assessing differences in feature distributions (e.g. age, sex, symptom severity) and treatment-specific outcomes.
Results: A 3-prototype model achieved an AUC of 0.66 and an expected absolute improvement in population remission rate of 6.5% (relative improvement of 15.6%). We identified three treatment-relevant patient clusters. Cluster A patients tended to be younger, with increased levels of fatigue and more severe symptoms. Cluster B patients tended to be older and female, with less severe symptoms and the highest remission rates. Cluster C patients had more severe symptoms, lower remission rates, more psychomotor agitation, more intense suicidal ideation, and more somatic genital symptoms, and showed improved remission with venlafaxine.
Conclusion and Relevance: It is possible to produce novel treatment-relevant patient profiles using machine learning models; doing so may improve precision medicine for depression. Note: This model is not currently the subject of any active clinical trials and is not intended for clinical use. | 875/6074 | Secondary Analysis | Shared |
| Treatment selection using prototyping in latent-space with application to depression treatment | 10.15154/1523049 | Machine-assisted treatment selection commonly follows one of two paradigms: a fully personalized paradigm which ignores any possible clustering of patients; or a sub-grouping paradigm which ignores personal differences within the identified groups. While both paradigms have shown promising results, each of them suffers from important limitations. In this article, we propose a novel deep learning-based treatment selection approach that is shown to strike a balance between the two paradigms using latent-space prototyping. Our approach is specifically tailored for domains in which effective prototypes and sub-groups of patients are assumed to exist, but groupings relevant to the training objective are not observable in the non-latent space. In an extensive evaluation, using both synthetic and Major Depressive Disorder (MDD) real-world clinical data describing 4754 MDD patients from clinical trials for depression treatment, we show that our approach favorably compares with state-of-the-art approaches. Specifically, the model produced an 8% absolute and 23% relative improvement over random treatment allocation. This is potentially clinically significant, given the large number of patients with MDD. Therefore, the model can bring about a much desired leap forward in the way depression is treated today. | 876/5946 | Secondary Analysis | Shared |
| Summary Measures for Quantifying the Extent of Visit Irregularity in Longitudinal Data: The STAR*D Study | 10.15154/1518466 | This chapter applies the measures of irregularity from this thesis to the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) study. The STAR*D study is the largest randomized clinical trial of patients suffering from major depression. This chapter focuses on the first phase of the study, which pre-specified a common set of scheduled measurement occasions at weeks 2, 4, 6, 9, and 12 post-baseline, at which individuals had their Quick Inventory of Depression Symptomatology (QIDS) questionnaire score recorded; however, some individuals missed scheduled visits and some had unscheduled visits. Therefore, interest lies in determining whether visits can be treated as repeated measures. This is followed by a demonstration of how to select the appropriate modelling approach for the study outcome, and how to interpret the resulting parameter estimates. The target of inference of this chapter is the mean QIDS score over the first 12 weeks of the trial. | 67/4036 | Secondary Analysis | Shared |
| Trial-By-Trial ERP-Behavior Relationships in Psychosis: Between- and Within-Person Variability in Performance Monitoring Adjustments | 10.15154/2pt7-8962 | Cognitive impairment in schizophrenia, characterized by deficits in performance monitoring, predicts clinical and functional outcomes. The error-related negativity (ERN), a neurophysiological index of error detection, is reduced in psychosis, but it is unclear why this impaired error detection is not closely linked to post-error behavioral adjustments. A possibility is that research has overrelied on examining between-person relationships of average ERN and behavior, rather than focusing on within-person, trial-by-trial changes. This study aimed to determine whether neurophysiological indices of error detection (ERN, error positivity [Pe]) predict within-person post-error behavioral adjustments in psychotic disorders and whether these relationships are weaker in people with psychosis than in controls. ERN and Pe were assessed during a modified flanker task in 72 patients with psychosis and 82 healthy comparison participants. Multilevel location-scale models were used to examine trial-by-trial changes in the relationships between ERPs and behavior (response times [RTs], accuracy). Results showed that ERP-RT relationships were similar across patients and controls. In both groups, greater within-person increases in ERN amplitude predicted longer and more variable RTs following correct trials. Larger within-person increases in Pe predicted shorter and more variable RTs following correct trials, but less variable RTs following error trials. Exploratory analyses in a subset of patients with schizophrenia showed a similar pattern of effects as in the overall analyses. ERP-accuracy relationships were neither observed nor moderated by diagnostic groups. Within-person ERP-behavior relationships were preserved in psychosis, indicating intact performance monitoring at the individual level. This supports performance-monitoring as a transdiagnostic construct and underscores the importance of examining intraindividual variability to understand performance monitoring in psychotic disorders. | 2/154 | Secondary Analysis | Shared |
| Consistent differential effects of bupropion and mirtazapine in major depression | 10.15154/qzg7-n302 | Background: Patients with major depression exhibit heterogeneous symptom profiles and variable responses to antidepressants. Most clinical trials rely on aggregate outcomes such as total symptom severity or remission rates, which often obscure meaningful differences in treatment response.
Methods: We applied the Supervised Varimax (SV) algorithm to identify outcome dimensions that maximally differentiate antidepressants based on symptom-level effects. We analyzed all relevant levels of the STAR*D trial and validated findings in the independent CO-MED study. We assessed statistical significance using permutation testing with familywise error rate (FWER) correction.
Results: SV consistently identified interpretable and statistically significant differences between bupropion, mirtazapine, and other antidepressants. In STAR*D, bupropion monotherapy produced greater improvement in hypersomnia than venlafaxine in Levels 2 and 2A (n=686, difference = 0.384, p_{FWER}=0.007). Bupropion augmentation outperformed buspirone augmentation for increased weight, increased appetite, and fatigue in Level 2 (n=520, difference = -0.322, p_{FWER}=0.005). Mirtazapine monotherapy outperformed nortriptyline for insomnia, decreased weight, and decreased appetite in Level 3 (n=214, difference = 0.401, p_{FWER}=0.022), and venlafaxine with mirtazapine similarly outperformed tranylcypromine in Level 4 (n=102, difference = -0.722, p_{FWER}=0.004). In CO-MED, escitalopram with bupropion and venlafaxine with mirtazapine demonstrated complementary symptom-specific benefits (n=640, difference = -0.302, p_{FWER}=0.022).
Conclusion: Bupropion is most effective for hypersomnia, increased weight, increased appetite, or fatigue, while mirtazapine is preferable for insomnia, decreased weight, or decreased appetite. SV enables statistically rigorous, symptom-level differentiation using only treatment assignment, offering a scalable and clinically aligned framework for guiding antidepressant selection from individual clinical trials. | 61/82 | Secondary Analysis | Shared |
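
The last abstract above reports significance via permutation testing with familywise error rate (FWER) correction but does not say which correction procedure was used. As a rough illustration only, the sketch below uses one common choice, the max-statistic permutation approach, on synthetic data. It is not the Supervised Varimax algorithm and not the study's code; the function name `maxstat_permutation_test`, the parameters, and the synthetic data are all hypothetical.

```python
import numpy as np

def maxstat_permutation_test(scores, groups, n_perm=2000, seed=0):
    """Two-arm permutation test with max-statistic FWER adjustment (illustrative).

    scores: (n_patients, n_outcomes) array of symptom-level change scores.
    groups: boolean array, True for arm A and False for arm B.
    Returns the observed mean differences (A minus B) and adjusted p-values.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups, dtype=bool)

    def mean_diff(g):
        # Per-outcome difference in arm means.
        return scores[g].mean(axis=0) - scores[~g].mean(axis=0)

    observed = mean_diff(groups)

    # Null distribution of the largest absolute difference across outcomes,
    # built by shuffling treatment labels.
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        max_null[i] = np.abs(mean_diff(rng.permutation(groups))).max()

    # Adjusted p-value per outcome: share of permutations whose maximum
    # statistic is at least as large as that outcome's observed statistic.
    exceed = (max_null[None, :] >= np.abs(observed)[:, None]).sum(axis=1)
    p_fwer = (1 + exceed) / (n_perm + 1)
    return observed, p_fwer

# Synthetic example: 200 patients, 5 outcomes, an effect on outcome 0 only.
data_rng = np.random.default_rng(1)
groups = np.arange(200) < 100
scores = data_rng.normal(size=(200, 5))
scores[groups, 0] += 1.0
observed, p_fwer = maxstat_permutation_test(scores, groups)
print(np.round(observed, 3), np.round(p_fwer, 4))
```

Comparing each outcome against the permutation distribution of the maximum statistic controls the chance of any false positive across all outcomes tested, which is the property an FWER-corrected p-value is meant to provide.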