Towards Outcome-Driven Patient Subgroups: A Machine Learning Analysis Across Six Depression Treatment Studies | Importance: Major depressive disorder (MDD) is a heterogeneous condition; multiple underlying neurobiological substrates could be associated with variability in treatment response. Understanding the sources of this variability and predicting outcomes have remained elusive. Machine learning (ML) has shown promise in predicting treatment response in MDD, but one limitation has been the lack of clinical interpretability of ML models, which limits clinician confidence in their results.
Objective: To develop a machine learning model to derive treatment-relevant patient profiles using clinical and demographic information.
Design: We analyzed data from six clinical trials of pharmacological treatment for depression (total n = 5438) using the Differential Prototypes Neural Network (DPNN), a neural network that learns patient prototypes, which can be used to define treatment-relevant patient clusters, while also learning to generate probabilities of differential treatment response. A model that classifies remission and outputs individual remission probabilities for five first-line monotherapies and three combination treatments was trained on clinical and demographic data.
Setting: Previously conducted clinical trials of antidepressant medications.
Participants: Patients with MDD.
Main outcomes and measures: Model validity and clinical utility were measured by the area under the curve (AUC) and the expected improvement in sample remission rate with model-guided treatment, respectively. Post-hoc analyses yielded clusters (subgroups) based on the patient prototypes learned during training. Prototypes were evaluated for interpretability by assessing differences in feature distributions (e.g., age, sex, symptom severity) and treatment-specific outcomes.
Results: A 3-prototype model achieved an AUC of 0.66 and an expected absolute improvement in population remission rate of 6.5% (relative improvement of 15.6%). We identified three treatment-relevant patient clusters. Cluster A patients tended to be younger and to have higher levels of fatigue and more severe symptoms. Cluster B patients tended to be older and female, with less severe symptoms and the highest remission rates. Cluster C patients had more severe symptoms, lower remission rates, more psychomotor agitation, more intense suicidal ideation, and more genital somatic symptoms, and showed improved remission with venlafaxine.
Conclusion and Relevance: It is possible to produce novel treatment-relevant patient profiles using machine learning models; doing so may improve precision medicine for depression. Note: This model is not currently the subject of any active clinical trials and is not intended for clinical use. | 4262/6074 | Secondary Analysis | Shared |
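The abstract does not spell out how "expected improvement in population remission rate" is computed. The reported figures are linked by relative = absolute / baseline, so 6.5% absolute and 15.6% relative imply a baseline remission rate of roughly 0.065 / 0.156 ≈ 42%. A minimal sketch of one plausible calculation, assuming the model outputs a remission probability for every candidate treatment and that the comparison baseline is random allocation (each treatment equally likely); the function name and all probabilities are hypothetical, not the authors' code or data:

```python
from statistics import mean


def expected_remission_gain(probs):
    """Compare model-guided treatment with random allocation.

    probs[i][k] is a (hypothetical) predicted remission probability for
    patient i under candidate treatment k.
    """
    # Model-guided: every patient gets the treatment with the highest predicted probability.
    guided = mean(max(row) for row in probs)
    # Random allocation: each treatment equally likely, so average over treatments.
    random_alloc = mean(mean(row) for row in probs)
    absolute = guided - random_alloc
    relative = absolute / random_alloc
    return guided, random_alloc, absolute, relative


# Toy predictions for 3 patients x 3 treatments (made-up numbers).
probs = [
    [0.30, 0.45, 0.40],
    [0.50, 0.35, 0.42],
    [0.25, 0.28, 0.40],
]
guided, base, abs_gain, rel_gain = expected_remission_gain(probs)
print(f"guided={guided:.3f} random={base:.3f} abs={abs_gain:.3f} rel={rel_gain:.1%}")
```

Because the gain is computed from the model's own predictions, it measures expected rather than observed benefit; a poorly calibrated model would overstate it.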
Treatment selection using prototyping in latent-space with application to depression treatment | Machine-assisted treatment selection commonly follows one of two paradigms: a fully personalized paradigm, which ignores any possible clustering of patients, or a sub-grouping paradigm, which ignores individual differences within the identified groups. While both paradigms have shown promising results, each suffers from important limitations. In this article, we propose a novel deep learning-based treatment selection approach that is shown to strike a balance between the two paradigms using latent-space prototyping. Our approach is specifically tailored to domains in which effective prototypes and sub-groups of patients are assumed to exist, but groupings relevant to the training objective are not observable in the non-latent space. In an extensive evaluation using both synthetic data and real-world clinical data describing 4754 patients with Major Depressive Disorder (MDD) from clinical trials of depression treatment, we show that our approach compares favorably with state-of-the-art approaches. Specifically, the model produced an 8% absolute and 23% relative improvement over random treatment allocation. This is potentially clinically significant, given the large number of patients with MDD. Therefore, the model can bring about a much-desired leap forward in the way depression is treated today. | 4134/5946 | Secondary Analysis | Shared |
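The abstract does not describe the network architecture. One way to picture latent-space prototyping, assuming each patient has already been mapped to a latent vector by some encoder (not shown), is to weight the learned prototypes by proximity and blend per-prototype treatment predictions: the most heavily weighted prototype gives a hard subgroup, while the soft weights keep some individual variation. The softmax over negative squared distances, the function names, and all numbers below are illustrative assumptions, not the authors' model:

```python
import math


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototype_weights(z, prototypes):
    """Soft assignment of latent vector z: softmax over negative squared distances."""
    scores = [-squared_distance(z, p) for p in prototypes]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_remission_probs(weights, prototype_probs):
    """Blend per-prototype treatment probabilities using the patient's prototype weights."""
    n_treatments = len(prototype_probs[0])
    return [
        sum(w * probs[t] for w, probs in zip(weights, prototype_probs))
        for t in range(n_treatments)
    ]


# Hypothetical 2-D latent space with three prototypes.
prototypes = [[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]]
# Hypothetical remission probabilities for two treatments at each prototype.
prototype_probs = [[0.40, 0.30], [0.25, 0.50], [0.35, 0.35]]

z = [2.5, 0.4]  # a patient's (made-up) latent embedding
w = prototype_weights(z, prototypes)
cluster = w.index(max(w))  # hard subgroup = most heavily weighted prototype
patient_probs = mixed_remission_probs(w, prototype_probs)
```

Here the patient sits closest to the second prototype, so its treatment profile dominates the blended probabilities.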
Analysis of Features Selected by a Deep Learning Model for Differential Treatment Selection in Depression | Background: Deep learning has utility in predicting differential antidepressant treatment response among patients with major depressive disorder, yet there remains a paucity of research describing how to interpret deep learning models in a clinically or etiologically meaningful way. In this paper, we describe methods for analyzing deep learning models of clinical and demographic psychiatric data, using our recent work on a deep learning model of STAR*D and CO-MED remission prediction.
Methods: Our deep learning analysis with STAR*D and CO-MED yielded four models that predicted response to the four treatments used across the two datasets. Here, we use classical statistics and simple data representations to improve interpretability of the features output by our deep learning model and provide finer grained understanding of their clinical and etiological significance. Specifically, we use representations derived from our model to yield features predicting both treatment non-response and differential treatment response to four standard antidepressants, and use linear regression and t-tests to address questions about the contribution of trauma, education, and somatic symptoms to our models.
Results: Traditional statistics were able to probe the input features of our deep learning models, reproducing results from previous research while providing novel insights into the causes and treatment of depression. We found that specific features were predictive of treatment response and could be broken down by treatment and non-response categories; that specific trauma indices were differentially predictive of baseline depression severity; that somatic symptoms differed significantly between males and females; and that education and low income were important psychosocial stressors associated with depression.
Conclusion: Traditional statistics can augment the interpretation of deep learning models. Such interpretation can generate new hypotheses about depression and contribute to building causal models of etiology and prognosis. We discuss dataset-specific effects and ideal clinical samples for machine learning analyses aimed at improving tools that help optimize treatment. | 4132/4800 | Secondary Analysis | Shared |
Differential Treatment Benefit Prediction for Treatment Selection in Depression: A Deep Learning Analysis of STAR*D and CO-MED Data | Depression affects one in nine people, but treatment response rates remain low. There is significant potential in the use of computational modeling techniques to predict individual patient responses and thus provide more personalized treatment. Deep learning is a promising computational technique that can be used for differential treatment selection based on predicted remission probability. Using Sequenced Treatment Alternatives to Relieve Depression (STAR*D) and Combining Medications to Enhance Depression Outcomes (CO-MED) trial data, we employed deep neural networks to predict remission after feature selection. Treatments included were citalopram, escitalopram, bupropion SR plus escitalopram, and venlafaxine plus mirtazapine. Differential treatment benefit was estimated in terms of improvement of population remission rates after application of the model for treatment selection using two approaches: (1) using predictions generated directly from the model (the predicted improvement approach) and (2) using bootstrapping for sample generation and then estimating population remission rate for patients who actually received the drug predicted by the model compared to the general population (the actual improvement approach). Our deep learning model predicted remission in a pooled CO-MED/STAR*D dataset (including four treatments) with an area under the curve of 0.69 using 17 input features. Our actual improvement analysis showed a statistically significant 2.48% absolute improvement (corresponding to a 7.2% relative improvement) in population remission rate (p = 0.01, CI 2.48% ± 0.5%). 
Our model serves as proof-of-concept that deep learning approaches, with further refinement and work to address concerns about differences between studies when multiple datasets are used for training, may have utility in differential prediction of antidepressant response when selecting from a number of treatment options. | 4132/4800 | Secondary Analysis | Shared |
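The "actual improvement" approach is described only in outline. A minimal sketch of one reading: resample patients with replacement, and in each resample compare the remission rate among patients whose received drug matches the model's recommendation with the overall remission rate. The tuple layout, the 95% percentile interval, and the synthetic cohort (with placeholder drug labels) are assumptions for illustration:

```python
import random
from statistics import mean


def actual_improvement(records, n_boot=1000, seed=0):
    """Bootstrap the remission-rate gap between model-concordant patients and everyone.

    records: list of (received, recommended, remitted) tuples, remitted in {0, 1}.
    Returns the mean bootstrap difference and a 95% percentile interval.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        sample = [rng.choice(records) for _ in records]  # resample patients with replacement
        matched = [r for received, recommended, r in sample if received == recommended]
        if not matched:
            continue  # no patient happened to receive the recommended drug in this resample
        overall = mean(r for _, _, r in sample)
        diffs.append(mean(matched) - overall)
    diffs.sort()
    lo = diffs[int(0.025 * (len(diffs) - 1))]
    hi = diffs[int(0.975 * (len(diffs) - 1))]
    return mean(diffs), (lo, hi)


# Synthetic cohort: 50 patients received the recommended drug, 50 did not.
records = (
    [("drug_A", "drug_A", 1)] * 30
    + [("drug_A", "drug_A", 0)] * 20
    + [("drug_A", "drug_B", 1)] * 15
    + [("drug_A", "drug_B", 0)] * 35
)
estimate, (lo, hi) = actual_improvement(records)
```

In this toy cohort the concordant group remits at 60% versus 45% overall, so the bootstrap estimate centres near a 15-point absolute gain. Unlike the predicted-improvement approach, this uses observed outcomes, but it is only as unbiased as the original treatment assignment was random.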
Randomized Trials with Repeatedly Measured Outcomes: Handling Irregular and Potentially Informative Assessment Times | Randomized trials are often designed to collect outcomes at fixed points in time after randomization. In practice, the number and timing of outcome assessments can vary among participants (i.e., be irregular). In fact, the timing of assessments may be associated with the outcome of interest (i.e., be informative). For example, in a trial evaluating the effectiveness of housing services for homeless people with mental illness, not only did the timing of outcome measurements vary among participants, but more days spent homeless were associated with less frequent observation. This type of informative observation requires appropriate statistical analysis. Although analytic methods have been developed, they are rarely used.
The purpose of this paper is to review the available methods with a view to developing recommendations for analyzing trials with irregular and potentially informative observation times. We show how the choice of analytic approach hinges on assumptions about the relationship between the observation and outcome processes. We argue that irregular observation should be treated with the same care as missing data, and propose that trialists adopt strategies to minimize the extent of irregularity; describe the extent of irregularity in observation times; make their assumptions about the relationship between observation times and outcomes explicit; adopt analytic techniques that are appropriate to those assumptions; and rigorously assess the sensitivity of trial results to those assumptions. | 4262/4262 | Secondary Analysis | Shared |
Variable Selection in Semiparametric Regression Models for Longitudinal Data with Informative Observation Times | A common issue in longitudinal studies is that subjects' visits are irregular and may depend on observed outcome values; such data are known as longitudinal data with informative observation times (follow-up). Semiparametric regression modelling of this type of data has received much attention because it provides more flexibility in studying the association between regression factors and a longitudinal outcome. An important problem is how to select relevant variables and estimate their coefficients in semiparametric regression models when the number of baseline covariates is large. Current penalization procedures in semiparametric regression models for longitudinal data do not account for informative observation times. We propose a variable selection procedure suited to estimation methods based on pseudo-score functions. We investigate the asymptotic properties of the penalized estimators and conduct simulation studies to illustrate the theoretical results. We also apply the procedure to variable selection in a semiparametric model for the STAR*D dataset from a multistage randomized clinical trial for treating major depressive disorder. | 4134/4134 | Secondary Analysis | Shared |
Predictors of change in suicidal ideation across treatment phases of major depressive disorder: analysis of the STAR*D data | The effects of common antidepressants on suicidal ideation (SI) are unclear. In the landmark STAR*D trial, antidepressants were effective for Major Depressive Disorder (MDD) in early treatment phases but less effective in later phases. The effects of antidepressants on SI across the entire STAR*D sample have never been investigated. We performed a secondary analysis of the STAR*D data with the primary outcome of change in score on the suicide item (item three) of the Hamilton Rating Scale for Depression (HRSD17) across all four study levels. We used descriptive statistics and logistic regression analyses. Pearson correlation was used for change in SI versus change in depression (HRSD16). Mean (SD) reduction in SI was greater in levels one: 0.29 (±0.78) (p<0.001) and two: 0.26 (±0.88) (p<0.001) than in levels three: 0.16 (±0.92) (p=0.005) and four: 0.18 (±0.93) (p=0.094). A history of past suicide attempts (OR 1.72, p=0.007), comorbid medical illness (OR 2.23, p=0.005), and a family history of drug abuse (OR 1.69, p=0.008) were associated with worsening of SI across level one. Treatment with bupropion (OR 0.24, p<0.001) or buspirone (OR 0.24, p=0.001) was associated with a reduction in SI across level two. Improvement in SI was correlated with improvement in overall depression (HRSD16) at level one: r(3756)=0.48; level two: r(1027)=0.38; level three: r(249)=0.31; and level four: r(75)=0.42 (p<0.001 for all levels). Improvement in SI with pharmacotherapy is limited in patients with treatment-resistant depression. Treatments with known anti-suicidal effects in MDD, such as ECT, should be considered in these patients. | 4130/4130 | Secondary Analysis | Shared |
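The correlation analysis pairs each patient's change on the suicide item with their change in overall depression. A minimal sketch, assuming (not stated in the abstract) that HRSD16 is the HRSD17 total minus the suicide item, so the two change scores do not share an item; the r(df) notation presumably reports df = n - 2, following the usual convention. All scores below are made up:

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def improvement(baseline, exit_scores):
    """Positive values mean the score fell (symptoms improved)."""
    return [b - e for b, e in zip(baseline, exit_scores)]


# Made-up scores for five patients at the start and end of one level.
item3_base, item3_exit = [2, 1, 3, 0, 2], [0, 1, 1, 0, 2]
hrsd17_base, hrsd17_exit = [24, 19, 28, 16, 22], [8, 16, 11, 14, 21]

# Assumption: HRSD16 = HRSD17 total minus the suicide item (item 3).
hrsd16_base = [t - s for t, s in zip(hrsd17_base, item3_base)]
hrsd16_exit = [t - s for t, s in zip(hrsd17_exit, item3_exit)]

si_change = improvement(item3_base, item3_exit)
dep_change = improvement(hrsd16_base, hrsd16_exit)
r = pearson_r(si_change, dep_change)
df = len(si_change) - 2  # degrees of freedom as in r(df)
```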
The bias of parameters in Inverse-Intensity Weighted GEEs when excluding subjects with no follow-up visits | Longitudinal data can be used to study disease progression and often feature irregular visit times. Traditional methods such as generalized estimating equations (GEEs) and mixed-effects models lead to biased estimates when the visit and outcome processes are related. Inverse-intensity weighted GEEs (IIW-GEEs) account for dependency between the visit and outcome processes. A common issue in practice is that subjects with no visits are excluded from the dataset. We aim to examine the bias of regression parameters in IIW-GEEs when subjects without a visit are excluded.
We show analytically that excluding subjects with no visits introduces bias, and verify this in a simulation study. Moreover, we show that decreasing visit frequency, decreasing maximum follow-up time, and increasing the proportion of subjects with no visits all increase the bias from omitting those subjects. We recommend including every subject in the analysis, regardless of whether they had a follow-up visit. | 4130/4130 | Secondary Analysis | Shared |
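With an identity link and an independence working correlation, the IIW-GEE estimating equation, sum over visits of w_ij x_ij (y_ij - x_ij' beta) = 0 with w_ij proportional to 1 / lambda_i(t_ij), is solved by weighted least squares. The sketch below assumes the visit intensities have already been estimated by a separate visit-process model (for example, an Andersen-Gill proportional rates model); the function name and data are hypothetical. Subjects with no visits add no terms to this outcome equation; in practice they matter through the visit-intensity model, which is presumably where excluding them introduces bias.

```python
def iiw_linear_fit(x, y, intensity):
    """Solve sum_j w_j * [1, x_j] * (y_j - b0 - b1 * x_j) = 0 with w_j = 1 / intensity_j."""
    w = [1.0 / lam for lam in intensity]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


# Made-up visits pooled over subjects: time since baseline, outcome, and the
# visit intensity estimated beforehand for each visit.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
outcomes = [20.0, 16.0, 15.0, 11.0, 10.0]
intensity = [1.0, 2.0, 1.0, 0.5, 0.5]  # visits made at high intensity get smaller weight
b0, b1 = iiw_linear_fit(times, outcomes, intensity)
```

Visits that were likely to happen anyway are down-weighted, so periods of frequent observation do not dominate the fitted trajectory.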
Bayesian likelihood-based regression for estimation of optimal dynamic treatment regimes | Clinicians often make sequences of treatment decisions that can be framed as dynamic treatment regimes. In this paper, we propose a Bayesian likelihood-based dynamic treatment regime model that incorporates regression specifications to yield interpretable relationships between covariates and stage-wise outcomes. We define a set of probabilistically coherent properties for dynamic treatment regime processes and present the theoretical advantages that follow from these properties. We justify the likelihood-based approach by showing that it guarantees these properties, whereas existing methods lead to process spaces that typically violate them and to modelling assumptions that are infeasible. Through a numerical study, we show that our proposed method can achieve superior performance over existing state-of-the-art methods. | 4120/4120 | Secondary Analysis | Shared |
Application of optimal dynamic treatment regimes to STAR*D | This NDA study is linked to the paper "A tutorial of optimal dynamic treatment regimes" submitted to Statistics in Medicine. We applied Q-learning, A-learning, causal trees, and the augmented inverse probability weighted estimator to STAR*D to estimate the optimal dynamic treatment regime for patients who did not achieve remission with initial treatment. We restrict our attention to two treatment options, augmentation and switch, at each decision. The original STAR*D data consist of four levels: all patients were treated with citalopram at level 1, and only switch options were available to patients in level 4. We therefore focus only on level 2 (plus level 2A where available) and level 3. | 4045/4045 | Secondary Analysis | Shared |
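Q-learning, one of the methods named above, works by backward induction: estimate the stage-2 Q-function, plug in the best stage-2 action to form a pseudo-outcome, then estimate the stage-1 Q-function on that pseudo-outcome. A minimal sketch with saturated (cell-mean) Q-functions over discrete, made-up states, taking stage 1 as level 2 and stage 2 as level 3; the state labels, data, and function names are hypothetical, and the paper's actual regression models and level 2A handling are not reproduced:

```python
from collections import defaultdict
from statistics import mean

ACTIONS = ("augment", "switch")


def cell_means(rows, key, value):
    """Saturated Q-function: mean of value(row) within each (state, action) cell."""
    groups = defaultdict(list)
    for r in rows:
        groups[key(r)].append(value(r))
    return {k: mean(v) for k, v in groups.items()}


def greedy_policy(q):
    """Map each state to the action with the largest estimated Q-value."""
    states = {s for s, _ in q}
    return {s: max((a for a in ACTIONS if (s, a) in q), key=lambda a: q[(s, a)]) for s in states}


def q_learning(rows):
    # Stage 2 (level 3): regress the final outcome on (state, action).
    q2 = cell_means(rows, lambda r: (r["s2"], r["a2"]), lambda r: r["y"])
    pi2 = greedy_policy(q2)
    # Pseudo-outcome: value of acting optimally at stage 2 from the observed stage-2 state.
    v2 = {s: q2[(s, a)] for s, a in pi2.items()}
    # Stage 1 (level 2): regress the pseudo-outcome on (state, action).
    q1 = cell_means(rows, lambda r: (r["s1"], r["a1"]), lambda r: v2[r["s2"]])
    pi1 = greedy_policy(q1)
    return pi1, pi2


def row(s1, a1, s2, a2, y):
    return {"s1": s1, "a1": a1, "s2": s2, "a2": a2, "y": y}


# Made-up trajectories: y = 1 means remission at the end of stage 2.
data = (
    [row("low", "augment", "partial", "augment", y) for y in (1, 1)]
    + [row("high", "switch", "partial", "augment", y) for y in (1, 0)]
    + [row("low", "augment", "partial", "switch", y) for y in (1, 0)]
    + [row("high", "switch", "partial", "switch", y) for y in (0, 0)]
    + [row("low", "switch", "none", "augment", y) for y in (0, 0)]
    + [row("high", "augment", "none", "augment", y) for y in (0, 1)]
    + [row("low", "switch", "none", "switch", y) for y in (1, 0)]
    + [row("high", "augment", "none", "switch", y) for y in (1, 0)]
)
pi1, pi2 = q_learning(data)
```

The stage-1 decision is judged by the outcome achievable under the best stage-2 choice, not by the stage-2 action patients actually received; that is what distinguishes Q-learning from fitting each stage in isolation.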
Predicting Antidepressant Response with the STAR*D and CAN-BIND-1 Datasets | Collection for the paper published in PLOS One as "Replication of Machine Learning Methods to Predict Treatment Outcome with Antidepressant Medications in Patients with Major Depressive Disorder from STAR*D and CAN-BIND-1"
Objectives: Antidepressants are first-line treatments for major depressive disorder (MDD), but 40-60% of patients will not respond; hence, predicting response would be a major clinical advance. Machine learning algorithms hold promise for predicting treatment outcomes based on clinical symptoms and episode features. We sought to independently replicate recent machine learning methodology for predicting antidepressant outcomes using the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) dataset, and then to externally validate this methodology by training models on data from the Canadian Biomarker Integration Network in Depression (CAN-BIND-1) dataset.
Methods: We replicated the methodology of Nie et al (2018), using common algorithms based on linear regressions and decision trees to predict treatment-resistant depression (TRD, defined as failing to respond to 2 or more antidepressants) in the STAR*D dataset. We then trained and externally validated models using the clinical features found in both datasets to predict response (≥50% reduction on the Quick Inventory of Depressive Symptomatology, Self-Rated [QIDS-SR]) and remission (endpoint QIDS-SR score ≤5) in the CAN-BIND-1 dataset. We evaluated additional models to investigate how different outcomes and features may affect prediction performance.
Results: Our replicated models predicted TRD in the STAR*D dataset with slightly better balanced accuracy than Nie et al (70%-73% versus 64%-71%, respectively). In our external validation of the methodology on the CAN-BIND-1 dataset, prediction performance varied by outcome: it was worse for response (best balanced accuracy 65%) than for remission (77%). Using the smaller set of features found in both datasets generally improved prediction performance compared with using all the STAR*D features.
Conclusion: We successfully replicated prior work predicting antidepressant treatment outcomes using machine learning methods and clinical data. We found similar prediction performance using these methods on an external database, although prediction of remission was better than prediction of response. Future work is needed to improve prediction performance to be clinically useful.
November 30, 2021 update: Some minor bugs were found in our processing code; please see our GitHub for more details. We have uploaded a new copy of the processed STAR*D data with these bugs fixed.
| 4045/4045 | Secondary Analysis | Shared |
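The outcome definitions in the Methods paragraph above and the balanced accuracy reported in Results can be sketched as follows. The thresholds come from the abstract; balanced accuracy is taken as the usual mean of sensitivity and specificity; the function names, scores, and predictions are made up:

```python
def is_response(baseline, endpoint):
    """Response: at least a 50% reduction from the baseline QIDS-SR score."""
    return baseline > 0 and (baseline - endpoint) / baseline >= 0.5


def is_remission(endpoint):
    """Remission: endpoint QIDS-SR score of 5 or less."""
    return endpoint <= 5


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; less sensitive to unequal class sizes than accuracy."""
    pos = sum(1 for t in y_true if t)
    neg = len(y_true) - pos
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    return 0.5 * (tp / pos + tn / neg)


baseline = [16, 20, 12, 18]
endpoint = [5, 12, 6, 4]
response = [is_response(b, e) for b, e in zip(baseline, endpoint)]
remission = [is_remission(e) for e in endpoint]
predicted_remission = [True, False, False, False]  # hypothetical model output
bacc = balanced_accuracy(remission, predicted_remission)
```

The third patient (12 to 6) shows why the two outcomes are not interchangeable: a 50% reduction counts as response, but an endpoint score of 6 is not remission.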
Summary Measures for Quantifying the Extent of Visit Irregularity in Longitudinal Data: The STAR*D Study | This chapter applies the measures of irregularity from this thesis to the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) study. The STAR*D study is the largest randomized clinical trial of patients with major depression. This chapter focuses on the first phase of the study, which pre-specified a common set of scheduled measurement occasions at weeks 2, 4, 6, 9, and 12 post-baseline at which individuals had their Quick Inventory of Depressive Symptomatology (QIDS) score recorded; however, some individuals missed scheduled visits or had unscheduled visits. Interest therefore lies in determining whether visits can be treated as repeated measures. This is followed by a demonstration of how to select an appropriate modelling approach for the study outcome and how to interpret the resulting parameter estimates. The target of inference in this chapter is the mean QIDS score over the first 12 weeks of the trial. | 4036/4036 | Secondary Analysis | Shared |
Cross-trial prediction of treatment outcome in depression: a machine learning approach | Background: Antidepressant treatment efficacy is low, but might be improved by matching patients to interventions. At present, clinicians have no empirically validated mechanisms to assess whether a patient with depression will respond to a specific antidepressant. We aimed to develop an algorithm to assess whether patients will achieve symptomatic remission from a 12-week course of citalopram.
Methods: We used patient-reported data from patients with depression (n=4041, with 1949 completers) from level 1 of the Sequenced Treatment Alternatives to Relieve Depression (STAR*D; ClinicalTrials.gov, number NCT00021528) to identify variables that were most predictive of treatment outcome, and used these variables to train a machine-learning model to predict clinical remission. We externally validated the model in the escitalopram treatment group (n=151) of an independent clinical trial (Combining Medications to Enhance Depression Outcomes [COMED]; ClinicalTrials.gov, number NCT00590863).
Findings: We identified 25 variables that were most predictive of treatment outcome from 164 patient-reportable variables, and used these to train the model. The model was internally cross-validated and predicted outcomes in the STAR*D cohort with accuracy significantly above chance (64.6% [SD 3.2]; p<0.0001). The model was externally validated in the escitalopram treatment group (n=151) of COMED (accuracy 59.6%, p=0.043). The model also performed significantly above chance in a combined escitalopram-bupropion treatment group in COMED (n=134; accuracy 59.7%, p=0.023), but not in a combined venlafaxine-mirtazapine group (n=140; accuracy 51.4%, p=0.53), suggesting specificity of the model to underlying mechanisms.
Interpretation: Building statistical models by mining existing clinical trial data can enable prospective identification of patients who are likely to respond to a specific antidepressant. | 1949/1949 | Secondary Analysis | Shared |
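The abstract reports accuracies "significantly above chance" without naming the test, so the reported p-values should not be expected to match this sketch. One simple option, assumed here, is an exact one-sided binomial test of the number of correct predictions against a chance rate p0 chosen by the analyst (0.5 below, which ignores class imbalance); the authors may have used a different test, such as a permutation test. The function names are hypothetical:

```python
from math import comb


def binomial_p_greater(k, n, p0=0.5):
    """One-sided exact p-value P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


def accuracy_p_value(correct, total, p0=0.5):
    """Test whether `correct` successes out of `total` exceed the chance rate p0."""
    return binomial_p_greater(correct, total, p0)


# Example: 90 of 151 correct (about 59.6%) against an assumed chance rate of 0.5.
p = accuracy_p_value(90, 151)
```

When remission is much rarer or more common than 50%, a majority-class chance rate or a permutation test is the fairer comparison.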
Construction of the Design Matrix for Generalized Linear Mixed-Effects Models in the Context of Clinical Trials of Treatment Sequences | The estimation of carry-over effects is a difficult problem in the design and analysis of clinical trials of treatment sequences, including cross-over trials. Except for simple designs, carry-over effects are usually unidentifiable and therefore nonestimable. Solutions such as imposing parameter constraints are often unjustified and produce differing carry-over estimates depending on the constraint imposed. Generalized inverses or treatment balancing often allow the estimation of main treatment effects, but the problem of estimating the carry-over contribution of a treatment sequence remains open in these approaches. Moreover, washout periods are not always feasible or ethical. A common feature of designs with unidentifiable parameters is that their design matrices are not of full rank. We therefore propose approaches to constructing design matrices of full rank without imposing artificial constraints on the carry-over effects. Our approaches are applicable within the framework of generalized linear mixed-effects models. We present a new model for the design and analysis of clinical trials of treatment sequences, called Antichronic System, and introduce special sequences called Skip Sequences. We show that carry-over effects are identifiable only if appropriate Skip Sequences are used in the design and/or data analysis of the clinical trial. We explain how Skip Sequences can be implemented in practice and present a method for computing the appropriate Skip Sequences. We show applications to the design of a cross-over study with 3 treatments and 3 periods, and to the data analysis of the STAR*D study of sequences of treatments for depression. See the paper, which is available on this website. Reference: Diaz, F.J. (2018). "Construction of the Design Matrix for Generalized Linear Mixed-Effects Models in the Context of Clinical Trials of Treatment Sequences". Revista Colombiana de Estadística (Colombian Journal of Statistics), Vol. 41, 191-233. | 1440/1440 | Secondary Analysis | Shared |
Early Remission is Associated with Lower Risk of Relapse: Analysis of Major Depressive Disorder using STAR*D | OBJECTIVES: Major depressive disorder (MDD) contributes to a significant burden in the US, where it is the third leading cause of disability. For patients with MDD who benefit from anti-depressant therapies (ADTs), time to (and in) response or remission can vary greatly. Prior studies have indicated that those who experience response or remission earlier have better long-term MDD-related outcomes. This study sought to quantify the relationship between time to acute treatment-induced remission and the risk of relapse of MDD symptoms in the STAR*D trial (NCT00021528).
METHODS: The STAR*D dataset was analyzed to assess whether early remitters (i.e., patients experiencing remission ≤28 days after step start) had a lower risk of subsequent relapse during a 12-month naturalistic follow-up than late remitters (>28 days). Remission and relapse were defined, respectively, as a self-reported Quick Inventory of Depressive Symptomatology (QIDS-SR16) score of ≤5 sustained until the end of any treatment step and a score of ≥11 during the 12-month follow-up. A hazard ratio quantifying the relationship between remission timing and the risk of subsequent MDD relapse was estimated using Cox regression, adjusted for patient age, treatment step, QIDS-SR16 score at step start, and additional forward-selected demographic factors.
RESULTS: Among 1130 patients with MDD who achieved remission (n=231 early remitters; n=899 late remitters), a significantly greater proportion of late remitters (39.3%) relapsed during the 12-month follow-up phase compared to early remitters (24.7%, P<0.0001). Late remitters had a nearly 50% higher risk of relapse than early remitters during the 12-month follow-up phase (adjusted hazard ratio=1.48, P=0.01).
CONCLUSIONS: Patients in STAR*D who remitted earlier had a significantly lower risk of relapse than those who remitted later. These findings highlight the importance of inducing remission quickly, both for immediate symptom relief and for improving long-term outcomes. | 1130/1130 | Secondary Analysis | Shared |
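The remission and relapse definitions in the METHODS paragraph translate into simple classification rules. A minimal sketch, assuming (not stated in the abstract) that the remission date is the first visit of the terminal run of QIDS-SR16 scores ≤5 within a step; the 28-day and ≥11 cutoffs come from the abstract, while the visit data and function names are made up. The Cox regression that produces the hazard ratio is not shown.

```python
def sustained_remission_day(visits, cutoff=5):
    """First day from which every remaining QIDS-SR16 score in the step is <= cutoff.

    visits: list of (day_since_step_start, score) sorted by day. Returns None if the
    step does not end in remission.
    """
    day = None
    for d, score in reversed(visits):
        if score <= cutoff:
            day = d
        else:
            break
    return day


def remitter_group(visits, early_threshold=28):
    """'early' if sustained remission began within the threshold, else 'late'."""
    day = sustained_remission_day(visits)
    if day is None:
        return None
    return "early" if day <= early_threshold else "late"


def relapsed(followup_scores, relapse_cutoff=11):
    """Relapse = any follow-up QIDS-SR16 score of 11 or more."""
    return any(s >= relapse_cutoff for s in followup_scores)


patient_a = [(0, 16), (14, 9), (28, 5), (42, 4), (56, 3)]
patient_b = [(0, 18), (14, 12), (28, 8), (42, 5), (56, 4)]
patient_c = [(0, 15), (14, 4), (28, 7), (42, 6)]  # dips below 6 but does not stay there
```

Patient C shows why "sustained" matters: a single early low score does not count as remission if later scores in the step rise above the cutoff.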
The clinical relevance of self-reported premenstrual worsening of depressive symptoms in the management of depressed outpatients: a STAR*D report. | OBJECTIVE:
To determine the incidence, clinical and demographic correlates, and relationship to treatment outcome of self-reported premenstrual exacerbation of depressive symptoms in premenopausal women with major depressive disorder who are receiving antidepressant medication.
METHOD:
This post-hoc analysis used clinical trial data from treatment-seeking, premenopausal, adult female outpatients with major depression who were not using hormonal contraceptives. For this report, citalopram was used as the first treatment step. We also used data from the second step, in which one of three new medications was used (bupropion-SR [sustained release], venlafaxine-XR [extended release], or sertraline). Treatment-blinded assessors obtained baseline treatment outcomes data. We hypothesized that those with reported premenstrual depressive symptom exacerbation would have more general medical conditions, longer index depressive episodes, lower response or remission rates, and shorter times to relapse with citalopram, and that they would have a better outcome with sertraline than with bupropion-SR.
RESULTS:
At baseline, 66% (n=545/821) of women reported premenstrual exacerbation. They had more general medical conditions, more anxious features, longer index episodes, and shorter times to relapse (41.3 vs. 47.1 weeks, respectively). Response and remission rates to citalopram, however, were unrelated to reported premenstrual exacerbation. Reported premenstrual exacerbation was also unrelated to differential benefit with sertraline versus bupropion-SR.
CONCLUSIONS:
Self-reported premenstrual exacerbation has moderate clinical utility in the management of depressed patients, although it is not predictive of overall treatment response. Factors that contribute to a more chronic or relapsing course may also play a role in premenstrual worsening of major depressive disorder (MDD). | 1017/1017 | Secondary Analysis | Shared |
Measuring the individual benefit of a medical or behavioral treatment using generalized linear mixed-effects models | Statistical measures of the individual benefit of a medical or behavioral treatment and of the severity of a chronic illness are proposed and used to develop a graphical method that statisticians and clinicians can use in the analysis of clinical trial data from the perspective of personalized medicine.
The method focuses on assessing and comparing individual treatment effects rather than average effects and can be used with continuous and discrete responses within the generalized linear mixed-effects model framework. Analyses of data from the Sequenced Treatment Alternatives to Relieve Depression clinical trial of sequences of treatments for depression and from a clinical trial of respiratory treatments are used for illustration. | 170/170 | Secondary Analysis | Shared |