A Failed Failed Short-Term Psychodynamic Psychotherapy Trial. Oops.

The article I’m talking about is “Short-Term Dynamic Psychotherapy Versus Pharmacotherapy for Major Depressive Disorder: A Randomized, Placebo-Controlled Trial,” by Jacques Barber, PhD, et al.

Here’s the thing: trials of psychodynamic psychotherapy are on the rise, for good reason. It is often overlooked as a bona fide therapy in both academic and community circles, and people are taking notice that it can be quite helpful for all sorts of psychiatric difficulties, especially “treatment-resistant” conditions (see almost everything published through the Austen Riggs Center for examples). Its use generates all sorts of scorn from the CBT and biologically reductionist crowds, and any article about psychodynamics that actually gets published these days gets very closely inspected. Trials of dynamic psychotherapy vs. medication are quite rare, and I love seeing them pop up here and there.

This trial, though, purports to be a failed trial of both psychodynamic psychotherapy and antidepressants for the treatment of major depressive disorder. The major result is that no treatment separated from placebo. This result will likely be hyped up by the mainstream psychotherapy, psychiatry, and anti-psychiatry media. However, it is fundamentally important to be aware that this is an underpowered study (156 people were recruited for a study that needed 180 to achieve power greater than 80%). As a result, this study could only reliably detect an effect size of 0.48 or larger!

So, it is very possible that both treatments could have separated from placebo and one treatment could have separated from the other, but this study was not able to detect it, because too few people were randomized. It is entirely possible that there were effect sizes of 0.40 or even 0.30, but this study could not detect them. Shooting for an effect size of 0.48 is a mighty high bar and almost no psychiatric treatment for depression — at first shot, with either therapy or medications — reaches it, in most of the published research.
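To see how sample size caps the detectable effect, here is a back-of-the-envelope sketch using the standard normal-approximation formula for a two-group comparison. The per-arm numbers (roughly 180/3 and 156/3 for a three-arm trial) are my own illustration, not figures from the paper, and the approximation is coarse, so it lands near, but not exactly on, the 0.48 the authors report:

```python
import math

def min_detectable_d(n_per_group, z_alpha=1.96, z_beta=0.84):
    """Smallest Cohen's d detectable in a two-sample comparison
    (two-sided alpha = 0.05, power = 0.80), normal approximation:
    n per group = 2 * ((z_alpha + z_beta) / d)^2, solved for d."""
    return math.sqrt(2 * (z_alpha + z_beta) ** 2 / n_per_group)

print(round(min_detectable_d(60), 2))  # ~0.51 with 60 per arm (180 recruited)
print(round(min_detectable_d(52), 2))  # ~0.55 with 52 per arm (156 recruited)
```

The exact value depends on the test and software the authors used, but the direction is the point: fewer subjects per arm means only larger effects are detectable.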

Had either of the treatments reached that effect size (0.48), this might have been the most important psychiatric article of the last 5 years. But neither did, and there is really nothing we can tell from this article as a result. It’s like saying that since neither my Volvo nor your Porsche can go 250 mph, they are both unable to go fast and we shouldn’t even try. Not correct.

Oh, and one more thing: in their discussion, the authors themselves minimized the importance of “power issues.” Fail. If your study cannot detect a difference, it cannot detect a difference, no matter how much you want to look at secondary analyses and try to find some race-, socioeconomic-, or gender-based reasons for your results.

This is a failed, failed study.


Celexa, but not Lexapro (??), causes QTc prolongation

The FDA is warning against using any more than 40 mg of Celexa per day due to a real risk of torsades de pointes from QTc prolongation:

The U.S. Food and Drug Administration (FDA) is informing healthcare professionals and patients that the antidepressant Celexa (citalopram hydrobromide; also marketed as generics) should no longer be used at doses greater than 40 mg per day because it can cause abnormal changes in the electrical activity of the heart. Studies did not show a benefit in the treatment of depression at doses higher than 40 mg per day.

Right. It makes sense not to use it at higher doses, then. But Lexapro doesn’t carry the same warning? I know 20 mg is the approved max, but, just as with pushing the dose of Celexa, I’ve seen 30 mg of Lexapro given to folks fairly often.

CO-MED Depression Trial Results are in! Blame the Patients!

(The article in question is “Combining Medications to Enhance Depression Outcomes (CO-MED): Acute and Long-Term Outcomes of a Single-Blind Randomized Study,” in the July 2011 American Journal of Psychiatry, in case you want to follow along while I read aloud.)

Well, don’t exactly blame the patients. Blame the study authors for possibly not diagnosing their subjects correctly during recruitment. I want to spend this post talking about the patients in the trial and may leave discussion of the results (simply: no treatment combo outperformed any other) for another post, because this subject-selection problem is a big deal.

Now, this issue of subject selection is important because it directly impacts our assessment of the trial’s results. Ordinarily, we might accept the results and start second-guessing the quality of our psychopharmacology (which is all the rage in the New York Times these days). But if the subjects in the trial don’t have the illness the trial purportedly studies, then we cannot accept the results; we can’t draw any conclusions whatsoever.

Before even reading the article, I predicted that the subject population would primarily consist of 40-year-old women and, lo, that is indeed the case (68% of subjects were women, averaging about 42 years old). See my previous posts for other recent examples. We are given a bunch of good demographic data, but not much about overall health; their BMIs would have been helpful, too. Most curiously, we are not even told whether these subjects were married. So, in toto, these subjects tended to be white 42-year-old women who attended just under two years of college and make about $33,000 a year.

The average age of the first depressive episode was 24, but just under half (44.6%) of the subjects reported that their first episode began before age 18. This is red flag number one. This is not typical for major depressive disorder, which tends to have a later onset. It may only indicate that the age-of-onset data were wildly skewed, but nevertheless, almost half of the subjects had very early onset of depressive symptoms, which is more typical of bipolar illness, personality disorders, substance-induced mood disorders, etc.

Further, a substantial portion had been abused. Fully 21% had been sexually abused, almost 20% had been physically abused, and almost 40% had been emotionally abused. This is red flag number two. Based on National Comorbidity Survey data (as reported by Kessler and colleagues), we might expect these numbers to be closer to 4.4% for physical abuse (0.32 × 3% [the percentage of men in this study times the rate of physical abuse reported by men in the NCS] + 0.68 × 5% [the percentage of women in this study times the rate reported by women in the NCS] = 4.4%) and almost 16% for sexual abuse (0.32 × 4% [percentage of men times the NCS rate of rape plus molestation for men] + 0.68 × 21% [percentage of women times the NCS rate for women] = 15.6%). So, compared with these population-based expectations, about 15 percentage points more patients in this study reported physical abuse and about 5 points more reported sexual abuse.
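As a sanity check, the weighted-average arithmetic above takes only a few lines (the NCS rates and the 32/68 sex split are the figures quoted in the paragraph; this is purely illustrative):

```python
# Expected abuse rates, weighting NCS rates by this study's sex mix.
male_frac, female_frac = 0.32, 0.68

expected_physical = male_frac * 0.03 + female_frac * 0.05  # NCS physical-abuse rates
expected_sexual = male_frac * 0.04 + female_frac * 0.21    # NCS rape + molestation rates

print(f"physical: {expected_physical:.1%}")  # 4.4%
print(f"sexual:   {expected_sexual:.1%}")    # 15.6%
```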

Also questionable (and this is alluded to in the accompanying editorial) is that the average index episode of depression lasted almost five years! This is red flag number three. How many patients have you ever seen who had index episodes (or any episodes) lasting so long? The typical median duration of an untreated major depressive episode is about 8 months. You are probably scratching your head, for good reason. Who in the world are these subjects, whose depressive episodes last abnormally long? They might just be patients who over-endorse their depressive symptoms… and the study authors should spot this and call bullshit, immediately.

And I can keep going; there are so many problems… and I’m going to stop calling them red flags, because there are just too many. 75% of subjects had comorbid medical diagnoses, and we are not told what they were or (importantly) whether they were treated (because we sure as heck know they aren’t getting their 5-year-long depressive episodes treated). It is of the utmost importance to determine whether these depressive episodes are due to medical illness (which, to be fair, is a failing of almost every other depression study): we need to know whether these (or any) subjects have hypothyroidism, sleep apnea, anemia, autoimmune disease, and on and on and on.

And almost inexcusably, there is no indication as to whether these subjects were actively addicted to or abusing drugs or alcohol. There is no indication in the article that they were even asked about their drug histories or asked to submit to urine drug screens. I tell you, chronic alcoholism sure can produce 5-year-long depressive episodes. So can crashing from coke every day for five years. So can smoking 5 bowls of marijuana a day for 5 years.

In the discussion, the authors admit that they did not use a structured interview to diagnose depression. Reading this, I initially assumed they meant they didn’t SCID the subjects upon enrollment, but then I realized I was probably assuming too much. Maybe they didn’t interview the subjects beyond screening for depression and for the few exclusion criteria. We know they didn’t ask about drugs. I bet they didn’t screen for personality disorders. And it’s not clear they even asked about particular medical illnesses. We can infer all of this because none of it was reported or factored into the exclusion criteria. “But the study approximates real-world psychiatry,” you say… No, it doesn’t. I don’t give my chronic alcoholics venlafaxine plus mirtazapine for their depressive symptoms when it is not clear that alcohol isn’t causing the whole problem. I don’t give escitalopram plus bupropion to my patients who have untreated hypothyroidism.

Word of advice: don’t draw any clinical conclusions from this study. Use it to better inform your research methods, so you don’t make the same mistakes these folks did.

The Concise Health Risk Tracking Scale = Suicide assessment by any other name

First, let’s be clear that this scale is horribly named. A rating scale designed to assess the severity of suicidal ideation must have “suicide” in its title; leaving it out only perpetuates the stigma that it is not okay to talk about suicide. That is, unless “health risk” is the new politically correct term for suicide and all the kiddies are calling it that. I doubt that is the case. And I also doubt that this rating scale is just a small part of a much, much larger scale that tracks all sorts of health risks, from which the study authors decided to publish only the suicide-related portion (well, actually, I doubt this less).

Second, I will probably use this scale in my clinical practice. The boiled-down 7-item version appears easy to use and has fairly good psychometric properties. And any time you can put actual patient responses into the chart, and actually incorporate the results, it is fantastic for risk management.

So, the scale and related article I am talking about is “Concise Health Risk Tracking Scale: A Brief Self-Report and Clinician Rating of Suicidal Risk,” found in the June Journal of Clinical Psychiatry. As stated in the Methods, the first goal of the study was to assess the CHRT scales themselves. The secondary goals were to assess SSRI-induced suicidality and to report on suicidality assessment methods in various settings. This article reports on the first goal, but not the secondary goals (hopefully forthcoming). Who were the subjects enrolled in this study? Well, it is no surprise to me that they were 40-year-old women (70.8% of the 240 subjects); this seems to be a given statistic of recent depression research. Furthermore, these subjects tended to be employed, with private insurance, married, to have received previous psychiatric care, and to have had at least one episode of depression. The average QIDS score was 14 (about moderate depression). What the authors may have wanted you to gloss over is that the subjects were possibly not terribly ill: exclusion criteria included patients who had failed two previous SSRI trials, had taken antidepressants in the weeks prior to screening, or had taken an antipsychotic within four months of study entry. So this study of suicidality assessed people who have not recently been in very rigorous treatment. But this may be okay: the other goals of the study include looking at SSRI-emergent suicidality, so the authors probably want subjects who are somewhat treatment-naive.

I like it when scales are simple. When a scale is short, I will probably use it in practice. I’m quite happy that the authors used factor analysis to identify items from the original scale that seemed to represent distinct clinical factors. For the CHRT, the scale stems loaded on three factors representing the major general risks for suicide: “hopelessness, perceived lack of social support, and active suicidal thoughts and plans.” Scale or no scale, every clinician should be assessing these three areas with their suicidal patients, at a minimum. The authors assert that the scale’s psychometric properties are good, including “high agreement” between the self- and clinician-rated versions. That doesn’t actually seem to be the case. Correlations between the self- and clinician-rated versions ranged from 0.63 (moderate, at best) to 0.81 (a level usually deemed merely “acceptable” in other studies). This raises a very good question: why would clinician and patient reports of suicidality differ? One answer (mine), and it is scary because it is probably true and frequent, is that patients probably underreport their suicidality; further, patients who are determined to commit suicide may not tell their doctors about it. Given these things, I am not terribly surprised that self-report and clinician ratings differ.

Seriously, though, let’s look at this more closely. Table 5 from the article indicates that there is only 80% agreement between clinician- and self-report of having a suicidal plan. This kind of question stem really allows for only a binary response: either you have a plan or you don’t. I would love to see all of the subject response data from this study put through a filter that forces each of the five responses (from strongly disagree to strongly agree) into either agree or disagree. 80% agreement means that 20% of patients are probably withholding some degree of their suicidal plans from their docs, because the opposite (that psychiatrists are fabricating suicidal plans for patients) is rather unlikely. Even more worrisome (because suicidal ideation, alone, is so much more common) is that there was only 74% agreement between self- and clinician-rated suicidal ideation. By the same argument, this means that a whopping 26% of patients are probably withholding some degree of their suicidal ideation.

Using a Likert scale for responses to a suicidality questionnaire does not seem to have good face validity. Take the one all-important factor of suicidal intent (which was, very curiously, left out of the scale). Either you have it or you don’t. Both “agreeing” and “strongly agreeing” with having it will get me to try to hospitalize you. Answering the neutral “neither agree nor disagree” is disingenuous and may lead to involuntary commitment, too; it would signal that my patient is hiding something about their intent from me. And what self-respecting, lawsuit-fearing psychiatrist would choose this option on the clinician-rated form of the scale? After all, it is the Likert-scale equivalent of “I don’t know” or “I don’t care.”

At the end of the article, the authors allude to the next natural step for a scale like this: to see if it predicts actual suicidal behavior. Doing that reliably has proven quite difficult; it is probably the Holy Grail of psychiatric research (that, along with reliably predicting homicide). I hope this group gets closer to that goal.

DSM-5 Bipolar Disorder Revision Truth

We’ve all wondered at one time or another what kinds of forces drive the revision and development of DSMs. We hope that those on the committees and in the other drivers’ seats have the best interests of patients (at minimum) and the science of psychiatry (optimally) in mind. We know the rumor that revisions are driven, somehow, in dark shadowy realms and closed-door meetings (are they different?) by Big Pharma. And we’ve seen attempts to quash those rumors. Now, in the latest public statement on Medscape about bipolar disorder criteria revisions, we hear exactly, and without sugar-coating, that drug development is driving the revisions. And it makes me sick. The cart is being put before the horse: revision for the sake of drug development rather than revision for the purpose of creating increasingly valid diagnoses, which would be reason enough by itself.

“Many of these revisions are an effort to capture more clearly what our patients experience and to provide an opportunity to study in a more focused manner the full spectrum of mood disorders,” said Dr. Suppes.

“We’re basically working towards a new classification system, with new codes and new billable options, and I think it’s likely that out of this, new [Food and Drug Administration] targets for drug development will emerge.”

from: Criteria Changes for Bipolar Disorder Proposed for DSM-5

Time for a specialty change?

Into the Fray: Vilazodone

Is it a me-too drug if it combines the effects of two other drugs into one molecule? Well, yes. Can you call it a me-too drug of a drug that doesn’t do all that much? Good question. And we’re all asking ourselves when we last actually prescribed buspirone. Vilazodone (trade name: Viibryd) is about the newest “novel” antidepressant on the market. It is a serotonin reuptake inhibitor and 5HT1A partial agonist. So if you are thinking “Isn’t that just fluoxetine plus buspirone?” you’d be right. It has a very high affinity for the serotonin reuptake transporter (SERT), with a Ki of 0.1 nM; moderate affinity for 5HT1A receptors (a Ki of about 2 nM); and much weaker affinity for the norepinephrine and dopamine reuptake transporters (56 nM and 37 nM, respectively). For comparison, fluoxetine’s Ki for the SERT is 0.9 nM.

The latest article describing the wonder-to-behold that is vilazodone is “A Randomized, Double-Blind, Placebo-Controlled, 8-Week Study of Vilazodone, a Serotonergic Agent for the Treatment of Major Depressive Disorder,” in the April 2011 issue of the Journal of Clinical Psychiatry.

Before I get into the science of the article, let me be very clear about its glaring conflicts of interest. First, this study was funded by Trovis’ precursor company, PGxHealth. Second, four of the eight study authors are Trovis Pharmaceutical “employees and stock holders,” and two more are consultants to Trovis. Third, and most glaringly, this article may have been ghost-written (wholly or in part) by employees of ApotheCom, which received funding from Trovis; the article lists those contributors at the end. It’s only a little less unethical when it is disclosed.

The quick overview of this study: recruit depressed patients, wash out any other psychotropics, blindly randomize to vilazodone or placebo, and compare MADRS scores at the end with those from the beginning. Standard research trial design. Of an initial pool of 481 subjects randomized to active drug vs. placebo, 388 completed the study. Who were the patients? They tended to be obese (about 190 lb; the average woman was 5’6″ tall, which gives a BMI of about 30.7) 40-year-old white women with moderate-severity recurrent depression who were not taking any psychiatric medications (all from Table 1). Are these your patients? I do have some patients like these, and let me tell you up front that a good deal of them have sleep apnea, which produces or exacerbates their depressive symptoms, but this study didn’t screen for that. It’s easy; you begin by asking your patient if he or she snores… (As an aside, I do believe that many of these studies fail to separate appreciably from placebo because these patients aren’t really depressed; they have personality disorders or sleep disorders or one of a host of other things that just aren’t screened for.) Patients dropped out (about 20% in each arm) largely because they were lost to follow-up, had side effects, or withdrew consent (which sounds unusual to me). The primary outcome, change in MADRS score, was −13.3 points for vilazodone and −10.8 points for placebo, leaving us all with our jaws dropped at the staggering 2.5-point advantage for vilazodone over placebo. Is that worth taking a drug to market? Really? If you aren’t familiar with it, take a gander at the MADRS and try to determine for yourself whether a change of 2.5 points is meaningful. I’ll give you a hint: it isn’t. 2.5 points is approximately (okay, 0.5 points off) the difference between adjacent anchors within a single item.
It is the difference between “difficulties in starting activities” and “difficulties in starting simple routine activities which are carried out with effort,” for example. Further, only 13.4% more patients met criteria for “response” with vilazodone than with placebo. Finally, and most problematic (as it would be for any drug), there was no difference in remission rates between vilazodone and placebo (and both were under 30%).
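For the BMI aside, the US-units formula is simple enough to sketch. Plugging in the quoted averages of 190 lb and 5’6″ (66 inches) gives about 30.7, squarely over the obesity cutoff:

```python
def bmi_us(pounds, inches):
    """Body mass index from US units: BMI = 703 * lb / in^2."""
    return 703 * pounds / inches ** 2

print(round(bmi_us(190, 66), 1))  # ~30.7, just over the obesity threshold of 30
```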

Now for some number-crunching of our own. In order to achieve one additional “response,” I need to give 8 patients (NNT = 7.46) 40 mg of vilazodone. That’s seven patients I’m wasting time with, who might be getting worse. To be fair, though, an NNT of 8 is respectable in medicine overall. For example, the NNT for treatment of bronchiolitis with inhaled epinephrine is 15, and the NNT for inhaled steroids for preventing admission for asthma attacks is 8. But an NNT of 8 for “response” on the MADRS really can’t be compared with these other outcomes, because those NNTs are about full treatment. A 43% reduction in symptoms (the vilazodone response figure) applied to bronchiolitis would still leave enough symptoms to keep you (or, more likely, your child) in the hospital. If I want to talk about treatment to remission with vilazodone (which, really, I can’t, because it didn’t separate from placebo), the NNT is 15.
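The NNT arithmetic here is just the reciprocal of the absolute risk difference between arms; a minimal sketch (the 13.4% response gap is the figure quoted above):

```python
def nnt(risk_difference):
    """Number needed to treat = 1 / absolute risk difference
    (treated response rate minus control response rate)."""
    return 1 / risk_difference

print(round(nnt(0.134), 2))  # ~7.46 for the 13.4% response advantage
```

Round up when translating to whole patients, which is why an NNT of 7.46 gets quoted as 8.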

Now for the side effects. Vilazodone will be billed as the no-sexual-dysfunction antidepressant (sorry, Wellbutrin). They used a scale called the Changes in Sexual Functioning Questionnaire (CSFQ) to assess the sexual adverse effects of drug and placebo. Decreased libido was reported by 4.7% of subjects receiving vilazodone, but almost 9% of vilazodone patients did have sexual adverse effects of some kind (not fully listed in the article); the NNH for sexual adverse effects is 12. The main adverse effect is diarrhea, in 30% of patients (NNH = 5). Next is nausea, in 26% of patients, also with an NNH of 5. There weren’t any real weight or EKG changes.

The thing that makes this drug different from SSRIs is the 5HT1A partial agonism, which I can already get by adding buspirone. That raises the question: why not just add buspirone to an SSRI to reduce sexual adverse effects, for (likely) a heck of a lot less money? Good question. And that is indeed possible. So would I give my patients vilazodone? Probably not. I might think about it only if I had a patient who really, really insisted on taking one medication rather than two.

Shockingly, the horse’s mouth didn’t give the medication rave reviews: “The science behind vilazodone is ‘pretty good,’ Kahn said, adding that now it is down to the marketing whether the drug is successful.” Nope, you’re fired. What determines whether a drug is successful is whether it works, and works better than other drugs, for its indication. Oh, but by “successful” I do think he meant “makes buckets of cash for Trovis Pharmaceuticals” rather than what we as physicians would consider success, which is “helps my patient feel less depressed and be more functional.” It does seem as though clinical efficacy is a “secondary endpoint” and profitability the “primary endpoint” of this whole endeavor. This isn’t shocking. It isn’t even news, because we know all this already. What is surprising, though, is that the lead author is so up-front about saying it.


If you’ve stumbled upon this blog, you probably can’t see anything of value… yet. Be patient. I’ll get some posts up here, but without any predictable regularity. My hope is that, this way, each post will be more substantial rather than filler. Hang in there.