One of the things I frequently do on this blog is critique medical studies. There are now also a lot of blogs, YouTube channels, etc. that do the same. However, the average person (understandably) has no training in how to read or critique a medical study, so it occurred to me recently that an occasional post explaining some of the concepts that come up frequently when discussing medical studies might be a useful service to my readers.
Accordingly, today I will explain “power”, which is the term used to describe how likely a study is to detect the effect of an intervention, assuming that the effect is real. Let’s start with a silly example.
Imagine that we wanted to determine whether taking high-dose fish oil could reduce the risk of a heart attack. Now let’s suppose that we designed a study in which we took one thousand 50-year-old men and randomly divided them into two groups of five hundred. The first group of five hundred would get a fish oil supplement, and the second group would get a placebo. We would then follow them for one year. Now suppose that at the end of that year, one person in each group has had a heart attack. Would you conclude that fish oil is no better than placebo at preventing heart attacks?
Probably not. Most fifty-year-old men are not going to have a heart attack in the next year, so there’s no reasonable way to conclude anything from this study. There just weren’t enough heart attacks to tell whether the rate of heart attacks was truly the same or different in the two groups. We would say that this study is UNDERPOWERED to detect any possible effect of fish oil on heart disease.
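For readers who like to see the arithmetic, here is a rough sketch in Python of just how underpowered this study would be. The numbers are my own assumptions for illustration, not real data: a yearly heart attack risk of 1 in 500 (0.2%) on placebo, to match the silly example, and a hypothetical fish oil that cuts that risk in half. The function name approx_power is made up, and the calculation uses a textbook normal approximation that is crude when events are this rare, so treat the result as a ballpark figure only.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p_control, p_treated, n_per_group, alpha=0.05):
    """Rough power of a two-sided comparison of two proportions.

    Uses the normal approximation and ignores the (tiny) chance of a
    significant result in the wrong direction.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    p_bar = (p_control + p_treated) / 2
    # Standard error of the difference if there were truly no effect
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    # Standard error of the difference if the assumed effect is real
    se_alt = sqrt(
        (p_control * (1 - p_control) + p_treated * (1 - p_treated)) / n_per_group
    )
    diff = abs(p_control - p_treated)
    return NormalDist().cdf((diff - z_crit * se_null) / se_alt)


# Assumptions (not real data): 0.2% yearly risk on placebo, 0.1% on fish oil
power_small = approx_power(p_control=0.002, p_treated=0.001, n_per_group=500)
print(f"Chance of detecting the effect with 500 men per group: {power_small:.0%}")
```

Under these assumptions the study would detect a genuine halving of risk well under one time in ten, which is why a one-versus-one result tells us essentially nothing.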
But now let’s suppose that instead of one thousand healthy fifty-year-old men, we chose one million, giving us two groups of half a million men each. Even in healthy fifty-year-olds, out of one million men, we would expect to see at least a few dozen heart attacks over a one-year period. Thus in all likelihood, after one year, enough heart attacks will have happened that we will be able to see whether any difference exists between the two groups (for example, 40 heart attacks in the placebo group but only 16 in the fish oil group), and thus draw some conclusions. Which brings me to point number one: you can increase the power of a study by making it larger.
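To make the contrast concrete, here is a small Python sketch that runs a standard two-proportion z-test on the two made-up results from the examples above: 1 versus 1 heart attacks in groups of 500, and 40 versus 16 in groups of 500,000. The function name two_proportion_p_value is my own, and the z-test is only one reasonable choice (with counts as small as 1 versus 1 a statistician would reach for an exact test instead), so take it as an illustration rather than the way such a trial would actually be analyzed.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(events_a, events_b, n_per_group):
    """Two-sided p-value for a difference in event rates between two
    equally sized groups, using the pooled two-proportion z-test."""
    pooled = (events_a + events_b) / (2 * n_per_group)
    se = sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = (events_a / n_per_group - events_b / n_per_group) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Illustrative counts taken from the examples in the text, not real data
p_small = two_proportion_p_value(1, 1, 500)          # 1 vs 1 in groups of 500
p_large = two_proportion_p_value(40, 16, 500_000)    # 40 vs 16 in groups of 500,000

print(f"1 vs 1 in groups of 500:       p = {p_small:.3f}")
print(f"40 vs 16 in groups of 500,000: p = {p_large:.4f}")
```

The small study gives a p-value of 1 (no evidence of anything either way), while a 40 versus 16 split in the large study comes out well below the conventional 0.05 cutoff.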
Let’s go back to our original example, and imagine now that rather than increasing our study group from one thousand men to one million, we increased the length of time over which we followed them. For example, instead of following the men for one year, we follow them for 10 years. In other words, we give 500 men fish oil from ages 50 to 60, and another 500 men a placebo over the same period. There’s a very good chance that over the 10 years of follow-up, there will be a number of heart attacks in each group, and at the end we’ll be able to do a tally and see if there’s any meaningful difference between the two. Again, we’ve increased our study’s power. And if instead of 10 years we followed the men for 15 years or 20 years, we would increase the power even further. So point number two: another way to adequately power a study is to make it stretch out over a long enough timeline.
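Here is a quick Python sketch of why a longer follow-up helps: more years means more heart attacks to count. I am assuming, purely for illustration, a constant 0.2% yearly risk (the 1-in-500 figure from the silly example). In reality risk climbs with age, so the true numbers would likely be higher. The function name expected_events is made up.

```python
def expected_events(n_men, annual_risk, years):
    """Expected number of men with at least one heart attack during
    follow-up, assuming the same independent risk every year."""
    cumulative_risk = 1 - (1 - annual_risk) ** years
    return n_men * cumulative_risk


ANNUAL_RISK = 0.002  # assumption for illustration: 1 in 500 per year

events_by_years = {
    years: expected_events(500, ANNUAL_RISK, years) for years in (1, 10, 20)
}
for years, events in events_by_years.items():
    print(f"{years:>2} years of follow-up: about {events:.1f} heart attacks per group of 500")
```

Under that assumption, a group of 500 goes from about one expected heart attack after a year to roughly ten after a decade and roughly twenty after two decades, which starts to give the tally something to work with.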
Now let’s go back again to the original example of 1,000 healthy men followed for a year, only this time let us choose not healthy 50-year-old men, but rather diabetic 75-year-old men who smoke, again randomized 500 to fish oil and 500 to placebo and followed for a year. Men in their 70s who smoke and have diabetes are at extremely high risk for cardiovascular events, and there’s a good chance that out of the 1,000 men in the study at least a few dozen will have a heart attack within the year. Once again, we will probably now be able to see if there’s a meaningful difference between the two groups. Point number three: you can increase a study’s power by starting with a higher-risk group of patients.
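For a sense of how much the starting risk matters, here is a Python sketch using a standard textbook sample-size formula for comparing two proportions. Everything numeric is an assumption on my part: a 0.2% yearly risk for the healthy men, a 5% yearly risk for the high-risk men (my reading of “a few dozen out of 1,000”), a hypothetical treatment that halves the risk, and the conventional targets of 80% power at a 5% significance level. The function name men_needed_per_group is made up.

```python
from math import ceil, sqrt
from statistics import NormalDist


def men_needed_per_group(baseline_risk, relative_reduction, power=0.80, alpha=0.05):
    """Approximate number of men needed in EACH group to detect the given
    relative risk reduction (normal-approximation formula for two proportions)."""
    p1 = baseline_risk
    p2 = baseline_risk * (1 - relative_reduction)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Assumed one-year risks (illustrative only) and a hypothetical halving of risk
n_healthy = men_needed_per_group(baseline_risk=0.002, relative_reduction=0.5)
n_high_risk = men_needed_per_group(baseline_risk=0.05, relative_reduction=0.5)

print(f"Healthy 50-year-olds:          about {n_healthy:,} men per group")
print(f"High-risk 75-year-old smokers: about {n_high_risk:,} men per group")
```

With these assumptions the healthy group needs tens of thousands of men per arm for a one-year study, while the high-risk group needs something on the order of a thousand. The exact figures depend entirely on the assumed risks, but the gap between them is the point.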
There are other ways to increase power too, but they are best left to statistical nerds. For the lay reader, the above concepts are enough to appreciate two key points:
When reading about a study that does NOT find any benefit to a particular intervention (e.g. “Fish Oil Does Not Appear to Prevent Heart Disease, New Study Finds”), one reasonable question to ask yourself is whether the study was adequately powered. Is it really true that there’s no benefit to taking fish oil? Or is it possible that the researchers studied too small a group, for too short a period, with too little risk of cardiovascular events to detect that benefit?
Understanding the concept of power also sheds some light on WHY scientists design studies the way they do. Time and money are limited, so while it would be nice to do decades-long studies on millions of people for every question we have, in the real world there is no practical way to do this. Most of the time, therefore, researchers have to negotiate the tradeoffs that are required to adequately power their study. This is one reason, for example, why most studies of cardiovascular disease have been carried out in men with risk factors such as high blood pressure, diabetes, tobacco use, etc. It’s not only, as some activists would have it, that medicine has a blind spot toward women or toward studying healthier people (though I believe that those are both correct critiques). It’s also that if you want to find out, for example, whether prescribing a statin can reduce the risk of a heart attack or stroke, you will need to follow far fewer people for a much shorter period of time if you start with a group of patients who are at very high risk of having a heart attack or stroke in the near future. Doing the same study in healthy, younger women would require enrolling many, many more people and following them for a much longer period of time, which may just not be economically or logistically feasible.