Lessons in Confirmation Bias, Double Standards, Strawmen, Edema, Blinding, and Research Critiques: My Final Response on our Volume Study

I debated whether to bother writing this post, as I've already spent a large amount of energy answering questions about our recent volume study.  Now, I have no problems with questions or critiques of our study.  Constructive criticism is one of the things that helps move science forward.  Scientists do studies (all of which have limitations), and other scientists do more studies to improve upon the limitations of previous work.  Over time, a body of knowledge is developed.

However, one thing that I will not tolerate is misrepresentation of our work, or of our positions, in an attempt to smear, defame, and paint a picture of bias and lack of integrity.  Thus, this post is meant to make our positions very clear, and show how we have been misrepresented.  It is also meant to be a teaching moment in terms of how to maintain consistent standards when evaluating research, and to help people understand some of the logistics and what goes on "under the hood" when small studies on human subjects are performed.  Finally, it is meant to expand on a few comments that have been made regarding our study (like the edema issue).  This will be the last post I make on this.

Storytelling

A long time ago, in a blogosphere far, far away, I wrote a post on Gary Taubes.  I discussed how Gary liked to be a storyteller.  To be an effective storyteller, he would weave a particular narrative.  His story would certainly be convincing to someone with little background in science and nutrition.  Of course, to me, his story wasn't convincing at all...it was full of holes.

The thing with Taubes was that he would avoid discussing any data that contradicted his narrative.  He would selectively report only data that confirmed his beliefs, and would misrepresent the positions of many obesity researchers.  I wrote one analysis of one chapter in his book, discussing some of this.  For a more thorough analysis, check out Seth Yoder's blog.

This brings me to the point of this post.  Lyle McDonald has written a blog post regarding our recent volume study.  And, like Taubes, Lyle weaves a certain narrative.  And, like Taubes, Lyle selectively reports things that fit with his narrative, while ignoring other things that don't fit with his narrative.  And, like Taubes, he misrepresents the positions of researchers themselves.

Let's get right to his post and see what he has to say (and what he failed to tell you).  In this post, I will also address some criticisms put forth by some others (such as the edema issue).

First False Statement

Lyle starts early with statements that are demonstrably false.  He says:

My questions at Brad or James went completely unanswered with any number of deflections and obfuscations occurring throughout.

Of course, his questions did not go unanswered.  I wrote two blog posts here and here addressing some of the questions that had been raised.  However, Lyle failed to link to the first post, which addressed at least some of the questions that he had raised.  Lyle also failed to link to this blog post by Brad Schoenfeld which also addressed some of the questions.  So already Lyle is not giving a full presentation of what has been discussed (and there will be more examples of this later).  Lyle's claims of "deflections and obfuscations" are also absurd.  We provided nuanced answers.  Those aren't obfuscations.  Nuance is how science works.  Now, perhaps Lyle wasn't satisfied with the answers provided, but that's a separate issue; the claim that his questions were "completely unanswered" is clearly false.  If Lyle wants to proclaim "intellectual honesty", then he should link to all of our posts, not just one (which in fact had the least relevance).

Argumentum Ad Nauseam

Lyle continues on and on about how we "can't" or "won't" address the issues, even though, as I've linked above, at least some of the issues were addressed.  In fact, throughout his post, as well as his incessant emails, PMs to us, and constant obsessive posts to his FB group, Lyle continues to repeat the same claims that he hasn't been answered or that we won't address the issues.  Lyle employs a tactic here known as argumentum ad nauseam: he repeats the same statements over and over, despite at least some of those claims having been shown to be wrong.  This tactic, often used in politics, relies on repetition to convince people that a claim is true.  But repeating the same thing over and over doesn't make it true.

Lyle also misses the obvious reason why we have stopped responding to him.  He has incessantly sent us emails and PMs laced with ranting insults and profanities (I count 14 emails in my spam folder, and that doesn't include the PMs or the comments he's tried to leave on my site).  He has also been obsessively posting the same things to his group, almost daily.  He has practically engaged in a smear campaign against me and my colleagues.  At some point, a rational person will simply stop responding...it is no longer worth the time or energy.  This is particularly true when the individual continues to ignore points that have already been made to him.  It is impossible to have any sort of productive discussion.  The constant barrage of emotional hatemail also causes one to question whether such a person can actually be objective about our study.

Lyle's modus operandi appears to be:

  1. Lyle:  Sends incessant abusive emails, PMs, and posts
  2. Us:  Block him due to the abusive nature of his interactions
  3. Lyle:  SEE?  THEY CAN'T ANSWER MY QUESTIONS!

Lyle then goes on to discuss how he was wrong about the statistics in the study, and took a post down about it, as a sign that he could "admit he was wrong."  However, this completely contradicts his own statements in other places, including statements in emails that he has sent us...statements like the following:

But you keep the circle jerk going.  I’ll be shown to be right in the long run

I always am

That's not really the language of someone who is open to being wrong.  In fact, that language (and the behavior regarding incessant emails and PMs) is characteristic of someone who has already made up his mind, and no evidence will convince him otherwise.  Indeed, there are numerous examples, which I will outline below, demonstrating that Lyle bases his judgment of a study on whether its outcomes agree with his preformed beliefs.

Why the Lack of Differences in Strength Gains?

One perfect example of how Lyle ignores answers that have been provided to him is in his statements about the strength gains.  We did not observe significant differences in 1-RM strength in this study.  According to Lyle, that is evidence there was no difference in muscle gains.  Lyle states:

A bigger muscle generates more force and the lack of a difference in strength gains suggests/implies that there were not actually differences in muscular gains.

Of course, this has already been addressed in my other blog post.  Lyle is simply incorrect in implying that a lack of difference in strength gains means there are no differences in muscle gains.  While there is a relationship between strength and size, the relationship is not 1:1.  In fact, the relationship can be very weak depending upon how strength is being measured.  1-RM improvements can be very poor proxies for changes in muscle size, as I've pointed out in a thorough research review on progression and hypertrophy for my research review subscribers.

To provide even more evidence that 1-RM changes cannot be reliably used to infer changes in muscle size, Bamman et al. had subjects perform 27 weekly sets on quadriceps.  They classified the subjects into extreme responders, modest responders, and non-responders based on hypertrophy.  They determined fiber hypertrophy via measurement of individual fiber area; thus, extracellular edema is NOT a confounder here.  Despite a massive difference in the gains in muscle size between the different groups, there were no significant differences in improvements in leg extension 1-RM.

The Preconceived Bias Strawman


A strawman is where you misrepresent a person's position to make that position easier to dismantle.  Thus, you have the phrase "beating up a strawman."  The problem is that you aren't actually addressing the original statement...just a strawman caricature of it.

In the section "Are Brad and James Biased?", Lyle quotes the title of a letter-to-the-editor by us regarding resistance training volume and hypertrophy.  He goes on to say:

Basically, Brad and James (and I don’t know the third guy) are already convinced that there is a relationship with volume and hypertrophy.  Admittedly this was based on an earlier review but this is their pre-existing belief system: more volume means more hypertrophy.  This spells potential bias.

Lyle sets up a strawman by failing to reveal important information regarding our position.  He doesn't tell you the whole story.

First, the statement about a "dose-response relationship" was based on the results of our 2016 meta-analysis, where we saw a dose-response relationship up to 10+ sets per week.  In fact, here's a FB post by Brad describing what we found:

1. There is a graded dose-response relationship whereby hypertrophy progressively increases from 1-4 weekly sets per muscle group per week, to 5-9 weekly sets per muscle group per week, and then 10+ sets per muscle group per week
2. 10 sets was the minimum dose we were able to identify to maximize muscle growth; there was not sufficient data to establish where the maximum threshold lies.

Here's an excerpt from a blog post by Brad stating what his thoughts were on volume long before this current study was carried out.  It is very clear that Brad makes no assertions about any sort of dose-response effect above 10 weekly sets:

So how many sets should you perform to maximize hypertrophy? That remains to be determined. While 10+ weekly sets per muscle was established as a minimum threshold, we were not able to determine an upper threshold where optimal muscle growth is achieved. The effects of volume on hypertrophy undoubtedly follows an inverted-U curve, whereby results progressively increase up to a certain point, then level off, and then ultimately decrease at exceedingly high volumes due to the negative consequences of overtraining.

Even Brad's own published papers completely refute Lyle's claims that we believed a priori that there would be a dose-response up to very high volumes.  Here's an excerpt from a published review article, which was published ahead of print in 2017 (final print version appeared in August of 2018), months before data collection of our volume study was complete.

While lacking in empirical evidence, it is possible that the dose-response relationship between RT volume and muscle hypertrophy follows an inverted U-shaped curve, whereby excessive RT volume would lead to negative adaptations. A recent study that tested the effects of German Volume Training provides further insights on the topic (2). The researchers compared two groups, of which, one performed 31 sets per training session and the other 21 sets per training session. Following six weeks of RT, increases in trunk and arm lean body mass favored the lower-volume, 21-set group. These findings suggest that volume should be increased only up to a certain point, and anything above might actually impair recovery.

This contradicts any claim of a belief of a dose-response up to the highest volumes, and refutes the notion that we somehow found what we wanted to find in the recent volume study.

In my massive review on volume for my research review subscribers, I had consistently stated since I first wrote it in 2017 that there was insufficient evidence of any benefit beyond 12-18 weekly sets, and I did not change that position until the results of our study were analyzed.

Even Lyle himself has quoted Brad as suggesting 10-20 sets as optimal.  Here's a direct quote from his FB post in his group on May 26, 2018:

Thus, only 4 months ago, Lyle was correctly stating our a priori beliefs.  Yet now he's distorting them to make it appear as if we had some sort of preconceived bias regarding very high training volumes, and that we had some sort of expectation of a dose response up to the highest volumes.  This clearly was never our position as I've proven through various links, and Lyle even showed that with his own words back in May.

I'd be willing to bet that if our study had shown the high teens as the "sweet spot" for hypertrophy, Lyle would not have made any comments about observer bias, timing of ultrasound imaging, a priori beliefs, etc., because it would confirm what he already believed.  In fact, if we had found a sweet spot (a dose-response up to the teens with no further increase beyond that), it would've been more in line with our a priori beliefs!  This completely refutes any notion that our beliefs before the study had an influence on our findings.  As I stated in another post, we were in fact surprised by our findings.

Confirmation Bias

Lyle talks about the lack of blinding in the study.  This is where Lyle's confirmation bias reveals itself very strongly.  What is confirmation bias?


I wrote a detailed blog post on confirmation bias a number of years ago.  In that post, I mentioned a very interesting study performed in 1979.  Students were given a series of studies in favor or against capital punishment.  Students felt the studies in favor of their point of view were superior to the contradictory studies.  However, all of the studies had the exact same methodology, only different results.  And this is exactly what Lyle has done in his blog post.  He will make claims that our study is flawed and unreliable due to issues such as lack of blinding or the possibility of edema, but the other studies he cites as being reliable, such as Ostrowski, have the exact same limitations.  In fact, the vast majority of resistance training studies on hypertrophy have many of the same limitations, yet strangely Lyle hasn't said anything about it until a study is published with results he doesn't agree with.

A hallmark of confirmation bias is holding studies you don't agree with to a higher standard than studies you do agree with.  In Lyle's post, he demonstrates this double standard.  He will claim our study cannot be trusted because the lead author was not blinded to the ultrasound results, yet he cites Ostrowski as a quality study "with attentive scientists" even though the researchers also were not blinded to the ultrasound results.  Lyle makes no mention regarding blinding in Ostrowski; apparently it's now not an issue to him, perhaps because he agrees with the outcomes of the study.  In fact, in many resistance training studies, the researchers are not blinded to the outcomes.  Yet strangely it's never been an issue for Lyle, until now.  Thus, if Lyle is going to claim "high risk of bias" for our study due to the lack of blinding, then he needs to claim high risk of bias in nearly all resistance training studies, including Ostrowski and others that support his preconceived views.

An even more obvious example of Lyle's double standards regarding the ultrasound are in his comments on Ostrowski.  Regarding Ostrowski, he states:

Basically, since Ultrasound is subjective, they did pilot work to ensure that he would be consistent across measurements.  This is the mark of attentive scientists.

Yet, Lyle fails to mention this section in the methods of our study:

The between-day repeatability of ultrasound tests was assessed in a pilot study in a sample of 10 young resistance-trained men. The test-retest intraclass correlation coefficient (ICC) from our lab for thickness measurement of the elbow flexors, elbow extensors, mid-thigh and lateral thigh as assessed on consecutive days are 0.976, 0.950, 0.944 and 0.998, respectively. The standard error of the measurement (SEM) for elbow flexor, elbow extensor, mid-thigh, and lateral thigh MT was 0.70, 0.83, and 1.09, and 0.34 mms, respectively.

Thus, not only did we do a pilot study, but our reliability metrics were superior to Ostrowski's.  We had superior ICCs (for example, Ostrowski reports 0.91 for triceps while we report 0.95), and also superior CVs (for example, Ostrowski reports 5.5% for triceps, while ours was 1.8%).  But Lyle doesn't mention that, as it doesn't fit with his narrative.
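For readers less familiar with these reliability metrics, here's a minimal sketch of how the SEM (reported in mm) relates to the CV (reported as a percentage).  The mean thickness value below is a hypothetical illustrative number, not a figure reported in either study:

```python
# Sketch: relating SEM (measurement error in mm) to CV (error relative to the
# mean, in %). The mean thickness used here is an assumed illustrative value,
# NOT a number reported in our study or in Ostrowski.

def cv_percent(sem_mm: float, mean_thickness_mm: float) -> float:
    """Coefficient of variation: measurement error as a percentage of the mean."""
    return 100.0 * sem_mm / mean_thickness_mm

sem_triceps = 0.83  # mm, elbow extensor SEM from the pilot data quoted above
mean_mt = 46.0      # mm, assumed mean triceps thickness (illustrative only)

print(round(cv_percent(sem_triceps, mean_mt), 1))  # ~1.8
```

The point of the sketch is simply that a sub-1 mm SEM on a muscle several centimeters thick translates into a CV of a couple of percent, i.e., small relative to the between-group differences being discussed.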

I also addressed the potential for observer bias in this blog post.  I noted three examples of studies Brad published (and of which he was the ultrasound tech) that went in a completely different direction compared to his a priori beliefs.  Can observer bias be completely ruled out in our volume study?  Of course not, but given that the best predictor of future behavior is past behavior, Brad's publication of past studies that refuted his own beliefs indicate that observer bias is likely not an issue.

Lyle's issue with the blinding also reveals a contradiction in his criticisms.  On one hand, he is insinuating that Brad influenced the ultrasound results to somehow get what he wanted.  On the other hand, he claims the results were due to edema.  So which is it?

Lyle's double standards regarding study methodology can also be observed in his past study reviews.  Here's an example where Lyle reviewed a study by Brad on volume-equated rep ranges and hypertrophy.  Lyle makes no comments about blinding or observer bias, despite there being no blinding.  Lyle also makes no mention that the randomization procedure was not described (it only states that the subjects were randomly assigned to their respective groups, similar to our volume study).  Lyle also makes no mention of the fact that ultrasound images were obtained 48-72 hours after training.  The intraclass correlation coefficient for the ultrasound measurements was 0.84, which is in fact worse than in our volume study.  So why were these not issues then, but suddenly they are now?  I would contend it's because Lyle happened to agree with the outcomes of that study, so those issues weren't a problem for him.  But now, suddenly, those issues are a problem when a study is published that doesn't agree with his preformed beliefs.

Here's another example in Lyle's FB group where he talks about Brad's 1 versus 3 minute rest study, saying how the results support tension as being greater than fatigue.  Yet he doesn't mention how there was no blinding in that study, nor does he complain about the fact that ultrasound images were taken 48-72 hours after the final training session.  He also makes no mention of the randomization procedure not being described (again, it only states that the subjects were randomly assigned to their respective groups).  So why were those not an issue then, but suddenly they are now?  Again, it's most likely because the results of the study matched up with his preconceived beliefs.

Greg Nuckols made an excellent comment regarding having consistent standards when evaluating research:

It's bothersome when people hold research to different standards based on whether or not they agree with the outcomes of the study. It's perfectly natural (it feels better to look for reasons why we're right than to look for reasons we may be wrong), but it's something we should try to avoid.

If you count something as a flaw in a study you don't like the results of, you should be consistent in recognizing that flaw in similar research, regardless of results. If you're willing to overlook a shortcoming in a study whose results you like, you should be consistent in overlooking that flaw in similar research, even if you don't like the outcomes of a particular study.

So, for example, if you don't like the results of the study, and you say that the outcomes were confounded by the fact that ultrasound measurements were taken 48-72 hours after the last training session (which may not be enough time for all edema to dissipate), you shouldn't then turn around and approvingly cite research with this same shortcoming. If you think the timing of ultrasound measurements confounded the results of the study you don't like, you should also think that the timing of ultrasound measurements confounded the results of research you do like. The same thing applies to other possible shortcomings or sources of bias, such as lack of blinding. If you think the results of a paper you don't like were biased due to lack of blinding, you should hold all other research up to that same standard.

Objectively evaluating research requires holding all similar research up to the same set of standards. If you're willing to overlook flaws in papers that support a pet theory, while simultaneously nitpicking every possible problem in research that runs counter to an idea you cherish...that's a problem.

The Edema Issue (UPDATED 9/26/2018)

Note:  A number of changes have been made to this section after further discussion with Lucas Tafur.  I had originally used the proportion of increase between muscle fibers and the interstitial space as evidence that hypertrophy with chronic high volume training is likely not due to edema, based on an excerpt from Roy et al. on the use of that ratio to signify edematous changes.  However, Lucas Tafur pointed out there were problems with using such data.  Thus, I've removed that section.  Also, it was brought to my attention that data by Damas show the presence of muscle swelling/edema even in the absence of any significant muscle damage.  I've updated the discussion regarding that.  Finally, I removed discussion of Ostrowski, as it's not clear when they performed their ultrasound measurements, and I have not received word from the authors on the timing.  It was likely a minimum of 72 hours, given the structure of the training program.  They trained 4 days per week (likely Monday, Tuesday, Thursday, and Friday), and the last day of training triceps would be on Friday if that schedule is correct.  Since it is unlikely they performed measurements on the weekend, that would imply measurements were performed on Monday at the earliest.  This is of course speculation, and we cannot know for certain without information directly from the authors.  In addition, I've added comments by Andrew Vigotsky and Brad Schoenfeld on the use of echo intensity to assess edema, and some additional comments on the study by Bartolomei.

The issue of the timing of the ultrasounds and potential edema continues to be brought up.  Ultrasounds in our study were performed 48-72 hours after the final training session.  Lucas Tafur brought up acute studies (such as this one, this one, and this one) showing edema up to 72 hours after a session in trained subjects.  The thought is that edema is a confounder and the higher volume groups may simply have more edema (or their edema is the same, but it lasts longer and doesn't dissipate as quickly after training).

However, the three studies he cites are acute studies using protocols to which the subjects are not accustomed.  Thus, they would be expected to cause muscle damage and edema, even if the subjects can be classified as "trained".  I made this point in one of my other blog posts.  It is very clear that the subjects were not accustomed to the protocol, as they had a significant increase in soreness that lasted 72 hours.  Peak torque was also significantly reduced at 48 hours, and torque production is one of the best indirect markers of muscle damage.  This shows that this was a damaging protocol in trained subjects, because they were not accustomed to it (despite being "trained").  The Bartolomei study looked at other indirect markers of muscle damage (like CK) in trained subjects, and showed significant changes in all of them, including markers of muscle force production.  Again, these are trained subjects, and all the evidence indicates that muscle damage occurred, which would signify that they were not accustomed to the protocol.  The Ahtiainen study also showed significant changes in all of the indirect markers of muscle damage.  Thus, it is highly likely that muscle damage occurred in this study as well, despite the subjects being classified as "trained".

Overall, there is sufficient evidence to indicate that significant muscle damage occurred in all three of the referenced studies on trained subjects.  Thus, you can't take acute studies where subjects are going to get muscle damage from an unaccustomed protocol, and extrapolate that to a chronic training study that lasts many weeks.  This is due to the impact of the repeated bout effect on muscle damage.

Brad Schoenfeld posted about a study by Damas et al. demonstrating the impact of the repeated bout effect on muscle damage.  After the first training session (6 sets on quadriceps), there was a significant increase in muscle damage at 48 hours.  This also coincided with an increase in soreness and a decline in MVC, both indirect markers of muscle damage.  After 19 training sessions, there was no longer a significant elevation in muscle damage at 48 hours after doing 6 sets.  There was also no significant decline in MVC and no significant increase in soreness.

Now, while edema is an indirect marker of muscle damage, you can have increases in edema with no significant increase in damage.  This was demonstrated in other data by Damas.  They found evidence of edema (assessed by ultrasound echo intensity) at 72 hours after a session even after 10 weeks of training, despite no significant increase in muscle damage.  However, the amount of edema was minor relative to the total increase in muscle size.  Contrast that with measurements at 3 weeks, where most of the increase in muscle size could be attributed to edema.  The amount of edema remained relatively constant throughout the study.  If we use the 3-week increase in CSA as attributable to mostly edema, and if the amount of edema remains constant, that would mean that, after 10 weeks, about 26% of the increase in CSA could be attributed to edema.  This was with 6 sets performed twice per week.
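The ~26% figure above follows from a simple constant-edema assumption.  As a rough sketch, if the 3-week CSA increase is treated as almost entirely edema, and that amount of edema stays constant, then the edema share at 10 weeks is just the ratio of the two increases.  The percentage values below are illustrative placeholders chosen to reproduce the ~26% figure, not the actual measurements from the Damas paper:

```python
# Back-of-envelope for the constant-edema argument.
# Assumption: the CSA increase at 3 weeks is ~all edema, and that amount of
# edema remains constant through week 10. The values here are hypothetical
# placeholders, NOT Damas et al.'s actual data.

increase_3wk = 2.6    # % CSA increase at 3 weeks (treated as all edema), hypothetical
increase_10wk = 10.0  # % CSA increase at 10 weeks, hypothetical

edema_share = increase_3wk / increase_10wk  # fraction of 10-week gain that is edema
print(f"{edema_share:.0%}")  # 26%
```

The exact share depends on the real CSA numbers, but the logic is the same: a constant edema component becomes a progressively smaller fraction of the total increase as true hypertrophy accumulates.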

Now, it's important to note that one issue with the Damas paper is that echo intensity may not be a valid way to assess edema or be an indicator of edema over the long term.  Andrew Vigotsky commented:

I think there are a lot of leaps with many of the edema arguments. For instance, I am still not convinced that EI is a valid way to assess "edema", especially independently of other changes, such as connective tissue content. A lot of these methods don't seem to have established construct validity.

Also, what is considered "edema"? Is it ICW, ECW, or both/either?

In other words, the increase in echo intensity observed in Damas may not be edema at all.  It could be increases in connective tissue content or something else.  Brad Schoenfeld also commented to me:

Echo intensity is non-specific. We don't know what it shows. Perhaps in the untrained subjects in Damas et al there was an increase in glycogen content (which we know occurs) that ultimately attracted additional water in the muscle. Would this occur to much of an extent in already trained subjects? Likely not. And even if so, would there be any differences between 1 vs 3 vs 5 sets? That's more than a big stretch, IMO.

Unfortunately we don't have much other data on the impact of chronic training on acute increases in muscle thickness.   In this study, scientists examined the impact of the repeated bout effect on acute changes in muscle thickness.  After the first training session, there was swelling present 48 hours after training.  After 16 training sessions, there was no swelling present at 48 hours after training.  Now, a confounder here (like Damas) is that training volume was only moderate (4 sets per session; 12 weekly sets).  Another confounder is that training loads were light (BFR or 40% 1-RM).

The question is whether this happens with higher volume training (such as 8+ sets per session).  Lucas rightly points out there are no studies that have specifically examined the impact of chronic high volume training on the acute muscle swelling response.  There are three possibilities:

  1.  48-hour changes in muscle thickness disappear with multiple weeks of training similar to what was observed by Farup et al.
  2. 48-hour changes in muscle thickness persist despite multiple weeks of training, but remain constant (as suggested by Damas et al., although there are limitations to the use of echo intensity) and are not affected by training volume
  3. 48-hour changes in muscle thickness persist despite multiple weeks of training, and more volume results in more swelling, which would confound the results (either decreasing the difference between groups, or resulting in no difference)

If #1 is true, then edema would not be a confounder in our results.  If #2 is true, it would not be a confounder since it would be constant among the three levels of volume.

Unfortunately, #3 has yet to be formally tested in scientific research.  The closest we have is the study by Bartolomei et al., which used a within-subjects design and compared 8 sets of 3 to 8 sets of 10.  8 sets of 10 resulted in a 48-hour increase in thickness of 1.7 mm (7.7%), while 8 sets of 3 resulted in a 48-hour increase of 1 mm (4.6%).  This brings up the possibility of a volume effect, but unfortunately, as pointed out earlier, this is an acute study using an unaccustomed protocol.

We can also look at acute 48-hour increases in muscle thickness across different studies, and see if there appears to be any sort of volume effect.  In other words, do studies that use higher session volumes (like 8-9 sets) tend to show greater increases in muscle thickness compared to studies that use lower session volumes (like 4 sets)?  Here's a table of studies of which I'm aware (there could be more; if anyone would like to point one out, please let me know).  I've stuck with studies that look at muscle thickness (not CSA or other metrics) so that we're comparing apples to apples.  I've also stuck with studies using traditional resistance training (not maximal eccentric actions or other training that isn't applicable to traditional training).


From these data, there does not appear to be any volume-dependent effect on the 48-hour increase in MT.  The increase in MT ranges from 1-2 mm and 4-7.7% across most studies, which range from 4 to 9 sets in a session.  While the highest percent change (7.7%) was observed for an 8-set session, the lowest percent change (4%) was also observed for an 8-set session.  The largest absolute increase was observed in Radaelli et al., with a 5 mm increase.  However, their baseline thickness was dramatically different from other studies (it was 100 mm); the percentage change was very similar (approximately 5%).

How do these session volumes compare to the volumes in our study?  In our study, the lowest session volume was 2 sets per muscle group for upper body and 3 sets per muscle group for lower body (weekly volume was 6-9 sets).  The moderate volume group performed 6 sets per session for upper body and 9 sets per session for lower body (weekly volume was 18-27 sets).  The high volume group performed 10 sets per session for upper body and 15 sets per session for lower body (weekly volume was 30-45 sets).  Thus, on a per session basis, the volumes in these studies are mostly comparable to our low and moderate groups.  The weekly volumes fall between our low and moderate groups.

There are three large limitations to this analysis.  First, nearly all the studies involve subjects who are unaccustomed to the protocol; thus, the impact of chronic training is still mostly unknown.  Second, I'm comparing across different studies.  You would need a study that directly tests different levels of volume to truly know the impact, similar to Bartolomei except using the same loading scheme and different numbers of sets.  Finally, there is no data on the highest session volumes (10-15).

Now, what would this mean for our study if there is edema present and it's constant at 48 hours?  It would imply that there are still differences between groups, but the absolute magnitude of "true" hypertrophy is less than observed.  In other words, let's say that the typical increase in edema is 1 mm on biceps and is unaffected by volume.  Our results were 0.7, 2.1, and 2.9 mm for the three levels.  This would imply that the true hypertrophy was -0.3, 1.1, and 1.9 mm.  Thus, the dose-response is preserved because the edema is constant.
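The arithmetic above can be sketched as a quick sanity check.  The group values mirror the biceps example in the text; the 1 mm constant edema is the same hypothetical assumption, not a measured quantity:

```python
# Observed 48-hour biceps thickness changes (mm) for the low, moderate,
# and high volume groups, as reported in the text.
observed = [0.7, 2.1, 2.9]

# Hypothetical constant edema contribution (mm), assumed unaffected by volume.
edema = 1.0

# Subtracting a constant shifts every group by the same amount, so the
# between-group differences (the dose-response) are preserved.
true_hypertrophy = [round(x - edema, 1) for x in observed]
print(true_hypertrophy)  # [-0.3, 1.1, 1.9]

# The gaps between adjacent groups are unchanged by the subtraction.
gaps_before = [round(observed[i + 1] - observed[i], 1) for i in range(2)]
gaps_after = [round(true_hypertrophy[i + 1] - true_hypertrophy[i], 1) for i in range(2)]
print(gaps_before == gaps_after)  # True
```

The point is simply that a volume-independent swelling artifact changes the absolute magnitudes but not the ordering or spacing of the groups.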

On a final note, we can look at the study by Radaelli and colleagues, where the highest volumes were 45 weekly sets on triceps and 30 weekly sets on biceps.  The study lasted for 6 months, much longer than the typical 6-12 week study.  It was very well controlled, as the subjects were confined to an aircraft carrier.  While the subjects were untrained initially, 6 months of consistent training is certainly more than enough to test the repeated bout effect, especially given that they did whole body workouts 3 days per week.  The timing of ultrasounds in this study was 2-5 days; thus, at least some of the subjects were tested beyond the 2-3 day window that is estimated to be problematic.  Here were the results for muscle thickness:

It is important to point out that our study was essentially a replication of Radaelli; the training protocol designs were very similar, and we observed similar results.  This brings up the concept of replication, which I will discuss later.  What is important here is that the differences in gains were very large, much larger than any impact that edema would have.  For example, the changes for biceps were +0.5 mm, +2.8 mm, and +6.1 mm, respectively, for the low, medium, and high volume groups.  For triceps, they were +0.2 mm, +0.7 mm, and +8.4 mm.  It is very unlikely that edema would account for such large differences after 6 months of consistent training, especially when the ultrasounds were taken up to 5 days after the final session.  I will also note that this study contradicts Lucas Tafur's claim that the differences between our study and Ostrowski or Heaselgrave can be accounted for by differences in the volume of direct arm work.  In this study, the subjects did a combination of direct and indirect work on biceps and triceps, and yet they still observed a graded response.

Overall, edema cannot be ruled out without directly testing for it.  Nor can we rule out that edema is a contributor (although not the sole source of the measured hypertrophy) that reduces the magnitude of the differences between groups.  However, the little data we have suggests that edema either mostly disappears with chronic training or, if it persists, stays relatively constant.  We also need to be careful, because the technique used to assess edema over the long term (echo intensity) may not actually reflect edema.  Also, if it is edema, it only makes up a minor portion of overall hypertrophy.  Finally, there's insufficient data to know whether set volume would impact the acute muscle swelling response 48 hours after training once subjects have trained for multiple weeks and become accustomed to the protocol.  The very large differences observed in Radaelli, despite measurements being taken up to 5 days after the last training session, also suggest that edema is not playing a significant role in the outcomes.  Unfortunately, we cannot know the answers for certain without research that directly examines the impact of chronic volume training on muscle swelling over a 48-72 hour period and beyond.

The Pragmatic and Practical Aspects of Research

One thing that is glaringly obvious in Lyle's critiques is his lack of experience in actually carrying out the process of research on human subjects.  He simply doesn't realize that some of the solutions he proposes are simply not feasible in small exercise science studies, due to lack of funding and personnel.  For example, in regards to blinding, Greg Nuckols writes:

As for blinding, it's a matter of how many hands you have available for training visits. Training 30+ people for a study is no small thing. Blinding for data collection would mean that at least one member of the research team couldn't help with any training visits. Unless you have a big research team, or unless you're working with a big grant and have the money to hire research assistants, that can be an insurmountable barrier (notice on the author list, there are only three other people from Lehman). Also, keep in mind - Brad's program doesn't have a PhD program, so all of his students are Masters students. If Brad wasn't involved with training visits at all, he wouldn't be doing right by his students; he'd be forgoing an important opportunity for mentorship (and the standardization of the training visits would be much worse; you can trust a study to PhD students, but generally you don't completely turn the reins over to a group of Masters students with no oversight). The same applies to the ultrasound scans themselves. Taking good scans is a skill that requires quite a bit of practice. There's no doubt that Brad can take a better ultrasound scan than his students.

So basically, there are three potential scenarios here:

1) Brad isn't blinded so he can mentor his students, help with training visits, and take good ultrasound scans. This introduces the risk of bias.

2) Brad isn't blinded so he can mentor his students and help with training visits, but one of his students is blinded (and thus doesn't help with training visits) so they can take ultrasound scans without risk of bias. This increases the odds of shoddy data collection, and also makes the training schedule much more onerous for the other students.

3) Brad is blinded and takes the scans, which means he doesn't help with any of the training visits. This means he can't mentor his students, and the odds of training getting screwed up are much higher.

It seems incredibly obvious to me that 1) is the best option. And to anyone who's worked in a lab, that should be obvious. Since Lyle only briefly helped with some studies in undergrad and doesn't have serious time working in a lab, though, that may not be obvious to him.

Eric Helms also commented on the practical aspects of blinding (as well as the need for consistency in how research is critiqued):

I agree it’s always ideal to do as much blinding as possible. However, the participants visited the lab 26 times for this study – twice to do pre- and post-testing, and 24 times to be trained. If you blind your ultrasound technician to group assignment, this means they can’t be present for training, which is over 90% of the lab visits. In exercise science, where personnel is limited, this may create an insurmountable logistical barrier. It’s worth noting that technicians aren’t blinded in the vast majority of studies in our field. So, if a lack of blinding did create bias, the same bias has occurred in nearly every other study published in this area. Personally, when I see this brought up, it makes me wonder why it’s being presented as something out of the norm, when that’s not the case, and why it hasn’t been mentioned in relation to the other studies on this topic.

Eric Helms also stated the following to me regarding Lyle's inexperience with how research is conducted:

Lyle implies that everyone finished up training on the same day, which is why scans had to be taken 48-72 hours after training (otherwise, it may take too long to do them all in a day).  No way in hell all post-testing was done within a 48-hour period; he doesn't get that an 8-week study might take a year and has open enrollment.

Ostrowski...Again

Lyle continues his argumentum ad nauseam tactics, repeatedly misrepresenting our position on Ostrowski despite the fact that I've already explained it.  For example, he states:

...those previous non-significant results became significant now in the discussion of this most-recent paper, which seems awfully convenient.

However, we never said that the results in the Ostrowski paper were significant.

Lyle once again distorts when he says:

When James couldn’t argue this anymore, he finally admitted that yes, it was a misrepresentation but that it wasn’t deliberate.

I never said it was a misrepresentation.  I only said that I could understand how someone might perceive it as such.  Here is what I said:

Now, I can understand how someone might read this and infer that we were claiming a dose-response relationship in Ostrowski, even though it was never claimed.  We never intended to make such an implication, so any claims to the contrary are simply a misunderstanding.  And anyone trying to claim anything different is assigning intent to us without actually knowing the intent.

Lyle's comments on Ostrowski also display his ignorance regarding word counts in journals.  Greg Nuckols writes:

...the quads are probably the reason they didn't discuss the middle data point of the Ostrowski study, not the triceps. It would be easy to say "In Ostrowski et al the MOD and HIGH groups had similar triceps hypertrophy, indicating that higher volumes only led to increased hypertrophy to a point. However, we found evidence for greater triceps hypertrophy in HIGH than MOD in our study. Reasons for this difference may include...." That's straightforward and doesn't add too many words.

On the other hand, the low-volume group actually had slightly larger increases in quad CSA than the moderate-volume group in the Ostrowski paper. Good luck constructing a physiological rationale for low to be better than moderate, and high to be better than low. That's pretty clearly just a matter of small sample size and sampling variance. Getting into all of that would take a ton of words that they probably couldn't spare to meet the journal's word count rules.

Brad Schoenfeld has also explained why Ostrowski was discussed the way it was and, like Greg, addressed the need for brevity in the discussion section.  The problem is that Lyle doesn't understand journal word count limits, assumes malintent, and tries to portray the presentation of Ostrowski as some sort of deliberate conspiracy.

Lyle's inferences that we were somehow trying to push some "more is better" agenda with Ostrowski in the discussion section completely ignore the obvious.  If we were so interested in pushing such an agenda, we would've talked about Radaelli, not Ostrowski, in that part of the discussion.  Also, if we were so interested in pushing such an agenda and somehow trying to distort Ostrowski to achieve that, we would have left the middle data point out of the numerous instances where we have discussed Ostrowski in various forums and media.  For example, here's a table from my Volume Bible in my research review (which has been posted since 2017):

You can see that nothing was left out.

Here's an excerpt from the table in our 2016 meta analysis on volume:

You can see nothing was left out.

The fact is, there was no deliberate misrepresentation of Ostrowski.  It was simply used as a comparison of the lowest versus highest volumes, and we never made any claims that Ostrowski supported any sort of dose-response relationship.  It completely dumbfounds me that this has been portrayed as some sort of grand conspiracy to push an agenda.  Of all the criticisms of our paper that have been put forth, it is one of the most ridiculous I have seen (along with the implications for program design, which I will get to).

Replication

I want to address the concept of replication.  Some of you may be aware of a "replication crisis" in science.  It refers to an issue where the results of small studies cannot be replicated by independent scientists.  It's been a particular problem in the field of psychology.

Replication is important, because if other scientists can replicate your results, it lends more credence to the idea that your findings are "true" and not a fluke.  As I noted earlier, our study was a replication of the Radaelli study, except in trained subjects.  So you now have two completely independent research groups showing similar findings with similar protocols.  This provides evidence that our results are "real" and not some fluke or product of manipulation.  Now, in all of Lyle's posts, he fails to mention the results of Radaelli.  I've seen him call the Radaelli study "shit" for the following reasons:

  • The triceps data were a bit odd in that the lower volume groups showed very small increases (smaller than what's been observed in other studies), and then there was a sudden jump with the high volume group.  However, as I point out in this blog post, such things can happen by random chance.  This doesn't invalidate the study.
  • He complained that they didn't measure the quadriceps.  A study is not "shit" just because it didn't measure something you wanted it to measure.  Ostrowski didn't measure biceps thickness; if failing to measure a particular muscle group makes a study "shit", then Ostrowski would fall into the same category.  Again, double standards are at work here.

Again, Lyle shows that he dismisses research based on what it found, rather than on the methodology itself.  If he doesn't like the findings, then he finds reasons to dismiss the study...classic confirmation bias at work.

"So Are You Going To Recommend Everybody Start Doing 45 Sets Per Muscle Group Per Week?"

The other highly absurd criticism (along with the supposed Ostrowski conspiracy) is that we somehow believe that 45 sets per week is optimal and everyone should start doing that for each muscle group.  This is yet another strawman.  Nowhere have we claimed that people should start doing 45 weekly sets per muscle group.  Our study was simply an exploratory study to try to determine where the upper limits of volume might lie.  The results did not turn out as we had anticipated.

In my Research Review Volume Bible, I've made it very clear that we're not recommending everyone go out and start training every muscle group with extremely high volumes.  I've discussed how the most prudent and practical way to use this data is in the context of specialization routines, where you use a high volume for only one or two muscle groups.  In fact, my friend Jacob Schepis did this.  For 10 months, he did an arm specialization routine.  He did 15 weekly direct sets and 15 weekly indirect sets, for a total weekly volume of 30 sets.  Check out his gains (and this was in an already well trained individual):

Also, if you're doing compound movements, some of the volumes performed in our study are certainly not unrealistic when performed on 1-2 muscle groups.  Here's what Eric Helms had to say in his upcoming MASS review (MASS is a research review; they do a great job and you should check them out...and of course check out my research review too 🙂 ):

I think many of the folks picking apart this study, looking for holes in its methods, analysis or discussion, are concerned that everyone seeking hypertrophy is going to start doing 30-45 sets per muscle group and get hurt, so they are hoping to find a reason that this study is bogus. To be sure, 30-45 sets is a lot of volume, but it’s also important to point out that this value counts each set for each muscle group on a 1:1 basis, regardless of whether the muscle is directly or indirectly trained. For example, if I told the average lifter to do 4 sets of lat pulldowns, rows, cable curls, and DB hammer curls, twice per week, they probably wouldn’t think that was too crazy (even though, by the counting method used here, that’s 32 sets for biceps). However, if I told them to do 32 sets for biceps per week without preamble, their initial reaction would probably be a wide-eyed expression until they understood the counting method. I do think it’s a fair concern that this study could lead people to overdo it; however, it’s also important to remember that no single study tells you how to train.
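Eric's 1:1 counting method is easy to sketch.  The routine below is his hypothetical example from the quote (the exercise-to-muscle mapping is illustrative, not from the study):

```python
# Sketch of the 1:1 set-counting method: every set of an exercise that
# trains a muscle, directly or indirectly, counts as one set for that muscle.
routine = {
    # exercise: (sets per session, muscles trained directly or indirectly)
    "lat pulldown":   (4, ["lats", "biceps"]),
    "row":            (4, ["lats", "biceps"]),
    "cable curl":     (4, ["biceps"]),
    "DB hammer curl": (4, ["biceps"]),
}
sessions_per_week = 2

# Tally weekly sets per muscle across the whole routine.
weekly_sets = {}
for sets, muscles in routine.values():
    for muscle in muscles:
        weekly_sets[muscle] = weekly_sets.get(muscle, 0) + sets * sessions_per_week

print(weekly_sets["biceps"])  # 32 weekly sets for biceps by this counting method
```

Four exercises at 4 sets each, twice per week, yields the 32 "biceps sets" Eric describes, even though only half of those sets are direct arm work.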

The Lessons

I think it is very important, if we are going to critique research, that we maintain consistent standards when evaluating studies.  If we consider a study that we disagree with as unreliable due to some methodological issue, we have to consider a study we agree with as also unreliable if it has the same methodological issue.  While it can be human nature to seek out what confirms our beliefs, it is important to try to resist that tendency.

It's also important to develop an understanding of the practical limitations of research in terms of funding, personnel, journal word counts, etc.  This is particularly true in exercise science studies, where funding and staff are very limited.  While it would be great to be able to do studies with 100 subjects per group and two ultrasound technicians and everyone blinded to everything, it is generally not possible in the real world of exercise science.

Finally, it's important to approach scientific research, and the researchers who perform it, in good faith.  Do not assume malintent or some sort of lack of integrity, especially based on little to no evidence.  I think my friend Eric Helms put it best:

It is a good thing to bring up concerns, potential limitations, and issues of potential bias related to published data. The discussion of scientific findings is how we do better research in the future, and make the most correct interpretation of data. We must be able to disagree in these discussions, but productive disagreement can only happen if we argue in good faith.

Yes, some people lack integrity, and will lie if it benefits them or their bias. But the bar for questioning someone's integrity in a discussion needs to be very high. Integrity is arguably the most important quality for someone interested in science to have. If the bar is low, you can treat all disagreement as a sign of incompetence at best or intellectual dishonesty at worst. Of course, this stance of approaching everyone with a dissenting viewpoint as being stupid or "bad" leaves no room for you potentially being incorrect.

 

5 Responses to “Lessons in Confirmation Bias, Double Standards, Strawmen, Edema, Blinding, and Research Critiques: My Final Response on our Volume Study”

  • Lyle, why have you gone so crazy?? You’re acting real weird. I like your books man and you’re a valuable addition to the science-based training community, but your attitude is letting me down. I feel upset and disappointed. Stop with the attacks and snarky comments. Chill bro, we’re all in this fitness journey together

  • Profit of truth
    4 weeks ago

    Why delete my comment, just proves he is right and im not even taking sides

    • It states very clearly in my article why Lyle has been blocked. It says:

      Lyle also misses the obvious why we have stopped responding to him. He has incessantly sent us emails and PMs laced with ranting insults and profanities (I count 11 emails in my spam folder as of writing this). He has also been obsessively posting the same things to his group, almost daily. He has practically engaged in a smear campaign against myself and my colleagues. At some point, a rational person will simply stop responding…it does not become worth the time or energy to respond. This is particularly true when the individual continues to ignore points that have already been made to him. It is impossible to have any sort of productive discussion. The constant barrage of emotional hatemail also causes one to question whether such a person can actually be objective about our study.

      Lyle’s modus operandi appears to be:

      Lyle: Makes incessant abusive emails, PMs, and posts
      Us: Block him due to the abusive nature of his interactions
      Lyle: SEE? THEY CAN’T ANSWER MY QUESTIONS!

      And that means any comments he passes through others (I see your IP is Melbourne, Australia) will also be deleted.

      My blocking of his comments proves that I am not willing to tolerate unacceptable behavior.
