Error Bars in Scientific Figures: SEM vs SD vs 95% CI and When to Use Each
Why Error Bar Choices Get Flagged in Review
Error bars are the most common source of misunderstanding in scientific figures. A 2012 survey in the Journal of Cell Biology found that the majority of papers did not specify whether error bars represented standard deviation, standard error of the mean, or 95% confidence intervals — and that readers routinely misinterpreted them. If you need to plot error bars for a scientific publication, choosing between SEM, SD, and 95% CI is not cosmetic. Each communicates something fundamentally different about your data, and reviewers know the difference.
What Each Error Bar Measures
Standard Deviation (SD)
SD describes the spread of individual data points around the mean. It answers: “How variable are the observations in this sample?”
$SD = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$

SD does not shrink as you collect more data. A sample of 10 and a sample of 1,000 from the same population will have similar SDs (assuming the population is stable). Use SD when you want to show data variability — for example, when reporting the range of patient responses to a drug.
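The stability of SD under increasing n is easy to check with simulated data. A minimal sketch using NumPy, assuming a hypothetical population with mean 50 and SD 8:

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed for reproducibility

# Draw two samples of very different sizes from the same population
small = rng.normal(loc=50, scale=8, size=10)
large = rng.normal(loc=50, scale=8, size=1000)

# Sample SD (ddof=1 gives the n-1 denominator from the formula above)
print(f"SD of n=10 sample:   {small.std(ddof=1):.2f}")
print(f"SD of n=1000 sample: {large.std(ddof=1):.2f}")
# Both estimates hover near the population SD of 8: SD does not shrink with n.
```

The small sample's SD is a noisier estimate, but it is not systematically smaller.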
Standard Error of the Mean (SEM)
SEM estimates how precisely you know the population mean. It answers: “How much would the sample mean vary if I repeated this experiment?”
$SEM = \frac{SD}{\sqrt{n}}$

SEM always shrinks with larger sample sizes. This is why SEM bars look smaller than SD bars, and why some researchers prefer them — they make differences look more impressive. Reviewers are aware of this, and many journals explicitly require SD or 95% CI instead of SEM for this reason.
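The shrinkage is purely a function of n. A quick illustration with an assumed constant sample SD of 8:

```python
import math

sd = 8.0  # suppose the sample SD stays roughly constant as n grows

# SEM falls with the square root of the sample size
for n in (10, 100, 1000):
    sem = sd / math.sqrt(n)
    print(f"n = {n:>4}: SEM = {sem:.2f}")
# n =   10: SEM = 2.53
# n =  100: SEM = 0.80
# n = 1000: SEM = 0.25
```

A hundredfold increase in n shrinks the SEM bars tenfold even though the data are no less variable.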
A common abuse is using SEM to make small differences look significant. Because SEM = SD / sqrt(n), a large sample always produces small SEM bars regardless of data spread. Overlapping SEM bars do not prove the difference is non-significant, and non-overlapping bars do not prove it is significant — a formal statistical test is needed either way.
95% Confidence Interval (CI)
The 95% CI provides a range that, with 95% confidence, contains the true population mean. For normally distributed data with known SD:
$CI_{95\%} = \bar{x} \pm 1.96 \times SEM$

With small samples, replace 1.96 with the appropriate t-value for n – 1 degrees of freedom. A 95% CI of [4.2, 6.8] means: “We are 95% confident the true mean lies between 4.2 and 6.8.” This is the most interpretable error bar for readers because it directly communicates estimation precision.
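The small-sample calculation can be sketched with SciPy (assuming it is available; the data values here are invented for illustration):

```python
import numpy as np
from scipy import stats

data = np.array([5.1, 4.8, 6.2, 5.5, 4.9, 6.0, 5.3, 5.7])  # hypothetical sample

n = len(data)
mean = data.mean()
sem = data.std(ddof=1) / np.sqrt(n)

# t critical value for a 95% CI with n - 1 degrees of freedom
# (about 2.36 for df = 7, noticeably wider than the large-sample 1.96)
t_crit = stats.t.ppf(0.975, df=n - 1)

ci_low, ci_high = mean - t_crit * sem, mean + t_crit * sem
print(f"mean = {mean:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
```

Using 1.96 instead of the t-value here would understate the interval's width.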
When to Use Each Type
| Error bar | Shows | Use when | Journal preference |
|---|---|---|---|
| SD | Data spread | Describing variability across observations | Biology, pharmacology, preclinical |
| SEM | Precision of mean | Comparing group means (with caution) | Some clinical journals; declining preference |
| 95% CI | Plausible range for true mean | Inference about population parameters | Recommended by APA, CONSORT, most journals |
The APA Publication Manual (7th edition) and the CONSORT guidelines both recommend 95% confidence intervals over SEM. If your target journal does not specify a preference, 95% CI is the safest default.
The Overlap Trap: Why Eyeballing Error Bars Fails
Researchers often judge statistical significance by whether error bars overlap. This heuristic is unreliable:
- SEM bars: non-overlapping bars do not guarantee significance. Overlapping bars do not rule it out. The relationship between SEM overlap and p < 0.05 depends on sample size.
- 95% CI bars: if CIs of two independent means do not overlap, the difference is significant at roughly p < 0.01 (more conservative than p < 0.05). If they do overlap slightly, the difference may still be significant at p < 0.05.
- SD bars: overlap tells you nothing about statistical significance. SD describes variability, not inferential precision.
Never rely on visual inspection of error bars to determine significance. Always report the test statistic, p-value, and effect size from a formal hypothesis test. Error bars supplement the statistical test — they do not replace it. If your sample size was calculated for adequate power, the formal test is what matters.
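The unreliability of the overlap heuristic can be demonstrated numerically. A sketch using SciPy's summary-statistics t-test, with group statistics invented for illustration: the SEM bars do not overlap, yet the test is not significant.

```python
import numpy as np
from scipy import stats

# Two hypothetical groups summarised by mean, SD, and n
m1, sd1, n1 = 10.0, 3.0, 10
m2, sd2, n2 = 12.0, 3.0, 10

sem1, sem2 = sd1 / np.sqrt(n1), sd2 / np.sqrt(n2)

# The SEM bars do not overlap: the gap between bar tips is positive...
gap = (m2 - sem2) - (m1 + sem1)
print(f"gap between SEM bar tips: {gap:.2f}")  # positive -> no overlap

# ...yet a two-sample t-test is far from significant at alpha = 0.05
t, p = stats.ttest_ind_from_stats(m1, sd1, n1, m2, sd2, n2)
print(f"t = {t:.2f}, p = {p:.3f}")  # p is around 0.15
```

Non-overlapping SEM bars produced p ≈ 0.15 here — exactly the trap described above.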
How to Plot Each Type
In R (ggplot2)
```r
# SD error bars
ggplot(data, aes(x = group, y = value)) +
  stat_summary(fun = mean, geom = "bar") +
  stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1),
               geom = "errorbar", width = 0.2)

# SEM error bars
ggplot(data, aes(x = group, y = value)) +
  stat_summary(fun = mean, geom = "bar") +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2)

# 95% CI error bars (mean_cl_normal requires the Hmisc package)
ggplot(data, aes(x = group, y = value)) +
  stat_summary(fun = mean, geom = "bar") +
  stat_summary(fun.data = mean_cl_normal, geom = "errorbar", width = 0.2)
```
In Python (matplotlib)
```python
import numpy as np
import matplotlib.pyplot as plt

# group1 and group2 are 1-D arrays of raw observations (example data)
group1 = np.array([4.1, 5.3, 4.8, 5.9, 4.6, 5.1])
group2 = np.array([6.2, 5.8, 6.9, 6.4, 7.1, 6.0])

means = [group1.mean(), group2.mean()]
sds = [group1.std(ddof=1), group2.std(ddof=1)]  # ddof=1 -> sample SD
sems = [s / np.sqrt(len(g)) for s, g in zip(sds, [group1, group2])]

# SD bars
plt.bar(["Group 1", "Group 2"], means, yerr=sds, capsize=5)

# SEM bars (use a separate figure rather than drawing over the SD bars)
plt.figure()
plt.bar(["Group 1", "Group 2"], means, yerr=sems, capsize=5)
```
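The matplotlib examples stop at SD and SEM; for 95% CI bars, scale the SEM by the t critical value, as in the R `mean_cl_normal` case. A self-contained sketch assuming SciPy is available (the data are invented for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical raw observations for two groups
group1 = np.array([4.1, 5.3, 4.8, 5.9, 4.6, 5.1])
group2 = np.array([6.2, 5.8, 6.9, 6.4, 7.1, 6.0])

means, cis = [], []
for g in (group1, group2):
    sem = g.std(ddof=1) / np.sqrt(len(g))
    t_crit = stats.t.ppf(0.975, df=len(g) - 1)  # t multiplier for small n
    means.append(g.mean())
    cis.append(t_crit * sem)  # half-width of the 95% CI

# 95% CI bars
plt.bar(["Group 1", "Group 2"], means, yerr=cis, capsize=5)
```

`yerr` takes the half-width of the interval, so passing `t_crit * sem` draws the full CI around each mean.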
Reporting Error Bars in Figure Legends
Every figure with error bars must state what they represent. This is non-negotiable — the most common reviewer comment about figures is “error bars not defined in legend.”
Standard phrasing:
- “Data are presented as mean ± SD (n = 8 per group).”
- “Bars represent mean ± SEM. *p < 0.05, **p < 0.01 (unpaired t-test).”
- “Error bars indicate 95% confidence intervals. Individual data points are shown.”
Note the pattern: specify the measure (SD, SEM, or 95% CI), state the sample size, and if significance markers are present, define them and name the test used. This applies to bar charts, line graphs, and dot plots equally.
Worked example: a study measures enzyme activity across three conditions with n = 12 per group. Group means: 45.2, 52.8, 48.1 units. SDs: 8.3, 7.9, 9.1. One-way ANOVA: F(2, 33) = 3.42, p = 0.044, partial eta-squared = 0.17.
Figure legend: “Enzyme activity (units) across treatment conditions. Bars show mean ± SD (n = 12). One-way ANOVA: F(2, 33) = 3.42, p = 0.044. Post-hoc Tukey HSD: *p = 0.038 (Control vs. Treatment A).”
Computing SEM for verification: SEM = 8.3 / sqrt(12) = 2.40 for group 1. The 95% CI for group 1 = 45.2 ± 2.201 × 2.40 = [39.9, 50.5], where 2.201 is the t-value for df = 11.
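The arithmetic above can be checked in a few lines of Python (assuming SciPy for the t critical value):

```python
import numpy as np
from scipy import stats

# Group 1 summary statistics from the worked example
mean, sd, n = 45.2, 8.3, 12

sem = sd / np.sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)  # 2.201 for df = 11
ci = (mean - t_crit * sem, mean + t_crit * sem)

print(f"SEM = {sem:.2f}")                      # 2.40
print(f"95% CI = [{ci[0]:.1f}, {ci[1]:.1f}]")  # [39.9, 50.5]
```

Reproducing the legend's numbers this way is a cheap safeguard against transcription errors before submission.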
When Reviewers Push Back
Common reviewer requests and how to handle them:
- “Show individual data points” — overlay a jitter or strip plot on your bar chart. This is now standard in biology journals and addresses concerns about hidden distributions.
- “Use SD instead of SEM” — switch the error bars and update the legend. If you used SEM because differences looked too small with SD, that is a presentation problem, not a data problem.
- “Add confidence intervals” — calculate 95% CIs and add them to the results text, even if the figure shows SD. Many journals want both.
- “Define error bars” — add the definition to every figure legend, not just the first one.
If your study involves animal research with IACUC requirements, reviewers are especially likely to scrutinize whether your error bars and sample sizes are consistent with the power analysis in your protocol.