Hover any element for exact values · Toggle outliers and orientation · Arrow keys navigate groups
A box and whisker plot encodes the five-number summary of any distribution in a single compact glyph. The box spans the interquartile range (IQR = Q3 − Q1), capturing the middle 50% of observations. The thick line inside the box marks the median — its position within the box reveals distributional skew at a glance: a median pushed toward Q1 signals positive skew (a long upper tail); pushed toward Q3 signals negative skew. Whiskers extend to the most extreme values within 1.5×IQR of the box edges (Tukey's 1977 fence rule); any points beyond are plotted individually as outliers. The small triangle (△) marks the mean, which often diverges visibly from the median in skewed data — a divergence that standard bar charts cannot show at all.
This chart compares simulated annual household income (USD thousands) across five residential zones — Urban, Suburban, Rural, Coastal, and Mountain — with 80 observations per group. Coastal households show the highest median and widest spread, with a long upper whisker indicating a tail of high earners. Rural shows the lowest median and the most compact IQR. Suburban's cluster of upper outliers reveals an affluent subpopulation that a bar chart showing only the mean would conceal entirely. Toggle between vertical and horizontal orientation to suit different reading contexts, enable notches to display 95% confidence intervals around each median, or toggle outlier visibility to see how extreme values shift the perceived scale of each distribution.
A box and whisker plot encodes the five-number summary of a distribution in a single glyph: minimum whisker end, first quartile (Q1), median, third quartile (Q3), and maximum whisker end. The box spans the interquartile range (IQR = Q3 − Q1), containing the middle 50% of observations. The median line cuts across it — its position within the box reveals skew at a glance: a line pushed toward Q1 means positive skew; pushed toward Q3 means negative skew. Whiskers extend to the farthest point within 1.5×IQR of the box; anything beyond is plotted as an individual outlier dot. The perceptual mechanism exploited is position along a common scale — the most accurate quantitative encoding channel known from psychophysical research.
The data presents five groups with continuous measurements and the goal is comparing distributional shape, not just central tendency. A bar chart would show only the mean, hiding spread, skew, and outliers entirely. A dot plot shows all individual points but collapses into unreadable overplotting at sample sizes above ~40. The box plot occupies the productive middle ground: it summarises the full distribution without sacrificing comparability across groups. When the five groups are placed side-by-side on a shared axis, relative spread (IQR width), central tendency (median line), tail behaviour (whisker length), and anomalies (outlier dots) become simultaneously readable in a single pass.
A grouped bar chart showing means and standard deviations — the most common naive substitute — fails because standard deviations assume symmetrical, approximately normal distributions. The moment a group is skewed or has outlier contamination, the error bars become misleading (extending below zero, or implying tails of equal length). The viewer walks away believing the groups differ primarily in average, when the real story might be in tail behaviour or bimodality entirely invisible to that encoding. A violin plot is the strongest alternative to the box plot, adding kernel density contours, but it requires larger sample sizes to be trustworthy and adds reading complexity for audiences not trained in density estimation.
FT Visual Vocabulary category: Distribution — "Show the range of values in a dataset and how they are distributed." Tufte principle: the box plot is architecturally data-ink efficient — every pixel of ink encodes a statistical quantity. Abela quadrant: Distribution (single variable, multiple groups, comparison of shape). The one design decision worth knowing: the 1.5×IQR rule for whisker length was set by Tukey in 1977 as a robust fence — it flags roughly 0.7% of observations as potential outliers under a normal distribution. This rule is implemented here exactly; changing it to min/max would hide distributional tail information.