Radiolab, tetrachromacy, and the demands of responsible experimentation

January 11, 2013

Radiolab is a bimonthly radio show and podcast that tells stories of human experience. Its blend of interview, narration, and soundscape makes for a 60-minute show (or a 15-minute “short”) that’s meant to delight, inform, and perhaps slake its audience’s curiosity about the world. In one episode, hosts and creators Jad Abumrad and Robert Krulwich interview a man who lost his sense of taste after a stomach surgery forced him to feed by IV for many months. In another episode, they tell the story of Blissymbols, a universal symbol-based communication system constructed by Charles Bliss as an alternative to natural language, whose flexibility he viewed as a dangerous tool for manipulating others. Radiolab is engaging, thought-provoking, clear, and worthwhile.

Well, usually.


In May 2012, Radiolab ran an episode on the perception of color. In the episode, we learn that most humans have retinae carpeted with three types of cones, which are sensors tuned to a specific wavelength of light, like a radio that is tuned to your favorite FM station. The activity of the three types of cones combines to determine the color that is perceived. Having only three types of cones (“trichromacy”) limits the range of perceptible colors. Animals with a greater number of cone types, e.g. the mantis shrimp (a dodecachromat), have a larger color gamut. People with fewer cone types are colorblind.

In the episode, the hosts raise the question of what would happen if a person had an extra cone type, a condition known as tetrachromacy. Do tetrachromats see extra colors? To answer this question, they hunt for a tetrachromat, find one, and perform a behavioral test to determine how she perceives color. They recap:

Producer Tim Howard tracked down a real-life tetrachromat named Susan Hogan, then drove out to Pittsburgh to meet her … and administer[ed] a quick vision test that made it clear that who sees what is anything but black and white.

To determine whether Susan could perceive extra colors, the experimenter performs a test using special swatches of fabric designed so that to a tetrachromat, they are easy to tell apart, but to a trichromat, they look exactly the same. On each round, Susan, like a muppet playing “One of these things is not like the other”, is shown three swatches and then asked to pick the odd man out.

Here’s where things went awry.

The test played out as follows, as narrated by the experimenter, E, in a discussion with the host, H:

E: We ended up doing the test in a nearby park. In the first trial, I took out three of the swatches: two that were exactly the same, and one that was supposedly different.

H: And when you took it out, could you see the difference?

E: No, no. So I go behind the tree and I whisper into the mic, “Number 3 is different. Number 3.”

We then hear a brief recording of a conversation between the experimenter and Susan, S:

E: I hope you couldn’t hear me.

S: No… [inaudible]

The experimenter gives her a moment to look at the swatches. After a few seconds, she answers correctly, picking number 3. The experiment is repeated. The second time, she answers correctly, picking number 1. The third time, she again answers correctly, noticing that this time the experimenter has played a trick on her and that all three swatches are the same. Fin.


Back in 1904, a horse known as Clever Hans was found that could do arithmetic. Or so it was claimed. After careful testing, Oskar Pfungst, a psychologist and comparative biologist, determined that the horse wasn’t a math whiz, but rather had a specific perceptual skill: he could pick up on the subtle cues provided unknowingly by spectators and examiners. Hans didn’t know the answer, but he could tell that others did, and he used that to his advantage. Nowadays this is known as the Clever Hans effect, and luckily, scientists have developed a simple method to eliminate it. The approach, used often in medical studies and those of animal and infant cognition, is the so-called “double-blind” experiment, where neither the participant nor the experimenter is allowed to know the correct answer at the time of the test.

Radiolab’s tetrachromacy experiment did nothing to prevent the Clever Hans effect: the experimenter knew the correct answer and was in the presence of the participant during the test.

Strike one.


Persi Diaconis is the Mary V. Sunseri Professor of Statistics and Mathematics at Stanford University, where he studies probability theory. He’s also a trained magician. When, in the ’60s and ’70s, claims of paranormal skills abounded from those including Uri Geller and Ted Serios, Diaconis was a leader of the skeptics, attending demonstrations of paranormal abilities and writing articles in Science and statistics journals documenting the deceit and trickery that made those demonstrations possible. It was ordinary magic, not magical magic.

One of the standard tools used to demonstrate extra-sensory perception (ESP) is a deck of Zener cards. Each card has one of five symbols printed on its face, with five cards per symbol, for a total of 25 cards. To test for ESP, the participant is asked to guess the order of the cards. Performance that is reliably better than chance suggests that the participant is clairvoyant (or cheating).

In fact, some people guess correctly more than 1/5 of the time, every time they take the test. Impressive. But are they clairvoyant? No. Diaconis noticed that because the set of cards is fixed in number and composition, a participant learns something about the remaining cards each time one is revealed. (For a moment, imagine a deck with only two cards: one King and one Queen. If you guess “King” for the first card, and I reveal that you were wrong, what would you guess is the next card? It’s a King, and you’re sure of it. You always know the identity of the last card, no matter whether you knew the one before it. That’s because you can eliminate the cards you’ve already seen from the pool of possible cards that you use when guessing.) Chance performance was never 1/5th. The participant can use what’s already been revealed to guess what’s coming next. We learn from this that it’s important to consider how well a participant would do in the absence of the skill we’re testing for.
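To make the card-counting argument concrete, here is a minimal sketch that computes the expected number of correct guesses when every card is revealed after each guess. It assumes the participant always guesses whichever symbol is most plentiful among the unseen cards; the function name `expected_hits` and that particular strategy are illustrative choices, not something taken from Diaconis's articles.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def expected_hits(counts):
    """Expected correct guesses with full feedback.

    counts: sorted tuple giving how many cards of each symbol remain.
    Assumed strategy: always guess the most plentiful remaining symbol.
    """
    remaining = sum(counts)
    if remaining == 0:
        return Fraction(0)
    # Probability that the guess for the next card is right.
    total = Fraction(max(counts), remaining)
    # Average over which symbol actually turns up next.
    for i, c in enumerate(counts):
        if c == 0:
            continue
        nxt = list(counts)
        nxt[i] -= 1
        total += Fraction(c, remaining) * expected_hits(tuple(sorted(nxt)))
    return total


# Two-card deck from the text: one King, one Queen.
print(expected_hits((1, 1)))             # guessing blind would average 1 hit
# Zener deck: five symbols, five cards each.
print(float(expected_hits((5,) * 5)))    # guessing blind would average 5 hits
```

Without feedback, each guess is right 1/5 of the time, for an average of 5 hits in 25 cards. Any score has to be compared against the feedback-aware baseline computed above rather than against 5.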

In Radiolab’s tetrachromacy experiment, the choice of which swatch to use as the odd man out was made by the experimenter. The order was picked for the purpose of telling a good story. Providing trial-by-trial feedback to the participant may have altered the results.

Strike two.


When a doctor screens a patient for colon cancer, or when the TSA screens a flyer’s bag for contraband, they must assign one of two labels: present or absent. The goal is to make the correct classification, but sometimes errors occur. For example, a doctor may incorrectly diagnose a sick patient as healthy, missing signs of the disease’s presence, or a TSA agent may see contraband where none exists, signaling a false alarm. Good screening procedures find what they’re looking for, but only when it’s really there.

Later in the segment, after the experiment with Susan the tetrachromat, Radiolab repeats the procedure with a new participant who is a known trichromat. Without a fourth type of cone, the participant lacks the capacity to see any color that requires it. And those are precisely the colors needed to pass Radiolab’s test. The new participant should fail. But that’s not what happened. Instead, the trichromat performed just as well as the tetrachromat, answering all of the questions correctly. This is like a TSA agent who thinks that every flyer carries contraband, or a doctor who thinks that every patient has cancer. Without an ability to distinguish healthy from sick, present from absent, or a normal gamut from one with extra colors, these tests are useless.
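The screening analogy can be put in numbers. The sketch below scores a test by its sensitivity (how often it flags the trait when present) and its specificity (how often it stays quiet when the trait is absent). The two data points are a simplified encoding of the episode as described here, one tetrachromat and one trichromat who both passed; the function name `screening_rates` is hypothetical, and two participants are far too few for a real estimate.

```python
def screening_rates(results):
    """Return (sensitivity, specificity).

    results: list of (has_trait, passed_test) pairs.
    Assumes at least one participant with the trait and one without.
    """
    hits = sum(1 for has, passed in results if has and passed)
    misses = sum(1 for has, passed in results if has and not passed)
    false_alarms = sum(1 for has, passed in results if not has and passed)
    correct_rejections = sum(
        1 for has, passed in results if not has and not passed
    )
    sensitivity = hits / (hits + misses)
    specificity = correct_rejections / (correct_rejections + false_alarms)
    return sensitivity, specificity


# Illustrative encoding: the tetrachromat passed, and so did the trichromat.
radiolab = [(True, True), (False, True)]
sensitivity, specificity = screening_rates(radiolab)
print(sensitivity, specificity)
```

A test that everyone passes has perfect sensitivity and zero specificity, which is the same profile as the agent who flags every bag.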

Strike three.


Am I holding Radiolab to too high a standard? Is it right to demand that their in-house experiments, meant as public demonstrations, be done carefully by the standards of modern science? In the same way that it’s unreasonable to explain to a newcomer all of the exceptions, caveats, and uncertainties that litter any complex subject, might not the same be true of sound experimental design? Shouldn’t we applaud them for even bothering to experiment?

At its core, the public understanding of science isn’t just about knowing the latest research findings and building an arsenal of neat facts and tidbits; rather, it’s about developing a skeptical mind that can control its own beliefs by evaluating evidence. In an interview with Charlie Rose, Carl Sagan once said:

Science is more than a body of knowledge. It’s a way of thinking, a way of skeptically interrogating the universe with a fine understanding of human fallibility. If we are not able to ask skeptical questions, to interrogate those who tell us that something is true, to be skeptical of those in authority, then we are up for grabs for the next charlatan, political or religious, who comes ambling along.

For a show like Radiolab, fascinated by the boundary between what’s known and what could be, funded by the National Science Foundation, and listened to by millions, what’s most important is not the ideas but the process by which those ideas are tested and confirmed. That’s science, and it’s where Radiolab misstepped.