How Cleaner Wrasses Recognize Themselves in Mirrors: The Strange Cognition of a Four-Inch Fish

The mirror self-recognition test was supposed to mark a cognitive boundary between great apes and the rest of the vertebrates. A 2019 paper showing cleaner wrasses passing it complicates that boundary in ways the field is still working through.

Anethoth

16 May 2026

The mirror self-recognition (MSR) test was developed by Gordon Gallup in 1970 to test whether great apes could recognize themselves in a mirror. The test puts a colored mark on a part of the animal's body it cannot see directly (typically the forehead) and observes whether the animal, after experience with the mirror, uses the mirror to investigate the mark. Self-directed investigation (rather than treating the mirror image as another animal) is taken as evidence that the animal recognizes the image as itself.

For about forty years after Gallup's original work, MSR was thought to be passed only by great apes (chimpanzees, orangutans, bonobos, and, with some debate, gorillas), bottlenose dolphins, Asian elephants, and Eurasian magpies. The list was small enough to suggest a meaningful cognitive boundary: the animals that pass MSR share certain other cognitive capacities (theory of mind, complex social behavior, large brains relative to body size), and the animals that fail seem to lack those capacities. The test was treated as a bright line between self-aware and not-self-aware vertebrates.

The 2019 Kohda et al. paper in PLOS Biology showed that cleaner wrasses (Labroides dimidiatus), small tropical reef fish about four inches long with brains weighing about a hundredth of a gram, pass the mark test under controlled conditions. The result has been replicated and extended in several follow-up papers. It does not fit the previous bright-line interpretation of MSR, and the field has been working through the implications since.

The fish and the test

Cleaner wrasses are tropical reef fish that earn their living by removing parasites from the bodies of larger client fish. The behavior is complex: they recognize individual clients across many encounters, prioritize service to clients that might leave (high client-fish mobility encourages prompt service), cheat on clients that cannot easily switch service providers, and reconcile after cheating using affiliative gestures. The behavioral complexity has made cleaner wrasses a model organism for cognitive ecology of fish for several decades.

The Kohda et al. protocol is essentially the Gallup mark test adapted for fish. After acclimation to a mirror in their tank, fish are given a colored mark (a brown gel injected just under the skin to make a visible spot) on the throat, in a position where they cannot see it directly. The behavioral measure is whether the fish, after seeing themselves in the mirror, attempt to remove the mark by scraping their throat against the substrate. Control conditions include marks visible without the mirror, transparent marks (no visible mark), and other manipulations that distinguish mark-directed behavior from other responses.

The published result is that cleaner wrasses pass the mark test by the same behavioral criteria that great apes do: throat-scraping increases after the mark is applied, there is no scraping with transparent marks, and the scraping reduces or stops when the mark is removed. The 2019 paper reported the result in three of four wrasses tested; subsequent work has replicated and extended it with larger sample sizes.
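The three criteria above amount to a simple decision rule across experimental conditions. The sketch below is one hedged way to write that rule down. It is not the scoring procedure from the published papers. The `ScrapeCounts` fields, the `passes_mark_test` function, and the example numbers are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScrapeCounts:
    """Throat-scrape counts for one fish, one count per condition.

    Field names are hypothetical; they are not taken from any published protocol.
    """
    baseline_before_mark: int          # mirror present, no mark yet
    colored_mark_with_mirror: int      # visible mark, mirror present
    transparent_mark_with_mirror: int  # sham mark, mirror present
    after_mark_removed: int            # mirror present, mark gone


def passes_mark_test(c: ScrapeCounts) -> bool:
    """Apply the three criteria described in the text (illustrative only)."""
    # 1. Scraping increases once a visible mark has been applied.
    increased = c.colored_mark_with_mirror > c.baseline_before_mark
    # 2. A transparent (invisible) mark triggers no scraping.
    none_when_transparent = c.transparent_mark_with_mirror == 0
    # 3. Scraping reduces or stops once the mark is removed.
    reduced_after_removal = c.after_mark_removed < c.colored_mark_with_mirror
    return increased and none_when_transparent and reduced_after_removal


# Made-up counts, not data from any study.
example_fish = ScrapeCounts(
    baseline_before_mark=0,
    colored_mark_with_mirror=12,
    transparent_mark_with_mirror=0,
    after_mark_removed=1,
)
print(passes_mark_test(example_fish))
```

Writing the rule this way makes the role of the transparent-mark control explicit: a fish that scrapes whenever it has been injected, whether or not the mark is visible, fails the second criterion even if its scraping increased.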

The first response: methodological objections

The initial response from the cognitive ecology community was substantial methodological objection. Frans de Waal published a commentary in PLOS Biology arguing that the wrasse behavior could be explained as parasite-removal behavior triggered by an unfamiliar visual stimulus rather than as self-recognition. Gallup published similar concerns. Several other groups raised methodological points about the marking technique, the small sample size, and the operational definition of mark-directed behavior.

The Kohda group responded with additional experiments addressing many of the objections: larger sample sizes, additional control conditions, and tests of whether the wrasses transferred the behavior to other body locations. The 2022 follow-up paper in PNAS used a larger sample (17 fish) and a more refined experimental design, with the result holding up. Subsequent work has applied the same methods to related species (other Labroides species and several non-cleaner wrasses) and finds the result is specific to cleaner wrasses and not generic to wrasses.

The current consensus, as of 2026, is that cleaner wrasses do reliably pass the standard MSR mark test, and the methodological objections have largely been addressed. The interpretive question (what does passing the test mean about the underlying cognition) is much more open.

The interpretive problem

The MSR test was originally interpreted as evidence of self-awareness, with the implicit assumption that self-awareness is a high-level cognitive capacity associated with large brains and rich social behavior. The cleaner wrasse result challenges that assumption in two ways: the brain is small (a hundredth of a gram, with much simpler neuroanatomy than mammalian brains), and the social behavior, while complex by fish standards, is not on the same scale as primate or cetacean social cognition.

The interpretive options divide into roughly three camps. The first camp argues that MSR does mean what Gallup originally said it meant, and the wrasse result shows that self-awareness is more widely distributed in vertebrates than previously thought. This camp has to explain how a hundredth-of-a-gram brain implements self-awareness when much larger fish brains apparently do not.

The second camp argues that MSR is not a clean test of self-awareness but rather of a more specific capacity: visual matching of body model to visual feedback. This is a useful capacity for animals that need to coordinate their bodies precisely (cleaner wrasses need to position themselves accurately to remove specific parasites from specific spots on client fish), and the capacity might be widely distributed without implying broader self-awareness. This camp has to explain why the capacity is not also present in many other fish that have similar behavioral demands.

The third camp argues that the MSR test is too noisy and too dependent on testing conditions to be a clean test of any single cognitive capacity, and that the appropriate response is to study cleaner wrasse cognition directly using a wider battery of tasks. This camp has the methodological advantage but does not produce clean theoretical conclusions.

What cleaner wrasse brains look like

The neuroanatomy of cleaner wrasse brains is generally fish-typical: a small forebrain with telencephalic regions homologous to the pallial regions of other vertebrates, a larger optic tectum (homologous to the mammalian superior colliculus), and a cerebellum. The total neuron count is on the order of 10^7 neurons, compared to roughly 10^11 in human brains. Neuron density per unit volume is high compared to mammalian brains, which partly compensates for the small total volume.
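To make the scale gap concrete, the order-of-magnitude figures quoted above can be compared directly. This is a back-of-the-envelope sketch using only the two rounded exponents from the text. The variable names are illustrative, and the inputs are rough estimates rather than measurements.

```python
# Order-of-magnitude neuron counts as quoted in the text (rounded estimates).
WRASSE_NEURON_EXPONENT = 7   # about 10^7 neurons
HUMAN_NEURON_EXPONENT = 11   # about 10^11 neurons

# The gap in orders of magnitude is the difference of the exponents.
orders_of_magnitude = HUMAN_NEURON_EXPONENT - WRASSE_NEURON_EXPONENT

# The corresponding ratio of neuron counts.
ratio = 10 ** orders_of_magnitude

print(f"Gap: {orders_of_magnitude} orders of magnitude (about {ratio:,}x)")
```

On these rounded figures a human brain has on the order of ten thousand times as many neurons as a cleaner wrasse brain.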

The fish telencephalon contains regions functionally homologous to mammalian hippocampus (for spatial learning) and amygdala (for emotional learning), but the homology is at the level of evolutionary origin rather than detailed circuitry. The fish brain organization is the ancestral vertebrate organization, modified independently in the various vertebrate lineages over the 500-million-year radiation.

The implication of the small brain passing MSR is either that MSR can be implemented in a much smaller circuit than mammalian self-recognition uses, or that mammalian self-recognition is implemented in a smaller circuit than the gross brain size suggests. Either interpretation is interesting; neither is well-established.

The wider context: other fish cognition results

The cleaner wrasse MSR result fits into a broader pattern of fish cognition findings that have been accumulating over the last 25 years. Fish demonstrate complex social learning, individual recognition across many interactions, transitive inference (if A beats B and B beats C, A will beat C), tool use in some species, and arithmetic discrimination of small quantities. Australian rainbowfish that learn to escape a trap by following a leader fish through it can transmit the learned escape route to naive fish via observational learning, and the route persists in the population after the original learners are removed.

The cumulative effect of these findings is to undermine the previous consensus that fish cognition was substantially simpler than mammalian or avian cognition. The current view, articulated in Culum Brown's 2015 review and many subsequent papers, is that fish cognitive capacities overlap heavily with mammalian capacities once the methodological biases are controlled for. The MSR result is a particularly clean example because the test was designed for primates and the fish pass it with only minor adaptation of the protocol.

The convergent-evolution question

If MSR or its underlying capacities are present in cleaner wrasses, the evolutionary question is whether the capacity is convergent (independently evolved in multiple lineages) or homologous (inherited from a common ancestor). The convergent interpretation requires that whatever cognitive machinery passes MSR has evolved at least five times independently (in great apes, in cetaceans, in elephants, in birds, and now in cleaner wrasses) under selection pressures that favor it. The homologous interpretation requires that the underlying capacity was present in the common ancestor of vertebrates (about 500 million years ago) and has been retained in some descendants and lost in others.

A mixed answer is currently the most likely. Some component capacities (visual matching, body model maintenance, attention to self-as-distinct-from-environment) are probably homologous and ancient. The integration of these capacities into a behavior that passes MSR may be convergent, having evolved separately in lineages where the integration is selected for. The question is empirical and will probably be resolved by detailed comparative neuroscience over the coming decades.

The deeper observation

The cleaner wrasse MSR result is one of a category of recent findings that complicate the cognitive bright lines drawn over the last several decades. The bright lines were drawn at convenient points (great apes versus other primates, mammals versus birds, vertebrates versus invertebrates) and have generally not held up under more thorough comparative testing. Octopus spatial cognition, raven planning, jay episodic memory, slime mold optimization, fungal communication networks, and now cleaner wrasse self-recognition are all results that complicate the previous taxonomic neatness.

The wider observation is that biological cognition was probably never clean enough to support the categorical claims that schoolroom biology textbooks made about it. Cognitive capacities seem to be more widely distributed than previously thought, with the differences between lineages being more about degree, organization, and specialization than about presence or absence. The cleaner wrasse with its hundredth-of-a-gram brain is one of the more striking demonstrations because the cognitive substrate is so much smaller than the previous theory required, but the cleaner wrasse is unlikely to be the most surprising case the comparative cognition field will produce.
