To recognize an object, it is widely supposed that we first detect and then combine its features. Familiar objects are recognized effortlessly, but unfamiliar objects—like new faces or foreignlanguage letters—are hard to distinguish and must be learned through practice. Here, we describe a method that separates detection and combination and reveals how each improves as the observer learns. We dissociate the steps by two independent manipulations: For each step, we do or do not provide a bionic crutch that performs it optimally. Thus, the two steps may be performed solely by the human, solely by the crutches, or cooperatively, when the human takes one step and a crutch takes the other. The crutches reveal a double dissociation between detecting and combining. Relative to the two-step ideal, the human observer’s overall efficiency for unconstrained identification equals the product of the efficiencies with which the human performs the steps separately. The twostep strategy is inefficient: Constraining the ideal to take two steps roughly halves its identification efficiency. In contrast, we find that humans constrained to take two steps perform just as well as when unconstrained, which suggests that they normally take two steps. Measuring threshold contrast (the faintness of a barely identifiable letter) as it improves with practice, we find that detection is inefficient and learned slowly. Combining is learned at a rate that is 4× higher and, after 1,000 trials, 7× more efficient. This difference explains much of the diversity of rates reported in perceptual learning studies, including effects of complexity and familiarity.