Gaze behavior during scene and object recognition can reveal the information relevant to a task. For example, salience maps—which highlight regions of a scene with heightened luminance, contrast, color, and the like—can be used to predict gaze targets, and certain tasks, such as face recognition, produce a typical pattern of fixations on high-salience features. Although the local salience of a 2-D feature may contribute to gaze behavior and object recognition, we can readily recognize objects from 3-D depth cues devoid of meaningful 2-D features. Faces can be recognized from pure texture, binocular disparity, or structure-from-motion displays (Dehmoobadsharifabadi & Farivar, 2016; Farivar, Blanke, & Chaudhuri, 2009; Liu, Collin, Farivar, & Chaudhuri, 2005), yet these displays contain no locally salient 2-D features. We therefore sought to determine whether gaze behavior is driven by an underlying 3-D representation that is depth-cue invariant or is instead depth-cue specific. Using a face identification task comprising morphs of 3-D facial surfaces, we measured identification thresholds and thereby equated task difficulty across different depth cues. We found that gaze behavior for faces defined by shading and texture cues was highly comparable, but we observed some deviations for faces defined by binocular disparity. Interestingly, we found no effect of task difficulty on gaze behavior. The results are discussed in the context of depth-cue invariant representations of facial surfaces, with gaze behavior constrained by low-level limits on depth extraction from specific cues such as binocular disparity.