Annotated Human Region Dataset for Intersectional Attribute Analysis
We annotated the whole validation set and a random portion of the training set. We downloaded 13,501 validation and 2,099,769 training images; of those, 5,668 validation and 498,006 training images had human regions detected by YOLOv5. All 5,668 validation images and a random subset of 17,147 training images were sent for annotation; of these, 4,614 validation and 14,275 training images passed manual verification, and their detected human regions were annotated with the six demographic and contextual attributes. Overall, 35,347 regions were annotated: 8,833 in the validation set and 26,514 in the training set.
We publicly share the data in two formats: raw, in which each region has up to three annotations, and region-level, in which each region is assigned a class by majority vote. If there is no consensus, the region is labeled as 'disagreement'.
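The majority-vote aggregation can be sketched as follows. This is an illustrative implementation, not the release code; in particular, the paper does not state how ties among up to three annotations are resolved, so here any tie is treated as lack of consensus:

```python
from collections import Counter

def aggregate_region(annotations):
    """Assign a region-level class by majority vote over the raw
    annotations (up to three per region). Regions without a clear
    winner are labeled 'disagreement'.

    Note: the tie-handling rule is an assumption for illustration.
    """
    counts = Counter(annotations)
    label, votes = counts.most_common(1)[0]
    # If another class has the same number of votes, there is no
    # consensus for this region.
    if list(counts.values()).count(votes) > 1:
        return "disagreement"
    return label

# Example: two of three annotators agree -> majority class wins.
print(aggregate_region(["man", "man", "woman"]))       # 'man'
print(aggregate_region(["adult", "senior", "child"]))  # 'disagreement'
```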
Attribute analysis Full statistics per attribute and class are reported in the supplementary material. Overall, all the attributes are imbalanced, with one or two predominant classes per attribute. For age, the predominant class is 'adult', appearing in 45.6% of the region-level annotations, while for gender, the gap between 'man' (63.6%) and 'woman' (35.1%) is 28.5 points. The gap is larger for skin tone and ethnicity: skin-tone Type 2 is annotated in 47.1% of the regions, while Types 4, 5, and 6 together account for only 17.5%. Similarly, for ethnicity, the 'White' class is over-represented, appearing in 62.5% of the regions, while each of the remaining classes appears in 0.6% to 10% of the regions. An intersectional analysis reveals even larger disparities: for example, the classes 'man' and 'White' appear together in 13,651 regions, whereas 'woman' and 'Black' co-occur in only 823.
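Intersectional counts like these can be computed by cross-tabulating two attributes over the region-level annotations. A minimal sketch with pandas, using a toy table whose column names and values are assumed rather than taken from the dataset's actual schema:

```python
import pandas as pd

# Toy region-level annotations; the real dataset has one row per
# annotated human region with its six attribute labels.
regions = pd.DataFrame({
    "gender":    ["man", "woman", "man", "woman"],
    "ethnicity": ["White", "Black", "White", "White"],
})

# Cross-tabulate gender x ethnicity to surface intersectional gaps,
# e.g. comparing ('man', 'White') vs ('woman', 'Black') region counts.
table = pd.crosstab(regions["gender"], regions["ethnicity"])
print(table)
```

The same call extends to any attribute pair (e.g. gender x emotion or gender x activity) by swapping the columns.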
Regarding the contextual attributes, the most represented emotion is 'neutral' (47.1%), followed by 'happy' (35.7%), whereas negatively associated emotions ('sad', 'fear', 'anger') account for just 2.33% in total. For activity, the most common classes are 'posing' (28.6%) and 'other' (20.5%).
Gender and context We cross-check gender with the contextual attributes. Region-level emotion statistics per gender are shown in Figure 4, where it can be seen that women tend to appear 'happy' whereas men tend to appear 'neutral'. This aligns with the gender stereotyping of emotions, a well-documented phenomenon in psychology [35]. We also detect disparities in activities per gender, especially in the classes 'posing' and 'sports': in woman regions, 42.8% of annotations are 'posing' and 5.1% 'sports', while in man regions, 21% are 'posing' and 26.9% 'sports'.