YOLOv5 v6.2: Classification Models, Apple M1, Reproducibility, ClearML & Deci.ai Integrations - Ultralytics

[25] Glenn Jocher, Ayush Chaurasia, Alex Stoken, Jirka Borovec, NanoCode012, Yonghye Kwon, TaoXie, Kalen Michael, Jiacong Fang, imyhxy, Lorna, Colin Wong, (Zeng Yifu), Abhiram V, Diego Montes, Zhiqiang Wang, Cristi Fati, Jebastin Nadar, Laughing, UnglvKitDe, tkianai, yxNONG, Piotr Skalski, Adam Hogan, Max Strobel, Mrinal Jain, Lorenzo Mammana, and xylieong. ultralytics/yolov5: v6.2 - YOLOv5 Classification Models, Apple M1, Reproducibility, ClearML and Deci.ai integrations, Aug. 2022.

[26] Kimmo Karkkainen and Jungseock Joo. 'FairFace: Face attribute dataset for balanced race, gender, and age for bias measurement and mitigation.' In WACV, 2021.

[27] Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A Shamma, et al. 'Visual Genome: Connecting language and vision using crowdsourced dense image annotations.' IJCV, 123(1):32–73, 2017.

[28] Gen Li, N. Duan, Yuejian Fang, Daxin Jiang, and M. Zhou. 'Unicoder-VL: A universal encoder for vision and language by cross-modal pre-training.' In AAAI, 2020.

[29] Xiujun Li, Xi Yin, Chunyuan Li, Pengchuan Zhang, Xiaowei Hu, Lei Zhang, Lijuan Wang, Houdong Hu, Li Dong, Furu Wei, et al. 'Oscar: Object-semantics aligned pre-training for vision-language tasks.' In ECCV, 2020.

[30] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 'Microsoft COCO: Common objects in context.' In ECCV, 2014.

[31] Jiasen Lu, Dhruv Batra, Devi Parikh, and Stefan Lee. 'ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks.' In NeurIPS, 2019.

[32] Nicole Meister, Dora Zhao, Angelina Wang, Vikram V Ramaswamy, Ruth Fong, and Olga Russakovsky. 'Gender artifacts in visual datasets.' arXiv preprint arXiv:2206.09191, 2022.

[33] Ron Mokady, Amir Hertz, and Amit H Bermano. 'ClipCap: CLIP prefix for image captioning.' arXiv preprint arXiv:2111.09734, 2021.

[34] Jahna Otterbacher, Pınar Barlas, Styliani Kleanthous, and Kyriakos Kyriakou. 'How do we talk about other people? Group (un)fairness in natural language image descriptions.' In AAAI HCOMP, 2019.

[35] E Ashby Plant, Janet Shibley Hyde, Dacher Keltner, and Patricia G Devine. 'The gender stereotyping of emotions.' Psychology of Women Quarterly, 24(1):81–92, 2000.

[36] Bryan A Plummer, Liwei Wang, Chris M Cervantes, Juan C Caicedo, Julia Hockenmaier, and Svetlana Lazebnik. 'Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models.' In ICCV, 2015.

[37] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 'Learning transferable visual models from natural language supervision.' In ICML, 2021.
