YOLOv1, YOLOv2, and YOLOv3 identify the typical detection architecture consisting of three parts, i.e., backbone, neck, and head. YOLOv4 and YOLOv5 introduce the CSPNet design to replace DarkNet, coupled with data augmentation strategies, an enhanced PAN, and a greater variety of model scales, among other changes. YOLOv6 presents BiC and SimCSPSPPF for the neck and backbone, respectively, along with anchor-aided training and a self-distillation strategy. YOLOv7 introduces E-ELAN for a rich gradient flow path and explores several trainable bag-of-freebies methods. YOLOv8 presents the C2f building block for effective feature extraction and fusion. Gold-YOLO provides the advanced GD mechanism to boost multi-scale feature fusion capability. YOLOv9 proposes GELAN to improve the architecture and PGI to augment the training process.
