This code snippet is the core of YOLOv5's detection script (detect.py): it loads the model, runs inference on images or video, applies non-maximum suppression (NMS), and saves the results as images and/or text files. Here's a detailed breakdown of the code:

  1. Setting Up Parameters

    imgsz = check_img_size(imgsz, s=model.stride.max())  # check img_size
    device = select_device(device)
    half = device.type != 'cpu'  # half precision only supported on CUDA
    

    This section ensures the input image size aligns with the model's requirements, selects the appropriate device (CPU or GPU), and determines whether to use half-precision floating-point computations for efficiency.
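The size check can be sketched in a few lines (a minimal reimplementation for illustration, not the actual YOLOv5 `check_img_size` utility): the requested size is rounded up to the nearest multiple of the model's maximum stride, because the network downsamples its input by that factor.

```python
import math

# Hypothetical sketch of the size check: round img_size up to the
# nearest multiple of stride, warning when an adjustment is made.
def check_img_size_sketch(img_size: int, stride: int = 32) -> int:
    new_size = math.ceil(img_size / stride) * stride
    if new_size != img_size:
        print(f'WARNING: img_size {img_size} rounded up to {new_size} (multiple of {stride})')
    return new_size

print(check_img_size_sketch(640))  # 640 is already a multiple of 32
print(check_img_size_sketch(700))  # rounded up to 704
```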

  2. Model Construction

    model = attempt_load(weights, map_location=device)  # load FP32 model
    

    Here, the model weights are loaded, the model is constructed, and it's moved to the selected device.

  3. Loading Images

    dataset = LoadImages(source, img_size=imgsz, stride=stride)
    

    The LoadImages class loads input images (and video frames) and letterbox-resizes them to the model's expected size; stride here is the model's maximum stride.
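The letterbox geometry behind that resize can be sketched as follows (an illustrative helper, not the real LoadImages code, assuming the standard YOLOv5 behavior of scaling the longer side down and padding minimally to a stride multiple):

```python
def letterbox_shape(h: int, w: int, new_size: int = 640, stride: int = 32):
    """Sketch of letterbox geometry: scale so the longer side fits
    new_size, keep the aspect ratio, then pad each side minimally so
    both dimensions become multiples of stride."""
    r = min(new_size / h, new_size / w)        # scale ratio
    new_h, new_w = round(h * r), round(w * r)  # resized, unpadded shape
    pad_h = (-new_h) % stride / 2              # padding per side (top/bottom)
    pad_w = (-new_w) % stride / 2              # padding per side (left/right)
    return (new_h, new_w), (pad_h, pad_w)

# A 1920x1080 frame is scaled to 640x360, then padded by 12 px on the
# top and bottom to reach a stride-friendly 640x384.
print(letterbox_shape(1080, 1920))
```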

  4. Prediction

    for path, img, im0s, vid_cap in dataset:
        img = torch.from_numpy(img).to(device)
        img = img.half() if half else img.float()  # uint8 to fp16/32
        img /= 255.0  # 0 - 255 to 0.0 - 1.0
        if img.ndimension() == 3:
            img = img.unsqueeze(0)
    
        # Inference
        t1 = time_synchronized()
        pred = model(img, augment=opt.augment)[0]
    

    The image is converted to a PyTorch tensor, moved to the device, normalized from 0-255 to 0.0-1.0, given a batch dimension if needed, and passed through the model. Test-time augmentation is applied if opt.augment is enabled.

  5. Processing Detections

    # Apply NMS
    pred = non_max_suppression(pred, opt.conf_thres, opt.iou_thres, classes=opt.classes, agnostic=opt.agnostic_nms)
    t2 = time_synchronized()
    
    # Process detections
    for i, det in enumerate(pred):  # detections per image
        if webcam:  # batch_size >= 1
            p, s, im0, frame = path[i], f'{i}: ', im0s[i].copy(), dataset.count
        else:
            p, s, im0, frame = path, '', im0s.copy(), getattr(dataset, 'frame', 0)
    

    Non-maximum suppression (NMS) is applied to filter overlapping predictions, and the results are processed. For video inputs, the current frame number is tracked.
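The idea behind non_max_suppression can be sketched in plain Python (greedy NMS over (x1, y1, x2, y2) boxes; the real YOLOv5 routine is vectorized in torch and additionally handles confidence filtering and per-class offsets):

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms_sketch(boxes, scores, iou_thres=0.45):
    """Greedy NMS: visit boxes in descending score order, keep a box only
    if it does not overlap an already-kept box above iou_thres."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thres for j in keep):
            keep.append(i)
    return keep

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
print(nms_sketch(boxes, scores))  # the second box overlaps the first and is dropped
```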

  6. Outputting Results

    p = Path(p)  # to Path
    save_path = str(save_dir / p.name)  # img.jpg
    txt_path = str(save_dir / 'labels' / p.stem) + ('' if dataset.mode == 'image' else f'_{frame}')  # img.txt
    s += '%gx%g ' % img.shape[2:]  # print string
    gn = torch.tensor(im0.shape)[[1, 0, 1, 0]]  # normalization gain whwh
    imc = im0.copy() if save_crop else im0  # for save_crop
    if len(det):
        # Rescale boxes from img_size to im0 size
        det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round()
    
        # Print results
        for c in det[:, -1].unique():
            n = (det[:, -1] == c).sum()  # detections per class
            s += f"{n} {names[int(c)]}{'s' * (n > 1)}, "  # add to string
    

    The code sets paths for saving results (image and text files), counts and categorizes detected objects, and rescales bounding box coordinates for display.
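The rescaling step can be sketched as the inverse of the letterbox transform (an illustrative reimplementation of the idea behind scale_coords; the real utility also clips coordinates to the image bounds):

```python
def scale_coords_sketch(resized_shape, box, orig_shape):
    """Map one (x1, y1, x2, y2) box from network-input coordinates back
    to the original image by undoing the letterbox scale and padding."""
    rh, rw = resized_shape
    oh, ow = orig_shape
    gain = min(rh / oh, rw / ow)      # letterbox scale ratio
    pad_w = (rw - ow * gain) / 2      # horizontal padding per side
    pad_h = (rh - oh * gain) / 2      # vertical padding per side
    x1, y1, x2, y2 = box
    return [(x1 - pad_w) / gain, (y1 - pad_h) / gain,
            (x2 - pad_w) / gain, (y2 - pad_h) / gain]

# A full-frame box on the 384x640 network input maps back onto the
# full 1080x1920 original image.
print(scale_coords_sketch((384, 640), (0, 12, 640, 372), (1080, 1920)))
```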

  7. Displaying and Saving Results

    # Stream results
    if view_img:
        cv2.imshow(str(p), im0)
        cv2.waitKey(1)  # 1 millisecond
    
    # Save results (image with detections)
    if save_img:
        if dataset.mode == 'image':
            cv2.imwrite(save_path, im0)
        else:  # 'video' or 'stream'
            if vid_path[i] != save_path:  # new video
                vid_path[i] = save_path
                if isinstance(vid_writer[i], cv2.VideoWriter):
                    vid_writer[i].release()  # release previous video writer
                if vid_cap:  # video
                    fps = vid_cap.get(cv2.CAP_PROP_FPS)
                    w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH))
                    h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
                else:  # stream
                    fps, w, h = 30, im0.shape[1], im0.shape[0]
                    save_path += '.mp4'
                vid_writer[i] = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
            vid_writer[i].write(im0)
    

    If view_img is enabled, the results are displayed in a window; if save_img is set, they are written out as an image or video depending on the input type. When a new video starts, the previous cv2.VideoWriter is released and a new one is created using the source's FPS and frame size (or defaults of 30 FPS and the frame's own dimensions for streams).

  8. Reporting Saved Results

    if save_txt or save_img:
        s = f"\n{len(list(save_dir.glob('labels/*.txt')))} labels saved to {save_dir / 'labels'}" if save_txt else ''
        print(f"Results saved to {save_dir}{s}")
    


    If saving is enabled, the script prints where the results were saved and, when save_txt is set, how many label files were written.
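That summary message can be reproduced in isolation (a self-contained illustration using a temporary directory in place of the real save_dir; the file names are made up):

```python
import tempfile
from pathlib import Path

# Write two fake label files into save_dir/labels, then count and
# report them the same way the script does. The temporary directory
# stands in for the real save_dir.
save_dir = Path(tempfile.mkdtemp())
(save_dir / 'labels').mkdir()
for name in ('img1.txt', 'img2.txt'):
    (save_dir / 'labels' / name).write_text('0 0.5 0.5 0.1 0.1\n')

n = len(list(save_dir.glob('labels/*.txt')))
print(f"Results saved to {save_dir}\n{n} labels saved to {save_dir / 'labels'}")
```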

  9. Model Update

    if update:
        strip_optimizer(weights)  # update model (to fix SourceChangeWarning)
    

    If update is set, strip_optimizer is applied to the weights file, which avoids SourceChangeWarning issues on later loads and reduces the file size.
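What strip_optimizer does can be sketched on a plain dict (the real helper operates on a torch checkpoint file and does more, such as halving the weights; the key names here are assumptions for this sketch):

```python
def strip_optimizer_sketch(ckpt: dict) -> dict:
    """Illustrative version of strip_optimizer on a plain dict: drop
    training-only state so the checkpoint is smaller and loads cleanly
    for inference. Key names are assumptions for this sketch."""
    for key in ('optimizer', 'training_results', 'wandb_id'):
        ckpt.pop(key, None)   # remove training-only entries if present
    ckpt['epoch'] = -1        # mark the checkpoint as finalized
    return ckpt

ckpt = {'model': '<weights>', 'optimizer': {'state': '...'}, 'epoch': 99}
print(strip_optimizer_sketch(ckpt))
```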

This code snippet serves as the core of YOLOv5's detection process. It demonstrates the steps involved in processing predictions, applying NMS, and saving the results in various formats. It also includes features for visualization and model updating, enhancing the overall utility of the YOLOv5 model.


Original article: https://www.cveoy.top/t/topic/oHSZ. Copyright belongs to the author; please do not reproduce or scrape.
