Write Python code that post-processes a YOLOv5s output feature map of size 1×27×24×24 to obtain the final detection boxes, where num_classes=4 and num_anchors=3
The following is one possible solution:
import torch
from torchvision.ops import batched_nms, box_convert

# Default YOLOv5 anchors (width, height) in pixels, keyed by the stride of each detection head.
ANCHORS = {
    8: [[10, 13], [16, 30], [33, 23]],
    16: [[30, 61], [62, 45], [59, 119]],
    32: [[116, 90], [156, 198], [373, 326]],
}

def post_process(output, stride=8, conf_thresh=0.5, nms_thresh=0.5, num_classes=4):
    # output: raw head tensor of shape (batch, num_anchors * (5 + num_classes), H, W),
    # e.g. (1, 27, 24, 24). stride = input size / H; the default of 8 is an assumption
    # (a 192x192 input) and must be set to match the actual model input.
    batch_size, _, grid_h, grid_w = output.shape
    anchors = torch.tensor(ANCHORS[stride], dtype=output.dtype, device=output.device)
    num_anchors = anchors.shape[0]

    # (B, A*(5+C), H, W) -> (B, A, H, W, 5+C); YOLOv5 applies sigmoid to every channel
    pred = output.reshape(batch_size, num_anchors, 5 + num_classes, grid_h, grid_w)
    pred = pred.permute(0, 1, 3, 4, 2).sigmoid()

    # Grid of cell indices, shape (1, 1, H, W, 2), last dimension ordered (x, y)
    ys, xs = torch.meshgrid(torch.arange(grid_h, device=output.device),
                            torch.arange(grid_w, device=output.device), indexing='ij')
    grid = torch.stack((xs, ys), dim=-1).to(output.dtype).view(1, 1, grid_h, grid_w, 2)

    # Convert the output to bboxes (YOLOv5 decoding, in input-image pixels)
    xy = (pred[..., 0:2] * 2 - 0.5 + grid) * stride
    wh = (pred[..., 2:4] * 2) ** 2 * anchors.view(1, num_anchors, 1, 1, 2)
    boxes = box_convert(torch.cat((xy, wh), dim=-1), 'cxcywh', 'xyxy').reshape(batch_size, -1, 4)

    # Per-class confidence = objectness * class probability
    conf = (pred[..., 4:5] * pred[..., 5:]).reshape(batch_size, -1, num_classes)

    detections = []
    for b in range(batch_size):
        # Apply the confidence threshold
        box_idx, class_idx = (conf[b] > conf_thresh).nonzero(as_tuple=True)
        bbox = boxes[b, box_idx]
        scores = conf[b, box_idx, class_idx]
        # Apply class-aware NMS, separately for each image
        keep = batched_nms(bbox, scores, class_idx, nms_thresh)
        # Pack bbox, score, and class id into rows of (x1, y1, x2, y2, score, class id)
        detections.append(torch.cat((bbox[keep], scores[keep, None],
                                     class_idx[keep, None].to(output.dtype)), dim=1))
    return detections
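This function takes the raw output tensor and converts it into detection boxes. Each detection consists of six values: the top-left and bottom-right corner coordinates (four numbers), a confidence score, and a class id. The function applies the sigmoid function to the raw box offsets and confidence scores so that they can be used to compute box coordinates. It then uses the anchors to compute the boxes, converts them to top-left/bottom-right form, and applies the confidence threshold and NMS. Finally, it packs the boxes, scores, and class ids into the returned detections, which are indexed first by image in the batch, then by detection, then by value.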
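To make the decoding step concrete, here is a minimal sketch (not the answer's exact code) of how a single head of shape (1, 27, 24, 24) might be split and decoded. It assumes the standard YOLOv5 formulas xy = (2·sigmoid(t) − 0.5 + grid) · stride and wh = (2·sigmoid(t))² · anchor. The helper name `decode_head`, the stride of 8 (which corresponds to a 192×192 input), and the chosen anchor set are assumptions for illustration only; use the stride and anchors that match your model.

```python
import torch


def decode_head(output, anchors, stride, num_classes=4):
    """Decode one raw YOLOv5 head (B, A*(5+C), H, W) into boxes and scores.

    Hypothetical helper: returns (boxes, obj, cls), where boxes hold
    (cx, cy, w, h) in input-image pixels with shape (B, A, H, W, 4).
    """
    batch_size, _, grid_h, grid_w = output.shape
    anchors = torch.as_tensor(anchors, dtype=output.dtype, device=output.device)
    num_anchors = anchors.shape[0]

    # Split the channel axis into (anchor, attribute) and move attributes last.
    pred = output.reshape(batch_size, num_anchors, 5 + num_classes, grid_h, grid_w)
    pred = pred.permute(0, 1, 3, 4, 2).sigmoid()

    # Cell indices with the last dimension ordered (x, y); shape (H, W, 2).
    ys, xs = torch.meshgrid(
        torch.arange(grid_h, device=output.device),
        torch.arange(grid_w, device=output.device),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=-1).to(output.dtype)

    # Centre is an offset from the cell index; size is a multiple of the anchor.
    xy = (pred[..., 0:2] * 2 - 0.5 + grid) * stride
    wh = (pred[..., 2:4] * 2) ** 2 * anchors.view(1, num_anchors, 1, 1, 2)
    boxes = torch.cat((xy, wh), dim=-1)
    return boxes, pred[..., 4], pred[..., 5:]


# Assumed setup: a stride-8 head with its default anchors and all-zero logits.
raw = torch.zeros(1, 27, 24, 24)
boxes, obj, cls = decode_head(raw, [[10, 13], [16, 30], [33, 23]], stride=8)
print(boxes.shape, obj.shape, cls.shape)
```

With all-zero logits every sigmoid evaluates to 0.5, so each box is centred in its grid cell and has exactly the size of its anchor. That makes a quick sanity check before running the decoder on real model output.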
Original source: https://www.cveoy.top/t/topic/ZuS. Copyright belongs to the author. Do not repost or scrape.