Add Non-Maximum Suppression (NMS) to an object detection model using ONNX
Integrate an NMS node into your ONNX model
Check the previous tutorial to see how to integrate preprocessing layers.
Non-Maximum Suppression is a common object detection postprocessing step that selects a single entity out of many overlapping entities. The criterion is usually to discard entities below a given probability bound; from the remaining entities, we repeatedly pick the one with the highest probability and suppress the entities that overlap it too much (adapted from).
The main goal is to decrease the number of serial operations and take advantage of batch processing. For this we will use the ONNX NMS implementation.
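To make the algorithm concrete, here is a minimal serial reference implementation in plain NumPy (just a sketch; the ONNX node we will add replaces exactly this kind of per-box loop):
import numpy as np

def nms(boxes, scores, iou_threshold=0.5, score_threshold=0.0):
    """boxes: (N, 4) as [x1, y1, x2, y2]; scores: (N,). Returns kept indices."""
    order = np.argsort(scores)[::-1]                # highest probability first
    order = order[scores[order] > score_threshold]  # discard low-probability boxes
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)
        # IoU of the best box against the remaining candidates
        x1 = np.maximum(boxes[best, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[best, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[best, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[best, 3], boxes[order[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_best = (boxes[best, 2] - boxes[best, 0]) * (boxes[best, 3] - boxes[best, 1])
        area_rest = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_best + area_rest - inter)
        order = order[1:][iou <= iou_threshold]     # suppress heavy overlaps
    return keep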
Hint: use Netron to help visualize the model.
NMS Node
For this tutorial I will consider a detector model that has Scores and Boxes as outputs.
- Scores has shape: (batch_size, spatial_dimension, num_classes)
- Boxes has shape: (batch_size, spatial_dimension, 4)
Here, spatial_dimension is the number of boxes/anchors the detector generates (for example, an SSD300 head generates 8732 anchors).
According to the ONNX documentation, the NMS node expects the following inputs:
- boxes: tensor(float). An input tensor with shape [num_batches, spatial_dimension, 4]. The single box data format is indicated by center_point_box.
- scores: tensor(float). An input tensor with shape [num_batches, num_classes, spatial_dimension].
- max_output_boxes_per_class (optional): tensor(int64). Integer representing the maximum number of boxes to be selected per batch per class. It is a scalar. Default to 0, which means no output.
- iou_threshold (optional): tensor(float). Float representing the threshold for deciding whether boxes overlap too much with respect to IoU. It is a scalar. Value range [0, 1]. Default to 0.
- score_threshold (optional): tensor(float). Float representing the threshold for deciding when to remove boxes based on score. It is a scalar.
Attributes:
- center_point_box: int (default is 0). Integer indicating the format of the box data. 0 — the box data is supplied as [y1, x1, y2, x2], where (y1, x1) and (y2, x2) are the coordinates of any diagonal pair of box corners; the coordinates can be provided as normalized (i.e., lying in the interval [0, 1]) or absolute. Mostly used for TF models. 1 — the box data is supplied as [x_center, y_center, width, height]. Mostly used for PyTorch models.
Output:
- selected_indices: tensor(int64). Selected indices from the boxes tensor, with shape [num_selected_indices, 3]; the selected index format is [batch_index, class_index, box_index].
In my case, Scores and Boxes are literally the output names of the model. Your model may use different names, so it is good practice to check them with Netron.
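To make the selected_indices format concrete, here is a self-contained sketch (assuming onnxruntime is installed) that builds an NMS-only graph and runs it; all the names in it are arbitrary:
import numpy as np
import onnxruntime as ort
from onnx import TensorProto, helper

demo_node = helper.make_node(
    'NonMaxSuppression',
    inputs=['boxes', 'scores', 'max_out', 'iou_th', 'score_th'],
    outputs=['selected_indices'],
    center_point_box=0)
demo_graph = helper.make_graph(
    [demo_node], 'nms_demo',
    inputs=[helper.make_tensor_value_info('boxes', TensorProto.FLOAT, [1, 3, 4]),
            helper.make_tensor_value_info('scores', TensorProto.FLOAT, [1, 1, 3])],
    outputs=[helper.make_tensor_value_info('selected_indices', TensorProto.INT64, [None, 3])],
    initializer=[helper.make_tensor('max_out', TensorProto.INT64, [1], [10]),
                 helper.make_tensor('iou_th', TensorProto.FLOAT, [1], [0.5]),
                 helper.make_tensor('score_th', TensorProto.FLOAT, [1], [0.0])])
demo_model = helper.make_model(demo_graph, opset_imports=[helper.make_opsetid('', 17)])
sess = ort.InferenceSession(demo_model.SerializeToString())
demo_boxes = np.array([[[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]], dtype=np.float32)
demo_scores = np.array([[[0.9, 0.8, 0.7]]], dtype=np.float32)
print(sess.run(None, {'boxes': demo_boxes, 'scores': demo_scores})[0])
# [[0 0 0]
#  [0 0 2]]  -> box 1 was suppressed by box 0 (IoU ~0.68 > 0.5)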
!pip install onnx numpy
import onnx
from onnx import TensorProto
import numpy as np
model_path = '/path/to/your/model.onnx'
model = onnx.load(model_path)
graph = model.graph
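If you want to double-check the names programmatically rather than in Netron, a quick sketch:
# list the loaded graph's input and output names
print([i.name for i in graph.input])
print([o.name for o in graph.output])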
My boxes are already in the correct format. My scores need a Transpose operation before they can be connected to the NMS node.
# create transpose node
# (batch_size, spatial_dimension, num_classes) -> (batch_size, num_classes, spatial_dimension)
transpose_scores_node = onnx.helper.make_node(
    'Transpose',
    inputs=['scores'],
    outputs=['scores_transposed'],
    perm=(0, 2, 1))
# add to graph
graph.node.append(transpose_scores_node)
Define the constant inputs
max_detections = 200
score_thresh = 0.95
iou_thresh = 0.5
# make constant tensors
score_threshold = onnx.helper.make_tensor(
    'score_threshold',
    TensorProto.FLOAT,
    [1],
    [score_thresh])
iou_threshold = onnx.helper.make_tensor(
    'iou_threshold',
    TensorProto.FLOAT,
    [1],
    [iou_thresh])
max_output_boxes_per_class = onnx.helper.make_tensor(
    'max_output_boxes_per_class',
    TensorProto.INT64,
    [1],
    [max_detections])
Create NMS node and define new output
inputs_nms = ['boxes', 'scores_transposed', 'max_output_boxes_per_class',
              'iou_threshold', 'score_threshold']
# 'num_selected_indices' is an arbitrary name for the node's selected_indices output
outputs_nms = ['num_selected_indices']
nms_node = onnx.helper.make_node(
    'NonMaxSuppression',
    inputs_nms,
    outputs_nms,
    center_point_box=1,
)
# add to the list of graph nodes
graph.node.append(nms_node)
# initializer
graph.initializer.append(score_threshold)
graph.initializer.append(iou_threshold)
graph.initializer.append(max_output_boxes_per_class)
# define output
output_nms_value_info = onnx.helper.make_tensor_value_info(
    'num_selected_indices',
    TensorProto.INT64,
    shape=['num_selected_indices', 3])
# add to graph
graph.output.append(output_nms_value_info)
Save model
onnx.save(model, 'model-nms-node.onnx')
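At this point you can sanity-check the new output with onnxruntime. A sketch — the input name 'images' and the input shape below are assumptions; substitute your model's:
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession('model-nms-node.onnx')
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)  # hypothetical input shape
(selected,) = sess.run(['num_selected_indices'], {'images': dummy})
print(selected.shape)  # (num_selected, 3): [batch_index, class_index, box_index]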
Postprocessing NMS
OK, now we have a new output that tells us which indices were selected. But this is still not an ideal output format. A common choice is the NVIDIA DeepStream standard.
According to the TensorRT documentation:
- num_detections: This is a [batch_size, 1] tensor of data type int32. The last dimension is a scalar indicating the number of valid detections per batch image. It can be less than max_output_boxes.
- detection_boxes: This is a [batch_size, max_output_boxes, 4] tensor of data type float32 or float16, containing the coordinates of non-max suppressed boxes. The output coordinates will always be in BoxCorner format, regardless of the input code type.
- detection_scores: This is a [batch_size, max_output_boxes] tensor of data type float32 or float16, containing the scores for the boxes.
- detection_classes: This is a [batch_size, max_output_boxes] tensor of data type int32, containing the classes for the boxes.
Building this directly with ONNX operators can be hard. Instead, we will use PyTorch to create the postprocessing and then join the graphs using ONNX.
!pip install torch
import torch

# dummy inputs for tracing the postprocessing graph
torch_boxes = torch.tensor([
    [91.0, 2, 3, 4, 5, 6],
    [11, 12, 13, 14, 15, 16],
    [21, 22, 23, 24, 25, 26],
    [31, 32, 33, 34, 35, 36],
]).unsqueeze(0)
torch_scores = torch.tensor([
    [0.1, 0.82, 0.3, 0.6, 0.55, 0.6],
    [0.9, 0.18, 0.7, 0.4, 0.45, 0.4],
]).unsqueeze(0)
# what the NMS node emits: [batch_index, class_index, box_index] triplets
torch_indices = torch.tensor([[0, 0, 0], [0, 0, 2], [0, 0, 1]])
# match the expected layouts:
# boxes  -> (batch_size, spatial_dimension, 4)
# scores -> (batch_size, spatial_dimension, num_classes)
torch_boxes = torch_boxes.permute(0, 2, 1)
torch_scores = torch_scores.permute(0, 2, 1)
Build a PyTorch model
# 01
from torch import nn

class PostProcessingNMS(nn.Module):
    def forward(self, idx, boxes, scores):
        """
        Args:
            idx: selected indices from the boxes tensor. [num_selected_indices, 3],
                the selected index format is [batch_index, class_index, box_index]
            boxes: in (X, Y, H, W) format. Shape is:
                [batch_size, spatial_dimension, 4]
            scores: Shape is: [batch_size, spatial_dimension, num_classes]
        """
        bbox_result = self.gather(boxes, idx)
        score_intermediate_result = self.gather(scores, idx).max(axis=-1)
        score_result = score_intermediate_result.values
        classes_result = score_intermediate_result.indices
        num_dets = torch.tensor(score_result.shape[-1])
        return (bbox_result, score_result, classes_result, num_dets)

    def gather(self, target, idx):
        # keep only the box_index column, expand it to the feature size and gather
        pick_indices = idx[:, -1:].repeat(1, target.shape[2]).unsqueeze(0)
        return torch.gather(target, 1, pick_indices)
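A quick check with the dummy tensors defined above:
postp = PostProcessingNMS()
bboxes, det_scores, det_classes, num_dets = postp(torch_indices, torch_boxes, torch_scores)
print(bboxes.shape)   # torch.Size([1, 3, 4])
print(det_scores)     # tensor([[0.9000, 0.7000, 0.8200]])
print(det_classes)    # tensor([[1, 1, 0]])
print(num_dets)       # tensor(3)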
But what if I want to apply a filter that removes class 0 (usually the background)? Use:
# 02
from torch import nn

class PostProcessingNMS(nn.Module):
    def forward(self, idx, boxes, scores):
        """
        Args:
            idx: selected indices from the boxes tensor. [num_selected_indices, 3],
                the selected index format is [batch_index, class_index, box_index]
            boxes: in (X, Y, H, W) format. Shape is:
                [batch_size, spatial_dimension, 4]
            scores: Shape is: [batch_size, spatial_dimension, num_classes]
        """
        bbox_result = self.gather(boxes, idx)
        score_intermediate_result = self.gather(scores, idx).max(axis=-1)
        # keep only detections whose best class is not 0 (background)
        mask = score_intermediate_result.indices != 0
        bbox_result = bbox_result[mask]
        score_result = score_intermediate_result.values[mask]
        classes_result = score_intermediate_result.indices[mask]
        num_dets = torch.tensor(score_result.shape[-1])
        return (bbox_result, score_result, classes_result, num_dets)

    def gather(self, target, idx):
        pick_indices = idx[:, -1:].repeat(1, target.shape[2]).unsqueeze(0)
        return torch.gather(target, 1, pick_indices)
If you are not targeting an application like DeepStream, you may only be interested in the selected boxes themselves. Use:
# 03
from torch import nn

class PostProcessingNMS(nn.Module):
    def forward(self, idx, boxes, scores):
        """
        Args:
            idx: selected indices from the boxes tensor. [num_selected_indices, 3],
                the selected index format is [batch_index, class_index, box_index]
            boxes: in (X, Y, H, W) format. Shape is:
                [batch_size, spatial_dimension, 4]
            scores: Shape is: [batch_size, spatial_dimension, num_classes]
        Output:
            selected boxes
        """
        bbox_result = self.gather(boxes, idx)
        score_intermediate_result = self.gather(scores, idx).max(axis=-1)
        mask = score_intermediate_result.indices != 0
        bbox_result = bbox_result[mask]
        return bbox_result

    def gather(self, target, idx):
        pick_indices = idx[:, -1:].repeat(1, target.shape[2]).unsqueeze(0)
        return torch.gather(target, 1, pick_indices)
I chose case #03.
postp = PostProcessingNMS()

dynamic = {
    'boxes': {0: 'batch', 1: 'num_anchors', 2: 'boxes'},
    'scores': {0: 'batch', 1: 'num_anchors', 2: 'classes'},
    'num_selected_indices': {0: 'num_results'},
    'det_bboxes': {0: 'batch', 1: 'num_results'},
    # 'det_scores': {0: 'batch', 1: 'num_results'},
    # 'det_classes': {0: 'batch', 1: 'num_results'},
}
output_names = ['det_bboxes',
                # 'det_scores', 'det_classes', 'num_dets'
                ]
torch.onnx.export(postp,
                  (torch_indices, torch_boxes, torch_scores),
                  'postp.onnx',
                  input_names=['num_selected_indices', 'boxes', 'scores'],
                  output_names=output_names,
                  dynamic_axes=dynamic,
                  opset_version=17)
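Before simplifying, you can verify the exported graph in isolation with onnxruntime, reusing the dummy tensors from above (a sketch):
import onnxruntime as ort

sess = ort.InferenceSession('postp.onnx')
(det_bboxes,) = sess.run(None, {'num_selected_indices': torch_indices.numpy(),
                                'boxes': torch_boxes.numpy(),
                                'scores': torch_scores.numpy()})
print(det_bboxes.shape)  # (2, 4): only the boxes whose class is not 0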
Simplify your model with onnx-simplifier (onnxsim):
!pip install onnxsim
!onnxsim postp.onnx postp-sim.onnx
Compose the full model
import onnx
from onnx import compose

model_nms = onnx.load('model-nms-node.onnx')
model_postp = onnx.load('postp-sim.onnx')

# add a prefix to resolve name conflicts
postp_with_prefix = compose.add_prefix(model_postp, prefix='_')
# as in the other tutorial, check if the IR and Opset versions are the same
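# quick programmatic check (sketch) of the relevant ModelProto fields
print('IR versions:', model_nms.ir_version, model_postp.ir_version)
print('opsets:', [o.version for o in model_nms.opset_import],
      [o.version for o in model_postp.opset_import])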
model_full = compose.merge_models(
    model_nms,
    postp_with_prefix,
    io_map=[('scores', '_scores'),
            ('boxes', '_boxes'),
            ('num_selected_indices', '_num_selected_indices')])
onnx.save_model(model_full, 'model_nms.onnx')
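Note that add_prefix also renames the postprocessing outputs by default, so the final output should now be called '_det_bboxes'. A last sketch to confirm the merged model's interface:
import onnxruntime as ort

sess = ort.InferenceSession('model_nms.onnx')
print([i.name for i in sess.get_inputs()])   # your detector's original input(s)
print([o.name for o in sess.get_outputs()])  # e.g. ['_det_bboxes']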
Your boxes are usually in relative (normalized) coordinates. To convert them to absolute coordinates, multiply by the image dimensions:
# example image dimensions
width = 4200
height = 2800
# scale the normalized coordinates to pixels
boxes[:, 0] *= width
boxes[:, 1] *= height
boxes[:, 2] *= width
boxes[:, 3] *= height
# convert to int
boxes_int = boxes.astype(np.int32)
from typing import List

def rescale_bbox(box: np.ndarray) -> List[int]:
    # pad the shorter side so the box becomes square
    width = box[2] - box[0]
    height = box[3] - box[1]
    maximum = max(width, height)
    dx = int((maximum - width) / 2)
    dy = int((maximum - height) / 2)
    bboxes = [box[0] - dx, box[1] - dy, box[2] + dx, box[3] + dy]
    return bboxes

# then, rescale each bbox to adapt it to the original image
for i in range(boxes_int.shape[0]):
    box = rescale_bbox(boxes_int[i, :])
    print(box)
These last steps can vary; check your model's documentation.
Thank you 🏇. See you in the next post.