Add Non-Maximum Suppression (NMS) to an object detection model using ONNX

Vilson Rodrigues
6 min read · Sep 12, 2023


Integrate an NMS node into your ONNX model

Stable Diffusion 2.1 prompt: a cat with multiple bounding boxes

Check the previous tutorial to see how to integrate preprocessing layers.

Non-Maximum Suppression is a common object detection postprocessing step that selects a single entity out of many overlapping ones. The usual criterion is to discard entities below a given probability bound; from the remaining entities, we repeatedly pick the one with the highest probability (adapted from).
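
To make the procedure concrete, here is a minimal NumPy sketch of the greedy NMS idea (corner-format boxes, single class; illustrative only, not the implementation we will embed in the graph):

import numpy as np

def nms_sketch(boxes, scores, iou_thresh=0.5, score_thresh=0.5):
    """Greedy NMS over corner-format boxes [x1, y1, x2, y2]."""
    # 1. discard low-confidence boxes
    keep = scores >= score_thresh
    boxes, scores = boxes[keep], scores[keep]
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = scores.argsort()[::-1]  # highest score first
    selected = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        selected.append(best)
        # 2. IoU between the winner and all remaining boxes
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        iou = inter / (areas[best] + areas[rest] - inter)
        # 3. drop boxes that overlap the winner too much
        order = rest[iou <= iou_thresh]
    return selected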

NMS example. Source.

The main goal is to decrease the number of serial operations and take advantage of batch processing. For this we will use the ONNX NMS implementation.

Hint: use Netron to help visualize the model.

NMS Node

For this tutorial I will consider a detector model that has Scores and Boxes as outputs.

  • Scores has shape: (batch_size, spatial_dimension, num_classes)
  • Boxes has shape: (batch_size, spatial_dimension, 4)

Here, spatial_dimension is the number of boxes/anchors the model generates.

According to the ONNX documentation, the NMS node expects the following inputs:

  • boxes: tensor(float). An input tensor with shape [num_batches, spatial_dimension, 4]. The single box data format is indicated by center_point_box.
  • scores: tensor(float). An input tensor with shape [num_batches, num_classes, spatial_dimension]
  • max_output_boxes_per_class (optional): tensor(int64). Integer representing the maximum number of boxes to be selected per batch per class. It is a scalar. Default to 0, which means no output.
  • iou_threshold (optional): tensor(float). Float representing the threshold for deciding whether boxes overlap too much with respect to IoU. It is a scalar. Value range [0, 1]. Default to 0.
  • score_threshold (optional): tensor(float). Float representing the threshold for deciding when to remove boxes based on score. It is a scalar.

Attributes:

  • center_point_box : int (default is 0). Integer indicating the format of the box data. 0 — the box data is supplied as [y1, x1, y2, x2], where (y1, x1) and (y2, x2) are the coordinates of any diagonal pair of box corners; the coordinates can be provided as normalized (i.e., lying in the interval [0, 1]) or absolute. Mostly used for TF models. 1 — the box data is supplied as [x_center, y_center, width, height]. Mostly used for PyTorch models.

Output:

  • selected_indices : tensor(int64). Selected indices from the boxes tensor. [num_selected_indices, 3]; the selected index format is [batch_index, class_index, box_index]. For example, a row [0, 2, 7] means box 7 was kept for class 2 in batch image 0.

My detector: two outputs, boxes and scores.

As you can see, these are indeed the output names. Your model may have different names, so it is good practice to check them with Netron.

!pip install onnx numpy

import onnx
from onnx import TensorProto
import numpy as np

model_path = '/path/to/your/model.onnx'

model = onnx.load(model_path)

graph = model.graph
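
If you want to confirm the tensor names programmatically rather than only in Netron:

# the output names should include 'boxes' and 'scores'
print([o.name for o in graph.output])
print([i.name for i in graph.input])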

My boxes are already in the correct format. My scores need a Transpose operation before they can be connected to the NMS node.

# create a Transpose node:
# (batch_size, spatial_dimension, num_classes) -> (batch_size, num_classes, spatial_dimension)
transpose_scores_node = onnx.helper.make_node(
    'Transpose',
    inputs=['scores'],
    outputs=['scores_transposed'],
    perm=[0, 2, 1])

# add to graph
graph.node.append(transpose_scores_node)

Define the constant inputs:

max_detections = 200
score_thresh = 0.95
iou_thresh = 0.5

# make constant tensors
score_threshold = onnx.helper.make_tensor(
    'score_threshold',
    TensorProto.FLOAT,
    [1],
    [score_thresh])

iou_threshold = onnx.helper.make_tensor(
    'iou_threshold',
    TensorProto.FLOAT,
    [1],
    [iou_thresh])

max_output_boxes_per_class = onnx.helper.make_tensor(
    'max_output_boxes_per_class',
    TensorProto.INT64,
    [1],
    [max_detections])

Create the NMS node and define a new output:

inputs_nms = ['boxes', 'scores_transposed', 'max_output_boxes_per_class',
              'iou_threshold', 'score_threshold']
outputs_nms = ['num_selected_indices']

nms_node = onnx.helper.make_node(
    'NonMaxSuppression',
    inputs_nms,
    outputs_nms,
    center_point_box=1,
)

# add to the list of graph nodes
graph.node.append(nms_node)

# add the constant tensors as initializers
graph.initializer.append(score_threshold)
graph.initializer.append(iou_threshold)
graph.initializer.append(max_output_boxes_per_class)

# define the new output
output_nms_value_info = onnx.helper.make_tensor_value_info(
    'num_selected_indices',
    TensorProto.INT64,
    shape=['num_selected_indices', 3])

# add to graph
graph.output.append(output_nms_value_info)

Save the model:

onnx.save(model, 'model-nms-node.onnx')

Model with the new output: num_selected_indices.
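
At this point you can already sanity-check the new output with onnxruntime. A minimal sketch, assuming a hypothetical input named 'images' with shape (1, 3, 640, 640); adjust both to your model:

!pip install onnxruntime

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession('model-nms-node.onnx')
# 'images' and its shape are assumptions; list the real names with sess.get_inputs()
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)
(selected,) = sess.run(['num_selected_indices'], {'images': dummy})
print(selected.shape)  # (num_selected_indices, 3): [batch_index, class_index, box_index]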

Postprocessing NMS

OK, now we have a new output that tells us which indices were selected. But this is still not the ideal output format. A common way out is the NVIDIA DeepStream standard.

According to the TensorRT documentation:

  • num_detections: This is a [batch_size, 1] tensor of data type int32. The last dimension is a scalar indicating the number of valid detections per batch image. It can be less than max_output_boxes.
  • detection_boxes: This is a [batch_size, max_output_boxes, 4] tensor of data type float32 or float16, containing the coordinates of non-max suppressed boxes. The output coordinates will always be in BoxCorner format, regardless of the input code type.
  • detection_scores: This is a [batch_size, max_output_boxes] tensor of data type float32 or float16, containing the scores for the boxes.
  • detection_classes: This is a [batch_size, max_output_boxes] tensor of data type int32, containing the classes for the boxes.

Doing this directly in ONNX can be hard.

!pip install torch

Instead, we will use PyTorch to build the postprocessing and then join the graphs using ONNX.

import torch

# toy example: 6 anchors, 4 box coordinates, 2 classes
torch_boxes = torch.tensor([
    [91.0, 2, 3, 4, 5, 6],
    [11, 12, 13, 14, 15, 16],
    [21, 22, 23, 24, 25, 26],
    [31, 32, 33, 34, 35, 36],
]).unsqueeze(0)

torch_scores = torch.tensor([
    [0.1, 0.82, 0.3, 0.6, 0.55, 0.6],
    [0.9, 0.18, 0.7, 0.4, 0.45, 0.4],
]).unsqueeze(0)

torch_indices = torch.tensor([[0, 0, 0], [0, 0, 2], [0, 0, 1]])

# (batch, 4, spatial_dimension) -> (batch, spatial_dimension, 4)
torch_boxes = torch_boxes.permute(0, 2, 1)
# (batch, num_classes, spatial_dimension) -> (batch, spatial_dimension, num_classes)
torch_scores = torch_scores.permute(0, 2, 1)

Build a PyTorch model:

# 01
from torch import nn

class PostProcessingNMS(nn.Module):

    def forward(self, idx, boxes, scores):
        """
        Args:
            idx: selected indices from the boxes tensor. [num_selected_indices, 3];
                the selected index format is [batch_index, class_index, box_index]
            boxes: in (X, Y, W, H) format. Shape is
                [batch_size, spatial_dimension, 4]
            scores: shape is [batch_size, spatial_dimension, num_classes]
        """
        bbox_result = self.gather(boxes, idx)
        # for each selected anchor, keep the best class score and its index
        score_intermediate_result = self.gather(scores, idx).max(dim=-1)
        score_result = score_intermediate_result.values
        classes_result = score_intermediate_result.indices
        num_dets = torch.tensor(score_result.shape[-1])
        return (bbox_result, score_result, classes_result, num_dets)

    def gather(self, target, idx):
        # take the box_index column and use it to pick rows along dim 1
        pick_indices = idx[:, -1:].repeat(1, target.shape[2]).unsqueeze(0)
        return torch.gather(target, 1, pick_indices)
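
A quick sanity check of case #01 with the toy tensors defined above (the expected values follow from the gather-then-max logic):

postp = PostProcessingNMS()
bboxes, scores_out, classes, num_dets = postp(torch_indices, torch_boxes, torch_scores)
print(bboxes.shape)  # torch.Size([1, 3, 4]): one box per selected index
print(scores_out)    # tensor([[0.9000, 0.7000, 0.8200]]): best class score per box
print(classes)       # tensor([[1, 1, 0]]): best class index per box
print(num_dets)      # tensor(3)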

But what if you want to apply a filter that removes class 0 (usually the background)? Use:

# 02
from torch import nn

class PostProcessingNMS(nn.Module):

    def forward(self, idx, boxes, scores):
        """
        Args:
            idx: selected indices from the boxes tensor. [num_selected_indices, 3];
                the selected index format is [batch_index, class_index, box_index]
            boxes: in (X, Y, W, H) format. Shape is
                [batch_size, spatial_dimension, 4]
            scores: shape is [batch_size, spatial_dimension, num_classes]
        """
        bbox_result = self.gather(boxes, idx)
        score_intermediate_result = self.gather(scores, idx).max(dim=-1)

        # drop detections whose best class is 0 (background)
        mask = score_intermediate_result.indices != 0

        bbox_result = bbox_result[mask]
        score_result = score_intermediate_result.values[mask]
        classes_result = score_intermediate_result.indices[mask]

        num_dets = torch.tensor(score_result.shape[-1])
        return (bbox_result, score_result, classes_result, num_dets)

    def gather(self, target, idx):
        pick_indices = idx[:, -1:].repeat(1, target.shape[2]).unsqueeze(0)
        return torch.gather(target, 1, pick_indices)
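
With the same toy tensors, the third selection (whose best class is 0) is now filtered out:

postp = PostProcessingNMS()
bboxes, scores_out, classes, num_dets = postp(torch_indices, torch_boxes, torch_scores)
print(bboxes)    # tensor([[91., 11., 21., 31.],
                 #         [ 3., 13., 23., 33.]])
print(classes)   # tensor([1, 1])
print(num_dets)  # tensor(2)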

If you are not targeting applications like DeepStream, you may only be interested in the selected boxes. Use:

# 03
from torch import nn

class PostProcessingNMS(nn.Module):

    def forward(self, idx, boxes, scores):
        """
        Args:
            idx: selected indices from the boxes tensor. [num_selected_indices, 3];
                the selected index format is [batch_index, class_index, box_index]
            boxes: in (X, Y, W, H) format. Shape is
                [batch_size, spatial_dimension, 4]
            scores: shape is [batch_size, spatial_dimension, num_classes]

        Returns:
            the selected boxes
        """
        bbox_result = self.gather(boxes, idx)
        score_intermediate_result = self.gather(scores, idx).max(dim=-1)

        # drop detections whose best class is 0 (background)
        mask = score_intermediate_result.indices != 0
        bbox_result = bbox_result[mask]

        return bbox_result

    def gather(self, target, idx):
        pick_indices = idx[:, -1:].repeat(1, target.shape[2]).unsqueeze(0)
        return torch.gather(target, 1, pick_indices)

I chose case #03.

postp = PostProcessingNMS()

dynamic = {
    'boxes': {0: 'batch', 1: 'num_anchors', 2: 'boxes'},
    'scores': {0: 'batch', 1: 'num_anchors', 2: 'classes'},
    'num_selected_indices': {0: 'num_results'},
    'det_bboxes': {0: 'batch', 1: 'num_results'},
    # 'det_scores': {0: 'batch', 1: 'num_results'},
    # 'det_classes': {0: 'batch', 1: 'num_results'},
}

output_names = ['det_bboxes',
                # 'det_scores', 'det_classes', 'num_dets'
                ]

torch.onnx.export(postp,
                  (torch_indices, torch_boxes, torch_scores),
                  'postp.onnx',
                  input_names=['num_selected_indices', 'boxes', 'scores'],
                  output_names=output_names,
                  dynamic_axes=dynamic,
                  opset_version=17)
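
To double-check the exported graph against the PyTorch module, run it on the same toy tensors:

import onnxruntime as ort

sess = ort.InferenceSession('postp.onnx')
(det_bboxes,) = sess.run(None, {
    'num_selected_indices': torch_indices.numpy(),
    'boxes': torch_boxes.numpy(),
    'scores': torch_scores.numpy(),
})
print(det_bboxes)  # should match postp(torch_indices, torch_boxes, torch_scores)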

Simplify your model with onnx-simplifier:

!pip install onnxsim
!onnxsim postp.onnx postp-sim.onnx

Simplified postprocessing NMS model.

Compose the full model

import onnx
from onnx import compose

model_nms = onnx.load('model-nms-node.onnx')
model_postp = onnx.load('postp-sim.onnx')

# add a prefix to the postprocessing model to resolve name conflicts
postp_with_prefix = compose.add_prefix(model_postp, prefix='_')

# as in the other tutorial, check that the IR and opset versions match
model_full = compose.merge_models(
    model_nms,
    postp_with_prefix,
    io_map=[('scores', '_scores'),
            ('boxes', '_boxes'),
            ('num_selected_indices', '_num_selected_indices')])

onnx.save_model(model_full, 'model_nms.onnx')

The final model.
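
As a final check, run the merged model end to end. Note that compose.add_prefix also renamed the postprocessing output, so it is now '_det_bboxes'. As before, the input name and shape are assumptions to adjust for your detector:

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession('model_nms.onnx')
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)  # hypothetical input
(det_bboxes,) = sess.run(['_det_bboxes'], {'images': dummy})
print(det_bboxes.shape)  # (num_results, 4): the boxes that survived NMS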

Your boxes are usually in relative (normalized) format. To convert them to absolute coordinates, multiply by the image dimensions:

# example image dimensions
width = 4200
height = 2800

boxes[:, 0] *= width
boxes[:, 1] *= height
boxes[:, 2] *= width
boxes[:, 3] *= height

# convert to int
boxes_int = boxes.astype(np.int32)

from typing import List

def rescale_bbox(box: np.ndarray) -> List[int]:
    # pad the shorter side so the box becomes square
    width = box[2] - box[0]
    height = box[3] - box[1]
    maximum = max(width, height)
    dx = int((maximum - width) / 2)
    dy = int((maximum - height) / 2)
    bboxes = [box[0] - dx, box[1] - dy, box[2] + dx, box[3] + dy]
    return bboxes

# then, rescale each bbox to adapt it to the original image
for i in range(boxes_int.shape[0]):
    box = rescale_bbox(boxes_int[i, :])
    print(box)

These last steps can vary; check your model's documentation.

Thank you 🏇. See you in the next post.
