1 +# NVIDIA ODTK change log
2 +
3 +## Version 0.2.6 -- 2021-04-04
4 +
5 +### Added
6 +* `--no-apex` option to `odtk train` and `odtk infer`.
7 + * This parameter allows you to switch to PyTorch native AMP and DistributedDataParallel.
8 +* Validation stats in TensorBoard.
9 +
10 +### Changed
11 +* Updated the PyTorch Docker container from 20.06 to 20.11.
12 +* Added training and inference support for PyTorch native AMP, and torch.nn.parallel.DistributedDataParallel (use `--no-apex`).
13 +* Switched the PyTorch model and data memory format to channels last (see [Memory Format Tutorial](https://pytorch.org/tutorials/intermediate/memory_format_tutorial.html)).
14 +* Bug fixes:
15 + * Added a workaround for `'No detections!'` during validation (see [#52663](https://github.com/pytorch/pytorch/issues/52663)).
16 + * Froze unused parameters of torchvision models so they are excluded from autograd gradient calculations.
17 + * Made the TensorBoard writer exclusive to the master process to prevent race conditions.
18 +* Renamed instances of `retinanet` to `odtk` (folder, C++ namespaces, etc.)
19 +
20 +
21 +## Version 0.2.5 -- 2020-06-27
22 +
23 +### Added
24 +* `--dynamic-batch-opts` option to `odtk export`.
25 + * This parameter allows you to provide TensorRT Optimization Profile batch sizes for engine export (min, opt, max).
26 +
27 +### Changed
28 +* Updated TensorRT plugins to allow for dynamic batch sizes (see https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#work_dynamic_shapes and https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_plugin_v2_dynamic_ext.html).
29 +
30 +
31 +## Version 0.2.4 -- 2020-04-20
32 +
33 +### Added
34 +* `--anchor-ious` option to `odtk train`.
35 + * This parameter allows you to adjust the background and foreground anchor IoU thresholds. The default values are `[0.4, 0.5]`.
36 + * Example: `--anchor-ious 0.3 0.5`. This means that any anchor with an IoU of less than 0.3 is assigned to the background,
37 + and that any anchor with an IoU of greater than 0.5 is assigned to at most one foreground object, as sketched below.
38 +
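The following Python fragment is an illustrative sketch only (not the ODTK implementation): it shows how such IoU thresholds are typically applied, with anchors below the lower threshold treated as background, anchors above the upper threshold matched to at most one object, and the rest ignored. The `ious` matrix is assumed to be computed elsewhere.

```python
import numpy as np

def assign_anchors(ious, bg_thresh=0.4, fg_thresh=0.5):
    """Label each anchor: -1 = background, 0 = ignored, 1..N = matched object."""
    best_iou = ious.max(axis=1)    # best overlap of each anchor with any object
    best_obj = ious.argmax(axis=1) # index of that single best-matching object
    labels = np.zeros(len(ious), dtype=int)   # default: ignored
    labels[best_iou < bg_thresh] = -1         # below lower threshold: background
    fg = best_iou > fg_thresh
    labels[fg] = best_obj[fg] + 1             # above upper threshold: at most one object
    return labels
```
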
39 +## Version 0.2.3 -- 2020-04-14
40 +
41 +### Added
42 +* `MobileNetV2FPN` backbone
43 +
44 +## Version 0.2.2 -- 2020-04-01
45 +
46 +### Added
47 +* Rotated bounding box detections models can now be exported to ONNX and TensorRT using `odtk export model.pth model.plan --rotated-bbox`
48 +* The `--rotated-bbox` flag is automatically applied when running `odtk infer` or `odtk export` _on a model trained with ODTK version 0.2.2 or later_.
49 +
50 +### Changed
51 +
52 +* Improvements to the rotated IoU calculations.
53 +
54 +### Limitations
55 +
56 +* The C++ API cannot currently infer rotated bounding box models.
57 +
58 +## Version 0.2.1 -- 2020-03-18
59 +
60 +### Added
61 +* The DALI dataloader (flag `--with-dali`) now supports image augmentation using:
62 + * `--augment-brightness` : Randomly adjusts brightness of image
63 + * `--augment-contrast` : Randomly adjusts contrast of image
64 + * `--augment-hue` : Randomly adjusts hue of image
65 + * `--augment-saturation` : Randomly adjusts saturation of image
66 +
67 +### Changed
68 +* The code in `box.py` for generating anchors has been improved.
69 +
70 +## Version 0.2.0 -- 2020-03-13
71 +
72 +Version 0.2.0 introduces rotated detections.
73 +
74 +### Added
75 +* `train arguments`:
76 + * `--rotated-bbox`: Trains a model to predict rotated bounding boxes `[x, y, w, h, theta]` instead of axis-aligned boxes `[x, y, w, h]`.
77 +* `infer arguments`:
78 + * `--rotated-bbox`: Infer a rotated model.
79 +
80 +### Changed
81 +The project has reverted to the name **Object Detection Toolkit** (ODTK), to better reflect the multi-network nature of the repo.
82 +* `retinanet` has been replaced with `odtk`. All subcommands remain the same.
83 +
84 +### Limitations
85 +* Models trained using the `--rotated-bbox` flag cannot be exported to ONNX or a TensorRT Engine.
86 +* PyTorch raises two warnings which can be ignored:
87 +
88 +Warning 1: NCCL watchdog
89 +```
90 +[E ProcessGroupNCCL.cpp:284] NCCL watchdog thread terminated
91 +```
92 +
93 +Warning 2: Save state warning
94 +```
95 +/opt/conda/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:201: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
96 + warnings.warn(SAVE_STATE_WARNING, UserWarning)
97 +```
98 +
99 +## Version 0.1.1 -- 2020-03-06
100 +
101 +### Added
102 + * `train` arguments
103 + * `--augment-rotate`: Randomly rotates the training images by 0°, 90°, 180° or 270°.
104 + * `--augment-brightness` : Randomly adjusts brightness of image
105 + * `--augment-contrast` : Randomly adjusts contrast of image
106 + * `--augment-hue` : Randomly adjusts hue of image
107 + * `--augment-saturation` : Randomly adjusts saturation of image
108 + * `--regularization-l2` : Sets the L2 regularization of the optimizer.
1 +Reporting problems, asking questions
2 +------------------------------------
3 +
4 +
5 +We appreciate feedback, questions or bug reports. When you need help with the code, try to follow the Stack Overflow guidance on minimal, complete, verifiable examples (https://stackoverflow.com/help/mcve).
6 +
7 +At a minimum, your issues should describe the following:
8 +
9 +* What command you ran
10 +* The hardware and container that you are using
11 +* The version of ODTK you are using
12 +* What was the result you observed
13 +* What was the result you expected
1 +FROM nvcr.io/nvidia/pytorch:20.11-py3
2 +
3 +COPY . odtk/
4 +RUN pip install --no-cache-dir -e odtk/
1 +FROM nvcr.io/nvidia/pytorch:20.03-py3
2 +COPY . /workspace/retinanet-examples/
3 +RUN apt-get update && apt-get install -y libssl1.0.0 libgstreamer1.0-0 gstreamer1.0-tools gstreamer1.0-plugins-good gstreamer1.0-plugins-bad gstreamer1.0-plugins-ugly gstreamer1.0-libav libgstrtspserver-1.0-0 libjansson4 ffmpeg libjson-glib-1.0 libgles2-mesa
4 +RUN git clone https://github.com/edenhill/librdkafka.git /librdkafka && \
5 + cd /librdkafka && ./configure && make -j && make -j install && \
6 + mkdir -p /opt/nvidia/deepstream/deepstream-4.0/lib && \
7 + cp /usr/local/lib/librdkafka* /opt/nvidia/deepstream/deepstream-4.0/lib && \
8 + rm -rf /librdkafka
9 +WORKDIR /workspace/retinanet-examples/extras/deepstream/DeepStream_Release/deepstream_sdk_v4.0.2_x86_64
10 +RUN tar -xvf binaries.tbz2 -C / && \
11 + ./install.sh
12 +# config files + sample apps
13 +RUN chmod u+x ./sources/tools/nvds_logger/setup_nvds_logger.sh
14 +
15 +WORKDIR /usr/lib/x86_64-linux-gnu
16 +RUN ln -sf libnvcuvid.so.1 libnvcuvid.so
17 +
18 +WORKDIR /workspace/retinanet-examples
19 +RUN pip install --no-cache-dir -e .
20 +RUN mkdir extras/deepstream/deepstream-sample/build && \
21 + cd extras/deepstream/deepstream-sample/build && \
22 + cmake -DDeepStream_DIR=/workspace/retinanet-examples/extras/deepstream/DeepStream_Release/deepstream_sdk_v4.0.2_x86_64 .. && make -j
23 +WORKDIR /workspace/retinanet-examples/extras/deepstream
1 +# Inference
2 +
3 +We provide two ways of running inference with `odtk`:
4 +* PyTorch inference using a trained model (FP32 or FP16 precision)
5 +* Exporting the trained PyTorch model to TensorRT for optimized inference (FP32, FP16 or INT8 precision)
6 +
7 +`odtk infer` will run distributed inference across all available GPUs. When using PyTorch, the default behavior is to run inference with mixed precision. The precision used when running inference with a TensorRT engine will correspond to the precision chosen when the model was exported to TensorRT (see [TensorRT section](#exporting-trained-pytorch-model-to-tensorrt) below).
8 +
9 +**NOTE**: Availability of HW support for fast FP16 and INT8 precision like [NVIDIA Tensor Cores](https://www.nvidia.com/en-us/data-center/tensorcore/) depends on your GPU architecture: Volta or newer GPUs support both FP16 and INT8, and Pascal GPUs can support either FP16 or INT8.
10 +
11 +## PyTorch Inference
12 +
13 +Evaluate trained PyTorch detection model on COCO 2017 (mixed precision):
14 +
15 +```bash
16 +odtk infer model.pth --images=/data/coco/val2017 --annotations=instances_val2017.json --batch 8
17 +```
18 +**NOTE**: `--batch N` specifies *global* batch size to be used for inference. The batch size per GPU will be `N // num_gpus`.
19 +
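As a plain arithmetic sketch of that rule (not `odtk` code), with `--batch 8` on a machine with 4 GPUs each GPU processes 2 images per iteration:

```python
# The global batch size is split across the visible GPUs using integer division.
global_batch = 8
num_gpus = 4
per_gpu_batch = global_batch // num_gpus
print(per_gpu_batch)  # 2
```
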
20 +Use full precision (FP32) during evaluation:
21 +
22 +```bash
23 +odtk infer model.pth --images=/data/coco/val2017 --annotations=instances_val2017.json --full-precision
24 +```
25 +
26 +Evaluate PyTorch detection model with a small input image size:
27 +
28 +```bash
29 +odtk infer model.pth --images=/data/coco/val2017 --annotations=instances_val2017.json --resize 400 --max-size 640
30 +```
31 +Here, the shorter side of the input images will be resized to `resize` as long as the longer side doesn't get larger than `max-size`, otherwise the longer side of the input image will be resized to `max-size`.
32 +
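A minimal Python sketch of that resizing rule (an illustration of the logic described above, not the `odtk` implementation):

```python
# Resize so the shorter side equals `resize`, unless that would push the longer
# side beyond `max_size`, in which case the longer side is capped at `max_size`.
def target_size(width, height, resize=400, max_size=640):
    short, long_ = min(width, height), max(width, height)
    scale = resize / short
    if long_ * scale > max_size:
        scale = max_size / long_
    return round(width * scale), round(height * scale)

print(target_size(1920, 1080))  # (640, 360)
```
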
33 +**NOTE**: To get best accuracy, training the model at the preferred export size is encouraged.
34 +
35 +Run inference using your own dataset:
36 +
37 +```bash
38 +odtk infer model.pth --images=/data/your_images --output=detections.json
39 +```
40 +
41 +## Exporting trained PyTorch model to TensorRT
42 +
43 +`odtk` provides a simple workflow to optimize a trained PyTorch model for inference deployment using TensorRT. The PyTorch model is exported to [ONNX](https://github.com/onnx/onnx), and then the ONNX model is consumed and optimized by TensorRT.
44 +To learn more about TensorRT optimization, refer here: https://developer.nvidia.com/tensorrt
45 +
46 +**NOTE**: When a model is optimized with TensorRT, the output is a TensorRT engine (.plan file) that can be used for deployment. This TensorRT engine has several fixed properties that are specified during the export process.
47 +* Input image size: TensorRT engines only support a fixed input size.
48 +* Precision: TensorRT supports FP32, FP16, or INT8 precision.
49 +* Target GPU: TensorRT optimizations are tied to the type of GPU on the system where optimization is performed. They are not transferable across different types of GPUs. Put another way, if you aim to deploy your TensorRT engine on a Tesla T4 GPU, you must run the optimization on a system with a T4 GPU.
50 +
51 +The workflow for exporting a trained PyTorch detection model to TensorRT is as simple as:
52 +
53 +```bash
54 +odtk export model.pth model_fp16.plan --size 1280
55 +```
56 +This will create a TensorRT engine optimized for batch size 1, using an input size of 1280x1280. By default, the engine will be created to run in FP16 precision.
57 +
58 +Export your model to use full precision using a non-square input size:
59 +```bash
60 +odtk export model.pth model_fp32.plan --full-precision --size 800 1280
61 +```
62 +
63 +In order to use INT8 precision with TensorRT, you need to provide calibration images (images that are representative of what will be seen at runtime) that will be used to rescale the network.
64 +```bash
65 +odtk export model.pth model_int8.plan --int8 --calibration-images /data/val/ --calibration-batches 2 --calibration-table model_calibration_table
66 +```
67 +
68 +This will randomly select 16 images from `/data/val/` to calibrate the network for INT8 precision. The calibration results will be saved to `model_calibration_table`, which can be used to create subsequent INT8 engines for this model without needing to recalibrate.
69 +
70 +**NOTE:** The number of images in `/data/val/` must be greater than or equal to the kOPT (middle) value of the optimization profile given by `--dynamic-batch-opts`. Here, the default kOPT is 8.
71 +
72 +Build an INT8 engine for a previously calibrated model:
73 +```bash
74 +odtk export model.pth model_int8.plan --int8 --calibration-table model_calibration_table
75 +```
76 +
77 +## Deployment with TensorRT on NVIDIA Jetson AGX Xavier
78 +
79 +We provide a path for deploying trained models with TensorRT onto embedded platforms like [NVIDIA Jetson AGX Xavier](https://developer.nvidia.com/embedded/buy/jetson-agx-xavier-devkit), where PyTorch is not readily available.
80 +
81 +You will need to export your trained PyTorch model to ONNX representation on your host system, and copy the resulting ONNX model to your Jetson AGX Xavier:
82 +```bash
83 +odtk export model.pth model.onnx --size 800 1280
84 +```
85 +
86 +Refer to additional documentation on using the example cppapi code to build the TensorRT engine and run inference here: [cppapi example code](extras/cppapi/README.md)
87 +
88 +## Rotated detections
89 +
90 +*Rotated ODTK* allows users to train and infer rotated bounding boxes in imagery.
91 +
92 +### Inference
93 +
94 +An example command:
95 +```
96 +odtk infer model.pth --images /data/val --annotations /data/val_rotated.json --output /data/detections.json \
97 + --resize 768 --rotated-bbox
98 +```
99 +
100 +### Export
101 +
102 +Rotated bounding box models can be exported to create TensorRT engines by using the axis aligned command with the addition of `--rotated-bbox`.
1 +Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
2 +
3 +Redistribution and use in source and binary forms, with or without
4 +modification, are permitted provided that the following conditions
5 +are met:
6 + * Redistributions of source code must retain the above copyright
7 + notice, this list of conditions and the following disclaimer.
8 + * Redistributions in binary form must reproduce the above copyright
9 + notice, this list of conditions and the following disclaimer in the
10 + documentation and/or other materials provided with the distribution.
11 + * Neither the name of NVIDIA CORPORATION nor the names of its
12 + contributors may be used to endorse or promote products derived
13 + from this software without specific prior written permission.
14 +
15 +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
16 +EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
17 +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
18 +PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
19 +CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
20 +EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
21 +PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
22 +PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
23 +OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
24 +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
25 +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
1 +# NVIDIA Object Detection Toolkit (ODTK)
2 +
3 +**Fast** and **accurate** single stage object detection with end-to-end GPU optimization.
4 +
5 +## Description
6 +
7 +ODTK is a single shot object detector with various backbones and detection heads. This allows performance/accuracy trade-offs.
8 +
9 +It is optimized for end-to-end GPU processing using:
10 +* The [PyTorch](https://pytorch.org) deep learning framework with [ONNX](https://onnx.ai) support
11 +* NVIDIA [Apex](https://github.com/NVIDIA/apex) for mixed precision and distributed training
12 +* NVIDIA [DALI](https://github.com/NVIDIA/DALI) for optimized data pre-processing
13 +* NVIDIA [TensorRT](https://developer.nvidia.com/tensorrt) for high-performance inference
14 +* NVIDIA [DeepStream](https://developer.nvidia.com/deepstream-sdk) for optimized real-time video streams support
15 +
16 +## Rotated bounding box detections
17 +
18 +This repo now supports rotated bounding box detections. See [rotated detections training](TRAINING.md#rotated-detections) and [rotated detections inference](INFERENCE.md#rotated-detections) documents for more information on how to use the `--rotated-bbox` command.
19 +
20 +Bounding box annotations are described by `[x, y, w, h, theta]`.
21 +
22 +## Performance
23 +
24 +The detection pipeline allows the user to select a specific backbone depending on the latency-accuracy trade-off preferred.
25 +
26 +ODTK **RetinaNet** model accuracy and inference latency & FPS (frames per second) for [COCO 2017](http://cocodataset.org/#detection-2017) (train/val) after a full training schedule. Inference results include bounding box post-processing for a batch size of 1. Inference measured at `--resize 800` using `--with-dali` on an FP16 TensorRT engine.
27 +
28 +Backbone | mAP @[IoU=0.50:0.95] | Training Time on [DGX1v](https://www.nvidia.com/en-us/data-center/dgx-1/) | Inference latency FP16 on [V100](https://www.nvidia.com/en-us/data-center/tesla-v100/) | Inference latency INT8 on [T4](https://www.nvidia.com/en-us/data-center/tesla-t4/) | Inference latency FP16 on [A100](https://www.nvidia.com/en-us/data-center/a100/) | Inference latency INT8 on [A100](https://www.nvidia.com/en-us/data-center/a100/)
29 +--- | :---: | :---: | :---: | :---: | :---: | :---:
30 +[ResNet18FPN](https://github.com/NVIDIA/retinanet-examples/releases/download/19.04/retinanet_rn18fpn.zip) | 0.318 | 5 hrs | 14 ms;</br>71 FPS | 18 ms;</br>56 FPS | 9 ms;</br>110 FPS | 7 ms;</br>141 FPS
31 +[MobileNetV2FPN](https://github.com/NVIDIA/retinanet-examples/releases/download/v0.2.3/retinanet_mobilenetv2fpn.pth) | 0.333 | | 14 ms;</br>74 FPS | 18 ms;</br>56 FPS | 9 ms;</br>114 FPS | 7 ms;</br>138 FPS
32 +[ResNet34FPN](https://github.com/NVIDIA/retinanet-examples/releases/download/19.04/retinanet_rn34fpn.zip) | 0.343 | 6 hrs | 16 ms;</br>64 FPS | 20 ms;</br>50 FPS | 10 ms;</br>103 FPS | 7 ms;</br>142 FPS
33 +[ResNet50FPN](https://github.com/NVIDIA/retinanet-examples/releases/download/19.04/retinanet_rn50fpn.zip) | 0.358 | 7 hrs | 18 ms;</br>56 FPS | 22 ms;</br>45 FPS | 11 ms;</br>93 FPS | 8 ms;</br>129 FPS
34 +[ResNet101FPN](https://github.com/NVIDIA/retinanet-examples/releases/download/19.04/retinanet_rn101fpn.zip) | 0.376 | 10 hrs | 22 ms;</br>46 FPS | 27 ms;</br>37 FPS | 13 ms;</br>78 FPS | 9 ms;</br>117 FPS
35 +[ResNet152FPN](https://github.com/NVIDIA/retinanet-examples/releases/download/19.04/retinanet_rn152fpn.zip) | 0.393 | 12 hrs | 26 ms;</br>38 FPS | 33 ms;</br>31 FPS | 15 ms;</br>66 FPS | 10 ms;</br>103 FPS
36 +
37 +## Installation
38 +
39 +For best performance, use the latest [PyTorch NGC docker container](https://ngc.nvidia.com/catalog/containers/nvidia:pytorch). Clone this repository, build and run your own image:
40 +
41 +```bash
42 +git clone https://github.com/nvidia/retinanet-examples
43 +docker build -t odtk:latest retinanet-examples/
44 +docker run --gpus all --rm --ipc=host -it odtk:latest
45 +```
46 +
47 +## Usage
48 +
49 +Training, inference, evaluation and model export can be done through the `odtk` utility.
50 +For more details, including a list of parameters, please refer to the [TRAINING](TRAINING.md) and [INFERENCE](INFERENCE.md) documentation.
51 +
52 +### Training
53 +
54 +Train a detection model on [COCO 2017](http://cocodataset.org/#download) from pre-trained backbone:
55 +```bash
56 +odtk train retinanet_rn50fpn.pth --backbone ResNet50FPN \
57 + --images /coco/images/train2017/ --annotations /coco/annotations/instances_train2017.json \
58 + --val-images /coco/images/val2017/ --val-annotations /coco/annotations/instances_val2017.json
59 +```
60 +
61 +### Fine Tuning
62 +
63 +Fine-tune a pre-trained model on your dataset. In the example below we use [Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html) with [JSON annotations](https://storage.googleapis.com/coco-dataset/external/PASCAL_VOC.zip):
64 +```bash
65 +odtk train model_mydataset.pth --backbone ResNet50FPN \
66 + --fine-tune retinanet_rn50fpn.pth \
67 + --classes 20 --iters 10000 --val-iters 1000 --lr 0.0005 \
68 + --resize 512 --jitter 480 640 --images /voc/JPEGImages/ \
69 + --annotations /voc/pascal_train2012.json --val-annotations /voc/pascal_val2012.json
70 +```
71 +
72 +Note: the shorter side of the input images will be resized to `resize` as long as the longer side doesn't get larger than `max-size`. During training, the images will be randomly resized to a new size within the `jitter` range.
73 +
74 +### Inference
75 +
76 +Evaluate your detection model on [COCO 2017](http://cocodataset.org/#download):
77 +```bash
78 +odtk infer retinanet_rn50fpn.pth --images /coco/images/val2017/ --annotations /coco/annotations/instances_val2017.json
79 +```
80 +
81 +Run inference on [your dataset](#datasets):
82 +```bash
83 +odtk infer retinanet_rn50fpn.pth --images /dataset/val --output detections.json
84 +```
85 +
86 +### Optimized Inference with TensorRT
87 +
88 +For faster inference, export the detection model to an optimized FP16 TensorRT engine:
89 +```bash
90 +odtk export model.pth engine.plan
91 +```
92 +
93 +Evaluate the model with TensorRT backend on [COCO 2017](http://cocodataset.org/#download):
94 +```bash
95 +odtk infer engine.plan --images /coco/images/val2017/ --annotations /coco/annotations/instances_val2017.json
96 +```
97 +
98 +### INT8 Inference with TensorRT
99 +
100 +For even faster inference, do INT8 calibration to create an optimized INT8 TensorRT engine:
101 +```bash
102 +odtk export model.pth engine.plan --int8 --calibration-images /coco/images/val2017/
103 +```
104 +This will create an INT8CalibrationTable file that can be used to create INT8 TensorRT engines for the same model later on without needing to do calibration.
105 +
106 +Or create an optimized INT8 TensorRT engine using a cached calibration table:
107 +```bash
108 +odtk export model.pth engine.plan --int8 --calibration-table /path/to/INT8CalibrationTable
109 +```
110 +
111 +## Datasets
112 +
113 +RetinaNet supports annotations in the [COCO JSON format](http://cocodataset.org/#format-data).
114 +When converting the annotations from your own dataset into JSON, the following entries are required:
115 +```
116 +{
117 + "images": [{
118 + "id" : int,
119 + "file_name" : str
120 + }],
121 + "annotations": [{
122 + "id" : int,
123 + "image_id" : int,
124 + "category_id" : int,
125 + "bbox" : [x, y, w, h] # all floats
126 + "area": float # w * h. Required for validation scores
127 + "iscrowd": 0 # Required for validation scores
128 + }],
129 + "categories": [{
130 + "id" : int
131 + }]
132 +}
133 +```
134 +
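For reference, here is a minimal Python snippet that writes an annotations file of the shape shown above (all values are placeholders for illustration):

```python
import json

annotations = {
    "images": [{"id": 1, "file_name": "000001.jpg"}],
    "annotations": [{
        "id": 1, "image_id": 1, "category_id": 1,
        "bbox": [10.0, 20.0, 50.0, 80.0],  # [x, y, w, h]
        "area": 50.0 * 80.0,               # w * h, required for validation scores
        "iscrowd": 0,                      # required for validation scores
    }],
    "categories": [{"id": 1}],
}

with open("my_annotations.json", "w") as f:
    json.dump(annotations, f)
```
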
135 +If using the `--rotated-bbox` flag for rotated detections, add an additional float `theta` to the annotations. To get validation scores you also need to fill the `segmentation` section.
136 +```
137 + "bbox" : [x, y, w, h, theta] # all floats, where theta is measured in radians anti-clockwise from the x-axis.
138 + "segmentation" : [[x1, y1, x2, y2, x3, y3, x4, y4]]
139 + # Required for validation scores.
140 +```
141 +
142 +## Disclaimer
143 +
144 +This is a research project, not an official NVIDIA product.
145 +
146 +## Jetpack compatibility
147 +
148 +This branch uses TensorRT 7. If you are training and inferring models using PyTorch, or are creating TensorRT engines on Tesla GPUs (e.g. V100, T4), then you should use this branch.
149 +
150 +If you wish to deploy your model to a Jetson device (e.g. Jetson AGX Xavier) running Jetpack version 4.3, then you should use the `19.10` branch of this repo.
151 +
152 +## References
153 +
154 +- [Focal Loss for Dense Object Detection](https://arxiv.org/abs/1708.02002).
155 + Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár.
156 + ICCV, 2017.
157 +- [Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour](https://arxiv.org/abs/1706.02677).
158 + Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He.
159 + June 2017.
160 +- [Feature Pyramid Networks for Object Detection](https://arxiv.org/abs/1612.03144).
161 + Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, Serge Belongie.
162 + CVPR, 2017.
163 +- [Deep Residual Learning for Image Recognition](http://arxiv.org/abs/1512.03385).
164 + Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
165 + CVPR, 2016.
1 +# Training
2 +
3 +There are two main ways to train a model with `odtk`:
4 +* Fine-tuning the detection model using a model already trained on a large dataset (like MS-COCO)
5 +* Fully training the detection model from random initialization using a pre-trained backbone (usually ImageNet)
6 +
7 +## Fine-tuning
8 +
9 +Fine-tuning an existing model trained on COCO allows you to use transfer learning to get an accurate model for your own dataset with minimal training.
10 +When fine-tuning, we re-initialize the last layer of the classification head so the network re-learns how to map features to class scores, regardless of the number of classes in your own dataset.
11 +
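Conceptually, re-initializing that last layer looks something like the generic PyTorch sketch below. This is an illustration of the idea only, not ODTK's internal code, and the `cls_head` name is hypothetical (a `nn.Sequential` classification head ending in a conv layer is assumed):

```python
import torch.nn as nn

def reinit_classification_output(cls_head, num_anchors, num_classes):
    # Keep every pre-trained layer except the final conv, which is replaced so
    # that its output size matches num_anchors * num_classes for the new dataset.
    in_channels = cls_head[-1].in_channels
    cls_head[-1] = nn.Conv2d(in_channels, num_anchors * num_classes,
                             kernel_size=3, padding=1)
    return cls_head
```
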
12 +You can fine-tune a pre-trained model on your dataset. In the example below we take a model trained on COCO, and then fine-tune using [Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html) with [JSON annotations](https://storage.googleapis.com/coco-dataset/external/PASCAL_VOC.zip):
13 +```bash
14 +odtk train model_mydataset.pth \
15 + --fine-tune retinanet_rn50fpn.pth \
16 + --classes 20 --iters 10000 --val-iters 1000 --lr 0.0005 \
17 + --resize 512 --jitter 480 640 --images /voc/JPEGImages/ \
18 + --annotations /voc/pascal_train2012.json --val-annotations /voc/pascal_val2012.json
19 +```
20 +
21 +Even though the COCO model was trained on 80 classes, we can easily use transfer learning to fine-tune it on the Pascal VOC dataset, which has only 20 classes.
22 +
23 +The shorter side of the input images will be resized to `resize` as long as the longer side doesn't get larger than `max-size`.
24 +During training the images will be randomly resized to a new size within the `jitter` range.
25 +
26 +We usually want to fine-tune the model with a lower learning rate `lr` than during full training, and for fewer iterations `iters`.
27 +
28 +## Full Training
29 +
30 +If you do not have a pre-trained model, if your dataset is substantially large, or if you have written your own backbone, then you should fully train the detection model.
31 +
32 +Full training usually starts from a pre-trained backbone (automatically downloaded with the current backbones we offer) that has been pre-trained on a classification task with a large dataset like [ImageNet](http://www.image-net.org).
33 +This is especially necessary for backbones that use batch normalization, as these require large training batch sizes that cannot be provided on the detection task, where the input images have to be relatively large.
34 +
35 +Train a detection model on [COCO 2017](http://cocodataset.org/#download) from pre-trained backbone:
36 +```bash
37 +odtk train retinanet_rn50fpn.pth --backbone ResNet50FPN \
38 + --images /coco/images/train2017/ --annotations /coco/annotations/instances_train2017.json \
39 + --val-images /coco/images/val2017/ --val-annotations /coco/annotations/instances_val2017.json
40 +```
41 +
42 +## Training arguments
43 +
44 +### Positional arguments
45 +* The only positional argument is the name of the model. This can be a full path, or relative to the current directory.
46 +```bash
47 +odtk train model.pth
48 +```
49 +
50 +### Other arguments
51 +The following arguments are available during training:
52 +
53 +* `--annotations` (str): Path to COCO style annotations (required).
54 +* `--images` (str): Path to a directory of images (required).
55 +* `--lr` (float): Sets the learning rate. Default: 0.01.
56 +* `--full-precision`: By default we train using mixed precision. Include this argument to instead train in full precision.
57 +* `--warmup` (int): The number of initial iterations during which we want to linearly ramp-up the learning rate to avoid early divergence of the loss. Default: 1000
58 +* `--backbone` (str): Specify one of the supported backbones. Default: `ResNet50FPN`
59 +* `--classes` (int): The number of classes in your dataset. Default: 80
60 +* `--batch` (int): The size of each training batch. Default: 2 x number of GPUs.
61 +* `--max-size` (int): The longest edge of your training image will be resized, so that it is always less than or equal to `max-size`. Default: 1333.
62 +* `--jitter` (int int): The shortest edge of your training images will be resized to a random length between int1 and int2 (int1 <= shortest edge <= int2), unless the longest edge exceeds `max-size`, in which case the longest edge will be resized to `max-size` and the shortest edge will be sized to keep the aspect ratio constant. Default: 640 1024.
63 +* `--resize` (int): During validation inference, the shortest edge of your images will be resized to int, unless the longest edge exceeds `max-size`, in which case the longest edge will be resized to `max-size` and the shortest edge will be sized to keep the aspect ratio constant. Default: 800.
64 +* `--iters` (int): The number of iterations to process. An iteration is the processing (forward and backward pass) of one batch. The number of epochs is (`iters` x `batch`) / `len(data)`; see the worked example after this list. Default: 90000.
65 +* `--milestones` (int int): The iteration milestones at which the learning rate is multiplied by `--gamma`. Default: 60000 80000.
66 +* `--gamma` (float): The learning rate is multiplied by `--gamma` every time it reaches a milestone. Default: 0.1.
67 +* `--override`: Do not continue training from `model.pth`, instead overwrite it.
68 +* `--val-annotations` (str): Path to COCO style annotations. If supplied, `pycocotools` will be used to give validation mAP.
69 +* `--val-images` (str): Path to directory of validation images.
70 +* `--val-iters` (int): Run inference on the validation set every int iterations.
71 +* `--fine-tune` (str): Fine tune from a model at path str.
72 +* `--with-dali`: Load data using DALI.
73 +* `--augment-rotate`: Randomly rotates the training images by 0&deg;, 90&deg;, 180&deg; or 270&deg;.
74 +* `--augment-brightness` (float): Randomly adjusts brightness of image. The value sets the standard deviation of a Gaussian distribution. The degree of augmentation is selected from this distribution. Default: 0.002
75 +* `--augment-contrast` (float): Randomly adjusts contrast of image. The value sets the standard deviation of a Gaussian distribution. The degree of augmentation is selected from this distribution. Default: 0.002
76 +* `--augment-hue` (float): Randomly adjusts hue of image. The value sets the standard deviation of a Gaussian distribution. The degree of augmentation is selected from this distribution. Default: 0.0002
77 +* `--augment-saturation` (float): Randomly adjusts saturation of image. The value sets the standard deviation of a Gaussian distribution. The degree of augmentation is selected from this distribution. Default: 0.002
78 +* `--regularization-l2` (float): Sets the L2 regularization of the optimizer. Default: 0.0001
79 +
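A worked example of the epochs formula from the `--iters` description above (the dataset size shown is that of COCO 2017 train):

```python
iters = 90000          # default number of iterations
batch = 16             # e.g. 8 GPUs x 2 images per GPU
dataset_size = 118287  # images in COCO 2017 train

epochs = iters * batch / dataset_size
print(round(epochs, 1))  # ~12.2 epochs
```
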
80 +You can also monitor the loss and learning rate schedule of the training using TensorBoard by specifying a `logdir` path.
81 +
82 +## Rotated detections
83 +
84 +*Rotated ODTK* allows users to train and infer rotated bounding boxes in imagery.
85 +
86 +### Dataset
87 +Annotations need to conform to the COCO standard, with the addition of an angle (radians) in the bounding box (bbox) entry `[xmin, ymin, width, height, **theta**]`. `xmin`, `ymin`, `width` and `height` are in axis-aligned coordinates, i.e. floats, measured from the top left of the image. `theta` is in radians, measured anti-clockwise from the x-axis. We constrain `theta` to lie between -π/4 and π/4.
88 +
89 +In order for the validation metrics to be calculated, you also need to fill the `segmentation` entry with the coordinates of the corners of your bounding box.
90 +
91 +If using the `--rotated-bbox` flag for rotated detections, add an additional float `theta` to the annotations. To get validation scores you also need to fill the `segmentation` section.
92 +```
93 + "bbox" : [x, y, w, h, theta] # all floats, where theta is measured in radians anti-clockwise from the x-axis.
94 + "segmentation" : [[x1, y1], [x2, y2], [x3, y3], [x4, y4]]
95 + # Required for validation scores.
96 +```
97 +
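The sketch below shows one way to derive those segmentation corners from a `[x, y, w, h, theta]` box. It is an illustration only (not ODTK code) and assumes the rotation is applied about the box centre:

```python
import math

def corners_from_rotated_bbox(x, y, w, h, theta):
    cx, cy = x + w / 2.0, y + h / 2.0           # box centre
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = []
    for dx, dy in [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]:
        # rotate each corner offset by theta, then translate back to image coordinates
        corners.append([cx + dx * cos_t - dy * sin_t,
                        cy + dx * sin_t + dy * cos_t])
    return corners  # [[x1, y1], [x2, y2], [x3, y3], [x4, y4]]

print(corners_from_rotated_bbox(10, 20, 40, 30, 0.0))
```
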
98 +### Anchors
99 +
100 +As with all single-shot detectors, the anchor boxes may need to be adjusted to suit your dataset. If so, adjust the anchors in `odtk/model.py`.
101 +
102 +The default anchors are:
103 +
104 +```python
105 +self.ratios = [0.5, 1.0, 2.0]
106 +self.scales = [4 * 2**(i/3) for i in range(3)]
107 +self.angles = [-np.pi/6, 0, np.pi/6]
108 +```
109 +
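As a quick sanity check (assuming anchors are generated from the Cartesian product of these lists, with the angles only used for rotated models), the number of anchors per feature-map location is:

```python
import numpy as np

ratios = [0.5, 1.0, 2.0]
scales = [4 * 2**(i / 3) for i in range(3)]
angles = [-np.pi / 6, 0, np.pi / 6]

print(len(ratios) * len(scales))                # 9 axis-aligned anchors per location
print(len(ratios) * len(scales) * len(angles))  # 27 anchors per location for rotated models
```
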
110 +### Training
111 +
112 +We recommend reducing your learning rate, for example using `--lr 0.0005`.
113 +
114 +An example command for training on remote sensing imagery. Note that `--augment-rotate` has been used to randomly rotate the imagery during training.
115 +```
116 +odtk train model.pth --images /data/train --annotations /data/train_rotated.json --backbone ResNet50FPN \
117 + --lr 0.00005 --fine-tune /data/saved_models/retinanet_rn50fpn.pth \
118 + --val-images /data/val --val-annotations /data/val_rotated.json --classes 1 \
119 + --jitter 688 848 --resize 768 \
120 + --augment-rotate --augment-brightness 0.01 --augment-contrast 0.01 --augment-hue 0.002 \
121 + --augment-saturation 0.01 --batch 16 --regularization-l2 0.0001 --val-iters 20000 --rotated-bbox
122 +```
123 +
1 +/*
2 + * Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
3 + *
4 + * Permission is hereby granted, free of charge, to any person obtaining a
5 + * copy of this software and associated documentation files (the "Software"),
6 + * to deal in the Software without restriction, including without limitation
7 + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
8 + * and/or sell copies of the Software, and to permit persons to whom the
9 + * Software is furnished to do so, subject to the following conditions:
10 + *
11 + * The above copyright notice and this permission notice shall be included in
12 + * all copies or substantial portions of the Software.
13 + *
14 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
17 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
19 + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
20 + * DEALINGS IN THE SOFTWARE.
21 + */
22 +
23 +#pragma once
24 +
25 +#include <opencv2/opencv.hpp>
27 +#include <opencv2/core/core.hpp>
28 +#include <opencv2/highgui/highgui.hpp>
29 +#include <iterator>
30 +#include <vector>
31 +#include <assert.h>
#include <fstream>   // std::ifstream / std::ofstream for the calibration cache
#include <string>    // std::string, std::to_string
32 +#include <algorithm>
33 +#include "NvInfer.h"
34 +
35 +using namespace std;
36 +using namespace cv;
37 +
38 +class ImageStream {
39 +public:
40 + ImageStream(int batchSize, Dims inputDims, const vector<string> calibrationImages)
41 + : _batchSize(batchSize)
42 + , _calibrationImages(calibrationImages)
43 + , _currentBatch(0)
44 + , _maxBatches(_calibrationImages.size() / _batchSize)
45 + , _inputDims(inputDims) {
46 + _batch.resize(_batchSize * _inputDims.d[1] * _inputDims.d[2] * _inputDims.d[3]);
47 + }
48 +
49 + int getBatchSize() const { return _batchSize;}
50 +
51 + int getMaxBatches() const { return _maxBatches;}
52 +
53 + float* getBatch() { return &_batch[0];}
54 +
55 + Dims getInputDims() { return _inputDims;}
56 +
57 + bool next() {
58 +
59 + if (_currentBatch == _maxBatches)
60 + return false;
61 +
62 + for (int i = 0; i < _batchSize; i++) {
63 + auto image = imread(_calibrationImages[_batchSize * _currentBatch + i].c_str(), IMREAD_COLOR);
64 + cv::resize(image, image, Size(_inputDims.d[3], _inputDims.d[2]));
65 + cv::Mat pixels;
66 + image.convertTo(pixels, CV_32FC3, 1.0 / 255, 0);
67 +
68 + vector<float> img;
69 +
70 + if (pixels.isContinuous())
71 + img.assign((float*)pixels.datastart, (float*)pixels.dataend);
72 + else
73 + return false;
74 +
75 + auto hw = _inputDims.d[2] * _inputDims.d[3];
76 + auto channels = _inputDims.d[1];
77 + auto vol = channels * hw;
78 +
79 + for (int c = 0; c < channels; c++) {
80 + for (int j = 0; j < hw; j++) {
81 + _batch[i * vol + c * hw + j] = (img[channels * j + 2 - c] - _mean[c]) / _std[c];
82 + }
83 + }
84 + }
85 +
86 + _currentBatch++;
87 + return true;
88 + }
89 +
90 + void reset() {
91 + _currentBatch = 0;
92 + }
93 +
94 +private:
95 + int _batchSize;
96 + vector<string> _calibrationImages;
97 + int _currentBatch;
98 + int _maxBatches;
99 + Dims _inputDims;
100 +
101 + vector<float> _mean {0.485, 0.456, 0.406};
102 + vector<float> _std {0.229, 0.224, 0.225};
103 + vector<float> _batch;
104 +
105 +};
106 +
107 +class Int8EntropyCalibrator: public IInt8EntropyCalibrator2 {
108 +public:
109 + Int8EntropyCalibrator(ImageStream& stream, const string networkName, const string calibrationCacheName, bool readCache = true)
110 + : _stream(stream)
111 + , _networkName(networkName)
112 + , _calibrationCacheName(calibrationCacheName)
113 + , _readCache(readCache) {
114 + Dims d = _stream.getInputDims();
115 + _inputCount = _stream.getBatchSize() * d.d[1] * d.d[2] * d.d[3];
116 + cudaMalloc(&_deviceInput, _inputCount * sizeof(float));
117 + }
118 +
119 + int getBatchSize() const override {return _stream.getBatchSize();}
120 +
121 + virtual ~Int8EntropyCalibrator() {cudaFree(_deviceInput);}
122 +
123 + bool getBatch(void* bindings[], const char* names[], int nbBindings) override {
124 +
125 + if (!_stream.next())
126 + return false;
127 +
128 + cudaMemcpy(_deviceInput, _stream.getBatch(), _inputCount * sizeof(float), cudaMemcpyHostToDevice);
129 + bindings[0] = _deviceInput;
130 + return true;
131 + }
132 +
133 + const void* readCalibrationCache(size_t& length) {
134 + _calibrationCache.clear();
135 + ifstream input(calibrationTableName(), ios::binary);
136 + input >> noskipws;
137 + if (_readCache && input.good())
138 + copy(istream_iterator<char>(input), istream_iterator<char>(), back_inserter(_calibrationCache));
139 +
140 + length = _calibrationCache.size();
141 + return length ? &_calibrationCache[0] : nullptr;
142 + }
143 +
144 + void writeCalibrationCache(const void* cache, size_t length) {
145 + std::ofstream output(calibrationTableName(), std::ios::binary);
146 + output.write(reinterpret_cast<const char*>(cache), length);
147 + }
148 +
149 +private:
150 + std::string calibrationTableName() {
151 + // Use calibration cache if provided
152 + if(_calibrationCacheName.length() > 0)
153 + return _calibrationCacheName;
154 +
155 + assert(_networkName.length() > 0);
156 + Dims d = _stream.getInputDims();
157 + return std::string("Int8CalibrationTable_") + _networkName + to_string(d.d[2]) + "x" + to_string(d.d[3]) + "_" + to_string(_stream.getMaxBatches());
158 + }
159 +
160 + ImageStream _stream;
161 + const string _networkName;
162 + const string _calibrationCacheName;
163 + bool _readCache {true};
164 + size_t _inputCount;
165 + void* _deviceInput {nullptr};
166 + vector<char> _calibrationCache;
167 +
168 +};
1 +/*
2 + * Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
3 + *
4 + * Permission is hereby granted, free of charge, to any person obtaining a
5 + * copy of this software and associated documentation files (the "Software"),
6 + * to deal in the Software without restriction, including without limitation
7 + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
8 + * and/or sell copies of the Software, and to permit persons to whom the
9 + * Software is furnished to do so, subject to the following conditions:
10 + *
11 + * The above copyright notice and this permission notice shall be included in
12 + * all copies or substantial portions of the Software.
13 + *
14 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
17 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
19 + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
20 + * DEALINGS IN THE SOFTWARE.
21 + */
22 +
23 +#include "decode.h"
24 +#include "utils.h"
25 +
26 +#include <algorithm>
27 +#include <cstdint>
28 +
29 +#include <thrust/device_ptr.h>
30 +#include <thrust/sequence.h>
31 +#include <thrust/execution_policy.h>
32 +#include <thrust/gather.h>
33 +#include <thrust/tabulate.h>
34 +#include <thrust/count.h>
35 +#include <thrust/find.h>
36 +#include <cub/device/device_radix_sort.cuh>
37 +#include <cub/iterator/counting_input_iterator.cuh>
38 +
39 +#include <stdio.h>
40 +
41 +namespace odtk {
42 +namespace cuda {
43 +
44 +int decode(int batch_size,
45 + const void *const *inputs, void *const *outputs,
46 + size_t height, size_t width, size_t scale,
47 + size_t num_anchors, size_t num_classes,
48 + const std::vector<float> &anchors, float score_thresh, int top_n,
49 + void *workspace, size_t workspace_size, cudaStream_t stream) {
50 +
51 + int scores_size = num_anchors * num_classes * height * width;
52 +
53 + if (!workspace || !workspace_size) {
54 + // Return required scratch space size cub style
55 + workspace_size = get_size_aligned<float>(anchors.size()); // anchors
56 + workspace_size += get_size_aligned<bool>(scores_size); // flags
57 + workspace_size += get_size_aligned<int>(scores_size); // indices
58 + workspace_size += get_size_aligned<int>(scores_size); // indices_sorted
59 + workspace_size += get_size_aligned<float>(scores_size); // scores
60 + workspace_size += get_size_aligned<float>(scores_size); // scores_sorted
61 +
62 + size_t temp_size_flag = 0;
63 + cub::DeviceSelect::Flagged((void *)nullptr, temp_size_flag,
64 + cub::CountingInputIterator<int>(scores_size),
65 + (bool *)nullptr, (int *)nullptr, (int *)nullptr, scores_size);
66 + size_t temp_size_sort = 0;
67 + cub::DeviceRadixSort::SortPairsDescending((void *)nullptr, temp_size_sort,
68 + (float *)nullptr, (float *)nullptr, (int *)nullptr, (int *)nullptr, scores_size);
69 + workspace_size += std::max(temp_size_flag, temp_size_sort);
70 +
71 + return workspace_size;
72 + }
73 +
74 + auto anchors_d = get_next_ptr<float>(anchors.size(), workspace, workspace_size);
75 + cudaMemcpyAsync(anchors_d, anchors.data(), anchors.size() * sizeof *anchors_d, cudaMemcpyHostToDevice, stream);
76 +
77 + auto on_stream = thrust::cuda::par.on(stream);
78 +
79 + auto flags = get_next_ptr<bool>(scores_size, workspace, workspace_size);
80 + auto indices = get_next_ptr<int>(scores_size, workspace, workspace_size);
81 + auto indices_sorted = get_next_ptr<int>(scores_size, workspace, workspace_size);
82 + auto scores = get_next_ptr<float>(scores_size, workspace, workspace_size);
83 + auto scores_sorted = get_next_ptr<float>(scores_size, workspace, workspace_size);
84 +
85 +
86 + for (int batch = 0; batch < batch_size; batch++) {
87 +
88 + auto in_scores = static_cast<const float *>(inputs[0]) + batch * scores_size;
89 + auto in_boxes = static_cast<const float *>(inputs[1]) + batch * (scores_size / num_classes) * 4;
90 +
91 + auto out_scores = static_cast<float *>(outputs[0]) + batch * top_n;
92 + auto out_boxes = static_cast<float4 *>(outputs[1]) + batch * top_n;
93 + auto out_classes = static_cast<float *>(outputs[2]) + batch * top_n;
94 +
95 + // Discard scores below threshold
96 + thrust::transform(on_stream, in_scores, in_scores + scores_size,
97 + flags, thrust::placeholders::_1 > score_thresh);
98 +
99 + int *num_selected = reinterpret_cast<int *>(indices_sorted);
100 + cub::DeviceSelect::Flagged(workspace, workspace_size,
101 + cub::CountingInputIterator<int>(0),
102 + flags, indices, num_selected, scores_size, stream);
103 + cudaStreamSynchronize(stream);
104 + int num_detections = *thrust::device_pointer_cast(num_selected);
105 +
106 + // Only keep top n scores
107 + auto indices_filtered = indices;
108 + if (num_detections > top_n) {
109 + thrust::gather(on_stream, indices, indices + num_detections,
110 + in_scores, scores);
111 + cub::DeviceRadixSort::SortPairsDescending(workspace, workspace_size,
112 + scores, scores_sorted, indices, indices_sorted, num_detections, 0, sizeof(*scores)*8, stream);
113 + indices_filtered = indices_sorted;
114 + num_detections = top_n;
115 + }
116 +
117 + // Gather boxes
118 + bool has_anchors = !anchors.empty();
119 + thrust::transform(on_stream, indices_filtered, indices_filtered + num_detections,
120 + thrust::make_zip_iterator(thrust::make_tuple(out_scores, out_boxes, out_classes)),
121 + [=] __device__ (int i) {
122 + int x = i % width;
123 + int y = (i / width) % height;
124 + int a = (i / num_classes / height / width) % num_anchors;
125 + int cls = (i / height / width) % num_classes;
126 + float4 box = float4{
127 + in_boxes[((a * 4 + 0) * height + y) * width + x],
128 + in_boxes[((a * 4 + 1) * height + y) * width + x],
129 + in_boxes[((a * 4 + 2) * height + y) * width + x],
130 + in_boxes[((a * 4 + 3) * height + y) * width + x]
131 + };
132 +
133 + if (has_anchors) {
134 + // Add anchors offsets to deltas
135 + float x = (i % width) * scale;
136 + float y = ((i / width) % height) * scale;
137 + float *d = anchors_d + 4*a;
138 +
139 + float x1 = x + d[0];
140 + float y1 = y + d[1];
141 + float x2 = x + d[2];
142 + float y2 = y + d[3];
143 + float w = x2 - x1 + 1.0f;
144 + float h = y2 - y1 + 1.0f;
145 + float pred_ctr_x = box.x * w + x1 + 0.5f * w;
146 + float pred_ctr_y = box.y * h + y1 + 0.5f * h;
147 + float pred_w = exp(box.z) * w;
148 + float pred_h = exp(box.w) * h;
149 +
150 + box = float4{
151 + max(0.0f, pred_ctr_x - 0.5f * pred_w),
152 + max(0.0f, pred_ctr_y - 0.5f * pred_h),
153 + min(pred_ctr_x + 0.5f * pred_w - 1.0f, width * scale - 1.0f),
154 + min(pred_ctr_y + 0.5f * pred_h - 1.0f, height * scale - 1.0f)
155 + };
156 + }
157 +
158 + return thrust::make_tuple(in_scores[i], box, cls);
159 + });
160 +
161 + // Zero-out unused scores
162 + if (num_detections < top_n) {
163 + thrust::fill(on_stream, out_scores + num_detections,
164 + out_scores + top_n, 0.0f);
165 + thrust::fill(on_stream, out_classes + num_detections,
166 + out_classes + top_n, 0.0f);
167 + }
168 + }
169 +
170 + return 0;
171 +}
172 +
173 +}
174 +}
1 +/*
2 + * Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
3 + *
4 + * Permission is hereby granted, free of charge, to any person obtaining a
5 + * copy of this software and associated documentation files (the "Software"),
6 + * to deal in the Software without restriction, including without limitation
7 + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
8 + * and/or sell copies of the Software, and to permit persons to whom the
9 + * Software is furnished to do so, subject to the following conditions:
10 + *
11 + * The above copyright notice and this permission notice shall be included in
12 + * all copies or substantial portions of the Software.
13 + *
14 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
17 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
19 + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
20 + * DEALINGS IN THE SOFTWARE.
21 + */
22 +
23 +#pragma once
24 +
25 +#include <vector>
26 +
27 +namespace odtk {
28 +namespace cuda {
29 +
30 +int decode(int batch_size,
31 + const void *const *inputs, void *const *outputs,
32 + size_t height, size_t width, size_t scale,
33 + size_t num_anchors, size_t num_classes,
34 + const std::vector<float> &anchors, float score_thresh, int top_n,
35 + void *workspace, size_t workspace_size, cudaStream_t stream);
36 +
37 +
38 +}
39 +}
1 +/*
2 + * Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
3 + *
4 + * Permission is hereby granted, free of charge, to any person obtaining a
5 + * copy of this software and associated documentation files (the "Software"),
6 + * to deal in the Software without restriction, including without limitation
7 + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
8 + * and/or sell copies of the Software, and to permit persons to whom the
9 + * Software is furnished to do so, subject to the following conditions:
10 + *
11 + * The above copyright notice and this permission notice shall be included in
12 + * all copies or substantial portions of the Software.
13 + *
14 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
17 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
19 + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
20 + * DEALINGS IN THE SOFTWARE.
21 + */
22 +
23 +#include "decode_rotate.h"
24 +#include "utils.h"
25 +
26 +#include <algorithm>
27 +#include <cstdint>
28 +
29 +#include <thrust/device_ptr.h>
30 +#include <thrust/sequence.h>
31 +#include <thrust/execution_policy.h>
32 +#include <thrust/gather.h>
33 +#include <thrust/tabulate.h>
34 +#include <thrust/count.h>
35 +#include <thrust/find.h>
36 +#include <cub/device/device_radix_sort.cuh>
37 +#include <cub/iterator/counting_input_iterator.cuh>
38 +
39 +namespace odtk {
40 +namespace cuda {
41 +
42 +int decode_rotate(int batch_size,
43 + const void *const *inputs, void *const *outputs,
44 + size_t height, size_t width, size_t scale,
45 + size_t num_anchors, size_t num_classes,
46 + const std::vector<float> &anchors, float score_thresh, int top_n,
47 + void *workspace, size_t workspace_size, cudaStream_t stream) {
48 +
49 + int scores_size = num_anchors * num_classes * height * width;
50 +
51 + if (!workspace || !workspace_size) {
52 + // Return required scratch space size cub style
53 + workspace_size = get_size_aligned<float>(anchors.size()); // anchors
54 + workspace_size += get_size_aligned<bool>(scores_size); // flags
55 + workspace_size += get_size_aligned<int>(scores_size); // indices
56 + workspace_size += get_size_aligned<int>(scores_size); // indices_sorted
57 + workspace_size += get_size_aligned<float>(scores_size); // scores
58 + workspace_size += get_size_aligned<float>(scores_size); // scores_sorted
59 +
60 + size_t temp_size_flag = 0;
61 + cub::DeviceSelect::Flagged((void *)nullptr, temp_size_flag,
62 + cub::CountingInputIterator<int>(scores_size),
63 + (bool *)nullptr, (int *)nullptr, (int *)nullptr, scores_size);
64 + size_t temp_size_sort = 0;
65 + cub::DeviceRadixSort::SortPairsDescending((void *)nullptr, temp_size_sort,
66 + (float *)nullptr, (float *)nullptr, (int *)nullptr, (int *)nullptr, scores_size);
67 + workspace_size += std::max(temp_size_flag, temp_size_sort);
68 +
69 + return workspace_size;
70 + }
71 +
72 + auto anchors_d = get_next_ptr<float>(anchors.size(), workspace, workspace_size);
73 + cudaMemcpyAsync(anchors_d, anchors.data(), anchors.size() * sizeof *anchors_d, cudaMemcpyHostToDevice, stream);
74 +
75 + auto on_stream = thrust::cuda::par.on(stream);
76 +
77 + auto flags = get_next_ptr<bool>(scores_size, workspace, workspace_size);
78 + auto indices = get_next_ptr<int>(scores_size, workspace, workspace_size);
79 + auto indices_sorted = get_next_ptr<int>(scores_size, workspace, workspace_size);
80 + auto scores = get_next_ptr<float>(scores_size, workspace, workspace_size);
81 + auto scores_sorted = get_next_ptr<float>(scores_size, workspace, workspace_size);
82 +
83 + for (int batch = 0; batch < batch_size; batch++) {
84 + auto in_scores = static_cast<const float *>(inputs[0]) + batch * scores_size;
85 + auto in_boxes = static_cast<const float *>(inputs[1]) + batch * (scores_size / num_classes) * 6; //From 4
86 +
87 + auto out_scores = static_cast<float *>(outputs[0]) + batch * top_n;
88 + auto out_boxes = static_cast<float6 *>(outputs[1]) + batch * top_n; // From float4
89 + auto out_classes = static_cast<float *>(outputs[2]) + batch * top_n;
90 +
91 + // Discard scores below threshold
92 + thrust::transform(on_stream, in_scores, in_scores + scores_size,
93 + flags, thrust::placeholders::_1 > score_thresh);
94 +
95 + int *num_selected = reinterpret_cast<int *>(indices_sorted);
96 + cub::DeviceSelect::Flagged(workspace, workspace_size, cub::CountingInputIterator<int>(0),
97 + flags, indices, num_selected, scores_size, stream);
98 + cudaStreamSynchronize(stream);
99 + int num_detections = *thrust::device_pointer_cast(num_selected);
100 +
101 + // Only keep top n scores
102 + auto indices_filtered = indices;
103 + if (num_detections > top_n) {
104 + thrust::gather(on_stream, indices, indices + num_detections,
105 + in_scores, scores);
106 + cub::DeviceRadixSort::SortPairsDescending(workspace, workspace_size,
107 + scores, scores_sorted, indices, indices_sorted, num_detections, 0, sizeof(*scores)*8, stream);
108 + indices_filtered = indices_sorted;
109 + num_detections = top_n;
110 + }
111 +
112 + // Gather boxes
113 + bool has_anchors = !anchors.empty();
114 + thrust::transform(on_stream, indices_filtered, indices_filtered + num_detections,
115 + thrust::make_zip_iterator(thrust::make_tuple(out_scores, out_boxes, out_classes)),
116 + [=] __device__ (int i) {
117 + int x = i % width;
118 + int y = (i / width) % height;
119 + int a = (i / num_classes / height / width) % num_anchors;
120 + int cls = (i / height / width) % num_classes;
121 +
122 + float6 box = make_float6(
123 + make_float4(
124 + in_boxes[((a * 6 + 0) * height + y) * width + x],
125 + in_boxes[((a * 6 + 1) * height + y) * width + x],
126 + in_boxes[((a * 6 + 2) * height + y) * width + x],
127 + in_boxes[((a * 6 + 3) * height + y) * width + x]
128 + ),
129 + make_float2(
130 + in_boxes[((a * 6 + 4) * height + y) * width + x],
131 + in_boxes[((a * 6 + 5) * height + y) * width + x]
132 + )
133 + );
134 +
135 + if (has_anchors) {
136 + // Add anchors offsets to deltas
137 + float x = (i % width) * scale;
138 + float y = ((i / width) % height) * scale;
139 + float *d = anchors_d + 4*a;
140 +
141 + float x1 = x + d[0];
142 + float y1 = y + d[1];
143 + float x2 = x + d[2];
144 + float y2 = y + d[3];
145 +
146 + float w = x2 - x1 + 1.0f;
147 + float h = y2 - y1 + 1.0f;
148 + float pred_ctr_x = box.x1 * w + x1 + 0.5f * w;
149 + float pred_ctr_y = box.y1 * h + y1 + 0.5f * h;
150 + float pred_w = exp(box.x2) * w;
151 + float pred_h = exp(box.y2) * h;
152 + float pred_sin = box.s;
153 + float pred_cos = box.c;
154 +
155 + box = make_float6(
156 + make_float4(
157 + max(0.0f, pred_ctr_x - 0.5f * pred_w),
158 + max(0.0f, pred_ctr_y - 0.5f * pred_h),
159 + min(pred_ctr_x + 0.5f * pred_w - 1.0f, width * scale - 1.0f),
160 + min(pred_ctr_y + 0.5f * pred_h - 1.0f, height * scale - 1.0f)
161 + ),
162 + make_float2(pred_sin, pred_cos)
163 + );
164 + }
165 +
166 + return thrust::make_tuple(in_scores[i], box, cls);
167 + });
168 +
169 + // Zero-out unused scores
170 + if (num_detections < top_n) {
171 + thrust::fill(on_stream, out_scores + num_detections,
172 + out_scores + top_n, 0.0f);
173 + thrust::fill(on_stream, out_classes + num_detections,
174 + out_classes + top_n, 0.0f);
175 + }
176 + }
177 +
178 + return 0;
179 +}
180 +
181 +}
182 +}
1 +/*
2 + * Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
3 + *
4 + * Permission is hereby granted, free of charge, to any person obtaining a
5 + * copy of this software and associated documentation files (the "Software"),
6 + * to deal in the Software without restriction, including without limitation
7 + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
8 + * and/or sell copies of the Software, and to permit persons to whom the
9 + * Software is furnished to do so, subject to the following conditions:
10 + *
11 + * The above copyright notice and this permission notice shall be included in
12 + * all copies or substantial portions of the Software.
13 + *
14 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
17 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
19 + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
20 + * DEALINGS IN THE SOFTWARE.
21 + */
22 +
23 +#pragma once
24 +
25 +#include <vector>
26 +
27 +namespace odtk {
28 +namespace cuda {
29 +
30 +int decode_rotate(int batchSize,
31 + const void *const *inputs, void *const *outputs,
32 + size_t height, size_t width, size_t scale,
33 + size_t num_anchors, size_t num_classes,
34 + const std::vector<float> &anchors, float score_thresh, int top_n,
35 + void *workspace, size_t workspace_size, cudaStream_t stream);
36 +
37 +}
38 +}
\ No newline at end of file
1 +/*
2 + * Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
3 + *
4 + * Permission is hereby granted, free of charge, to any person obtaining a
5 + * copy of this software and associated documentation files (the "Software"),
6 + * to deal in the Software without restriction, including without limitation
7 + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
8 + * and/or sell copies of the Software, and to permit persons to whom the
9 + * Software is furnished to do so, subject to the following conditions:
10 + *
11 + * The above copyright notice and this permission notice shall be included in
12 + * all copies or substantial portions of the Software.
13 + *
14 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
17 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
19 + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
20 + * DEALINGS IN THE SOFTWARE.
21 + */
22 +
23 +#include "nms.h"
24 +#include "utils.h"
25 +
26 +#include <algorithm>
27 +#include <iostream>
28 +#include <stdexcept>
29 +#include <cstdint>
30 +#include <vector>
31 +#include <cmath>
32 +
33 +#include <cuda.h>
34 +#include <thrust/device_ptr.h>
35 +#include <thrust/sequence.h>
36 +#include <thrust/execution_policy.h>
37 +#include <thrust/gather.h>
38 +#include <cub/device/device_radix_sort.cuh>
39 +#include <cub/iterator/counting_input_iterator.cuh>
40 +
41 +namespace odtk {
42 +namespace cuda {
43 +
44 +__global__ void nms_kernel(
45 + const int num_per_thread, const float threshold, const int num_detections,
46 + const int *indices, float *scores, const float *classes, const float4 *boxes) {
47 +
48 + // Go through detections by descending score
49 + for (int m = 0; m < num_detections; m++) {
50 + for (int n = 0; n < num_per_thread; n++) {
51 + int i = threadIdx.x * num_per_thread + n;
52 + if (i < num_detections && m < i && scores[m] > 0.0f) {
53 + int idx = indices[i];
54 + int max_idx = indices[m];
55 + int icls = classes[idx];
56 + int mcls = classes[max_idx];
57 + if (mcls == icls) {
58 + float4 ibox = boxes[idx];
59 + float4 mbox = boxes[max_idx];
60 + float x1 = max(ibox.x, mbox.x);
61 + float y1 = max(ibox.y, mbox.y);
62 + float x2 = min(ibox.z, mbox.z);
63 + float y2 = min(ibox.w, mbox.w);
64 + float w = max(0.0f, x2 - x1 + 1);
65 + float h = max(0.0f, y2 - y1 + 1);
66 + float iarea = (ibox.z - ibox.x + 1) * (ibox.w - ibox.y + 1);
67 + float marea = (mbox.z - mbox.x + 1) * (mbox.w - mbox.y + 1);
68 + float inter = w * h;
69 + float overlap = inter / (iarea + marea - inter);
70 + if (overlap > threshold) {
71 + scores[i] = 0.0f;
72 + }
73 + }
74 + }
75 + }
76 +
77 + // Sync discarded detections
78 + __syncthreads();
79 + }
80 +}
81 +
82 +int nms(int batch_size,
83 + const void *const *inputs, void *const *outputs,
84 + size_t count, int detections_per_im, float nms_thresh,
85 + void *workspace, size_t workspace_size, cudaStream_t stream) {
86 +
87 + if (!workspace || !workspace_size) {
88 +        // Return the required scratch space size, cub-style
89 + workspace_size = get_size_aligned<bool>(count); // flags
90 + workspace_size += get_size_aligned<int>(count); // indices
91 + workspace_size += get_size_aligned<int>(count); // indices_sorted
92 + workspace_size += get_size_aligned<float>(count); // scores
93 + workspace_size += get_size_aligned<float>(count); // scores_sorted
94 +
95 + size_t temp_size_flag = 0;
96 + cub::DeviceSelect::Flagged((void *)nullptr, temp_size_flag,
97 + cub::CountingInputIterator<int>(count),
98 + (bool *)nullptr, (int *)nullptr, (int *)nullptr, count);
99 + size_t temp_size_sort = 0;
100 + cub::DeviceRadixSort::SortPairsDescending((void *)nullptr, temp_size_sort,
101 + (float *)nullptr, (float *)nullptr, (int *)nullptr, (int *)nullptr, count);
102 + workspace_size += std::max(temp_size_flag, temp_size_sort);
103 +
104 + return workspace_size;
105 + }
106 +
107 + auto on_stream = thrust::cuda::par.on(stream);
108 +
109 + auto flags = get_next_ptr<bool>(count, workspace, workspace_size);
110 + auto indices = get_next_ptr<int>(count, workspace, workspace_size);
111 + auto indices_sorted = get_next_ptr<int>(count, workspace, workspace_size);
112 + auto scores = get_next_ptr<float>(count, workspace, workspace_size);
113 + auto scores_sorted = get_next_ptr<float>(count, workspace, workspace_size);
114 +
115 + for (int batch = 0; batch < batch_size; batch++) {
116 + auto in_scores = static_cast<const float *>(inputs[0]) + batch * count;
117 + auto in_boxes = static_cast<const float4 *>(inputs[1]) + batch * count;
118 + auto in_classes = static_cast<const float *>(inputs[2]) + batch * count;
119 +
120 + auto out_scores = static_cast<float *>(outputs[0]) + batch * detections_per_im;
121 + auto out_boxes = static_cast<float4 *>(outputs[1]) + batch * detections_per_im;
122 + auto out_classes = static_cast<float *>(outputs[2]) + batch * detections_per_im;
123 +
124 + // Discard null scores
125 + thrust::transform(on_stream, in_scores, in_scores + count,
126 + flags, thrust::placeholders::_1 > 0.0f);
127 +
128 + int *num_selected = reinterpret_cast<int *>(indices_sorted);
129 + cub::DeviceSelect::Flagged(workspace, workspace_size, cub::CountingInputIterator<int>(0),
130 + flags, indices, num_selected, count, stream);
131 + cudaStreamSynchronize(stream);
132 + int num_detections = *thrust::device_pointer_cast(num_selected);
133 +
134 + // Sort scores and corresponding indices
135 + thrust::gather(on_stream, indices, indices + num_detections, in_scores, scores);
136 + cub::DeviceRadixSort::SortPairsDescending(workspace, workspace_size,
137 + scores, scores_sorted, indices, indices_sorted, num_detections, 0, sizeof(*scores)*8, stream);
138 +
139 + // Launch actual NMS kernel - 1 block with each thread handling n detections
140 + const int max_threads = 1024;
141 + int num_per_thread = ceil((float)num_detections / max_threads);
142 + nms_kernel<<<1, max_threads, 0, stream>>>(num_per_thread, nms_thresh, num_detections,
143 + indices_sorted, scores_sorted, in_classes, in_boxes);
144 +
145 + // Re-sort with updated scores
146 + cub::DeviceRadixSort::SortPairsDescending(workspace, workspace_size,
147 + scores_sorted, scores, indices_sorted, indices, num_detections, 0, sizeof(*scores)*8, stream);
148 +
149 + // Gather filtered scores, boxes, classes
150 + num_detections = min(detections_per_im, num_detections);
151 + cudaMemcpyAsync(out_scores, scores, num_detections * sizeof *scores, cudaMemcpyDeviceToDevice, stream);
152 + if (num_detections < detections_per_im) {
153 + thrust::fill_n(on_stream, out_scores + num_detections, detections_per_im - num_detections, 0);
154 + }
155 + thrust::gather(on_stream, indices, indices + num_detections, in_boxes, out_boxes);
156 + thrust::gather(on_stream, indices, indices + num_detections, in_classes, out_classes);
157 + }
158 +
159 + return 0;
160 +}
161 +
162 +}
163 +}
1 +/*
2 + * Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
3 + *
4 + * Permission is hereby granted, free of charge, to any person obtaining a
5 + * copy of this software and associated documentation files (the "Software"),
6 + * to deal in the Software without restriction, including without limitation
7 + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
8 + * and/or sell copies of the Software, and to permit persons to whom the
9 + * Software is furnished to do so, subject to the following conditions:
10 + *
11 + * The above copyright notice and this permission notice shall be included in
12 + * all copies or substantial portions of the Software.
13 + *
14 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
17 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
19 + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
20 + * DEALINGS IN THE SOFTWARE.
21 + */
22 +
23 +#pragma once
24 +
25 +namespace odtk {
26 +namespace cuda {
27 +
28 +int nms(int batchSize,
29 + const void *const *inputs, void *const *outputs,
30 + size_t count, int detections_per_im, float nms_thresh,
31 + void *workspace, size_t workspace_size, cudaStream_t stream);
32 +
33 +}
34 +}
\ No newline at end of file
1 +/*
2 + * Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
3 + *
4 + * Permission is hereby granted, free of charge, to any person obtaining a
5 + * copy of this software and associated documentation files (the "Software"),
6 + * to deal in the Software without restriction, including without limitation
7 + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
8 + * and/or sell copies of the Software, and to permit persons to whom the
9 + * Software is furnished to do so, subject to the following conditions:
10 + *
11 + * The above copyright notice and this permission notice shall be included in
12 + * all copies or substantial portions of the Software.
13 + *
14 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
17 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
19 + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
20 + * DEALINGS IN THE SOFTWARE.
21 + */
22 +
23 +#include "nms_iou.h"
24 +#include "utils.h"
25 +
26 +#include <algorithm>
27 +#include <cmath>
28 +#include <cstdint>
29 +#include <iostream>
30 +#include <stdexcept>
31 +#include <vector>
32 +#include <cuda.h>
33 +
34 +#include <thrust/device_ptr.h>
35 +#include <thrust/execution_policy.h>
36 +#include <thrust/gather.h>
37 +#include <thrust/sequence.h>
38 +#include <cub/device/device_radix_sort.cuh>
39 +#include <cub/iterator/counting_input_iterator.cuh>
40 +
41 +constexpr int kTPB = 64; // threads per block
42 +constexpr int kCorners = 4;
43 +constexpr int kPoints = 8;
44 +
45 +namespace odtk {
46 +namespace cuda {
47 +
48 +class Vector {
49 +public:
50 + __host__ __device__ Vector( ); // Default constructor
51 +    __host__ __device__ ~Vector( ); // Destructor
52 + __host__ __device__ Vector( float2 const point );
53 + float2 const p;
54 + friend class Line;
55 +
56 +private:
57 + __host__ __device__ float cross( Vector const v ) const;
58 +};
59 +
60 +Vector::Vector( ) : p( make_float2( 0.0f, 0.0f ) ) {}
61 +
62 +Vector::~Vector( ) {}
63 +
64 +Vector::Vector( float2 const point ) : p( point ) {}
65 +
66 +float Vector::cross( Vector const v ) const {
67 + return ( p.x * v.p.y - p.y * v.p.x );
68 +}
69 +
70 +class Line {
71 +public:
72 + __host__ __device__ Line( ); // Default constructor
73 +    __host__ __device__ ~Line( ); // Destructor
74 + __host__ __device__ Line( Vector const v1, Vector const v2 );
75 + __host__ __device__ float call( Vector const v ) const;
76 + __host__ __device__ float2 intersection( Line const l ) const;
77 +
78 +private:
79 + float const a;
80 + float const b;
81 + float const c;
82 +};
83 +
84 +Line::Line( ) : a( 0.0f ), b( 0.0f ), c( 0.0f ) {}
85 +
86 +Line::~Line( ) {}
87 +
88 +Line::Line( Vector const v1, Vector const v2 ) : a( v2.p.y - v1.p.y ), b( v1.p.x - v2.p.x ), c( v2.cross( v1 ) ) {}
89 +
90 +float Line::call( Vector const v ) const {
91 + return ( a * v.p.x + b * v.p.y + c );
92 +}
93 +
94 +float2 Line::intersection( Line const l ) const {
95 + float w { a * l.b - b * l.a };
96 + return ( make_float2( ( b * l.c - c * l.b ) / w, ( c * l.a - a * l.c ) / w ) );
97 +}
98 +
99 +template<typename T>
100 +__host__ __device__ void rotateLeft( T *array, int const &count ) {
101 + T temp = array[0];
102 + for ( int i = 0; i < count - 1; i++ )
103 + array[i] = array[i + 1];
104 + array[count - 1] = temp;
105 +}
106 +
107 +__host__ __device__ static __inline__ float2 padfloat2( float2 a, float2 b ) {
108 + float2 res;
109 + res.x = a.x + b.x;
110 + res.y = a.y + b.y;
111 + return res;
112 +}
113 +
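 +// Intersection area of two convex quadrilaterals: the polygon seeded in
 +// `intersection` is clipped against each edge of `mrect` (Sutherland-Hodgman style),
 +// and the area of the clipped polygon is then computed with the shoelace formula.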
114 +__device__ float IntersectionArea( float2 *mrect, float2 *mrect_shift, float2 *intersection ) {
115 + int count = kCorners;
116 + for ( int i = 0; i < kCorners; i++ ) {
117 + float2 intersection_shift[kPoints] {};
118 + for ( int k = 0; k < count; k++ )
119 + intersection_shift[k] = intersection[k];
120 + float line_values[kPoints] {};
121 + Vector const r1( mrect[i] );
122 + Vector const r2( mrect_shift[i] );
123 + Line const line1( r1, r2 );
124 + for ( int j = 0; j < count; j++ ) {
125 + Vector const inter( intersection[j] );
126 + line_values[j] = line1.call( inter );
127 + }
128 + float line_values_shift[kPoints] {};
129 +
130 +#pragma unroll
131 + for ( int k = 0; k < kPoints; k++ )
132 + line_values_shift[k] = line_values[k];
133 + rotateLeft( line_values_shift, count );
134 + rotateLeft( intersection_shift, count );
135 + float2 new_intersection[kPoints] {};
136 + int temp = count;
137 + count = 0;
138 + for ( int j = 0; j < temp; j++ ) {
139 + if ( line_values[j] <= 0 ) {
140 + new_intersection[count] = intersection[j];
141 + count++;
142 + }
143 + if ( ( line_values[j] * line_values_shift[j] ) <= 0 ) {
144 + Vector const r3( intersection[j] );
145 + Vector const r4( intersection_shift[j] );
146 +                Line const line2( r3, r4 );
147 +                new_intersection[count] = line1.intersection( line2 );
148 + count++;
149 + }
150 + }
151 + for ( int k = 0; k < count; k++ )
152 + intersection[k] = new_intersection[k];
153 + }
154 +
155 + float2 intersection_shift[kPoints] {};
156 +
157 + for ( int k = 0; k < count; k++ )
158 + intersection_shift[k] = intersection[k];
159 + rotateLeft( intersection_shift, count );
160 +
161 + // Intersection
162 + float intersection_area = 0.0f;
163 + if ( count > 2 ) {
164 + for ( int k = 0; k < count; k++ )
165 + intersection_area +=
166 + intersection[k].x * intersection_shift[k].y - intersection[k].y * intersection_shift[k].x;
167 + }
168 + return ( abs( intersection_area / 2.0f ) );
169 +}
170 +
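 +// Greedy rotated NMS: launched as a single block over detections already sorted by
 +// descending score; each thread owns `num_per_thread` candidates and zeroes the score
 +// of any same-class candidate whose polygon IoU with a surviving higher-scored
 +// detection exceeds `threshold`.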
171 +__global__ void nms_rotate_kernel(const int num_per_thread, const float threshold, const int num_detections,
172 + const int *indices, float *scores, const float *classes, const float6 *boxes ) {
173 + // Go through detections by descending score
174 + for ( int m = 0; m < num_detections; m++ ) {
175 + for ( int n = 0; n < num_per_thread; n++ ) {
176 + int ii = threadIdx.x * num_per_thread + n;
177 + if ( ii < num_detections && m < ii && scores[m] > 0.0f ) {
178 + int idx = indices[ii];
179 + int max_idx = indices[m];
180 + int icls = classes[idx];
181 + int mcls = classes[max_idx];
182 + if ( mcls == icls ) {
183 + float6 ibox = make_float6( make_float4( boxes[idx].x1,
184 + boxes[idx].y1,
185 + boxes[idx].x2,
186 + boxes[idx].y2 ),
187 + make_float2( boxes[idx].s, boxes[idx].c ) );
188 + float6 mbox = make_float6( make_float4( boxes[max_idx].x1,
189 + boxes[max_idx].y1,
190 + boxes[max_idx].x2,
191 + boxes[max_idx].y2 ),
192 +                                               make_float2( boxes[max_idx].s, boxes[max_idx].c ) ); // use max_idx's own rotation
193 + float2 intersection[kPoints] { -1.0f, -1.0f, -1.0f, -1.0f, -1.0f, -1.0f, -1.0f, -1.0f,
194 + -1.0f, -1.0f, -1.0f, -1.0f, -1.0f, -1.0f, -1.0f, -1.0f };
195 + float2 irect[kPoints] {};
196 + float2 irect_shift[kPoints] {};
197 + float2 mrect[kPoints] {};
198 + float2 mrect_shift[kPoints] {};
199 + float2 icent = { ( ibox.x1 + ibox.x2 ) / 2.0f, ( ibox.y1 + ibox.y2 ) / 2.0f };
200 + float2 mcent = { ( mbox.x1 + mbox.x2 ) / 2.0f, ( mbox.y1 + mbox.y2 ) / 2.0f };
201 + float2 iboxc[kCorners] = { ibox.x1 - icent.x, ibox.y1 - icent.y, ibox.x2 - icent.x,
202 + ibox.y1 - icent.y, ibox.x2 - icent.x, ibox.y2 - icent.y,
203 + ibox.x1 - icent.x, ibox.y2 - icent.y };
204 + float2 mboxc[kCorners] = { mbox.x1 - mcent.x, mbox.y1 - mcent.y, mbox.x2 - mcent.x,
205 + mbox.y1 - mcent.y, mbox.x2 - mcent.x, mbox.y2 - mcent.y,
206 + mbox.x1 - mcent.x, mbox.y2 - mcent.y };
207 +                    float2 pad; // small offset applied when corresponding corners coincide, to avoid degenerate intersections
208 +#pragma unroll
209 + for ( int b = 0; b < kCorners; b++ ) {
210 + if ((iboxc[b].x * ibox.c - iboxc[b].y * ibox.s) + icent.x == (mboxc[b].x * mbox.c - mboxc[b].y * mbox.s) + mcent.x)
211 + pad.x = 0.001f;
212 + else
213 + pad.x = 0.0f;
214 + if ((iboxc[b].y * ibox.c + iboxc[b].x * ibox.s) + icent.y == (mboxc[b].y * mbox.c + mboxc[b].x * mbox.s) + mcent.y)
215 + pad.y = 0.001f;
216 + else
217 + pad.y = 0.0f;
218 + intersection[b] = { ( iboxc[b].x * ibox.c - iboxc[b].y * ibox.s ) + icent.x + pad.x,
219 + ( iboxc[b].y * ibox.c + iboxc[b].x * ibox.s ) + icent.y + pad.y};
220 + irect[b] = { ( iboxc[b].x * ibox.c - iboxc[b].y * ibox.s ) + icent.x,
221 + ( iboxc[b].y * ibox.c + iboxc[b].x * ibox.s ) + icent.y };
222 + irect_shift[b] = { ( iboxc[b].x * ibox.c - iboxc[b].y * ibox.s ) + icent.x,
223 + ( iboxc[b].y * ibox.c + iboxc[b].x * ibox.s ) + icent.y };
224 + mrect[b] = { ( mboxc[b].x * mbox.c - mboxc[b].y * mbox.s ) + mcent.x,
225 + ( mboxc[b].y * mbox.c + mboxc[b].x * mbox.s ) + mcent.y };
226 + mrect_shift[b] = { ( mboxc[b].x * mbox.c - mboxc[b].y * mbox.s ) + mcent.x,
227 + ( mboxc[b].y * mbox.c + mboxc[b].x * mbox.s ) + mcent.y };
228 + }
229 + rotateLeft( irect_shift, 4 );
230 + rotateLeft( mrect_shift, 4 );
231 + float intersection_area = IntersectionArea( mrect, mrect_shift, intersection );
232 + // Union
233 + float irect_area = 0.0f;
234 + float mrect_area = 0.0f;
235 +#pragma unroll
236 + for ( int k = 0; k < kCorners; k++ ) {
237 + irect_area += irect[k].x * irect_shift[k].y - irect[k].y * irect_shift[k].x;
238 + mrect_area += mrect[k].x * mrect_shift[k].y - mrect[k].y * mrect_shift[k].x;
239 + }
240 + float union_area = ( abs( irect_area ) + abs( mrect_area ) ) / 2.0f;
241 + float overlap;
242 + if ( isnan( intersection_area ) && isnan( union_area ) ) {
243 + overlap = 1.0f;
244 + } else if ( isnan( intersection_area ) ) {
245 + overlap = 0.0f;
246 + } else {
247 + overlap = intersection_area / ( union_area - intersection_area ); // Check nans and inf
248 + }
249 + if ( overlap > threshold ) {
250 + scores[ii] = 0.0f;
251 + }
252 + }
253 + }
254 + }
255 + // Sync discarded detections
256 + __syncthreads( );
257 + }
258 +}
259 +
260 +int nms_rotate(int batch_size, const void *const *inputs, void *const *outputs, size_t count,
261 + int detections_per_im, float nms_thresh, void *workspace, size_t workspace_size, cudaStream_t stream ) {
262 +
263 + if ( !workspace || !workspace_size ) {
264 +        // Return the required scratch space size, cub-style
265 + workspace_size = get_size_aligned<bool>( count ); // flags
266 + workspace_size += get_size_aligned<int>( count ); // indices
267 + workspace_size += get_size_aligned<int>( count ); // indices_sorted
268 + workspace_size += get_size_aligned<float>( count ); // scores
269 + workspace_size += get_size_aligned<float>( count ); // scores_sorted
270 + size_t temp_size_flag = 0;
271 + cub::DeviceSelect::Flagged((void*)nullptr, temp_size_flag,
272 + cub::CountingInputIterator<int>(count), (bool*)nullptr, (int*)nullptr, (int*)nullptr, count);
273 + size_t temp_size_sort = 0;
274 + cub::DeviceRadixSort::SortPairsDescending((void*)nullptr, temp_size_sort, (float*)nullptr,
275 + (float*)nullptr, (int*)nullptr, (int*)nullptr, count);
276 + workspace_size += std::max( temp_size_flag, temp_size_sort );
277 + return workspace_size;
278 + }
279 +
280 + auto on_stream = thrust::cuda::par.on( stream );
281 + auto flags = get_next_ptr<bool>( count, workspace, workspace_size );
282 + auto indices = get_next_ptr<int>( count, workspace, workspace_size );
283 + auto indices_sorted = get_next_ptr<int>( count, workspace, workspace_size );
284 + auto scores = get_next_ptr<float>( count, workspace, workspace_size );
285 + auto scores_sorted = get_next_ptr<float>( count, workspace, workspace_size );
286 + for ( int batch = 0; batch < batch_size; batch++ ) {
287 + auto in_scores = static_cast<const float *>( inputs[0] ) + batch * count;
288 + auto in_boxes = static_cast<const float6 *>( inputs[1] ) + batch * count;
289 + auto in_classes = static_cast<const float *>( inputs[2] ) + batch * count;
290 + auto out_scores = static_cast<float *>( outputs[0] ) + batch * detections_per_im;
291 + auto out_boxes = static_cast<float6 *>( outputs[1] ) + batch * detections_per_im;
292 + auto out_classes = static_cast<float *>( outputs[2] ) + batch * detections_per_im;
293 + // Discard null scores
294 + thrust::transform( on_stream, in_scores, in_scores + count, flags, thrust::placeholders::_1 > 0.0f );
295 + int *num_selected = reinterpret_cast<int *>( indices_sorted );
296 + cub::DeviceSelect::Flagged(workspace, workspace_size, cub::CountingInputIterator<int>(0), flags,
297 + indices, num_selected, count, stream );
298 + cudaStreamSynchronize( stream );
299 + int num_detections = *thrust::device_pointer_cast( num_selected );
300 + // Sort scores and corresponding indices
301 + thrust::gather( on_stream, indices, indices + num_detections, in_scores, scores );
302 + cub::DeviceRadixSort::SortPairsDescending(workspace, workspace_size, scores, scores_sorted,
303 + indices, indices_sorted, num_detections, 0, sizeof(*scores)*8, stream ); // From 8
304 + // Launch actual NMS kernel - 1 block with each thread handling n detections
305 + const int max_threads = 1024;
306 + int num_per_thread = ceil((float)num_detections/max_threads);
307 + nms_rotate_kernel<<<1, max_threads, 0, stream>>>(
308 + num_per_thread, nms_thresh, num_detections, indices_sorted, scores_sorted, in_classes, in_boxes);
309 + // Re-sort with updated scores
310 + cub::DeviceRadixSort::SortPairsDescending(workspace, workspace_size, scores_sorted, scores,
311 + indices_sorted, indices, num_detections, 0, sizeof( *scores ) * 8, stream ); // From 8
312 + // Gather filtered scores, boxes, classes
313 + num_detections = min( detections_per_im, num_detections );
314 + cudaMemcpyAsync( out_scores, scores, num_detections * sizeof *scores, cudaMemcpyDeviceToDevice, stream );
315 + if ( num_detections < detections_per_im ) {
316 + thrust::fill_n( on_stream, out_scores + num_detections, detections_per_im - num_detections, 0 );
317 + }
318 + thrust::gather( on_stream, indices, indices + num_detections, in_boxes, out_boxes );
319 + thrust::gather( on_stream, indices, indices + num_detections, in_classes, out_classes );
320 + }
321 + return 0;
322 +}
323 +
324 +__global__ void iou_cuda_kernel(int const numBoxes, int const numAnchors,
325 + float2 const *b_box_vals, float2 const *a_box_vals, float *iou_vals ) {
326 + int t = blockIdx.x * blockDim.x + threadIdx.x;
327 + int stride = blockDim.x * gridDim.x;
328 + int combos = numBoxes * numAnchors;
329 +    for ( int tid = t; tid < combos; tid += stride ) { // grid-stride loop over all box/anchor pairs
330 + float2 intersection[kPoints] { -1.0f, -1.0f, -1.0f, -1.0f, -1.0f, -1.0f, -1.0f, -1.0f,
331 + -1.0f, -1.0f, -1.0f, -1.0f, -1.0f, -1.0f, -1.0f, -1.0f };
332 + float2 rect1[kPoints] {};
333 + float2 rect1_shift[kPoints] {};
334 + float2 rect2[kPoints] {};
335 + float2 rect2_shift[kPoints] {};
336 +        float2 pad; // epsilon offset for coincident corners (same trick as nms_rotate_kernel)
337 +#pragma unroll
338 + for ( int b = 0; b < kCorners; b++ ) {
339 + if (b_box_vals[(static_cast<int>(tid/numAnchors) * kCorners + b)].x == a_box_vals[(tid * kCorners + b) % (numAnchors * kCorners)].x)
340 + pad.x = 0.001f;
341 + else
342 + pad.x = 0.0f;
343 + if (b_box_vals[(static_cast<int>(tid/numAnchors) * kCorners + b)].y == a_box_vals[(tid * kCorners + b) % (numAnchors * kCorners)].y)
344 + pad.y = 0.001f;
345 + else
346 + pad.y = 0.0f;
347 + intersection[b] = padfloat2( b_box_vals[( static_cast<int>( tid / numAnchors ) * kCorners + b )], pad);
348 + rect1[b] = b_box_vals[( static_cast<int>( tid / numAnchors ) * kCorners + b )];
349 + rect1_shift[b] = b_box_vals[( static_cast<int>( tid / numAnchors ) * kCorners + b )];
350 + rect2[b] = a_box_vals[( tid * kCorners + b ) % ( numAnchors * kCorners )];
351 + rect2_shift[b] = a_box_vals[( tid * kCorners + b ) % ( numAnchors * kCorners )];
352 + }
353 + rotateLeft( rect1_shift, 4 );
354 + rotateLeft( rect2_shift, 4 );
355 + float intersection_area = IntersectionArea( rect2, rect2_shift, intersection );
356 + // Union
357 + float rect1_area = 0.0f;
358 + float rect2_area = 0.0f;
359 +#pragma unroll
360 + for ( int k = 0; k < kCorners; k++ ) {
361 + rect1_area += rect1[k].x * rect1_shift[k].y - rect1[k].y * rect1_shift[k].x;
362 + rect2_area += rect2[k].x * rect2_shift[k].y - rect2[k].y * rect2_shift[k].x;
363 + }
364 + float union_area = ( abs( rect1_area ) + abs( rect2_area ) ) / 2.0f;
365 + float iou_val = intersection_area / ( union_area - intersection_area );
366 + // Write out answer
367 + if ( isnan( intersection_area ) && isnan( union_area ) ) {
368 + iou_vals[tid] = 1.0f;
369 + } else if ( isnan( intersection_area ) ) {
370 + iou_vals[tid] = 0.0f;
371 + } else {
372 + iou_vals[tid] = iou_val;
373 + }
374 + }
375 +}
376 +
377 +int iou( const void *const *inputs, void *const *outputs, int num_boxes, int num_anchors, cudaStream_t stream ) {
378 + auto boxes = static_cast<const float2 *>( inputs[0] );
379 + auto anchors = static_cast<const float2 *>( inputs[1] );
380 + auto iou_vals = static_cast<float *>( outputs[0] );
381 + int numSMs;
382 + cudaDeviceGetAttribute( &numSMs, cudaDevAttrMultiProcessorCount, 0 );
383 + int threadsPerBlock = kTPB;
384 + int blocksPerGrid = numSMs * 10;
385 + iou_cuda_kernel<<<blocksPerGrid, threadsPerBlock, 0, stream>>>( num_anchors, num_boxes, anchors, boxes, iou_vals );
386 + return 0;
387 +}
388 +
389 +}
390 +}
1 +/*
2 + * Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
3 + *
4 + * Permission is hereby granted, free of charge, to any person obtaining a
5 + * copy of this software and associated documentation files (the "Software"),
6 + * to deal in the Software without restriction, including without limitation
7 + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
8 + * and/or sell copies of the Software, and to permit persons to whom the
9 + * Software is furnished to do so, subject to the following conditions:
10 + *
11 + * The above copyright notice and this permission notice shall be included in
12 + * all copies or substantial portions of the Software.
13 + *
14 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
17 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
19 + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
20 + * DEALINGS IN THE SOFTWARE.
21 + */
22 +
23 +#pragma once
24 +
25 +namespace odtk {
26 +namespace cuda {
27 +
28 +int nms_rotate(int batchSize,
29 + const void *const *inputs, void *const *outputs,
30 + size_t count, int detections_per_im, float nms_thresh,
31 + void *workspace, size_t workspace_size, cudaStream_t stream);
32 +
33 +int iou(
34 + const void *const *inputs, void *const *outputs,
35 + int num_boxes, int num_anchors, cudaStream_t stream);
36 +}
37 +}
\ No newline at end of file
1 +/*
2 + * Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
3 + *
4 + * Permission is hereby granted, free of charge, to any person obtaining a
5 + * copy of this software and associated documentation files (the "Software"),
6 + * to deal in the Software without restriction, including without limitation
7 + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
8 + * and/or sell copies of the Software, and to permit persons to whom the
9 + * Software is furnished to do so, subject to the following conditions:
10 + *
11 + * The above copyright notice and this permission notice shall be included in
12 + * all copies or substantial portions of the Software.
13 + *
14 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
17 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
19 + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
20 + * DEALINGS IN THE SOFTWARE.
21 + */
22 +
23 +#pragma once
24 +#include <stdexcept>
25 +#include <cstdint>
26 +#include <thrust/functional.h>
27 +
28 +#define CUDA_ALIGN 256
29 +
30 +struct float6
31 +{
32 + float x1, y1, x2, y2, s, c;
33 +};
34 +
35 +inline __host__ __device__ float6 make_float6(float4 f, float2 t)
36 +{
37 + float6 fs;
38 + fs.x1 = f.x; fs.y1 = f.y; fs.x2 = f.z; fs.y2 = f.w; fs.s = t.x; fs.c = t.y;
39 + return fs;
40 +}
41 +
42 +template <typename T>
43 +inline size_t get_size_aligned(size_t num_elem) {
44 + size_t size = num_elem * sizeof(T);
45 + size_t extra_align = 0;
46 + if (size % CUDA_ALIGN != 0) {
47 + extra_align = CUDA_ALIGN - size % CUDA_ALIGN;
48 + }
49 + return size + extra_align;
50 +}
51 +
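 +// Carves the next CUDA_ALIGN-aligned sub-buffer of `num_elem` elements out of the
 +// shared workspace, advancing the workspace pointer and shrinking the remaining size.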
52 +template <typename T>
53 +inline T *get_next_ptr(size_t num_elem, void *&workspace, size_t &workspace_size) {
54 + size_t size = get_size_aligned<T>(num_elem);
55 + if (size > workspace_size) {
56 + throw std::runtime_error("Workspace is too small!");
57 + }
58 + workspace_size -= size;
59 + T *ptr = reinterpret_cast<T *>(workspace);
60 + workspace = reinterpret_cast<void *>(reinterpret_cast<uintptr_t>(workspace) + size);
61 + return ptr;
62 +}
1 +/*
2 + * Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
3 + *
4 + * Permission is hereby granted, free of charge, to any person obtaining a
5 + * copy of this software and associated documentation files (the "Software"),
6 + * to deal in the Software without restriction, including without limitation
7 + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
8 + * and/or sell copies of the Software, and to permit persons to whom the
9 + * Software is furnished to do so, subject to the following conditions:
10 + *
11 + * The above copyright notice and this permission notice shall be included in
12 + * all copies or substantial portions of the Software.
13 + *
14 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
17 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
19 + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
20 + * DEALINGS IN THE SOFTWARE.
21 + */
22 +
23 +#include "engine.h"
24 +
25 +#include <iostream>
26 +#include <fstream>
27 +
28 +#include <NvOnnxConfig.h>
29 +#include <NvOnnxParser.h>
30 +
31 +#include "plugins/DecodePlugin.h"
32 +#include "plugins/NMSPlugin.h"
33 +#include "plugins/DecodeRotatePlugin.h"
34 +#include "plugins/NMSRotatePlugin.h"
35 +#include "calibrator.h"
36 +
37 +#include <stdio.h>
38 +#include <string>
39 +
40 +using namespace nvinfer1;
41 +using namespace nvonnxparser;
42 +
43 +namespace odtk {
44 +
45 +class Logger : public ILogger {
46 +public:
47 + Logger(bool verbose)
48 + : _verbose(verbose) {
49 + }
50 +
51 + void log(Severity severity, const char *msg) override {
52 + if (_verbose || ((severity != Severity::kINFO) && (severity != Severity::kVERBOSE)))
53 + cout << msg << endl;
54 + }
55 +
56 +private:
57 + bool _verbose{false};
58 +};
59 +
60 +void Engine::_load(const string &path) {
61 + ifstream file(path, ios::in | ios::binary);
62 + file.seekg (0, file.end);
63 + size_t size = file.tellg();
64 + file.seekg (0, file.beg);
65 +
66 + char *buffer = new char[size];
67 + file.read(buffer, size);
68 + file.close();
69 +
70 + _engine = _runtime->deserializeCudaEngine(buffer, size, nullptr);
71 +
72 + delete[] buffer;
73 +}
74 +
75 +void Engine::_prepare() {
76 + _context = _engine->createExecutionContext();
77 +    cudaStreamCreate(&_stream); // create the stream before selecting the optimization profile on it
78 +    _context->setOptimizationProfileAsync(0, _stream);
79 +}
80 +
81 +Engine::Engine(const string &engine_path, bool verbose) {
82 + Logger logger(verbose);
83 + _runtime = createInferRuntime(logger);
84 + _load(engine_path);
85 + _prepare();
86 +}
87 +
88 +Engine::~Engine() {
89 + if (_stream) cudaStreamDestroy(_stream);
90 + if (_context) _context->destroy();
91 + if (_engine) _engine->destroy();
92 + if (_runtime) _runtime->destroy();
93 +}
94 +
95 +Engine::Engine(const char *onnx_model, size_t onnx_size, const vector<int>& dynamic_batch_opts,
96 + string precision, float score_thresh, int top_n, const vector<vector<float>>& anchors,
97 + bool rotated, float nms_thresh, int detections_per_im, const vector<string>& calibration_images,
98 + string model_name, string calibration_table, bool verbose, size_t workspace_size) {
99 +
100 + Logger logger(verbose);
101 + _runtime = createInferRuntime(logger);
102 +
103 + bool fp16 = precision.compare("FP16") == 0;
104 + bool int8 = precision.compare("INT8") == 0;
105 +
106 + // Create builder
107 + auto builder = createInferBuilder(logger);
108 + const auto builderConfig = builder->createBuilderConfig();
109 + // Allow use of FP16 layers when running in INT8
110 + if(fp16 || int8) builderConfig->setFlag(BuilderFlag::kFP16);
111 + builderConfig->setMaxWorkspaceSize(workspace_size);
112 +
113 +    // Parse the ONNX model
114 + cout << "Building " << precision << " core model..." << endl;
115 + const auto flags = 1U << static_cast<int>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
116 + auto network = builder->createNetworkV2(flags);
117 + auto parser = createParser(*network, logger);
118 + parser->parse(onnx_model, onnx_size);
119 +
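 +    // Build an optimization profile covering the dynamic batch range
 +    // (dynamic_batch_opts = {min, opt, max}); the spatial input dims stay fixed.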
120 + auto input = network->getInput(0);
121 + auto inputDims = input->getDimensions();
122 + auto profile = builder->createOptimizationProfile();
123 + auto inputName = input->getName();
124 + auto profileDimsmin = Dims4{dynamic_batch_opts[0], inputDims.d[1], inputDims.d[2], inputDims.d[3]};
125 + auto profileDimsopt = Dims4{dynamic_batch_opts[1], inputDims.d[1], inputDims.d[2], inputDims.d[3]};
126 + auto profileDimsmax = Dims4{dynamic_batch_opts[2], inputDims.d[1], inputDims.d[2], inputDims.d[3]};
127 +
128 + profile->setDimensions(inputName, nvinfer1::OptProfileSelector::kMIN, profileDimsmin);
129 + profile->setDimensions(inputName, nvinfer1::OptProfileSelector::kOPT, profileDimsopt);
130 + profile->setDimensions(inputName, nvinfer1::OptProfileSelector::kMAX, profileDimsmax);
131 +
132 + if(profile->isValid())
133 + builderConfig->addOptimizationProfile(profile);
134 +
135 + std::unique_ptr<Int8EntropyCalibrator> calib;
136 + if (int8) {
137 + builderConfig->setFlag(BuilderFlag::kINT8);
138 + // Calibration is performed using kOPT values of the profile.
139 + // Calibration batch size must match this profile.
140 + builderConfig->setCalibrationProfile(profile);
141 + ImageStream stream(dynamic_batch_opts[1], inputDims, calibration_images);
142 + calib = std::unique_ptr<Int8EntropyCalibrator>(new Int8EntropyCalibrator(stream, model_name, calibration_table));
143 + builderConfig->setInt8Calibrator(calib.get());
144 + }
145 +
146 + // Add decode plugins
147 + cout << "Building accelerated plugins..." << endl;
148 + vector<DecodePlugin> decodePlugins;
149 + vector<DecodeRotatePlugin> decodeRotatePlugins;
150 + vector<ITensor *> scores, boxes, classes;
151 + auto nbOutputs = network->getNbOutputs();
152 +
153 + for (int i = 0; i < nbOutputs / 2; i++) {
154 + auto classOutput = network->getOutput(i);
155 + auto boxOutput = network->getOutput(nbOutputs / 2 + i);
156 + auto outputDims = classOutput->getDimensions();
157 + int scale = inputDims.d[2] / outputDims.d[2];
158 + auto decodePlugin = DecodePlugin(score_thresh, top_n, anchors[i], scale);
159 + auto decodeRotatePlugin = DecodeRotatePlugin(score_thresh, top_n, anchors[i], scale);
160 + decodePlugins.push_back(decodePlugin);
161 + decodeRotatePlugins.push_back(decodeRotatePlugin);
162 + vector<ITensor *> inputs = {classOutput, boxOutput};
163 + auto layer = (!rotated) ? network->addPluginV2(inputs.data(), inputs.size(), decodePlugin) \
164 + : network->addPluginV2(inputs.data(), inputs.size(), decodeRotatePlugin);
165 + scores.push_back(layer->getOutput(0));
166 + boxes.push_back(layer->getOutput(1));
167 + classes.push_back(layer->getOutput(2));
168 + }
169 +
170 +    // Unmark the original network outputs (unmarking shifts the remaining outputs down to index 0)
171 + for (int i = 0; i < nbOutputs; i++) {
172 + auto output = network->getOutput(0);
173 + network->unmarkOutput(*output);
174 + }
175 +
176 + // Concat tensors from each feature map
177 + vector<ITensor *> concat;
178 + for (auto tensors : {scores, boxes, classes}) {
179 + auto layer = network->addConcatenation(tensors.data(), tensors.size());
180 + concat.push_back(layer->getOutput(0));
181 + }
182 +
183 + // Add NMS plugin
184 + auto nmsPlugin = NMSPlugin(nms_thresh, detections_per_im);
185 + auto nmsRotatePlugin = NMSRotatePlugin(nms_thresh, detections_per_im);
186 + auto layer = (!rotated) ? network->addPluginV2(concat.data(), concat.size(), nmsPlugin) \
187 + : network->addPluginV2(concat.data(), concat.size(), nmsRotatePlugin);
188 + vector<string> names = {"scores", "boxes", "classes"};
189 + for (int i = 0; i < layer->getNbOutputs(); i++) {
190 + auto output = layer->getOutput(i);
191 + network->markOutput(*output);
192 + output->setName(names[i].c_str());
193 + }
194 +
195 + // Build engine
196 + cout << "Applying optimizations and building TRT CUDA engine..." << endl;
197 + _engine = builder->buildEngineWithConfig(*network, *builderConfig);
198 +
199 + // Housekeeping
200 + parser->destroy();
201 + network->destroy();
202 + builderConfig->destroy();
203 + builder->destroy();
204 +
205 + _prepare();
206 +}
207 +
208 +void Engine::save(const string &path) {
209 + cout << "Writing to " << path << "..." << endl;
210 + auto serialized = _engine->serialize();
211 + ofstream file(path, ios::out | ios::binary);
212 + file.write(reinterpret_cast<const char*>(serialized->data()), serialized->size());
213 +
214 + serialized->destroy();
215 +}
216 +
217 +void Engine::infer(vector<void *> &buffers, int batch){
218 + auto dims = _engine->getBindingDimensions(0);
219 + _context->setBindingDimensions(0, Dims4(batch, dims.d[1], dims.d[2], dims.d[3]));
220 + _context->enqueueV2(buffers.data(), _stream, nullptr);
221 + cudaStreamSynchronize(_stream);
222 +}
223 +
224 +vector<int> Engine::getInputSize() {
225 + auto dims = _engine->getBindingDimensions(0);
226 + return {dims.d[2], dims.d[3]};
227 +}
228 +
229 +int Engine::getMaxBatchSize() {
230 + return _engine->getMaxBatchSize();
231 +}
232 +
233 +int Engine::getMaxDetections() {
234 + return _engine->getBindingDimensions(1).d[1];
235 +}
236 +
237 +int Engine::getStride() {
238 + return 1;
239 +}
240 +
241 +}
1 +/*
2 + * Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
3 + *
4 + * Permission is hereby granted, free of charge, to any person obtaining a
5 + * copy of this software and associated documentation files (the "Software"),
6 + * to deal in the Software without restriction, including without limitation
7 + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
8 + * and/or sell copies of the Software, and to permit persons to whom the
9 + * Software is furnished to do so, subject to the following conditions:
10 + *
11 + * The above copyright notice and this permission notice shall be included in
12 + * all copies or substantial portions of the Software.
13 + *
14 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
17 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
19 + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
20 + * DEALINGS IN THE SOFTWARE.
21 + */
22 +
23 +#pragma once
24 +
25 +#include <string>
26 +#include <vector>
27 +
28 +#include <NvInfer.h>
29 +
30 +#include <cuda_runtime.h>
31 +
32 +using namespace std;
33 +using namespace nvinfer1;
34 +
35 +namespace odtk {
36 +
37 +// RetinaNet wrapper around TensorRT CUDA engine
38 +class Engine {
39 +public:
40 + // Create engine from engine path
41 + Engine(const string &engine_path, bool verbose=false);
42 +
43 + // Create engine from serialized onnx model
44 +
45 + Engine(const char *onnx_model, size_t onnx_size, const vector<int>& dynamic_batch_opts,
46 + string precision, float score_thresh, int top_n, const vector<vector<float>>& anchors,
47 + bool rotated, float nms_thresh, int detections_per_im, const vector<string>& calibration_images,
48 + string model_name, string calibration_table, bool verbose, size_t workspace_size=(1ULL << 30));
49 +
50 + ~Engine();
51 +
52 + // Save model to path
53 + void save(const string &path);
54 +
55 + // Infer using pre-allocated GPU buffers {data, scores, boxes, classes}
56 + void infer(vector<void *> &buffers, int batch);
57 +
58 + // Get (h, w) size of the fixed input
59 + vector<int> getInputSize();
60 +
61 + // Get max allowed batch size
62 + int getMaxBatchSize();
63 +
64 + // Get max number of detections
65 + int getMaxDetections();
66 +
67 + // Get stride
68 + int getStride();
69 +
70 +private:
71 + IRuntime *_runtime = nullptr;
72 + ICudaEngine *_engine = nullptr;
73 + IExecutionContext *_context = nullptr;
74 + cudaStream_t _stream = nullptr;
75 +
76 + void _load(const string &path);
77 + void _prepare();
78 +
79 +};
80 +
81 +}
1 +/*
2 + * Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
3 + *
4 + * Permission is hereby granted, free of charge, to any person obtaining a
5 + * copy of this software and associated documentation files (the "Software"),
6 + * to deal in the Software without restriction, including without limitation
7 + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
8 + * and/or sell copies of the Software, and to permit persons to whom the
9 + * Software is furnished to do so, subject to the following conditions:
10 + *
11 + * The above copyright notice and this permission notice shall be included in
12 + * all copies or substantial portions of the Software.
13 + *
14 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
17 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
19 + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
20 + * DEALINGS IN THE SOFTWARE.
21 + */
22 +
23 +#include <algorithm>
24 +#include <iostream>
25 +#include <stdexcept>
26 +#include <cstdint>
27 +#include <cmath>
28 +
29 +#include <torch/extension.h>
30 +#include <ATen/cuda/CUDAContext.h>
31 +
32 +#include <vector>
33 +#include <optional>
34 +
35 +#include "engine.h"
36 +#include "cuda/decode.h"
37 +#include "cuda/decode_rotate.h"
38 +#include "cuda/nms.h"
39 +#include "cuda/nms_iou.h"
40 +#include <stdio.h>
41 +
42 +#define CHECK_CUDA(x) AT_ASSERTM(x.is_cuda(), #x " must be a CUDA tensor")
43 +#define CHECK_CONTIGUOUS(x) AT_ASSERTM(x.is_contiguous(), #x " must be contiguous")
44 +#define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
45 +
46 +
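 +// Pairwise polygon IoU between rotated boxes and anchors; each box is given as
 +// four (x, y) corner points, hence numel() / 8 boxes per tensor.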
47 +vector<at::Tensor> iou(at::Tensor boxes, at::Tensor anchors) {
48 +
49 + CHECK_INPUT(boxes);
50 + CHECK_INPUT(anchors);
51 +
52 + int num_boxes = boxes.numel() / 8;
53 + int num_anchors = anchors.numel() / 8;
54 + auto options = boxes.options();
55 +
56 + auto iou_vals = at::zeros({num_boxes*num_anchors}, options);
57 +
58 + // Calculate Polygon IOU
59 + vector<void *> inputs = {boxes.data_ptr(), anchors.data_ptr()};
60 + vector<void *> outputs = {iou_vals.data_ptr()};
61 +
62 + odtk::cuda::iou(inputs.data(), outputs.data(), num_boxes, num_anchors, at::cuda::getCurrentCUDAStream());
63 +
64 + auto shape = std::vector<int64_t>{num_anchors, num_boxes};
65 +
66 + return {iou_vals.reshape(shape)};
67 +}
68 +
69 +vector<at::Tensor> decode(at::Tensor cls_head, at::Tensor box_head,
70 + vector<float> &anchors, int scale, float score_thresh, int top_n, bool rotated=false) {
71 +
72 + CHECK_INPUT(cls_head);
73 + CHECK_INPUT(box_head);
74 +
75 + int num_boxes = (!rotated) ? 4 : 6;
76 + int batch = cls_head.size(0);
77 + int num_anchors = anchors.size() / 4;
78 + int num_classes = cls_head.size(1) / num_anchors;
79 + int height = cls_head.size(2);
80 + int width = cls_head.size(3);
81 + auto options = cls_head.options();
82 +
83 + auto scores = at::zeros({batch, top_n}, options);
84 + auto boxes = at::zeros({batch, top_n, num_boxes}, options);
85 + auto classes = at::zeros({batch, top_n}, options);
86 +
87 + vector<void *> inputs = {cls_head.data_ptr(), box_head.data_ptr()};
88 + vector<void *> outputs = {scores.data_ptr(), boxes.data_ptr(), classes.data_ptr()};
89 +
90 + if(!rotated) {
91 + // Create scratch buffer
92 + int size = odtk::cuda::decode(batch, nullptr, nullptr, height, width, scale,
93 + num_anchors, num_classes, anchors, score_thresh, top_n, nullptr, 0, nullptr);
94 + auto scratch = at::zeros({size}, options.dtype(torch::kUInt8));
95 +
96 + // Decode boxes
97 + odtk::cuda::decode(batch, inputs.data(), outputs.data(), height, width, scale,
98 + num_anchors, num_classes, anchors, score_thresh, top_n,
99 + scratch.data_ptr(), size, at::cuda::getCurrentCUDAStream());
100 +
101 + }
102 + else {
103 + // Create scratch buffer
104 + int size = odtk::cuda::decode_rotate(batch, nullptr, nullptr, height, width, scale,
105 + num_anchors, num_classes, anchors, score_thresh, top_n, nullptr, 0, nullptr);
106 + auto scratch = at::zeros({size}, options.dtype(torch::kUInt8));
107 +
108 + // Decode boxes
109 + odtk::cuda::decode_rotate(batch, inputs.data(), outputs.data(), height, width, scale,
110 + num_anchors, num_classes, anchors, score_thresh, top_n,
111 + scratch.data_ptr(), size, at::cuda::getCurrentCUDAStream());
112 + }
113 +
114 + return {scores, boxes, classes};
115 +}
116 +
117 +vector<at::Tensor> nms(at::Tensor scores, at::Tensor boxes, at::Tensor classes,
118 + float nms_thresh, int detections_per_im, bool rotated=false) {
119 +
120 + CHECK_INPUT(scores);
121 + CHECK_INPUT(boxes);
122 + CHECK_INPUT(classes);
123 +
124 + int num_boxes = (!rotated) ? 4 : 6;
125 + int batch = scores.size(0);
126 + int count = scores.size(1);
127 + auto options = scores.options();
128 + auto nms_scores = at::zeros({batch, detections_per_im}, scores.options());
129 + auto nms_boxes = at::zeros({batch, detections_per_im, num_boxes}, boxes.options());
130 + auto nms_classes = at::zeros({batch, detections_per_im}, classes.options());
131 +
132 + vector<void *> inputs = {scores.data_ptr(), boxes.data_ptr(), classes.data_ptr()};
133 + vector<void *> outputs = {nms_scores.data_ptr(), nms_boxes.data_ptr(), nms_classes.data_ptr()};
134 +
135 + if(!rotated) {
136 + // Create scratch buffer
137 + int size = odtk::cuda::nms(batch, nullptr, nullptr, count,
138 + detections_per_im, nms_thresh, nullptr, 0, nullptr);
139 + auto scratch = at::zeros({size}, options.dtype(torch::kUInt8));
140 +
141 + // Perform NMS
142 + odtk::cuda::nms(batch, inputs.data(), outputs.data(), count, detections_per_im,
143 + nms_thresh, scratch.data_ptr(), size, at::cuda::getCurrentCUDAStream());
144 + }
145 + else {
146 + // Create scratch buffer
147 + int size = odtk::cuda::nms_rotate(batch, nullptr, nullptr, count,
148 + detections_per_im, nms_thresh, nullptr, 0, nullptr);
149 + auto scratch = at::zeros({size}, options.dtype(torch::kUInt8));
150 +
151 + // Perform NMS
152 + odtk::cuda::nms_rotate(batch, inputs.data(), outputs.data(), count,
153 + detections_per_im, nms_thresh, scratch.data_ptr(), size, at::cuda::getCurrentCUDAStream());
154 + }
155 +
156 +
157 + return {nms_scores, nms_boxes, nms_classes};
158 +}
159 +
160 +vector<at::Tensor> infer(odtk::Engine &engine, at::Tensor data, bool rotated=false) {
161 + CHECK_INPUT(data);
162 +
163 + int num_boxes = (!rotated) ? 4 : 6;
164 + int batch = data.size(0);
165 + auto input_size = engine.getInputSize();
166 + data = at::constant_pad_nd(data, {0, input_size[1] - data.size(3), 0, input_size[0] - data.size(2)});
167 +
168 + int num_detections = engine.getMaxDetections();
169 + auto scores = at::zeros({batch, num_detections}, data.options());
170 + auto boxes = at::zeros({batch, num_detections, num_boxes}, data.options());
171 + auto classes = at::zeros({batch, num_detections}, data.options());
172 +
173 + vector<void *> buffers;
174 + for (auto buffer : {data, scores, boxes, classes}) {
175 + buffers.push_back(buffer.data<float>());
176 + }
177 +
178 + engine.infer(buffers, batch);
179 +
180 + return {scores, boxes, classes};
181 +}
182 +
183 +
184 +PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
185 + pybind11::class_<odtk::Engine>(m, "Engine")
186 + .def(pybind11::init<const char *, size_t, const vector<int>&, string, float, int,
187 + const vector<vector<float>>&, bool, float, int, const vector<string>&, string, string, bool>())
188 + .def("save", &odtk::Engine::save)
189 + .def("infer", &odtk::Engine::infer)
190 + .def_property_readonly("stride", &odtk::Engine::getStride)
191 + .def_property_readonly("input_size", &odtk::Engine::getInputSize)
192 + .def_static("load", [](const string &path) {
193 + return new odtk::Engine(path);
194 + })
195 + .def("__call__", [](odtk::Engine &engine, at::Tensor data, bool rotated=false) {
196 + return infer(engine, data, rotated);
197 + });
198 + m.def("decode", &decode);
199 + m.def("nms", &nms);
200 + m.def("iou", &iou);
201 +}
\ No newline at end of file
1 +/*
2 + * Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
3 + *
4 + * Permission is hereby granted, free of charge, to any person obtaining a
5 + * copy of this software and associated documentation files (the "Software"),
6 + * to deal in the Software without restriction, including without limitation
7 + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
8 + * and/or sell copies of the Software, and to permit persons to whom the
9 + * Software is furnished to do so, subject to the following conditions:
10 + *
11 + * The above copyright notice and this permission notice shall be included in
12 + * all copies or substantial portions of the Software.
13 + *
14 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
17 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
19 + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
20 + * DEALINGS IN THE SOFTWARE.
21 + */
22 +
23 +#pragma once
24 +
25 +#include <NvInfer.h>
26 +
27 +#include <cassert>
28 +#include <vector>
29 +
30 +#include "../cuda/decode.h"
31 +
32 +using namespace nvinfer1;
33 +
34 +#define RETINANET_PLUGIN_NAME "RetinaNetDecode"
35 +#define RETINANET_PLUGIN_VERSION "1"
36 +#define RETINANET_PLUGIN_NAMESPACE ""
37 +
38 +namespace odtk {
39 +
40 +class DecodePlugin : public IPluginV2DynamicExt {
41 + float _score_thresh;
42 + int _top_n;
43 + std::vector<float> _anchors;
44 + float _scale;
45 +
46 + size_t _height;
47 + size_t _width;
48 + size_t _num_anchors;
49 + size_t _num_classes;
50 + mutable int size = -1;
51 +
52 +protected:
53 + void deserialize(void const* data, size_t length) {
54 + const char* d = static_cast<const char*>(data);
55 + read(d, _score_thresh);
56 + read(d, _top_n);
57 + size_t anchors_size;
58 + read(d, anchors_size);
59 + while( anchors_size-- ) {
60 + float val;
61 + read(d, val);
62 + _anchors.push_back(val);
63 + }
64 + read(d, _scale);
65 + read(d, _height);
66 + read(d, _width);
67 + read(d, _num_anchors);
68 + read(d, _num_classes);
69 + }
70 +
71 + size_t getSerializationSize() const override {
72 + return sizeof(_score_thresh) + sizeof(_top_n)
73 + + sizeof(size_t) + sizeof(float) * _anchors.size() + sizeof(_scale)
74 + + sizeof(_height) + sizeof(_width) + sizeof(_num_anchors) + sizeof(_num_classes);
75 + }
76 +
77 + void serialize(void *buffer) const override {
78 + char* d = static_cast<char*>(buffer);
79 + write(d, _score_thresh);
80 + write(d, _top_n);
81 + write(d, _anchors.size());
82 + for( auto &val : _anchors ) {
83 + write(d, val);
84 + }
85 + write(d, _scale);
86 + write(d, _height);
87 + write(d, _width);
88 + write(d, _num_anchors);
89 + write(d, _num_classes);
90 + }
91 +
92 +public:
93 + DecodePlugin(float score_thresh, int top_n, std::vector<float> const& anchors, int scale)
94 + : _score_thresh(score_thresh), _top_n(top_n), _anchors(anchors), _scale(scale) {}
95 +
96 + DecodePlugin(float score_thresh, int top_n, std::vector<float> const& anchors, int scale,
97 + size_t height, size_t width, size_t num_anchors, size_t num_classes)
98 + : _score_thresh(score_thresh), _top_n(top_n), _anchors(anchors), _scale(scale),
99 + _height(height), _width(width), _num_anchors(num_anchors), _num_classes(num_classes) {}
100 +
101 + DecodePlugin(void const* data, size_t length) {
102 + this->deserialize(data, length);
103 + }
104 +
105 + const char *getPluginType() const override {
106 + return RETINANET_PLUGIN_NAME;
107 + }
108 +
109 + const char *getPluginVersion() const override {
110 + return RETINANET_PLUGIN_VERSION;
111 + }
112 +
113 + int getNbOutputs() const override {
114 + return 3;
115 + }
116 +
117 + DimsExprs getOutputDimensions(int outputIndex, const DimsExprs *inputs,
118 + int nbInputs, IExprBuilder &exprBuilder) override
119 + {
120 + DimsExprs output(inputs[0]);
121 + output.d[1] = exprBuilder.constant(_top_n * (outputIndex == 1 ? 4 : 1));
122 + output.d[2] = exprBuilder.constant(1);
123 + output.d[3] = exprBuilder.constant(1);
124 +
125 + return output;
126 + }
127 +
128 + bool supportsFormatCombination(int pos, const PluginTensorDesc *inOut,
129 + int nbInputs, int nbOutputs) override
130 + {
131 + assert(nbInputs == 2);
132 + assert(nbOutputs == 3);
133 + assert(pos < 5);
134 + return inOut[pos].type == DataType::kFLOAT && inOut[pos].format == nvinfer1::PluginFormat::kLINEAR;
135 + }
136 +
137 + int initialize() override { return 0; }
138 +
139 + void terminate() override {}
140 +
141 + size_t getWorkspaceSize(const PluginTensorDesc *inputs,
142 + int nbInputs, const PluginTensorDesc *outputs, int nbOutputs) const override
143 + {
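 +        // Lazily compute the required scratch size with a null-pointer "dry run"
 +        // of cuda::decode and cache the result in the mutable `size` member.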
144 + if (size < 0) {
145 + size = cuda::decode(inputs->dims.d[0], nullptr, nullptr, _height, _width, _scale,
146 + _num_anchors, _num_classes, _anchors, _score_thresh, _top_n,
147 + nullptr, 0, nullptr);
148 + }
149 + return size;
150 + }
151 +
152 + int enqueue(const PluginTensorDesc *inputDesc,
153 + const PluginTensorDesc *outputDesc, const void *const *inputs,
154 + void *const *outputs, void *workspace, cudaStream_t stream)
155 + {
156 +
157 + return cuda::decode(inputDesc->dims.d[0], inputs, outputs, _height, _width, _scale,
158 + _num_anchors, _num_classes, _anchors, _score_thresh, _top_n,
159 + workspace, getWorkspaceSize(inputDesc, 2, outputDesc, 3), stream);
160 +
161 + }
162 +
163 + void destroy() override {
164 + delete this;
165 +    }
166 +
167 + const char *getPluginNamespace() const override {
168 + return RETINANET_PLUGIN_NAMESPACE;
169 + }
170 +
171 + void setPluginNamespace(const char *N) override {}
172 +
173 + DataType getOutputDataType(int index, const DataType* inputTypes, int nbInputs) const
174 + {
175 + assert(index < 3);
176 + return DataType::kFLOAT;
177 + }
178 +
179 + void configurePlugin(const DynamicPluginTensorDesc *in, int nbInputs,
180 + const DynamicPluginTensorDesc *out, int nbOutputs)
181 + {
182 + assert(nbInputs == 2);
183 + assert(nbOutputs == 3);
184 + auto const& scores_dims = in[0].desc.dims;
185 + auto const& boxes_dims = in[1].desc.dims;
186 + assert(scores_dims.d[2] == boxes_dims.d[2]);
187 + assert(scores_dims.d[3] == boxes_dims.d[3]);
188 + _height = scores_dims.d[2];
189 + _width = scores_dims.d[3];
190 + _num_anchors = boxes_dims.d[1] / 4;
191 + _num_classes = scores_dims.d[1] / _num_anchors;
192 + }
193 +
194 + IPluginV2DynamicExt *clone() const override {
195 + return new DecodePlugin(_score_thresh, _top_n, _anchors, _scale, _height, _width,
196 + _num_anchors, _num_classes);
197 + }
198 +
199 +private:
200 + template<typename T> void write(char*& buffer, const T& val) const {
201 + *reinterpret_cast<T*>(buffer) = val;
202 + buffer += sizeof(T);
203 + }
204 +
205 + template<typename T> void read(const char*& buffer, T& val) {
206 + val = *reinterpret_cast<const T*>(buffer);
207 + buffer += sizeof(T);
208 + }
209 +};
210 +
211 +class DecodePluginCreator : public IPluginCreator {
212 +public:
213 + DecodePluginCreator() {}
214 +
215 + const char *getPluginName () const override {
216 + return RETINANET_PLUGIN_NAME;
217 + }
218 +
219 + const char *getPluginVersion () const override {
220 + return RETINANET_PLUGIN_VERSION;
221 + }
222 +
223 + const char *getPluginNamespace() const override {
224 + return RETINANET_PLUGIN_NAMESPACE;
225 + }
226 +
227 +
228 + IPluginV2DynamicExt *deserializePlugin (const char *name, const void *serialData, size_t serialLength) override {
229 + return new DecodePlugin(serialData, serialLength);
230 + }
231 +
232 + void setPluginNamespace(const char *N) override {}
233 + const PluginFieldCollection *getFieldNames() override { return nullptr; }
234 + IPluginV2DynamicExt *createPlugin (const char *name, const PluginFieldCollection *fc) override { return nullptr; }
235 +};
236 +
237 +REGISTER_TENSORRT_PLUGIN(DecodePluginCreator);
238 +
239 +}
240 +
241 +#undef RETINANET_PLUGIN_NAME
242 +#undef RETINANET_PLUGIN_VERSION
243 +#undef RETINANET_PLUGIN_NAMESPACE
1 +/*
2 + * Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
3 + *
4 + * Permission is hereby granted, free of charge, to any person obtaining a
5 + * copy of this software and associated documentation files (the "Software"),
6 + * to deal in the Software without restriction, including without limitation
7 + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
8 + * and/or sell copies of the Software, and to permit persons to whom the
9 + * Software is furnished to do so, subject to the following conditions:
10 + *
11 + * The above copyright notice and this permission notice shall be included in
12 + * all copies or substantial portions of the Software.
13 + *
14 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
17 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
19 + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
20 + * DEALINGS IN THE SOFTWARE.
21 + */
22 +
23 +#pragma once
24 +
25 +#include <NvInfer.h>
26 +
27 +#include <cassert>
28 +#include <vector>
29 +
30 +#include "../cuda/decode_rotate.h"
31 +
32 +using namespace nvinfer1;
33 +
34 +#define RETINANET_PLUGIN_NAME "RetinaNetDecodeRotate"
35 +#define RETINANET_PLUGIN_VERSION "1"
36 +#define RETINANET_PLUGIN_NAMESPACE ""
37 +
38 +namespace odtk {
39 +
40 +class DecodeRotatePlugin : public IPluginV2DynamicExt {
41 + float _score_thresh;
42 + int _top_n;
43 + std::vector<float> _anchors;
44 + float _scale;
45 +
46 + size_t _height;
47 + size_t _width;
48 + size_t _num_anchors;
49 + size_t _num_classes;
50 + mutable int size = -1;
51 +
52 +protected:
53 + void deserialize(void const* data, size_t length) {
54 + const char* d = static_cast<const char*>(data);
55 + read(d, _score_thresh);
56 + read(d, _top_n);
57 + size_t anchors_size;
58 + read(d, anchors_size);
59 + while( anchors_size-- ) {
60 + float val;
61 + read(d, val);
62 + _anchors.push_back(val);
63 + }
64 + read(d, _scale);
65 + read(d, _height);
66 + read(d, _width);
67 + read(d, _num_anchors);
68 + read(d, _num_classes);
69 + }
70 +
71 + size_t getSerializationSize() const override {
72 + return sizeof(_score_thresh) + sizeof(_top_n)
73 + + sizeof(size_t) + sizeof(float) * _anchors.size() + sizeof(_scale)
74 + + sizeof(_height) + sizeof(_width) + sizeof(_num_anchors) + sizeof(_num_classes);
75 + }
76 +
77 + void serialize(void *buffer) const override {
78 + char* d = static_cast<char*>(buffer);
79 + write(d, _score_thresh);
80 + write(d, _top_n);
81 + write(d, _anchors.size());
82 + for( auto &val : _anchors ) {
83 + write(d, val);
84 + }
85 + write(d, _scale);
86 + write(d, _height);
87 + write(d, _width);
88 + write(d, _num_anchors);
89 + write(d, _num_classes);
90 + }
91 +
92 +public:
93 + DecodeRotatePlugin(float score_thresh, int top_n, std::vector<float> const& anchors, int scale)
94 + : _score_thresh(score_thresh), _top_n(top_n), _anchors(anchors), _scale(scale) {}
95 +
96 + DecodeRotatePlugin(float score_thresh, int top_n, std::vector<float> const& anchors, int scale,
97 + size_t height, size_t width, size_t num_anchors, size_t num_classes)
98 + : _score_thresh(score_thresh), _top_n(top_n), _anchors(anchors), _scale(scale),
99 + _height(height), _width(width), _num_anchors(num_anchors), _num_classes(num_classes) {}
100 +
101 + DecodeRotatePlugin(void const* data, size_t length) {
102 + this->deserialize(data, length);
103 + }
104 +
105 + const char *getPluginType() const override {
106 + return RETINANET_PLUGIN_NAME;
107 + }
108 +
109 + const char *getPluginVersion() const override {
110 + return RETINANET_PLUGIN_VERSION;
111 + }
112 +
113 + int getNbOutputs() const override {
114 + return 3;
115 + }
116 +
117 + DimsExprs getOutputDimensions(int outputIndex, const DimsExprs *inputs,
118 + int nbInputs, IExprBuilder &exprBuilder) override
119 + {
120 + DimsExprs output(inputs[0]);
121 + output.d[1] = exprBuilder.constant(_top_n * (outputIndex == 1 ? 6 : 1));
122 + output.d[2] = exprBuilder.constant(1);
123 + output.d[3] = exprBuilder.constant(1);
124 +
125 + return output;
126 + }
127 +
128 +
129 + bool supportsFormatCombination(int pos, const PluginTensorDesc *inOut,
130 + int nbInputs, int nbOutputs) override
131 + {
132 + assert(nbInputs == 2);
133 + assert(nbOutputs == 3);
134 + assert(pos < 5);
135 + return inOut[pos].type == DataType::kFLOAT && inOut[pos].format == nvinfer1::PluginFormat::kLINEAR;
136 + }
137 +
138 +
139 + int initialize() override { return 0; }
140 +
141 + void terminate() override {}
142 +
143 + size_t getWorkspaceSize(const PluginTensorDesc *inputs,
144 + int nbInputs, const PluginTensorDesc *outputs, int nbOutputs) const override
145 + {
146 + if (size < 0) {
147 + size = cuda::decode_rotate(inputs->dims.d[0], nullptr, nullptr, _height, _width, _scale,
148 + _num_anchors, _num_classes, _anchors, _score_thresh, _top_n,
149 + nullptr, 0, nullptr);
150 + }
151 + return size;
152 + }
153 +
154 + int enqueue(const PluginTensorDesc *inputDesc,
155 + const PluginTensorDesc *outputDesc, const void *const *inputs,
156 + void *const *outputs, void *workspace, cudaStream_t stream) override
157 + {
158 + return cuda::decode_rotate(inputDesc->dims.d[0], inputs, outputs, _height, _width, _scale,
159 + _num_anchors, _num_classes, _anchors, _score_thresh, _top_n,
160 + workspace, getWorkspaceSize(inputDesc, 2, outputDesc, 3), stream);
161 + }
162 +
163 + void destroy() override {
164 + delete this;
165 + };
166 +
167 + const char *getPluginNamespace() const override {
168 + return RETINANET_PLUGIN_NAMESPACE;
169 + }
170 +
171 + void setPluginNamespace(const char *N) override {}
172 +
173 + DataType getOutputDataType(int index, const DataType* inputTypes, int nbInputs) const
174 + {
175 + assert(index < 3);
176 + return DataType::kFLOAT;
177 + }
178 +
179 + void configurePlugin(const DynamicPluginTensorDesc *in, int nbInputs,
180 + const DynamicPluginTensorDesc *out, int nbOutputs)
181 + {
182 + assert(nbInputs == 2);
183 + assert(nbOutputs == 3);
184 + auto const& scores_dims = in[0].desc.dims;
185 + auto const& boxes_dims = in[1].desc.dims;
186 + assert(scores_dims.d[2] == boxes_dims.d[2]);
187 + assert(scores_dims.d[3] == boxes_dims.d[3]);
188 + _height = scores_dims.d[2];
189 + _width = scores_dims.d[3];
190 + _num_anchors = boxes_dims.d[1] / 6;
191 + _num_classes = scores_dims.d[1] / _num_anchors;
192 + }
193 +
194 + IPluginV2DynamicExt *clone() const override {
195 + return new DecodeRotatePlugin(_score_thresh, _top_n, _anchors, _scale, _height, _width,
196 + _num_anchors, _num_classes);
197 + }
198 +
199 +private:
200 + template<typename T> void write(char*& buffer, const T& val) const {
201 + *reinterpret_cast<T*>(buffer) = val;
202 + buffer += sizeof(T);
203 + }
204 +
205 + template<typename T> void read(const char*& buffer, T& val) {
206 + val = *reinterpret_cast<const T*>(buffer);
207 + buffer += sizeof(T);
208 + }
209 +};
210 +
211 +class DecodeRotatePluginCreator : public IPluginCreator {
212 +public:
213 + DecodeRotatePluginCreator() {}
214 +
215 + const char *getPluginName () const override {
216 + return RETINANET_PLUGIN_NAME;
217 + }
218 +
219 + const char *getPluginVersion () const override {
220 + return RETINANET_PLUGIN_VERSION;
221 + }
222 +
223 + const char *getPluginNamespace() const override {
224 + return RETINANET_PLUGIN_NAMESPACE;
225 + }
226 +
227 + IPluginV2DynamicExt *deserializePlugin (const char *name, const void *serialData, size_t serialLength) override {
228 + return new DecodeRotatePlugin(serialData, serialLength);
229 + }
230 +
231 + void setPluginNamespace(const char *N) override {}
232 + const PluginFieldCollection *getFieldNames() override { return nullptr; }
233 + IPluginV2DynamicExt *createPlugin (const char *name, const PluginFieldCollection *fc) override { return nullptr; }
234 +};
235 +
236 +REGISTER_TENSORRT_PLUGIN(DecodeRotatePluginCreator);
237 +
238 +}
239 +
240 +#undef RETINANET_PLUGIN_NAME
241 +#undef RETINANET_PLUGIN_VERSION
242 +#undef RETINANET_PLUGIN_NAMESPACE
1 +/*
2 + * Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
3 + *
4 + * Permission is hereby granted, free of charge, to any person obtaining a
5 + * copy of this software and associated documentation files (the "Software"),
6 + * to deal in the Software without restriction, including without limitation
7 + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
8 + * and/or sell copies of the Software, and to permit persons to whom the
9 + * Software is furnished to do so, subject to the following conditions:
10 + *
11 + * The above copyright notice and this permission notice shall be included in
12 + * all copies or substantial portions of the Software.
13 + *
14 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
17 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
19 + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
20 + * DEALINGS IN THE SOFTWARE.
21 + */
22 +
23 +#pragma once
24 +
25 +#include <NvInfer.h>
26 +
27 +#include <vector>
28 +#include <cassert>
29 +
30 +#include "../cuda/nms.h"
31 +
32 +using namespace nvinfer1;
33 +
34 +#define RETINANET_PLUGIN_NAME "RetinaNetNMS"
35 +#define RETINANET_PLUGIN_VERSION "1"
36 +#define RETINANET_PLUGIN_NAMESPACE ""
37 +
38 +namespace odtk {
39 +
40 +class NMSPlugin : public IPluginV2DynamicExt {
41 + float _nms_thresh;
42 + int _detections_per_im;
43 +
44 + size_t _count;
45 + mutable int size = -1;
46 +
47 +protected:
48 + void deserialize(void const* data, size_t length) {
49 + const char* d = static_cast<const char*>(data);
50 + read(d, _nms_thresh);
51 + read(d, _detections_per_im);
52 + read(d, _count);
53 + }
54 +
55 + size_t getSerializationSize() const override {
56 + return sizeof(_nms_thresh) + sizeof(_detections_per_im)
57 + + sizeof(_count);
58 + }
59 +
60 + void serialize(void *buffer) const override {
61 + char* d = static_cast<char*>(buffer);
62 + write(d, _nms_thresh);
63 + write(d, _detections_per_im);
64 + write(d, _count);
65 + }
66 +
67 +public:
68 + NMSPlugin(float nms_thresh, int detections_per_im)
69 + : _nms_thresh(nms_thresh), _detections_per_im(detections_per_im) {
70 + assert(nms_thresh > 0);
71 + assert(detections_per_im > 0);
72 + }
73 +
74 + NMSPlugin(float nms_thresh, int detections_per_im, size_t count)
75 + : _nms_thresh(nms_thresh), _detections_per_im(detections_per_im), _count(count) {
76 + assert(nms_thresh > 0);
77 + assert(detections_per_im > 0);
78 + assert(count > 0);
79 + }
80 +
81 + NMSPlugin(void const* data, size_t length) {
82 + this->deserialize(data, length);
83 + }
84 +
85 + const char *getPluginType() const override {
86 + return RETINANET_PLUGIN_NAME;
87 + }
88 +
89 + const char *getPluginVersion() const override {
90 + return RETINANET_PLUGIN_VERSION;
91 + }
92 +
93 + int getNbOutputs() const override {
94 + return 3;
95 + }
96 +
97 + DimsExprs getOutputDimensions(int outputIndex, const DimsExprs *inputs,
98 + int nbInputs, IExprBuilder &exprBuilder) override
99 + {
100 + DimsExprs output(inputs[0]);
101 + output.d[1] = exprBuilder.constant(_detections_per_im * (outputIndex == 1 ? 4 : 1));
102 + output.d[2] = exprBuilder.constant(1);
103 + output.d[3] = exprBuilder.constant(1);
104 + return output;
105 + }
106 +
107 + bool supportsFormatCombination(int pos, const PluginTensorDesc *inOut,
108 + int nbInputs, int nbOutputs) override
109 + {
110 + assert(nbInputs == 3);
111 + assert(nbOutputs == 3);
112 + assert(pos < 6);
113 + return inOut[pos].type == DataType::kFLOAT && inOut[pos].format == nvinfer1::PluginFormat::kLINEAR;
114 + }
115 +
116 + int initialize() override { return 0; }
117 +
118 + void terminate() override {}
119 +
120 + size_t getWorkspaceSize(const PluginTensorDesc *inputs,
121 + int nbInputs, const PluginTensorDesc *outputs, int nbOutputs) const override
122 + {
123 + if (size < 0) {
124 + size = cuda::nms(inputs->dims.d[0], nullptr, nullptr, _count,
125 + _detections_per_im, _nms_thresh,
126 + nullptr, 0, nullptr);
127 + }
128 + return size;
129 + }
130 +
131 + int enqueue(const PluginTensorDesc *inputDesc,
132 + const PluginTensorDesc *outputDesc, const void *const *inputs,
133 + void *const *outputs, void *workspace, cudaStream_t stream) override
134 + {
135 + return cuda::nms(inputDesc->dims.d[0], inputs, outputs, _count,
136 + _detections_per_im, _nms_thresh,
137 + workspace, getWorkspaceSize(inputDesc, 3, outputDesc, 3), stream);
138 + }
139 +
140 + void destroy() override {
141 + delete this;
142 + }
143 +
144 + const char *getPluginNamespace() const override {
145 + return RETINANET_PLUGIN_NAMESPACE;
146 + }
147 +
148 + void setPluginNamespace(const char *N) override {}
149 +
150 + DataType getOutputDataType(int index, const DataType* inputTypes, int nbInputs) const
151 + {
152 + assert(index < 3);
153 + return DataType::kFLOAT;
154 + }
155 +
156 + void configurePlugin(const DynamicPluginTensorDesc *in, int nbInputs,
157 + const DynamicPluginTensorDesc *out, int nbOutputs)
158 + {
159 + assert(nbInputs == 3);
160 + assert(in[0].desc.dims.d[1] == in[2].desc.dims.d[1]);
161 + assert(in[1].desc.dims.d[1] == in[2].desc.dims.d[1] * 4);
162 + _count = in[0].desc.dims.d[1];
163 + }
164 +
165 + IPluginV2DynamicExt *clone() const override {
166 + return new NMSPlugin(_nms_thresh, _detections_per_im, _count);
167 + }
168 +
169 +
170 +private:
171 + template<typename T> void write(char*& buffer, const T& val) const {
172 + *reinterpret_cast<T*>(buffer) = val;
173 + buffer += sizeof(T);
174 + }
175 +
176 + template<typename T> void read(const char*& buffer, T& val) {
177 + val = *reinterpret_cast<const T*>(buffer);
178 + buffer += sizeof(T);
179 + }
180 +};
181 +
182 +class NMSPluginCreator : public IPluginCreator {
183 +public:
184 + NMSPluginCreator() {}
185 +
186 + const char *getPluginNamespace() const override {
187 + return RETINANET_PLUGIN_NAMESPACE;
188 + }
189 + const char *getPluginName () const override {
190 + return RETINANET_PLUGIN_NAME;
191 + }
192 +
193 + const char *getPluginVersion () const override {
194 + return RETINANET_PLUGIN_VERSION;
195 + }
196 +
197 + //Was IPluginV2
198 + IPluginV2DynamicExt *deserializePlugin (const char *name, const void *serialData, size_t serialLength) override {
199 + return new NMSPlugin(serialData, serialLength);
200 + }
201 +
202 + //Was IPluginV2
203 + void setPluginNamespace(const char *N) override {}
204 + const PluginFieldCollection *getFieldNames() override { return nullptr; }
205 + IPluginV2DynamicExt *createPlugin (const char *name, const PluginFieldCollection *fc) override { return nullptr; }
206 +};
207 +
208 +REGISTER_TENSORRT_PLUGIN(NMSPluginCreator);
209 +
210 +}
211 +
212 +#undef RETINANET_PLUGIN_NAME
213 +#undef RETINANET_PLUGIN_VERSION
214 +#undef RETINANET_PLUGIN_NAMESPACE
1 +/*
2 + * Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
3 + *
4 + * Permission is hereby granted, free of charge, to any person obtaining a
5 + * copy of this software and associated documentation files (the "Software"),
6 + * to deal in the Software without restriction, including without limitation
7 + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
8 + * and/or sell copies of the Software, and to permit persons to whom the
9 + * Software is furnished to do so, subject to the following conditions:
10 + *
11 + * The above copyright notice and this permission notice shall be included in
12 + * all copies or substantial portions of the Software.
13 + *
14 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
17 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
19 + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
20 + * DEALINGS IN THE SOFTWARE.
21 + */
22 +
23 +#pragma once
24 +
25 +#include <NvInfer.h>
26 +
27 +#include <vector>
28 +#include <cassert>
29 +
30 +#include "../cuda/nms_iou.h"
31 +
32 +using namespace nvinfer1;
33 +
34 +#define RETINANET_PLUGIN_NAME "RetinaNetNMSRotate"
35 +#define RETINANET_PLUGIN_VERSION "1"
36 +#define RETINANET_PLUGIN_NAMESPACE ""
37 +
38 +namespace odtk {
39 +
40 +class NMSRotatePlugin : public IPluginV2DynamicExt {
41 + float _nms_thresh;
42 + int _detections_per_im;
43 +
44 + size_t _count;
45 + mutable int size = -1;
46 +
47 +protected:
48 + void deserialize(void const* data, size_t length) {
49 + const char* d = static_cast<const char*>(data);
50 + read(d, _nms_thresh);
51 + read(d, _detections_per_im);
52 + read(d, _count);
53 + }
54 +
55 + size_t getSerializationSize() const override {
56 + return sizeof(_nms_thresh) + sizeof(_detections_per_im)
57 + + sizeof(_count);
58 + }
59 +
60 + void serialize(void *buffer) const override {
61 + char* d = static_cast<char*>(buffer);
62 + write(d, _nms_thresh);
63 + write(d, _detections_per_im);
64 + write(d, _count);
65 + }
66 +
67 +public:
68 + NMSRotatePlugin(float nms_thresh, int detections_per_im)
69 + : _nms_thresh(nms_thresh), _detections_per_im(detections_per_im) {
70 + assert(nms_thresh > 0);
71 + assert(detections_per_im > 0);
72 + }
73 +
74 + NMSRotatePlugin(float nms_thresh, int detections_per_im, size_t count)
75 + : _nms_thresh(nms_thresh), _detections_per_im(detections_per_im), _count(count) {
76 + assert(nms_thresh > 0);
77 + assert(detections_per_im > 0);
78 + assert(count > 0);
79 + }
80 +
81 + NMSRotatePlugin(void const* data, size_t length) {
82 + this->deserialize(data, length);
83 + }
84 +
85 + const char *getPluginType() const override {
86 + return RETINANET_PLUGIN_NAME;
87 + }
88 +
89 + const char *getPluginVersion() const override {
90 + return RETINANET_PLUGIN_VERSION;
91 + }
92 +
93 + int getNbOutputs() const override {
94 + return 3;
95 + }
96 +
97 + DimsExprs getOutputDimensions(int outputIndex, const DimsExprs *inputs,
98 + int nbInputs, IExprBuilder &exprBuilder) override
99 + {
100 + DimsExprs output(inputs[0]);
101 + output.d[1] = exprBuilder.constant(_detections_per_im * (outputIndex == 1 ? 6 : 1));
102 + output.d[2] = exprBuilder.constant(1);
103 + output.d[3] = exprBuilder.constant(1);
104 + return output;
105 + }
106 +
107 + bool supportsFormatCombination(int pos, const PluginTensorDesc *inOut,
108 + int nbInputs, int nbOutputs) override
109 + {
110 + assert(nbInputs == 3);
111 + assert(nbOutputs == 3);
112 + assert(pos < 6);
113 + return inOut[pos].type == DataType::kFLOAT && inOut[pos].format == nvinfer1::PluginFormat::kLINEAR;
114 + }
115 +
116 + int initialize() override { return 0; }
117 +
118 + void terminate() override {}
119 +
120 + size_t getWorkspaceSize(const PluginTensorDesc *inputs,
121 + int nbInputs, const PluginTensorDesc *outputs, int nbOutputs) const override
122 + {
123 + if (size < 0) {
124 + size = cuda::nms_rotate(inputs->dims.d[0], nullptr, nullptr, _count,
125 + _detections_per_im, _nms_thresh,
126 + nullptr, 0, nullptr);
127 + }
128 + return size;
129 + }
130 +
131 + int enqueue(const PluginTensorDesc *inputDesc,
132 + const PluginTensorDesc *outputDesc, const void *const *inputs,
133 + void *const *outputs, void *workspace, cudaStream_t stream) override
134 + {
135 + return cuda::nms_rotate(inputDesc->dims.d[0], inputs, outputs, _count,
136 + _detections_per_im, _nms_thresh,
137 + workspace, getWorkspaceSize(inputDesc, 3, outputDesc, 3), stream);
138 + }
139 +
140 + void destroy() override {
141 + delete this;
142 + }
143 +
144 + const char *getPluginNamespace() const override {
145 + return RETINANET_PLUGIN_NAMESPACE;
146 + }
147 +
148 + void setPluginNamespace(const char *N) override {}
149 +
150 + DataType getOutputDataType(int index, const DataType* inputTypes, int nbInputs) const
151 + {
152 + assert(index < 3);
153 + return DataType::kFLOAT;
154 + }
155 +
156 +
157 + void configurePlugin(const DynamicPluginTensorDesc *in, int nbInputs,
158 + const DynamicPluginTensorDesc *out, int nbOutputs)
159 + {
160 + assert(nbInputs == 3);
161 + assert(in[0].desc.dims.d[1] == in[2].desc.dims.d[1]);
162 + assert(in[1].desc.dims.d[1] == in[2].desc.dims.d[1] * 6);
163 + _count = in[0].desc.dims.d[1];
164 + }
165 +
166 + IPluginV2DynamicExt *clone() const override {
167 + return new NMSRotatePlugin(_nms_thresh, _detections_per_im, _count);
168 + }
169 +
170 +private:
171 + template<typename T> void write(char*& buffer, const T& val) const {
172 + *reinterpret_cast<T*>(buffer) = val;
173 + buffer += sizeof(T);
174 + }
175 +
176 + template<typename T> void read(const char*& buffer, T& val) {
177 + val = *reinterpret_cast<const T*>(buffer);
178 + buffer += sizeof(T);
179 + }
180 +};
181 +
182 +class NMSRotatePluginCreator : public IPluginCreator {
183 +public:
184 + NMSRotatePluginCreator() {}
185 +
186 + const char *getPluginNamespace() const override {
187 + return RETINANET_PLUGIN_NAMESPACE;
188 + }
189 + const char *getPluginName () const override {
190 + return RETINANET_PLUGIN_NAME;
191 + }
192 +
193 + const char *getPluginVersion () const override {
194 + return RETINANET_PLUGIN_VERSION;
195 + }
196 +
197 + IPluginV2DynamicExt *deserializePlugin (const char *name, const void *serialData, size_t serialLength) override {
198 + return new NMSRotatePlugin(serialData, serialLength);
199 + }
200 +
201 + void setPluginNamespace(const char *N) override {}
202 + const PluginFieldCollection *getFieldNames() override { return nullptr; }
203 + IPluginV2DynamicExt *createPlugin (const char *name, const PluginFieldCollection *fc) override { return nullptr; }
204 +};
205 +
206 +REGISTER_TENSORRT_PLUGIN(NMSRotatePluginCreator);
207 +
208 +}
209 +
210 +#undef RETINANET_PLUGIN_NAME
211 +#undef RETINANET_PLUGIN_VERSION
212 +#undef RETINANET_PLUGIN_NAMESPACE
1 +cmake_minimum_required(VERSION 3.9 FATAL_ERROR)
2 +
3 +project(odtk_infer LANGUAGES CXX)
4 +set(CMAKE_CXX_STANDARD 14)
5 +find_package(CUDA REQUIRED)
6 +enable_language(CUDA)
7 +find_package(OpenCV REQUIRED)
8 +
9 +if(DEFINED TensorRT_DIR)
10 + include_directories("${TensorRT_DIR}/include")
11 + link_directories("${TensorRT_DIR}/lib")
12 +endif(DEFINED TensorRT_DIR)
13 +include_directories(${CUDA_INCLUDE_DIRS})
14 +
15 +add_library(odtk SHARED
16 + ../../csrc/cuda/decode.h
17 + ../../csrc/cuda/decode.cu
18 + ../../csrc/cuda/nms.h
19 + ../../csrc/cuda/nms.cu
20 + ../../csrc/cuda/decode_rotate.h
21 + ../../csrc/cuda/decode_rotate.cu
22 + ../../csrc/cuda/nms_iou.h
23 + ../../csrc/cuda/nms_iou.cu
24 + ../../csrc/cuda/utils.h
25 + ../../csrc/engine.h
26 + ../../csrc/engine.cpp
27 + ../../csrc/calibrator.h
28 +)
29 +set_target_properties(odtk PROPERTIES
30 + CUDA_RESOLVE_DEVICE_SYMBOLS ON
31 + CUDA_ARCHITECTURES 60 61 70 72 75 80 86
32 +)
33 +include_directories(${OpenCV_INCLUDE_DIRS})
34 +target_link_libraries(odtk PUBLIC nvinfer nvonnxparser ${OpenCV_LIBS})
35 +
36 +add_executable(export export.cpp)
37 +include_directories(${OpenCV_INCLUDE_DIRS})
38 +target_link_libraries(export PRIVATE odtk ${OpenCV_LIBS})
39 +
40 +add_executable(infer infer.cpp)
41 +include_directories(${OpenCV_INCLUDE_DIRS})
42 +target_link_libraries(infer PRIVATE odtk ${OpenCV_LIBS} cuda ${CUDA_LIBRARIES})
43 +
44 +if(CMAKE_SYSTEM_PROCESSOR MATCHES "aarch64")
45 + add_executable(infervideo infervideo.cpp)
46 + include_directories(${OpenCV_INCLUDE_DIRS})
47 + target_link_libraries(infervideo PRIVATE odtk ${OpenCV_LIBS} cuda ${CUDA_LIBRARIES})
48 +endif()
1 +# RetinaNet C++ Inference API - Sample Code
2 +
3 +The C++ API allows you to build a TensorRT engine for inference using the ONNX export of a core model.
4 +
5 +The following shows how to build and run the code samples for exporting an ONNX core model (from RetinaNet or another toolkit that supports the same core model structure) to a TensorRT engine and running inference on images.
6 +
7 +## Building
8 +
9 +Building the example requires the following toolkits and libraries to be set up properly on your system:
10 +* A proper C++ toolchain (Microsoft Visual Studio on Windows)
11 +* [CMake](https://cmake.org/download/) version 3.9 or later
12 +* NVIDIA [CUDA](https://developer.nvidia.com/cuda-toolkit)
13 +* NVIDIA [cuDNN](https://developer.nvidia.com/cudnn)
14 +* NVIDIA [TensorRT](https://developer.nvidia.com/tensorrt)
15 +* [OpenCV](https://opencv.org/releases.html)
16 +
17 +### Linux
18 +```bash
19 +mkdir build && cd build
20 +cmake -DCMAKE_CUDA_FLAGS="--expt-extended-lambda -std=c++14" ..
21 +make
22 +```
23 +
24 +### Windows
25 +```bash
26 +mkdir build && cd build
27 +cmake -G "Visual Studio 15 2017" -A x64 -T host=x64,cuda=10.0 -DTensorRT_DIR="C:\path\to\tensorrt" -DOpenCV_DIR="C:\path\to\opencv\build" ..
28 +msbuild odtk_infer.sln
29 +```
30 +
31 +## Running
32 +
33 +If you don't have an ONNX core model, generate one from your RetinaNet model:
34 +```bash
35 +odtk export model.pth model.onnx
36 +```
37 +
38 +Load the ONNX core model and export it to a RetinaNet TensorRT engine (using FP16 precision):
39 +```bash
40 +export{.exe} model.onnx engine.plan
41 +```
42 +
43 +You can also export the ONNX core model to an INT8 TensorRT engine if you have already done INT8 calibration:
44 +```bash
45 +export{.exe} model.onnx engine.plan INT8CalibrationTable
46 +```
47 +
48 +Run a test inference (default output if none provided: "detections.png"):
49 +```bash
50 +infer{.exe} engine.plan image.jpg [<OUTPUT>.png]
51 +```
52 +
53 +Note: make sure the TensorRT, cuDNN and OpenCV libraries are available in your environment and on your library path (see the example at the end of this document).
54 +
55 +We have verified these steps with the following configurations:
56 +* DGX-1V using the provided Docker container (CUDA 10, cuDNN 7.4.2, TensorRT 5.0.2, OpenCV 3.4.3)
57 +* Jetson AGX Xavier with JetPack 4.1.1 Developer Preview (CUDA 10, cuDNN 7.3.1, TensorRT 5.0.3, OpenCV 3.3.1)
58 +
59 +
60 +
61 +
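62 +As noted above, the TensorRT, cuDNN and OpenCV shared libraries must be discoverable at runtime. A minimal sketch for Linux follows; the paths are illustrative only and depend on where these libraries are installed on your system:
63 +```bash
64 +# Illustrative paths -- adjust to your actual installation
65 +export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/lib:$LD_LIBRARY_PATH
66 +./export model.onnx engine.plan
67 +./infer engine.plan image.jpg detections.png
68 +```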
1 +#include <iostream>
2 +#include <stdexcept>
3 +#include <fstream>
4 +#include <vector>
5 +#include <glob.h>
6 +
7 +#include "../../csrc/engine.h"
8 +
9 +#define ROTATED false // Change to true for Rotated Bounding Box export
10 +#define COCO_PATH "/coco/coco2017/val2017" // Path to calibration images
11 +
12 +using namespace std;
13 +
14 +// Sample program to build a TensorRT Engine from an ONNX model from RetinaNet
15 +//
16 +// By default TensorRT will target FP16 precision (supported on Pascal, Volta, and Turing GPUs)
17 +//
18 +// You can optionally provide an INT8CalibrationTable file created during RetinaNet INT8 calibration
19 +// to build a TensorRT engine with INT8 precision
20 +
21 +inline vector<string> glob(int batch){
22 + glob_t glob_result;
23 + string path = string(COCO_PATH);
24 + if(path.back()!='/') path+="/";
25 + glob((path+"*").c_str(), (GLOB_TILDE | GLOB_NOSORT), NULL, &glob_result);
26 + vector<string> calibration_files;
27 + for(int i=0; i<batch; i++){
28 + calibration_files.push_back(string(glob_result.gl_pathv[i]));
29 + }
30 + globfree(&glob_result);
31 + return calibration_files;
32 +}
33 +
34 +int main(int argc, char *argv[]) {
35 + if (argc != 3 && argc != 4) {
36 + cerr << "Usage: " << argv[0] << " core_model.onnx engine.plan {Int8CalibrationTable}" << endl;
37 + return 1;
38 + }
39 +
40 + ifstream onnxFile;
41 + onnxFile.open(argv[1], ios::in | ios::binary);
42 +
43 + if (!onnxFile.good()) {
44 + cerr << "\nERROR: Unable to read specified ONNX model " << argv[1] << endl;
45 + return -1;
46 + }
47 +
48 + onnxFile.seekg (0, onnxFile.end);
49 + size_t size = onnxFile.tellg();
50 + onnxFile.seekg (0, onnxFile.beg);
51 +
52 + auto *buffer = new char[size];
53 + onnxFile.read(buffer, size);
54 + onnxFile.close();
55 +
56 + // Define default RetinaNet parameters to use for TRT export
57 + const vector<int> dynamic_batch_opts{1, 8, 16};
58 + int calibration_batches = 2; // must be >= 1
59 + float score_thresh = 0.05f;
60 + int top_n = 1000;
61 + size_t workspace_size =(1ULL << 30);
62 + float nms_thresh = 0.5;
63 + int detections_per_im = 100;
64 + bool verbose = false;
65 + // Generated from generate_anchors.py
66 + vector<vector<float>> anchors;
67 + if(!ROTATED) {
68 + // Axis-aligned
69 + anchors = {
70 + {-12.0, -12.0, 20.0, 20.0, -7.31, -18.63, 15.31, 26.63, -18.63, -7.31, 26.63, 15.31, -16.16, -16.16, 24.16, 24.16, -10.25, -24.51, 18.25, 32.51, -24.51, -10.25, 32.51, 18.25, -21.4, -21.4, 29.4, 29.4, -13.96, -31.92, 21.96, 39.92, -31.92, -13.96, 39.92, 21.96},
71 + {-24.0, -24.0, 40.0, 40.0, -14.63, -37.25, 30.63, 53.25, -37.25, -14.63, 53.25, 30.63, -32.32, -32.32, 48.32, 48.32, -20.51, -49.02, 36.51, 65.02, -49.02, -20.51, 65.02, 36.51, -42.8, -42.8, 58.8, 58.8, -27.92, -63.84, 43.92, 79.84, -63.84, -27.92, 79.84, 43.92},
72 + {-48.0, -48.0, 80.0, 80.0, -29.25, -74.51, 61.25, 106.51, -74.51, -29.25, 106.51, 61.25, -64.63, -64.63, 96.63, 96.63, -41.02, -98.04, 73.02, 130.04, -98.04, -41.02, 130.04, 73.02, -85.59, -85.59, 117.59, 117.59, -55.84, -127.68, 87.84, 159.68, -127.68, -55.84, 159.68, 87.84},
73 + {-96.0, -96.0, 160.0, 160.0, -58.51, -149.02, 122.51, 213.02, -149.02, -58.51, 213.02, 122.51, -129.27, -129.27, 193.27, 193.27, -82.04, -196.07, 146.04, 260.07, -196.07, -82.04, 260.07, 146.04, -171.19, -171.19, 235.19, 235.19, -111.68, -255.35, 175.68, 319.35, -255.35, -111.68, 319.35, 175.68},
74 + {-192.0, -192.0, 320.0, 320.0, -117.02, -298.04, 245.02, 426.04, -298.04, -117.02, 426.04, 245.02, -258.54, -258.54, 386.54, 386.54, -164.07, -392.14, 292.07, 520.14, -392.14, -164.07, 520.14, 292.07, -342.37, -342.37, 470.37, 470.37, -223.35, -510.7, 351.35, 638.7, -510.7, -223.35, 638.7, 351.35}
75 + };
76 + }
77 + else {
78 + // Rotated-bboxes
79 + anchors = {
80 + {-12.0, 0.0, 19.0, 7.0, -7.0, -2.0, 14.0, 9.0, -4.0, -4.0, 11.0, 11.0, -2.0, -8.0, 9.0, 15.0, 0.0, -12.0, 7.0, 19.0, -21.4, -2.35, 28.4, 9.35, -13.46, -5.52, 20.46, 12.52, -8.7, -8.7, 15.7, 15.7, -5.52, -15.05, 12.52, 22.05, -2.35, -21.4, 9.35, 28.4, -36.32, -6.08, 43.32, 13.08, -23.72, -11.12, 30.72, 18.12, -16.16, -16.16, 23.16, 23.16, -11.12, -26.24, 18.12, 33.24, -6.08, -36.32, 13.08, 43.32, -12.0, 0.0, 19.0, 7.0, -7.0, -2.0, 14.0, 9.0, -4.0, -4.0, 11.0, 11.0, -2.0, -8.0, 9.0, 15.0, 0.0, -12.0, 7.0, 19.0, -21.4, -2.35, 28.4, 9.35, -13.46, -5.52, 20.46, 12.52, -8.7, -8.7, 15.7, 15.7, -5.52, -15.05, 12.52, 22.05, -2.35, -21.4, 9.35, 28.4, -36.32, -6.08, 43.32, 13.08, -23.72, -11.12, 30.72, 18.12, -16.16, -16.16, 23.16, 23.16, -11.12, -26.24, 18.12, 33.24, -6.08, -36.32, 13.08, 43.32, -12.0, 0.0, 19.0, 7.0, -7.0, -2.0, 14.0, 9.0, -4.0, -4.0, 11.0, 11.0, -2.0, -8.0, 9.0, 15.0, 0.0, -12.0, 7.0, 19.0, -21.4, -2.35, 28.4, 9.35, -13.46, -5.52, 20.46, 12.52, -8.7, -8.7, 15.7, 15.7, -5.52, -15.05, 12.52, 22.05, -2.35, -21.4, 9.35, 28.4, -36.32, -6.08, 43.32, 13.08, -23.72, -11.12, 30.72, 18.12, -16.16, -16.16, 23.16, 23.16, -11.12, -26.24, 18.12, 33.24, -6.08, -36.32, 13.08, 43.32},
81 + {-24.0, 0.0, 39.0, 15.0, -15.0, -4.0, 30.0, 19.0, -8.0, -8.0, 23.0, 23.0, -3.0, -14.0, 18.0, 29.0, 0.0, -24.0, 15.0, 39.0, -42.8, -4.7, 57.8, 19.7, -28.51, -11.05, 43.51, 26.05, -17.4, -17.4, 32.4, 32.4, -9.46, -26.92, 24.46, 41.92, -4.7, -42.8, 19.7, 57.8, -72.63, -12.16, 87.63, 27.16, -49.96, -22.24, 64.96, 37.24, -32.32, -32.32, 47.32, 47.32, -19.72, -47.44, 34.72, 62.44, -12.16, -72.63, 27.16, 87.63, -24.0, 0.0, 39.0, 15.0, -15.0, -4.0, 30.0, 19.0, -8.0, -8.0, 23.0, 23.0, -3.0, -14.0, 18.0, 29.0, 0.0, -24.0, 15.0, 39.0, -42.8, -4.7, 57.8, 19.7, -28.51, -11.05, 43.51, 26.05, -17.4, -17.4, 32.4, 32.4, -9.46, -26.92, 24.46, 41.92, -4.7, -42.8, 19.7, 57.8, -72.63, -12.16, 87.63, 27.16, -49.96, -22.24, 64.96, 37.24, -32.32, -32.32, 47.32, 47.32, -19.72, -47.44, 34.72, 62.44, -12.16, -72.63, 27.16, 87.63, -24.0, 0.0, 39.0, 15.0, -15.0, -4.0, 30.0, 19.0, -8.0, -8.0, 23.0, 23.0, -3.0, -14.0, 18.0, 29.0, 0.0, -24.0, 15.0, 39.0, -42.8, -4.7, 57.8, 19.7, -28.51, -11.05, 43.51, 26.05, -17.4, -17.4, 32.4, 32.4, -9.46, -26.92, 24.46, 41.92, -4.7, -42.8, 19.7, 57.8, -72.63, -12.16, 87.63, 27.16, -49.96, -22.24, 64.96, 37.24, -32.32, -32.32, 47.32, 47.32, -19.72, -47.44, 34.72, 62.44, -12.16, -72.63, 27.16, 87.63},
82 + {-48.0, 0.0, 79.0, 31.0, -29.0, -6.0, 60.0, 37.0, -16.0, -16.0, 47.0, 47.0, -7.0, -30.0, 38.0, 61.0, 0.0, -48.0, 31.0, 79.0, -85.59, -9.4, 116.59, 40.4, -55.43, -18.92, 86.43, 49.92, -34.8, -34.8, 65.8, 65.8, -20.51, -57.02, 51.51, 88.02, -9.4, -85.59, 40.4, 116.59, -145.27, -24.32, 176.27, 55.32, -97.39, -39.44, 128.39, 70.44, -64.63, -64.63, 95.63, 95.63, -41.96, -99.91, 72.96, 130.91, -24.32, -145.27, 55.32, 176.27, -48.0, 0.0, 79.0, 31.0, -29.0, -6.0, 60.0, 37.0, -16.0, -16.0, 47.0, 47.0, -7.0, -30.0, 38.0, 61.0, 0.0, -48.0, 31.0, 79.0, -85.59, -9.4, 116.59, 40.4, -55.43, -18.92, 86.43, 49.92, -34.8, -34.8, 65.8, 65.8, -20.51, -57.02, 51.51, 88.02, -9.4, -85.59, 40.4, 116.59, -145.27, -24.32, 176.27, 55.32, -97.39, -39.44, 128.39, 70.44, -64.63, -64.63, 95.63, 95.63, -41.96, -99.91, 72.96, 130.91, -24.32, -145.27, 55.32, 176.27, -48.0, 0.0, 79.0, 31.0, -29.0, -6.0, 60.0, 37.0, -16.0, -16.0, 47.0, 47.0, -7.0, -30.0, 38.0, 61.0, 0.0, -48.0, 31.0, 79.0, -85.59, -9.4, 116.59, 40.4, -55.43, -18.92, 86.43, 49.92, -34.8, -34.8, 65.8, 65.8, -20.51, -57.02, 51.51, 88.02, -9.4, -85.59, 40.4, 116.59, -145.27, -24.32, 176.27, 55.32, -97.39, -39.44, 128.39, 70.44, -64.63, -64.63, 95.63, 95.63, -41.96, -99.91, 72.96, 130.91, -24.32, -145.27, 55.32, 176.27},
83 + {-96.0, 0.0, 159.0, 63.0, -59.0, -14.0, 122.0, 77.0, -32.0, -32.0, 95.0, 95.0, -13.0, -58.0, 76.0, 121.0, 0.0, -96.0, 63.0, 159.0, -171.19, -18.8, 234.19, 81.8, -112.45, -41.02, 175.45, 104.02, -69.59, -69.59, 132.59, 132.59, -39.43, -110.87, 102.43, 173.87, -18.8, -171.19, 81.8, 234.19, -290.54, -48.63, 353.54, 111.63, -197.31, -83.91, 260.31, 146.91, -129.27, -129.27, 192.27, 192.27, -81.39, -194.79, 144.39, 257.79, -48.63, -290.54, 111.63, 353.54, -96.0, 0.0, 159.0, 63.0, -59.0, -14.0, 122.0, 77.0, -32.0, -32.0, 95.0, 95.0, -13.0, -58.0, 76.0, 121.0, 0.0, -96.0, 63.0, 159.0, -171.19, -18.8, 234.19, 81.8, -112.45, -41.02, 175.45, 104.02, -69.59, -69.59, 132.59, 132.59, -39.43, -110.87, 102.43, 173.87, -18.8, -171.19, 81.8, 234.19, -290.54, -48.63, 353.54, 111.63, -197.31, -83.91, 260.31, 146.91, -129.27, -129.27, 192.27, 192.27, -81.39, -194.79, 144.39, 257.79, -48.63, -290.54, 111.63, 353.54, -96.0, 0.0, 159.0, 63.0, -59.0, -14.0, 122.0, 77.0, -32.0, -32.0, 95.0, 95.0, -13.0, -58.0, 76.0, 121.0, 0.0, -96.0, 63.0, 159.0, -171.19, -18.8, 234.19, 81.8, -112.45, -41.02, 175.45, 104.02, -69.59, -69.59, 132.59, 132.59, -39.43, -110.87, 102.43, 173.87, -18.8, -171.19, 81.8, 234.19, -290.54, -48.63, 353.54, 111.63, -197.31, -83.91, 260.31, 146.91, -129.27, -129.27, 192.27, 192.27, -81.39, -194.79, 144.39, 257.79, -48.63, -290.54, 111.63, 353.54},
84 + {-192.0, 0.0, 319.0, 127.0, -117.0, -26.0, 244.0, 153.0, -64.0, -64.0, 191.0, 191.0, -27.0, -118.0, 154.0, 245.0, 0.0, -192.0, 127.0, 319.0, -342.37, -37.59, 469.37, 164.59, -223.32, -78.87, 350.32, 205.87, -139.19, -139.19, 266.19, 266.19, -80.45, -224.91, 207.45, 351.91, -37.59, -342.37, 164.59, 469.37, -581.08, -97.27, 708.08, 224.27, -392.09, -162.79, 519.09, 289.79, -258.54, -258.54, 385.54, 385.54, -165.31, -394.61, 292.31, 521.61, -97.27, -581.08, 224.27, 708.08, -192.0, 0.0, 319.0, 127.0, -117.0, -26.0, 244.0, 153.0, -64.0, -64.0, 191.0, 191.0, -27.0, -118.0, 154.0, 245.0, 0.0, -192.0, 127.0, 319.0, -342.37, -37.59, 469.37, 164.59, -223.32, -78.87, 350.32, 205.87, -139.19, -139.19, 266.19, 266.19, -80.45, -224.91, 207.45, 351.91, -37.59, -342.37, 164.59, 469.37, -581.08, -97.27, 708.08, 224.27, -392.09, -162.79, 519.09, 289.79, -258.54, -258.54, 385.54, 385.54, -165.31, -394.61, 292.31, 521.61, -97.27, -581.08, 224.27, 708.08, -192.0, 0.0, 319.0, 127.0, -117.0, -26.0, 244.0, 153.0, -64.0, -64.0, 191.0, 191.0, -27.0, -118.0, 154.0, 245.0, 0.0, -192.0, 127.0, 319.0, -342.37, -37.59, 469.37, 164.59, -223.32, -78.87, 350.32, 205.87, -139.19, -139.19, 266.19, 266.19, -80.45, -224.91, 207.45, 351.91, -37.59, -342.37, 164.59, 469.37, -581.08, -97.27, 708.08, 224.27, -392.09, -162.79, 519.09, 289.79, -258.54, -258.54, 385.54, 385.54, -165.31, -394.61, 292.31, 521.61, -97.27, -581.08, 224.27, 708.08}
85 + };
86 + }
87 +
88 + // For INT8 calibration, after setting COCO_PATH on line 10:
89 + // const vector<string> calibration_files = glob(calibration_batches*dynamic_batch_opts[1]);
90 + const vector<string> calibration_files;
91 + string model_name = "";
92 + string calibration_table = argc == 4 ? string(argv[3]) : "";
93 +
94 + // Use FP16 precision by default, use INT8 if calibration table is provided
95 + string precision = "FP16";
96 + if (argc == 4)
97 + precision = "INT8";
98 +
99 + cout << "Building engine..." << endl;
100 + auto engine = odtk::Engine(buffer, size, dynamic_batch_opts, precision, score_thresh, top_n,
101 + anchors, ROTATED, nms_thresh, detections_per_im, calibration_files, model_name, calibration_table, verbose, workspace_size);
102 + engine.save(string(argv[2]));
103 +
104 + delete [] buffer;
105 +
106 + return 0;
107 +}
1 +import numpy as np
2 +from odtk.box import generate_anchors, generate_anchors_rotated
3 +
4 +# Generates anchors for export.cpp
5 +
6 +# ratios = [1.0, 2.0, 0.5]
7 +# scales = [4 * 2 ** (i / 3) for i in range(3)]
8 +ratios = [0.25, 0.5, 1.0, 2.0, 4.0]
9 +scales = [2 * 2**(2 * i/3) for i in range(3)]
10 +angles = [-np.pi / 6, 0, np.pi / 6]
11 +strides = [2**i for i in range(3,8)]
12 +
13 +axis = str(np.round([generate_anchors(stride, ratios, scales,
14 + angles).view(-1).tolist() for stride in strides], decimals=2).tolist()
15 + ).replace('[', '{').replace(']', '}').replace('}, ', '},\n')
16 +
17 +rot = str(np.round([generate_anchors_rotated(stride, ratios, scales,
18 + angles)[0].view(-1).tolist() for stride in strides], decimals=2).tolist()
19 + ).replace('[', '{').replace(']', '}').replace('}, ', '},\n')
20 +
21 +print("Axis-aligned:\n"+axis+'\n')
22 +print("Rotated:\n"+rot)
1 +#include <iostream>
2 +#include <stdexcept>
3 +#include <fstream>
4 +#include <vector>
5 +#include <chrono>
6 +
7 +#include <opencv2/opencv.hpp>
8 +#include <opencv2/opencv.hpp>
9 +#include <opencv2/core/core.hpp>
10 +#include <opencv2/highgui/highgui.hpp>
11 +
12 +#include <cuda_runtime.h>
13 +
14 +#include "../../csrc/engine.h"
15 +
16 +using namespace std;
17 +using namespace cv;
18 +
19 +int main(int argc, char *argv[]) {
20 + if (argc<3 || argc>4) {
21 + cerr << "Usage: " << argv[0] << " engine.plan image.jpg [<OUTPUT>.png]" << endl;
22 + return 1;
23 + }
24 +
25 + cout << "Loading engine..." << endl;
26 + auto engine = odtk::Engine(argv[1]);
27 +
28 + cout << "Preparing data..." << endl;
29 + auto image = imread(argv[2], IMREAD_COLOR);
30 + auto inputSize = engine.getInputSize();
31 + cv::resize(image, image, Size(inputSize[1], inputSize[0]));
32 + cv::Mat pixels;
33 + image.convertTo(pixels, CV_32FC3, 1.0 / 255, 0);
34 +
35 + int channels = 3;
36 + vector<float> img;
37 + vector<float> data (channels * inputSize[0] * inputSize[1]);
38 +
39 + if (pixels.isContinuous())
40 + img.assign((float*)pixels.datastart, (float*)pixels.dataend);
41 + else {
42 + cerr << "Error reading image " << argv[2] << endl;
43 + return -1;
44 + }
45 +
46 + vector<float> mean {0.485, 0.456, 0.406};
47 + vector<float> std {0.229, 0.224, 0.225};
48 +
49 + for (int c = 0; c < channels; c++) {
50 + for (int j = 0, hw = inputSize[0] * inputSize[1]; j < hw; j++) {
51 + data[c * hw + j] = (img[channels * j + 2 - c] - mean[c]) / std[c]; // BGR HWC -> RGB CHW with ImageNet mean/std normalization
52 + }
53 + }
54 +
55 + // Create device buffers
56 + void *data_d, *scores_d, *boxes_d, *classes_d;
57 + auto num_det = engine.getMaxDetections();
58 + cudaMalloc(&data_d, 3 * inputSize[0] * inputSize[1] * sizeof(float));
59 + cudaMalloc(&scores_d, num_det * sizeof(float));
60 + cudaMalloc(&boxes_d, num_det * 4 * sizeof(float));
61 + cudaMalloc(&classes_d, num_det * sizeof(float));
62 +
63 + // Copy image to device
64 + size_t dataSize = data.size() * sizeof(float);
65 + cudaMemcpy(data_d, data.data(), dataSize, cudaMemcpyHostToDevice);
66 +
67 + // Run inference n times
68 + cout << "Running inference..." << endl;
69 + const int count = 100;
70 + auto start = chrono::steady_clock::now();
71 + vector<void *> buffers = { data_d, scores_d, boxes_d, classes_d };
72 + for (int i = 0; i < count; i++) {
73 + engine.infer(buffers, 1);
74 + }
75 + auto stop = chrono::steady_clock::now();
76 + auto timing = chrono::duration_cast<chrono::duration<double>>(stop - start);
77 + cout << "Took " << timing.count() / count << " seconds per inference." << endl;
78 +
79 + cudaFree(data_d);
80 +
81 + // Get back the bounding boxes
82 + unique_ptr<float[]> scores(new float[num_det]);
83 + unique_ptr<float[]> boxes(new float[num_det * 4]);
84 + unique_ptr<float[]> classes(new float[num_det]);
85 + cudaMemcpy(scores.get(), scores_d, sizeof(float) * num_det, cudaMemcpyDeviceToHost);
86 + cudaMemcpy(boxes.get(), boxes_d, sizeof(float) * num_det * 4, cudaMemcpyDeviceToHost);
87 + cudaMemcpy(classes.get(), classes_d, sizeof(float) * num_det, cudaMemcpyDeviceToHost);
88 +
89 + cudaFree(scores_d);
90 + cudaFree(boxes_d);
91 + cudaFree(classes_d);
92 +
93 + for (int i = 0; i < num_det; i++) {
94 + // Show results over confidence threshold
95 + if (scores[i] >= 0.3f) {
96 + float x1 = boxes[i*4+0];
97 + float y1 = boxes[i*4+1];
98 + float x2 = boxes[i*4+2];
99 + float y2 = boxes[i*4+3];
100 + cout << "Found box {" << x1 << ", " << y1 << ", " << x2 << ", " << y2
101 + << "} with score " << scores[i] << " and class " << classes[i] << endl;
102 +
103 + // Draw bounding box on image
104 + cv::rectangle(image, Point(x1, y1), Point(x2, y2), cv::Scalar(0, 255, 0));
105 + }
106 + }
107 +
108 + // Write image
109 + string out_file = argc == 4 ? string(argv[3]) : "detections.png";
110 + cout << "Saving result to " << out_file << endl;
111 + imwrite(out_file, image);
112 +
113 + return 0;
114 +}
1 +#include <iostream>
2 +#include <stdexcept>
3 +#include <fstream>
4 +#include <vector>
5 +#include <chrono>
6 +
7 +#include <opencv2/opencv.hpp>
8 +#include <opencv2/opencv.hpp>
9 +#include <opencv2/core/core.hpp>
10 +#include <opencv2/highgui/highgui.hpp>
11 +
12 +#include <cuda_runtime.h>
13 +
14 +#include "../../csrc/engine.h"
15 +
16 +using namespace std;
17 +using namespace cv;
18 +
19 +int main(int argc, char *argv[]) {
20 + if (argc != 4) {
21 + cerr << "Usage: " << argv[0] << " engine.plan input.mov output.mp4" << endl;
22 + return 1;
23 + }
24 +
25 + cout << "Loading engine..." << endl;
26 + auto engine = odtk::Engine(argv[1]);
27 + VideoCapture src(argv[2]);
28 +
29 + if (!src.isOpened()){
30 + cerr << "Could not read " << argv[2] << endl;
31 + return 1;
32 + }
33 +
34 + auto fh=src.get(CAP_PROP_FRAME_HEIGHT);
35 + auto fw=src.get(CAP_PROP_FRAME_WIDTH);
36 + auto fps=src.get(CAP_PROP_FPS);
37 + auto nframes=src.get(CAP_PROP_FRAME_COUNT);
38 +
39 + VideoWriter sink;
40 + sink.open(argv[3], 0x31637661, fps, Size(fw, fh)); // 0x31637661 is the FOURCC code for 'avc1' (H.264)
41 + Mat frame;
42 + Mat resized_frame;
43 + Mat inferred_frame;
44 + int count=1;
45 +
46 + auto inputSize = engine.getInputSize();
47 + // Create device buffers
48 + void *data_d, *scores_d, *boxes_d, *classes_d;
49 + auto num_det = engine.getMaxDetections();
50 + cudaMalloc(&data_d, 3 * inputSize[0] * inputSize[1] * sizeof(float));
51 + cudaMalloc(&scores_d, num_det * sizeof(float));
52 + cudaMalloc(&boxes_d, num_det * 4 * sizeof(float));
53 + cudaMalloc(&classes_d, num_det * sizeof(float));
54 +
55 + unique_ptr<float[]> scores(new float[num_det]);
56 + unique_ptr<float[]> boxes(new float[num_det * 4]);
57 + unique_ptr<float[]> classes(new float[num_det]);
58 +
59 + vector<float> mean {0.485, 0.456, 0.406};
60 + vector<float> std {0.229, 0.224, 0.225};
61 +
62 + vector<uint8_t> blues {0,63,127,191,255,0}; // colors for bounding boxes
63 + vector<uint8_t> greens {0,255,191,127,63,0};
64 + vector<uint8_t> reds {191,255,0,0,63,127};
65 +
66 + int channels = 3;
67 + vector<float> img;
68 + vector<float> data (channels * inputSize[0] * inputSize[1]);
69 +
70 + while(1)
71 + {
72 + src >> frame;
73 + if (frame.empty()){
74 + cout << "Finished inference!" << endl;
75 + break;
76 + }
77 +
78 + cv::resize(frame, resized_frame, Size(inputSize[1], inputSize[0]));
79 + cv::Mat pixels;
80 + resized_frame.convertTo(pixels, CV_32FC3, 1.0 / 255, 0);
81 +
82 + img.assign((float*)pixels.datastart, (float*)pixels.dataend);
83 +
84 + for (int c = 0; c < channels; c++) {
85 + for (int j = 0, hw = inputSize[0] * inputSize[1]; j < hw; j++) {
86 + data[c * hw + j] = (img[channels * j + 2 - c] - mean[c]) / std[c];
87 + }
88 + }
89 +
90 + // Copy image to device
91 + size_t dataSize = data.size() * sizeof(float);
92 + cudaMemcpy(data_d, data.data(), dataSize, cudaMemcpyHostToDevice);
93 +
94 + //Do inference
95 + cout << "Inferring on frame: " << count <<"/" << nframes << endl;
96 + count++;
97 + vector<void *> buffers = { data_d, scores_d, boxes_d, classes_d };
98 + engine.infer(buffers, 1);
99 +
100 + cudaMemcpy(scores.get(), scores_d, sizeof(float) * num_det, cudaMemcpyDeviceToHost);
101 + cudaMemcpy(boxes.get(), boxes_d, sizeof(float) * num_det * 4, cudaMemcpyDeviceToHost);
102 + cudaMemcpy(classes.get(), classes_d, sizeof(float) * num_det, cudaMemcpyDeviceToHost);
103 +
104 + // Get back the bounding boxes
105 + for (int i = 0; i < num_det; i++) {
106 + // Show results over confidence threshold
107 + if (scores[i] >= 0.2f) {
108 + float x1 = boxes[i*4+0];
109 + float y1 = boxes[i*4+1];
110 + float x2 = boxes[i*4+2];
111 + float y2 = boxes[i*4+3];
112 + int cls=classes[i];
113 + // Draw bounding box on image
114 + cv::rectangle(resized_frame, Point(x1, y1), Point(x2, y2), cv::Scalar(blues[cls], greens[cls], reds[cls]));
115 + }
116 + }
117 + cv::resize(resized_frame, inferred_frame, Size(fw, fh));
118 + sink.write(inferred_frame);
119 + }
120 + src.release();
121 + sink.release();
122 + cudaFree(data_d);
123 + cudaFree(scores_d);
124 + cudaFree(boxes_d);
125 + cudaFree(classes_d);
126 + return 0;
127 +}
1 +# Deploying RetinaNet in DeepStream 4.0
2 +
3 +This shows how to export a trained RetinaNet model to TensorRT and deploy it in a video analytics application using NVIDIA DeepStream 4.0.
4 +
5 +## Prerequisites
6 +* A GPU supported by DeepStream: Jetson Xavier, Tesla P4/P40/V100/T4
7 +* A trained PyTorch RetinaNet model.
8 +* A video source, either `.mp4` files or a webcam.
9 +
10 +## Tesla GPUs
11 +Setup instructions:
12 +
13 +#### 1. Download DeepStream 4.0
14 +Download the DeepStream 4.0 SDK for Tesla ("Download .tar") from [https://developer.nvidia.com/deepstream-download](https://developer.nvidia.com/deepstream-download) and place it in the `extras/deepstream` directory.
15 +
16 +This file should be called `deepstream_sdk_v4.0.2_x86_64.tbz2`.
17 +
18 +#### 2. Unpack DeepStream
19 +You may need to adjust the permissions on the `.tbz2` file before you can extract it.
20 +
21 +```
22 +cd extras/deepstream
23 +mkdir DeepStream_Release
24 +tar -xvf deepstream_sdk_v4.0.2_x86_64.tbz2 -C DeepStream_Release/
25 +```
26 +
27 +#### 3. Build and enter the DeepStream docker container
28 +```
29 +docker build -f <your_path>/retinanet-examples/Dockerfile.deepstream -t ds_odtk:latest <your_path>/retinanet-examples
30 +docker run --gpus all -it --rm --ipc=host -v <dir containing your data>:/data ds_odtk:latest
31 +```
32 +
33 +#### 4. Export your trained PyTorch RetinaNet model to TensorRT per the [INFERENCE](https://github.com/NVIDIA/retinanet-examples/blob/master/INFERENCE.md) instructions:
34 +```
35 +odtk export <PyTorch model> <engine> --batch n
36 +
37 +OR
38 +
39 +odtk export <PyTorch model> <engine> --int8 --calibration-images <example images> --batch n
40 +```
41 +
42 +#### 5. Run deepstream-app
43 +Once all of the config files have been modified, launch the DeepStream application:
44 +```
45 +cd /workspace/retinanet-examples/extras/deepstream/deepstream-sample/
46 +LD_PRELOAD=build/libnvdsparsebbox_odtk.so deepstream-app -c <config file>
47 +```
48 +
49 +## Jetson AGX Xavier
50 +Setup instructions:
51 +
52 +#### 1. Flash Jetson Xavier with [Jetpack 4.3](https://developer.nvidia.com/embedded/jetpack)
53 +
54 +**Ensure that you tick the DeepStream box, under Additional SDKs**
55 +
56 +#### 2. (on host) Convert PyTorch model to ONNX.
57 +
58 +```bash
59 +odtk export model.pth model.onnx
60 +```
61 +
62 +#### 3. Copy ONNX RetinaNet model and config files to Jetson Xavier
63 +
64 +Use `scp` or a memory card.
65 +
66 +#### 4. (on Jetson) Make the C++ API
67 +
68 +```bash
69 +cd extras/cppapi
70 +mkdir build && cd build
71 +cmake -DCMAKE_CUDA_FLAGS="--expt-extended-lambda -std=c++14" ..
72 +make
73 +```
74 +
75 +#### 5. (on Jetson) Make the RetinaNet plugin
76 +
77 +```bash
78 +cd extras/deepstream/deepstream-sample
79 +mkdir build && cd build
80 +cmake -DDeepStream_DIR=/opt/nvidia/deepstream/deepstream-4.0 .. && make -j
81 +```
82 +
83 +#### 6. (on Jetson) Build the TensorRT Engine
84 +
85 +```bash
86 +cd extras/cppapi/build
87 +./export model.onnx engine.plan
88 +```
89 +
90 +#### 7. (on Jetson) Modify the DeepStream config files
91 +As described in the "Preparing the DeepStream config file" section below.
92 +
93 +#### 8. (on Jetson) Run deepstream-app
94 +Once all of the config files have been modified, launch the DeepStream application:
95 +```
96 +cd extras/deepstream/deepstream-sample
97 +LD_PRELOAD=build/libnvdsparsebbox_odtk.so deepstream-app -c <config file>
98 +```
99 +
100 +## Preparing the DeepStream config file:
101 +We have included two example DeepStream config files in `deepstream-sample`.
102 +- `ds_config_1vids.txt`: Performs detection on a single video, using the detector specified by `infer_config_batch1.txt`.
103 +- `ds_config_8vids.txt`: Performs detection on multiple video streams simultaneously, using the detector specified by `infer_config_batch8.txt`. Frames from each video are combined into a single batch and passed to the detector for inference.
104 +
105 +The `ds_config_*` files are DeepStream config files that describe the overall processing pipeline. The `infer_config_*` files define the individual detectors, which can be chained in series.
106 +
107 +Before they can be used, these config files must be modified to specify the correct paths to the input and output video files, and to the TensorRT engines (a sketch of these edits is included at the end of this document).
108 +
109 +* **Input files** are specified in the DeepStream config files by the `uri=file://<path>` parameter.
110 +
111 +* **Output files** are specified in the DeepStream config files by the `output-file=<path>` parameter.
112 +
113 +* **TensorRT engines** are specified in both the DeepStream config files, and also the detector config files, by the `model-engine-file=<path>` parameters.
114 +
115 +On Xavier, you can optionally set `enable=1` under `[sink1]` in the `ds_config_*` files to display the processed video stream.
116 +
117 +
118 +## Convert output video file to mp4
119 +You can convert the output `.mkv` file to `.mp4` using `ffmpeg`.
120 +```
121 +ffmpeg -i /data/output/file1.mkv -c copy /data/output/file2.mp4
122 +```
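123 +
124 +## Example config edits
125 +For reference, here is a minimal sketch of the edits described in "Preparing the DeepStream config file" above. All paths below are placeholders, not real files; substitute your own:
126 +```
127 +# ds_config_1vids.txt (illustrative values only)
128 +[source0]
129 +uri=file:///data/videos/input.mp4
130 +
131 +[sink0]
132 +output-file=/data/output/file1.mkv
133 +
134 +[primary-gie]
135 +model-engine-file=/data/engines/engine.plan
136 +
137 +# infer_config_batch1.txt should point at the same engine:
138 +model-engine-file=/data/engines/engine.plan
139 +```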
1 +cmake_minimum_required(VERSION 3.5.1)
2 +
3 +project(deepstream-odtk)
4 +enable_language(CXX)
5 +include(FindCUDA)
6 +
7 +set(CMAKE_CXX_STANDARD 14)
8 +find_package(CUDA REQUIRED)
9 +find_package(OpenCV REQUIRED)
10 +
11 +if(DEFINED TensorRT_DIR)
12 + include_directories("${TensorRT_DIR}/include")
13 + link_directories("${TensorRT_DIR}/lib")
14 +endif(DEFINED TensorRT_DIR)
15 +if(DEFINED DeepStream_DIR)
16 + include_directories("${DeepStream_DIR}/sources/includes")
17 +endif(DEFINED DeepStream_DIR)
18 +include_directories(${CUDA_INCLUDE_DIRS})
19 +
20 +if(NOT DEFINED ARCH)
21 + set(ARCH "sm_70")
22 +endif(NOT DEFINED ARCH)
23 +
24 +cuda_add_library(nvdsparsebbox_odtk SHARED
25 + ../../../csrc/cuda/decode.h
26 + ../../../csrc/cuda/decode.cu
27 + ../../../csrc/cuda/nms.h
28 + ../../../csrc/cuda/nms.cu
29 + ../../../csrc/cuda/utils.h
30 + ../../../csrc/engine.cpp
31 + nvdsparsebbox_odtk.cpp
32 + OPTIONS -arch ${ARCH} -std=c++14 --expt-extended-lambda
33 +)
34 +include_directories(${OpenCV_INCLUDE_DIRS})
35 +target_link_libraries(nvdsparsebbox_odtk ${CUDA_LIBRARIES} nvinfer nvinfer_plugin nvonnxparser ${OpenCV_LIBS})
1 +# Copyright (c) 2018 NVIDIA Corporation. All rights reserved.
2 +#
3 +# NVIDIA Corporation and its licensors retain all intellectual property
4 +# and proprietary rights in and to this software, related documentation
5 +# and any modifications thereto. Any use, reproduction, disclosure or
6 +# distribution of this software and related documentation without an express
7 +# license agreement from NVIDIA Corporation is strictly prohibited.
8 +
9 +[application]
10 +enable-perf-measurement=1
11 +perf-measurement-interval-sec=1
12 +
13 +[tiled-display]
14 +enable=0
15 +rows=1
16 +columns=1
17 +width=1280
18 +height=720
19 +gpu-id=0
20 +
21 +[source0]
22 +enable=1
23 +type=2
24 +num-sources=1
25 +uri=file://<path>
26 +gpu-id=0
27 +
28 +[streammux]
29 +gpu-id=0
30 +batch-size=1
31 +batched-push-timeout=-1
32 +## Set muxer output width and height
33 +width=1280
34 +height=720
35 +cuda-memory-type=1
36 +enable-padding=1
37 +
38 +[sink0]
39 +enable=1
40 +type=3
41 +#1=mp4 2=mkv
42 +container=1
43 +#1=h264 2=h265 3=mpeg4
44 +## only SW mpeg4 is supported right now.
45 +codec=3
46 +sync=1
47 +bitrate=80000000
48 +output-file=<path>
49 +source-id=0
50 +
51 +[sink1]
52 +enable=0
53 +#Type - 1=FakeSink 2=EglSink 3=File
54 +type=2
55 +sync=1
56 +source-id=0
57 +gpu-id=0
58 +cuda-memory-type=1
59 +
60 +
61 +[osd]
62 +enable=1
63 +gpu-id=0
64 +border-width=2
65 +text-size=12
66 +text-color=1;1;1;1;
67 +text-bg-color=0.3;0.3;0.3;1
68 +font=Arial
69 +show-clock=0
70 +clock-x-offset=800
71 +clock-y-offset=820
72 +clock-text-size=12
73 +clock-color=1;0;0;0
74 +
75 +[primary-gie]
76 +enable=1
77 +gpu-id=0
78 +batch-size=1
79 +gie-unique-id=1
80 +interval=0
81 +labelfile-path=labels_coco.txt
82 +model-engine-file=<path>
83 +config-file=infer_config_batch1.txt
1 +# Copyright (c) 2018 NVIDIA Corporation. All rights reserved.
2 +#
3 +# NVIDIA Corporation and its licensors retain all intellectual property
4 +# and proprietary rights in and to this software, related documentation
5 +# and any modifications thereto. Any use, reproduction, disclosure or
6 +# distribution of this software and related documentation without an express
7 +# license agreement from NVIDIA Corporation is strictly prohibited.
8 +
9 +[application]
10 +enable-perf-measurement=1
11 +perf-measurement-interval-sec=5
12 +
13 +[tiled-display]
14 +enable=1
15 +rows=2
16 +columns=4
17 +width=1280
18 +height=720
19 +gpu-id=0
20 +cuda-memory-type=1
21 +
22 +[source0]
23 +enable=1
24 +type=3
25 +num-sources=4
26 +uri=file://<path>
27 +gpu-id=0
28 +cuda-memory-type=1
29 +
30 +[source1]
31 +enable=1
32 +type=3
33 +num-sources=4
34 +uri=file://<path>
35 +gpu-id=0
36 +cuda-memory-type=1
37 +
38 +[streammux]
39 +gpu-id=0
40 +batched-push-timeout=-1
41 +## Set muxer output width and height
42 +width=1280
43 +height=720
44 +cuda-memory-type=1
45 +enable-padding=1
46 +batch-size=8
47 +
48 +[sink0]
49 +enable=1
50 +type=3
51 +#1=mp4 2=mkv
52 +container=1
53 +#1=h264 2=h265 3=mpeg4
54 +## only SW mpeg4 is supported right now.
55 +codec=3
56 +sync=0
57 +bitrate=32000000
58 +output-file=<path>
59 +source-id=0
60 +cuda-memory-type=1
61 +
62 +[sink1]
63 +enable=0
64 +#Type - 1=FakeSink 2=EglSink 3=File
65 +type=2
66 +sync=1
67 +source-id=0
68 +gpu-id=0
69 +cuda-memory-type=1
70 +
71 +
72 +[osd]
73 +enable=1
74 +gpu-id=0
75 +border-width=2
76 +text-size=12
77 +text-color=1;1;1;1;
78 +text-bg-color=0.3;0.3;0.3;1
79 +font=Arial
80 +show-clock=0
81 +clock-x-offset=800
82 +clock-y-offset=820
83 +clock-text-size=12
84 +clock-color=1;0;0;0
85 +
86 +[primary-gie]
87 +enable=1
88 +gpu-id=0
89 +batch-size=8
90 +gie-unique-id=1
91 +interval=0
92 +labelfile-path=labels_coco.txt
93 +model-engine-file=<path>
94 +config-file=infer_config_batch8.txt
95 +cuda-memory-type=1
1 +# Copyright (c) 2018 NVIDIA Corporation. All rights reserved.
2 +# NVIDIA Corporation and its licensors retain all intellectual property
3 +# and proprietary rights in and to this software, related documentation
4 +# and any modifications thereto. Any use, reproduction, disclosure or
5 +# distribution of this software and related documentation without an express
6 +# license agreement from NVIDIA Corporation is strictly prohibited.
7 +
8 +# Following properties are mandatory when engine files are not specified:
9 +# int8-calib-file(Only in INT8)
10 +# Caffemodel mandatory properties: model-file, proto-file, output-blob-names
11 +# UFF: uff-file, input-dims, uff-input-blob-name, output-blob-names
12 +# ONNX: onnx-file
13 +#
14 +# Mandatory properties for detectors:
15 +# parse-func, num-detected-classes,
16 +# custom-lib-path (when parse-func=0 i.e. custom),
17 +# parse-bbox-func-name (when parse-func=0)
18 +#
19 +# Optional properties for detectors:
20 +# enable-dbscan(Default=false), interval(Primary mode only, Default=0)
21 +#
22 +# Mandatory properties for classifiers:
23 +# classifier-threshold, is-classifier
24 +#
25 +# Optional properties for classifiers:
26 +# classifier-async-mode(Secondary mode only, Default=false)
27 +#
28 +# Optional properties in secondary mode:
29 +# operate-on-gie-id(Default=0), operate-on-class-ids(Defaults to all classes),
30 +# input-object-min-width, input-object-min-height, input-object-max-width,
31 +# input-object-max-height
32 +#
33 +# Following properties are always recommended:
34 +# batch-size(Default=1)
35 +#
36 +# Other optional properties:
37 +# net-scale-factor(Default=1), network-mode(Default=0 i.e FP32),
38 +# model-color-format(Default=0 i.e. RGB) model-engine-file, labelfile-path,
39 +# mean-file, gie-unique-id(Default=0), offsets, gie-mode (Default=1 i.e. primary),
40 +# custom-lib-path, network-mode(Default=0 i.e FP32)
41 +#
42 +# The values in the config file are overridden by values set through GObject
43 +# properties.
44 +
45 +[property]
46 +gpu-id=0
47 +net-scale-factor=0.017352074
48 +offsets=123.675;116.28;103.53
49 +model-engine-file=<path>
50 +labelfile-path=labels_coco.txt
51 +batch-size=1
52 +## 0=FP32, 1=INT8, 2=FP16 mode
53 +network-mode=2
54 +num-detected-classes=80
55 +interval=0
56 +gie-unique-id=1
57 +parse-func=0
58 +is-classifier=0
59 +output-blob-names=boxes;scores;classes
60 +parse-bbox-func-name=NvDsInferParseRetinaNet
61 +custom-lib-path=build/libnvdsparsebbox_odtk.so
62 +#enable-dbscan=1
63 +
64 +
65 +[class-attrs-all]
66 +threshold=0.5
67 +group-threshold=0
68 +## Set eps=0.7 and minBoxes for enable-dbscan=1
69 +#eps=0.2
70 +##minBoxes=3
71 +#roi-top-offset=0
72 +#roi-bottom-offset=0
73 +detected-min-w=4
74 +detected-min-h=4
75 +#detected-max-w=0
76 +#detected-max-h=0
77 +
78 +## Per class configuration
79 +#[class-attrs-2]
80 +#threshold=0.6
81 +#eps=0.5
82 +#group-threshold=3
83 +#roi-top-offset=20
84 +#roi-bottom-offset=10
85 +#detected-min-w=40
86 +#detected-min-h=40
87 +#detected-max-w=400
88 +#detected-max-h=800
1 +# Copyright (c) 2018 NVIDIA Corporation. All rights reserved.
2 +# NVIDIA Corporation and its licensors retain all intellectual property
3 +# and proprietary rights in and to this software, related documentation
4 +# and any modifications thereto. Any use, reproduction, disclosure or
5 +# distribution of this software and related documentation without an express
6 +# license agreement from NVIDIA Corporation is strictly prohibited.
7 +
8 +# Following properties are mandatory when engine files are not specified:
9 +# int8-calib-file(Only in INT8)
10 +# Caffemodel mandatory properties: model-file, proto-file, output-blob-names
11 +# UFF: uff-file, input-dims, uff-input-blob-name, output-blob-names
12 +# ONNX: onnx-file
13 +#
14 +# Mandatory properties for detectors:
15 +# parse-func, num-detected-classes,
16 +# custom-lib-path (when parse-func=0 i.e. custom),
17 +# parse-bbox-func-name (when parse-func=0)
18 +#
19 +# Optional properties for detectors:
20 +# enable-dbscan(Default=false), interval(Primary mode only, Default=0)
21 +#
22 +# Mandatory properties for classifiers:
23 +# classifier-threshold, is-classifier
24 +#
25 +# Optional properties for classifiers:
26 +# classifier-async-mode(Secondary mode only, Default=false)
27 +#
28 +# Optional properties in secondary mode:
29 +# operate-on-gie-id(Default=0), operate-on-class-ids(Defaults to all classes),
30 +# input-object-min-width, input-object-min-height, input-object-max-width,
31 +# input-object-max-height
32 +#
33 +# Following properties are always recommended:
34 +# batch-size(Default=1)
35 +#
36 +# Other optional properties:
37 +# net-scale-factor(Default=1), network-mode(Default=0 i.e FP32),
38 +# model-color-format(Default=0 i.e. RGB) model-engine-file, labelfile-path,
39 +# mean-file, gie-unique-id(Default=0), offsets, gie-mode (Default=1 i.e. primary),
40 +# custom-lib-path, network-mode(Default=0 i.e FP32)
41 +#
42 +# The values in the config file are overridden by values set through GObject
43 +# properties.
44 +
45 +[property]
46 +gpu-id=0
47 +net-scale-factor=0.017352074
48 +offsets=123.675;116.28;103.53
49 +model-engine-file=<path>
50 +labelfile-path=labels_coco.txt
51 +#int8-calib-file=cal_trt4.bin
52 +batch-size=8
53 +## 0=FP32, 1=INT8, 2=FP16 mode
54 +network-mode=2
55 +num-detected-classes=80
56 +interval=0
57 +gie-unique-id=1
58 +parse-func=0
59 +is-classifier=0
60 +output-blob-names=boxes;scores;classes
61 +parse-bbox-func-name=NvDsInferParseRetinaNet
62 +custom-lib-path=build/libnvdsparsebbox_odtk.so
63 +#enable-dbscan=1
64 +
65 +
66 +[class-attrs-all]
67 +threshold=0.5
68 +group-threshold=0
69 +## Set eps=0.7 and minBoxes for enable-dbscan=1
70 +#eps=0.2
71 +##minBoxes=3
72 +#roi-top-offset=0
73 +#roi-bottom-offset=0
74 +detected-min-w=4
75 +detected-min-h=4
76 +#detected-max-w=0
77 +#detected-max-h=0
78 +
79 +## Per class configuration
80 +#[class-attrs-2]
81 +#threshold=0.6
82 +#eps=0.5
83 +#group-threshold=3
84 +#roi-top-offset=20
85 +#roi-bottom-offset=10
86 +#detected-min-w=40
87 +#detected-min-h=40
88 +#detected-max-w=400
89 +#detected-max-h=800
1 +person
2 +bicycle
3 +car
4 +motorcycle
5 +airplane
6 +bus
7 +train
8 +truck
9 +boat
10 +traffic light
11 +fire hydrant
12 +stop sign
13 +parking meter
14 +bench
15 +bird
16 +cat
17 +dog
18 +horse
19 +sheep
20 +cow
21 +elephant
22 +bear
23 +zebra
24 +giraffe
25 +backpack
26 +umbrella
27 +handbag
28 +tie
29 +suitcase
30 +frisbee
31 +skis
32 +snowboard
33 +sports ball
34 +kite
35 +baseball bat
36 +baseball glove
37 +skateboard
38 +surfboard
39 +tennis racket
40 +bottle
41 +wine glass
42 +cup
43 +fork
44 +knife
45 +spoon
46 +bowl
47 +banana
48 +apple
49 +sandwich
50 +orange
51 +broccoli
52 +carrot
53 +hot dog
54 +pizza
55 +donut
56 +cake
57 +chair
58 +couch
59 +potted plant
60 +bed
61 +dining table
62 +toilet
63 +tv
64 +laptop
65 +mouse
66 +remote
67 +keyboard
68 +cell phone
69 +microwave
70 +oven
71 +toaster
72 +sink
73 +refrigerator
74 +book
75 +clock
76 +vase
77 +scissors
78 +teddy bear
79 +hair drier
80 +toothbrush
\ No newline at end of file
1 +/**
2 + * Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
3 + *
4 + * NVIDIA Corporation and its licensors retain all intellectual property
5 + * and proprietary rights in and to this software, related documentation
6 + * and any modifications thereto. Any use, reproduction, disclosure or
7 + * distribution of this software and related documentation without an express
8 + * license agreement from NVIDIA Corporation is strictly prohibited.
9 + *
10 + */
11 +
12 +#include <cstring>
13 +#include <iostream>
14 +#include "nvdsinfer_custom_impl.h"
15 +
16 +#define MIN(a,b) ((a) < (b) ? (a) : (b))
17 +
18 +/* Bounding box parsing function for the RetinaNet (ODTK) detector model
19 + * (expects the "boxes", "scores" and "classes" output layers). */
20 +
21 +/* C-linkage to prevent name-mangling */
22 +extern "C"
23 +bool NvDsInferParseRetinaNet (std::vector<NvDsInferLayerInfo> const &outputLayersInfo,
24 + NvDsInferNetworkInfo const &networkInfo,
25 + NvDsInferParseDetectionParams const &detectionParams,
26 + std::vector<NvDsInferParseObjectInfo> &objectList)
27 +{
28 + static int bboxLayerIndex = -1;
29 + static int classesLayerIndex = -1;
30 + static int scoresLayerIndex = -1;
31 + static NvDsInferDimsCHW scoresLayerDims;
32 + int numDetsToParse;
33 +
34 + /* Find the bbox layer */
35 + if (bboxLayerIndex == -1) {
36 + for (unsigned int i = 0; i < outputLayersInfo.size(); i++) {
37 + if (strcmp(outputLayersInfo[i].layerName, "boxes") == 0) {
38 + bboxLayerIndex = i;
39 + break;
40 + }
41 + }
42 + if (bboxLayerIndex == -1) {
43 + std::cerr << "Could not find bbox layer buffer while parsing" << std::endl;
44 + return false;
45 + }
46 + }
47 +
48 + /* Find the scores layer */
49 + if (scoresLayerIndex == -1) {
50 + for (unsigned int i = 0; i < outputLayersInfo.size(); i++) {
51 + if (strcmp(outputLayersInfo[i].layerName, "scores") == 0) {
52 + scoresLayerIndex = i;
53 + getDimsCHWFromDims(scoresLayerDims, outputLayersInfo[i].dims);
54 + break;
55 + }
56 + }
57 + if (scoresLayerIndex == -1) {
58 + std::cerr << "Could not find scores layer buffer while parsing" << std::endl;
59 + return false;
60 + }
61 + }
62 +
63 + /* Find the classes layer */
64 + if (classesLayerIndex == -1) {
65 + for (unsigned int i = 0; i < outputLayersInfo.size(); i++) {
66 + if (strcmp(outputLayersInfo[i].layerName, "classes") == 0) {
67 + classesLayerIndex = i;
68 + break;
69 + }
70 + }
71 + if (classesLayerIndex == -1) {
72 + std::cerr << "Could not find classes layer buffer while parsing" << std::endl;
73 + return false;
74 + }
75 + }
76 +
77 +
78 + /* Calculate the number of detections to parse */
79 + numDetsToParse = scoresLayerDims.c;
80 +
81 + float *bboxes = (float *) outputLayersInfo[bboxLayerIndex].buffer;
82 + float *classes = (float *) outputLayersInfo[classesLayerIndex].buffer;
83 + float *scores = (float *) outputLayersInfo[scoresLayerIndex].buffer;
84 +
85 + for (int indx = 0; indx < numDetsToParse; indx++)
86 + {
87 + float outputX1 = bboxes[indx * 4];
88 + float outputY1 = bboxes[indx * 4 + 1];
89 + float outputX2 = bboxes[indx * 4 + 2];
90 + float outputY2 = bboxes[indx * 4 + 3];
91 + float this_class = classes[indx];
92 + float this_score = scores[indx];
93 + float threshold = detectionParams.perClassThreshold[this_class];
94 +
95 + if (this_score >= threshold)
96 + {
97 + NvDsInferParseObjectInfo object;
98 +
99 + object.classId = this_class;
100 + object.detectionConfidence = this_score;
101 +
102 + object.left = outputX1;
103 + object.top = outputY1;
104 + object.width = outputX2 - outputX1;
105 + object.height = outputY2 - outputY1;
106 +
107 + objectList.push_back(object);
108 + }
109 + }
110 + return true;
111 +}
112 +
113 +/* Check that the custom function has been defined correctly */
114 +CHECK_CUSTOM_PARSE_FUNC_PROTOTYPE(NvDsInferParseRetinaNet);
1 +#!/usr/bin/env bash
2 +
3 +if [ $# -ne 2 ]; then
4 + echo "Usage: $0 images_path annotations.json"
5 + exit 1
6 +fi
7 +
8 +tmp="/tmp/odtk"
9 +
10 +tests=(
11 + "odtk train ${tmp}/model.pth --images $1 --annotations $2 --max-size 640 --override --iters 100 --backbone ResNet18FPN ResNet50FPN"
12 + "odtk train ${tmp}/model.pth --images $1 --annotations $2 --max-size 640 --override --iters 100"
13 + "odtk train ${tmp}/model.pth --fine-tune ${tmp}/model.pth --images $1 --annotations $2 --max-size 640 --override --iters 100"
14 + "odtk infer ${tmp}/model.pth --images ${tmp}/test_images --max-size 640"
15 + "odtk export ${tmp}/model.pth ${tmp}/engine.plan --size 640"
16 + "odtk infer ${tmp}/engine.plan --images ${tmp}/test_images --max-size 640"
17 +)
18 +
19 +start=`date +%s`
20 +
21 +# Prepare small image folder for inference
22 +if [ ! -d ${tmp}/test_images ]; then
23 + mkdir -p ${tmp}/test_images
24 + cp $(find $1 | tail -n 10) ${tmp}/test_images
25 +fi
26 +
27 +# Run all tests
28 +for test in "${tests[@]}"; do
29 + echo "Running \"${test}\""
30 + ${test}
31 + if [ $? -ne 0 ]; then
32 + echo "Test failed!"
33 + exit 1
34 + fi
35 +done
36 +
37 +end=`date +%s`
38 +
39 +echo "All tests succeeded in $((end-start)) seconds!"
\ No newline at end of file
1 +import sys
2 +
3 +from .resnet import *
4 +from .mobilenet import *
5 +from .fpn import *
1 +import torch.nn as nn
2 +import torch.nn.functional as F
3 +from torchvision.models import resnet as vrn
4 +from torchvision.models import mobilenet as vmn
5 +
6 +from .resnet import ResNet
7 +from .mobilenet import MobileNet
8 +from .utils import register
9 +
10 +
11 +class FPN(nn.Module):
12 + 'Feature Pyramid Network - https://arxiv.org/abs/1612.03144'
13 +
14 + def __init__(self, features):
15 + super().__init__()
16 +
17 + self.stride = 128
18 + self.features = features
19 +
20 + if isinstance(features, ResNet):
21 + is_light = features.bottleneck == vrn.BasicBlock
22 + channels = [128, 256, 512] if is_light else [512, 1024, 2048]
23 + elif isinstance(features, MobileNet):
24 + channels = [32, 96, 320]
25 +
26 + self.lateral3 = nn.Conv2d(channels[0], 256, 1)
27 + self.lateral4 = nn.Conv2d(channels[1], 256, 1)
28 + self.lateral5 = nn.Conv2d(channels[2], 256, 1)
29 + self.pyramid6 = nn.Conv2d(channels[2], 256, 3, stride=2, padding=1)
30 + self.pyramid7 = nn.Conv2d(256, 256, 3, stride=2, padding=1)
31 + self.smooth3 = nn.Conv2d(256, 256, 3, padding=1)
32 + self.smooth4 = nn.Conv2d(256, 256, 3, padding=1)
33 + self.smooth5 = nn.Conv2d(256, 256, 3, padding=1)
34 +
35 + def initialize(self):
36 + def init_layer(layer):
37 + if isinstance(layer, nn.Conv2d):
38 + nn.init.xavier_uniform_(layer.weight)
39 + if layer.bias is not None:
40 + nn.init.constant_(layer.bias, val=0)
41 + self.apply(init_layer)
42 +
43 + self.features.initialize()
44 +
45 + def forward(self, x):
46 + c3, c4, c5 = self.features(x)
47 +
48 + p5 = self.lateral5(c5)
49 + p4 = self.lateral4(c4)
50 + p4 = F.interpolate(p5, scale_factor=2) + p4
51 + p3 = self.lateral3(c3)
52 + p3 = F.interpolate(p4, scale_factor=2) + p3
53 +
54 + p6 = self.pyramid6(c5)
55 + p7 = self.pyramid7(F.relu(p6))
56 +
57 + p3 = self.smooth3(p3)
58 + p4 = self.smooth4(p4)
59 + p5 = self.smooth5(p5)
60 +
61 + return [p3, p4, p5, p6, p7]
62 +
63 +@register
64 +def ResNet18FPN():
65 + return FPN(ResNet(layers=[2, 2, 2, 2], bottleneck=vrn.BasicBlock, outputs=[3, 4, 5], url=vrn.model_urls['resnet18']))
66 +
67 +@register
68 +def ResNet34FPN():
69 + return FPN(ResNet(layers=[3, 4, 6, 3], bottleneck=vrn.BasicBlock, outputs=[3, 4, 5], url=vrn.model_urls['resnet34']))
70 +
71 +@register
72 +def ResNet50FPN():
73 + return FPN(ResNet(layers=[3, 4, 6, 3], bottleneck=vrn.Bottleneck, outputs=[3, 4, 5], url=vrn.model_urls['resnet50']))
74 +
75 +@register
76 +def ResNet101FPN():
77 + return FPN(ResNet(layers=[3, 4, 23, 3], bottleneck=vrn.Bottleneck, outputs=[3, 4, 5], url=vrn.model_urls['resnet101']))
78 +
79 +@register
80 +def ResNet152FPN():
81 + return FPN(ResNet(layers=[3, 8, 36, 3], bottleneck=vrn.Bottleneck, outputs=[3, 4, 5], url=vrn.model_urls['resnet152']))
82 +
83 +@register
84 +def ResNeXt50_32x4dFPN():
85 + return FPN(ResNet(layers=[3, 4, 6, 3], bottleneck=vrn.Bottleneck, outputs=[3, 4, 5], groups=32, width_per_group=4, url=vrn.model_urls['resnext50_32x4d']))
86 +
87 +@register
88 +def ResNeXt101_32x8dFPN():
89 + return FPN(ResNet(layers=[3, 4, 23, 3], bottleneck=vrn.Bottleneck, outputs=[3, 4, 5], groups=32, width_per_group=8, url=vrn.model_urls['resnext101_32x8d']))
90 +
91 +@register
92 +def MobileNetV2FPN():
93 + return FPN(MobileNet(outputs=[6, 13, 17], url=vmn.model_urls['mobilenet_v2']))
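A quick sanity check of the FPN wiring above, shown as a minimal sketch: it builds an FPN over a ResNet-18 trunk directly (bypassing the registered factories so no pretrained-weight URL is needed) and prints the pyramid shapes for a 640×640 input. The `odtk.backbones` import path is an assumption; the expected shapes follow from the pyramid strides 8, 16, 32, 64 and 128.

```python
# Minimal sketch -- assumes torch/torchvision are installed and the package is importable as `odtk`.
import torch
from torchvision.models import resnet as vrn
from odtk.backbones.fpn import FPN        # import path is an assumption
from odtk.backbones.resnet import ResNet

# ResNet-18 trunk tapping C3/C4/C5; url=None (the default) avoids any weight download.
fpn = FPN(ResNet(layers=[2, 2, 2, 2], bottleneck=vrn.BasicBlock, outputs=[3, 4, 5]))
fpn.eval()

with torch.no_grad():
    pyramid = fpn(torch.randn(1, 3, 640, 640))

print([tuple(p.shape) for p in pyramid])
# [(1, 256, 80, 80), (1, 256, 40, 40), (1, 256, 20, 20), (1, 256, 10, 10), (1, 256, 5, 5)]
```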
1 +import torch
2 +from torch import nn
3 +import torch.nn.functional as F
4 +
5 +class FixedBatchNorm2d(nn.Module):
6 + 'BatchNorm2d where the batch statistics and the affine parameters are fixed'
7 +
8 + def __init__(self, n):
9 + super().__init__()
10 + self.register_buffer("weight", torch.ones(n))
11 + self.register_buffer("bias", torch.zeros(n))
12 + self.register_buffer("running_mean", torch.zeros(n))
13 + self.register_buffer("running_var", torch.ones(n))
14 +
15 + def forward(self, x):
16 + return F.batch_norm(x, running_mean=self.running_mean, running_var=self.running_var, weight=self.weight, bias=self.bias)
17 +
18 +def convert_fixedbn_model(module):
19 + 'Convert batch norm layers to fixed'
20 +
21 + mod = module
22 + if isinstance(module, nn.BatchNorm2d):
23 + mod = FixedBatchNorm2d(module.num_features)
24 + mod.running_mean = module.running_mean
25 + mod.running_var = module.running_var
26 + if module.affine:
27 + mod.weight.data = module.weight.data.clone().detach()
28 + mod.bias.data = module.bias.data.clone().detach()
29 + for name, child in module.named_children():
30 + mod.add_module(name, convert_fixedbn_model(child))
31 +
32 + return mod
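For context, a minimal sketch of how `convert_fixedbn_model` is typically applied: it recursively swaps every `nn.BatchNorm2d` in a torchvision ResNet for the frozen `FixedBatchNorm2d` defined above, copying over the running statistics and affine parameters. The `odtk.backbones.layers` import path is an assumption.

```python
# Minimal sketch -- assumes the package is importable as `odtk`.
import torchvision
from odtk.backbones.layers import convert_fixedbn_model   # import path is an assumption

model = torchvision.models.resnet50()          # no pretrained weights needed for the demo
model = convert_fixedbn_model(model)

print(type(model.bn1).__name__)                # FixedBatchNorm2d
print(type(model.layer1[0].bn1).__name__)      # FixedBatchNorm2d (nested modules are converted too)
```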
1 +import torch.nn as nn
2 +from torchvision.models import mobilenet as vmn
3 +import torch.utils.model_zoo as model_zoo
4 +
5 +class MobileNet(vmn.MobileNetV2):
6 + 'MobileNetV2: Inverted Residuals and Linear Bottlenecks - https://arxiv.org/abs/1801.04381'
7 +
8 + def __init__(self, outputs=[18], url=None):
9 + self.stride = 128
10 + self.url = url
11 + super().__init__()
12 + self.outputs = outputs
13 + self.unused_modules = ['features.18', 'classifier']
14 +
15 + def initialize(self):
16 + if self.url:
17 + self.load_state_dict(model_zoo.load_url(self.url))
18 +
19 + def forward(self, x):
20 + outputs = []
21 + for indx, feat in enumerate(self.features[:-1]):
22 + x = feat(x)
23 + if indx in self.outputs:
24 + outputs.append(x)
25 + return outputs
1 +import torchvision
2 +from torchvision.models import resnet as vrn
3 +import torch.utils.model_zoo as model_zoo
4 +
5 +from .utils import register
6 +
7 +class ResNet(vrn.ResNet):
8 + 'Deep Residual Network - https://arxiv.org/abs/1512.03385'
9 +
10 + def __init__(self, layers=[3, 4, 6, 3], bottleneck=vrn.Bottleneck, outputs=[5], groups=1, width_per_group=64, url=None):
11 + self.stride = 128
12 + self.bottleneck = bottleneck
13 + self.outputs = outputs
14 + self.url = url
15 +
16 + kwargs = {'block': bottleneck, 'layers': layers, 'groups': groups, 'width_per_group': width_per_group}
17 + super().__init__(**kwargs)
18 + self.unused_modules = ['fc']
19 +
20 + def initialize(self):
21 + if self.url:
22 + self.load_state_dict(model_zoo.load_url(self.url))
23 +
24 + def forward(self, x):
25 + x = self.conv1(x)
26 + x = self.bn1(x)
27 + x = self.relu(x)
28 + x = self.maxpool(x)
29 +
30 + outputs = []
31 + for i, layer in enumerate([self.layer1, self.layer2, self.layer3, self.layer4]):
32 + level = i + 2
33 + if level > max(self.outputs):
34 + break
35 + x = layer(x)
36 + if level in self.outputs:
37 + outputs.append(x)
38 +
39 + return outputs
40 +
41 +@register
42 +def ResNet18C4():
43 + return ResNet(layers=[2, 2, 2, 2], bottleneck=vrn.BasicBlock, outputs=[4], url=vrn.model_urls['resnet18'])
44 +
45 +@register
46 +def ResNet34C4():
47 + return ResNet(layers=[3, 4, 6, 3], bottleneck=vrn.BasicBlock, outputs=[4], url=vrn.model_urls['resnet34'])
1 +import sys
2 +import torchvision
3 +
4 +def register(f):
5 + all = sys.modules[f.__module__].__dict__.setdefault('__all__', [])
6 + if f.__name__ in all:
7 +        raise RuntimeError('{} already exists!'.format(f.__name__))
8 + all.append(f.__name__)
9 + return f
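The `register` helper above only appends the decorated factory's name to the defining module's `__all__`; that is what lets the star imports in `backbones/__init__.py` re-export every backbone so it can be selected by name. A minimal sketch with a hypothetical factory (`MyTinyFPN` is illustrative, and the `odtk.backbones.utils` import path is an assumption):

```python
# Minimal sketch -- MyTinyFPN is a hypothetical factory, not part of the repo.
import sys
from odtk.backbones.utils import register   # import path is an assumption

@register
def MyTinyFPN():
    # a real factory would return an FPN instance wrapping some trunk
    return None

# The decorator appended the name to this module's __all__.
print('MyTinyFPN' in sys.modules[MyTinyFPN.__module__].__all__)   # True
```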
1 +import torch
2 +from ._C import decode as decode_cuda
3 +from ._C import iou as iou_cuda
4 +from ._C import nms as nms_cuda
5 +import numpy as np
6 +from .utils import order_points, rotate_boxes
7 +
8 +def generate_anchors(stride, ratio_vals, scales_vals, angles_vals=None):
9 + 'Generate anchors coordinates from scales/ratios'
10 +
11 + scales = torch.FloatTensor(scales_vals).repeat(len(ratio_vals), 1)
12 + scales = scales.transpose(0, 1).contiguous().view(-1, 1)
13 + ratios = torch.FloatTensor(ratio_vals * len(scales_vals))
14 +
15 + wh = torch.FloatTensor([stride]).repeat(len(ratios), 2)
16 + ws = torch.sqrt(wh[:, 0] * wh[:, 1] / ratios)
17 + dwh = torch.stack([ws, ws * ratios], dim=1)
18 + xy1 = 0.5 * (wh - dwh * scales)
19 + xy2 = 0.5 * (wh + dwh * scales)
20 + return torch.cat([xy1, xy2], dim=1)
21 +
22 +
23 +def generate_anchors_rotated(stride, ratio_vals, scales_vals, angles_vals):
24 + 'Generate anchors coordinates from scales/ratios/angles'
25 + scales = torch.FloatTensor(scales_vals).repeat(len(ratio_vals), 1)
26 + scales = scales.transpose(0, 1).contiguous().view(-1, 1)
27 + ratios = torch.FloatTensor(ratio_vals * len(scales_vals))
28 +
29 + wh = torch.FloatTensor([stride]).repeat(len(ratios), 2)
30 + ws = torch.round(torch.sqrt(wh[:, 0] * wh[:, 1] / ratios))
31 + dwh = torch.stack([ws, torch.round(ws * ratios)], dim=1)
32 +
33 + xy0 = 0.5 * (wh - dwh * scales)
34 + xy2 = 0.5 * (wh + dwh * scales) - 1
35 + xy1 = xy0 + (xy2 - xy0) * torch.FloatTensor([0,1])
36 + xy3 = xy0 + (xy2 - xy0) * torch.FloatTensor([1,0])
37 +
38 + angles = torch.FloatTensor(angles_vals)
39 + theta = angles.repeat(xy0.size(0),1)
40 + theta = theta.transpose(0,1).contiguous().view(-1,1)
41 +
42 + xmin_ymin = xy0.repeat(int(theta.size(0)/xy0.size(0)),1)
43 + xmax_ymax = xy2.repeat(int(theta.size(0)/xy2.size(0)),1)
44 + widths_heights = dwh * scales
45 + widths_heights = widths_heights.repeat(int(theta.size(0)/widths_heights.size(0)),1)
46 +
47 + u = torch.stack([torch.cos(angles), torch.sin(angles)], dim=1)
48 + l = torch.stack([-torch.sin(angles), torch.cos(angles)], dim=1)
49 + R = torch.stack([u, l], dim=1)
50 +
51 + xy0R = torch.matmul(R,xy0.transpose(1,0) - stride/2 + 0.5) + stride/2 - 0.5
52 + xy1R = torch.matmul(R,xy1.transpose(1,0) - stride/2 + 0.5) + stride/2 - 0.5
53 + xy2R = torch.matmul(R,xy2.transpose(1,0) - stride/2 + 0.5) + stride/2 - 0.5
54 + xy3R = torch.matmul(R,xy3.transpose(1,0) - stride/2 + 0.5) + stride/2 - 0.5
55 +
56 + xy0R = xy0R.permute(0,2,1).contiguous().view(-1,2)
57 + xy1R = xy1R.permute(0,2,1).contiguous().view(-1,2)
58 + xy2R = xy2R.permute(0,2,1).contiguous().view(-1,2)
59 + xy3R = xy3R.permute(0,2,1).contiguous().view(-1,2)
60 +
61 + anchors_axis = torch.cat([xmin_ymin, xmax_ymax], dim=1)
62 + anchors_rotated = order_points(torch.stack([xy0R,xy1R,xy2R,xy3R],dim = 1)).view(-1,8)
63 +
64 + return anchors_axis, anchors_rotated
65 +
66 +
67 +def box2delta(boxes, anchors):
68 + 'Convert boxes to deltas from anchors'
69 +
70 + anchors_wh = anchors[:, 2:] - anchors[:, :2] + 1
71 + anchors_ctr = anchors[:, :2] + 0.5 * anchors_wh
72 + boxes_wh = boxes[:, 2:] - boxes[:, :2] + 1
73 + boxes_ctr = boxes[:, :2] + 0.5 * boxes_wh
74 +
75 + return torch.cat([
76 + (boxes_ctr - anchors_ctr) / anchors_wh,
77 + torch.log(boxes_wh / anchors_wh)
78 + ], 1)
79 +
80 +
81 +def box2delta_rotated(boxes, anchors):
82 + 'Convert boxes to deltas from anchors'
83 +
84 + anchors_wh = anchors[:, 2:4] - anchors[:, :2] + 1
85 + anchors_ctr = anchors[:, :2] + 0.5 * anchors_wh
86 + boxes_wh = boxes[:, 2:4] - boxes[:, :2] + 1
87 + boxes_ctr = boxes[:, :2] + 0.5 * boxes_wh
88 + boxes_sin = boxes[:, 4]
89 + boxes_cos = boxes[:, 5]
90 +
91 + return torch.cat([
92 + (boxes_ctr - anchors_ctr) / anchors_wh,
93 + torch.log(boxes_wh / anchors_wh), boxes_sin[:, None], boxes_cos[:, None]
94 + ], 1)
95 +
96 +
97 +def delta2box(deltas, anchors, size, stride):
98 + 'Convert deltas from anchors to boxes'
99 +
100 + anchors_wh = anchors[:, 2:] - anchors[:, :2] + 1
101 + ctr = anchors[:, :2] + 0.5 * anchors_wh
102 + pred_ctr = deltas[:, :2] * anchors_wh + ctr
103 + pred_wh = torch.exp(deltas[:, 2:]) * anchors_wh
104 +
105 + m = torch.zeros([2], device=deltas.device, dtype=deltas.dtype)
106 + M = (torch.tensor([size], device=deltas.device, dtype=deltas.dtype) * stride - 1)
107 + clamp = lambda t: torch.max(m, torch.min(t, M))
108 + return torch.cat([
109 + clamp(pred_ctr - 0.5 * pred_wh),
110 + clamp(pred_ctr + 0.5 * pred_wh - 1)
111 + ], 1)
112 +
113 +
114 +def delta2box_rotated(deltas, anchors, size, stride):
115 + 'Convert deltas from anchors to boxes'
116 +
117 + anchors_wh = anchors[:, 2:4] - anchors[:, :2] + 1
118 + ctr = anchors[:, :2] + 0.5 * anchors_wh
119 + pred_ctr = deltas[:, :2] * anchors_wh + ctr
120 + pred_wh = torch.exp(deltas[:, 2:4]) * anchors_wh
121 + pred_sin = deltas[:, 4]
122 + pred_cos = deltas[:, 5]
123 +
124 + m = torch.zeros([2], device=deltas.device, dtype=deltas.dtype)
125 + M = (torch.tensor([size], device=deltas.device, dtype=deltas.dtype) * stride - 1)
126 + clamp = lambda t: torch.max(m, torch.min(t, M))
127 + return torch.cat([
128 + clamp(pred_ctr - 0.5 * pred_wh),
129 + clamp(pred_ctr + 0.5 * pred_wh - 1),
130 + torch.atan2(pred_sin, pred_cos)[:, None]
131 + ], 1)
132 +
133 +
134 +def snap_to_anchors(boxes, size, stride, anchors, num_classes, device, anchor_ious):
135 + 'Snap target boxes (x, y, w, h) to anchors'
136 +
137 + num_anchors = anchors.size()[0] if anchors is not None else 1
138 + width, height = (int(size[0] / stride), int(size[1] / stride))
139 +
140 + if boxes.nelement() == 0:
141 + return (torch.zeros([num_anchors, num_classes, height, width], device=device),
142 + torch.zeros([num_anchors, 4, height, width], device=device),
143 + torch.zeros([num_anchors, 1, height, width], device=device))
144 +
145 + boxes, classes = boxes.split(4, dim=1)
146 +
147 + # Generate anchors
148 + x, y = torch.meshgrid([torch.arange(0, size[i], stride, device=device, dtype=classes.dtype) for i in range(2)])
149 + xyxy = torch.stack((x, y, x, y), 2).unsqueeze(0)
150 + anchors = anchors.view(-1, 1, 1, 4).to(dtype=classes.dtype)
151 + anchors = (xyxy + anchors).contiguous().view(-1, 4)
152 +
153 + # Compute overlap between boxes and anchors
154 + boxes = torch.cat([boxes[:, :2], boxes[:, :2] + boxes[:, 2:] - 1], 1)
155 + xy1 = torch.max(anchors[:, None, :2], boxes[:, :2])
156 + xy2 = torch.min(anchors[:, None, 2:], boxes[:, 2:])
157 + inter = torch.prod((xy2 - xy1 + 1).clamp(0), 2)
158 + boxes_area = torch.prod(boxes[:, 2:] - boxes[:, :2] + 1, 1)
159 + anchors_area = torch.prod(anchors[:, 2:] - anchors[:, :2] + 1, 1)
160 + overlap = inter / (anchors_area[:, None] + boxes_area - inter)
161 +
162 + # Keep best box per anchor
163 + overlap, indices = overlap.max(1)
164 + box_target = box2delta(boxes[indices], anchors)
165 + box_target = box_target.view(num_anchors, 1, width, height, 4)
166 + box_target = box_target.transpose(1, 4).transpose(2, 3)
167 + box_target = box_target.squeeze().contiguous()
168 +
169 + depth = torch.ones_like(overlap) * -1
170 + depth[overlap < anchor_ious[0]] = 0 # background
171 + depth[overlap >= anchor_ious[1]] = classes[indices][overlap >= anchor_ious[1]].squeeze() + 1 # objects
172 + depth = depth.view(num_anchors, width, height).transpose(1, 2).contiguous()
173 +
174 + # Generate target classes
175 + cls_target = torch.zeros((anchors.size()[0], num_classes + 1), device=device, dtype=boxes.dtype)
176 + if classes.nelement() == 0:
177 + classes = torch.LongTensor([num_classes], device=device).expand_as(indices)
178 + else:
179 + classes = classes[indices].long()
180 + classes = classes.view(-1, 1)
181 + classes[overlap < anchor_ious[0]] = num_classes # background has no class
182 + cls_target.scatter_(1, classes, 1)
183 + cls_target = cls_target[:, :num_classes].view(-1, 1, width, height, num_classes)
184 + cls_target = cls_target.transpose(1, 4).transpose(2, 3)
185 + cls_target = cls_target.squeeze().contiguous()
186 +
187 + return (cls_target.view(num_anchors, num_classes, height, width),
188 + box_target.view(num_anchors, 4, height, width),
189 + depth.view(num_anchors, 1, height, width))
190 +
191 +
192 +def snap_to_anchors_rotated(boxes, size, stride, anchors, num_classes, device, anchor_ious):
193 + 'Snap target boxes (x, y, w, h, a) to anchors'
194 +
195 + anchors_axis, anchors_rotated = anchors
196 +
197 + num_anchors = anchors_rotated.size()[0] if anchors_rotated is not None else 1
198 + width, height = (int(size[0] / stride), int(size[1] / stride))
199 +
200 + if boxes.nelement() == 0:
201 + return (torch.zeros([num_anchors, num_classes, height, width], device=device),
202 + torch.zeros([num_anchors, 6, height, width], device=device),
203 + torch.zeros([num_anchors, 1, height, width], device=device))
204 +
205 + boxes, classes = boxes.split(5, dim=1)
206 + boxes_axis, boxes_rotated = rotate_boxes(boxes)
207 +
208 + boxes_axis = boxes_axis.to(device)
209 + boxes_rotated = boxes_rotated.to(device)
210 + anchors_axis = anchors_axis.to(device)
211 + anchors_rotated = anchors_rotated.to(device)
212 +
213 + # Generate anchors
214 + x, y = torch.meshgrid([torch.arange(0, size[i], stride, device=device, dtype=classes.dtype) for i in range(2)])
215 + xy_2corners = torch.stack((x, y, x, y), 2).unsqueeze(0)
216 + xy_4corners = torch.stack((x, y, x, y, x, y, x, y), 2).unsqueeze(0)
217 + anchors_axis = (xy_2corners.to(torch.float) + anchors_axis.view(-1, 1, 1, 4)).contiguous().view(-1, 4)
218 + anchors_rotated = (xy_4corners.to(torch.float) + anchors_rotated.view(-1, 1, 1, 8)).contiguous().view(-1, 8)
219 +
220 + if torch.cuda.is_available():
221 + iou = iou_cuda
222 +
223 + overlap = iou(boxes_rotated.contiguous().view(-1), anchors_rotated.contiguous().view(-1))[0]
224 +
225 + # Keep best box per anchor
226 + overlap, indices = overlap.max(1)
227 + box_target = box2delta_rotated(boxes_axis[indices], anchors_axis)
228 + box_target = box_target.view(num_anchors, 1, width, height, 6)
229 + box_target = box_target.transpose(1, 4).transpose(2, 3)
230 + box_target = box_target.squeeze().contiguous()
231 +
232 + depth = torch.ones_like(overlap, device=device) * -1
233 + depth[overlap < anchor_ious[0]] = 0 # background
234 + depth[overlap >= anchor_ious[1]] = classes[indices][overlap >= anchor_ious[1]].squeeze() + 1 # objects
235 + depth = depth.view(num_anchors, width, height).transpose(1, 2).contiguous()
236 +
237 + # Generate target classes
238 + cls_target = torch.zeros((anchors_axis.size()[0], num_classes + 1), device=device, dtype=boxes_axis.dtype)
239 + if classes.nelement() == 0:
240 + classes = torch.LongTensor([num_classes], device=device).expand_as(indices)
241 + else:
242 + classes = classes[indices].long()
243 + classes = classes.view(-1, 1)
244 + classes[overlap < anchor_ious[0]] = num_classes # background has no class
245 + cls_target.scatter_(1, classes, 1)
246 + cls_target = cls_target[:, :num_classes].view(-1, 1, width, height, num_classes)
247 + cls_target = cls_target.transpose(1, 4).transpose(2, 3)
248 + cls_target = cls_target.squeeze().contiguous()
249 +
250 + return (cls_target.view(num_anchors, num_classes, height, width),
251 + box_target.view(num_anchors, 6, height, width),
252 + depth.view(num_anchors, 1, height, width))
253 +
254 +
255 +def decode(all_cls_head, all_box_head, stride=1, threshold=0.05, top_n=1000, anchors=None, rotated=False):
256 + 'Box Decoding and Filtering'
257 +
258 + if rotated:
259 + anchors = anchors[0]
260 + num_boxes = 4 if not rotated else 6
261 +
262 + if torch.cuda.is_available():
263 + return decode_cuda(all_cls_head.float(), all_box_head.float(),
264 + anchors.view(-1).tolist(), stride, threshold, top_n, rotated)
265 +
266 + device = all_cls_head.device
267 + anchors = anchors.to(device).type(all_cls_head.type())
268 + num_anchors = anchors.size()[0] if anchors is not None else 1
269 + num_classes = all_cls_head.size()[1] // num_anchors
270 + height, width = all_cls_head.size()[-2:]
271 +
272 + batch_size = all_cls_head.size()[0]
273 + out_scores = torch.zeros((batch_size, top_n), device=device)
274 + out_boxes = torch.zeros((batch_size, top_n, num_boxes), device=device)
275 + out_classes = torch.zeros((batch_size, top_n), device=device)
276 +
277 + # Per item in batch
278 + for batch in range(batch_size):
279 + cls_head = all_cls_head[batch, :, :, :].contiguous().view(-1)
280 + box_head = all_box_head[batch, :, :, :].contiguous().view(-1, num_boxes)
281 +
282 + # Keep scores over threshold
283 + keep = (cls_head >= threshold).nonzero().view(-1)
284 + if keep.nelement() == 0:
285 + continue
286 +
287 + # Gather top elements
288 + scores = torch.index_select(cls_head, 0, keep)
289 + scores, indices = torch.topk(scores, min(top_n, keep.size()[0]), dim=0)
290 + indices = torch.index_select(keep, 0, indices).view(-1)
291 + classes = (indices / width / height) % num_classes
292 + classes = classes.type(all_cls_head.type())
293 +
294 + # Infer kept bboxes
295 + x = indices % width
296 + y = (indices / width) % height
297 + a = indices / num_classes / height / width
298 + box_head = box_head.view(num_anchors, num_boxes, height, width)
299 + boxes = box_head[a, :, y, x]
300 +
301 + if anchors is not None:
302 + grid = torch.stack([x, y, x, y], 1).type(all_cls_head.type()) * stride + anchors[a, :]
303 + boxes = delta2box(boxes, grid, [width, height], stride)
304 +
305 + out_scores[batch, :scores.size()[0]] = scores
306 + out_boxes[batch, :boxes.size()[0], :] = boxes
307 + out_classes[batch, :classes.size()[0]] = classes
308 +
309 + return out_scores, out_boxes, out_classes
310 +
311 +
312 +def nms(all_scores, all_boxes, all_classes, nms=0.5, ndetections=100):
313 + 'Non Maximum Suppression'
314 +
315 + if torch.cuda.is_available():
316 + return nms_cuda(all_scores.float(), all_boxes.float(), all_classes.float(),
317 + nms, ndetections, False)
318 +
319 + device = all_scores.device
320 + batch_size = all_scores.size()[0]
321 + out_scores = torch.zeros((batch_size, ndetections), device=device)
322 + out_boxes = torch.zeros((batch_size, ndetections, 4), device=device)
323 + out_classes = torch.zeros((batch_size, ndetections), device=device)
324 +
325 + # Per item in batch
326 + for batch in range(batch_size):
327 + # Discard null scores
328 + keep = (all_scores[batch, :].view(-1) > 0).nonzero()
329 + scores = all_scores[batch, keep].view(-1)
330 + boxes = all_boxes[batch, keep, :].view(-1, 4)
331 + classes = all_classes[batch, keep].view(-1)
332 +
333 + if scores.nelement() == 0:
334 + continue
335 +
336 + # Sort boxes
337 + scores, indices = torch.sort(scores, descending=True)
338 + boxes, classes = boxes[indices], classes[indices]
339 + areas = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1).view(-1)
340 + keep = torch.ones(scores.nelement(), device=device, dtype=torch.uint8).view(-1)
341 +
342 + for i in range(ndetections):
343 + if i >= keep.nonzero().nelement() or i >= scores.nelement():
344 + i -= 1
345 + break
346 +
347 + # Find overlapping boxes with lower score
348 + xy1 = torch.max(boxes[:, :2], boxes[i, :2])
349 + xy2 = torch.min(boxes[:, 2:], boxes[i, 2:])
350 + inter = torch.prod((xy2 - xy1 + 1).clamp(0), 1)
351 + criterion = ((scores > scores[i]) |
352 + (inter / (areas + areas[i] - inter) <= nms) |
353 + (classes != classes[i]))
354 + criterion[i] = 1
355 +
356 + # Only keep relevant boxes
357 + scores = scores[criterion.nonzero()].view(-1)
358 + boxes = boxes[criterion.nonzero(), :].view(-1, 4)
359 + classes = classes[criterion.nonzero()].view(-1)
360 + areas = areas[criterion.nonzero()].view(-1)
361 + keep[(~criterion).nonzero()] = 0
362 +
363 + out_scores[batch, :i + 1] = scores[:i + 1]
364 + out_boxes[batch, :i + 1, :] = boxes[:i + 1, :]
365 + out_classes[batch, :i + 1] = classes[:i + 1]
366 +
367 + return out_scores, out_boxes, out_classes
368 +
369 +
370 +def nms_rotated(all_scores, all_boxes, all_classes, nms=0.5, ndetections=100):
371 + 'Non Maximum Suppression'
372 +
373 + if torch.cuda.is_available():
374 + return nms_cuda(all_scores.float(), all_boxes.float(), all_classes.float(),
375 + nms, ndetections, True)
376 +
377 + device = all_scores.device
378 + batch_size = all_scores.size()[0]
379 + out_scores = torch.zeros((batch_size, ndetections), device=device)
380 + out_boxes = torch.zeros((batch_size, ndetections, 6), device=device)
381 + out_classes = torch.zeros((batch_size, ndetections), device=device)
382 +
383 + # Per item in batch
384 + for batch in range(batch_size):
385 + # Discard null scores
386 + keep = (all_scores[batch, :].view(-1) > 0).nonzero()
387 + scores = all_scores[batch, keep].view(-1)
388 + boxes = all_boxes[batch, keep, :].view(-1, 6)
389 + classes = all_classes[batch, keep].view(-1)
390 + theta = torch.atan2(boxes[:, -2], boxes[:, -1])
391 + boxes_theta = torch.cat([boxes[:, :-2], theta[:, None]], dim=1)
392 +
393 + if scores.nelement() == 0:
394 + continue
395 +
396 + # Sort boxes
397 + scores, indices = torch.sort(scores, descending=True)
398 + boxes, boxes_theta, classes = boxes[indices], boxes_theta[indices], classes[indices]
399 + areas = (boxes_theta[:, 2] - boxes_theta[:, 0] + 1) * (boxes_theta[:, 3] - boxes_theta[:, 1] + 1).view(-1)
400 + keep = torch.ones(scores.nelement(), device=device, dtype=torch.uint8).view(-1)
401 +
402 + for i in range(ndetections):
403 + if i >= keep.nonzero().nelement() or i >= scores.nelement():
404 + i -= 1
405 + break
406 +
407 + boxes_axis, boxes_rotated = rotate_boxes(boxes_theta, points=True)
408 + overlap, inter = iou(boxes_rotated.contiguous().view(-1), boxes_rotated[i, :].contiguous().view(-1))
409 + inter = inter.squeeze()
410 + criterion = ((scores > scores[i]) |
411 + (inter / (areas + areas[i] - inter) <= nms) |
412 + (classes != classes[i]))
413 + criterion[i] = 1
414 +
415 + # Only keep relevant boxes
416 + scores = scores[criterion.nonzero()].view(-1)
417 + boxes = boxes[criterion.nonzero(), :].view(-1, 6)
418 + boxes_theta = boxes_theta[criterion.nonzero(), :].view(-1, 5)
419 + classes = classes[criterion.nonzero()].view(-1)
420 + areas = areas[criterion.nonzero()].view(-1)
421 + keep[(~criterion).nonzero()] = 0
422 +
423 + out_scores[batch, :i + 1] = scores[:i + 1]
424 + out_boxes[batch, :i + 1, :] = boxes[:i + 1, :]
425 + out_classes[batch, :i + 1] = classes[:i + 1]
426 +
427 + return out_scores, out_boxes, out_classes
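A small round-trip check of the axis-aligned encode/decode pair defined in this file: `box2delta` followed by `delta2box` recovers the original box exactly, provided nothing gets clamped to the image bounds. The `odtk.box` import path is an assumption; the intermediate values can be worked out by hand from the formulas above.

```python
# Minimal sketch -- assumes the module is importable as odtk.box.
import torch
from odtk.box import box2delta, delta2box   # import path is an assumption

anchor = torch.tensor([[0.0, 0.0, 31.0, 31.0]])   # x1, y1, x2, y2 (a 32x32 anchor)
box    = torch.tensor([[4.0, 8.0, 27.0, 23.0]])   # target box with the same centre as the anchor

deltas = box2delta(box, anchor)
# centres coincide, so deltas = [0, 0, log(24/32), log(16/32)]

recovered = delta2box(deltas, anchor, size=[8, 8], stride=8)   # 64x64 image => no clamping
print(torch.allclose(recovered, box, atol=1e-4))               # True
```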
1 +from contextlib import redirect_stdout
2 +from math import ceil
3 +import ctypes
4 +import torch
5 +from nvidia.dali import pipeline, ops, types
6 +from pycocotools.coco import COCO
7 +
8 +class COCOPipeline(pipeline.Pipeline):
9 + 'Dali pipeline for COCO'
10 +
11 + def __init__(self, batch_size, num_threads, path, training, annotations, world, device_id, mean, std, resize,
12 + max_size, stride, rotate_augment=False,
13 + augment_brightness=0.0,
14 + augment_contrast=0.0, augment_hue=0.0,
15 + augment_saturation=0.0):
16 + super().__init__(batch_size=batch_size, num_threads=num_threads, device_id=device_id,
17 + prefetch_queue_depth=num_threads, seed=42)
18 + self.path = path
19 + self.training = training
20 + self.stride = stride
21 + self.iter = 0
22 +
23 + self.rotate_augment = rotate_augment
24 + self.augment_brightness = augment_brightness
25 + self.augment_contrast = augment_contrast
26 + self.augment_hue = augment_hue
27 + self.augment_saturation = augment_saturation
28 +
29 + self.reader = ops.COCOReader(annotations_file=annotations, file_root=path, num_shards=world,
30 + shard_id=torch.cuda.current_device(),
31 + ltrb=True, ratio=True, shuffle_after_epoch=True, save_img_ids=True)
32 +
33 + self.decode_train = ops.ImageDecoderSlice(device="mixed", output_type=types.RGB)
34 + self.decode_infer = ops.ImageDecoder(device="mixed", output_type=types.RGB)
35 + self.bbox_crop = ops.RandomBBoxCrop(device='cpu', bbox_layout="xyXY", scaling=[0.3, 1.0],
36 + thresholds=[0.1, 0.3, 0.5, 0.7, 0.9])
37 +
38 + self.bbox_flip = ops.BbFlip(device='cpu', ltrb=True)
39 + self.img_flip = ops.Flip(device='gpu')
40 + self.coin_flip = ops.CoinFlip(probability=0.5)
41 + self.bc = ops.BrightnessContrast(device='gpu')
42 + self.hsv = ops.Hsv(device='gpu')
43 +
44 + # Random number generation for augmentation
45 + self.brightness_dist = ops.NormalDistribution(mean=1.0, stddev=augment_brightness)
46 + self.contrast_dist = ops.NormalDistribution(mean=1.0, stddev=augment_contrast)
47 + self.hue_dist = ops.NormalDistribution(mean=0.0, stddev=augment_hue)
48 + self.saturation_dist = ops.NormalDistribution(mean=1.0, stddev=augment_saturation)
49 +
50 + if rotate_augment:
51 +            raise RuntimeWarning("--augment-rotate currently has no effect when using the DALI data loader.")
52 +
53 + if isinstance(resize, list): resize = max(resize)
54 + self.rand_resize = ops.Uniform(range=[resize, float(max_size)])
55 +
56 + self.resize_train = ops.Resize(device='gpu', interp_type=types.DALIInterpType.INTERP_CUBIC, save_attrs=True)
57 + self.resize_infer = ops.Resize(device='gpu', interp_type=types.DALIInterpType.INTERP_CUBIC,
58 + resize_longer=max_size, save_attrs=True)
59 +
60 + padded_size = max_size + ((self.stride - max_size % self.stride) % self.stride)
61 +
62 + self.pad = ops.Paste(device='gpu', fill_value=0, ratio=1.1, min_canvas_size=padded_size, paste_x=0, paste_y=0)
63 + self.normalize = ops.CropMirrorNormalize(device='gpu', mean=mean, std=std, crop=(padded_size, padded_size),
64 + crop_pos_x=0, crop_pos_y=0)
65 +
66 + def define_graph(self):
67 +
68 + images, bboxes, labels, img_ids = self.reader()
69 +
70 + if self.training:
71 + crop_begin, crop_size, bboxes, labels = self.bbox_crop(bboxes, labels)
72 + images = self.decode_train(images, crop_begin, crop_size)
73 + resize = self.rand_resize()
74 + images, attrs = self.resize_train(images, resize_longer=resize)
75 +
76 + flip = self.coin_flip()
77 + bboxes = self.bbox_flip(bboxes, horizontal=flip)
78 + images = self.img_flip(images, horizontal=flip)
79 +
80 + if self.augment_brightness or self.augment_contrast:
81 + images = self.bc(images, brightness=self.brightness_dist(), contrast=self.contrast_dist())
82 + if self.augment_hue or self.augment_saturation:
83 + images = self.hsv(images, hue=self.hue_dist(), saturation=self.saturation_dist())
84 +
85 + else:
86 + images = self.decode_infer(images)
87 + images, attrs = self.resize_infer(images)
88 +
89 + resized_images = images
90 + images = self.normalize(self.pad(images))
91 +
92 + return images, bboxes, labels, img_ids, attrs, resized_images
93 +
94 +
95 +class DaliDataIterator():
96 + 'Data loader for data parallel using Dali'
97 +
98 + def __init__(self, path, resize, max_size, batch_size, stride, world, annotations, training=False,
99 + rotate_augment=False, augment_brightness=0.0,
100 + augment_contrast=0.0, augment_hue=0.0, augment_saturation=0.0):
101 + self.training = training
102 + self.resize = resize
103 + self.max_size = max_size
104 + self.stride = stride
105 + self.batch_size = batch_size // world
106 + self.mean = [255. * x for x in [0.485, 0.456, 0.406]]
107 + self.std = [255. * x for x in [0.229, 0.224, 0.225]]
108 + self.world = world
109 + self.path = path
110 +
111 + # Setup COCO
112 + with redirect_stdout(None):
113 + self.coco = COCO(annotations)
114 + self.ids = list(self.coco.imgs.keys())
115 + if 'categories' in self.coco.dataset:
116 + self.categories_inv = {k: i for i, k in enumerate(self.coco.getCatIds())}
117 +
118 + self.pipe = COCOPipeline(batch_size=self.batch_size, num_threads=2,
119 + path=path, training=training, annotations=annotations, world=world,
120 + device_id=torch.cuda.current_device(), mean=self.mean, std=self.std, resize=resize,
121 + max_size=max_size, stride=self.stride, rotate_augment=rotate_augment,
122 + augment_brightness=augment_brightness,
123 + augment_contrast=augment_contrast, augment_hue=augment_hue,
124 + augment_saturation=augment_saturation)
125 +
126 + self.pipe.build()
127 +
128 + def __repr__(self):
129 + return '\n'.join([
130 + ' loader: dali',
131 + ' resize: {}, max: {}'.format(self.resize, self.max_size),
132 + ])
133 +
134 + def __len__(self):
135 + return ceil(len(self.ids) // self.world / self.batch_size)
136 +
137 + def __iter__(self):
138 + for _ in range(self.__len__()):
139 +
140 + data, ratios, ids, num_detections = [], [], [], []
141 + dali_data, dali_boxes, dali_labels, dali_ids, dali_attrs, dali_resize_img = self.pipe.run()
142 +
143 + for l in range(len(dali_boxes)):
144 + num_detections.append(dali_boxes.at(l).shape[0])
145 +
146 + pyt_targets = -1 * torch.ones([len(dali_boxes), max(max(num_detections), 1), 5])
147 +
148 + for batch in range(self.batch_size):
149 + id = int(dali_ids.at(batch)[0])
150 +
151 + # Convert dali tensor to pytorch
152 + dali_tensor = dali_data[batch]
153 + tensor_shape = dali_tensor.shape()
154 +
155 + datum = torch.zeros(dali_tensor.shape(), dtype=torch.float, device=torch.device('cuda'))
156 + c_type_pointer = ctypes.c_void_p(datum.data_ptr())
157 + dali_tensor.copy_to_external(c_type_pointer)
158 +
159 + # Calculate image resize ratio to rescale boxes
160 + prior_size = dali_attrs.as_cpu().at(batch)
161 + resized_size = dali_resize_img[batch].shape()
162 + ratio = max(resized_size) / max(prior_size)
163 +
164 + if self.training:
165 + # Rescale boxes
166 + b_arr = dali_boxes.at(batch)
167 + num_dets = b_arr.shape[0]
168 + if num_dets!=0:
169 + pyt_bbox = torch.from_numpy(b_arr).float()
170 +
171 + pyt_bbox[:, 0] *= float(prior_size[1])
172 + pyt_bbox[:, 1] *= float(prior_size[0])
173 + pyt_bbox[:, 2] *= float(prior_size[1])
174 + pyt_bbox[:, 3] *= float(prior_size[0])
175 +                        # (l,t,r,b) -> (x,y,w,h) == (l,t, r-l, b-t)
176 + pyt_bbox[:, 2] -= pyt_bbox[:, 0]
177 + pyt_bbox[:, 3] -= pyt_bbox[:, 1]
178 + pyt_targets[batch, :num_dets, :4] = pyt_bbox * ratio
179 +
180 + # Arrange labels in target tensor
181 + l_arr = dali_labels.at(batch)
182 + if num_dets!=0:
183 + pyt_label = torch.from_numpy(l_arr).float()
184 + pyt_label -= 1 # Rescale labels to [0,79] instead of [1,80]
185 + pyt_targets[batch, :num_dets, 4] = pyt_label.squeeze()
186 +
187 + ids.append(id)
188 + data.append(datum.unsqueeze(0))
189 + ratios.append(ratio)
190 +
191 + data = torch.cat(data, dim=0)
192 +
193 + if self.training:
194 + pyt_targets = pyt_targets.cuda(non_blocking=True)
195 + yield data, pyt_targets
196 +
197 + else:
198 + ids = torch.Tensor(ids).int().cuda(non_blocking=True)
199 + ratios = torch.Tensor(ratios).cuda(non_blocking=True)
200 + yield data, ids, ratios
201 +
1 +import os
2 +import random
3 +from contextlib import redirect_stdout
4 +from PIL import Image
5 +import torch
6 +import torch.nn.functional as F
7 +from torch.utils import data
8 +from pycocotools.coco import COCO
9 +import math
10 +from torchvision.transforms.functional import adjust_brightness, adjust_contrast, adjust_hue, adjust_saturation
11 +
12 +
13 +class CocoDataset(data.dataset.Dataset):
14 + 'Dataset looping through a set of images'
15 +
16 + def __init__(self, path, resize, max_size, stride, annotations=None, training=False, rotate_augment=False,
17 + augment_brightness=0.0, augment_contrast=0.0,
18 + augment_hue=0.0, augment_saturation=0.0):
19 + super().__init__()
20 +
21 + self.path = os.path.expanduser(path)
22 + self.resize = resize
23 + self.max_size = max_size
24 + self.stride = stride
25 + self.mean = [0.485, 0.456, 0.406]
26 + self.std = [0.229, 0.224, 0.225]
27 + self.training = training
28 + self.rotate_augment = rotate_augment
29 + self.augment_brightness = augment_brightness
30 + self.augment_contrast = augment_contrast
31 + self.augment_hue = augment_hue
32 + self.augment_saturation = augment_saturation
33 +
34 + with redirect_stdout(None):
35 + self.coco = COCO(annotations)
36 + self.ids = list(self.coco.imgs.keys())
37 + if 'categories' in self.coco.dataset:
38 + self.categories_inv = {k: i for i, k in enumerate(self.coco.getCatIds())}
39 +
40 + def __len__(self):
41 + return len(self.ids)
42 +
43 + def __getitem__(self, index):
44 + ' Get sample'
45 +
46 + # Load image
47 + id = self.ids[index]
48 + if self.coco:
49 + image = self.coco.loadImgs(id)[0]['file_name']
50 + im = Image.open('{}/{}'.format(self.path, image)).convert("RGB")
51 +
52 + # Randomly sample scale for resize during training
53 + resize = self.resize
54 + if isinstance(resize, list):
55 + resize = random.randint(self.resize[0], self.resize[-1])
56 +
57 + ratio = resize / min(im.size)
58 + if ratio * max(im.size) > self.max_size:
59 + ratio = self.max_size / max(im.size)
60 + im = im.resize((int(ratio * d) for d in im.size), Image.BILINEAR)
61 +
62 + if self.training:
63 + # Get annotations
64 + boxes, categories = self._get_target(id)
65 + boxes *= ratio
66 +
67 + # Random rotation, if self.rotate_augment
68 + random_angle = random.randint(0, 3) * 90
69 + if self.rotate_augment and random_angle != 0:
70 + # rotate by random_angle degrees.
71 + im = im.rotate(random_angle)
72 + x, y, w, h = boxes[:, 0].clone(), boxes[:, 1].clone(), boxes[:, 2].clone(), boxes[:, 3].clone()
73 + if random_angle == 90:
74 + boxes[:, 0] = y - im.size[1] / 2 + im.size[0] / 2
75 + boxes[:, 1] = im.size[0] / 2 + im.size[1] / 2 - x - w
76 + boxes[:, 2] = h
77 + boxes[:, 3] = w
78 + elif random_angle == 180:
79 + boxes[:, 0] = im.size[0] - x - w
80 + boxes[:, 1] = im.size[1] - y - h
81 + elif random_angle == 270:
82 + boxes[:, 0] = im.size[0] / 2 + im.size[1] / 2 - y - h
83 + boxes[:, 1] = x - im.size[0] / 2 + im.size[1] / 2
84 + boxes[:, 2] = h
85 + boxes[:, 3] = w
86 +
87 + # Random horizontal flip
88 + if random.randint(0, 1):
89 + im = im.transpose(Image.FLIP_LEFT_RIGHT)
90 + boxes[:, 0] = im.size[0] - boxes[:, 0] - boxes[:, 2]
91 +
92 + # Apply image brightness, contrast etc augmentation
93 + if self.augment_brightness:
94 + brightness_factor = random.normalvariate(1, self.augment_brightness)
95 + brightness_factor = max(0, brightness_factor)
96 + im = adjust_brightness(im, brightness_factor)
97 + if self.augment_contrast:
98 + contrast_factor = random.normalvariate(1, self.augment_contrast)
99 + contrast_factor = max(0, contrast_factor)
100 + im = adjust_contrast(im, contrast_factor)
101 + if self.augment_hue:
102 + hue_factor = random.normalvariate(0, self.augment_hue)
103 + hue_factor = max(-0.5, hue_factor)
104 + hue_factor = min(0.5, hue_factor)
105 + im = adjust_hue(im, hue_factor)
106 + if self.augment_saturation:
107 + saturation_factor = random.normalvariate(1, self.augment_saturation)
108 + saturation_factor = max(0, saturation_factor)
109 + im = adjust_saturation(im, saturation_factor)
110 +
111 + target = torch.cat([boxes, categories], dim=1)
112 +
113 + # Convert to tensor and normalize
114 + data = torch.ByteTensor(torch.ByteStorage.from_buffer(im.tobytes()))
115 + data = data.float().div(255).view(*im.size[::-1], len(im.mode))
116 + data = data.permute(2, 0, 1)
117 +
118 + for t, mean, std in zip(data, self.mean, self.std):
119 + t.sub_(mean).div_(std)
120 +
121 + # Apply padding
122 + pw, ph = ((self.stride - d % self.stride) % self.stride for d in im.size)
123 + data = F.pad(data, (0, pw, 0, ph))
124 +
125 + if self.training:
126 + return data, target
127 +
128 + return data, id, ratio
129 +
130 + def _get_target(self, id):
131 + 'Get annotations for sample'
132 +
133 + ann_ids = self.coco.getAnnIds(imgIds=id)
134 + annotations = self.coco.loadAnns(ann_ids)
135 +
136 + boxes, categories = [], []
137 + for ann in annotations:
138 + if ann['bbox'][2] < 1 and ann['bbox'][3] < 1:
139 + continue
140 + boxes.append(ann['bbox'])
141 + cat = ann['category_id']
142 + if 'categories' in self.coco.dataset:
143 + cat = self.categories_inv[cat]
144 + categories.append(cat)
145 +
146 + if boxes:
147 + target = (torch.FloatTensor(boxes),
148 + torch.FloatTensor(categories).unsqueeze(1))
149 + else:
150 + target = (torch.ones([1, 4]), torch.ones([1, 1]) * -1)
151 +
152 + return target
153 +
154 + def collate_fn(self, batch):
155 + 'Create batch from multiple samples'
156 +
157 + if self.training:
158 + data, targets = zip(*batch)
159 + max_det = max([t.size()[0] for t in targets])
160 + targets = [torch.cat([t, torch.ones([max_det - t.size()[0], 5]) * -1]) for t in targets]
161 + targets = torch.stack(targets, 0)
162 + else:
163 + data, indices, ratios = zip(*batch)
164 +
165 + # Pad data to match max batch dimensions
166 + sizes = [d.size()[-2:] for d in data]
167 + w, h = (max(dim) for dim in zip(*sizes))
168 +
169 + data_stack = []
170 + for datum in data:
171 + pw, ph = w - datum.size()[-2], h - datum.size()[-1]
172 + data_stack.append(
173 + F.pad(datum, (0, ph, 0, pw)) if max(ph, pw) > 0 else datum)
174 +
175 + data = torch.stack(data_stack)
176 +
177 + if self.training:
178 + return data, targets
179 +
180 + ratios = torch.FloatTensor(ratios).view(-1, 1, 1)
181 + return data, torch.IntTensor(indices), ratios
182 +
183 +
184 +class DataIterator():
185 + 'Data loader for data parallel'
186 +
187 + def __init__(self, path, resize, max_size, batch_size, stride, world, annotations, training=False,
188 + rotate_augment=False, augment_brightness=0.0,
189 + augment_contrast=0.0, augment_hue=0.0, augment_saturation=0.0):
190 + self.resize = resize
191 + self.max_size = max_size
192 +
193 + self.dataset = CocoDataset(path, resize=resize, max_size=max_size,
194 + stride=stride, annotations=annotations, training=training,
195 + rotate_augment=rotate_augment,
196 + augment_brightness=augment_brightness,
197 + augment_contrast=augment_contrast, augment_hue=augment_hue,
198 + augment_saturation=augment_saturation)
199 + self.ids = self.dataset.ids
200 + self.coco = self.dataset.coco
201 +
202 + self.sampler = data.distributed.DistributedSampler(self.dataset) if world > 1 else None
203 + self.dataloader = data.DataLoader(self.dataset, batch_size=batch_size // world,
204 + sampler=self.sampler, collate_fn=self.dataset.collate_fn, num_workers=2,
205 + pin_memory=True)
206 +
207 + def __repr__(self):
208 + return '\n'.join([
209 + ' loader: pytorch',
210 + ' resize: {}, max: {}'.format(self.resize, self.max_size),
211 + ])
212 +
213 + def __len__(self):
214 + return len(self.dataloader)
215 +
216 + def __iter__(self):
217 + for output in self.dataloader:
218 + if self.dataset.training:
219 + data, target = output
220 + else:
221 + data, ids, ratio = output
222 +
223 + if torch.cuda.is_available():
224 + data = data.cuda(non_blocking=True)
225 +
226 + if self.dataset.training:
227 + if torch.cuda.is_available():
228 + target = target.cuda(non_blocking=True)
229 + yield data, target
230 + else:
231 + if torch.cuda.is_available():
232 + ids = ids.cuda(non_blocking=True)
233 + ratio = ratio.cuda(non_blocking=True)
234 + yield data, ids, ratio
235 +
236 +
237 +class RotatedCocoDataset(data.dataset.Dataset):
238 + 'Dataset looping through a set of images'
239 +
240 + def __init__(self, path, resize, max_size, stride, annotations=None, training=False, rotate_augment=False,
241 + augment_brightness=0.0, augment_contrast=0.0,
242 + augment_hue=0.0, augment_saturation=0.0, absolute_angle=False):
243 + super().__init__()
244 +
245 + self.path = os.path.expanduser(path)
246 + self.resize = resize
247 + self.max_size = max_size
248 + self.stride = stride
249 + self.mean = [0.485, 0.456, 0.406]
250 + self.std = [0.229, 0.224, 0.225]
251 + self.training = training
252 + self.rotate_augment = rotate_augment
253 + self.augment_brightness = augment_brightness
254 + self.augment_contrast = augment_contrast
255 + self.augment_hue = augment_hue
256 + self.augment_saturation = augment_saturation
257 + self.absolute_angle=absolute_angle
258 +
259 + with redirect_stdout(None):
260 + self.coco = COCO(annotations)
261 + self.ids = list(self.coco.imgs.keys())
262 + if 'categories' in self.coco.dataset:
263 + self.categories_inv = {k: i for i, k in enumerate(self.coco.getCatIds())}
264 +
265 + def __len__(self):
266 + return len(self.ids)
267 +
268 + def __getitem__(self, index):
269 + ' Get sample'
270 +
271 + # Load image
272 + id = self.ids[index]
273 + if self.coco:
274 + image = self.coco.loadImgs(id)[0]['file_name']
275 + im = Image.open('{}/{}'.format(self.path, image)).convert("RGB")
276 +
277 + # Randomly sample scale for resize during training
278 + resize = self.resize
279 + if isinstance(resize, list):
280 + resize = random.randint(self.resize[0], self.resize[-1])
281 +
282 + ratio = resize / min(im.size)
283 + if ratio * max(im.size) > self.max_size:
284 + ratio = self.max_size / max(im.size)
285 + im = im.resize((int(ratio * d) for d in im.size), Image.BILINEAR)
286 +
287 + if self.training:
288 + # Get annotations
289 + boxes, categories = self._get_target(id)
290 + # boxes *= ratio
291 + boxes[:, :4] *= ratio
292 +
293 + # Random rotation, if self.rotate_augment
294 + random_angle = random.randint(0, 3) * 90
295 + if self.rotate_augment and random_angle != 0:
296 + # rotate by random_angle degrees.
297 + original_size = im.size
298 + im = im.rotate(random_angle, expand=True)
299 + x, y, w, h, t = boxes[:, 0].clone(), boxes[:, 1].clone(), boxes[:, 2].clone(), \
300 + boxes[:, 3].clone(), boxes[:, 4].clone()
301 + if random_angle == 90:
302 + boxes[:, 0] = y
303 + boxes[:, 1] = original_size[0] - x - w
304 + if not self.absolute_angle:
305 + boxes[:, 2] = h
306 + boxes[:, 3] = w
307 + elif random_angle == 180:
308 + boxes[:, 0] = original_size[0] - x - w
309 + boxes[:, 1] = original_size[1] - y - h
310 +
311 + elif random_angle == 270:
312 + boxes[:, 0] = original_size[1] - y - h
313 + boxes[:, 1] = x
314 + if not self.absolute_angle:
315 + boxes[:, 2] = h
316 + boxes[:, 3] = w
317 +
318 + pass
319 +
320 + # Adjust theta
321 + if self.absolute_angle:
322 + # This is only needed in absolute angle mode.
323 + t += math.radians(random_angle)
324 + rem = torch.remainder(torch.abs(t), math.pi)
325 + sign = torch.sign(t)
326 + t = rem * sign
327 +
328 + boxes[:, 4] = t
329 +
330 + # Random horizontal flip
331 + if random.randint(0, 1):
332 + im = im.transpose(Image.FLIP_LEFT_RIGHT)
333 + boxes[:, 0] = im.size[0] - boxes[:, 0] - boxes[:, 2]
334 + boxes[:, 1] = boxes[:, 1]  # y coordinate is unchanged by a horizontal flip
335 + boxes[:, 4] = -boxes[:, 4]
336 +
337 + # Apply image brightness, contrast etc augmentation
338 + if self.augment_brightness:
339 + brightness_factor = random.normalvariate(1, self.augment_brightness)
340 + brightness_factor = max(0, brightness_factor)
341 + im = adjust_brightness(im, brightness_factor)
342 + if self.augment_contrast:
343 + contrast_factor = random.normalvariate(1, self.augment_contrast)
344 + contrast_factor = max(0, contrast_factor)
345 + im = adjust_contrast(im, contrast_factor)
346 + if self.augment_hue:
347 + hue_factor = random.normalvariate(0, self.augment_hue)
348 + hue_factor = max(-0.5, hue_factor)
349 + hue_factor = min(0.5, hue_factor)
350 + im = adjust_hue(im, hue_factor)
351 + if self.augment_saturation:
352 + saturation_factor = random.normalvariate(1, self.augment_saturation)
353 + saturation_factor = max(0, saturation_factor)
354 + im = adjust_saturation(im, saturation_factor)
355 +
356 + target = torch.cat([boxes, categories], dim=1)
357 +
358 + # Convert to tensor and normalize
359 + data = torch.ByteTensor(torch.ByteStorage.from_buffer(im.tobytes()))
360 + data = data.float().div(255).view(*im.size[::-1], len(im.mode))
361 + data = data.permute(2, 0, 1)
362 +
363 + for t, mean, std in zip(data, self.mean, self.std):
364 + t.sub_(mean).div_(std)
365 +
366 + # Apply padding
367 + pw, ph = ((self.stride - d % self.stride) % self.stride for d in im.size)
368 + data = F.pad(data, (0, pw, 0, ph))
369 +
370 + if self.training:
371 + return data, target
372 +
373 + return data, id, ratio
374 +
375 + def _get_target(self, id):
376 + 'Get annotations for sample'
377 +
378 + ann_ids = self.coco.getAnnIds(imgIds=id)
379 + annotations = self.coco.loadAnns(ann_ids)
380 +
381 + boxes, categories = [], []
382 + for ann in annotations:
383 + if ann['bbox'][2] < 1 and ann['bbox'][3] < 1:
384 + continue
385 + final_bbox = ann['bbox']
386 + if len(final_bbox) == 4:
387 + final_bbox.append(0.0) # add theta of zero.
388 + assert len(ann['bbox']) == 5, "Bounding box for id %i does not contain five entries." % id
389 + boxes.append(final_bbox)
390 + cat = ann['category_id']
391 + if 'categories' in self.coco.dataset:
392 + cat = self.categories_inv[cat]
393 + categories.append(cat)
394 +
395 + if boxes:
396 + target = (torch.FloatTensor(boxes),
397 + torch.FloatTensor(categories).unsqueeze(1))
398 + else:
399 + target = (torch.ones([1, 5]), torch.ones([1, 1]) * -1)
400 +
401 + return target
402 +
403 + def collate_fn(self, batch):
404 + 'Create batch from multiple samples'
405 +
406 + if self.training:
407 + data, targets = zip(*batch)
408 + max_det = max([t.size()[0] for t in targets])
409 + targets = [torch.cat([t, torch.ones([max_det - t.size()[0], 6]) * -1]) for t in targets]
410 + targets = torch.stack(targets, 0)
411 + else:
412 + data, indices, ratios = zip(*batch)
413 +
414 + # Pad data to match max batch dimensions
415 + sizes = [d.size()[-2:] for d in data]
416 + w, h = (max(dim) for dim in zip(*sizes))
417 +
418 + data_stack = []
419 + for datum in data:
420 + pw, ph = w - datum.size()[-2], h - datum.size()[-1]
421 + data_stack.append(
422 + F.pad(datum, (0, ph, 0, pw)) if max(ph, pw) > 0 else datum)
423 +
424 + data = torch.stack(data_stack)
425 +
426 + if self.training:
427 + return data, targets
428 +
429 + ratios = torch.FloatTensor(ratios).view(-1, 1, 1)
430 + return data, torch.IntTensor(indices), ratios
431 +
432 +
433 +class RotatedDataIterator():
434 + 'Data loader for data parallel'
435 +
436 + def __init__(self, path, resize, max_size, batch_size, stride, world, annotations, training=False,
437 + rotate_augment=False, augment_brightness=0.0,
438 + augment_contrast=0.0, augment_hue=0.0, augment_saturation=0.0, absolute_angle=False
439 + ):
440 + self.resize = resize
441 + self.max_size = max_size
442 +
443 + self.dataset = RotatedCocoDataset(path, resize=resize, max_size=max_size,
444 + stride=stride, annotations=annotations, training=training,
445 + rotate_augment=rotate_augment,
446 + augment_brightness=augment_brightness,
447 + augment_contrast=augment_contrast, augment_hue=augment_hue,
448 + augment_saturation=augment_saturation, absolute_angle=absolute_angle)
449 + self.ids = self.dataset.ids
450 + self.coco = self.dataset.coco
451 +
452 + self.sampler = data.distributed.DistributedSampler(self.dataset) if world > 1 else None
453 + self.dataloader = data.DataLoader(self.dataset, batch_size=batch_size // world,
454 + sampler=self.sampler, collate_fn=self.dataset.collate_fn, num_workers=2,
455 + pin_memory=True)
456 +
457 + def __repr__(self):
458 + return '\n'.join([
459 + ' loader: pytorch',
460 + ' resize: {}, max: {}'.format(self.resize, self.max_size),
461 + ])
462 +
463 + def __len__(self):
464 + return len(self.dataloader)
465 +
466 + def __iter__(self):
467 + for output in self.dataloader:
468 + if self.dataset.training:
469 + data, target = output
470 + else:
471 + data, ids, ratio = output
472 +
473 + if torch.cuda.is_available():
474 + data = data.cuda(non_blocking=True)
475 +
476 + if self.dataset.training:
477 + if torch.cuda.is_available():
478 + target = target.cuda(non_blocking=True)
479 + yield data, target
480 + else:
481 + if torch.cuda.is_available():
482 + ids = ids.cuda(non_blocking=True)
483 + ratio = ratio.cuda(non_blocking=True)
484 + yield data, ids, ratio
1 +import os
2 +import json
3 +import tempfile
4 +from contextlib import redirect_stdout
5 +import torch
6 +from apex import amp
7 +from apex.parallel import DistributedDataParallel as ADDP
8 +from torch.nn.parallel import DistributedDataParallel
9 +from pycocotools.cocoeval import COCOeval
10 +import numpy as np
11 +
12 +from .data import DataIterator, RotatedDataIterator
13 +from .dali import DaliDataIterator
14 +from .model import Model
15 +from .utils import Profiler, rotate_box
16 +
17 +
18 +def infer(model, path, detections_file, resize, max_size, batch_size, mixed_precision=True, is_master=True, world=0,
19 + annotations=None, no_apex=False, use_dali=True, is_validation=False, verbose=True, rotated_bbox=False):
20 + 'Run inference on images from path'
21 +
22 + DDP = DistributedDataParallel if no_apex else ADDP
23 + backend = 'pytorch' if isinstance(model, Model) or isinstance(model, DDP) else 'tensorrt'
24 +
25 + stride = model.module.stride if isinstance(model, DDP) else model.stride
26 +
27 + # Create annotations if none was provided
28 + if not annotations:
29 + annotations = tempfile.mktemp('.json')
30 + images = [{'id': i, 'file_name': f} for i, f in enumerate(os.listdir(path))]
31 + json.dump({'images': images}, open(annotations, 'w'))
32 +
33 + # TensorRT only supports fixed input sizes, so override input size accordingly
34 + if backend == 'tensorrt': max_size = max(model.input_size)
35 +
36 + # Prepare dataset
37 + if verbose: print('Preparing dataset...')
38 + if rotated_bbox:
39 + if use_dali: raise NotImplementedError("This repo does not currently support DALI for rotated bbox.")
40 + data_iterator = RotatedDataIterator(path, resize, max_size, batch_size, stride,
41 + world, annotations, training=False)
42 + else:
43 + data_iterator = (DaliDataIterator if use_dali else DataIterator)(
44 + path, resize, max_size, batch_size, stride,
45 + world, annotations, training=False)
46 + if verbose: print(data_iterator)
47 +
48 + # Prepare model
49 + if backend == 'pytorch':
50 + # If we are doing validation during training,
51 + # no need to register model with AMP again
52 + if not is_validation:
53 + if torch.cuda.is_available(): model = model.to(memory_format=torch.channels_last).cuda()
54 + if not no_apex:
55 + model = amp.initialize(model, None,
56 + opt_level='O2' if mixed_precision else 'O0',
57 + keep_batchnorm_fp32=True,
58 + verbosity=0)
59 +
60 + model.eval()
61 +
62 + if verbose:
63 + print(' backend: {}'.format(backend))
64 + print(' device: {} {}'.format(
65 + world, 'cpu' if not torch.cuda.is_available() else 'GPU' if world == 1 else 'GPUs'))
66 + print(' batch: {}, precision: {}'.format(batch_size,
67 + 'unknown' if backend == 'tensorrt' else 'mixed' if mixed_precision else 'full'))
68 + print(' BBOX type:', 'rotated' if rotated_bbox else 'axis aligned')
69 + print('Running inference...')
70 +
71 + results = []
72 + profiler = Profiler(['infer', 'fw'])
73 + with torch.no_grad():
74 + for i, (data, ids, ratios) in enumerate(data_iterator):
75 + # Forward pass
76 + if backend=='pytorch': data = data.contiguous(memory_format=torch.channels_last)
77 + profiler.start('fw')
78 + scores, boxes, classes = model(data, rotated_bbox) #Need to add model size (B, 3, W, H)
79 + profiler.stop('fw')
80 +
81 + results.append([scores, boxes, classes, ids, ratios])
82 +
83 + profiler.bump('infer')
84 + if verbose and (profiler.totals['infer'] > 60 or i == len(data_iterator) - 1):
85 + size = len(data_iterator.ids)
86 + msg = '[{:{len}}/{}]'.format(min((i + 1) * batch_size,
87 + size), size, len=len(str(size)))
88 + msg += ' {:.3f}s/{}-batch'.format(profiler.means['infer'], batch_size)
89 + msg += ' (fw: {:.3f}s)'.format(profiler.means['fw'])
90 + msg += ', {:.1f} im/s'.format(batch_size / profiler.means['infer'])
91 + print(msg, flush=True)
92 +
93 + profiler.reset()
94 +
95 + # Gather results from all devices
96 + if verbose: print('Gathering results...')
97 + results = [torch.cat(r, dim=0) for r in zip(*results)]
98 + if world > 1:
99 + for r, result in enumerate(results):
100 + all_result = [torch.ones_like(result, device=result.device) for _ in range(world)]
101 + torch.distributed.all_gather(list(all_result), result)
102 + results[r] = torch.cat(all_result, dim=0)
103 +
104 + if is_master:
105 + # Copy buffers back to host
106 + results = [r.cpu() for r in results]
107 +
108 + # Collect detections
109 + detections = []
110 + processed_ids = set()
111 + for scores, boxes, classes, image_id, ratios in zip(*results):
112 + image_id = image_id.item()
113 + if image_id in processed_ids:
114 + continue
115 + processed_ids.add(image_id)
116 +
117 + keep = (scores > 0).nonzero(as_tuple=False)
118 + scores = scores[keep].view(-1)
119 + if rotated_bbox:
120 + boxes = boxes[keep, :].view(-1, 6)
121 + boxes[:, :4] /= ratios
122 + else:
123 + boxes = boxes[keep, :].view(-1, 4) / ratios
124 + classes = classes[keep].view(-1).int()
125 +
126 + for score, box, cat in zip(scores, boxes, classes):
127 + if rotated_bbox:
128 + x1, y1, x2, y2, sin, cos = box.data.tolist()
129 + theta = np.arctan2(sin, cos)
130 + w = x2 - x1 + 1
131 + h = y2 - y1 + 1
132 + seg = rotate_box([x1, y1, w, h, theta])
133 + else:
134 + x1, y1, x2, y2 = box.data.tolist()
135 + cat = cat.item()
136 + if 'annotations' in data_iterator.coco.dataset:
137 + cat = data_iterator.coco.getCatIds()[cat]
138 + this_det = {
139 + 'image_id': image_id,
140 + 'score': score.item(),
141 + 'category_id': cat}
142 + if rotated_bbox:
143 + this_det['bbox'] = [x1, y1, x2 - x1 + 1, y2 - y1 + 1, theta]
144 + this_det['segmentation'] = [seg]
145 + else:
146 + this_det['bbox'] = [x1, y1, x2 - x1 + 1, y2 - y1 + 1]
147 +
148 + detections.append(this_det)
149 +
150 + if detections:
151 + # Save detections
152 + if detections_file and verbose: print('Writing {}...'.format(detections_file))
153 + detections = {'annotations': detections}
154 + detections['images'] = data_iterator.coco.dataset['images']
155 + if 'categories' in data_iterator.coco.dataset:
156 + detections['categories'] = data_iterator.coco.dataset['categories']
157 + if detections_file:
158 + for d_file in detections_file:
159 + json.dump(detections, open(d_file, 'w'), indent=4)
160 +
161 + # Evaluate model on dataset
162 + if 'annotations' in data_iterator.coco.dataset:
163 + if verbose: print('Evaluating model...')
164 + with redirect_stdout(None):
165 + coco_pred = data_iterator.coco.loadRes(detections['annotations'])
166 + if rotated_bbox:
167 + coco_eval = COCOeval(data_iterator.coco, coco_pred, 'segm')
168 + else:
169 + coco_eval = COCOeval(data_iterator.coco, coco_pred, 'bbox')
170 + coco_eval.evaluate()
171 + coco_eval.accumulate()
172 + coco_eval.summarize()
173 + return coco_eval.stats # mAP and mAR
174 + else:
175 + print('No detections!')
176 + return None
177 + return 0
1 +import torch
2 +import torch.nn as nn
3 +import torch.nn.functional as F
4 +
5 +class FocalLoss(nn.Module):
6 + 'Focal Loss - https://arxiv.org/abs/1708.02002'
7 +
8 + def __init__(self, alpha=0.25, gamma=2):
9 + super().__init__()
10 + self.alpha = alpha
11 + self.gamma = gamma
12 +
13 + def forward(self, pred_logits, target):
14 + pred = pred_logits.sigmoid()
15 + ce = F.binary_cross_entropy_with_logits(pred_logits, target, reduction='none')
16 + alpha = target * self.alpha + (1. - target) * (1. - self.alpha)
17 + pt = torch.where(target == 1, pred, 1 - pred)  # p_t: predicted probability of the ground-truth class
18 + return alpha * (1. - pt) ** self.gamma * ce
19 +
20 +class SmoothL1Loss(nn.Module):
21 + 'Smooth L1 Loss'
22 +
23 + def __init__(self, beta=0.11):
24 + super().__init__()
25 + self.beta = beta
26 +
27 + def forward(self, pred, target):
28 + x = (pred - target).abs()
29 + l1 = x - 0.5 * self.beta
30 + l2 = 0.5 * x ** 2 / self.beta
31 + return torch.where(x >= self.beta, l1, l2)
1 +#!/usr/bin/env python3
2 +import sys
3 +import os
4 +import argparse
5 +import random
6 +import torch.cuda
7 +import torch.distributed
8 +import torch.multiprocessing
9 +
10 +from odtk import infer, train, utils
11 +from odtk.model import Model
12 +from odtk._C import Engine
13 +
14 +
15 +def parse(args):
16 + parser = argparse.ArgumentParser(description='ODTK: Object Detection Toolkit.')
17 + parser.add_argument('--master', metavar='address:port', type=str, help='Address and port of the master worker',
18 + default='127.0.0.1:29500')
19 +
20 + subparsers = parser.add_subparsers(help='sub-command', dest='command')
21 + subparsers.required = True
22 +
23 + devcount = max(1, torch.cuda.device_count())
24 +
25 + parser_train = subparsers.add_parser('train', help='train a network')
26 + parser_train.add_argument('model', type=str, help='path to output model or checkpoint to resume from')
27 + parser_train.add_argument('--annotations', metavar='path', type=str, help='path to COCO style annotations',
28 + required=True)
29 + parser_train.add_argument('--images', metavar='path', type=str, help='path to images', default='.')
30 + parser_train.add_argument('--backbone', action='store', type=str, nargs='+', help='backbone model (or list of)',
31 + default=['ResNet50FPN'])
32 + parser_train.add_argument('--classes', metavar='num', type=int, help='number of classes', default=80)
33 + parser_train.add_argument('--batch', metavar='size', type=int, help='batch size', default=2 * devcount)
34 + parser_train.add_argument('--resize', metavar='scale', type=int, help='resize to given size', default=800)
35 + parser_train.add_argument('--max-size', metavar='max', type=int, help='maximum resizing size', default=1333)
36 + parser_train.add_argument('--jitter', metavar='min max', type=int, nargs=2, help='jitter size within range',
37 + default=[640, 1024])
38 + parser_train.add_argument('--iters', metavar='number', type=int, help='number of iterations to train for',
39 + default=90000)
40 + parser_train.add_argument('--milestones', action='store', type=int, nargs='*',
41 + help='list of iteration indices where learning rate decays', default=[60000, 80000])
42 + parser_train.add_argument('--schedule', metavar='scale', type=float,
43 + help='scale schedule (affecting iters and milestones)', default=1)
44 + parser_train.add_argument('--full-precision', help='train in full precision', action='store_true')
45 + parser_train.add_argument('--lr', metavar='value', help='learning rate', type=float, default=0.01)
46 + parser_train.add_argument('--warmup', metavar='iterations', help='number of warmup iterations', type=int,
47 + default=1000)
48 + parser_train.add_argument('--gamma', metavar='value', type=float,
49 + help='multiplicative factor of learning rate decay', default=0.1)
50 + parser_train.add_argument('--override', help='override model', action='store_true')
51 + parser_train.add_argument('--val-annotations', metavar='path', type=str,
52 + help='path to COCO style validation annotations')
53 + parser_train.add_argument('--val-images', metavar='path', type=str, help='path to validation images')
54 + parser_train.add_argument('--post-metrics', metavar='url', type=str, help='post metrics to specified url')
55 + parser_train.add_argument('--fine-tune', metavar='path', type=str, help='fine tune a pretrained model')
56 + parser_train.add_argument('--logdir', metavar='logdir', type=str, help='directory where to write logs')
57 + parser_train.add_argument('--val-iters', metavar='number', type=int,
58 + help='number of iterations between each validation', default=8000)
59 + parser_train.add_argument('--no-apex', help='use Pytorch native AMP and DDP', action='store_true')
60 + parser_train.add_argument('--with-dali', help='use dali for data loading', action='store_true')
61 + parser_train.add_argument('--augment-rotate', help='use four-fold rotational augmentation', action='store_true')
62 + parser_train.add_argument('--augment-free-rotate', type=float, metavar='value value', nargs=2, default=[0, 0],
63 + help='rotate images by an arbitrary angle, between min and max (in degrees)')
64 + parser_train.add_argument('--augment-brightness', metavar='value', type=float,
65 + help='adjust the brightness of the image.', default=0.002)
66 + parser_train.add_argument('--augment-contrast', metavar='value', type=float,
67 + help='adjust the contrast of the image.', default=0.002)
68 + parser_train.add_argument('--augment-hue', metavar='value', type=float,
69 + help='adjust the hue of the image.', default=0.0002)
70 + parser_train.add_argument('--augment-saturation', metavar='value', type=float,
71 + help='adjust the saturation of the image.', default=0.002)
72 + parser_train.add_argument('--regularization-l2', metavar='value', type=float, help='L2 regularization for optim',
73 + default=0.0001)
74 + parser_train.add_argument('--rotated-bbox', help='detect rotated bounding boxes [x, y, w, h, theta]',
75 + action='store_true')
76 + parser_train.add_argument('--anchor-ious', metavar='value value', type=float, nargs=2,
77 + help='anchor/bbox overlap threshold', default=[0.4, 0.5])
78 + parser_train.add_argument('--absolute-angle', help='regress absolute angle (rather than -45 to 45 degrees).',
79 + action='store_true')
80 +
81 + parser_infer = subparsers.add_parser('infer', help='run inference')
82 + parser_infer.add_argument('model', type=str, help='path to model')
83 + parser_infer.add_argument('--images', metavar='path', type=str, help='path to images', default='.')
84 + parser_infer.add_argument('--annotations', metavar='annotations', type=str,
85 + help='evaluate using provided annotations')
86 + parser_infer.add_argument('--output', metavar='file', type=str, nargs='+',
87 + help='save detections to specified JSON file(s)', default=['detections.json'])
88 + parser_infer.add_argument('--batch', metavar='size', type=int, help='batch size', default=2 * devcount)
89 + parser_infer.add_argument('--resize', metavar='scale', type=int, help='resize to given size', default=800)
90 + parser_infer.add_argument('--max-size', metavar='max', type=int, help='maximum resizing size', default=1333)
91 + parser_infer.add_argument('--no-apex', help='use Pytorch native AMP and DDP', action='store_true')
92 + parser_infer.add_argument('--with-dali', help='use dali for data loading', action='store_true')
93 + parser_infer.add_argument('--full-precision', help='inference in full precision', action='store_true')
94 + parser_infer.add_argument('--rotated-bbox', help='inference using a rotated bounding box model',
95 + action='store_true')
96 +
97 + parser_export = subparsers.add_parser('export', help='export a model into a TensorRT engine')
98 + parser_export.add_argument('model', type=str, help='path to model')
99 + parser_export.add_argument('export', type=str, help='path to exported output')
100 + parser_export.add_argument('--size', metavar='height width', type=int, nargs='+',
101 + help='input size (square) or sizes (h w) to use when generating TensorRT engine',
102 + default=[1280])
103 + parser_export.add_argument('--full-precision', help='export in full instead of half precision', action='store_true')
104 + parser_export.add_argument('--int8', help='calibrate model and export in int8 precision', action='store_true')
105 + parser_export.add_argument('--calibration-batches', metavar='size', type=int,
106 + help='number of batches to use for int8 calibration', default=2)
107 + parser_export.add_argument('--calibration-images', metavar='path', type=str,
108 + help='path to calibration images to use for int8 calibration', default="")
109 + parser_export.add_argument('--calibration-table', metavar='path', type=str,
110 + help='path of existing calibration table to load from, or name of new calibration table',
111 + default="")
112 + parser_export.add_argument('--verbose', help='enable verbose logging', action='store_true')
113 + parser_export.add_argument('--rotated-bbox', help='inference using a rotated bounding box model',
114 + action='store_true')
115 + parser_export.add_argument('--dynamic-batch-opts', help='Profile batch sizes for TensorRT engine export (min, opt, max)',
116 + metavar='value value value', type=int, nargs=3, default=[1,8,16])
117 +
118 + return parser.parse_args(args)
119 +
120 +
121 +def load_model(args, verbose=False):
122 + if args.command != 'train' and not os.path.isfile(args.model):
123 + raise RuntimeError('Model file {} does not exist!'.format(args.model))
124 +
125 + model = None
126 + state = {}
127 + _, ext = os.path.splitext(args.model)
128 +
129 + if args.command == 'train' and (not os.path.exists(args.model) or args.override):
130 + if verbose: print('Initializing model...')
131 + model = Model(backbones=args.backbone, classes=args.classes, rotated_bbox=args.rotated_bbox,
132 + anchor_ious=args.anchor_ious)
133 + model.initialize(args.fine_tune)
134 + # Freeze unused params from training
135 + for n, p in model.named_parameters():
136 + if any(i in n for i in model.unused_modules):
137 + p.requires_grad = False
138 + if verbose: print(model)
139 +
140 + elif ext == '.pth' or ext == '.torch':
141 + if verbose: print('Loading model from {}...'.format(os.path.basename(args.model)))
142 + model, state = Model.load(filename=args.model, rotated_bbox=args.rotated_bbox)
143 + if verbose: print(model)
144 +
145 + elif args.command == 'infer' and ext in ['.engine', '.plan']:
146 + model = None
147 +
148 + else:
149 + raise RuntimeError('Invalid model format "{}"!'.format(ext))
150 +
151 + state['path'] = args.model
152 + return model, state
153 +
154 +
155 +def worker(rank, args, world, model, state):
156 + 'Per-device distributed worker'
157 +
158 + if torch.cuda.is_available():
159 + os.environ.update({
160 + 'MASTER_PORT': args.master.split(':')[-1],
161 + 'MASTER_ADDR': ':'.join(args.master.split(':')[:-1]),
162 + 'WORLD_SIZE': str(world),
163 + 'RANK': str(rank),
164 + 'CUDA_DEVICE': str(rank)
165 + })
166 +
167 + torch.cuda.set_device(rank)
168 + torch.distributed.init_process_group(backend='nccl', init_method='env://')
169 +
170 + if (args.command != 'export') and (args.batch % world != 0):
171 + raise RuntimeError('Batch size should be a multiple of the number of GPUs')
172 +
173 + if model and model.angles is not None:
174 + args.rotated_bbox = True
175 +
176 + if args.command == 'train':
177 + train.train(model, state, args.images, args.annotations,
178 + args.val_images or args.images, args.val_annotations, args.resize, args.max_size, args.jitter,
179 + args.batch, int(args.iters * args.schedule), args.val_iters, not args.full_precision, args.lr,
180 + args.warmup, [int(m * args.schedule) for m in args.milestones], args.gamma,
181 + rank, world=world, no_apex=args.no_apex, use_dali=args.with_dali,
182 + metrics_url=args.post_metrics, logdir=args.logdir, verbose=(rank == 0),
183 + rotate_augment=args.augment_rotate,
184 + augment_brightness=args.augment_brightness, augment_contrast=args.augment_contrast,
185 + augment_hue=args.augment_hue, augment_saturation=args.augment_saturation,
186 + regularization_l2=args.regularization_l2, rotated_bbox=args.rotated_bbox, absolute_angle=args.absolute_angle)
187 +
188 + elif args.command == 'infer':
189 + if model is None:
190 + if rank == 0: print('Loading CUDA engine from {}...'.format(os.path.basename(args.model)))
191 + model = Engine.load(args.model)
192 +
193 + infer.infer(model, args.images, args.output, args.resize, args.max_size, args.batch,
194 + annotations=args.annotations, mixed_precision=not args.full_precision,
195 + is_master=(rank == 0), world=world, no_apex=args.no_apex, use_dali=args.with_dali,
196 + verbose=(rank == 0), rotated_bbox=args.rotated_bbox)
197 +
198 + elif args.command == 'export':
199 + onnx_only = args.export.split('.')[-1] == 'onnx'
200 + input_size = args.size * 2 if len(args.size) == 1 else args.size
201 +
202 + calibration_files = []
203 + if args.int8:
204 + # Get list of images to use for calibration
205 + if os.path.isdir(args.calibration_images):
206 + import glob
207 + file_extensions = ['.jpg', '.JPG', '.jpeg', '.JPEG', '.png', '.PNG']
208 + for ex in file_extensions:
209 + calibration_files += glob.glob("{}/*{}".format(args.calibration_images, ex), recursive=True)
210 + # Only need enough images for specified num of calibration batches
211 + if len(calibration_files) >= args.calibration_batches * args.dynamic_batch_opts[1]:
212 + calibration_files = calibration_files[:(args.calibration_batches * args.dynamic_batch_opts[1])]
213 + else:
214 + # Number of images for calibration must be greater than or equal to the kOPT optimization profile
215 + if len(calibration_files) >= args.dynamic_batch_opts[1]:
216 + print('Only found enough images for {} batches. Continuing anyway...'.format(
217 + len(calibration_files) // args.dynamic_batch_opts[1]))
218 + else:
219 + raise RuntimeError('Not enough images found for calibration. ({} < {})'
220 + .format(len(calibration_files), args.dynamic_batch_opts[1]))
221 +
222 + random.shuffle(calibration_files)
223 +
224 + precision = "FP32"
225 + if args.int8:
226 + precision = "INT8"
227 + elif not args.full_precision:
228 + precision = "FP16"
229 +
230 + exported = model.export(input_size, args.dynamic_batch_opts, precision, calibration_files,
231 + args.calibration_table, args.verbose, onnx_only=onnx_only)
232 + if onnx_only:
233 + with open(args.export, 'wb') as out:
234 + out.write(exported)
235 + else:
236 + exported.save(args.export)
237 +
238 +
239 +def main(args=None):
240 + 'Entry point for the odtk command'
241 +
242 + args = parse(args or sys.argv[1:])
243 +
244 + model, state = load_model(args, verbose=True)
245 + if model: model.share_memory()
246 +
247 + world = torch.cuda.device_count()
248 + if args.command == 'export' or world <= 1:
249 + worker(0, args, 1, model, state)
250 + else:
251 + torch.multiprocessing.spawn(worker, args=(args, world, model, state), nprocs=world)
252 +
253 +
254 +if __name__ == '__main__':
255 + main()
1 +import os.path
2 +import io
3 +import numpy as np
4 +import math
5 +import torch
6 +import torch.nn as nn
7 +
8 +from . import backbones as backbones_mod
9 +from ._C import Engine
10 +from .box import generate_anchors, snap_to_anchors, decode, nms
11 +from .box import generate_anchors_rotated, snap_to_anchors_rotated, nms_rotated
12 +from .loss import FocalLoss, SmoothL1Loss
13 +
14 +
15 +class Model(nn.Module):
16 + 'RetinaNet - https://arxiv.org/abs/1708.02002'
17 +
18 + def __init__(
19 + self,
20 + backbones='ResNet50FPN',
21 + classes=80,
22 + ratios=[1.0, 2.0, 0.5],
23 + scales=[4 * 2 ** (i / 3) for i in range(3)],
24 + angles=None,
25 + rotated_bbox=False,
26 + anchor_ious=[0.4, 0.5],
27 + config={}
28 + ):
29 + super().__init__()
30 +
31 + if not isinstance(backbones, list):
32 + backbones = [backbones]
33 +
34 + self.backbones = nn.ModuleDict({b: getattr(backbones_mod, b)() for b in backbones})
35 + self.name = 'RetinaNet'
36 + self.unused_modules = []
37 + for b in backbones: self.unused_modules.extend(getattr(self.backbones, b).features.unused_modules)
38 + self.exporting = False
39 + self.rotated_bbox = rotated_bbox
40 + self.anchor_ious = anchor_ious
41 +
42 + self.ratios = ratios
43 + self.scales = scales
44 + self.angles = angles if angles is not None else \
45 + [-np.pi / 6, 0, np.pi / 6] if self.rotated_bbox else None
46 + self.anchors = {}
47 + self.classes = classes
48 +
49 + self.threshold = config.get('threshold', 0.05)
50 + self.top_n = config.get('top_n', 1000)
51 + self.nms = config.get('nms', 0.5)
52 + self.detections = config.get('detections', 100)
53 +
54 + self.stride = max([b.stride for _, b in self.backbones.items()])
55 +
56 + # classification and box regression heads
57 + def make_head(out_size):
58 + layers = []
59 + for _ in range(4):
60 + layers += [nn.Conv2d(256, 256, 3, padding=1), nn.ReLU()]
61 + layers += [nn.Conv2d(256, out_size, 3, padding=1)]
62 + return nn.Sequential(*layers)
63 +
64 + self.num_anchors = len(self.ratios) * len(self.scales)
65 + self.num_anchors = self.num_anchors if not self.rotated_bbox else (self.num_anchors * len(self.angles))
66 + self.cls_head = make_head(classes * self.num_anchors)
67 + self.box_head = make_head(4 * self.num_anchors) if not self.rotated_bbox \
68 + else make_head(6 * self.num_anchors) # theta -> cos(theta), sin(theta)
69 +
70 + self.cls_criterion = FocalLoss()
71 + self.box_criterion = SmoothL1Loss(beta=0.11)
72 +
73 + def __repr__(self):
74 + return '\n'.join([
75 + ' model: {}'.format(self.name),
76 + ' backbone: {}'.format(', '.join([k for k, _ in self.backbones.items()])),
77 + ' classes: {}, anchors: {}'.format(self.classes, self.num_anchors)
78 + ])
79 +
80 + def initialize(self, pre_trained):
81 + if pre_trained:
82 + # Initialize using weights from pre-trained model
83 + if not os.path.isfile(pre_trained):
84 + raise ValueError('No checkpoint {}'.format(pre_trained))
85 +
86 + print('Fine-tuning weights from {}...'.format(os.path.basename(pre_trained)))
87 + state_dict = self.state_dict()
88 + chk = torch.load(pre_trained, map_location=lambda storage, loc: storage)
89 + ignored = ['cls_head.8.bias', 'cls_head.8.weight']
90 + if self.rotated_bbox:
91 + ignored += ['box_head.8.bias', 'box_head.8.weight']
92 + weights = {k: v for k, v in chk['state_dict'].items() if k not in ignored}
93 + state_dict.update(weights)
94 + self.load_state_dict(state_dict)
95 +
96 + del chk, weights
97 + torch.cuda.empty_cache()
98 +
99 + else:
100 + # Initialize backbone(s)
101 + for _, backbone in self.backbones.items():
102 + backbone.initialize()
103 +
104 + # Initialize heads
105 + def initialize_layer(layer):
106 + if isinstance(layer, nn.Conv2d):
107 + nn.init.normal_(layer.weight, std=0.01)
108 + if layer.bias is not None:
109 + nn.init.constant_(layer.bias, val=0)
110 +
111 + self.cls_head.apply(initialize_layer)
112 + self.box_head.apply(initialize_layer)
113 +
114 + # Initialize class head prior
115 + def initialize_prior(layer):
116 + pi = 0.01
117 + b = - math.log((1 - pi) / pi)
118 + nn.init.constant_(layer.bias, b)
119 + nn.init.normal_(layer.weight, std=0.01)
120 +
121 + self.cls_head[-1].apply(initialize_prior)
122 + if self.rotated_bbox:
123 + self.box_head[-1].apply(initialize_prior)
124 +
125 + def forward(self, x, rotated_bbox=None):
126 + if self.training: x, targets = x
127 +
128 + # Backbones forward pass
129 + features = []
130 + for _, backbone in self.backbones.items():
131 + features.extend(backbone(x))
132 +
133 + # Heads forward pass
134 + cls_heads = [self.cls_head(t) for t in features]
135 + box_heads = [self.box_head(t) for t in features]
136 +
137 + if self.training:
138 + return self._compute_loss(x, cls_heads, box_heads, targets.float())
139 +
140 + cls_heads = [cls_head.sigmoid() for cls_head in cls_heads]
141 +
142 + if self.exporting:
143 + self.strides = [x.shape[-1] // cls_head.shape[-1] for cls_head in cls_heads]
144 + return cls_heads, box_heads
145 +
146 + global nms, generate_anchors
147 + if self.rotated_bbox:
148 + nms = nms_rotated
149 + generate_anchors = generate_anchors_rotated
150 +
151 + # Inference post-processing
152 + decoded = []
153 + for cls_head, box_head in zip(cls_heads, box_heads):
154 + # Generate level's anchors
155 + stride = x.shape[-1] // cls_head.shape[-1]
156 + if stride not in self.anchors:
157 + self.anchors[stride] = generate_anchors(stride, self.ratios, self.scales, self.angles)
158 +
159 + # Decode and filter boxes
160 + decoded.append(decode(cls_head.contiguous(), box_head.contiguous(), stride, self.threshold,
161 + self.top_n, self.anchors[stride], self.rotated_bbox))
162 +
163 + # Perform non-maximum suppression
164 + decoded = [torch.cat(tensors, 1) for tensors in zip(*decoded)]
165 + return nms(*decoded, self.nms, self.detections)
166 +
167 + def _extract_targets(self, targets, stride, size):
168 + global generate_anchors, snap_to_anchors
169 + if self.rotated_bbox:
170 + generate_anchors = generate_anchors_rotated
171 + snap_to_anchors = snap_to_anchors_rotated
172 + cls_target, box_target, depth = [], [], []
173 + for target in targets:
174 + target = target[target[:, -1] > -1]
175 + if stride not in self.anchors:
176 + self.anchors[stride] = generate_anchors(stride, self.ratios, self.scales, self.angles)
177 +
178 + anchors = self.anchors[stride]
179 + if not self.rotated_bbox:
180 + anchors = anchors.to(targets.device)
181 + snapped = snap_to_anchors(target, [s * stride for s in size[::-1]], stride,
182 + anchors, self.classes, targets.device, self.anchor_ious)
183 + for l, s in zip((cls_target, box_target, depth), snapped): l.append(s)
184 + return torch.stack(cls_target), torch.stack(box_target), torch.stack(depth)
185 +
186 + def _compute_loss(self, x, cls_heads, box_heads, targets):
187 + cls_losses, box_losses, fg_targets = [], [], []
188 + for cls_head, box_head in zip(cls_heads, box_heads):
189 + size = cls_head.shape[-2:]
190 + stride = x.shape[-1] / cls_head.shape[-1]
191 +
192 + cls_target, box_target, depth = self._extract_targets(targets, stride, size)
193 + fg_targets.append((depth > 0).sum().float().clamp(min=1))
194 +
195 + cls_head = cls_head.view_as(cls_target).float()
196 + cls_mask = (depth >= 0).expand_as(cls_target).float()
197 + cls_loss = self.cls_criterion(cls_head, cls_target)
198 + cls_loss = cls_mask * cls_loss
199 + cls_losses.append(cls_loss.sum())
200 +
201 + box_head = box_head.view_as(box_target).float()
202 + box_mask = (depth > 0).expand_as(box_target).float()
203 + box_loss = self.box_criterion(box_head, box_target)
204 + box_loss = box_mask * box_loss
205 + box_losses.append(box_loss.sum())
206 +
207 + fg_targets = torch.stack(fg_targets).sum()
208 + cls_loss = torch.stack(cls_losses).sum() / fg_targets
209 + box_loss = torch.stack(box_losses).sum() / fg_targets
210 + return cls_loss, box_loss
211 +
212 + def save(self, state):
213 + checkpoint = {
214 + 'backbone': [k for k, _ in self.backbones.items()],
215 + 'classes': self.classes,
216 + 'state_dict': self.state_dict(),
217 + 'ratios': self.ratios,
218 + 'scales': self.scales
219 + }
220 + if self.rotated_bbox and self.angles:
221 + checkpoint['angles'] = self.angles
222 +
223 + for key in ('iteration', 'optimizer', 'scheduler'):
224 + if key in state:
225 + checkpoint[key] = state[key]
226 +
227 + torch.save(checkpoint, state['path'])
228 +
229 + @classmethod
230 + def load(cls, filename, rotated_bbox=False):
231 + if not os.path.isfile(filename):
232 + raise ValueError('No checkpoint {}'.format(filename))
233 +
234 + checkpoint = torch.load(filename, map_location=lambda storage, loc: storage)
235 + kwargs = {}
236 + for i in ['ratios', 'scales', 'angles']:
237 + if i in checkpoint:
238 + kwargs[i] = checkpoint[i]
239 + if ('angles' in checkpoint) or rotated_bbox:
240 + kwargs['rotated_bbox'] = True
241 + # Recreate model from checkpoint instead of from individual backbones
242 + model = cls(backbones=checkpoint['backbone'], classes=checkpoint['classes'], **kwargs)
243 + model.load_state_dict(checkpoint['state_dict'])
244 +
245 + state = {}
246 + for key in ('iteration', 'optimizer', 'scheduler'):
247 + if key in checkpoint:
248 + state[key] = checkpoint[key]
249 +
250 + del checkpoint
251 + torch.cuda.empty_cache()
252 +
253 + return model, state
254 +
255 + def export(self, size, dynamic_batch_opts, precision, calibration_files, calibration_table, verbose, onnx_only=False):
256 +
257 + # import torch.onnx.symbolic_opset11 as onnx_symbolic
258 + # def upsample_nearest2d(g, input, output_size, *args):
259 + # # Currently, TRT 7.1 ONNX Parser does not support all ONNX ops
260 + # # needed to support dynamic upsampling ONNX forumlation
261 + # # Here we hardcode scale=2 as a temporary workaround
262 + # scales = g.op("Constant", value_t=torch.tensor([1., 1., 2., 2.]))
263 + # empty_tensor = g.op("Constant", value_t=torch.tensor([], dtype=torch.float32))
264 + # return g.op("Resize", input, empty_tensor, scales, mode_s="nearest", nearest_mode_s="floor")
265 +
266 + # onnx_symbolic.upsample_nearest2d = upsample_nearest2d
267 +
268 + # Export to ONNX
269 + print('Exporting to ONNX...')
270 + self.exporting = True
271 + onnx_bytes = io.BytesIO()
272 + zero_input = torch.zeros([1, 3, *size]).cuda()
273 + input_names = ['input_1']
274 + output_names = ['score_1', 'score_2', 'score_3', 'score_4', 'score_5',
275 + 'box_1', 'box_2', 'box_3', 'box_4', 'box_5']
276 + dynamic_axes = {input_names[0]: {0:'batch'}}
277 + for _, name in enumerate(output_names):
278 + dynamic_axes[name] = dynamic_axes[input_names[0]]
279 + extra_args = {'opset_version': 12, 'verbose': verbose,
280 + 'input_names': input_names, 'output_names': output_names,
281 + 'dynamic_axes': dynamic_axes}
282 + torch.onnx.export(self.cuda(), zero_input, onnx_bytes, **extra_args)
283 + self.exporting = False
284 +
285 + if onnx_only:
286 + return onnx_bytes.getvalue()
287 +
288 + # Build TensorRT engine
289 + model_name = '_'.join([k for k, _ in self.backbones.items()])
290 + anchors = []
291 + if not self.rotated_bbox:
292 + anchors = [generate_anchors(stride, self.ratios, self.scales,
293 + self.angles).view(-1).tolist() for stride in self.strides]
294 + else:
295 + anchors = [generate_anchors_rotated(stride, self.ratios, self.scales,
296 + self.angles)[0].view(-1).tolist() for stride in self.strides]
297 +
298 + return Engine(onnx_bytes.getvalue(), len(onnx_bytes.getvalue()), dynamic_batch_opts, precision,
299 + self.threshold, self.top_n, anchors, self.rotated_bbox, self.nms, self.detections,
300 + calibration_files, model_name, calibration_table, verbose)
1 +from statistics import mean
2 +from math import isfinite
3 +import torch
4 +from torch.optim import SGD, AdamW
5 +from torch.optim.lr_scheduler import LambdaLR, SAVE_STATE_WARNING
6 +from apex import amp, optimizers
7 +from apex.parallel import DistributedDataParallel as ADDP
8 +from torch.nn.parallel import DistributedDataParallel as DDP
9 +from torch.cuda.amp import GradScaler, autocast
10 +from .backbones.layers import convert_fixedbn_model
11 +
12 +from .data import DataIterator, RotatedDataIterator
13 +from .dali import DaliDataIterator
14 +from .utils import ignore_sigint, post_metrics, Profiler
15 +from .infer import infer
16 +
17 +import warnings
18 +warnings.filterwarnings('ignore', message=SAVE_STATE_WARNING, category=UserWarning)
19 +
20 +
21 +def train(model, state, path, annotations, val_path, val_annotations, resize, max_size, jitter, batch_size, iterations,
22 + val_iterations, mixed_precision, lr, warmup, milestones, gamma, rank=0, world=1, no_apex=False, use_dali=True,
23 + verbose=True, metrics_url=None, logdir=None, rotate_augment=False, augment_brightness=0.0,
24 + augment_contrast=0.0, augment_hue=0.0, augment_saturation=0.0, regularization_l2=0.0001, rotated_bbox=False,
25 + absolute_angle=False):
26 + 'Train the model on the given dataset'
27 +
28 + # Prepare model
29 + nn_model = model
30 + stride = model.stride
31 +
32 + model = convert_fixedbn_model(model)
33 + if torch.cuda.is_available():
34 + model = model.to(memory_format=torch.channels_last).cuda()
35 +
36 + # Setup optimizer and schedule
37 + optimizer = SGD(model.parameters(), lr=lr, weight_decay=regularization_l2, momentum=0.9)
38 +
39 + is_master = rank==0
40 + if not no_apex:
41 + loss_scale = "dynamic" if use_dali else "128.0"
42 + model, optimizer = amp.initialize(model, optimizer,
43 + opt_level='O2' if mixed_precision else 'O0',
44 + keep_batchnorm_fp32=True,
45 + loss_scale=loss_scale,
46 + verbosity=is_master)
47 +
48 + if world > 1:
49 + model = DDP(model, device_ids=[rank]) if no_apex else ADDP(model)
50 + model.train()
51 +
52 + if 'optimizer' in state:
53 + optimizer.load_state_dict(state['optimizer'])
54 +
55 + def schedule(train_iter):
56 + if warmup and train_iter <= warmup:
57 + return 0.9 * train_iter / warmup + 0.1  # linear warmup from 0.1x to 1.0x of the base lr
58 + return gamma ** len([m for m in milestones if m <= train_iter])
59 +
60 + scheduler = LambdaLR(optimizer, schedule)
61 + if 'scheduler' in state:
62 + scheduler.load_state_dict(state['scheduler'])
63 +
64 + # Prepare dataset
65 + if verbose: print('Preparing dataset...')
66 + if rotated_bbox:
67 + if use_dali: raise NotImplementedError("This repo does not currently support DALI for rotated bbox detections.")
68 + data_iterator = RotatedDataIterator(path, jitter, max_size, batch_size, stride,
69 + world, annotations, training=True, rotate_augment=rotate_augment,
70 + augment_brightness=augment_brightness,
71 + augment_contrast=augment_contrast, augment_hue=augment_hue,
72 + augment_saturation=augment_saturation, absolute_angle=absolute_angle)
73 + else:
74 + data_iterator = (DaliDataIterator if use_dali else DataIterator)(
75 + path, jitter, max_size, batch_size, stride,
76 + world, annotations, training=True, rotate_augment=rotate_augment, augment_brightness=augment_brightness,
77 + augment_contrast=augment_contrast, augment_hue=augment_hue, augment_saturation=augment_saturation)
78 + if verbose: print(data_iterator)
79 +
80 + if verbose:
81 + print(' device: {} {}'.format(
82 + world, 'cpu' if not torch.cuda.is_available() else 'GPU' if world == 1 else 'GPUs'))
83 + print(' batch: {}, precision: {}'.format(batch_size, 'mixed' if mixed_precision else 'full'))
84 + print(' BBOX type:', 'rotated' if rotated_bbox else 'axis aligned')
85 + print('Training model for {} iterations...'.format(iterations))
86 +
87 + # Create TensorBoard writer
88 + if is_master and logdir is not None:
89 + from torch.utils.tensorboard import SummaryWriter
90 + if verbose:
91 + print('Writing TensorBoard logs to: {}'.format(logdir))
92 + writer = SummaryWriter(log_dir=logdir)
93 +
94 + scaler = GradScaler()
95 + profiler = Profiler(['train', 'fw', 'bw'])
96 + iteration = state.get('iteration', 0)
97 + while iteration < iterations:
98 + cls_losses, box_losses = [], []
99 + for i, (data, target) in enumerate(data_iterator):
100 + if iteration>=iterations:
101 + break
102 +
103 + # Forward pass
104 + profiler.start('fw')
105 +
106 + optimizer.zero_grad()
107 + if not no_apex:
108 + cls_loss, box_loss = model([data.contiguous(memory_format=torch.channels_last), target])
109 + else:
110 + with autocast():
111 + cls_loss, box_loss = model([data.contiguous(memory_format=torch.channels_last), target])
112 + del data
113 + profiler.stop('fw')
114 +
115 + # Backward pass
116 + profiler.start('bw')
117 + if not no_apex:
118 + with amp.scale_loss(cls_loss + box_loss, optimizer) as scaled_loss:
119 + scaled_loss.backward()
120 + optimizer.step()
121 + else:
122 + scaler.scale(cls_loss + box_loss).backward()
123 + scaler.step(optimizer)
124 + scaler.update()
125 +
126 + scheduler.step()
127 +
128 + # Reduce all losses
129 + cls_loss, box_loss = cls_loss.mean().clone(), box_loss.mean().clone()
130 + if world > 1:
131 + torch.distributed.all_reduce(cls_loss)
132 + torch.distributed.all_reduce(box_loss)
133 + cls_loss /= world
134 + box_loss /= world
135 + if is_master:
136 + cls_losses.append(cls_loss)
137 + box_losses.append(box_loss)
138 +
139 + if is_master and not isfinite(cls_loss + box_loss):
140 + raise RuntimeError('Loss is diverging!\n{}'.format(
141 + 'Try lowering the learning rate.'))
142 +
143 + del cls_loss, box_loss
144 + profiler.stop('bw')
145 +
146 + iteration += 1
147 + profiler.bump('train')
148 + if is_master and (profiler.totals['train'] > 60 or iteration == iterations):
149 + focal_loss = torch.stack(list(cls_losses)).mean().item()
150 + box_loss = torch.stack(list(box_losses)).mean().item()
151 + learning_rate = optimizer.param_groups[0]['lr']
152 + if verbose:
153 + msg = '[{:{len}}/{}]'.format(iteration, iterations, len=len(str(iterations)))
154 + msg += ' focal loss: {:.3f}'.format(focal_loss)
155 + msg += ', box loss: {:.3f}'.format(box_loss)
156 + msg += ', {:.3f}s/{}-batch'.format(profiler.means['train'], batch_size)
157 + msg += ' (fw: {:.3f}s, bw: {:.3f}s)'.format(profiler.means['fw'], profiler.means['bw'])
158 + msg += ', {:.1f} im/s'.format(batch_size / profiler.means['train'])
159 + msg += ', lr: {:.2g}'.format(learning_rate)
160 + print(msg, flush=True)
161 +
162 + if is_master and logdir is not None:
163 + writer.add_scalar('focal_loss', focal_loss, iteration)
164 + writer.add_scalar('box_loss', box_loss, iteration)
165 + writer.add_scalar('learning_rate', learning_rate, iteration)
166 + del box_loss, focal_loss
167 +
168 + if metrics_url:
169 + post_metrics(metrics_url, {
170 + 'focal loss': mean(cls_losses),
171 + 'box loss': mean(box_losses),
172 + 'im_s': batch_size / profiler.means['train'],
173 + 'lr': learning_rate
174 + })
175 +
176 + # Save model weights
177 + state.update({
178 + 'iteration': iteration,
179 + 'optimizer': optimizer.state_dict(),
180 + 'scheduler': scheduler.state_dict(),
181 + })
182 + with ignore_sigint():
183 + nn_model.save(state)
184 +
185 + profiler.reset()
186 + del cls_losses[:], box_losses[:]
187 +
188 + if val_annotations and (iteration == iterations or iteration % val_iterations == 0):
189 + stats = infer(model, val_path, None, resize, max_size, batch_size, annotations=val_annotations,
190 + mixed_precision=mixed_precision, is_master=is_master, world=world, use_dali=use_dali,
191 + no_apex=no_apex, is_validation=True, verbose=False, rotated_bbox=rotated_bbox)
192 + model.train()
193 + if is_master and logdir is not None and stats is not None:
194 + writer.add_scalar(
195 + 'Validation_Precision/mAP', stats[0], iteration)
196 + writer.add_scalar(
197 + 'Validation_Precision/mAP@0.50IoU', stats[1], iteration)
198 + writer.add_scalar(
199 + 'Validation_Precision/mAP@0.75IoU', stats[2], iteration)
200 + writer.add_scalar(
201 + 'Validation_Precision/mAP (small)', stats[3], iteration)
202 + writer.add_scalar(
203 + 'Validation_Precision/mAP (medium)', stats[4], iteration)
204 + writer.add_scalar(
205 + 'Validation_Precision/mAP (large)', stats[5], iteration)
206 + writer.add_scalar(
207 + 'Validation_Recall/mAR (max 1 Dets)', stats[6], iteration)
208 + writer.add_scalar(
209 + 'Validation_Recall/mAR (max 10 Dets)', stats[7], iteration)
210 + writer.add_scalar(
211 + 'Validation_Recall/mAR (max 100 Dets)', stats[8], iteration)
212 + writer.add_scalar(
213 + 'Validation_Recall/mAR (small)', stats[9], iteration)
214 + writer.add_scalar(
215 + 'Validation_Recall/mAR (medium)', stats[10], iteration)
216 + writer.add_scalar(
217 + 'Validation_Recall/mAR (large)', stats[11], iteration)
218 +
219 + if (iteration==iterations and not rotated_bbox) or (iteration>iterations and rotated_bbox):
220 + break
221 +
222 + if is_master and logdir is not None:
223 + writer.close()
1 +import os.path
2 +import time
3 +import json
4 +import warnings
5 +import signal
6 +from datetime import datetime
7 +from contextlib import contextmanager
8 +from PIL import Image, ImageDraw
9 +import requests
10 +import numpy as np
11 +import math
12 +import torch
13 +
14 +
15 +def order_points(pts):
16 + pts_reorder = []
17 +
18 + for idx, pt in enumerate(pts):
19 + idx = torch.argsort(pt[:, 0])
20 + xSorted = pt[idx, :]
21 + leftMost = xSorted[:2, :]
22 + rightMost = xSorted[2:, :]
23 +
24 + leftMost = leftMost[torch.argsort(leftMost[:, 1]), :]
25 + (tl, bl) = leftMost
26 +
27 + D = torch.cdist(tl[np.newaxis], rightMost)[0]
28 + (br, tr) = rightMost[torch.argsort(D, descending=True), :]
29 + pts_reorder.append(torch.stack([tl, tr, br, bl]))
30 +
31 + return torch.stack([p for p in pts_reorder])
32 +
33 +def rotate_boxes(boxes, points=False):
34 + '''
35 + Rotate target bounding boxes
36 +
37 + Input:
38 + Target boxes (xmin_ymin, width_height, theta)
39 + Output:
40 + boxes_axis (xmin_ymin, xmax_ymax, theta)
41 + boxes_rotated (xy0, xy1, xy2, xy3)
42 + '''
43 +
44 + u = torch.stack([torch.cos(boxes[:,4]), torch.sin(boxes[:,4])], dim=1)
45 + l = torch.stack([-torch.sin(boxes[:,4]), torch.cos(boxes[:,4])], dim=1)
46 + R = torch.stack([u, l], dim=1)
47 +
48 + if points:
49 + cents = torch.stack([(boxes[:,0]+boxes[:,2])/2, (boxes[:,1]+boxes[:,3])/2],1).transpose(1,0)
50 + boxes_rotated = torch.stack([boxes[:,0],boxes[:,1],
51 + boxes[:,2], boxes[:,1],
52 + boxes[:,2], boxes[:,3],
53 + boxes[:,0], boxes[:,3],
54 + boxes[:,-2],
55 + boxes[:,-1]],1)
56 +
57 + else:
58 + cents = torch.stack([boxes[:,0]+(boxes[:,2])/2, boxes[:,1]+(boxes[:,3])/2],1).transpose(1,0)
59 + boxes_rotated = torch.stack([boxes[:,0],boxes[:,1],
60 + (boxes[:,0]+boxes[:,2]), boxes[:,1],
61 + (boxes[:,0]+boxes[:,2]), (boxes[:,1]+boxes[:,3]),
62 + boxes[:,0], (boxes[:,1]+boxes[:,3]),
63 + boxes[:,-2],
64 + boxes[:,-1]],1)
65 +
66 + xy0R = torch.matmul(R,boxes_rotated[:,:2].transpose(1,0) - cents) + cents
67 + xy1R = torch.matmul(R,boxes_rotated[:,2:4].transpose(1,0) - cents) + cents
68 + xy2R = torch.matmul(R,boxes_rotated[:,4:6].transpose(1,0) - cents) + cents
69 + xy3R = torch.matmul(R,boxes_rotated[:,6:8].transpose(1,0) - cents) + cents
70 +
71 + xy0R = torch.stack([xy0R[i,:,i] for i in range(xy0R.size(0))])
72 + xy1R = torch.stack([xy1R[i,:,i] for i in range(xy1R.size(0))])
73 + xy2R = torch.stack([xy2R[i,:,i] for i in range(xy2R.size(0))])
74 + xy3R = torch.stack([xy3R[i,:,i] for i in range(xy3R.size(0))])
75 +
76 + boxes_axis = torch.cat([boxes[:, :2], boxes[:, :2] + boxes[:, 2:4] - 1,
77 + torch.sin(boxes[:,-1, None]), torch.cos(boxes[:,-1, None])], 1)
78 + boxes_rotated = order_points(torch.stack([xy0R,xy1R,xy2R,xy3R],dim = 1)).view(-1,8)
79 +
80 + return boxes_axis, boxes_rotated
81 +
82 +
83 +def rotate_box(bbox):
84 + xmin, ymin, width, height, theta = bbox
85 +
86 + xy1 = xmin, ymin
87 + xy2 = xmin, ymin + height - 1
88 + xy3 = xmin + width - 1, ymin + height - 1
89 + xy4 = xmin + width - 1, ymin
90 +
91 + cents = np.array([xmin + (width - 1) / 2, ymin + (height - 1) / 2])
92 +
93 + corners = np.stack([xy1, xy2, xy3, xy4])
94 +
95 + u = np.stack([np.cos(theta), -np.sin(theta)])
96 + l = np.stack([np.sin(theta), np.cos(theta)])
97 + R = np.vstack([u, l])
98 +
99 + corners = np.matmul(R, (corners - cents).transpose(1, 0)).transpose(1, 0) + cents
100 +
101 + return corners.reshape(-1).tolist()
102 +
103 +
104 +def show_detections(detections):
105 + 'Show image with drawn detections'
106 +
107 + for image, detections in detections.items():
108 + im = Image.open(image).convert('RGBA')
109 + overlay = Image.new('RGBA', im.size, (255, 255, 255, 0))
110 + draw = ImageDraw.Draw(overlay)
111 + detections.sort(key=lambda d: d['score'])
112 + for detection in detections:
113 + box = detection['bbox']
114 + alpha = int(detection['score'] * 255)
115 + draw.rectangle(box, outline=(255, 255, 255, alpha))
116 + draw.text((box[0] + 2, box[1]), '[{}]'.format(detection['class']),
117 + fill=(255, 255, 255, alpha))
118 + draw.text((box[0] + 2, box[1] + 10), '{:.2}'.format(detection['score']),
119 + fill=(255, 255, 255, alpha))
120 + im = Image.alpha_composite(im, overlay)
121 + im.show()
122 +
123 +
124 +def save_detections(path, detections):
125 + print('Writing detections to {}...'.format(os.path.basename(path)))
126 + with open(path, 'w') as f:
127 + json.dump(detections, f)
128 +
129 +
130 +@contextmanager
131 +def ignore_sigint():
132 + handler = signal.getsignal(signal.SIGINT)
133 + signal.signal(signal.SIGINT, signal.SIG_IGN)
134 + try:
135 + yield
136 + finally:
137 + signal.signal(signal.SIGINT, handler)
138 +
139 +
140 +class Profiler(object):
141 + def __init__(self, names=['main']):
142 + self.names = names
143 + self.lasts = {k: 0 for k in names}
144 + self.totals = self.lasts.copy()
145 + self.counts = self.lasts.copy()
146 + self.means = self.lasts.copy()
147 + self.reset()
148 +
149 + def reset(self):
150 + last = time.time()
151 + for name in self.names:
152 + self.lasts[name] = last
153 + self.totals[name] = 0
154 + self.counts[name] = 0
155 + self.means[name] = 0
156 +
157 + def start(self, name='main'):
158 + self.lasts[name] = time.time()
159 +
160 + def stop(self, name='main'):
161 + self.totals[name] += time.time() - self.lasts[name]
162 + self.counts[name] += 1
163 + self.means[name] = self.totals[name] / self.counts[name]
164 +
165 + def bump(self, name='main'):
166 + self.stop(name)
167 + self.start(name)
168 +
169 +
170 +def post_metrics(url, metrics):
171 + try:
172 + for k, v in metrics.items():
173 + requests.post(url,
174 + data={'time': int(datetime.now().timestamp() * 1e9),
175 + 'metric': k, 'value': v})
176 + except Exception as e:
177 + warnings.warn('Warning: posting metrics failed: {}'.format(e))
1 +from setuptools import setup
2 +from torch.utils.cpp_extension import BuildExtension, CUDAExtension
3 +
4 +setup(
5 + name='odtk',
6 + version='0.2.6',
7 + description='Fast and accurate single shot object detector',
8 + author = 'NVIDIA Corporation',
9 + packages=['odtk', 'odtk.backbones'],
10 + ext_modules=[CUDAExtension('odtk._C',
11 + ['csrc/extensions.cpp', 'csrc/engine.cpp', 'csrc/cuda/decode.cu', 'csrc/cuda/decode_rotate.cu', 'csrc/cuda/nms.cu', 'csrc/cuda/nms_iou.cu'],
12 + extra_compile_args={
13 + 'cxx': ['-std=c++14', '-O2', '-Wall'],
14 + 'nvcc': [
15 + '-std=c++14', '--expt-extended-lambda', '--use_fast_math', '-Xcompiler', '-Wall,-fno-gnu-unique',
16 + '-gencode=arch=compute_60,code=sm_60', '-gencode=arch=compute_61,code=sm_61',
17 + '-gencode=arch=compute_70,code=sm_70', '-gencode=arch=compute_72,code=sm_72',
18 + '-gencode=arch=compute_75,code=sm_75', '-gencode=arch=compute_80,code=sm_80',
19 + '-gencode=arch=compute_86,code=sm_86', '-gencode=arch=compute_86,code=compute_86'
20 + ],
21 + },
22 + libraries=['nvinfer', 'nvinfer_plugin', 'nvonnxparser'])
23 + ],
24 + cmdclass={'build_ext': BuildExtension.with_options(no_python_abi_suffix=True)},
25 + install_requires=[
26 + 'torch>=1.0.0a0',
27 + 'torchvision',
28 + 'apex @ git+https://github.com/NVIDIA/apex',
29 + 'pycocotools @ git+https://github.com/nvidia/cocoapi.git#subdirectory=PythonAPI',
30 + 'pillow',
31 + 'requests',
32 + ],
33 + entry_points = {'console_scripts': ['odtk=odtk.main:main']}
34 +)
1 +# voc2coco
2 +
3 +This is a script for converting VOC format XML annotations to COCO format JSON (e.g. coco_eval.json).
4 +
5 +### Why do we need to convert VOC XMLs to COCO format JSON?
6 +
7 +With COCO format JSON we can use the COCO API, which is very useful (e.g. for calculating mAP).
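For instance, the converted file can be loaded with `pycocotools` and scored against a detections file in COCO result format. A minimal sketch (both file names below are placeholders):

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Ground truth produced by voc2coco.py, plus detections in COCO result format
# (both paths are hypothetical examples).
coco_gt = COCO('sample/bccd_test_cocoformat.json')
coco_dt = coco_gt.loadRes('detections.json')

coco_eval = COCOeval(coco_gt, coco_dt, 'bbox')
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints the COCO mAP / mAR table
```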
8 +
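For instance, once the annotations are converted, mAP can be computed with pycocotools. A minimal sketch, assuming the ground-truth JSON produced by this script and a hypothetical `detections.json` in COCO results format:

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO('output.json')                  # ground truth converted by voc2coco.py
coco_dt = coco_gt.loadRes('detections.json')   # hypothetical detection results file

coco_eval = COCOeval(coco_gt, coco_dt, iouType='bbox')
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints the standard AP/AR table
```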
9 +## How to use
10 +
11 +### 1. Make labels.txt
12 +
13 +labels.txt is needed to build the dictionary that maps each label to an id (see the sketch after the sample below).
14 +
15 +**Sample labels.txt**
16 +
17 +```txt
18 +Label1
19 +Label2
20 +...
21 +```
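
A minimal sketch of how labels.txt becomes the label-to-id dictionary (ids start at 1, matching get_label2id in voc2coco.py; note that the voc2coco.py in this change can additionally remap these ids through the `--labelt` table):

```python
with open('labels.txt') as f:
    labels = f.read().split()

# ids start at 1; category id 0 is often treated as background by detectors
label2id = {label: i + 1 for i, label in enumerate(labels)}
# e.g. {'Label1': 1, 'Label2': 2, ...}
```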
22 +
23 +### 2. Run script
24 +
25 +##### 2.1 Usage 1 (use an ids list)
26 +
27 +```bash
28 +$ python voc2coco.py \
29 + --ann_dir /path/to/annotation/dir \
30 + --ann_ids /path/to/annotations/ids/list.txt \
31 + --labels /path/to/labels.txt \
32 + --output /path/to/output.json \
33 +    --ext xml           # optional
34 +```
35 +
36 +##### 2.2 Usage 2 (use an annotation paths list)
37 +
38 +**Sample paths.txt**
39 +
40 +```txt
41 +/path/to/annotation/file.xml
42 +/path/to/annotation/file2.xml
43 +...
44 +```
45 +
46 +```bash
47 +$ python voc2coco.py \
48 + --ann_paths_list /path/to/annotation/paths.txt \
49 + --labels /path/to/labels.txt \
50 + --output /path/to/output.json \
51 +    --ext xml           # optional
52 +```
53 +
54 +### 3. Usage example
55 +
56 +As an example, you can convert [Shenggan/BCCD_Dataset: BCCD Dataset is a small-scale dataset for blood cells detection.](https://github.com/Shenggan/BCCD_Dataset) with this script.
57 +
58 +```bash
59 +$ python voc2coco.py \
60 + --ann_dir sample/Annotations \
61 + --ann_ids sample/dataset_ids/test.txt \
62 + --labels sample/labels.txt \
63 + --output sample/bccd_test_cocoformat.json \
64 + --ext xml
65 +
66 +# Check output
67 +$ ls sample/ | grep bccd_test_cocoformat.json
68 +bccd_test_cocoformat.json
69 +
70 +# Preview the first fields of the output
71 +$ cut -f -4 -d , sample/bccd_test_cocoformat.json
72 +{"images": [{"file_name": "BloodImage_00007.jpg", "height": 480, "width": 640, "id": "BloodImage_00007"}
73 +```
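
As an optional sanity check (a sketch, assuming pycocotools is installed), the converted file can be loaded back with the COCO API:

```python
from pycocotools.coco import COCO

coco = COCO('sample/bccd_test_cocoformat.json')
print('{} images, {} annotations, {} categories'.format(
    len(coco.getImgIds()), len(coco.getAnnIds()), len(coco.getCatIds())))
```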
1 +192.168.0.101_20190720-11362588_Motion
2 +192.168.0.101_20190720-11372337_Motion
3 +192.168.0.101_20190720-11374138_Motion
4 +192.168.0.101_20190720-11452888_Motion
5 +192.168.0.101_20190720-12105034_Motion
6 +192.168.0.101_20190720-12160838_Motion
7 +192.168.0.101_20190720-12170737_Motion
8 +192.168.0.101_20190720-12175937_Motion
9 +192.168.0.101_20190720-12214737_Motion
10 +192.168.0.101_20190720-12235083_Motion
11 +192.168.0.101_20190720-12244383_Motion
12 +192.168.0.101_20190720-12283436_Motion
13 +192.168.0.101_20190720-13080033_Motion
14 +192.168.0.101_20190720-13111980_Motion
15 +192.168.0.101_20190720-13182131_Motion
16 +192.168.0.101_20190720-15191163_Motion
17 +192.168.0.101_20190720-15413012_Motion
18 +192.168.0.101_20190720-16465005_Motion
19 +192.168.0.101_20190720-16583605_Motion
20 +192.168.0.101_20190720-17234402_Motion
21 +192.168.0.101_20190720-17443700_Motion
22 +192.168.0.101_20190720-17502000_Motion
23 +192.168.0.101_20190720-18152897_Motion
24 +192.168.0.101_20190720-19041042_Motion
25 +192.168.0.101_20190721-06271810_Motion
26 +192.168.0.101_20190721-08570042_Motion
27 +192.168.0.101_20190721-09163489_Motion
28 +192.168.0.101_20190721-09345536_Motion
29 +192.168.0.101_20190721-09581284_Motion
30 +192.168.0.101_20190721-10460577_Motion
31 +192.168.0.101_20190721-10492125_Motion
32 +192.168.0.101_20190721-13324903_Motion
33 +192.168.0.101_20190721-14053747_Motion
34 +192.168.0.101_20190721-15081190_Motion
35 +192.168.0.101_20190721-15240738_Motion
36 +192.168.0.101_20190721-15280487_Motion
37 +192.168.0.101_20190721-15335936_Motion
38 +192.168.0.101_20190721-16074833_Motion
39 +192.168.0.101_20190721-16284528_Motion
40 +192.168.0.101_20190721-17213572_Motion
41 +192.168.0.101_20190721-17371770_Motion
42 +192.168.0.101_20190721-17383420_Motion
43 +192.168.0.101_20190721-18123715_Motion
44 +192.168.0.101_20190721-19404804_Motion
45 +192.168.0.101_20190722-06242119_Motion
46 +192.168.0.101_20190722-06535264_Motion
47 +192.168.0.101_20190722-07063613_Motion
48 +192.168.0.101_20190722-07160110_Motion
49 +192.168.0.101_20190722-07223611_Motion
50 +192.168.0.101_20190722-07282159_Motion
51 +192.168.0.101_20190722-07285409_Motion
52 +192.168.0.101_20190722-07290708_Motion
53 +192.168.0.101_20190722-07302108_Motion
54 +192.168.0.101_20190722-07343258_Motion
55 +192.168.0.101_20190722-07364109_Motion
56 +192.168.0.101_20190722-07382807_Motion
57 +192.168.0.101_20190722-07395107_Motion
58 +192.168.0.101_20190722-07460456_Motion
59 +192.168.0.101_20190722-07463858_Motion
60 +192.168.0.101_20190722-07483156_Motion
61 +192.168.0.101_20190722-07522694_Motion
62 +192.168.0.101_20190722-07542938_Motion
63 +192.168.0.101_20190722-07580427_Motion
64 +192.168.0.101_20190722-08042508_Motion
65 +192.168.0.101_20190722-08095192_Motion
66 +192.168.0.101_20190722-08120435_Motion
67 +192.168.0.101_20190722-08123134_Motion
68 +192.168.0.101_20190722-08125783_Motion
69 +192.168.0.101_20190722-08191565_Motion
70 +192.168.0.101_20190722-08263256_Motion
71 +192.168.0.101_20190722-08274306_Motion
72 +192.168.0.101_20190722-08514048_Motion
73 +192.168.0.101_20190722-08532449_Motion
74 +192.168.0.101_20190722-09201242_Motion
75 +192.168.0.101_20190722-09230544_Motion
76 +192.168.0.101_20190722-09310393_Motion
77 +192.168.0.101_20190722-09505438_Motion
78 +192.168.0.101_20190722-10032136_Motion
79 +192.168.0.101_20190722-10071237_Motion
80 +192.168.0.101_20190722-10253882_Motion
81 +192.168.0.101_20190722-10471330_Motion
82 +192.168.0.101_20190722-11091289_Motion
83 +192.168.0.101_20190722-11092887_Motion
84 +192.168.0.101_20190722-11132426_Motion
85 +192.168.0.101_20190722-11300481_Motion
86 +192.168.0.101_20190722-11324728_Motion
87 +192.168.0.101_20190722-11381481_Motion
88 +192.168.0.101_20190722-11383430_Motion
89 +192.168.0.101_20190722-11460626_Motion
90 +192.168.0.101_20190722-11471926_Motion
91 +192.168.0.101_20190722-11570325_Motion
92 +192.168.0.101_20190722-12124523_Motion
93 +192.168.0.101_20190722-12190523_Motion
94 +192.168.0.101_20190722-13103041_Motion
95 +192.168.0.101_20190722-13190967_Motion
96 +192.168.0.101_20190722-13544612_Motion
97 +192.168.0.101_20190722-13582111_Motion
98 +192.168.0.101_20190722-14015211_Motion
99 +192.168.0.101_20190722-14030761_Motion
100 +192.168.0.101_20190722-14292457_Motion
101 +192.168.0.101_20190722-14295705_Motion
102 +192.168.0.101_20190722-15021452_Motion
103 +192.168.0.101_20190722-15425543_Motion
104 +192.168.0.101_20190722-15430593_Motion
105 +192.168.0.101_20190722-15464694_Motion
106 +192.168.0.101_20190722-15500793_Motion
107 +192.168.0.101_20190722-15540538_Motion
108 +192.168.0.101_20190722-15562932_Motion
109 +192.168.0.101_20190722-15595921_Motion
110 +192.168.0.101_20190722-16032910_Motion
111 +192.168.0.101_20190722-16034659_Motion
112 +192.168.0.101_20190722-16280535_Motion
113 +192.168.0.101_20190722-16350515_Motion
114 +192.168.0.101_20190722-16353862_Motion
115 +192.168.0.101_20190722-16400558_Motion
116 +192.168.0.101_20190722-16564458_Motion
117 +192.168.0.101_20190722-16590705_Motion
118 +192.168.0.101_20190722-17092304_Motion
119 +192.168.0.101_20190722-17340899_Motion
120 +192.168.0.101_20190722-17512647_Motion
121 +192.168.0.101_20190722-17540845_Motion
122 +192.168.0.101_20190722-17593146_Motion
123 +192.168.0.101_20190722-18131094_Motion
124 +192.168.0.101_20190722-18310790_Motion
125 +192.168.0.101_20190722-18333591_Motion
126 +192.168.0.101_20190722-19010888_Motion
127 +192.168.0.101_20190722-19083933_Motion
128 +192.168.0.101_20190722-19090232_Motion
129 +192.168.0.101_20190722-19123431_Motion
130 +192.168.0.101_20190722-19135831_Motion
131 +192.168.0.101_20190723-06162619_Motion
132 +192.168.0.101_20190723-06505211_Motion
133 +192.168.0.101_20190723-07045162_Motion
134 +192.168.0.101_20190723-07150360_Motion
135 +192.168.0.101_20190723-07163007_Motion
136 +192.168.0.101_20190723-07203855_Motion
137 +192.168.0.101_20190723-07230356_Motion
138 +192.168.0.101_20190723-07282654_Motion
139 +192.168.0.101_20190723-07284053_Motion
140 +192.168.0.101_20190723-07340155_Motion
141 +192.168.0.101_20190723-07345454_Motion
142 +192.168.0.101_20190723-07382153_Motion
143 +192.168.0.101_20190723-07391451_Motion
144 +192.168.0.101_20190723-07441501_Motion
145 +192.168.0.101_20190723-07462601_Motion
146 +192.168.0.101_20190723-07473003_Motion
147 +192.168.0.101_20190723-07482502_Motion
148 +192.168.0.101_20190723-07495650_Motion
149 +192.168.0.101_20190723-07505498_Motion
150 +192.168.0.101_20190723-07521200_Motion
151 +192.168.0.101_20190723-08022747_Motion
152 +192.168.0.101_20190723-08173296_Motion
153 +192.168.0.101_20190723-08212896_Motion
154 +192.168.0.101_20190723-08365541_Motion
155 +192.168.0.101_20190723-08432990_Motion
156 +192.168.0.101_20190723-08501190_Motion
157 +192.168.0.101_20190723-08594738_Motion
158 +192.168.0.101_20190723-09285531_Motion
159 +192.168.0.101_20190723-09420829_Motion
160 +192.168.0.101_20190723-10123474_Motion
161 +192.168.0.101_20190723-10183522_Motion
162 +192.168.0.101_20190723-10250419_Motion
163 +192.168.0.101_20190723-10433566_Motion
164 +192.168.0.101_20190723-11212308_Motion
165 +192.168.0.101_20190723-11224407_Motion
166 +192.168.0.101_20190723-11233257_Motion
167 +192.168.0.101_20190723-12082848_Motion
168 +192.168.0.101_20190723-12305993_Motion
169 +192.168.0.101_20190723-12310692_Motion
170 +192.168.0.101_20190723-12362990_Motion
171 +192.168.0.101_20190723-12402990_Motion
172 +192.168.0.101_20190723-12502827_Motion
173 +192.168.0.101_20190723-13110701_Motion
174 +192.168.0.101_20190723-13234549_Motion
175 +192.168.0.101_20190723-13241598_Motion
176 +192.168.0.101_20190723-14555527_Motion
177 +192.168.0.101_20190723-14594823_Motion
178 +192.168.0.101_20190723-15043174_Motion
179 +192.168.0.101_20190723-15124570_Motion
180 +192.168.0.101_20190723-15312018_Motion
181 +192.168.0.101_20190723-15425663_Motion
182 +192.168.0.101_20190723-15572809_Motion
183 +192.168.0.101_20190723-15574909_Motion
184 +192.168.0.101_20190723-16143255_Motion
185 +192.168.0.101_20190723-16151257_Motion
186 +192.168.0.101_20190723-16202053_Motion
187 +192.168.0.101_20190723-16202802_Motion
188 +192.168.0.101_20190723-16395644_Motion
189 +192.168.0.101_20190723-16465694_Motion
190 +192.168.0.101_20190723-16540292_Motion
191 +192.168.0.101_20190723-17025889_Motion
192 +192.168.0.101_20190723-17043538_Motion
193 +192.168.0.101_20190723-17123835_Motion
194 +192.168.0.101_20190723-17192435_Motion
195 +192.168.0.101_20190723-17383430_Motion
196 +192.168.0.101_20190723-18144521_Motion
197 +192.168.0.101_20190723-18194467_Motion
198 +192.168.0.101_20190723-18322865_Motion
199 +192.168.0.101_20190723-19085206_Motion
200 +192.168.0.101_20190723-19414050_Motion
1 +0
2 +1
3 +2
4 +3
5 +4
6 +5
7 +6
8 +7
9 +8
10 +9
11 +10
12 +11
13 +12
14 +13
15 +14
16 +15
17 +16
18 +17
19 +18
20 +19
21 +20
22 +21
23 +22
24 +23
25 +24
26 +25
27 +26
28 +27
29 +28
30 +29
31 +30
32 +31
33 +32
34 +33
35 +34
36 +35
37 +36
38 +37
39 +38
40 +39
41 +40
42 +41
43 +42
44 +43
45 +44
46 +45
47 +46
48 +47
49 +48
50 +49
51 +50
52 +51
53 +52
54 +53
55 +54
56 +55
57 +56
58 +57
59 +58
60 +59
61 +60
62 +61
63 +62
64 +63
65 +64
66 +65
67 +66
68 +67
69 +68
70 +69
71 +70
72 +71
73 +72
74 +73
75 +74
76 +75
77 +76
78 +77
79 +78
80 +79
81 +80
82 +81
83 +82
84 +83
85 +84
86 +85
87 +86
88 +87
89 +88
90 +89
91 +90
92 +91
93 +92
94 +93
95 +94
96 +95
97 +96
98 +97
99 +98
100 +99
101 +100
102 +101
103 +102
104 +103
105 +104
106 +105
107 +106
108 +107
109 +108
110 +109
111 +110
112 +111
113 +112
114 +113
115 +114
116 +115
117 +116
118 +117
119 +118
120 +119
121 +120
122 +121
123 +122
124 +123
125 +124
126 +125
127 +126
128 +127
129 +128
130 +129
131 +130
132 +131
133 +132
134 +133
135 +134
136 +135
137 +136
138 +137
139 +138
140 +139
141 +140
142 +141
143 +142
144 +143
145 +144
146 +145
147 +146
148 +147
149 +148
150 +149
151 +150
152 +151
153 +152
154 +153
155 +154
156 +155
157 +156
158 +157
159 +158
160 +159
161 +160
162 +161
163 +162
164 +163
165 +164
166 +165
167 +166
168 +167
169 +168
170 +169
171 +170
172 +171
173 +172
174 +173
175 +174
176 +175
177 +176
178 +177
179 +178
180 +179
181 +180
182 +181
183 +182
184 +183
185 +184
186 +185
187 +186
188 +187
189 +188
190 +189
191 +190
192 +191
193 +192
194 +193
195 +194
196 +195
197 +196
198 +197
199 +198
200 +199
1 +0
2 +1
3 +2
4 +3
5 +4
6 +5
7 +6
8 +7
9 +8
10 +9
11 +ba
12 +beo
13 +bo
14 +bu
15 +da
16 +deo
17 +do
18 +du
19 +eo
20 +ga
21 +geo
22 +go
23 +gu
24 +ha
25 +heo
26 +ho
27 +jeo
28 +jo
29 +ju
30 +ma
31 +meo
32 +mo
33 +mu
34 +na
35 +neo
36 +no
37 +nu
38 +o
39 +ra
40 +reo
41 +ro
42 +ru
43 +seo
44 +so
45 +su
46 +u
\ No newline at end of file
1 +0
2 +1
3 +2
4 +3
5 +4
6 +5
7 +6
8 +7
9 +8
10 +9
11 +51
12 +50
13 +49
14 +52
15 +39
16 +38
17 +37
18 +41
19 +33
20 +56
21 +55
22 +54
23 +57
24 +46
25 +46
26 +48
27 +71
28 +70
29 +74
30 +28
31 +27
32 +26
33 +30
34 +60
35 +59
36 +58
37 +63
38 +32
39 +44
40 +43
41 +42
42 +45
43 +65
44 +64
45 +68
46 +36
\ No newline at end of file
1 +import os
2 +import sys
3 +import json
4 +
5 +# path = sys.argv[1]
6 +
7 +# files = os.listdir(path)
8 +
9 +# f = open("ids.txt", 'w')
10 +# for filename in files:
11 +#     # note: rstrip('.xml') strips characters, not the suffix; use splitext instead
12 +#     filename = os.path.splitext(filename)[0]
13 +#     f.write(filename + "\n")
14 +# f.close()
15 +
16 +# Write sequential image ids 0..199, one per line, to ids2.txt
17 +f = open("ids2.txt", 'w')
18 +for i in range(200):
19 +    f.write(str(i) + '\n')
20 +f.close()
\ No newline at end of file
1 +import os
2 +import sys
3 +import json
4 +from xml.etree.ElementTree import parse
5 +
6 +def get_class(xml_path):
7 + tree = parse(xml_path)
8 + root = tree.getroot()
9 + classes = root.findall("object")
10 + names = [x.findtext("name") for x in classes]
11 + return names
12 +
13 +
14 +
15 +path = sys.argv[1]
16 +files = os.listdir(path)
17 +classlist = []
18 +
19 +for file in files:
20 +    classes = get_class(os.path.join(path, file))  # os.path.join keeps this portable outside Windows
21 + classlist = list(set(classlist) | set(classes))
22 +
23 +classlist.sort()
24 +f = open("label.txt", 'w')
25 +for ca in classlist:
26 + f.write(ca+'\n')
27 +f.close()
\ No newline at end of file
File mode changed
1 +import os
2 +import argparse
3 +import json
4 +import xml.etree.ElementTree as ET
5 +from typing import Dict, List
6 +from tqdm import tqdm
7 +import re
8 +
9 +
10 +def get_label2id(labels_path: str, labeltable_path: str) -> Dict[str, int]:
11 + """id is 1 start"""
12 + with open(labels_path, 'r') as f:
13 + labels_str = f.read().split()
14 + with open(labeltable_path, 'r') as f2:
15 + table_str = f2.read().split()
16 + table_int = list(map(int, table_str))
17 + labels_ids = list(range(1, len(labels_str)+1))
18 + for i in range(0, len(labels_str)):
19 + labels_ids[i] = table_int[i]+1
20 + return dict(zip(labels_str, labels_ids))
21 +
22 +
23 +def get_annpaths(ann_dir_path: str = None,
24 + ann_ids_path: str = None,
25 + ext: str = '',
26 + annpaths_list_path: str = None) -> List[str]:
27 +    # If using an annotation paths list
28 + if annpaths_list_path is not None:
29 + with open(annpaths_list_path, 'r') as f:
30 + ann_paths = f.read().split()
31 + return ann_paths
32 +
33 +    # If using an annotation ids list
34 + ext_with_dot = '.' + ext if ext != '' else ''
35 + with open(ann_ids_path, 'r') as f:
36 + ann_ids = f.read().split()
37 + ann_paths = [os.path.join(ann_dir_path, aid+ext_with_dot) for aid in ann_ids]
38 + return ann_paths
39 +
40 +
41 +def get_image_info(annotation_root, id, extract_num_from_imgid=True):
42 + path = annotation_root.findtext('path')
43 + if path is None:
44 + filename = annotation_root.findtext('filename')
45 + else:
46 + filename = os.path.basename(path)
47 + img_name = os.path.basename(filename)
48 + # img_id = os.path.splitext(img_name)[0]
49 + # if extract_num_from_imgid and isinstance(img_id, str):
50 + # img_id = int(re.findall(r'\d+', img_id)[0])
51 +
52 + img_id = id
53 + size = annotation_root.find('size')
54 + width = int(size.findtext('width'))
55 + height = int(size.findtext('height'))
56 +
57 + image_info = {
58 + 'file_name': filename,
59 + 'height': height,
60 + 'width': width,
61 + 'id': img_id
62 + }
63 + return image_info
64 +
65 +
66 +def get_coco_annotation_from_obj(obj, label2id):
67 + label = obj.findtext('name')
68 + assert label in label2id, f"Error: {label} is not in label2id !"
69 + category_id = label2id[label]
70 + bndbox = obj.find('bndbox')
71 +    xmin = int(float(bndbox.findtext('xmin'))) - 1  # VOC coords are 1-based; shift to 0-based for COCO
72 + ymin = int(float(bndbox.findtext('ymin'))) - 1
73 + xmax = int(float(bndbox.findtext('xmax')))
74 + ymax = int(float(bndbox.findtext('ymax')))
75 + assert xmax > xmin and ymax > ymin, f"Box size error !: (xmin, ymin, xmax, ymax): {xmin, ymin, xmax, ymax}"
76 + o_width = xmax - xmin
77 + o_height = ymax - ymin
78 + ann = {
79 + 'area': o_width * o_height,
80 + 'iscrowd': 0,
81 + 'bbox': [xmin, ymin, o_width, o_height],
82 + 'category_id': category_id,
83 + 'ignore': 0,
84 + 'segmentation': [] # This script is not for segmentation
85 + }
86 + return ann
87 +
88 +
89 +def convert_xmls_to_cocojson(annotation_paths: List[str],
90 + label2id: Dict[str, int],
91 + output_jsonpath: str,
92 + extract_num_from_imgid: bool = True):
93 + output_json_dict = {
94 + "images": [],
95 + "type": "instances",
96 + "annotations": [],
97 + "categories": []
98 + }
99 + bnd_id = 1 # START_BOUNDING_BOX_ID, TODO input as args ?
100 + print('Start converting !')
101 + i = 1
102 + for a_path in tqdm(annotation_paths):
103 + # Read annotation xml
104 + ann_tree = ET.parse(a_path)
105 + ann_root = ann_tree.getroot()
106 +
107 + img_info = get_image_info(annotation_root=ann_root, id=i,
108 + extract_num_from_imgid=extract_num_from_imgid)
109 + img_id = img_info['id']
110 + output_json_dict['images'].append(img_info)
111 + i+=1
112 + for obj in ann_root.findall('object'):
113 + ann = get_coco_annotation_from_obj(obj=obj, label2id=label2id)
114 + ann.update({'image_id': img_id, 'id': bnd_id})
115 + output_json_dict['annotations'].append(ann)
116 + bnd_id = bnd_id + 1
117 +
118 + for label, label_id in label2id.items():
119 + category_info = {'supercategory': 'none', 'id': label_id, 'name': label}
120 + output_json_dict['categories'].append(category_info)
121 +
122 + with open(output_jsonpath, 'w') as f:
123 + output_json = json.dumps(output_json_dict)
124 + f.write(output_json)
125 +
126 +
127 +def main():
128 + parser = argparse.ArgumentParser(
129 +        description='This script supports converting VOC format XMLs to COCO format JSON')
130 +    parser.add_argument('--ann_dir', type=str, default=None,
131 +                        help='path to annotation files directory. It is not needed when using --ann_paths_list')
132 +    parser.add_argument('--ann_ids', type=str, default=None,
133 +                        help='path to annotation files ids list. It is not needed when using --ann_paths_list')
134 +    parser.add_argument('--ann_paths_list', type=str, default=None,
135 +                        help='path to annotation paths list. It is not needed when using --ann_dir and --ann_ids')
136 + parser.add_argument('--labels', type=str, default=None,
137 + help='path to label list.')
138 + parser.add_argument('--labelt', type=str, default=None,
139 + help='path to label table.')
140 + parser.add_argument('--output', type=str, default='output.json', help='path to output json file')
141 + parser.add_argument('--ext', type=str, default='', help='additional extension of annotation file')
142 + parser.add_argument('--extract_num_from_imgid', action="store_true",
143 + help='Extract image number from the image filename')
144 + args = parser.parse_args()
145 + label2id = get_label2id(labels_path=args.labels, labeltable_path=args.labelt)
146 + ann_paths = get_annpaths(
147 + ann_dir_path=args.ann_dir,
148 + ann_ids_path=args.ann_ids,
149 + ext=args.ext,
150 + annpaths_list_path=args.ann_paths_list
151 + )
152 + convert_xmls_to_cocojson(
153 + annotation_paths=ann_paths,
154 + label2id=label2id,
155 + output_jsonpath=args.output,
156 + extract_num_from_imgid=args.extract_num_from_imgid
157 + )
158 +
159 +
160 +if __name__ == '__main__':
161 + main()