|
@@ -2,7 +2,7 @@
|
|
|
|
|
|
# PaddleRS Model Construction Parameters
|
|
|
|
|
|
-This document describes the construction parameters of each PaddleRS model trainer in detail, including their parameter names, parameter types, parameter descriptions and default values.
|
|
|
+This document describes the construction parameters of each PaddleRS model trainer, including their parameter names, parameter types, parameter descriptions, and default values.
|
|
|
|
|
|
## `BIT`
|
|
|
|
|
@@ -16,21 +16,22 @@ This implementation adopts pretrained encoders, as opposed to the original work
|
|
|
|-----------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------|-------------|
|
|
|
| `in_channels (int)` | Number of channels of the input image | `3` |
|
|
|
| `num_classes (int)` | Number of target classes | `2` |
|
|
|
-| `use_mixed_loss (bool, optional)` | Whether to use mixed loss function | `False` |
|
|
|
-| `losses (list, optional)` | List of loss functions | `None` |
|
|
|
-| `att_type (str, optional)` | Spatial attention type, optional values are `'CBAM'` and `'BAM'` | `'CBAM'` |
|
|
|
-| `ds_factor (int, optional)` | Downsampling factor | `1` |
|
|
|
-| `backbone (str, optional)` | ResNet architecture to use as backbone. Currently only `'resnet18'` and `'resnet34'` are supported | `'resnet18'` |
|
|
|
-| `n_stages (int, optional)` | Number of ResNet stages used in the backbone, should be a value in `{3, 4, 5}` | `4` |
|
|
|
-| `use_tokenizer (bool, optional)` | Whether to use tokenizer | `True` |
|
|
|
-| `token_len (int, optional)` | Length of input token | `4` |
|
|
|
-| `pool_mode (str, optional)` | Gets the pooling strategy for input tokens when `'use_tokenizer'` is set to False. `'max'` means global max pooling, `'avg'` means global average pooling | `'max'` |
|
|
|
-| `pool_size (int, optional)` | When `'use_tokenizer'` is set to False, the height and width of the pooled feature map | `2` |
|
|
|
-| `enc_with_pos (bool, optional)` | Whether to add learned positional embeddings to the encoder's input feature sequence | `True` |
|
|
|
-| `enc_depth (int, optional)` | Number of attention blocks used in encoder | `1` |
|
|
|
-| `enc_head_dim (int, optional)` | Embedding dimension of each encoder head | `64` |
|
|
|
-| `dec_depth (int, optional)` | Number of attention blocks used in decoder | `8` |
|
|
|
-| `dec_head_dim (int, optional)` | Embedding dimension for each decoder head | `8` |
|
|
|
+| `use_mixed_loss (bool)` | Whether to use mixed loss function | `False` |
|
|
|
+| `losses (list)` | List of loss functions | `None` |
|
|
|
+| `att_type (str)`       | Spatial attention type. Possible values are `'CBAM'` and `'BAM'`  | `'CBAM'`    |
|
|
|
+| `ds_factor (int)` | Downsampling factor | `1` |
|
|
|
+| `backbone (str)` | ResNet architecture to use as backbone. Currently only `'resnet18'` and `'resnet34'` are supported | `'resnet18'` |
|
|
|
+| `n_stages (int)` | Number of ResNet stages used in the backbone, should be a value in `{3, 4, 5}` | `4` |
|
|
|
+| `use_tokenizer (bool)` | Whether to use tokenizer | `True` |
|
|
|
+| `token_len (int)` | Length of input token | `4` |
|
|
|
+| `pool_mode (str)`      | Pooling strategy used to obtain input tokens when `use_tokenizer` is set to `False`. `'max'` means global max pooling, `'avg'` means global average pooling | `'max'` |
|
|
|
+| `pool_size (int)`      | Height and width of the pooled feature map when `use_tokenizer` is set to `False`                                                         | `2`         |
|
|
|
+| `enc_with_pos (bool)` | Whether to add learned positional embeddings to the encoder's input feature sequence | `True` |
|
|
|
+| `enc_depth (int)` | Number of attention blocks used in encoder | `1` |
|
|
|
+| `enc_head_dim (int)` | Embedding dimension of each encoder head | `64` |
|
|
|
+| `dec_depth (int)` | Number of attention blocks used in decoder | `8` |
|
|
|
+| `dec_head_dim (int)` | Embedding dimension for each decoder head | `8` |
|
|
|
+
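As a rough illustration of how the `BIT` defaults and constraints in the table fit together, the sketch below collects a subset of them in a plain dataclass and checks the documented value ranges. `BITParams` and `validate` are hypothetical names introduced here for illustration; they are not part of PaddleRS, and the real trainer may validate its arguments differently.

```python
from dataclasses import dataclass, asdict


@dataclass
class BITParams:
    """Hypothetical container mirroring defaults listed in the BIT table."""
    in_channels: int = 3
    num_classes: int = 2
    backbone: str = "resnet18"
    n_stages: int = 4
    use_tokenizer: bool = True
    token_len: int = 4
    pool_mode: str = "max"
    pool_size: int = 2
    enc_depth: int = 1
    enc_head_dim: int = 64
    dec_depth: int = 8
    dec_head_dim: int = 8

    def validate(self) -> None:
        # Constraints stated in the parameter descriptions.
        if self.backbone not in ("resnet18", "resnet34"):
            raise ValueError(f"unsupported backbone: {self.backbone!r}")
        if self.n_stages not in (3, 4, 5):
            raise ValueError("n_stages should be 3, 4, or 5")
        # pool_mode only matters when the tokenizer is disabled.
        if not self.use_tokenizer and self.pool_mode not in ("max", "avg"):
            raise ValueError("pool_mode should be 'max' or 'avg'")


params = BITParams(backbone="resnet34", n_stages=5)
params.validate()
kwargs = asdict(params)  # keyword arguments one could forward to a trainer
```

Keeping the values in one object makes it easy to reject an unsupported combination before any model is built.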
|
|
|
|
|
|
## `CDNet`
|
|
|
|
|
@@ -45,6 +46,7 @@ The original article refers to Pablo F. Alcantarilla, et al., "Street-View Chang
|
|
|
| `losses (list)` | List of loss functions | `None` |
|
|
|
| `in_channels (int)` | Number of channels of the input image | `6` |
|
|
|
|
|
|
+
|
|
|
## `ChangeFormer`
|
|
|
|
|
|
The ChangeFormer implementation based on PaddlePaddle.
|
|
@@ -60,7 +62,8 @@ The original article refers to Wele Gedara Chaminda Bandara, Vishal M. Patel,
|
|
|
| `decoder_softmax (bool)` | Whether to use softmax as the last layer activation function of the decoder | `False` |
|
|
|
| `embed_dim (int)` | Hidden layer dimension of the Transformer encoder | `256` |
|
|
|
|
|
|
-## `ChangeStar_FarSeg`
|
|
|
+
|
|
|
+## `ChangeStar`
|
|
|
|
|
|
The ChangeStar implementation with a FarSeg encoder based on PaddlePaddle.
|
|
|
|
|
@@ -76,6 +79,7 @@ The original article refers to Z. Zheng, et al., "Change is Everywhere: Single-T
|
|
|
| `num_convs (int)` | Number of convolutional layers in UNet encoder and decoder | `4` |
|
|
|
| `scale_factor (float)` | Upsampling factor to scale the size of the output segmentation mask | `4.0` |
|
|
|
|
|
|
+
|
|
|
## `DSAMNet`
|
|
|
|
|
|
The DSAMNet implementation based on PaddlePaddle.
|
|
@@ -91,6 +95,7 @@ The original article refers to Q. Shi, et al., "A Deeply Supervised Attention Me
|
|
|
| `ca_ratio (int)` | Channel compression ratio in channel attention module | `8` |
|
|
|
| `sa_kernel (int)` | Kernel size in the spatial attention module | `7` |
|
|
|
|
|
|
+
|
|
|
## `DSIFN`
|
|
|
|
|
|
The DSIFN implementation based on PaddlePaddle.
|
|
@@ -104,7 +109,8 @@ The original article refers to C. Zhang, et al., "A deeply supervised image fusi
|
|
|
| `losses (list)` | List of loss functions | `None` |
|
|
|
| `use_dropout (bool)` | Whether to use dropout | `False`|
|
|
|
|
|
|
-## `FC-EF`
|
|
|
+
|
|
|
+## `FCEarlyFusion`
|
|
|
|
|
|
The FC-EF implementation based on PaddlePaddle.
|
|
|
|
|
@@ -118,7 +124,8 @@ The original article refers to Rodrigo Caye Daudt, et al. "Fully convolutional s
|
|
|
| `in_channels (int)` | Number of channels of the input image | `6` |
|
|
|
| `use_dropout (bool)` | Whether to use dropout | `False`|
|
|
|
|
|
|
-## `FC-Siam-conc`
|
|
|
+
|
|
|
+## `FCSiamConc`
|
|
|
|
|
|
The FC-Siam-conc implementation based on PaddlePaddle.
|
|
|
|
|
@@ -132,7 +139,8 @@ The original article refers to Rodrigo Caye Daudt, et al. "Fully convolutional s
|
|
|
| `in_channels (int)` | Number of channels of the input image | `3` |
|
|
|
| `use_dropout (bool)` | Whether to use dropout | `False`|
|
|
|
|
|
|
-## `FC-Siam-diff`
|
|
|
+
|
|
|
+## `FCSiamDiff`
|
|
|
|
|
|
The FC-Siam-diff implementation based on PaddlePaddle.
|
|
|
|
|
@@ -146,6 +154,7 @@ The original article refers to Rodrigo Caye Daudt, et al. "Fully convolutional s
|
|
|
| `in_channels (int)` | Number of channels of the input image | `3` |
|
|
|
| `use_dropout (bool)` | Whether to use dropout | `False` |
|
|
|
|
|
|
+
|
|
|
## `FCCDN`
|
|
|
|
|
|
The FCCDN implementation based on PaddlePaddle.
|
|
@@ -159,7 +168,8 @@ The original article refers to Pan Chen, et al., "FCCDN: Feature Constraint Netw
|
|
|
| `use_mixed_loss (bool)` | Whether to use mixed loss | `False`|
|
|
|
| `losses (list)` | List of loss functions | `None` |
|
|
|
|
|
|
-## `P2V-CD`
|
|
|
+
|
|
|
+## `P2V`
|
|
|
|
|
|
The P2V-CD implementation based on PaddlePaddle.
|
|
|
|
|
@@ -173,6 +183,7 @@ The original article refers to M. Lin, et al. "Transition Is a Process: Pair-to-
|
|
|
| `in_channels (int)` | Number of channels of the input image | `3` |
|
|
|
| `video_len (int)` | Number of input video frames | `8` |
|
|
|
|
|
|
+
|
|
|
## `SNUNet`
|
|
|
|
|
|
The SNUNet implementation based on PaddlePaddle.
|
|
@@ -183,7 +194,8 @@ The original article refers to S. Fang, et al., "SNUNet-CD: A Densely Connected
|
|
|
|------------------------|-------------------------------------------------|------|
|
|
|
| `in_channels (int)` | Number of channels of the input image | |
|
|
|
| `num_classes (int)` | Number of target classes | |
|
|
|
-| `width (int, optional)` | Output channels of the first convolutional layer | 32 |
|
|
|
+| `width (int)`         | Output channels of the first convolutional layer | `32` |
|
|
|
+
|
|
|
|
|
|
## `STANet`
|
|
|
|
|
@@ -199,6 +211,7 @@ The original article refers to H. Chen and Z. Shi, "A Spatial-Temporal Attentio
|
|
|
| `in_channels (int)` | Number of channels of the input image | `3` |
|
|
|
| `width (int)` | Number of channels in the neural network | `32` |
|
|
|
|
|
|
+
|
|
|
## `CondenseNetV2`
|
|
|
|
|
|
The CondenseNetV2 implementation based on PaddlePaddle.
|
|
@@ -209,9 +222,10 @@ The CondenseNetV2 implementation based on PaddlePaddle.
|
|
|
| `use_mixed_loss (bool)` | Whether to use mixed loss function | `False` |
|
|
|
| `losses (list)` | List of loss functions | `None` |
|
|
|
| `in_channels (int)` | Number of channels of the input image | `3` |
|
|
|
-| `arch (str)` | Architecture of the model, can be `'A'`, `'B'` or `'C'` | `'A'` |
|
|
|
+| `arch (str)` | Architecture of the model, which can be `'A'`, `'B'`, or `'C'` | `'A'` |
|
|
|
|
|
|
-## `HRNet`
|
|
|
+
|
|
|
+## `HRNet`
|
|
|
|
|
|
The HRNet implementation based on PaddlePaddle.
|
|
|
|
|
@@ -222,7 +236,7 @@ The HRNet implementation based on PaddlePaddle.
|
|
|
| `losses (list)` | List of loss functions | `None` |
|
|
|
|
|
|
|
|
|
-## `MobileNetV3`
|
|
|
+## `MobileNetV3`
|
|
|
|
|
|
The MobileNetV3 implementation based on PaddlePaddle.
|
|
|
|
|
@@ -233,7 +247,7 @@ The MobileNetV3 implementation based on PaddlePaddle.
|
|
|
| `losses (list)` | List of loss functions | `None` |
|
|
|
|
|
|
|
|
|
-## `ResNet50-vd`
|
|
|
+## `ResNet50_vd`
|
|
|
|
|
|
The ResNet50-vd implementation based on PaddlePaddle.
|
|
|
|
|
@@ -243,6 +257,7 @@ The ResNet50-vd implementation based on PaddlePaddle.
|
|
|
| `use_mixed_loss (bool)` | Whether to use mixed loss function | `False` |
|
|
|
| `losses (list)` | List of loss functions | `None` |
|
|
|
|
|
|
+
|
|
|
## `DRN`
|
|
|
|
|
|
The DRN implementation based on PaddlePaddle.
|
|
@@ -250,16 +265,16 @@ The DRN implementation based on PaddlePaddle.
|
|
|
| Parameter Name | Description | Default Value |
|
|
|
|-------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------|
|
|
|
| `losses (list)` | List of loss functions | `None` |
|
|
|
-| `sr_factor (int)` | Scaling factor for super-resolution, the size of the original image will be multiplied by this factor. For example, if the original image is `H` x `W`, the output image will be `sr_factor * H` x `sr_factor * W`. | `4` |
|
|
|
-| `min_max (None \| tuple[float, float])` | Minimum and maximum image pixel values | `None` |
|
|
|
+| `sr_factor (int)` | Scaling factor for super-resolution. The output image size will be the original image size multiplied by this factor. For example, if the original image is `H` x `W`, the output image will be `sr_factor * H` x `sr_factor * W` | `4` |
|
|
|
+| `min_max (None \| tuple[float, float])` | Minimum and maximum pixel values of the input image. If not specified, the data type's default minimum and maximum values are used | `None` |
|
|
|
| `scales (tuple[int])` | Scaling factor | `(2, 4)` |
|
|
|
| `n_blocks (int)` | Number of residual blocks | `30` |
|
|
|
| `n_feats (int)` | Number of features in the residual block | `16` |
|
|
|
| `n_colors (int)` | Number of image channels | `3` |
|
|
|
| `rgb_range (float)` | Range of image pixel values | `1.0` |
|
|
|
| `negval (float)` | Negative value in nonlinear mapping | `0.2` |
|
|
|
-| `Supplementary Description of `lq_loss_weight` parameter (float)` | Weight of the low-quality image loss, which is used to control the impact of the reconstruction loss on the overall loss of restoring the low-resolution input image into a high-resolution output image. | `0.1` |
|
|
|
-| `dual_loss_weight (float)` | Weight of the bilateral loss | `0.1` |
|
|
|
+| `lq_loss_weight (float)`                                          | Weight of the primal regression loss                                                                                                                                                                                | `0.1` |
|
|
|
+| `dual_loss_weight (float)` | Weight of the dual regression loss | `0.1` |
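The effect of `sr_factor` on the output size can be sketched in a few lines of plain Python. `sr_output_size` is a hypothetical helper written for this example, not a PaddleRS function; it only restates the `sr_factor * H` x `sr_factor * W` rule from the table.

```python
from typing import Tuple


def sr_output_size(height: int, width: int, sr_factor: int = 4) -> Tuple[int, int]:
    """Return the (height, width) of the super-resolved image."""
    if sr_factor < 1:
        raise ValueError("sr_factor should be a positive integer")
    # Both spatial dimensions are multiplied by the same factor.
    return sr_factor * height, sr_factor * width


print(sr_output_size(128, 96))     # (512, 384)
print(sr_output_size(128, 96, 2))  # (256, 192)
```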
|
|
|
|
|
|
|
|
|
## `ESRGAN`
|
|
@@ -269,14 +284,15 @@ The ESRGAN implementation based on PaddlePaddle.
|
|
|
| Parameter Name | Description | Default Value |
|
|
|
|----------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| --- |
|
|
|
| `losses (list)` | List of loss functions | `None` |
|
|
|
-| `sr_factor (int)` | Scaling factor for super-resolution, the size of the original image will be multiplied by this factor. For example, if the original image is `H` x `W`, the output image will be `sr_factor * H` x `sr_factor * W` | `4` |
|
|
|
+| `sr_factor (int)` | Scaling factor for super-resolution. The output image size will be the original image size multiplied by this factor. For example, if the original image is `H` x `W`, the output image will be `sr_factor * H` x `sr_factor * W` | `4` |
|
|
|
| `min_max (tuple)` | Minimum and maximum pixel values of the input image. If not specified, the data type's default minimum and maximum values are used | `None` |
|
|
|
-| `use_gan (bool)` | Boolean indicating whether to use GAN (Generative Adversarial Network) during training. If yes, GAN will be used | `True` |
|
|
|
+| `use_gan (bool)`     | Whether to use a generative adversarial network (GAN) during training                                                                                                                                               | `True` |
|
|
|
| `in_channels (int)` | Number of channels of the input image | `3` |
|
|
|
| `out_channels (int)` | Number of channels of the output image | `3` |
|
|
|
| `nf (int)` | Number of filters in the first convolutional layer of the model | `64` |
|
|
|
| `nb (int)` | Number of residual blocks in the model | `23` |
|
|
|
|
|
|
+
|
|
|
## `LESRCNN`
|
|
|
|
|
|
The LESRCNN implementation based on PaddlePaddle.
|
|
@@ -284,25 +300,26 @@ The LESRCNN implementation based on PaddlePaddle.
|
|
|
| Parameter Name | Description | Default Value |
|
|
|
|----------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| --- |
|
|
|
| `losses (list)` | List of loss functions | `None` |
|
|
|
-| `sr_factor (int)` | Scaling factor for super-resolution, the size of the original image will be multiplied by this factor. For example, if the original image is `H` x `W`, the output image will be `sr_factor * H` x `sr_factor * W`. | `4` |
|
|
|
-| `min_max (tuple)` | Minimum and maximum pixel values of the input image. If not specified, the data type's default minimum and maximum values are used. | `None` |
|
|
|
-| `multi_scale (bool)` | Boolean indicating whether to train on multiple scales. If yes, multiple scales are used during training. | `False` |
|
|
|
-| `group (int)` | Controls the number of groups for convolution operations. Standard convolution if set to `1`, DWConv if set to the number of input channels. | `1` |
|
|
|
+| `sr_factor (int)` | Scaling factor for super-resolution. The output image size will be the original image size multiplied by this factor. For example, if the original image is `H` x `W`, the output image will be `sr_factor * H` x `sr_factor * W` | `4` |
|
|
|
+| `min_max (tuple)` | Minimum and maximum pixel values of the input image. If not specified, the data type's default minimum and maximum values are used | `None` |
|
|
|
+| `multi_scale (bool)` | Whether to train on multiple scales                                                                                                                                                                             | `False` |
|
|
|
+| `group (int)`        | Number of groups used in convolution operations                                                                                                                                                                | `1` |
|
|
|
+
|
|
|
|
|
|
-## `Faster R-CNN`
|
|
|
+## `FasterRCNN`
|
|
|
|
|
|
The Faster R-CNN implementation based on PaddlePaddle.
|
|
|
|
|
|
| Parameter Name | Description | Default Value |
|
|
|
|-------------------------------|------------------------------------------------------------------------------------------------------------| --- |
|
|
|
| `num_classes (int)` | Number of target classes | `80` |
|
|
|
-| `backbone (str)` | Backbone network model to use | `'ResNet50'` |
|
|
|
-| `with_fpn (bool)` | Boolean indicating whether to use Feature Pyramid Network (FPN) | `True` |
|
|
|
-| `with_dcn (bool)` | Boolean indicating whether to use Deformable Convolutional Networks (DCN) | `False` |
|
|
|
+| `backbone (str)` | Backbone network to use | `'ResNet50'` |
|
|
|
+| `with_fpn (bool)` | Whether to use Feature Pyramid Network (FPN) | `True` |
|
|
|
+| `with_dcn (bool)` | Whether to use Deformable Convolutional Networks (DCN) | `False` |
|
|
|
| `aspect_ratios (list)` | List of aspect ratios of candidate boxes | `[0.5, 1.0, 2.0]` |
|
|
|
| `anchor_sizes (list)`         | List of sizes of candidate boxes, expressed as base sizes on each feature map                              | `[[32], [64], [128], [256], [512]]` |
|
|
|
-| `keep_top_k (int)` | Number of predicted boxes to keep before NMS operation | `100` |
|
|
|
-| `nms_threshold (float)` | Non-maximum suppression (NMS) threshold to use | `0.5` |
|
|
|
+| `keep_top_k (int)` | Number of predicted boxes to keep before the non-maximum suppression (NMS) operation | `100` |
|
|
|
+| `nms_threshold (float)` | NMS threshold to use | `0.5` |
|
|
|
| `score_threshold (float)` | Score threshold for filtering predicted boxes | `0.05` |
|
|
|
| `fpn_num_channels (int)` | Number of channels for each pyramid layer in the FPN network | `256` |
|
|
|
| `rpn_batch_size_per_im (int)` | Total number of positive and negative samples per image in the RPN network                                 | `256` |
|
|
@@ -310,76 +327,80 @@ The Faster R-CNN implementation based on PaddlePaddle.
|
|
|
| `test_pre_nms_top_n (int)` | Number of predicted boxes to keep before NMS operation when testing. If not specified, `keep_top_k` is used. | `None` |
|
|
|
| `test_post_nms_top_n (int)` | Number of predicted boxes to keep after NMS operation at test time | `1000` |
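To make the relationship between `anchor_sizes` and `aspect_ratios` more concrete, the sketch below enumerates the anchor shapes per feature level for the default values. It assumes a common convention (each ratio is height divided by width, and the anchor area stays equal to the squared base size); PaddleRS may compute anchors differently, and `anchor_shapes` is a hypothetical helper, not a library function.

```python
import math


def anchor_shapes(anchor_sizes, aspect_ratios):
    """Return, per feature level, a list of (width, height) anchor shapes."""
    levels = []
    for sizes in anchor_sizes:
        shapes = []
        for size in sizes:
            area = float(size) ** 2
            for ratio in aspect_ratios:
                # Assumed convention: ratio = height / width, area preserved.
                width = math.sqrt(area / ratio)
                height = width * ratio
                shapes.append((round(width, 2), round(height, 2)))
        levels.append(shapes)
    return levels


shapes = anchor_shapes([[32], [64], [128], [256], [512]], [0.5, 1.0, 2.0])
for level, level_shapes in enumerate(shapes):
    print(level, level_shapes)
```

With the defaults, each of the five feature levels gets three anchor shapes, one per aspect ratio.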
|
|
|
|
|
|
-## `PP-YOLO`
|
|
|
+
|
|
|
+## `PPYOLO`
|
|
|
|
|
|
The PP-YOLO implementation based on PaddlePaddle.
|
|
|
|
|
|
| Parameter Name | Description | Default Value |
|
|
|
|----------------------------------|--------------------------------------------------------------------| --- |
|
|
|
| `num_classes (int)` | Number of target classes | `80` |
|
|
|
-| `backbone (str)` | PPYOLO's backbone network | `'ResNet50_vd_dcn'` |
|
|
|
-| `anchors (list[list[float]])` | Size of predefined anchor boxes | `None` |
|
|
|
+| `backbone (str)` | Backbone network to use | `'ResNet50_vd_dcn'` |
|
|
|
+| `anchors (list[list[float]])` | Sizes of predefined anchor boxes | `None` |
|
|
|
| `anchor_masks (list[list[int]])` | Masks for predefined anchor boxes | `None` |
|
|
|
| `use_coord_conv (bool)` | Whether to use coordinate convolution | `True` |
|
|
|
| `use_iou_aware (bool)` | Whether to use IoU awareness | `True` |
|
|
|
| `use_spp (bool)` | Whether to use spatial pyramid pooling (SPP) | `True` |
|
|
|
-| `use_drop_block (bool)` | Whether to use DropBlock regularization | `True` |
|
|
|
+| `use_drop_block (bool)` | Whether to use DropBlock | `True` |
|
|
|
| `scale_x_y (float)` | Parameter to scale each predicted box | `1.05` |
|
|
|
| `ignore_threshold (float)` | IoU threshold used to assign predicted boxes to ground truth boxes | `0.7` |
|
|
|
| `label_smooth (bool)` | Whether to use label smoothing | `False` |
|
|
|
-| `use_iou_loss (bool)` | Whether to use IoU Loss | `True` |
|
|
|
+| `use_iou_loss (bool)` | Whether to use IoU loss | `True` |
|
|
|
| `use_matrix_nms (bool)` | Whether to use Matrix NMS | `True` |
|
|
|
| `nms_score_threshold (float)` | NMS score threshold | `0.01` |
|
|
|
| `nms_topk (int)` | Maximum number of detections to keep before performing NMS | `-1` |
|
|
|
| `nms_keep_topk (int)` | Maximum number of prediction boxes to keep after NMS | `100`|
|
|
|
| `nms_iou_threshold (float)` | NMS IoU threshold | `0.45` |
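The four `nms_*` parameters can be read as successive filtering steps. The sketch below shows them in a classic greedy NMS written in plain Python. Note that this is not Matrix NMS, which PP-YOLO uses by default (`use_matrix_nms=True`), and the mapping of each parameter to a step is an assumption based on the descriptions in the table; `iou` and `greedy_nms` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float],
               nms_score_threshold: float = 0.01, nms_topk: int = -1,
               nms_keep_topk: int = 100,
               nms_iou_threshold: float = 0.45) -> List[int]:
    """Return indices of the boxes that survive greedy NMS."""
    # 1. Drop low-scoring candidates.
    order = [i for i, s in enumerate(scores) if s > nms_score_threshold]
    # 2. Sort by score and keep at most nms_topk candidates (-1 keeps all).
    order.sort(key=lambda i: scores[i], reverse=True)
    if nms_topk > -1:
        order = order[:nms_topk]
    # 3. Keep a box only if it does not overlap an already kept box too much.
    kept: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= nms_iou_threshold for j in kept):
            kept.append(i)
    # 4. Cap the number of final detections.
    return kept[:nms_keep_topk]


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.005]
print(greedy_nms(boxes, scores))  # [0, 2]
```

In this example the second box is suppressed because its IoU with the first (about 0.68) exceeds `nms_iou_threshold`, and the fourth box is dropped by `nms_score_threshold`.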
|
|
|
|
|
|
-## `PP-YOLO Tiny`
|
|
|
+
|
|
|
+## `PPYOLOTiny`
|
|
|
|
|
|
The PP-YOLO Tiny implementation based on PaddlePaddle.
|
|
|
|
|
|
| Parameter Name | Description | Default Value |
|
|
|
|----------------------------------|-------------------------------------------------------------| --- |
|
|
|
| `num_classes (int)` | Number of target classes | `80` |
|
|
|
-| `backbone (str)` | Backbone network model name to use | `'MobileNetV3'` |
|
|
|
-| `anchors (list[list[float]])` | List of anchor box sizes | `[[10, 15], [24, 36], [72, 42], [35, 87], [102, 96] , [60, 170], [220, 125], [128, 222], [264, 266]]` |
|
|
|
-| `anchor_masks (list[list[int]])` | Anchor box mask | `[[6, 7, 8], [3, 4, 5], [0, 1, 2]]` |
|
|
|
-| `use_iou_aware (bool)` | Boolean value indicating whether to use IoU-aware loss | `False` |
|
|
|
-| `use_spp (bool)` | Boolean indicating whether to use the SPP module | `True` |
|
|
|
-| `use_drop_block (bool)` | Boolean value indicating whether to use the DropBlock block | `True` |
|
|
|
-| `scale_x_y (float)` | Scaling parameter | `1.05` |
|
|
|
-| `ignore_threshold (float)` | Ignore threshold | `0.5` |
|
|
|
-| `label_smooth (bool)` | Boolean indicating whether to use label smoothing | `False` |
|
|
|
-| `use_iou_loss (bool)` | Boolean value indicating whether to use IoU Loss | `True` |
|
|
|
-| `use_matrix_nms (bool)` | Boolean indicating whether to use Matrix NMS | `False` |
|
|
|
+| `backbone (str)` | Backbone network to use | `'MobileNetV3'` |
|
|
|
+| `anchors (list[list[float]])`    | Sizes of predefined anchor boxes                                   | `[[10, 15], [24, 36], [72, 42], [35, 87], [102, 96], [60, 170], [220, 125], [128, 222], [264, 266]]` |
|
|
|
+| `anchor_masks (list[list[int]])` | Masks for predefined anchor boxes | `[[6, 7, 8], [3, 4, 5], [0, 1, 2]]` |
|
|
|
+| `use_iou_aware (bool)` | Whether to use IoU awareness | `False` |
|
|
|
+| `use_spp (bool)` | Whether to use spatial pyramid pooling (SPP) | `True` |
|
|
|
+| `use_drop_block (bool)`          | Whether to use DropBlock                                           | `True` |
|
|
|
+| `scale_x_y (float)` | Parameter to scale each predicted box | `1.05` |
|
|
|
+| `ignore_threshold (float)` | IoU threshold used to assign predicted boxes to ground truth boxes | `0.5` |
|
|
|
+| `label_smooth (bool)` | Whether to use label smoothing | `False` |
|
|
|
+| `use_iou_loss (bool)` | Whether to use IoU loss | `True` |
|
|
|
+| `use_matrix_nms (bool)` | Whether to use Matrix NMS | `False` |
|
|
|
| `nms_score_threshold (float)` | NMS score threshold | `0.005` |
|
|
|
-| `nms_topk (int)` | Number of bounding boxes to keep before NMS operation | `1000` |
|
|
|
-| `nms_keep_topk (int)` | Number of bounding boxes to keep after NMS operation | `100` |
|
|
|
+| `nms_topk (int)` | Maximum number of detections to keep before performing NMS | `1000` |
|
|
|
+| `nms_keep_topk (int)` | Maximum number of prediction boxes to keep after NMS | `100` |
|
|
|
| `nms_iou_threshold (float)` | NMS IoU threshold | `0.45` |
|
|
|
|
|
|
-## `PP-YOLOv2`
|
|
|
+
|
|
|
+## `PPYOLOv2`
|
|
|
|
|
|
The PP-YOLOv2 implementation based on PaddlePaddle.
|
|
|
|
|
|
| Parameter Name | Description | Default Value |
|
|
|
|----------------------------------| --- | --- |
|
|
|
| `num_classes (int)` | Number of target classes | `80` |
|
|
|
-| `backbone (str)` | PPYOLO's backbone network | `'ResNet50_vd_dcn'` |
|
|
|
+| `backbone (str)` | Backbone network to use | `'ResNet50_vd_dcn'` |
|
|
|
| `anchors (list[list[float]])` | Sizes of predefined anchor boxes| `[[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]]` |
|
|
|
| `anchor_masks (list[list[int]])` | Masks of predefined anchor boxes | `[[6, 7, 8], [3, 4, 5], [0, 1, 2]]` |
|
|
|
| `use_iou_aware (bool)` | Whether to use IoU awareness | `True` |
|
|
|
| `use_spp (bool)` | Whether to use spatial pyramid pooling (SPP) | `True` |
|
|
|
-| `use_drop_block (bool)` | Whether to use DropBlock regularization | `True` |
|
|
|
+| `use_drop_block (bool)` | Whether to use DropBlock | `True` |
|
|
|
| `scale_x_y (float)` | Parameter to scale each predicted box | `1.05` |
|
|
|
| `ignore_threshold (float)` | IoU threshold used to assign predicted boxes to ground truth boxes | `0.7` |
|
|
|
| `label_smooth (bool)` | Whether to use label smoothing | `False` |
|
|
|
-| `use_iou_loss (bool)` | Whether to use IoU Loss | `True` |
|
|
|
+| `use_iou_loss (bool)` | Whether to use IoU loss | `True` |
|
|
|
| `use_matrix_nms (bool)` | Whether to use Matrix NMS | `True` |
|
|
|
| `nms_score_threshold (float)` | NMS score threshold | `0.01` |
|
|
|
| `nms_topk (int)` | Maximum number of detections to keep before performing NMS | `-1` |
|
|
|
| `nms_keep_topk (int)` | Maximum number of prediction boxes to keep after NMS | `100`|
|
|
|
| `nms_iou_threshold (float)` | NMS IoU threshold | `0.45` |
|
|
|
|
|
|
+
|
|
|
## `YOLOv3`
|
|
|
|
|
|
The YOLOv3 implementation based on PaddlePaddle.
|
|
@@ -387,17 +408,18 @@ The YOLOv3 implementation based on PaddlePaddle.
|
|
|
| Parameter Name | Description | Default Value |
|
|
|
| --- |-----------------------------------------------------------------------------------------------------------------------------| --- |
|
|
|
| `num_classes (int)` | Number of target classes | `80` |
|
|
|
-| `backbone (str)` | Name of the feature extraction network | `'MobileNetV1'` |
|
|
|
-| `anchors (list[list[int]])` | Sizes of all anchor boxes | `[[10, 13], [16, 30], [33, 23], [30, 61], [62, 45 ], [59, 119], [116, 90], [156, 198], [373, 326]]` |
|
|
|
-| `anchor_masks (list[list[int]])` | Which anchor boxes to use to predict the target box | `[[6, 7, 8], [3, 4, 5], [0, 1, 2]]` |
|
|
|
-| `ignore_threshold (float)` | IoU threshold of the predicted box and the ground truth box, below which the threshold will be considered as the background | `0.7` |
|
|
|
-| `nms_score_threshold (float)` | In non-maximum suppression, score threshold below which boxes will be discarded | `0.01` |
|
|
|
-| `nms_topk (int)` | In non-maximum value suppression, the maximum number of scoring boxes to keep, if it is -1, all boxes are kept | `1000` |
|
|
|
-| `nms_keep_topk (int)` | In non-maximum value suppression, the maximum number of boxes to keep per image | `100` |
|
|
|
-| `nms_iou_threshold (float)` | In non-maximum value suppression, IoU threshold, boxes larger than this threshold will be discarded | `0.45` |
|
|
|
-| `label_smooth (bool)` | Whether to use label smoothing when computing loss | `False` |
|
|
|
-
|
|
|
-## `BiSeNet V2`
|
|
|
+| `backbone (str)` | Backbone network to use | `'MobileNetV1'` |
|
|
|
+| `anchors (list[list[int]])`      | Sizes of predefined anchor boxes                                   | `[[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]]` |
|
|
|
+| `anchor_masks (list[list[int]])` | Masks of predefined anchor boxes | `[[6, 7, 8], [3, 4, 5], [0, 1, 2]]` |
|
|
|
+| `ignore_threshold (float)` | IoU threshold used to assign predicted boxes to ground truth boxes | `0.7` |
|
|
|
+| `nms_score_threshold (float)` | NMS score threshold | `0.01` |
|
|
|
+| `nms_topk (int)` | Maximum number of detections to keep before performing NMS | `1000` |
|
|
|
+| `nms_keep_topk (int)` | Maximum number of prediction boxes to keep after NMS | `100` |
|
|
|
+| `nms_iou_threshold (float)` | NMS IoU threshold | `0.45` |
|
|
|
+| `label_smooth (bool)` | Whether to use label smoothing when computing losses | `False` |
|
|
|
+
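The default `anchors` and `anchor_masks` of `YOLOv3` can be combined as sketched below. The example assumes the usual YOLOv3 convention in which each mask lists the indices into `anchors` used by one detection head; this document does not state that convention explicitly, and `anchors_per_head` is a hypothetical helper rather than a PaddleRS function.

```python
anchors = [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
           [59, 119], [116, 90], [156, 198], [373, 326]]
anchor_masks = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]


def anchors_per_head(anchors, anchor_masks):
    """Group anchors by detection head: each mask indexes into `anchors`."""
    return [[anchors[i] for i in mask] for mask in anchor_masks]


heads = anchors_per_head(anchors, anchor_masks)
for level, group in enumerate(heads):
    print(level, group)
```

Under this assumption, the first head receives the three largest anchors and the last head the three smallest.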
|
|
|
+
|
|
|
+## `BiSeNetV2`
|
|
|
|
|
|
The BiSeNet V2 implementation based on PaddlePaddle.
|
|
|
|
|
@@ -409,7 +431,8 @@ The BiSeNet V2 implementation based on PaddlePaddle.
|
|
|
| `losses (list)` | List of loss functions | `{}` |
|
|
|
| `align_corners (bool)` | Whether to use the corner alignment method | `False` |
|
|
|
|
|
|
-## `DeepLab V3+`
|
|
|
+
|
|
|
+## `DeepLabV3P`
|
|
|
|
|
|
The DeepLab V3+ implementation based on PaddlePaddle.
|
|
|
|
|
@@ -421,7 +444,7 @@ The DeepLab V3+ implementation based on PaddlePaddle.
|
|
|
| `use_mixed_loss (bool)` | Whether to use mixed loss function | `False` |
|
|
|
| `losses (list)` | List of loss functions | `None` |
|
|
|
| `output_stride (int)` | Downsampling ratio of the output feature map relative to the input feature map | `8` |
|
|
|
-| `backbone_indices (tuple)` | Output the location indices of different stages of the backbone network | `(0, 3)` |
|
|
|
+| `backbone_indices (tuple)` | Indices of the backbone network stages whose outputs are used | `(0, 3)` |
|
|
|
| `aspp_ratios (tuple)` | Dilation ratios of the dilated convolutions | `(1, 12, 24, 36)` |
|
|
|
| `aspp_out_channels (int)` | Number of ASPP module output channels | `256` |
|
|
|
| `align_corners (bool)` | Whether to use the corner alignment method | `False` |
|
|
@@ -440,6 +463,7 @@ The original article refers to A. Ma, J. Wang, Y. Zhong and Z. Zheng, "FactSeg:
|
|
|
| `use_mixed_loss (bool)` | Whether to use mixed loss function | `False` |
|
|
|
| `losses (list)` | List of loss functions | `None` |
|
|
|
|
|
|
+
|
|
|
## `FarSeg`
|
|
|
|
|
|
The FarSeg implementation based on PaddlePaddle.
|
|
@@ -453,7 +477,8 @@ The original article refers to Zheng Z, Zhong Y, Wang J, et al. Foreground-awar
|
|
|
| `use_mixed_loss (bool)` | Whether to use mixed loss function | `False` |
|
|
|
| `losses (list)` | List of loss functions | `None` |
|
|
|
|
|
|
-## `Fast-SCNN`
|
|
|
+
|
|
|
+## `FastSCNN`
|
|
|
|
|
|
The Fast-SCNN implementation based on PaddlePaddle.
|
|
|
|
|
@@ -466,7 +491,7 @@ The Fast-SCNN implementation based on PaddlePaddle.
|
|
|
| `align_corners (bool)` | Whether to use the corner alignment method | `False` |
|
|
|
|
|
|
|
|
|
-## `HRNet`
|
|
|
+## `HRNet`
|
|
|
|
|
|
The HRNet implementation based on PaddlePaddle.
|
|
|
|
|
@@ -474,7 +499,21 @@ The HRNet implementation based on PaddlePaddle.
|
|
|
|-------------------------|------------------------------------------------------------------------------------------------------------------| --- |
|
|
|
| `in_channels (int)` | Number of channels of the input image | `3` |
|
|
|
| `num_classes (int)` | Number of target classes | `2` |
|
|
|
-| `width (int)` | Initial number of channels for the network | `48` |
|
|
|
+| `width (int)` | Initial number of feature channels for the network | `48` |
|
|
|
+| `use_mixed_loss (bool)` | Whether to use mixed loss function | `False` |
|
|
|
+| `losses (list)` | List of loss functions | `None` |
|
|
|
+| `align_corners (bool)` | Whether to use the corner alignment method | `False` |
|
|
|
+
|
|
|
+
|
|
|
+## `UNet`
|
|
|
+
|
|
|
+The UNet implementation based on PaddlePaddle.
|
|
|
+
|
|
|
+| Parameter Name | Description | Default Value |
|
|
|
+|-------------------------|------------------------------------------------------------------------------------------------------------------| --- |
|
|
|
+| `in_channels (int)` | Number of channels of the input image | `3` |
|
|
|
+| `num_classes (int)` | Number of target classes | `2` |
|
|
|
+| `use_deconv (bool)`     | Whether to use deconvolution for upsampling                                                                      | `False` |
|
|
|
| `use_mixed_loss (bool)` | Whether to use mixed loss function | `False` |
|
|
|
| `losses (list)` | List of loss functions | `None` |
|
|
|
| `align_corners (bool)` | Whether to use the corner alignment method | `False` |
|