View source code

refine rotated object detector (#146)

Asthestarsfalll 1 year ago
Parent
Commit
470ae257fe

+ 8 - 8
docs/apis/data_cn.md

@@ -16,7 +16,7 @@
 |-------|----|--------|-----|
 |`data_dir`|`str`|Directory that stores the dataset.||
 |`file_list`|`str`|File list path. A file list is a text file in which each line contains the path information of one sample. The specific requirements of `CDDataset` on the file list are described below.||
-|`transforms`|`paddlers.transforms.Compose`|Data transformation operators applied to input data.||
+|`transforms`|`paddlers.transforms.Compose` \| `list`|Data transformation operators applied to input data.||
 |`label_list`|`str` \| `None`|Label list path. A label list is a text file in which each line contains the name of one class.|`None`|
 |`num_workers`|`int` \| `str`|Number of auxiliary processes used when loading data. If set to `'auto'`, the number of processes is determined as follows: when the number of CPU cores is greater than 16, 8 data loading auxiliary processes are used; otherwise, half the number of CPU cores.|`'auto'`|
 |`shuffle`|`bool`|Whether to randomly shuffle the samples in the dataset.|`False`|
@@ -38,7 +38,7 @@
 |-------|----|--------|-----|
 |`data_dir`|`str`|Directory that stores the dataset.||
 |`file_list`|`str`|File list path. A file list is a text file in which each line contains the path information of one sample. The specific requirements of `ClasDataset` on the file list are described below.||
-|`transforms`|`paddlers.transforms.Compose`|Data transformation operators applied to input data.||
+|`transforms`|`paddlers.transforms.Compose` \| `list`|Data transformation operators applied to input data.||
 |`label_list`|`str` \| `None`|Label list path. A label list is a text file in which each line contains the name of one class.|`None`|
 |`num_workers`|`int` \| `str`|Number of auxiliary processes used when loading data. If set to `'auto'`, the number of processes is determined as follows: when the number of CPU cores is greater than 16, 8 data loading auxiliary processes are used; otherwise, half the number of CPU cores.|`'auto'`|
 |`shuffle`|`bool`|Whether to randomly shuffle the samples in the dataset.|`False`|
@@ -58,13 +58,13 @@
 |`data_dir`|`str`|Directory that stores the dataset.||
 |`image_dir`|`str`|Directory of input images.||
 |`anno_path`|`str`|Path of the [COCO-format](https://cocodataset.org/#home) annotation file.||
-|`transforms`|`paddlers.transforms.Compose`|Data transformation operators applied to input data.||
+|`transforms`|`paddlers.transforms.Compose` \| `list`|Data transformation operators applied to input data.||
 |`label_list`|`str` \| `None`|Label list path. A label list is a text file in which each line contains the name of one class.|`None`|
 |`num_workers`|`int` \| `str`|Number of auxiliary processes used when loading data. If set to `'auto'`, the number of processes is determined as follows: when the number of CPU cores is greater than 16, 8 data loading auxiliary processes are used; otherwise, half the number of CPU cores.|`'auto'`|
 |`shuffle`|`bool`|Whether to randomly shuffle the samples in the dataset.|`False`|
 |`allow_empty`|`bool`|Whether to add negative samples to the dataset.|`False`|
 |`empty_ratio`|`float`|Negative sample ratio. Takes effect only if `allow_empty` is `True`. If `empty_ratio` is negative or greater than or equal to 1, all generated negative samples are retained.|`1.0`|
-|`batch_transforms`|`paddlers.transforms.BatchCompose`|Batch data transformation operators applied to input data.||
+|`batch_transforms`|`paddlers.transforms.BatchCompose` \| `list`|Batch data transformation operators applied to input data.||
 
### VOC Format Object Detection Dataset `VOCDetDataset`
 
@@ -76,13 +76,13 @@
 |-------|----|--------|-----|
 |`data_dir`|`str`|Directory that stores the dataset.||
 |`file_list`|`str`|File list path. A file list is a text file in which each line contains the path information of one sample. The specific requirements of `VOCDetDataset` on the file list are described below.||
-|`transforms`|`paddlers.transforms.Compose`|Data transformation operators applied to input data.||
+|`transforms`|`paddlers.transforms.Compose` \| `list`|Data transformation operators applied to input data.||
 |`label_list`|`str` \| `None`|Label list path. A label list is a text file in which each line contains the name of one class.|`None`|
 |`num_workers`|`int` \| `str`|Number of auxiliary processes used when loading data. If set to `'auto'`, the number of processes is determined as follows: when the number of CPU cores is greater than 16, 8 data loading auxiliary processes are used; otherwise, half the number of CPU cores.|`'auto'`|
 |`shuffle`|`bool`|Whether to randomly shuffle the samples in the dataset.|`False`|
 |`allow_empty`|`bool`|Whether to add negative samples to the dataset.|`False`|
 |`empty_ratio`|`float`|Negative sample ratio. Takes effect only if `allow_empty` is `True`. If `empty_ratio` is negative or greater than or equal to 1, all generated negative samples are retained.|`1.0`|
-|`batch_transforms`|`paddlers.transforms.BatchCompose`|Batch data transformation operators applied to input data.||
+|`batch_transforms`|`paddlers.transforms.BatchCompose` \| `list`|Batch data transformation operators applied to input data.||
 
The requirements of `VOCDetDataset` on the file list are as follows:
 
@@ -98,7 +98,7 @@
 |-------|----|--------|-----|
 |`data_dir`|`str`|Directory that stores the dataset.||
 |`file_list`|`str`|File list path. A file list is a text file in which each line contains the path information of one sample. The specific requirements of `ResDataset` on the file list are described below.||
-|`transforms`|`paddlers.transforms.Compose`|Data transformation operators applied to input data.||
+|`transforms`|`paddlers.transforms.Compose` \| `list`|Data transformation operators applied to input data.||
 |`num_workers`|`int` \| `str`|Number of auxiliary processes used when loading data. If set to `'auto'`, the number of processes is determined as follows: when the number of CPU cores is greater than 16, 8 data loading auxiliary processes are used; otherwise, half the number of CPU cores.|`'auto'`|
 |`shuffle`|`bool`|Whether to randomly shuffle the samples in the dataset.|`False`|
 |`sr_factor`|`int` \| `None`|For super-resolution reconstruction tasks, the super-resolution factor; for other tasks, specify `None`.|`None`|
@@ -117,7 +117,7 @@
 |-------|----|--------|-----|
 |`data_dir`|`str`|Directory that stores the dataset.||
 |`file_list`|`str`|File list path. A file list is a text file in which each line contains the path information of one sample. The specific requirements of `SegDataset` on the file list are described below.||
-|`transforms`|`paddlers.transforms.Compose`|Data transformation operators applied to input data.||
+|`transforms`|`paddlers.transforms.Compose` \| `list`|Data transformation operators applied to input data.||
 |`label_list`|`str` \| `None`|Label list path. A label list is a text file in which each line contains the name of one class.|`None`|
 |`num_workers`|`int` \| `str`|Number of auxiliary processes used when loading data. If set to `'auto'`, the number of processes is determined as follows: when the number of CPU cores is greater than 16, 8 data loading auxiliary processes are used; otherwise, half the number of CPU cores.|`'auto'`|
 |`shuffle`|`bool`|Whether to randomly shuffle the samples in the dataset.|`False`|
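
The `transforms` parameter above no longer has to be pre-wrapped in `Compose`. Below is a minimal sketch for `SegDataset` with a plain transform list; the paths are hypothetical placeholders, and wrapping the list in `paddlers.transforms.Compose` remains equivalent.

```python
import paddlers as pdrs
import paddlers.transforms as T

# A plain list is now accepted and wrapped internally.
train_dataset = pdrs.datasets.SegDataset(
    data_dir='data/my_seg_data',                  # hypothetical path
    file_list='data/my_seg_data/train_list.txt',  # hypothetical path
    label_list='data/my_seg_data/labels.txt',     # hypothetical path
    transforms=[
        T.DecodeImg(),
        T.Resize(target_size=512),
        T.Normalize(
            mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ],
    num_workers='auto',
    shuffle=True)
```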

+ 8 - 8
docs/apis/data_en.md

@@ -16,7 +16,7 @@ The initialization parameter list is as follows:
 |-------|----|--------|-----|
 |`data_dir`|`str`|Directory that stores the dataset.||
 |`file_list`|`str`|File list path. File list is a text file, in which each line contains the path information of one sample. The specific requirements of `CDDataset` on the file list are listed below.||
-|`transforms`|`paddlers.transforms.Compose`|Data transformation operators applied to input data.||
+|`transforms`|`paddlers.transforms.Compose` \| `list`|Data transformation operators applied to input data.||
 |`label_list`|`str` \| `None`|Label list path. Label list is a text file, in which each line contains the name of a class.|`None`|
 |`num_workers`|`int` \| `str`|Number of auxiliary processes used when loading data. If it is set to `'auto'`, the number of processes is determined as follows: when the number of CPU cores is greater than 16, 8 data loading auxiliary processes are used; otherwise, the number of auxiliary processes is set to half the count of CPU cores.|`'auto'`|
 |`shuffle`|`bool`|Whether to randomly shuffle the samples in the dataset.|`False`|
@@ -38,7 +38,7 @@ The initialization parameter list is as follows:
 |-------|----|--------|-----|
 |`data_dir`|`str`|Directory that stores the dataset.||
 |`file_list`|`str`|File list path. File list is a text file, in which each line contains the path information of one sample. The specific requirements of `ClasDataset` on the file list are listed below.||
-|`transforms`|`paddlers.transforms.Compose`|Data transformation operators applied to input data.||
+|`transforms`|`paddlers.transforms.Compose` \| `list`|Data transformation operators applied to input data.||
 |`label_list`|`str` \| `None`|Label list path. Label list is a text file, in which each line contains the name of a class.|`None`|
 |`num_workers`|`int` \| `str`|Number of auxiliary processes used when loading data. If it is set to `'auto'`, the number of processes is determined as follows: when the number of CPU cores is greater than 16, 8 data loading auxiliary processes are used; otherwise, the number of auxiliary processes is set to half the count of CPU cores.|`'auto'`|
 |`shuffle`|`bool`|Whether to randomly shuffle the samples in the dataset.|`False`|
@@ -58,13 +58,13 @@ The initialization parameter list is as follows:
 |`data_dir`|`str`|Directory that stores the dataset.||
 |`image_dir`|`str`|Directory of input images.||
 |`anno_path`|`str`|[COCO format](https://cocodataset.org/#home) annotation file path.||
-|`transforms`|`paddlers.transforms.Compose`|Data transformation operators applied to input data.||
+|`transforms`|`paddlers.transforms.Compose` \| `list`|Data transformation operators applied to input data.||
 |`label_list`|`str` \| `None`|Label list path. Label list is a text file, in which each line contains the name of a class.|`None`|
 |`num_workers`|`int` \| `str`|Number of auxiliary processes used when loading data. If it is set to `'auto'`, the number of processes is determined as follows: when the number of CPU cores is greater than 16, 8 data loading auxiliary processes are used; otherwise, the number of auxiliary processes is set to half the count of CPU cores.|`'auto'`|
 |`shuffle`|`bool`|Whether to randomly shuffle the samples in the dataset.|`False`|
 |`allow_empty`|`bool`|Whether to add negative samples to the dataset.|`False`|
 |`empty_ratio`|`float`|Negative sample ratio. Take effect only if `allow_empty` is `True`. If `empty_ratio` is negative or greater than or equal to 1, all negative samples generated are retained.|`1.0`|
-|`batch_transforms`|`paddlers.transforms.BatchCompose`|Data batch transformation operators applied to input data.||
+|`batch_transforms`|`paddlers.transforms.BatchCompose` \| `list`|Data batch transformation operators applied to input data.||
 
 ### VOC Format Object Detection Dataset `VOCDetDataset`
 
@@ -76,13 +76,13 @@ The initialization parameter list is as follows:
 |-------|----|--------|-----|
 |`data_dir`|`str`|Directory that stores the dataset.||
 |`file_list`|`str`|File list path. File list is a text file, in which each line contains the path information of one sample. The specific requirements of `VOCDetDataset` on the file list are listed below.||
-|`transforms`|`paddlers.transforms.Compose`|Data transformation operators applied to input data.||
+|`transforms`|`paddlers.transforms.Compose` \| `list`|Data transformation operators applied to input data.||
 |`label_list`|`str` \| `None`|Label list path. Label list is a text file, in which each line contains the name of a class.|`None`|
 |`num_workers`|`int` \| `str`|Number of auxiliary processes used when loading data. If it is set to `'auto'`, the number of processes is determined as follows: when the number of CPU cores is greater than 16, 8 data loading auxiliary processes are used; otherwise, the number of auxiliary processes is set to half the count of CPU cores.|`'auto'`|
 |`shuffle`|`bool`|Whether to randomly shuffle the samples in the dataset.|`False`|
 |`allow_empty`|`bool`|Whether to add negative samples to the dataset.|`False`|
 |`empty_ratio`|`float`|Negative sample ratio. Takes effect only if `allow_empty` is `True`. If `empty_ratio` is negative or greater than or equal to `1`, all negative samples generated will be retained.|`1.0`|
-|`batch_transforms`|`paddlers.transforms.BatchCompose`|Data batch transformation operators applied to input data.||
+|`batch_transforms`|`paddlers.transforms.BatchCompose` \| `list`|Data batch transformation operators applied to input data.||
 
 The requirements of `VOCDetDataset` for the file list are as follows:
 
@@ -98,7 +98,7 @@ The initialization parameter list is as follows:
 |-------|----|--------|-----|
 |`data_dir`|`str`|Directory that stores the dataset.||
 |`file_list`|`str`|File list path. File list is a text file, in which each line contains the path information of one sample. The specific requirements of `ResDataset` on the file list are listed below.||
-|`transforms`|`paddlers.transforms.Compose`|Data transformation operators applied to input data.||
+|`transforms`|`paddlers.transforms.Compose` \| `list`|Data transformation operators applied to input data.||
 |`num_workers`|`int` \| `str`|Number of auxiliary processes used when loading data. If it is set to `'auto'`, the number of processes is determined as follows: when the number of CPU cores is greater than 16, 8 data loading auxiliary processes are used; otherwise, the number of auxiliary processes is set to half the count of CPU cores.|`'auto'`|
 |`shuffle`|`bool`|Whether to randomly shuffle the samples in the dataset.|`False`|
 |`sr_factor`|`int` \| `None`|For super resolution reconstruction task, this is the scaling factor. For other tasks, please specify `sr_factor` as `None`.|`None`|
@@ -117,7 +117,7 @@ The initialization parameter list is as follows:
 |-------|----|--------|-----|
 |`data_dir`|`str`|Directory that stores the dataset.||
 |`file_list`|`str`|File list path. File list is a text file, in which each line contains the path information of one sample. The specific requirements of `SegDataset` on the file list are listed below.||
-|`transforms`|`paddlers.transforms.Compose`|Data transformation operators applied to input data.||
+|`transforms`|`paddlers.transforms.Compose` \| `list`|Data transformation operators applied to input data.||
 |`label_list`|`str` \| `None`|Label list path. Label list is a text file, in which each line contains the name of a class.|`None`|
 |`num_workers`|`int` \| `str`|Number of auxiliary processes used when loading data. If it is set to `'auto'`, the number of processes is determined as follows: when the number of CPU cores is greater than 16, 8 data loading auxiliary processes are used; otherwise, the number of auxiliary processes is set to half the count of CPU cores.|`'auto'`|
 |`shuffle`|`bool`|Whether to randomly shuffle the samples in the dataset.|`False`|
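
The same relaxation applies to `batch_transforms`. A sketch for a VOC detection dataset follows; the paths are hypothetical, and `BatchRandomResize` is assumed to be importable from `paddlers.transforms`, as the TIPC configs below suggest.

```python
import paddlers as pdrs
import paddlers.transforms as T

train_dataset = pdrs.datasets.VOCDetDataset(
    data_dir='data/my_det_data',                  # hypothetical path
    file_list='data/my_det_data/train_list.txt',  # hypothetical path
    label_list='data/my_det_data/labels.txt',     # hypothetical path
    transforms=[
        T.DecodeImg(),
        T.RandomHorizontalFlip(),
        T.Normalize(
            mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ],
    # A plain list also works here; it is wrapped in BatchCompose by
    # BaseDataset and merged with the trainer's default batch transforms.
    batch_transforms=[
        T.BatchRandomResize(
            target_sizes=[320, 352, 384, 416, 448, 480, 512],
            interp='RANDOM'),
    ],
    shuffle=True)
```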

+ 6 - 0
docs/apis/train_cn.md

@@ -164,6 +164,7 @@ def train(self,
           learning_rate=.001,
           warmup_steps=0,
           warmup_start_lr=0.0,
+          scheduler='Piecewise',
           lr_decay_epochs=(216, 243),
           lr_decay_gamma=0.1,
           cosine_decay_num_epochs=1000,
@@ -172,6 +173,8 @@ def train(self,
           early_stop=False,
           early_stop_patience=5,
           use_vdl=True,
+          clip_grad_by_norm=None,
+          reg_coeff=1e-4,
           resume_checkpoint=None,
           precision='fp32',
           amp_level='O1',
@@ -195,6 +198,7 @@ def train(self,
 |`learning_rate`|`float`|Learning rate used during training, for the default optimizer.|`0.001`|
 |`warmup_steps`|`int`|Number of [warm-up](https://www.mdpi.com/2079-9292/10/16/2029/htm) steps used by the default optimizer.|`0`|
 |`warmup_start_lr`|`float`|Initial learning rate used by the default optimizer in the warm-up phase.|`0.0`|
+|`scheduler`|`str`|Learning rate scheduler used during training. If `None`, the default scheduler is used.|`'Piecewise'`|
 |`lr_decay_epochs`|`list` \| `tuple`|Milestones (in epochs) at which the learning rate of the default optimizer decays, i.e., the epochs at which the decay is applied.|`(216, 243)`|
 |`lr_decay_gamma`|`float`|Learning rate decay coefficient, for the default optimizer.|`0.1`|
 |`cosine_decay_num_epochs`|`int`|Parameter used to compute the annealing cycle when the cosine annealing learning rate scheduler is used.|`1000`|
@@ -203,6 +207,8 @@ def train(self,
 |`early_stop`|`bool`|Whether to enable the early stopping policy during training.|`False`|
 |`early_stop_patience`|`int`|`patience` parameter used when early stopping is enabled (see [`EarlyStop`](https://github.com/PaddlePaddle/PaddleRS/blob/develop/paddlers/utils/utils.py)).|`5`|
 |`use_vdl`|`bool`|Whether to enable VisualDL logging.|`True`|
+|`clip_grad_by_norm`|`float` \| `None`|Maximum global norm allowed when clipping gradients.|`None`|
+|`reg_coeff`|`float`|L2 regularization coefficient.|`1e-4`|
 |`resume_checkpoint`|`str` \| `None`|Checkpoint path. PaddleRS supports resuming training from a checkpoint (which contains the model weights and optimizer weights saved in a previous run), but note that `resume_checkpoint` and `pretrain_weights` must not both be set to values other than `None` at the same time.|`None`|
 |`precision`|`str`|When set to `'fp16'`, automatic mixed precision training is enabled.|`'fp32'`|
 |`amp_level`|`str`|Automatic mixed precision training mode. In O1 mode, a white list and a black list determine whether each operator is computed in FP16 or FP32. In O2 mode, all operators are computed in FP16, except those in the custom black list and those without FP16 support.|`'O1'`|
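
A hedged usage sketch of the new parameters follows; the dataset objects are placeholders and the values are illustrative, not recommended settings.

```python
import paddlers as pdrs

model = pdrs.tasks.det.YOLOv3(num_classes=20, backbone='DarkNet53')
model.train(
    num_epochs=270,
    train_dataset=train_dataset,  # placeholder dataset objects
    eval_dataset=eval_dataset,
    train_batch_size=8,
    learning_rate=0.001,
    scheduler='Piecewise',        # default: step decay at the milestones below
    lr_decay_epochs=(216, 243),
    lr_decay_gamma=0.1,
    clip_grad_by_norm=35.0,       # clip gradients to a global norm of 35
    reg_coeff=1e-4,               # L2 regularization coefficient
    save_dir='output/yolov3')
```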

+ 6 - 0
docs/apis/train_en.md

@@ -164,6 +164,7 @@ def train(self,
           learning_rate=.001,
           warmup_steps=0,
           warmup_start_lr=0.0,
+          scheduler='Piecewise',
           lr_decay_epochs=(216, 243),
           lr_decay_gamma=0.1,
           cosine_decay_num_epochs=1000,
@@ -172,6 +173,8 @@ def train(self,
           early_stop=False,
           early_stop_patience=5,
           use_vdl=True,
+          clip_grad_by_norm=None,
+          reg_coeff=1e-4,
           resume_checkpoint=None,
           precision='fp32',
           amp_level='O1',
@@ -195,6 +198,7 @@ The meaning of each parameter is as follows:
 |`learning_rate`|`float`|Learning rate used during training, for the default optimizer.|`0.001`|
 |`warmup_steps`|`int`|Number of [warm-up](https://www.mdpi.com/2079-9292/10/16/2029/htm) steps used by the default optimizer.|`0`|
 |`warmup_start_lr`|`float`|Initial learning rate used by the default optimizer in the warm-up phase.|`0.0`|
+|`scheduler`|`str`|Learning rate scheduler used for training. If `None`, a default scheduler will be used.|`'Piecewise'`|
 |`lr_decay_epochs`|`list` \| `tuple`|Milestones of learning rate decay of the default optimizer, in terms of epochs. That is, the epochs at which the learning rate decays.|`(216, 243)`|
 |`lr_decay_gamma`|`float`|Learning rate decay coefficient, for the default optimizer.|`0.1`|
 |`cosine_decay_num_epochs`|`int`|Parameter to determine the annealing cycle when a cosine annealing learning rate scheduler is used.|`1000`|
@@ -203,6 +207,8 @@ The meaning of each parameter is as follows:
 |`early_stop`|`bool`|Whether to enable the early stopping policy during training.|`False`|
 |`early_stop_patience`|`int`|`patience` parameter when the early stopping policy is enabled. Please refer to [`EarlyStop`](https://github.com/PaddlePaddle/PaddleRS/blob/develop/paddlers/utils/utils.py) for more details.|`5`|
 |`use_vdl`|`bool`|Whether to enable VisualDL.|`True`|
+|`clip_grad_by_norm`|`float` \| `None`|Maximum global norm for gradient clipping.|`None`|
+|`reg_coeff`|`float`|Coefficient for L2 regularization.|`1e-4`|
 |`resume_checkpoint`|`str` \| `None`|Checkpoint path. PaddleRS supports resuming training from checkpoints (including model weights and optimizer weights stored during previous training), but note that `resume_checkpoint` and `pretrain_weights` must not be set to values other than `None` at the same time.|`None`|
 |`precision`|`str`|Use AMP (auto mixed precision) training if `precision` is set to `'fp16'`.|`'fp32'`|
 |`amp_level`|`str`|Auto mixed precision level. Accepted values are 'O1' and 'O2': At O1 level, the input data type of each operator will be casted according to a white list and a black list. At O2 level, all parameters and input data will be casted to FP16, except those for the operators in the black list, those without the support for FP16 kernel, and those for the batchnorm layers.|`'O1'`|
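
The rotated detectors accept the same parameters. A sketch with AMP enabled (illustrative values, placeholder dataset objects):

```python
import paddlers as pdrs

model = pdrs.tasks.det.FCOSR(num_classes=15, backbone='ResNeXt50_32x4d')
model.train(
    num_epochs=36,
    train_dataset=train_dataset,  # placeholder dataset objects
    eval_dataset=eval_dataset,
    train_batch_size=2,
    learning_rate=0.01,
    warmup_steps=500,
    warmup_start_lr=0.001,
    clip_grad_by_norm=35.0,       # cap the global gradient norm
    reg_coeff=1e-4,               # L2 regularization coefficient
    precision='fp16',             # enable automatic mixed precision
    amp_level='O1',
    save_dir='output/fcosr')
```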

+ 2 - 1
docs/quick_start_cn.md

@@ -55,7 +55,8 @@ pip install GDAL-3.3.3-cp39-cp39-win_amd64.whl
 
 4. (Optional) Install ext_op
 
-PaddleRS supports rotated object detection. Before use, the external custom library `ext_op` must be installed as follows:
+PaddleRS supports rotated object detection. Before use, the custom external operator library `ext_op` must be installed as follows:
+
 ```shell
 cd paddlers/models/ppdet/ext_op
 python setup.py install

+ 1 - 1
docs/quick_start_en.md

@@ -48,7 +48,7 @@ pip install GDAL-3.3.3-cp39-cp39-win_amd64.whl
 
 4. (Optional) Install ext_op
 
-PaddleRS supports rotated object detection, which requires the installation of the `ext_op` external custom library before use. You need to install `ext_op` as follows:
+PaddleRS supports rotated object detection, which requires the installation of the `ext_op` library before use. You need to install `ext_op` as follows:
 
 ```shell
 cd paddlers/models/ppdet/ext_op

+ 3 - 2
paddlers/datasets/base.py

@@ -44,8 +44,9 @@ class BaseDataset(Dataset):
 
         self.num_workers = get_num_workers(num_workers)
         self.shuffle = shuffle
-        self.batch_transforms = None
-        self.build_collate_fn(batch_transforms)
+        if isinstance(batch_transforms, list):
+            batch_transforms = BatchCompose(batch_transforms)
+        self.batch_transforms = batch_transforms
 
     def __getitem__(self, idx):
         sample = construct_sample_from_dict(self.file_list[idx])
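
The net effect is that `batch_transforms` may be passed either as a plain list or as a ready-made `BatchCompose`. A standalone sketch of the normalization rule, mirroring the `__init__` logic above:

```python
from paddlers.transforms.batch_operators import BatchCompose, _BatchPad

def normalize_batch_transforms(batch_transforms):
    # A plain list is wrapped in BatchCompose; BatchCompose instances
    # (and None) pass through unchanged, as in BaseDataset.__init__.
    if isinstance(batch_transforms, list):
        batch_transforms = BatchCompose(batch_transforms)
    return batch_transforms

assert isinstance(
    normalize_batch_transforms([_BatchPad(pad_to_stride=32)]), BatchCompose)
assert normalize_batch_transforms(None) is None
```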

+ 0 - 1
paddlers/datasets/voc.py

@@ -80,7 +80,6 @@ class VOCDetDataset(BaseDataset):
                     self.num_max_boxes *= 2
                     break
 
-        self.batch_transforms = None
         self.allow_empty = allow_empty
         self.empty_ratio = empty_ratio
         self.file_list = list()

+ 8 - 4
paddlers/tasks/base.py

@@ -115,7 +115,7 @@ class BaseModel(metaclass=ModelMeta):
                 backbone_name = getattr(self, 'backbone_name', None)
                 pretrain_weights = get_pretrain_weights(
                     pretrain_weights,
-                    self.__class__.__name__,
+                    self.model_name,
                     save_dir,
                     backbone_name=backbone_name)
         if pretrain_weights is not None:
@@ -332,7 +332,7 @@ class BaseModel(metaclass=ModelMeta):
                     save_dtype='float32')
 
         # XXX: Hard-coding
-        if self.model_type == 'detector' and 'RCNN' in self.__class__.__name__ and train_dataset.pos_num < len(
+        if self.model_type == 'detector' and 'RCNN' in self.model_name and train_dataset.pos_num < len(
                 train_dataset.file_list):
             nranks = 1
         else:
@@ -386,6 +386,10 @@ class BaseModel(metaclass=ModelMeta):
             step_time_tic = time.time()
 
             for step, data in enumerate(self.train_data_loader()):
+                # `PicoDet` and `PPYOLOE_R` need to switch the label assigner according to epoch_id.
+                # TODO: refactor this
+                if self.model_name in ['PicoDet', 'PPYOLOE_R']:
+                    data['epoch_id'] = i
                 if nranks > 1:
                     outputs = self.train_step(step, data, ddp_net, optimizer)
                 else:
@@ -499,9 +503,9 @@ class BaseModel(metaclass=ModelMeta):
                 Defaults to 'output'.
         """
 
-        if self.__class__.__name__ in {'FasterRCNN', 'MaskRCNN', 'PicoDet'}:
+        if self.model_name in {'FasterRCNN', 'MaskRCNN', 'PicoDet'}:
             raise ValueError("{} does not support pruning currently!".format(
-                self.__class__.__name__))
+                self.model_name))
 
         assert criterion in {'l1_norm', 'fpgm'}, \
             "Pruning criterion {} is not supported. Please choose from {'l1_norm', 'fpgm'}."

+ 356 - 146
paddlers/tasks/object_detector.py

@@ -28,7 +28,7 @@ import paddlers.models.ppdet as ppdet
 from paddlers.models.ppdet.modeling.proposal_generator.target_layer import BBoxAssigner, MaskAssigner
 from paddlers.transforms import decode_image, construct_sample
 from paddlers.transforms.operators import _NormalizeBox, _PadBox, _BboxXYXY2XYWH, Resize, Pad
-from paddlers.transforms.batch_operators import BatchCompose, _BatchPad, _Gt2YoloTarget
+from paddlers.transforms.batch_operators import BatchCompose, _BatchPad, _Gt2YoloTarget, BatchPadRGT, BatchNormalizeImage
 from paddlers.models.ppdet.optimizer import ModelEMA
 import paddlers.utils.logging as logging
 from paddlers.utils.checkpoint import det_pretrain_weights_dict
@@ -43,6 +43,7 @@ __all__ = [
     "PPYOLOv2",
     "MaskRCNN",
     "FCOSR",
+    "PPYOLOE_R",
 ]
 
 # TODO: Prune and decoupling
@@ -57,6 +58,7 @@ class BaseDetector(BaseModel):
         'rbox':
         {'im_id', 'image_shape', 'image', 'gt_bbox', 'gt_class', 'gt_poly'},
     }
+    supported_backbones = None
 
     def __init__(self, model_name, num_classes=80, **params):
         self.init_params.update(locals())
@@ -83,6 +85,13 @@ class BaseDetector(BaseModel):
     def set_data_fields(cls, data_name, data_fields):
         cls.data_fields[data_name] = data_fields
 
+    def _is_backbone_weight(self):
+        target_backbone = ['ESNET_', 'CSPResNet_']
+        for b in target_backbone:
+            if b in self.backbone_name:
+                return True
+        return False
+
     def _build_inference_net(self):
         infer_net = self.net
         infer_net.eval()
@@ -102,6 +111,12 @@ class BaseDetector(BaseModel):
         }]
         return input_spec
 
+    def _check_backbone(self, backbone):
+        if backbone not in self.supported_backbones:
+            raise ValueError(
+                "backbone: {} is not supported. Please choose one of "
+                "{}.".format(backbone, self.supported_backbones))
+
     def _check_image_shape(self, image_shape):
         if len(image_shape) == 2:
             image_shape = [1, 3] + image_shape
@@ -129,7 +144,7 @@ class BaseDetector(BaseModel):
             depth = name[0]
             fixed_kwargs['depth'] = int(depth[6:])
             if len(name) > 1:
-                fixed_kwargs['variant'] = name[1]
+                fixed_kwargs['variant'] = name[1][1]
             backbone = getattr(ppdet.modeling, 'ResNet')
             backbone = functools.partial(backbone, **fixed_kwargs)
         else:
@@ -254,6 +269,7 @@ class BaseDetector(BaseModel):
               early_stop_patience=5,
               use_vdl=True,
               clip_grad_by_norm=None,
+              reg_coeff=1e-4,
               resume_checkpoint=None,
               precision='fp32',
               amp_level='O1',
@@ -286,6 +302,8 @@ class BaseDetector(BaseModel):
                 Defaults to 0.
             warmup_start_lr (float, optional): Start learning rate of warm-up training. 
                 Defaults to 0.0.
+            scheduler (str, optional): Learning rate scheduler used for training. If None,
+                a default scheduler will be used. Defaults to 'Piecewise'.
             lr_decay_epochs (list|tuple, optional): Epoch milestones for learning 
                 rate decay. Defaults to (216, 243).
             lr_decay_gamma (float, optional): Gamma coefficient of learning rate decay. 
@@ -303,6 +321,10 @@ class BaseDetector(BaseModel):
             early_stop_patience (int, optional): Early stop patience. Defaults to 5.
             use_vdl(bool, optional): Whether to use VisualDL to monitor the training 
                 process. Defaults to True.
+            clip_grad_by_norm (float, optional): Maximum global norm for gradient clipping. 
+                Defaults to None.
+            reg_coeff (float, optional): Coefficient for L2 weight decay regularization.
+                Defaults to 1e-4.
             resume_checkpoint (str|None, optional): Path of the checkpoint to resume
                 training from. If None, no training checkpoint will be resumed. At most
                 one of `resume_checkpoint` and `pretrain_weights` can be set simultaneously.
@@ -330,14 +352,14 @@ class BaseDetector(BaseModel):
     def _pre_train(self, in_args):
         return in_args
 
-    def _real_train(self, num_epochs, train_dataset, train_batch_size,
-                    eval_dataset, optimizer, save_interval_epochs,
-                    log_interval_steps, save_dir, pretrain_weights,
-                    learning_rate, warmup_steps, warmup_start_lr,
-                    lr_decay_epochs, lr_decay_gamma, metric, use_ema,
-                    early_stop, early_stop_patience, use_vdl, resume_checkpoint,
-                    scheduler, cosine_decay_num_epochs, clip_grad_by_norm,
-                    precision, amp_level, custom_white_list, custom_black_list):
+    def _real_train(
+            self, num_epochs, train_dataset, train_batch_size, eval_dataset,
+            optimizer, save_interval_epochs, log_interval_steps, save_dir,
+            pretrain_weights, learning_rate, warmup_steps, warmup_start_lr,
+            lr_decay_epochs, lr_decay_gamma, metric, use_ema, early_stop,
+            early_stop_patience, use_vdl, resume_checkpoint, scheduler,
+            cosine_decay_num_epochs, clip_grad_by_norm, reg_coeff, precision,
+            amp_level, custom_white_list, custom_black_list):
         self.precision = precision
         self.amp_level = amp_level
         self.custom_white_list = custom_white_list
@@ -366,14 +388,12 @@ class BaseDetector(BaseModel):
 
         self.labels = train_dataset.labels
         self.num_max_boxes = train_dataset.num_max_boxes
-        train_batch_transforms = self._default_batch_transforms(
-            'train') if train_dataset.batch_transforms is None else None
-        eval_batch_transforms = self._default_batch_transforms(
-            'eval') if eval_dataset.batch_transforms is None else None
+
+        train_batch_transforms = self._compose_batch_transforms(
+            'train', train_dataset.batch_transforms)
+
         train_dataset.build_collate_fn(train_batch_transforms,
                                        self._default_collate_fn)
-        eval_dataset.build_collate_fn(eval_batch_transforms,
-                                      self._default_collate_fn)
 
         # Build optimizer if not defined
         if optimizer is None:
@@ -389,7 +409,8 @@ class BaseDetector(BaseModel):
                 num_steps_each_epoch=num_steps_each_epoch,
                 num_epochs=num_epochs,
                 clip_grad_by_norm=clip_grad_by_norm,
-                cosine_decay_num_epochs=cosine_decay_num_epochs)
+                cosine_decay_num_epochs=cosine_decay_num_epochs,
+                reg_coeff=reg_coeff, )
         else:
             self.optimizer = optimizer
 
@@ -422,8 +443,8 @@ class BaseDetector(BaseModel):
             pretrain_weights=pretrain_weights,
             save_dir=pretrained_dir,
             resume_checkpoint=resume_checkpoint,
-            is_backbone_weights=(pretrain_weights == 'IMAGENET' and
-                                 'ESNet_' in self.backbone_name))
+            is_backbone_weights=pretrain_weights == 'IMAGENET' and
+            self._is_backbone_weight())
 
         if use_ema:
             ema = ModelEMA(model=self.net, decay=.9998, use_thres_step=True)
@@ -454,6 +475,27 @@ class BaseDetector(BaseModel):
     def _default_batch_transforms(self, mode):
         raise NotImplementedError
 
+    def _filter_batch_transforms(self, defaults, targets):
+        # TODO: Warning message
+        if targets is None:
+            return defaults
+        target_types = [type(i) for i in targets]
+        filtered = [i for i in defaults if type(i) not in target_types]
+        return filtered
+
+    def _compose_batch_transforms(self, mode, batch_transforms):
+        defaults = self._default_batch_transforms(mode)
+        out = []
+        if isinstance(batch_transforms, BatchCompose):
+            batch_transforms = batch_transforms.batch_transforms
+        if batch_transforms is not None:
+            out.extend(batch_transforms)
+        filtered = self._filter_batch_transforms(defaults.batch_transforms,
+                                                 batch_transforms)
+        out.extend(filtered)
+
+        return BatchCompose(out, collate_batch=defaults.collate_batch)
+
     def quant_aware_train(self,
                           num_epochs,
                           train_dataset,
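
In other words, user-supplied batch transforms come first, and any default whose type already appears in the user list is dropped. A standalone sketch of the filtering rule (with stand-in classes, not the real operators):

```python
class _BatchPad:            # stand-ins for the real operator classes
    pass

class BatchRandomResize:
    pass

defaults = [BatchRandomResize(), _BatchPad()]
user = [BatchRandomResize()]  # the user overrides the resize step

# Mirrors _filter_batch_transforms + _compose_batch_transforms above.
target_types = [type(t) for t in user]
merged = user + [d for d in defaults if type(d) not in target_types]

assert [type(t).__name__ for t in merged] == ['BatchRandomResize', '_BatchPad']
```
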
@@ -580,9 +622,12 @@ class BaseDetector(BaseModel):
                 "Evaluation metric {} is not supported. Please choose from 'COCO' and 'VOC'."
 
         eval_dataset.data_fields = self.data_fields[self.metric]
-        eval_batch_transforms = self._default_batch_transforms(
-            'eval') if eval_dataset.batch_transforms is None else None
-        eval_dataset._build_collate_fn(eval_batch_transforms)
+
+        eval_batch_transforms = self._compose_batch_transforms(
+            'eval', eval_dataset.batch_transforms)
+        eval_dataset.build_collate_fn(eval_batch_transforms,
+                                      self._default_collate_fn)
+
         self._check_transforms(eval_dataset.transforms)
 
         self.net.eval()
@@ -791,6 +836,9 @@ class BaseDetector(BaseModel):
 
 
 class PicoDet(BaseDetector):
+    supported_backbones = ('ESNet_s', 'ESNet_m', 'ESNet_l', 'LCNet',
+                           'MobileNetV3', 'ResNet18_vd')
+
     def __init__(self,
                  num_classes=80,
                  backbone='ESNet_m',
@@ -800,14 +848,8 @@ class PicoDet(BaseDetector):
                  nms_iou_threshold=.6,
                  **params):
         self.init_params = locals()
-        if backbone not in {
-                'ESNet_s', 'ESNet_m', 'ESNet_l', 'LCNet', 'MobileNetV3',
-                'ResNet18_vd'
-        }:
-            raise ValueError(
-                "backbone: {} is not supported. Please choose one of "
-                "{'ESNet_s', 'ESNet_m', 'ESNet_l', 'LCNet', 'MobileNetV3', 'ResNet18_vd'}.".
-                format(backbone))
+        self._check_backbone(backbone)
+
         self.backbone_name = backbone
         if params.get('with_net', True):
             kwargs = {}
@@ -1017,9 +1059,12 @@ class PicoDet(BaseDetector):
                 dataset, batch_size, mode, collate_fn)
 
 
-class _YOLOv3(BaseDetector):
+class YOLOv3(BaseDetector):
+    supported_backbones = ('MobileNetV1', 'MobileNetV1_ssld', 'MobileNetV3',
+                           'MobileNetV3_ssld', 'DarkNet53', 'ResNet50_vd_dcn',
+                           'ResNet34')
+
     def __init__(self,
-                 rotate=False,
                  num_classes=80,
                  backbone='MobileNetV1',
                  post_process=None,
@@ -1035,16 +1080,7 @@ class _YOLOv3(BaseDetector):
                  label_smooth=False,
                  **params):
         self.init_params = locals()
-        if backbone not in {
-                'MobileNetV1', 'MobileNetV1_ssld', 'MobileNetV3',
-                'MobileNetV3_ssld', 'DarkNet53', 'ResNet50_vd_dcn', 'ResNet34',
-                'ResNeXt50_32x4d'
-        }:
-            raise ValueError(
-                "backbone: {} is not supported. Please choose one of "
-                "{'MobileNetV1', 'MobileNetV1_ssld', 'MobileNetV3', 'MobileNetV3_ssld', 'DarkNet53', "
-                "'ResNet50_vd_dcn', 'ResNet34', 'ResNeXt50_32x4d'}.".format(
-                    backbone))
+        self._check_backbone(backbone)
 
         self.backbone_name = backbone
         if params.get('with_net', True):
@@ -1058,6 +1094,7 @@ class _YOLOv3(BaseDetector):
             kwargs['norm_type'] = norm_type
 
             if 'MobileNetV3' in backbone:
+                backbone = 'MobileNetV3'
                 kwargs['feature_maps'] = [7, 13, 16]
             elif backbone == 'ResNet50_vd_dcn':
                 kwargs.update(
@@ -1071,14 +1108,10 @@ class _YOLOv3(BaseDetector):
                 kwargs.update(
                     dict(
                         return_idx=[1, 2, 3], freeze_at=-1, freeze_norm=False))
-            elif backbone == 'ResNeXt50_32x4d':
-                backbone = 'ResNet50'
-                kwargs.update(
-                    dict(
-                        return_idx=[1, 2, 3],
-                        base_width=4,
-                        groups=32,
-                        freeze_norm=False))
+            elif backbone == 'DarkNet53':
+                backbone = 'DarkNet'
+            elif 'MobileNet' in backbone:
+                backbone = 'MobileNet'
 
             backbone = self._get_backbone(backbone, **kwargs)
             nms = ppdet.modeling.MultiClassNMS(
@@ -1087,62 +1120,36 @@ class _YOLOv3(BaseDetector):
                 keep_top_k=nms_keep_topk,
                 nms_threshold=nms_iou_threshold,
                 normalized=nms_normalized)
-            if rotate:
-                neck = ppdet.modeling.FPN(
-                    in_channels=[i.channels for i in backbone.out_shape],
-                    out_channel=256,
-                    has_extra_convs=True,
-                    use_c5=False,
-                    relu_before_extra_convs=True)
-                assigner = ppdet.modeling.FCOSRAssigner(
-                    num_classes=num_classes,
-                    factor=12,
-                    threshold=0.23,
-                    boundary=[[-1, 64], [64, 128], [128, 256], [256, 512],
-                              [512, 100000000.0]])
-                yolo_head = ppdet.modeling.FCOSRHead(
-                    num_classes=num_classes,
-                    in_channels=[i.channels for i in neck.out_shape],
-                    feat_channels=256,
-                    fpn_strides=[8, 16, 32, 64, 128],
-                    stacked_convs=4,
-                    loss_weight={'class': 1.,
-                                 'probiou': 1.},
-                    assigner=assigner,
-                    nms=nms)
-                post_process = None
-            else:
-                neck = ppdet.modeling.YOLOv3FPN(
-                    norm_type=norm_type,
-                    in_channels=[i.channels for i in backbone.out_shape])
-                loss = ppdet.modeling.YOLOv3Loss(
-                    num_classes=num_classes,
-                    ignore_thresh=ignore_threshold,
-                    label_smooth=label_smooth)
-                yolo_head = ppdet.modeling.YOLOv3Head(
-                    in_channels=[i.channels for i in neck.out_shape],
-                    anchors=anchors,
-                    anchor_masks=anchor_masks,
-                    num_classes=num_classes,
-                    loss=loss)
-                post_process = ppdet.modeling.BBoxPostProcess(
-                    decode=ppdet.modeling.YOLOBox(num_classes=num_classes),
-                    nms=nms)
-                post_process = ppdet.modeling.BBoxPostProcess(
-                    decode=ppdet.modeling.YOLOBox(num_classes=num_classes),
-                    nms=ppdet.modeling.MultiClassNMS(
-                        score_threshold=nms_score_threshold,
-                        nms_top_k=nms_topk,
-                        keep_top_k=nms_keep_topk,
-                        nms_threshold=nms_iou_threshold,
-                        normalized=nms_normalized))
+            neck = ppdet.modeling.YOLOv3FPN(
+                norm_type=norm_type,
+                in_channels=[i.channels for i in backbone.out_shape])
+            loss = ppdet.modeling.YOLOv3Loss(
+                num_classes=num_classes,
+                ignore_thresh=ignore_threshold,
+                label_smooth=label_smooth)
+            yolo_head = ppdet.modeling.YOLOv3Head(
+                in_channels=[i.channels for i in neck.out_shape],
+                anchors=anchors,
+                anchor_masks=anchor_masks,
+                num_classes=num_classes,
+                loss=loss)
+            post_process = ppdet.modeling.BBoxPostProcess(
+                decode=ppdet.modeling.YOLOBox(num_classes=num_classes), nms=nms)
+            post_process = ppdet.modeling.BBoxPostProcess(
+                decode=ppdet.modeling.YOLOBox(num_classes=num_classes),
+                nms=ppdet.modeling.MultiClassNMS(
+                    score_threshold=nms_score_threshold,
+                    nms_top_k=nms_topk,
+                    keep_top_k=nms_keep_topk,
+                    nms_threshold=nms_iou_threshold,
+                    normalized=nms_normalized))
             params.update({
                 'backbone': backbone,
                 'neck': neck,
                 'yolo_head': yolo_head,
                 'post_process': post_process
             })
-        super(_YOLOv3, self).__init__(
+        super(YOLOv3, self).__init__(
             model_name='YOLOv3', num_classes=num_classes, **params)
         self.anchors = anchors
         self.anchor_masks = anchor_masks
@@ -1196,6 +1203,10 @@ class _YOLOv3(BaseDetector):
 
 
 class FasterRCNN(BaseDetector):
+    supported_backbones = ('ResNet50', 'ResNet50_vd', 'ResNet50_vd_ssld',
+                           'ResNet34', 'ResNet34_vd', 'ResNet101',
+                           'ResNet101_vd', 'HRNet_W18')
+
     def __init__(self,
                  num_classes=80,
                  backbone='ResNet50',
@@ -1213,26 +1224,23 @@ class FasterRCNN(BaseDetector):
                  test_post_nms_top_n=1000,
                  **params):
         self.init_params = locals()
-        if backbone not in {
-                'ResNet50', 'ResNet50_vd', 'ResNet50_vd_ssld', 'ResNet34',
-                'ResNet34_vd', 'ResNet101', 'ResNet101_vd', 'HRNet_W18'
-        }:
-            raise ValueError(
-                "backbone: {} is not supported. Please choose one of "
-                "{'ResNet50', 'ResNet50_vd', 'ResNet50_vd_ssld', 'ResNet34', 'ResNet34_vd', "
-                "'ResNet101', 'ResNet101_vd', 'HRNet_W18'}.".format(backbone))
+        self._check_backbone(backbone)
+
         self.backbone_name = backbone
 
         if params.get('with_net', True):
             dcn_v2_stages = [1, 2, 3] if with_dcn else [-1]
             kwargs = {}
-            kwargs['dcn_v2_stages'] = dcn_v2_stages
             if backbone == 'HRNet_W18':
                 if not with_fpn:
                     logging.warning(
                         "Backbone {} should be used along with fpn enabled, 'with_fpn' is forcibly set to True".
                         format(backbone))
                     with_fpn = True
+                kwargs.update(
+                    dict(
+                        width=18, freeze_at=0, return_idx=[0, 1, 2, 3]))
+                backbone = 'HRNet'
                 if with_dcn:
                     logging.warning(
                         "Backbone {} should be used along with dcn disabled, 'with_dcn' is forcibly set to False".
@@ -1244,13 +1252,13 @@ class FasterRCNN(BaseDetector):
                         format(backbone))
                     with_fpn = True
                 kwargs['lr_mult_list'] = [0.05, 0.05, 0.1, 0.15]
+                kwargs['dcn_v2_stages'] = dcn_v2_stages
             elif 'ResNet50' in backbone:
                 if not with_fpn and with_dcn:
                     logging.warning(
                         "Backbone {} without fpn should be used along with dcn disabled, 'with_dcn' is forcibly set to False".
                         format(backbone))
-                kwargs.update(dict(return_idx=[2], num_stages=3))
-                kwargs.pop('dcn_v2_stages')
+                    kwargs.update(dict(return_idx=[2], num_stages=3))
             elif 'ResNet34' in backbone:
                 if not with_fpn:
                     logging.warning(
@@ -1455,7 +1463,10 @@ class FasterRCNN(BaseDetector):
         return self._define_input_spec(image_shape)
 
 
-class PPYOLO(_YOLOv3):
+class PPYOLO(YOLOv3):
+    supported_backbones = ('ResNet50_vd_dcn', 'ResNet18_vd',
+                           'MobileNetV3_large', 'MobileNetV3_small')
+
     def __init__(self,
                  num_classes=80,
                  backbone='ResNet50_vd_dcn',
@@ -1476,14 +1487,8 @@ class PPYOLO(_YOLOv3):
                  nms_iou_threshold=0.45,
                  **params):
         self.init_params = locals()
-        if backbone not in {
-                'ResNet50_vd_dcn', 'ResNet18_vd', 'MobileNetV3_large',
-                'MobileNetV3_small'
-        }:
-            raise ValueError(
-                "backbone: {} is not supported. Please choose one of "
-                "{'ResNet50_vd_dcn', 'ResNet18_vd', 'MobileNetV3_large', 'MobileNetV3_small'}.".
-                format(backbone))
+        self._check_backbone(backbone)
+
         self.backbone_name = backbone
         self.downsample_ratios = [
             32, 16, 8
@@ -1514,7 +1519,7 @@ class PPYOLO(_YOLOv3):
 
             if backbone == 'ResNet50_vd_dcn':
                 backbone = self._get_backbone(
-                    'ResNet',
+                    backbone,
                     variant='d',
                     norm_type=norm_type,
                     return_idx=[1, 2, 3],
@@ -1525,7 +1530,7 @@ class PPYOLO(_YOLOv3):
 
             elif backbone == 'ResNet18_vd':
                 backbone = self._get_backbone(
-                    'ResNet',
+                    backbone,
                     depth=18,
                     variant='d',
                     norm_type=norm_type,
@@ -1614,7 +1619,8 @@ class PPYOLO(_YOLOv3):
                 'post_process': post_process
             })
 
-        super(PPYOLO, self).__init__(
+        # NOTE: call BaseDetector.__init__ instead of YOLOv3.__init__
+        super(YOLOv3, self).__init__(
             model_name='YOLOv3', num_classes=num_classes, **params)
         self.anchors = anchors
         self.anchor_masks = anchor_masks
@@ -1643,7 +1649,9 @@ class PPYOLO(_YOLOv3):
         return self._define_input_spec(image_shape)
 
 
-class PPYOLOTiny(_YOLOv3):
+class PPYOLOTiny(YOLOv3):
+    supported_backbones = ('MobileNetV3', )
+
     def __init__(self,
                  num_classes=80,
                  backbone='MobileNetV3',
@@ -1668,6 +1676,7 @@ class PPYOLOTiny(_YOLOv3):
             logging.warning("PPYOLOTiny only supports MobileNetV3 as backbone. "
                             "Backbone is forcibly set to MobileNetV3.")
         self.backbone_name = 'MobileNetV3'
+
         self.downsample_ratios = [32, 16, 8]
         if params.get('with_net', True):
             if paddlers.env_info['place'] == 'gpu' and paddlers.env_info[
@@ -1741,7 +1750,8 @@ class PPYOLOTiny(_YOLOv3):
                 'post_process': post_process
             })
 
-        super(PPYOLOTiny, self).__init__(
+        # NOTE: call BaseDetector.__init__ instead of YOLOv3.__init__
+        super(YOLOv3, self).__init__(
             model_name='YOLOv3', num_classes=num_classes, **params)
         self.anchors = anchors
         self.anchor_masks = anchor_masks
@@ -1771,7 +1781,9 @@ class PPYOLOTiny(_YOLOv3):
         return self._define_input_spec(image_shape)
 
 
-class PPYOLOv2(_YOLOv3):
+class PPYOLOv2(YOLOv3):
+    supported_backbones = ('ResNet50_vd_dcn', 'ResNet101_vd_dcn')
+
     def __init__(self,
                  num_classes=80,
                  backbone='ResNet50_vd_dcn',
@@ -1792,10 +1804,7 @@ class PPYOLOv2(_YOLOv3):
                  nms_iou_threshold=0.45,
                  **params):
         self.init_params = locals()
-        if backbone not in {'ResNet50_vd_dcn', 'ResNet101_vd_dcn'}:
-            raise ValueError(
-                "backbone: {} is not supported. Please choose one of "
-                "{'ResNet50_vd_dcn', 'ResNet101_vd_dcn'}.".format(backbone))
+        self._check_backbone(backbone)
         self.backbone_name = backbone
         self.downsample_ratios = [32, 16, 8]
 
@@ -1808,8 +1817,7 @@ class PPYOLOv2(_YOLOv3):
 
             if backbone == 'ResNet50_vd_dcn':
                 backbone = self._get_backbone(
-                    'ResNet',
-                    variant='d',
+                    backbone,
                     norm_type=norm_type,
                     return_idx=[1, 2, 3],
                     dcn_v2_stages=[3],
@@ -1819,9 +1827,7 @@ class PPYOLOv2(_YOLOv3):
 
             elif backbone == 'ResNet101_vd_dcn':
                 backbone = self._get_backbone(
-                    'ResNet',
-                    depth=101,
-                    variant='d',
+                    backbone,
                     norm_type=norm_type,
                     return_idx=[1, 2, 3],
                     dcn_v2_stages=[3],
@@ -1888,7 +1894,8 @@ class PPYOLOv2(_YOLOv3):
                 'post_process': post_process
             })
 
-        super(PPYOLOv2, self).__init__(
+        # NOTE: call BaseDetector.__init__ instead of YOLOv3.__init__
+        super(YOLOv3, self).__init__(
             model_name='YOLOv3', num_classes=num_classes, **params)
         self.anchors = anchors
         self.anchor_masks = anchor_masks
@@ -1919,6 +1926,9 @@ class PPYOLOv2(_YOLOv3):
 
 
 class MaskRCNN(BaseDetector):
+    supported_backbones = ('ResNet50', 'ResNet50_vd', 'ResNet50_vd_ssld',
+                           'ResNet101', 'ResNet101_vd')
+
     def __init__(self,
                  num_classes=80,
                  backbone='ResNet50_vd',
@@ -1936,14 +1946,7 @@ class MaskRCNN(BaseDetector):
                  test_post_nms_top_n=1000,
                  **params):
         self.init_params = locals()
-        if backbone not in {
-                'ResNet50', 'ResNet50_vd', 'ResNet50_vd_ssld', 'ResNet101',
-                'ResNet101_vd'
-        }:
-            raise ValueError(
-                "backbone: {} is not supported. Please choose one of "
-                "{'ResNet50', 'ResNet50_vd', 'ResNet50_vd_ssld', 'ResNet101', 'ResNet101_vd'}.".
-                format(backbone))
+        self._check_backbone(backbone)
 
         self.backbone_name = backbone + '_fpn' if with_fpn else backbone
         dcn_v2_stages = [1, 2, 3] if with_dcn else [-1]
@@ -2187,5 +2190,212 @@ class MaskRCNN(BaseDetector):
         return self._define_input_spec(image_shape)
 
 
-YOLOv3 = functools.partial(_YOLOv3, rotate=False)
-FCOSR = functools.partial(_YOLOv3, rotate=True)
+class FCOSR(YOLOv3):
+    supported_backbones = ('ResNeXt50_32x4d', )
+
+    def __init__(self,
+                 num_classes=80,
+                 backbone='ResNeXt50_32x4d',
+                 post_process=None,
+                 anchors=[[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
+                          [59, 119], [116, 90], [156, 198], [373, 326]],
+                 anchor_masks=[[6, 7, 8], [3, 4, 5], [0, 1, 2]],
+                 nms_score_threshold=0.01,
+                 nms_topk=1000,
+                 nms_keep_topk=100,
+                 nms_iou_threshold=0.45,
+                 nms_normalized=True,
+                 **params):
+        self.init_params = locals()
+        self._check_backbone(backbone)
+
+        self.backbone_name = backbone
+        if params.get('with_net', True):
+            if paddlers.env_info['place'] == 'gpu' and paddlers.env_info[
+                    'num'] > 1 and not os.environ.get('PADDLERS_EXPORT_STAGE'):
+                norm_type = 'sync_bn'
+            else:
+                norm_type = 'bn'
+
+            kwargs = {}
+            kwargs['norm_type'] = norm_type
+
+            backbone = 'ResNet50'
+            kwargs.update(
+                dict(
+                    return_idx=[1, 2, 3],
+                    base_width=4,
+                    groups=32,
+                    freeze_norm=False))
+
+            backbone = self._get_backbone(backbone, **kwargs)
+            nms = ppdet.modeling.MultiClassNMS(
+                score_threshold=nms_score_threshold,
+                nms_top_k=nms_topk,
+                keep_top_k=nms_keep_topk,
+                nms_threshold=nms_iou_threshold,
+                normalized=nms_normalized)
+            neck = ppdet.modeling.FPN(
+                in_channels=[i.channels for i in backbone.out_shape],
+                out_channel=256,
+                has_extra_convs=True,
+                use_c5=False,
+                relu_before_extra_convs=True)
+            assigner = ppdet.modeling.FCOSRAssigner(
+                num_classes=num_classes,
+                factor=12,
+                threshold=0.23,
+                boundary=[[-1, 64], [64, 128], [128, 256], [256, 512],
+                          [512, 100000000.0]])
+            yolo_head = ppdet.modeling.FCOSRHead(
+                num_classes=num_classes,
+                in_channels=[i.channels for i in neck.out_shape],
+                feat_channels=256,
+                fpn_strides=[8, 16, 32, 64, 128],
+                stacked_convs=4,
+                loss_weight={'class': 1.,
+                             'probiou': 1.},
+                assigner=assigner,
+                nms=nms)
+            post_process = None
+            params.update({
+                'backbone': backbone,
+                'neck': neck,
+                'yolo_head': yolo_head,
+                'post_process': post_process
+            })
+        # NOTE: call BaseDetector.__init__ instead of YOLOv3.__init__
+        super(YOLOv3, self).__init__(
+            model_name='YOLOv3', num_classes=num_classes, **params)
+        self.model_name = 'FCOSR'
+        self.anchors = anchors
+        self.anchor_masks = anchor_masks
+
+    def _default_batch_transforms(self, mode='train'):
+        if mode == 'train':
+            batch_transforms = [BatchPadRGT(), _BatchPad(pad_to_stride=32)]
+        else:
+            batch_transforms = [_BatchPad(pad_to_stride=32)]
+
+        if mode == 'eval' and self.metric == 'voc':
+            collate_batch = False
+        else:
+            collate_batch = True
+
+        batch_transforms = BatchCompose(
+            batch_transforms, collate_batch=collate_batch)
+
+        return batch_transforms
+
+
+class PPYOLOE_R(YOLOv3):
+    supported_backbones = ('CSPResNet_m', 'CSPResNet_l', 'CSPResNet_s',
+                           'CSPResNet_x')
+
+    def __init__(self,
+                 num_classes=80,
+                 backbone='CSPResNet_l',
+                 post_process=None,
+                 anchors=[[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
+                          [59, 119], [116, 90], [156, 198], [373, 326]],
+                 anchor_masks=[[6, 7, 8], [3, 4, 5], [0, 1, 2]],
+                 nms_score_threshold=0.01,
+                 nms_topk=1000,
+                 nms_keep_topk=100,
+                 nms_iou_threshold=0.45,
+                 nms_normalized=True,
+                 **params):
+        self.init_params = locals()
+        self._check_backbone(backbone)
+
+        self.backbone_name = backbone
+        if params.get('with_net', True):
+            if paddlers.env_info['place'] == 'gpu' and paddlers.env_info[
+                    'num'] > 1 and not os.environ.get('PADDLERS_EXPORT_STAGE'):
+                norm_type = 'sync_bn'
+            else:
+                norm_type = 'bn'
+
+            kwargs = {}
+            kwargs['norm_type'] = norm_type
+            kwargs.update(
+                dict(
+                    layers=[3, 6, 6, 3],
+                    channels=[64, 128, 256, 512, 1024],
+                    return_idx=[1, 2, 3],
+                    use_large_stem=True,
+                    use_alpha=True))
+            if backbone == 'CSPResNet_l':
+                kwargs.update(dict(depth_mult=1.0, width_mult=1.0))
+            elif backbone == 'CSPResNet_m':
+                kwargs.update(dict(depth_mult=0.67, width_mult=0.75))
+            elif backbone == 'CSPResNet_s':
+                kwargs.update(dict(depth_mult=0.33, width_mult=0.5))
+            elif backbone == 'CSPResNet_x':
+                kwargs.update(dict(depth_mult=1.33, width_mult=1.25))
+            backbone = 'CSPResNet'
+
+            backbone = self._get_backbone(backbone, **kwargs)
+            nms = ppdet.modeling.MultiClassNMS(
+                score_threshold=nms_score_threshold,
+                nms_top_k=nms_topk,
+                keep_top_k=nms_keep_topk,
+                nms_threshold=nms_iou_threshold,
+                normalized=nms_normalized)
+            neck = ppdet.modeling.CustomCSPPAN(
+                in_channels=[i.channels for i in backbone.out_shape],
+                out_channels=[768, 384, 192],
+                stage_num=1,
+                block_num=3,
+                act='swish',
+                spp=True,
+                use_alpha=True)
+            static_assigner = ppdet.modeling.FCOSRAssigner(
+                num_classes=num_classes,
+                factor=12,
+                threshold=0.23,
+                boundary=[[512, 10000], [256, 512], [-1, 256]])
+            assigner = ppdet.modeling.RotatedTaskAlignedAssigner(
+                topk=13,
+                alpha=1.0,
+                beta=6.0, )
+            yolo_head = ppdet.modeling.PPYOLOERHead(
+                num_classes=num_classes,
+                in_channels=[i.channels for i in neck.out_shape],
+                fpn_strides=[32, 16, 8],
+                grid_cell_offset=0.5,
+                use_varifocal_loss=True,
+                loss_weight={'class': 1.,
+                             'iou': 2.5,
+                             'dfl': 0.05},
+                static_assigner=static_assigner,
+                assigner=assigner,
+                nms=nms)
+            params.update({
+                'backbone': backbone,
+                'neck': neck,
+                'yolo_head': yolo_head,
+                'post_process': post_process
+            })
+        # NOTE: call BaseDetector.__init__ instead of YOLOv3.__init__
+        super(YOLOv3, self).__init__(
+            model_name='YOLOv3', num_classes=num_classes, **params)
+        self.model_name = "PPYOLOE_R"
+        self.anchors = anchors
+        self.anchor_masks = anchor_masks
+
+    def _default_batch_transforms(self, mode='train'):
+        if mode == 'train':
+            batch_transforms = [BatchPadRGT(), _BatchPad(pad_to_stride=32)]
+        else:
+            batch_transforms = [_BatchPad(pad_to_stride=32)]
+
+        if mode == 'eval' and self.metric == 'voc':
+            collate_batch = False
+        else:
+            collate_batch = True
+
+        batch_transforms = BatchCompose(
+            batch_transforms, collate_batch=collate_batch)
+
+        return batch_transforms
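
An end-to-end sketch of the new `PPYOLOE_R` trainer follows; the dataset paths are hypothetical, and `ext_op` must be installed first, as described in the quick start.

```python
import paddlers as pdrs
import paddlers.transforms as T

# Rotated boxes are read from COCO-format polygon annotations.
train_dataset = pdrs.datasets.COCODetDataset(
    data_dir='data/my_rotated_data',     # hypothetical path
    image_dir='images',                  # hypothetical path
    anno_path='annotations/train.json',  # hypothetical path
    label_list='labels.txt',             # hypothetical path
    transforms=[
        T.DecodeImg(),
        T.Normalize(
            mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

model = pdrs.tasks.det.PPYOLOE_R(num_classes=15, backbone='CSPResNet_l')
model.train(
    num_epochs=36,
    train_dataset=train_dataset,
    train_batch_size=2,
    pretrain_weights='IMAGENET',  # resolved via det_pretrain_weights_dict
    save_dir='output/ppyoloe_r')
```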

+ 5 - 3
paddlers/transforms/functions.py

@@ -427,23 +427,25 @@ def to_uint8(im, norm=True, stretch=False):
         if len(image.shape) == 3:
             for b in range(image.shape[-1]):
                 stretched = exposure.equalize_hist(image[:, :, b])
+                assert np.min(stretched) >= 0
                 stretched /= float(np.max(stretched)) + EPS
                 stretches.append(stretched)
             stretched_img = np.stack(stretches, axis=2)
         else:  # if len(image.shape) == 2
             stretched_img = exposure.equalize_hist(image)
+            assert np.min(stretched_img) >= 0
             stretched_img /= float(np.max(stretched_img)) + EPS
         return stretched_img
 
     dtype = im.dtype.name
-    if dtype == 'uint8' and not stretch:
+    if dtype == 'uint8':
         return im
     if stretch:
         im = _two_percent_linear(im)
-    else:
-        im = _minmax_norm(im)
     if norm:
         im = _equalize_hist(im)
+    if not norm and not stretch:
+        im = _minmax_norm(im)
     im = np.uint8(im * 255)
     return im
 
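A quick sketch of the revised flag behavior for non-`uint8` input (the flag combinations follow the control flow above):

```python
import numpy as np
from paddlers.transforms.functions import to_uint8

im = (np.random.rand(64, 64, 3) * 4096).astype('uint16')

out_default = to_uint8(im)                        # norm only: histogram equalization
out_plain = to_uint8(im, norm=False)              # neither flag: min-max normalization
out_both = to_uint8(im, norm=True, stretch=True)  # 2% linear stretch, then equalization
# uint8 input is now returned unchanged, regardless of the flags.
```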

+ 17 - 4
paddlers/utils/checkpoint.py

@@ -53,7 +53,6 @@ det_pretrain_weights_dict = {
     'YOLOv3_MobileNetV1_ssld': ['COCO', 'PascalVOC', 'IMAGENET'],
     'YOLOv3_DarkNet53': ['COCO', 'IMAGENET'],
     'YOLOv3_ResNet50_vd_dcn': ['COCO', 'IMAGENET'],
-    'YOLOv3_ResNeXt50_32x4d': ['IMAGENET'],
     'YOLOv3_ResNet34': ['COCO', 'IMAGENET'],
     'YOLOv3_MobileNetV3': ['COCO', 'PascalVOC', 'IMAGENET'],
     'YOLOv3_MobileNetV3_ssld': ['PascalVOC', 'IMAGENET'],
@@ -79,7 +78,12 @@ det_pretrain_weights_dict = {
     'MaskRCNN_ResNet50_vd_fpn': ['COCO', 'IMAGENET'],
     'MaskRCNN_ResNet50_vd_ssld_fpn': ['COCO', 'IMAGENET'],
     'MaskRCNN_ResNet101_fpn': ['COCO', 'IMAGENET'],
-    'MaskRCNN_ResNet101_vd_fpn': ['COCO', 'IMAGENET']
+    'MaskRCNN_ResNet101_vd_fpn': ['COCO', 'IMAGENET'],
+    'FCOSR_ResNeXt50_32x4d': ['IMAGENET'],
+    'PPYOLOE_R_CSPResNet_l': ['IMAGENET'],
+    'PPYOLOE_R_CSPResNet_m': ['IMAGENET'],
+    'PPYOLOE_R_CSPResNet_s': ['IMAGENET'],
+    'PPYOLOE_R_CSPResNet_x': ['IMAGENET']
 }
 
 res_pretrain_weights_dict = {}
@@ -286,8 +290,6 @@ imagenet_weights = {
     'https://paddledet.bj.bcebos.com/models/pretrained/HRNet_W18_C_pretrained.pdparams',
     'YOLOv3_ResNet50_vd_dcn_IMAGENET':
     'https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_pretrained.pdparams',
-    'YOLOv3_ResNeXt50_32x4d_IMAGENET':
-    'https://paddledet.bj.bcebos.com/models/pretrained/ResNeXt50_32x4d_pretrained.pdparams',
     'YOLOv3_ResNet34_IMAGENET':
     'https://paddledet.bj.bcebos.com/models/pretrained/ResNet34_pretrained.pdparams',
     'YOLOv3_MobileNetV1_IMAGENET':
@@ -338,6 +340,16 @@ imagenet_weights = {
     'https://bj.bcebos.com/paddleseg/dygraph/hrnet_w18_ssld.tar.gz',
     'C2FNet_HRNet_W48_IMAGENET':
     'https://bj.bcebos.com/paddleseg/dygraph/hrnet_w48_ssld.tar.gz',
+    'FCOSR_ResNeXt50_32x4d_IMAGENET':
+    'https://paddledet.bj.bcebos.com/models/pretrained/ResNeXt50_32x4d_pretrained.pdparams',
+    'PPYOLOE_R_CSPResNet_l_IMAGENET':
+    'https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_l_pretrained.pdparams',
+    'PPYOLOE_R_CSPResNet_m_IMAGENET':
+    'https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_m_pretrained.pdparams',
+    'PPYOLOE_R_CSPResNet_s_IMAGENET':
+    'https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_s_pretrained.pdparams',
+    'PPYOLOE_R_CSPResNet_x_IMAGENET':
+    'https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_x_pretrained.pdparams',
 }
 
 pascalvoc_weights = {
@@ -509,6 +521,7 @@ def load_pretrain_weights(model, pretrain_weights=None, model_name=None):
         if os.path.exists(pretrain_weights):
             param_state_dict = paddle.load(pretrain_weights)
             model_state_dict = model.state_dict()
+
             # HACK: Fit for faster rcnn. Pretrain weights contain prefix of 'backbone'
             # while res5 module is located in bbox_head.head. Replace the prefix of
             # res5 with 'bbox_head.head' to load pretrain weights correctly.
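
The new entries follow the existing key layout: '<model_name>' maps to the list of supported pretraining sources, and '<model_name>_<source>' maps to the download URL. A lookup sketch (assuming the module is importable as paddlers.utils.checkpoint):

from paddlers.utils import checkpoint

sources = checkpoint.det_pretrain_weights_dict['PPYOLOE_R_CSPResNet_m']  # ['IMAGENET']
url = checkpoint.imagenet_weights['PPYOLOE_R_CSPResNet_m_IMAGENET']
print(sources, url)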

+ 6 - 5
test_tipc/configs/det/_base_/rsod.yaml

@@ -29,16 +29,17 @@ transforms:
       type: RandomCrop
     - !Node
       type: RandomHorizontalFlip
-    - !Node
-      type: BatchRandomResize
-      args:
-        target_sizes: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608]
-        interp: RANDOM
     - !Node
       type: Normalize
       args:
         mean: [0.485, 0.456, 0.406]
         std: [0.229, 0.224, 0.225]
+  train_batch:
+    - !Node
+      type: BatchRandomResize
+      args:
+        target_sizes: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608]
+        interp: RANDOM
   eval:
     - !Node
       type: DecodeImg

+ 6 - 5
test_tipc/configs/det/_base_/sarship.yaml

@@ -29,16 +29,17 @@ transforms:
       type: RandomCrop
     - !Node
       type: RandomHorizontalFlip
-    - !Node
-      type: BatchRandomResize
-      args:
-        target_sizes: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608]
-        interp: RANDOM
     - !Node
       type: Normalize
       args:
         mean: [0.485, 0.456, 0.406]
         std: [0.229, 0.224, 0.225]
+  train_batch:
+    - !Node
+      type: BatchRandomResize
+      args:
+        target_sizes: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608]
+        interp: RANDOM
   eval:
     - !Node
       type: DecodeImg

+ 21 - 0
test_tipc/run_task.py

@@ -63,6 +63,16 @@ if __name__ == '__main__':
     eval_transforms = T.Compose(build_objects(cfg['transforms']['eval'], mod=T))
     # Inplace modification
     cfg['datasets']['eval'].args['transforms'] = eval_transforms
+    if cfg['transforms'].get('eval_batch', None) is not None:
+        if cfg['datasets']['eval'].args.get('batch_transforms',
+                                            None) is not None:
+            raise ValueError(
+                "Found key 'batch_transforms' in args of eval dataset and the value is not None."
+            )
+        eval_batch_transforms = T.BatchCompose(
+            build_objects(
+                cfg['transforms']['eval_batch'], mod=T))
+        cfg['datasets']['eval'].args['batch_transforms'] = eval_batch_transforms
     eval_dataset = build_objects(cfg['datasets']['eval'], mod=paddlers.datasets)
 
     if cfg['cmd'] == 'train':
@@ -77,6 +87,17 @@ if __name__ == '__main__':
                 cfg['transforms']['train'], mod=T))
         # Inplace modification
         cfg['datasets']['train'].args['transforms'] = train_transforms
+        if cfg['transforms'].get('train_batch', None) is not None:
+            if cfg['datasets']['train'].args.get('batch_transforms',
+                                                 None) is not None:
+                raise ValueError(
+                    "Found key 'batch_transforms' in args of train dataset and the value is not None."
+                )
+            train_batch_transforms = T.BatchCompose(
+                build_objects(
+                    cfg['transforms']['train_batch'], mod=T))
+            cfg['datasets']['train'].args[
+                'batch_transforms'] = train_batch_transforms
         train_dataset = build_objects(
             cfg['datasets']['train'], mod=paddlers.datasets)
         model = build_objects(
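
In short, an optional 'train_batch' or 'eval_batch' section in the YAML config is now built into a T.BatchCompose and injected into the corresponding dataset's args, raising a ValueError if 'batch_transforms' was already set there. A condensed sketch of that wiring (attach_batch_transforms is a hypothetical helper; cfg and build_objects are as defined in run_task.py):

def attach_batch_transforms(cfg, split):
    # split is 'train' or 'eval'; reads cfg['transforms']['<split>_batch']
    nodes = cfg['transforms'].get(f'{split}_batch')
    if nodes is None:
        return
    if cfg['datasets'][split].args.get('batch_transforms') is not None:
        raise ValueError(
            f"Found key 'batch_transforms' in args of {split} dataset and the value is not None.")
    cfg['datasets'][split].args['batch_transforms'] = T.BatchCompose(
        build_objects(nodes, mod=T))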

+ 2 - 1
tutorials/train/object_detection/data/.gitignore

@@ -1,4 +1,5 @@
 *.path
 *.zip
 *.tar.gz
-sarship/
+sarship/
+dota/

+ 8 - 5
tutorials/train/object_detection/faster_rcnn.py

@@ -3,8 +3,6 @@
 # Example script for training the Faster R-CNN object detection model
 # Before running this script, make sure the PaddleRS library is correctly installed
 
-import os
-
 import paddlers as pdrs
 from paddlers import transforms as T
 
@@ -31,14 +29,18 @@ train_transforms = [
     T.RandomCrop(),
     # Random horizontal flip
     T.RandomHorizontalFlip(),
-    # Randomly resize the batch, with a randomly chosen interpolation method
-    T.BatchRandomResize(
-        target_sizes=[512, 544, 576, 608], interp='RANDOM'),
     # Normalize the image
     T.Normalize(
         mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
 ]
 
+# Define transforms applied to a whole batch of samples
+train_batch_transforms = [
+    # Randomly resize the batch, with a randomly chosen interpolation method
+    T.BatchRandomResize(
+        target_sizes=[512, 544, 576, 608], interp='RANDOM'),
+]
+
 eval_transforms = [
     # Resize the input image to a fixed size using bicubic interpolation
     T.Resize(
@@ -54,6 +56,7 @@ train_dataset = pdrs.datasets.VOCDetDataset(
     file_list=TRAIN_FILE_LIST_PATH,
     label_list=LABEL_LIST_PATH,
     transforms=train_transforms,
+    batch_transforms=train_batch_transforms,
     shuffle=True)
 
 eval_dataset = pdrs.datasets.VOCDetDataset(
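
BatchRandomResize picks a single random target size per batch, so it must see the whole batch at once; that is why it moves out of the per-sample transforms list into the new train_batch_transforms list, which is passed to the dataset separately (the same change is applied in the PP-YOLO, PP-YOLO Tiny, PP-YOLOv2, and YOLOv3 tutorials below). The resulting call pattern, as in the diff above:

train_dataset = pdrs.datasets.VOCDetDataset(
    data_dir=DATA_DIR,
    file_list=TRAIN_FILE_LIST_PATH,
    label_list=LABEL_LIST_PATH,
    transforms=train_transforms,              # applied to one sample at a time
    batch_transforms=train_batch_transforms,  # applied to a collated batch
    shuffle=True)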

+ 19 - 27
tutorials/train/object_detection/fcosr.py

@@ -9,9 +9,9 @@ from paddlers import transforms as T
 # Dataset directory
 DATA_DIR = "./data/dota/"
 # Path of the dataset annotation file
-ANNO_PATH = "./data/dota/DOTA_trainval1024.json"
+ANNO_PATH = "trainval1024/DOTA_trainval1024.json"
 # Dataset image directory
-IMAGE_DIR = "./data/dota/images"
+IMAGE_DIR = "trainval1024/images"
 # Experiment directory, where output model weights and results are saved
 EXP_DIR = "./output/fcosr/"
 
@@ -21,14 +21,14 @@ IMAGE_SIZE = [1024, 1024]
 pdrs.utils.download_and_decompress(
     "https://paddlers.bj.bcebos.com/datasets/dota.zip", path="./data/")
 
-# For rotated object detection, we need to install ppdet's custom external operators as follows:
+# For rotated object detection tasks, the custom external operators must be installed as follows:
 # cd paddlers/models/ppdet/ext_op
 # python setup.py install
 
 # Define the transforms used in training and validation (data augmentation, preprocessing, etc.)
 # Use Compose to combine multiple transforms; transforms in a Compose are executed sequentially
 # API reference: https://github.com/PaddlePaddle/PaddleRS/blob/develop/docs/apis/data.md
-train_transforms = T.Compose([
+train_transforms = [
     # Decode the image
     T.DecodeImg(),
     # Convert the labels to numpy arrays
@@ -47,21 +47,15 @@ train_transforms = T.Compose([
     # Convert the labels to the rotated-box format
     T.Poly2RBox(
         filter_threshold=2, filter_mode='edge', rbox_type="oc"),
-])
+]
 
-# Define transforms applied to a whole batch of samples
-# Combine them with BatchCompose
-train_batch_transforms = T.BatchCompose([
+train_batch_transforms = [
     # Normalize the image
     T.BatchNormalizeImage(
         mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
-    # Pad the labels with zeros
-    T.BatchPadRGT(),
-    # Pad the image
-    T._BatchPad(pad_to_stride=32)
-])
+]
 
-eval_transforms = T.Compose([
+eval_transforms = [
     T.DecodeImg(),
     # Convert the labels to numpy arrays
     T.Poly2Array(),
@@ -71,9 +65,7 @@ eval_transforms = T.Compose([
     # Normalize the image
     T.Normalize(
         mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
-])
-
-eval_batch_transforms = T.BatchCompose([T._BatchPad(pad_to_stride=32)])
+]
 
 # Build the training and validation datasets
 train_dataset = pdrs.datasets.COCODetDataset(
@@ -81,16 +73,15 @@ train_dataset = pdrs.datasets.COCODetDataset(
     image_dir=IMAGE_DIR,
     anno_path=ANNO_PATH,
     transforms=train_transforms,
-    shuffle=True,
-    batch_transforms=train_batch_transforms)
+    batch_transforms=train_batch_transforms,
+    shuffle=True)
 
 eval_dataset = pdrs.datasets.COCODetDataset(
     data_dir=DATA_DIR,
     image_dir=IMAGE_DIR,
     anno_path=ANNO_PATH,
     transforms=eval_transforms,
-    shuffle=False,
-    batch_transforms=eval_batch_transforms)
+    shuffle=False)
 
 # Build the FCOSR model
 # For currently supported models, see: https://github.com/PaddlePaddle/PaddleRS/blob/develop/docs/intro/model_zoo.md
@@ -107,19 +98,20 @@ model = pdrs.tasks.det.FCOSR(
 model.train(
     num_epochs=36,
     train_dataset=train_dataset,
-    train_batch_size=2,
+    train_batch_size=4,
     eval_dataset=eval_dataset,
     # Save a checkpoint every this many epochs
     save_interval_epochs=5,
     # Log every this many iterations
     log_interval_steps=4,
+    metric='rbox',
     save_dir=EXP_DIR,
-    # Initial learning rate
-    learning_rate=0.001,
+    # Initial learning rate. Adjust learning_rate according to this formula: (train_batch_size * gpu_nums) / (4 * 4) * 0.01
+    learning_rate=0.01,
     # Number of learning rate warm-up steps
-    warmup_steps=500,
+    warmup_steps=50,
     # Warm-up start learning rate
-    warmup_start_lr=0.03333333,
+    warmup_start_lr=0.03333333 * 0.01,
     # Epochs at which the learning rate decays
     lr_decay_epochs=[24, 33],
     # Learning rate decay factor
@@ -127,6 +119,6 @@ model.train(
     # Parameter of the gradient clipping strategy (clip by global norm)
     clip_grad_by_norm=35.,
     # Specify the pretrained weights
-    pretrain_weights="COCO",
+    pretrain_weights="IMAGENET",
     # Whether to enable VisualDL logging
     use_vdl=True)
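
A worked example of the scaling rule in the comment above: the default learning_rate=0.01 presumably corresponds to train_batch_size=4 on 4 GPUs, since (4 * 4) / (4 * 4) * 0.01 = 0.01. For single-GPU training with the same batch size the rule gives:

# Sketch: scaling the FCOSR learning rate for 1 GPU (values assumed, not from the source)
train_batch_size = 4
gpu_nums = 1
learning_rate = (train_batch_size * gpu_nums) / (4 * 4) * 0.01
print(learning_rate)  # 0.0025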

+ 8 - 5
tutorials/train/object_detection/ppyolo.py

@@ -3,8 +3,6 @@
 # Example script for training the PP-YOLO object detection model
 # Before running this script, make sure the PaddleRS library is correctly installed
 
-import os
-
 import paddlers as pdrs
 from paddlers import transforms as T
 
@@ -31,14 +29,18 @@ train_transforms = [
     T.RandomCrop(),
     # Random horizontal flip
     T.RandomHorizontalFlip(),
-    # Randomly resize the batch, with a randomly chosen interpolation method
-    T.BatchRandomResize(
-        target_sizes=[512, 544, 576, 608], interp='RANDOM'),
     # Normalize the image
     T.Normalize(
         mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
 ]
 
+# Define transforms applied to a whole batch of samples
+train_batch_transforms = [
+    # Randomly resize the batch, with a randomly chosen interpolation method
+    T.BatchRandomResize(
+        target_sizes=[512, 544, 576, 608], interp='RANDOM'),
+]
+
 eval_transforms = [
     # Resize the input image to a fixed size using bicubic interpolation
     T.Resize(
@@ -54,6 +56,7 @@ train_dataset = pdrs.datasets.VOCDetDataset(
     file_list=TRAIN_FILE_LIST_PATH,
     label_list=LABEL_LIST_PATH,
     transforms=train_transforms,
+    batch_transforms=train_batch_transforms,
     shuffle=True)
 
 eval_dataset = pdrs.datasets.VOCDetDataset(

+ 8 - 5
tutorials/train/object_detection/ppyolo_tiny.py

@@ -3,8 +3,6 @@
 # Example script for training the PP-YOLO Tiny object detection model
 # Before running this script, make sure the PaddleRS library is correctly installed
 
-import os
-
 import paddlers as pdrs
 from paddlers import transforms as T
 
@@ -31,14 +29,18 @@ train_transforms = [
     T.RandomCrop(),
     # Random horizontal flip
     T.RandomHorizontalFlip(),
-    # Randomly resize the batch, with a randomly chosen interpolation method
-    T.BatchRandomResize(
-        target_sizes=[512, 544, 576, 608], interp='RANDOM'),
     # Normalize the image
     T.Normalize(
         mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
 ]
 
+# Define transforms applied to a whole batch of samples
+train_batch_transforms = [
+    # Randomly resize the batch, with a randomly chosen interpolation method
+    T.BatchRandomResize(
+        target_sizes=[512, 544, 576, 608], interp='RANDOM')
+]
+
 eval_transforms = [
     # Resize the input image to a fixed size using bicubic interpolation
     T.Resize(
@@ -54,6 +56,7 @@ train_dataset = pdrs.datasets.VOCDetDataset(
     file_list=TRAIN_FILE_LIST_PATH,
     label_list=LABEL_LIST_PATH,
     transforms=train_transforms,
+    batch_transforms=train_batch_transforms,
     shuffle=True)
 
 eval_dataset = pdrs.datasets.VOCDetDataset(

+ 133 - 0
tutorials/train/object_detection/ppyoloe_r.py

@@ -0,0 +1,133 @@
+#!/usr/bin/env python
+
+# Example script for training the PPYOLOE-R rotated object detection model
+# Before running this script, make sure the PaddleRS library is correctly installed
+
+import paddlers as pdrs
+from paddlers import transforms as T
+
+# Dataset directory
+DATA_DIR = "./data/dota/"
+# Path of the dataset annotation file
+ANNO_PATH = "trainval1024/DOTA_trainval1024.json"
+# Dataset image directory
+IMAGE_DIR = "trainval1024/images"
+# Experiment directory, where output model weights and results are saved
+EXP_DIR = "./output/ppyoloe_r/"
+
+IMAGE_SIZE = [1024, 1024]
+
+# Download and decompress the DOTA rotated object detection dataset
+pdrs.utils.download_and_decompress(
+    "https://paddlers.bj.bcebos.com/datasets/dota.zip", path="./data/")
+
+# For rotated object detection tasks, the custom external operator library must be installed as follows:
+# cd paddlers/models/ppdet/ext_op
+# python setup.py install
+
+# Define the transforms used in training and validation (data augmentation, preprocessing, etc.)
+# Use Compose to combine multiple transforms; transforms in a Compose are executed sequentially
+# API reference: https://github.com/PaddlePaddle/PaddleRS/blob/develop/docs/apis/data.md
+train_transforms = [
+    # Decode the image
+    T.DecodeImg(),
+    # Convert the labels to numpy arrays
+    T.Poly2Array(),
+    # Random horizontal flip
+    T.RandomRFlip(),
+    # Randomly rotate by one of the given angles
+    T.RandomRRotate(
+        angle_mode='value', angle=[0, 90, 180, -90]),
+    # Randomly rotate by one of the given angles, with probability 0.5
+    T.RandomRRotate(
+        angle_mode='value', angle=[30, 60], rotate_prob=0.5),
+    # Resize the image, keeping the aspect ratio
+    T.RResize(
+        target_size=IMAGE_SIZE, keep_ratio=True, interp=2),
+    # Convert the labels to the rotated-box format
+    T.Poly2RBox(
+        filter_threshold=2, filter_mode='edge', rbox_type="oc"),
+]
+
+train_batch_transforms = [
+    # Normalize the image
+    T.BatchNormalizeImage(
+        mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
+]
+
+eval_transforms = [
+    T.DecodeImg(),
+    # Convert the labels to numpy arrays
+    T.Poly2Array(),
+    # Resize the image, keeping the aspect ratio
+    T.RResize(
+        target_size=IMAGE_SIZE, keep_ratio=True, interp=2),
+    # Normalize the image
+    T.Normalize(
+        mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
+]
+
+# Build the training and validation datasets
+train_dataset = pdrs.datasets.COCODetDataset(
+    data_dir=DATA_DIR,
+    image_dir=IMAGE_DIR,
+    anno_path=ANNO_PATH,
+    transforms=train_transforms,
+    batch_transforms=train_batch_transforms,
+    shuffle=True)
+
+eval_dataset = pdrs.datasets.COCODetDataset(
+    data_dir=DATA_DIR,
+    image_dir=IMAGE_DIR,
+    anno_path=ANNO_PATH,
+    transforms=eval_transforms,
+    shuffle=False)
+
+# Build the PPYOLOE-R model
+# To list the backbones supported by PPYOLOE-R, use:
+# print(pdrs.tasks.det.PPYOLOE_R.supported_backbones)
+
+# For currently supported models, see: https://github.com/PaddlePaddle/PaddleRS/blob/develop/docs/intro/model_zoo.md
+model = pdrs.tasks.det.PPYOLOE_R(
+    backbone="CSPResNet_m",
+    num_classes=15,
+    nms_score_threshold=0.1,
+    nms_topk=2000,
+    nms_keep_topk=-1,
+    nms_normalized=False,
+    nms_iou_threshold=0.1)
+
+# Train the model
+model.train(
+    num_epochs=36,
+    train_dataset=train_dataset,
+    train_batch_size=2,
+    eval_dataset=eval_dataset,
+    # Save a checkpoint every this many epochs
+    save_interval_epochs=5,
+    # Log every this many iterations
+    log_interval_steps=4,
+    metric='rbox',
+    save_dir=EXP_DIR,
+    # Use a cosine annealing learning rate scheduler
+    scheduler='Cosine',
+    # Parameter of the learning rate scheduler
+    cosine_decay_num_epochs=44,
+    # Initial learning rate. Adjust learning_rate according to this formula: (train_batch_size * gpu_nums) / (2 * 4) * 0.01
+    learning_rate=0.008,
+    # Number of learning rate warm-up steps
+    warmup_steps=100,
+    # Warm-up start learning rate
+    warmup_start_lr=0.,
+    # Epochs at which the learning rate decays
+    lr_decay_epochs=[24, 33],
+    # Learning rate decay factor
+    lr_decay_gamma=0.1,
+    # L2 regularization coefficient
+    reg_coeff=0.0005,
+    # Parameter of the gradient clipping strategy (clip by global norm)
+    clip_grad_by_norm=35.,
+    # Specify the pretrained weights
+    pretrain_weights="IMAGENET",
+    # Whether to enable VisualDL logging
+    use_vdl=True)
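
After training finishes, the checkpoints saved under EXP_DIR can be restored for inference. A minimal sketch, assuming PaddleRS exposes paddlers.tasks.load_model as in its other tutorials ("demo.png" is a hypothetical test image):

import paddlers as pdrs

# Restore the best checkpoint produced by model.train() above
model = pdrs.tasks.load_model("./output/ppyoloe_r/best_model")
result = model.predict("demo.png")  # predicted rotated boxes for the input image
print(result)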

+ 8 - 5
tutorials/train/object_detection/ppyolov2.py

@@ -3,8 +3,6 @@
 # Example script for training the PP-YOLOv2 object detection model
 # Before running this script, make sure the PaddleRS library is correctly installed
 
-import os
-
 import paddlers as pdrs
 from paddlers import transforms as T
 
@@ -31,14 +29,18 @@ train_transforms = [
     T.RandomCrop(),
     # Random horizontal flip
     T.RandomHorizontalFlip(),
-    # Randomly resize the batch, with a randomly chosen interpolation method
-    T.BatchRandomResize(
-        target_sizes=[512, 544, 576, 608], interp='RANDOM'),
     # Normalize the image
     T.Normalize(
         mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
 ]
 
+# Define transforms applied to a whole batch of samples
+train_batch_transforms = [
+    # Randomly resize the batch, with a randomly chosen interpolation method
+    T.BatchRandomResize(
+        target_sizes=[512, 544, 576, 608], interp='RANDOM')
+]
+
 eval_transforms = [
     # Resize the input image to a fixed size using bicubic interpolation
     T.Resize(
@@ -54,6 +56,7 @@ train_dataset = pdrs.datasets.VOCDetDataset(
     file_list=TRAIN_FILE_LIST_PATH,
     label_list=LABEL_LIST_PATH,
     transforms=train_transforms,
+    batch_transforms=train_batch_transforms,
     shuffle=True)
 
 eval_dataset = pdrs.datasets.VOCDetDataset(

+ 8 - 5
tutorials/train/object_detection/yolov3.py

@@ -3,8 +3,6 @@
 # Example script for training the YOLOv3 object detection model
 # Before running this script, make sure the PaddleRS library is correctly installed
 
-import os
-
 import paddlers as pdrs
 from paddlers import transforms as T
 
@@ -31,14 +29,18 @@ train_transforms = [
     T.RandomCrop(),
     # Random horizontal flip
     T.RandomHorizontalFlip(),
-    # Randomly resize the batch, with a randomly chosen interpolation method
-    T.BatchRandomResize(
-        target_sizes=[512, 544, 576, 608], interp='RANDOM'),
     # Normalize the image
     T.Normalize(
         mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
 ]
 
+# Define transforms applied to a whole batch of samples
+train_batch_transforms = [
+    # Randomly resize the batch, with a randomly chosen interpolation method
+    T.BatchRandomResize(
+        target_sizes=[512, 544, 576, 608], interp='RANDOM'),
+]
+
 eval_transforms = [
     # Resize the input image to a fixed size using bicubic interpolation
     T.Resize(
@@ -54,6 +56,7 @@ train_dataset = pdrs.datasets.VOCDetDataset(
     file_list=TRAIN_FILE_LIST_PATH,
     label_list=LABEL_LIST_PATH,
     transforms=train_transforms,
+    batch_transforms=train_batch_transforms,
     shuffle=True)
 
 eval_dataset = pdrs.datasets.VOCDetDataset(