【Hackathon No.152】Refine API Documents (#115)

* Complete all documentation

* fix some error

* edit .gitignore

* delete .idea

* Create new Chinese and English documents and improve the quick start documents

* delete operators.py

* made some changes

* test

* fix some error

* fixed some Markdown format errors

* fix some error

* supplementary description of lq_loss_weight parameter

* test

* fix some error

* test

* test

* Update transforms_cons_params_cn.md

* Update transforms_cons_params_en.md

* Update model_cons_param_en.md

* Update model_cons_param_cn.md

* Update transforms_cons_params_en.md

* Update transforms_cons_params_cn.md

* Update transforms_cons_params_cn.md

* Update transforms_cons_params_en.md

* Update model_cons_param_cn.md

* Update model_cons_param_en.md

* fix error

* fix

* fix error

---------

Co-authored-by: ggggkkkknnnn <1066016493@qq.com>
ggggkkkknnnn 2 years ago
parent
commit
66f89f4d7f

+ 3 - 0
.gitignore

@@ -129,6 +129,9 @@ dmypy.json
 # myvscode
 .vscode
 
+# myidea
+.idea
+
 # Pyre type checker
 .pyre/
 

+ 1 - 1
README_CN.md

@@ -272,7 +272,7 @@ PaddleRS具有以下五大特色:
 ## <img src="./docs/images/teach.png" width="30"/> 教程与文档
 
 * 快速上手
-  * [快速上手PaddleRS](./tutorials/train/README.md)
+  * [快速上手PaddleRS](./docs/quick_start.md)
 * 数据准备
   * [快速了解遥感与遥感数据](./docs/data/rs_data.md)
   * [开源遥感数据集汇总表](./docs/data/dataset.md)

+ 1 - 1
README_EN.md

@@ -270,7 +270,7 @@ PaddleRS is an end-to-end high-efficent development toolkit for remote sensing a
 ## <img src="./docs/images/teach.png" width="30"/> Tutorials and Documents
 
 * Quick Start
-  * [Quick start](./tutorials/train/README.md)
+  * [Quick start](./docs/quick_start.md)
 * Data Preparation
   * [Open-source remote sensing datasets](./docs/data/dataset.md)
   * [Efficient interactive segmentation tool EISeg](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.7/EISeg)

+ 516 - 0
docs/intro/model_cons_param_cn.md

@@ -0,0 +1,516 @@
+# PaddleRS模型构造参数
+
+本文档详细介绍了PaddleRS各个模型训练器的构造参数,包括其参数名、参数类型、参数描述及默认值。
+
+## `BIT`
+
+基于PaddlePaddle实现的BIT模型。
+
+该模型的原始文章见于 H. Chen, et al., "Remote Sensing Image Change Detection With Transformers" (https://arxiv.org/abs/2103.00208)。
+
+该实现采用预训练编码器,而非原始工作中随机初始化权重。
+
+| 参数名               | 描述                                                                     | 默认值          |
+|-------------------|------------------------------------------------------------------------|--------------|
+| `in_channels (int)` | 输入图像的通道数                                                               | `3`          |
+| `num_classes (int)`  | 目标类别数量                                                                 | `2`           |
+| `use_mixed_loss (bool, optional)` | 是否使用混合损失函数                                                             | `False`      |
+| `losses (list, optional)` | 损失函数列表                                                                 | `None`       |
+| `att_type (str, optional)` | 空间注意力类型,可选值为`'CBAM'`和`'BAM'`                                           | `'CBAM'`     |
+| `ds_factor (int, optional)` | 下采样因子                                                                  | `1`          |
+| `backbone (str, optional)` | 用作主干网络的 ResNet 型号。目前仅支持`'resnet18'`和`'resnet34'`                       | `'resnet18'` |
+| `n_stages (int, optional)` | 主干网络中使用的 ResNet 阶段数,应为`{3, 4, 5}`中的值                                     | `4`          |
+| `use_tokenizer (bool, optional)` | 是否使用可学习的 tokenizer                                                     | `True`       |
+| `token_len (int, optional)` | 输入 token 的长度                                                           | `4`          |
+| `pool_mode (str, optional)` | 当`'use_tokenizer'`设置为`False`时,获取输入 token 的池化策略。`'max'`表示全局最大池化,`'avg'`表示全局平均池化 | `'max'`      |
+| `pool_size (int, optional)` | 当`'use_tokenizer'`设置为`False`时,池化后的特征图的高度和宽度                             | `2`          |
+| `enc_with_pos (bool, optional)` | 是否将学习的位置嵌入到编码器的输入特征序列中                                                 | `True`       |
+| `enc_depth (int, optional)` | 编码器中使用的注意力块数                                                           | `1`          |
+| `enc_head_dim (int, optional)` | 每个编码器头的嵌入维度                                                            | `64`         |
+| `dec_depth (int, optional)` | 解码器中使用的注意力模块数量                                                         | `8`          |
+| `dec_head_dim (int, optional)` | 每个解码器头的嵌入维度                                                            | `8`          |
+
+
+
+## `CDNet`
+
+基于PaddlePaddle的CDNet实现。
+
+该模型的原始文章见于 Pablo F. Alcantarilla, et al., "Street-View Change Detection with Deconvolutional Networks"(https://link.springer.com/article/10.1007/s10514-018-9734-5).
+
+| 参数名                     | 描述                              | 默认值     |
+|-------------------------| --------------------------------- | ---------- |
+| `num_classes (int)`     | 目标类别数量       | `2`        |
+| `use_mixed_loss (bool)` | 是否使用混合损失函数             | `False`    |
+| `losses (list)`         | 损失函数列表                      | `None`     |
+| `in_channels (int)`     | 输入图像的通道数                  | `6`        |
+
+
+## `ChangeFormer`
+
+基于PaddlePaddle的ChangeFormer实现。
+
+该模型的原始文章见于 Wele Gedara Chaminda Bandara,Vishal M. Patel,“A TRANSFORMER-BASED SIAMESE NETWORK FOR CHANGE DETECTION”(https://arxiv.org/pdf/2201.01293.pdf)。
+
+| 参数名                         | 描述                        | 默认值       |
+|--------------------------------|---------------------------|--------------|
+| `num_classes (int)`            | 目标类别数量                    | `2`          |
+| `use_mixed_loss (bool)`        | 是否使用混合损失函数                | `False`      |
+| `losses (list)`                | 损失函数列表                    | `None`       |
+| `in_channels (int)`            | 输入图像的通道数                  | `3`          |
+| `decoder_softmax (bool)`       | 是否使用softmax作为解码器的最后一层激活函数 | `False`      |
+| `embed_dim (int)`              | Transformer 编码器的隐藏层维度     | `256`        |
+
+
+## `ChangeStar_FarSeg`
+
+基于PaddlePaddle实现的ChangeStar模型,其使用FarSeg编码器。
+
+该模型的原始文章见于 Z. Zheng, et al., "Change is Everywhere: Single-Temporal Supervised Object Change Detection in Remote Sensing Imagery"(https://arxiv.org/abs/2108.07002).
+
+| 参数名                     | 描述                                | 默认值      |
+|-------------------------|-----------------------------------|-------------|
+| `num_classes (int)`     | 目标类别数量                            | `2`         |
+| `use_mixed_loss (bool)` | 是否使用混合损失                          | `False`     |
+| `losses (list)`         | 损失函数列表                            | `None`      |
+| `mid_channels (int)`    | UNet 中间层的通道数                      | `256`       |
+| `inner_channels (int)`  | 注意力模块内部的通道数                       | `16`        |
+| `num_convs (int)`       | UNet 编码器和解码器中卷积层的数量               | `4`         |
+| `scale_factor (float)`  | 上采样因子,用于将低分辨率掩码图像恢复到高分辨率图像大小的放大倍数 | `4.0`       |
+
+
+
+
+## `DSAMNet`
+
+基于PaddlePaddle实现的DSAMNet,用于遥感变化检测。
+
+该模型的原始文章见于 Q. Shi, et al., "A Deeply Supervised Attention Metric-Based Network and an Open Aerial Image Dataset for Remote Sensing Change Detection"(https://ieeexplore.ieee.org/document/9467555).
+
+| 参数名                     | 描述                         | 默认值 |
+|-------------------------|----------------------------|--------|
+| `num_classes (int)`     | 目标类别数量                 | `2`    |
+| `use_mixed_loss (bool)` | 是否使用混合损失函数             | `False`|
+| `losses (list)`         | 损失函数列表                 | `None` |
+| `in_channels (int)`     | 输入图像的通道数             | `3`    |
+| `ca_ratio (int)`        | 通道注意力模块中的通道压缩比 | `8`    |
+| `sa_kernel (int)`       | 空间注意力模块中的卷积核大小 | `7`    |
+
+## `DSIFN`
+
+基于PaddlePaddle的DSIFN实现。
+
+该模型的原始文章见于 C. Zhang, et al., "A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images"(https://www.sciencedirect.com/science/article/pii/S0924271620301532).
+
+| 参数名                   | 描述                   | 默认值 |
+|-------------------------|----------------------|--------|
+| `num_classes (int)`      | 目标类别数量             | `2`    |
+| `use_mixed_loss (bool)`  | 是否使用混合损失函数         | `False`|
+| `losses (list)`          | 损失函数列表             | `None` |
+| `use_dropout (bool)`     | 是否使用 dropout        | `False`|
+
+## `FC-EF`
+
+基于PaddlePaddle的FC-EF实现。
+
+该模型的原始文章见于 Rodrigo Caye Daudt, et al. "Fully convolutional siamese networks for change detection"(https://arxiv.org/abs/1810.08462).
+
+| 参数名                     | 描述                          | 默认值 |
+|-------------------------|-------------------------------|--------|
+| `num_classes (int)`     | 目标类别数量                  | `2`    |
+| `use_mixed_loss (bool)` | 是否使用混合损失函数             | `False`|
+| `losses (list)`         | 损失函数列表                  | `None` |
+| `in_channels (int)`     | 输入图像的通道数              | `6`    |
+| `use_dropout (bool)`    | 是否使用 dropout              | `False`|
+
+
+
+## `FC-Siam-conc`
+
+基于PaddlePaddle的FC-Siam-conc实现。
+
+该模型的原始文章见于 Rodrigo Caye Daudt, et al. "Fully convolutional siamese networks for change detection"(https://arxiv.org/abs/1810.08462).
+
+| 参数名                     | 描述                          | 默认值 |
+|-------------------------|-------------------------------|--------|
+| `num_classes (int)`     | 目标类别数量                  | `2`    |
+| `use_mixed_loss (bool)` | 是否使用混合损失函数            | `False`|
+| `losses (list)`         | 损失函数列表                  | `None` |
+| `in_channels (int)`     | 输入图像的通道数              | `3`    |
+| `use_dropout (bool)`    | 是否使用 dropout               | `False`|
+
+
+## `FC-Siam-diff`
+
+基于PaddlePaddle的FC-Siam-diff实现。
+
+该模型的原始文章见于 Rodrigo Caye Daudt, et al. "Fully convolutional siamese networks for change detection"(https://arxiv.org/abs/1810.08462).
+
+| 参数名 | 描述          | 默认值 |
+| --- |-------------|  --- |
+| `num_classes (int)` | 目标类别数量      | `2` |
+| `use_mixed_loss (bool)` | 是否使用混合损失函数  |`False` |
+| `losses (list)` | 损失函数列表      | `None` |
+| `in_channels (int)` | 输入图像的通道数    | `3` |
+| `use_dropout (bool)` | 是否使用 dropout | `False` |
+
+
+## `FCCDN`
+
+基于PaddlePaddle的FCCDN实现。
+
+该模型的原始文章见于 Pan Chen, et al., "FCCDN: Feature Constraint Network for VHR Image Change Detection"(https://arxiv.org/pdf/2105.10860.pdf).
+
+| 参数名                    | 描述         | 默认值 |
+|--------------------------|------------|--------|
+| `in_channels (int)`       | 输入图像的通道数   | `3`    |
+| `num_classes (int)`      | 目标类别数量     | `2`    |
+| `use_mixed_loss (bool)`  | 是否使用混合损失函数 | `False`|
+| `losses (list)`          | 损失函数列表     | `None` |
+
+
+## `P2V-CD`
+
+基于PaddlePaddle的P2V-CD实现。
+
+该模型的原始文章见于 M. Lin, et al. "Transition Is a Process: Pair-to-Video Change Detection Networks for Very High Resolution Remote Sensing Images"(https://ieeexplore.ieee.org/document/9975266).
+
+| 参数名                     | 描述         | 默认值 |
+|-------------------------|------------|--------|
+| `num_classes (int)`     | 目标类别数量     | `2`    |
+| `use_mixed_loss (bool)` | 是否使用混合损失函数 | `False`|
+| `losses (list)`         | 损失函数列表     | `None` |
+| `in_channels (int)`     | 输入图像的通道数   | `3`    |
+| `video_len (int)`       | 输入视频帧的数量   | `8`    |
+
+## `SNUNet`
+
+基于PaddlePaddle的SNUNet实现。
+
+该模型的原始文章见于 S. Fang, et al., "SNUNet-CD: A Densely Connected Siamese Network for Change Detection of VHR Images" (https://ieeexplore.ieee.org/document/9355573).
+
+| 参数名                     | 描述         | 默认值 |
+|-------------------------|------------| --- |
+| `num_classes (int)`     | 目标类别数量     | `2` |
+| `use_mixed_loss (bool)` | 是否使用混合损失函数 | `False` |
+| `losses (list)`         | 损失函数列表     | `None` |
+| `in_channels (int)`     | 输入图像的通道数   | `3` |
+| `width (int)`           | 神经网络中的通道数  | `32` |
+
+
+## `STANet`
+
+基于PaddlePaddle的STANet实现。
+
+该模型的原始文章见于 H. Chen and Z. Shi, "A Spatial-Temporal Attention-Based Method and a New Dataset for Remote Sensing Image Change Detection"(https://www.mdpi.com/2072-4292/12/10/1662).
+
+| 参数名                     | 描述                                                 | 默认值 |
+|-------------------------|----------------------------------------------------| --- |
+| `num_classes (int)`     | 目标类别数量                                             | `2` |
+| `use_mixed_loss (bool)` | 是否使用混合损失函数                                         | `False` |
+| `losses (list)`         | 损失函数列表                                             | `None` |
+| `in_channels (int)`     | 输入图像的通道数                                           | `3` |
+| `att_type (str)`        | 注意力模块的类型,可以是`'BAM'`(基础时空注意力模块)或`'PAM'`(金字塔时空注意力模块) | `'BAM'` |
+| `ds_factor (int)`       | 下采样因子,可以是`1`、`2`或`4`                               | `1` |
+
+## `CondenseNetV2`
+
+基于PaddlePaddle的CondenseNetV2实现。
+
+该模型的原始文章见于 Yang L, Jiang H, Cai R, et al. "Condensenet v2: Sparse feature reactivation for deep networks" (https://arxiv.org/abs/2104.04382).
+
+| 参数名                     | 描述                         | 默认值 |
+|-------------------------|----------------------------| --- |
+| `num_classes (int)`     | 目标类别数量                     | `2` |
+| `use_mixed_loss (bool)` | 是否使用混合损失函数                 | `False` |
+| `losses (list)`         | 损失函数列表                     | `None` |
+| `in_channels (int)`     | 模型的输入通道数                   | `3` |
+| `arch (str)`            | 模型的架构,可以是`'A'`、`'B'`或`'C'` | `'A'` |
+
+
+## `HRNet`
+
+基于PaddlePaddle的HRNet实现。
+
+| 参数名                     | 描述         | 默认值 |
+|-------------------------|------------| --- |
+| `num_classes (int)`     | 目标类别数量     | `2` |
+| `use_mixed_loss (bool)` | 是否使用混合损失函数 | `False` |
+| `losses (list)`         | 损失函数列表     | `None` |
+
+## `MobileNetV3`
+
+基于PaddlePaddle的MobileNetV3实现。
+
+| 参数名                     | 描述         | 默认值 |
+|-------------------------|------------| --- |
+| `num_classes (int)`     | 目标类别数量     | `2` |
+| `use_mixed_loss (bool)` | 是否使用混合损失函数 | `False` |
+| `losses (list)`         | 损失函数列表     | `None` |
+
+
+## `ResNet50-vd`
+
+基于PaddlePaddle的ResNet50-vd实现。
+
+| 参数名                     | 描述         | 默认值 |
+|-------------------------|------------| --- |
+| `num_classes (int)`     | 目标类别数量     | `2` |
+| `use_mixed_loss (bool)` | 是否使用混合损失函数 | `False` |
+| `losses (list)`         | 损失函数列表     | `None` |
+
+## `DRN`
+
+基于PaddlePaddle的DRN实现。
+
+| 参数名                     | 描述                                                                                     | 默认值   |
+|-------------------------|----------------------------------------------------------------------------------------|-------|
+| `losses (list)`         | 损失函数列表                                                                                 | `None` |
+| `sr_factor (int)`       | 超分辨率的缩放因子,原始图像的大小将乘以此因子。例如,如果原始图像为 `H` x `W`,则输出图像将为 `sr_factor * H` x `sr_factor * W` | `4`   |
+| `min_max (None \| tuple[float, float])` | 图像像素值的最小值和最大值                                                                          | `None` |
+| `scales (tuple[int])` | 缩放因子                                                                                   | `(2, 4)` |
+| `n_blocks (int)`           | 残差块的数量                                                                                 | `30`  |
+| `n_feats (int)`            | 残差块中的特征维度                                                                              | `16`  |
+| `n_colors (int)`           | 图像通道数                                                                                  | `3`   |
+| `rgb_range (float)`        | 图像像素值的范围                                                                               | `1.0` |
+| `negval (float)`           | 用于激活函数中的负数值的处理                                                                         | `0.2` |
+| `lq_loss_weight (float)`   | 低质量图像损失的权重,用来控制将低分辨率的输入图像恢复成高分辨率的输出图像的重构损失对于总体损失的影响程度。                                            | `0.1` |
+| `dual_loss_weight (float)` | 双重损失的权重                                                                                | `0.1` |
+
+## `ESRGAN`
+
+基于PaddlePaddle的ESRGAN实现。
+
+| 参数名                  | 描述                                                                                     | 默认值 |
+|----------------------|----------------------------------------------------------------------------------------| --- |
+| `losses (list)`      | 损失函数列表                                                                                 | `None` |
+| `sr_factor (int)`    | 超分辨率的缩放因子,原始图像的大小将乘以此因子。例如,如果原始图像为 `H` x `W`,则输出图像将为 `sr_factor * H` x `sr_factor * W` | `4` |
+| `min_max (tuple)`    | 输入图像的像素值的最小值和最大值。如果未指定,则使用数据类型的默认最小值和最大值。                                              | `None` |
+| `use_gan (bool)`     | 布尔值,指示是否在训练过程中使用 GAN (生成对抗网络)。如果是,将使用 GAN。                                             | `True` |
+| `in_channels (int)`  | 输入图像的通道数                                                                               | `3` |
+| `out_channels (int)` | 输出图像的通道数。                                                                        | `3` |
+| `nf (int)`           | 模型第一层卷积层的滤波器数量。                                                                        | `64` |
+| `nb (int)`           | 模型中残差块的数量。                                                                             | `23` |
+
+## `LESRCNN`
+
+基于PaddlePaddle的LESRCNN实现。
+
+| 参数名                  | 描述                                                                                      | 默认值 |
+|----------------------|-----------------------------------------------------------------------------------------| --- |
+| `losses (list)`      | 损失函数列表                                                                                  | `None` |
+| `sr_factor (int)`    | 超分辨率的缩放因子,原始图像的大小将乘以此因子。例如,如果原始图像为 `H` x `W`,则输出图像将为 `sr_factor * H` x `sr_factor * W`。 | `4` |
+| `min_max (tuple)`    | 输入图像的像素值的最小值和最大值。如果未指定,则使用数据类型的默认最小值和最大值。                                               | `None` |
+| `multi_scale (bool)` | 布尔值,指示是否在多个尺度下进行训练。如果是,则在训练过程中使用多个尺度。                                                   | `False` |
+| `group (int)`        | 控制卷积操作的组数。如果设置为 `1`,则为标准卷积;如果设置为输入通道数,则为 DWConv。                                        | `1` |
+
+## `Faster R-CNN`
+
+基于PaddlePaddle的Faster R-CNN实现。
+
+| 参数名                           | 描述                                                  | 默认值 |
+|-------------------------------|-----------------------------------------------------| --- |
+| `num_classes (int)`           | 目标类别数量                                              | `80` |
+| `backbone (str)`              | Faster R-CNN的主干网络                                   | `'ResNet50'` |
+| `with_fpn (bool)`             | 布尔值,指示是否使用特征金字塔网络 (FPN)。                            | `True` |
+| `with_dcn (bool)`             | 布尔值,指示是否使用 Deformable Convolutional Networks (DCN)。 | `False` |
+| `aspect_ratios (list)`        | 候选框的宽高比列表。                                          | `[0.5, 1.0, 2.0]` |
+| `anchor_sizes (list)`         | 候选框的大小列表,表示为每个特征图上的基本大小。                            | `[[32], [64], [128], [256], [512]]` |
+| `keep_top_k (int)`            | 在进行 NMS 操作之后,保留的预测框的数量。                             | `100` |
+| `nms_threshold (float)`       | 使用的非极大值抑制 (NMS) 阈值。                                 | `0.5` |
+| `score_threshold (float)`     | 过滤预测框的分数阈值。                                         | `0.05` |
+| `fpn_num_channels (int)`      | FPN 网络中每个金字塔层的通道数。                                  | `256` |
+| `rpn_batch_size_per_im (int)` | RPN 网络中每张图像采样的正负样本总数。                               | `256` |
+| `rpn_fg_fraction (float)`     | RPN 网络中前景样本的比例。                                     | `0.5` |
+| `test_pre_nms_top_n (int)`    | 测试时,进行 NMS 操作之前保留的预测框的数量。如果未指定,则使用 `keep_top_k`。    | `None` |
+| `test_post_nms_top_n (int)`   | 测试时,进行 NMS 操作之后保留的预测框的数量。                           | `1000` |
+
+## `PP-YOLO`
+
+基于PaddlePaddle的PP-YOLO实现。
+
+| 参数名                              | 描述                  | 默认值 |
+|----------------------------------|---------------------| --- |
+| `num_classes (int)`              | 目标类别数量              | `80` |
+| `backbone (str)`                 | PPYOLO 的主干网络        | `'ResNet50_vd_dcn'` |
+| `anchors (list[list[float]])`    | 预定义锚框的大小            | `None` |
+| `anchor_masks (list[list[int]])` | 预定义锚框的掩码            | `None` |
+| `use_coord_conv (bool)`          | 是否使用坐标卷积            | `True` |
+| `use_iou_aware (bool)`           | 是否使用 IoU 感知         | `True` |
+| `use_spp (bool)`                 | 是否使用空间金字塔池化(SPP)    | `True` |
+| `use_drop_block (bool)`          | 是否使用 DropBlock 正则化  | `True` |
+| `scale_x_y (float)`              | 对每个预测框进行缩放的参数       | `1.05` |
+| `ignore_threshold (float)`       | IoU 阈值,用于将预测框分配给真实框 | `0.7` |
+| `label_smooth (bool)`            | 是否使用标签平滑            | `False` |
+| `use_iou_loss (bool)`            | 是否使用 IoU Loss       | `True` |
+| `use_matrix_nms (bool)`          | 是否使用 Matrix NMS     | `True` |
+| `nms_score_threshold (float)`    | NMS  的分数阈值          | `0.01` |
+| `nms_topk (int)`                 | 在执行 NMS 之前保留的最大检测数  | `-1` |
+| `nms_keep_topk (int)`            | NMS 后要保留的最大预测框数     | `100`|
+| `nms_iou_threshold (float)`      | NMS IoU 阈值          | `0.45`  |
+
+
+## `PP-YOLO Tiny`
+
+基于PaddlePaddle的PP-YOLO Tiny实现。
+
+| 参数名                              | 描述                    | 默认值 |
+|----------------------------------|-----------------------| --- |
+| `num_classes (int)`              | 目标类别数量                | `80` |
+| `backbone (str)`                 | PP-YOLO Tiny的主干网络     | `'MobileNetV3'` |
+| `anchors (list[list[float]])`    | anchor box 大小列表       | `[[10, 15], [24, 36], [72, 42], [35, 87], [102, 96], [60, 170], [220, 125], [128, 222], [264, 266]]` |
+| `anchor_masks (list[list[int]])` | anchor box 掩码         | `[[6, 7, 8], [3, 4, 5], [0, 1, 2]]` |
+| `use_iou_aware (bool)`           | 布尔值,指示是否使用 IoU-aware loss | `False` |
+| `use_spp (bool)`                 | 布尔值,指示是否使用 SPP 模块     | `True` |
+| `use_drop_block (bool)`          | 布尔值,指示是否使用 DropBlock 模块 | `True` |
+| `scale_x_y (float)`              | 缩放参数                  | `1.05` |
+| `ignore_threshold (float)`       | 忽略阈值                  | `0.5` |
+| `label_smooth (bool)`            | 布尔值,指示是否使用标签平滑        | `False` |
+| `use_iou_loss (bool)`            | 布尔值,指示是否使用 IoU Loss   | `True` |
+| `use_matrix_nms (bool)`          | 布尔值,指示是否使用 Matrix NMS | `False` |
+| `nms_score_threshold (float)`    | NMS 得分阈值              | `0.005` |
+| `nms_topk (int)`                 | NMS 操作前保留的边界框数        | `1000` |
+| `nms_keep_topk (int)`            | NMS 操作后保留的边界框数        | `100` |
+| `nms_iou_threshold (float)`      | NMS IoU 阈值            | `0.45` |
+
+
+## `PP-YOLOv2`
+
+基于PaddlePaddle的PP-YOLOv2实现。
+
+
+| 参数名                              | 描述                  | 默认值 |
+|----------------------------------|---------------------| --- |
+| `num_classes (int)`              | 目标类别数量              | `80` |
+| `backbone (str)`                 | PPYOLO 的骨干网络        | `'ResNet50_vd_dcn'` |
+| `anchors (list[list[float]])`    | 预定义锚框的大小            | `[[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]]` |
+| `anchor_masks (list[list[int]])` | 预定义锚框的掩码            | `[[6, 7, 8], [3, 4, 5], [0, 1, 2]]` |
+| `use_iou_aware (bool)`           | 是否使用 IoU 感知         | `True` |
+| `use_spp (bool)`                 | 是否使用空间金字塔池化( SPP )  | `True` |
+| `use_drop_block (bool)`          | 是否使用 DropBlock 正则化  | `True` |
+| `scale_x_y (float)`              | 对每个预测框进行缩放的参数       | `1.05` |
+| `ignore_threshold (float)`       | IoU 阈值,用于将预测框分配给真实框 | `0.7` |
+| `label_smooth (bool)`            | 是否使用标签平滑            | `False` |
+| `use_iou_loss (bool)`            | 是否使用 IoU Loss       | `True` |
+| `use_matrix_nms (bool)`          | 是否使用 Matrix NMS     | `True` |
+| `nms_score_threshold (float)`    | NMS 的分数阈值           | `0.01` |
+| `nms_topk (int)`                 | 在执行 NMS 之前保留的最大检测数  | `-1` |
+| `nms_keep_topk (int)`            | NMS 后要保留的最大预测框数     | `100`|
+| `nms_iou_threshold (float)`      | NMS IoU 阈值          | `0.45`  |
+
+## `YOLOv3`
+
+基于PaddlePaddle的YOLOv3实现。
+
+| 参数名 | 描述                            | 默认值 |
+| --- |-------------------------------| --- |
+| `num_classes (int)` | 目标类别数量                        | `80` |
+| `backbone (str)` | YOLOv3的主干网络的名称                | `'MobileNetV1'` |
+| `anchors (list[list[int]])` | 所有锚框的大小                       | `[[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]]` |
+| `anchor_masks (list[list[int]])` | 使用哪些锚框来预测目标框                  | `[[6, 7, 8], [3, 4, 5], [0, 1, 2]]` |
+| `ignore_threshold (float)` | 预测框和真实框的 IoU 阈值,低于该阈值将被视为背景   | `0.7` |
+| `nms_score_threshold (float)` | 非极大值抑制中,分数阈值,低于该分数的框将被丢弃      | `0.01` |
+| `nms_topk (int)` | 非极大值抑制中,保留的最大得分框数,为`-1`则保留所有框 | `1000` |
+| `nms_keep_topk (int)` | 非极大值抑制中,每个图像保留的最大框数           | `100` |
+| `nms_iou_threshold (float)` | 非极大值抑制中,IoU 阈值,大于该阈值的框将被丢弃    | `0.45` |
+| `label_smooth (bool)` | 是否在计算损失时使用标签平滑                | `False` |
+
+## `BiSeNet V2`
+
+基于PaddlePaddle的BiSeNet V2实现。
+
+| 参数名                     | 描述 | 默认值      |
+|-------------------------| --- |----------|
+| `in_channels (int)`     | 输入图片的通道数 | `3`      |
+| `num_classes (int)`     | 目标类别数量 | `2`      |
+| `use_mixed_loss (bool)` | 是否使用混合损失函数 | `False`  |
+| `losses (list)`         | 模型的各个部分的损失函数 | `None`   |
+| `align_corners (bool)`  | 是否使用角点对齐方法 | `False`  |
+
+
+
+
+## `DeepLab V3+`
+
+基于PaddlePaddle的DeepLab V3+实现。
+
+| 参数名                        | 描述                  | 默认值 |
+|----------------------------|---------------------| --- |
+| `in_channels (int)`        | 输入图像的通道数            | `3` |
+| `num_classes (int)`        | 目标类别数量              | `2` |
+| `backbone (str)`           | DeepLab V3+的主干网络         | `'ResNet50_vd'` |
+| `use_mixed_loss (bool)`    | 是否使用混合损失函数          | `False` |
+| `losses (list)`            | 损失函数列表              | `None` |
+| `output_stride (int)`      | 输出特征图相对于输入特征图的下采样倍率 | `8` |
+| `backbone_indices (tuple)` | 输出主干网络不同阶段的位置索引     | `(0, 3)` |
+| `aspp_ratios (tuple)`      | 空洞卷积的扩张率            | `(1, 12, 24, 36)` |
+| `aspp_out_channels (int)`  | ASPP 模块输出通道数        | `256` |
+| `align_corners (bool)`     | 是否使用角点对齐方法          | `False` |
+
+## `FactSeg`
+
+基于PaddlePaddle的FactSeg实现。
+
+该模型的原始文章见于 A. Ma, J. Wang, Y. Zhong and Z. Zheng, "FactSeg: Foreground Activation-Driven Small Object Semantic Segmentation in Large-Scale Remote Sensing Imagery," in IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1-16, 2022, Art no. 5606216.
+
+
+| 参数名                     | 描述                             | 默认值 |
+|-------------------------|--------------------------------| --- |
+| `in_channels (int)`     | 输入图像的通道数                       | `3` |
+| `num_classes (int)`     | 目标类别数量                         | `2` |
+| `use_mixed_loss (bool)` | 是否使用混合损失函数                     | `False` |
+| `losses (list)`         | 损失函数列表                         | `None` |
+
+## `FarSeg`
+
+基于PaddlePaddle的FarSeg实现。
+
+该模型的原始文章见于 Zheng Z, Zhong Y, Wang J, et al. Foreground-aware relation network for geospatial object segmentation in high spatial resolution remote sensing imagery[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020: 4096-4105.
+
+
+| 参数名                     | 描述                             | 默认值 |
+|-------------------------|--------------------------------| --- |
+| `in_channels (int)`     | 输入图像的通道数                       | `3` |
+| `num_classes (int)`     | 目标类别数量                         | `2` |
+| `use_mixed_loss (bool)` | 是否使用混合损失函数                     | `False` |
+| `losses (list)`         | 损失函数列表                         | `None` |
+
+## `Fast-SCNN`
+
+基于PaddlePaddle的Fast-SCNN实现。
+
+| 参数名                     | 描述         | 默认值 |
+|-------------------------|------------| --- |
+| `in_channels (int)`     | 输入图像的通道数   | `3` |
+| `num_classes (int)`     | 目标类别数量     | `2` |
+| `use_mixed_loss (bool)` | 是否使用混合损失函数 | `False` |
+| `losses (list)`         | 损失函数列表     | `None` |
+| `align_corners (bool)`  | 是否使用角点对齐方法 | `False` |
+
+
+## `HRNet`
+
+基于PaddlePaddle的HRNet实现。
+
+| 参数名                     | 描述                             | 默认值 |
+|-------------------------|--------------------------------| --- |
+| `in_channels (int)`     | 输入图像的通道数                       | `3` |
+| `num_classes (int)`     | 目标类别数量                         | `2` |
+| `width (int)`           | 网络的初始通道数                       | `48` |
+| `use_mixed_loss (bool)` | 是否使用混合损失函数                     | `False` |
+| `losses (list)`         | 损失函数列表                         | `None` |
+| `align_corners (bool)`  | 是否使用角点对齐方法                     | `False` |
+
+
+
+
+## `UNet`
+
+基于PaddlePaddle的UNet实现。
+
+| 参数名                     | 描述                             | 默认值 |
+|-------------------------|--------------------------------| --- |
+| `in_channels (int)`     | 输入图像的通道数                       | `3` |
+| `num_classes (int)`     | 目标类别数量                         | `2` |
+| `use_deconv (bool)`     | 是否使用反卷积进行上采样                   | `False` |
+| `use_mixed_loss (bool)` | 是否使用混合损失函数                     | `False` |
+| `losses (list)`         | 损失函数列表                         | `None` |
+| `align_corners (bool)`  | 是否使用角点对齐方法                     | `False` |

+ 478 - 0
docs/intro/model_cons_param_en.md

@@ -0,0 +1,478 @@
+# PaddleRS Model Construction Parameters
+
+This document describes the construction parameters of each PaddleRS model trainer in detail, including their parameter names, parameter types, parameter descriptions and default values.
+
+## `BIT`
+
+The BIT implementation based on PaddlePaddle.
+
+The original article refers to H. Chen, et al., "Remote Sensing Image Change Detection With Transformers" (https://arxiv.org/abs/2103.00208).
+
+This implementation adopts pretrained encoders, as opposed to the original work where weights are randomly initialized.
+
+| Parameter Name                    | Description                                                                                                                                  | Default Value |
+|-----------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------|-------------|
+| `in_channels (int)`               | Number of channels of the input image                                                                                                          | `3` |
+| `num_classes (int)`               | Number of target classes                                                                                                                     | `2` |
+| `use_mixed_loss (bool, optional)` | Whether to use a mixed loss function                                                                                               | `False` |
+| `losses (list, optional)`         | List of loss functions                                                                                                                       | `None` |
+| `att_type (str, optional)`        | Spatial attention type, optional values are `'CBAM'` and `'BAM'`                                                                                 | `'CBAM'` |
+| `ds_factor (int, optional)`       | Downsampling factor                                                                                                                          | `1` |
+| `backbone (str, optional)`        | ResNet architecture to use as backbone. Currently only `'resnet18'` and `'resnet34'` are supported                                               | `'resnet18'` |
+| `n_stages (int, optional)`        | Number of ResNet stages used in the backbone, should be a value in `{3, 4, 5}`                                                                 | `4` |
+| `use_tokenizer (bool, optional)`  | Whether to use tokenizer                                                                                                                     | `True` |
+| `token_len (int, optional)`       | Length of input token                                                                                                                        | `4` |
+| `pool_mode (str, optional)`       | Pooling strategy used to obtain the input tokens when `use_tokenizer` is set to `False`. `'max'` means global max pooling, `'avg'` means global average pooling | `'max'` |
+| `pool_size (int, optional)`       | Height and width of the pooled feature map when `use_tokenizer` is set to `False`                                                             | `2` |
+| `enc_with_pos (bool, optional)`   | Whether to add learned positional embeddings to the encoder's input feature sequence                                                         | `True` |
+| `enc_depth (int, optional)`       | Number of attention blocks used in encoder                                                                                                   | `1` |
+| `enc_head_dim (int, optional)`    | Embedding dimension of each encoder head                                                                                                     | `64` |
+| `dec_depth (int, optional)`       | Number of attention blocks used in decoder                                                                                                   | `8` |
+| `dec_head_dim (int, optional)`    | Embedding dimension for each decoder head                                                                                                    | `8` |
+
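The `pool_mode` and `pool_size` rows above can be illustrated with a small pure-Python sketch. It assumes that, when `use_tokenizer` is `False`, a feature map is pooled down to `pool_size` x `pool_size` cells and each cell becomes one token. The function name `pool_to_tokens` and the single-channel list-of-lists input are hypothetical and are not part of PaddleRS.

```python
from typing import List

Grid = List[List[float]]  # one channel, H rows x W columns


def pool_to_tokens(feature_map: Grid, pool_size: int = 2,
                   pool_mode: str = "max") -> List[float]:
    """Pool an H x W map into pool_size x pool_size cells and flatten them."""
    if pool_mode not in ("max", "avg"):
        raise ValueError("pool_mode must be 'max' or 'avg'")
    height, width = len(feature_map), len(feature_map[0])
    if height % pool_size or width % pool_size:
        # Simplification of this sketch: only evenly divisible sizes.
        raise ValueError("H and W must be divisible by pool_size")
    cell_h, cell_w = height // pool_size, width // pool_size
    tokens = []
    for i in range(pool_size):
        for j in range(pool_size):
            cell = [feature_map[r][c]
                    for r in range(i * cell_h, (i + 1) * cell_h)
                    for c in range(j * cell_w, (j + 1) * cell_w)]
            if pool_mode == "max":
                tokens.append(max(cell))
            else:
                tokens.append(sum(cell) / len(cell))
    return tokens


feature_map = [
    [1.0, 2.0, 5.0, 6.0],
    [3.0, 4.0, 7.0, 8.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 4.0, 1.0, 1.0],
]
max_tokens = pool_to_tokens(feature_map, pool_size=2, pool_mode="max")
avg_tokens = pool_to_tokens(feature_map, pool_size=2, pool_mode="avg")
print(max_tokens)  # [4.0, 8.0, 4.0, 1.0]
print(avg_tokens)  # [2.5, 6.5, 1.0, 1.0]
```

Under this assumption the number of tokens is `pool_size * pool_size`, so the default `pool_size` of `2` yields 4 tokens.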
+## `CDNet`
+
+The CDNet implementation based on PaddlePaddle.
+
+The original article refers to Pablo F. Alcantarilla, et al., "Street-View Change Detection with Deconvolutional Networks"(https://link.springer.com/article/10.1007/s10514-018-9734-5).
+
+| Parameter Name          | Description                                                                                        | Default Value |
+|-------------------------|----------------------------------------------------------------------------------------------------| ---------- |
+| `num_classes (int)`     | Number of target classes                                     | `2` |
+| `use_mixed_loss (bool)` | Whether to use mixed loss function                                                                 | `False` |
+| `losses (list)`         | List of loss functions                                                                             | `None` |
+| `in_channels (int)`     | Number of channels of the input image                                                              | `6` |
+
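The `use_mixed_loss` and `losses` rows recur in most tables of this document without showing what "mixed" means. The sketch below assumes a mixed loss is a weighted sum of several component losses. The two components, the weights `0.8` and `0.2`, and all function names are hypothetical choices for illustration, not the PaddleRS defaults.

```python
import math
from typing import Callable, List, Sequence, Tuple

LossFn = Callable[[Sequence[float], Sequence[int]], float]


def binary_cross_entropy(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean binary cross-entropy over per-pixel change probabilities."""
    eps = 1e-7
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def dice_loss(probs: Sequence[float], labels: Sequence[int]) -> float:
    """One minus the soft Dice coefficient."""
    eps = 1e-7
    intersection = sum(p * y for p, y in zip(probs, labels))
    denominator = sum(probs) + sum(labels)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def mixed_loss(components: List[Tuple[LossFn, float]],
               probs: Sequence[float], labels: Sequence[int]) -> float:
    """Weighted sum of the component losses."""
    return sum(weight * fn(probs, labels) for fn, weight in components)


probs = [0.9, 0.2, 0.8, 0.1]
labels = [1, 0, 1, 0]
components = [(binary_cross_entropy, 0.8), (dice_loss, 0.2)]
total = mixed_loss(components, probs, labels)
print(round(total, 4))
```

Combining a pixel-wise term with a region-overlap term is a common choice when changed pixels are much rarer than unchanged ones.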
+## `ChangeFormer`
+
+The ChangeFormer implementation based on PaddlePaddle.
+
+The original article refers to Wele Gedara Chaminda Bandara, Vishal M. Patel, "A TRANSFORMER-BASED SIAMESE NETWORK FOR CHANGE DETECTION"(https://arxiv.org/pdf/2201.01293.pdf).
+
+| Parameter Name | Description                                                                 | Default Value |
+|--------------------------------|-----------------------------------------------------------------------------|--------------|
+| `num_classes (int)` | Number of target classes                                                    | `2` |
+| `use_mixed_loss (bool)` | Whether to use mixed loss                                                   | `False` |
+| `losses (list)` | List of loss functions                                                      | `None` |
+| `in_channels (int)` | Number of channels of the input image                                          | `3` |
+| `decoder_softmax (bool)` | Whether to use softmax as the last layer activation function of the decoder | `False` |
+| `embed_dim (int)` | Hidden layer dimension of the Transformer encoder                           | `256` |
+
+## `ChangeStar_FarSeg`
+
+The ChangeStar implementation with a FarSeg encoder based on PaddlePaddle.
+
+The original article refers to Z. Zheng, et al., "Change is Everywhere: Single-Temporal Supervised Object Change Detection in Remote Sensing Imagery"(https://arxiv.org/abs/2108.07002).
+
+| Parameter Name          | Description                                                         | Default Value |
+|-------------------------|---------------------------------------------------------------------|-------------|
+| `num_classes (int)`     | Number of target classes                                            | `2` |
+| `use_mixed_loss (bool)` | Whether to use mixed loss                                           | `False` |
+| `losses (list)`         | List of loss functions                                              | `None` |
+| `mid_channels (int)`    | Number of channels in the middle layer of UNet                      | `256` |
+| `inner_channels (int)`  | Number of channels inside the attention module                      | `16` |
+| `num_convs (int)`       | Number of convolutional layers in UNet encoder and decoder          | `4` |
+| `scale_factor (float)`  | Upsampling factor to scale the size of the output segmentation mask | `4.0` |
+
+## `DSAMNet`
+
+The DSAMNet implementation based on PaddlePaddle.
+
+The original article refers to Q. Shi, et al., "A Deeply Supervised Attention Metric-Based Network and an Open Aerial Image Dataset for Remote Sensing Change Detection"(https://ieeexplore.ieee.org/document/9467555).
+
+| Parameter Name | Description                                                                                                                     | Default Value |
+|-----------------------|---------------------------------------------------------------------------------------------------------------------------------|-------|
+| `num_classes (int)` | Number of target classes                                                                                                        | `2` |
+| `use_mixed_loss (bool)` | Whether to use mixed loss                                                                                                       | `False`|
+| `losses (list)` | List of loss functions                                                                                                          | `None` |
+| `in_channels (int)` | Number of channels of the input image                                                                                           | `3` |
+| `ca_ratio (int)` | Channel compression ratio in channel attention module                                                                           | `8` |
+| `sa_kernel (int)` | Kernel size in the spatial attention module                                                                                     | `7` |
+
+## `DSIFN`
+
+The DSIFN implementation based on PaddlePaddle.
+
+The original article refers to C. Zhang, et al., "A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images"(https://www.sciencedirect.com/science/article/pii/S0924271620301532).
+
+| Parameter Name | Description                                                                                        | Default Value |
+|-----------------------|----------------------------------------------------------------------------------------------------|-------|
+| `num_classes (int)` | Number of target classes                                                                           | `2` |
+| `use_mixed_loss (bool)` | Whether to use mixed loss                                                                          | `False`|
+| `losses (list)` | List of loss functions                                                                             | `None` |
+| `use_dropout (bool)` | Whether to use dropout                                                                             | `False`|
+
+## `FC-EF`
+
+The FC-EF implementation based on PaddlePaddle.
+
+The original article refers to Rodrigo Caye Daudt, et al. "Fully convolutional siamese networks for change detection"(https://arxiv.org/abs/1810.08462).
+
+| Parameter Name          | Description                           | Default Value |
+|-------------------------|---------------------------------------|-------|
+| `num_classes (int)`     | Number of target classes              | `2` |
+| `use_mixed_loss (bool)` | Whether to use mixed loss             | `False`|
+| `losses (list)`         | List of loss functions                | `None` |
+| `in_channels (int)`     | Number of channels of the input image | `6` |
+| `use_dropout (bool)`    | Whether to use dropout                | `False`|
+
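The default `in_channels` of `6` for FC-EF is consistent with early fusion, where the two temporal images are concatenated along the channel axis before entering the network. A minimal sketch follows, assuming two 3-channel inputs stored as nested lists in channels x height x width order. The helper names are hypothetical.

```python
from typing import List

Image = List[List[List[float]]]  # channels x height x width


def make_image(channels: int, height: int, width: int, fill: float) -> Image:
    """Build a constant-valued image for demonstration purposes."""
    return [[[fill] * width for _ in range(height)] for _ in range(channels)]


def concat_channels(image_t1: Image, image_t2: Image) -> Image:
    """Stack two images of identical spatial size along the channel axis."""
    same_height = len(image_t1[0]) == len(image_t2[0])
    same_width = len(image_t1[0][0]) == len(image_t2[0][0])
    if not (same_height and same_width):
        raise ValueError("both images must share the same height and width")
    return image_t1 + image_t2


image_t1 = make_image(3, 4, 4, fill=0.0)  # image at the first date
image_t2 = make_image(3, 4, 4, fill=1.0)  # image at the second date
fused = concat_channels(image_t1, image_t2)
print(len(fused))  # 6
```

Siamese variants such as FC-Siam-conc and FC-Siam-diff instead pass each image through a shared encoder, which matches their default `in_channels` of `3`.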
+## `FC-Siam-conc`
+
+The FC-Siam-conc implementation based on PaddlePaddle.
+
+The original article refers to Rodrigo Caye Daudt, et al. "Fully convolutional siamese networks for change detection"(https://arxiv.org/abs/1810.08462).
+
+| Parameter Name          | Description                                                                                                                     | Default Value |
+|-------------------------|---------------------------------------------------------------------------------------------------------------------------------|-------|
+| `num_classes (int)`     | Number of target classes                                                                                                        | `2` |
+| `use_mixed_loss (bool)` | Whether to use mixed loss                                                                                                       | `False`|
+| `losses (list)`         | List of loss functions                                                                                                          | `None` |
+| `in_channels (int)`     | Number of channels of the input image                                                                                           | `3` |
+| `use_dropout (bool)`    | Whether to use dropout                                                                                                          | `False`|
+
+## `FC-Siam-diff`
+
+The FC-Siam-diff implementation based on PaddlePaddle.
+
+The original article refers to Rodrigo Caye Daudt, et al. "Fully convolutional siamese networks for change detection"(https://arxiv.org/abs/1810.08462).
+
+| Parameter Name          | Description                                                                                      | Default Value |
+|-------------------------|--------------------------------------------------------------------------------------------------| --- |
+| `num_classes (int)`     | Number of target classes                                                  | `2` |
+| `use_mixed_loss (bool)` | Whether to use mixed loss function                       |`False` |
+| `losses (list)`         | List of loss functions                                       | `None` |
+| `in_channels (int)`     | Number of channels of the input image                                                          | `3` |
+| `use_dropout (bool)`    | Whether to use dropout                                         | `False` |
+
+## `FCCDN`
+
+The FCCDN implementation based on PaddlePaddle.
+
+The original article refers to Pan Chen, et al., "FCCDN: Feature Constraint Network for VHR Image Change Detection"(https://arxiv.org/pdf/2105.10860.pdf).
+
+| Parameter Name | Description                           | Default Value |
+|--------------------------|---------------------------------------|-------|
+| `in_channels (int)` | Number of channels of the input image | `3` |
+| `num_classes (int)` | Number of target classes              | `2` |
+| `use_mixed_loss (bool)` | Whether to use mixed loss             | `False`|
+| `losses (list)` | List of loss functions                | `None` |
+
+## `P2V-CD`
+
+The P2V-CD implementation based on PaddlePaddle.
+
+The original article refers to M. Lin, et al. "Transition Is a Process: Pair-to-Video Change Detection Networks for Very High Resolution Remote Sensing Images"(https://ieeexplore.ieee.org/document/9975266).
+
+| Parameter Name          | Description                           | Default Value |
+|-------------------------|---------------------------------------|-------|
+| `num_classes (int)`     | Number of target classes              | `2` |
+| `use_mixed_loss (bool)` | Whether to use mixed loss             | `False`|
+| `losses (list)`         | List of loss functions                | `None` |
+| `in_channels (int)`     | Number of channels of the input image | `3` |
+| `video_len (int)`       | Number of input video frames          | `8` |
+
+## `SNUNet`
+
+The SNUNet implementation based on PaddlePaddle.
+
+The original article refers to S. Fang, et al., "SNUNet-CD: A Densely Connected Siamese Network for Change Detection of VHR Images" (https://ieeexplore.ieee.org/document/9355573).
+
+| Parameter Name          | Description                                      | Default Value |
+|-------------------------|--------------------------------------------------|---------------|
+| `in_channels (int)`     | Number of channels of the input image            |               |
+| `num_classes (int)`     | Number of target classes                         |               |
+| `width (int, optional)` | Output channels of the first convolutional layer | `32`          |
+
+## `STANet`
+
+The STANet implementation based on PaddlePaddle.
+
+The original article refers to H. Chen and Z. Shi, "A Spatial-Temporal Attention-Based Method and a New Dataset for Remote Sensing Image Change Detection"(https://www.mdpi.com/2072-4292/12/10/1662).
+
+| Parameter Name          | Description                              | Default Value |
+|-------------------------|------------------------------------------| --- |
+| `num_classes (int)`     | Number of target classes                 | `2` |
+| `use_mixed_loss (bool)` | Whether to use mixed loss                | `False` |
+| `losses (list)`         | List of loss functions                   | `None` |
+| `in_channels (int)`     | Number of channels of the input image    | `3` |
+| `width (int)`           | Number of channels in the neural network | `32` |
+
+## `CondenseNetV2`
+
+The CondenseNetV2 implementation based on PaddlePaddle.
+
+| Parameter Name          | Description                                             | Default Value |
+|-------------------------|---------------------------------------------------------| --- |
+| `num_classes (int)`     | Number of target classes                                | `2` |
+| `use_mixed_loss (bool)` | Whether to use mixed loss function                      | `False` |
+| `losses (list)`         | List of loss functions                                  | `None` |
+| `in_channels (int)`     | Number of channels of the input image                   | `3` |
+| `arch (str)`            | Architecture of the model, can be `'A'`, `'B'` or `'C'` | `'A'` |
+
+## `HRNet`
+
+The HRNet implementation based on PaddlePaddle.
+
+| Parameter Name          | Description                        | Default Value |
+|-------------------------|------------------------------------| --- |
+| `num_classes (int)`     | Number of target classes           | `2` |
+| `use_mixed_loss (bool)` | Whether to use mixed loss function | `False` |
+| `losses (list)`         | List of loss functions             | `None` |
+
+
+## `MobileNetV3`
+
+The MobileNetV3 implementation based on PaddlePaddle.
+
+| Parameter Name          | Description | Default Value |
+|-------------------------| --- | --- |
+| `num_classes (int)`     | Number of target classes| `2` |
+| `use_mixed_loss (bool)` | Whether to use mixed loss function | `False` |
+| `losses (list)`         | List of loss functions | `None` |
+
+
+## `ResNet50-vd`
+
+The ResNet50-vd implementation based on PaddlePaddle.
+
+| Parameter Name          | Description | Default Value |
+|-------------------------| --- | --- |
+| `num_classes (int)`     | Number of target classes | `2` |
+| `use_mixed_loss (bool)` | Whether to use mixed loss function | `False` |
+| `losses (list)`         | List of loss functions | `None` |
+
+## `DRN`
+
+The DRN implementation based on PaddlePaddle.
+
+| Parameter Name                                                    | Description                                                                                                                                                                                                         | Default Value |
+|-------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------|
+| `losses (list)`                                                   | List of loss functions                                                                                                                                                                                              | `None` |
+| `sr_factor (int)`                                                 | Scaling factor for super-resolution, the size of the original image will be multiplied by this factor. For example, if the original image is `H` x `W`, the output image will be `sr_factor * H` x `sr_factor * W`. | `4` |
+| `min_max (None \| tuple[float, float])`                           | Minimum and maximum image pixel values                                                                                                                                                                              | `None` |
+| `scales (tuple[int])`                                        | Scaling factor                                                                                                                                                                                                      | `(2, 4)` |
+| `n_blocks (int)`                                                  | Number of residual blocks                                                                                                                                                                                           | `30` |
+| `n_feats (int)`                                                   | Number of features in the residual block                                                                                                                                                                            | `16` |
+| `n_colors (int)`                                                  | Number of image channels                                                                                                                                                                                            | `3` |
+| `rgb_range (float)`                                               | Range of image pixel values                                                                                                                                                                                         | `1.0` |
+| `negval (float)`                                                  | Negative value in nonlinear mapping                                                                                                                                                                                 | `0.2` |
+| `lq_loss_weight (float)`                                          | Weight of the low-quality image loss, which is used to control the impact of the reconstruction loss on the overall loss of restoring the low-resolution input image into a high-resolution output image.           | `0.1` |
+| `dual_loss_weight (float)`                                        | Weight of the dual regression loss                                                                                                                                                                                  | `0.1` |
+
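The `sr_factor` description above states that an `H` x `W` input becomes a `sr_factor * H` x `sr_factor * W` output. The relationship can be checked with a short sketch. The helper name `sr_output_size` is hypothetical.

```python
from typing import Tuple


def sr_output_size(height: int, width: int, sr_factor: int = 4) -> Tuple[int, int]:
    """Return the (height, width) of the super-resolved image."""
    if sr_factor < 1:
        raise ValueError("sr_factor must be a positive integer")
    return sr_factor * height, sr_factor * width


print(sr_output_size(64, 48))               # (256, 192)
print(sr_output_size(64, 48, sr_factor=2))  # (128, 96)
```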
+
+## `ESRGAN`
+
+The ESRGAN implementation based on PaddlePaddle.
+
+| Parameter Name       | Description                                                                                                                                                                                                        | Default Value |
+|----------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| --- |
+| `losses (list)`      | List of loss functions                                                                                                                                                                                             | `None` |
+| `sr_factor (int)`    | Scaling factor for super-resolution, the size of the original image will be multiplied by this factor. For example, if the original image is `H` x `W`, the output image will be `sr_factor * H` x `sr_factor * W` | `4` |
+| `min_max (tuple)`    | Minimum and maximum pixel values of the input image. If not specified, the data type's default minimum and maximum values are used                                                                                 | `None` |
+| `use_gan (bool)`     | Boolean indicating whether to use GAN (Generative Adversarial Network) during training. If yes, GAN will be used                                                                                                   | `True` |
+| `in_channels (int)`  | Number of channels of the input image                                                                                                                                                                              | `3` |
+| `out_channels (int)` | Number of channels of the output image                                                                                                                                                                             | `3` |
+| `nf (int)`           | Number of filters in the first convolutional layer of the model                                                                                                                                                    | `64` |
+| `nb (int)`           | Number of residual blocks in the model                                                                                                                                                                             | `23` |
+
+## `LESRCNN`
+
+The LESRCNN implementation based on PaddlePaddle.
+
+| Parameter Name       | Description                                                                                                                                                                                                     | Default Value |
+|----------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| --- |
+| `losses (list)`      | List of loss functions                                                                                                                                                                                                                | `None` |
+| `sr_factor (int)`    | Scaling factor for super-resolution, the size of the original image will be multiplied by this factor. For example, if the original image is `H` x `W`, the output image will be `sr_factor * H` x `sr_factor * W`. | `4` |
+| `min_max (tuple)`    | Minimum and maximum pixel values of the input image. If not specified, the data type's default minimum and maximum values are used.                                                                             | `None` |
+| `multi_scale (bool)` | Boolean indicating whether to train on multiple scales. If yes, multiple scales are used during training.                                                                                                       | `False` |
+| `group (int)`        | Controls the number of groups for convolution operations. Standard convolution if set to `1`, DWConv if set to the number of input channels.                                                                    | `1` |
+
+## `Faster R-CNN`
+
+The Faster R-CNN implementation based on PaddlePaddle.
+
+| Parameter Name                | Description                                                                                                | Default Value |
+|-------------------------------|------------------------------------------------------------------------------------------------------------| --- |
+| `num_classes (int)`           | Number of target classes                                                                                   | `80` |
+| `backbone (str)`              | Backbone network model to use                                                                              | `'ResNet50'` |
+| `with_fpn (bool)`             | Boolean indicating whether to use Feature Pyramid Network (FPN)                                            | `True` |
+| `with_dcn (bool)`             | Boolean indicating whether to use Deformable Convolutional Networks (DCN)                                  | `False` |
+| `aspect_ratios (list)`        | List of aspect ratios of candidate boxes                                                                   | `[0.5, 1.0, 2.0]` |
+| `anchor_sizes (list)`         | List of sizes of candidate boxes expressed as base sizes on each feature map                               | `[[32], [64], [128], [256], [512]]` |
+| `keep_top_k (int)`            | Number of predicted boxes to keep after NMS operation                                                      | `100` |
+| `nms_threshold (float)`       | Non-maximum suppression (NMS) threshold to use                                                             | `0.5` |
+| `score_threshold (float)`     | Score threshold for filtering predicted boxes                                                              | `0.05` |
+| `fpn_num_channels (int)`      | Number of channels for each pyramid layer in the FPN network                                               | `256` |
+| `rpn_batch_size_per_im (int)` | Total number of positive and negative samples per image in the RPN network                                 | `256` |
+| `rpn_fg_fraction (float)`     | Fraction of foreground samples in RPN network                                                              | `0.5` |
+| `test_pre_nms_top_n (int)`    | Number of predicted boxes to keep before NMS operation when testing. If not specified, `keep_top_k` is used. | `None` |
+| `test_post_nms_top_n (int)`   | Number of predicted boxes to keep after NMS operation at test time                                         | `1000` |
+
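The interplay of `rpn_batch_size_per_im` and `rpn_fg_fraction` can be sketched as follows. The sketch reads `rpn_batch_size_per_im` as the number of anchors sampled per image and follows the common Faster R-CNN sampling convention. Both points are assumptions, and the helper name and candidate counts are hypothetical.

```python
from typing import Tuple


def rpn_sample_budget(batch_size_per_im: int, fg_fraction: float,
                      num_fg_candidates: int,
                      num_bg_candidates: int) -> Tuple[int, int]:
    """Return how many foreground and background anchors are sampled."""
    # Foreground samples are capped by the fraction and by availability.
    max_fg = int(batch_size_per_im * fg_fraction)
    num_fg = min(max_fg, num_fg_candidates)
    # Background samples fill the rest of the per-image budget.
    num_bg = min(batch_size_per_im - num_fg, num_bg_candidates)
    return num_fg, num_bg


# With the defaults (256, 0.5), at most 128 anchors are foreground.
print(rpn_sample_budget(256, 0.5, num_fg_candidates=300, num_bg_candidates=5000))
# When few foreground anchors exist, background fills the remainder.
print(rpn_sample_budget(256, 0.5, num_fg_candidates=20, num_bg_candidates=5000))
```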
+## `PP-YOLO`
+
+The PP-YOLO implementation based on PaddlePaddle.
+
+| Parameter Name                   | Description                                                        | Default Value |
+|----------------------------------|--------------------------------------------------------------------| --- |
+| `num_classes (int)`              | Number of target classes                                           | `80` |
+| `backbone (str)`                 | PPYOLO's backbone network                                          | `'ResNet50_vd_dcn'` |
+| `anchors (list[list[float]])`    | Size of predefined anchor boxes                                    | `None` |
+| `anchor_masks (list[list[int]])` | Masks for predefined anchor boxes                                  | `None` |
+| `use_coord_conv (bool)`          | Whether to use coordinate convolution                              | `True` |
+| `use_iou_aware (bool)`           | Whether to use IoU awareness                                       | `True` |
+| `use_spp (bool)`                 | Whether to use spatial pyramid pooling (SPP)                       | `True` |
+| `use_drop_block (bool)`          | Whether to use DropBlock regularization                            | `True` |
+| `scale_x_y (float)`              | Parameter to scale each predicted box                              | `1.05` |
+| `ignore_threshold (float)`       | IoU threshold used to assign predicted boxes to ground truth boxes | `0.7` |
+| `label_smooth (bool)`            | Whether to use label smoothing                                     | `False` |
+| `use_iou_loss (bool)`            | Whether to use IoU Loss                                            | `True` |
+| `use_matrix_nms (bool)`          | Whether to use Matrix NMS                                          | `True` |
+| `nms_score_threshold (float)`    | NMS score threshold                                                | `0.01` |
+| `nms_topk (int)`                 | Maximum number of detections to keep before performing NMS         | `-1` |
+| `nms_keep_topk (int)`            | Maximum number of prediction boxes to keep after NMS               | `100`|
+| `nms_iou_threshold (float)`      | NMS IoU threshold                                                  | `0.45` |
+
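The four `nms_*` parameters above act at different stages of post-processing. The sketch below shows where each one applies, using plain greedy NMS rather than the Matrix NMS that `use_matrix_nms` enables by default. The function names and the sample boxes are hypothetical, and the code is not the PaddleRS implementation.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float],
               nms_score_threshold: float = 0.01, nms_topk: int = -1,
               nms_keep_topk: int = 100,
               nms_iou_threshold: float = 0.45) -> List[int]:
    """Return indices of the kept boxes, highest score first."""
    # 1. Drop low-scoring boxes.
    order = [i for i, s in enumerate(scores) if s > nms_score_threshold]
    order.sort(key=lambda i: scores[i], reverse=True)
    # 2. Limit the candidates entering NMS (-1 means no limit).
    if nms_topk > -1:
        order = order[:nms_topk]
    # 3. Suppress boxes that overlap a higher-scoring kept box.
    kept: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= nms_iou_threshold for j in kept):
            kept.append(i)
    # 4. Limit the final number of detections (-1 means no limit).
    if nms_keep_topk > -1:
        kept = kept[:nms_keep_topk]
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 10, 10)]
scores = [0.9, 0.8, 0.7, 0.005]
kept = greedy_nms(boxes, scores)
print(kept)  # [0, 2]
```

Box 1 overlaps box 0 with an IoU of about 0.68 and is suppressed, while box 3 falls below the score threshold before NMS starts.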
+## `PP-YOLO Tiny`
+
+The PP-YOLO Tiny implementation based on PaddlePaddle.
+
+| Parameter Name                   | Description                                                 | Default Value |
+|----------------------------------|-------------------------------------------------------------| --- |
+| `num_classes (int)`              | Number of target classes                                    | `80` |
+| `backbone (str)`                 | Backbone network model name to use                          | `'MobileNetV3'` |
+| `anchors (list[list[float]])`    | List of anchor box sizes                                    | `[[10, 15], [24, 36], [72, 42], [35, 87], [102, 96], [60, 170], [220, 125], [128, 222], [264, 266]]` |
+| `anchor_masks (list[list[int]])` | Anchor box mask                                             | `[[6, 7, 8], [3, 4, 5], [0, 1, 2]]` |
+| `use_iou_aware (bool)`           | Boolean value indicating whether to use IoU-aware loss      | `False` |
+| `use_spp (bool)`                 | Boolean indicating whether to use the SPP module            | `True` |
+| `use_drop_block (bool)`          | Boolean value indicating whether to use DropBlock           | `True` |
+| `scale_x_y (float)`              | Scaling parameter                                           | `1.05` |
+| `ignore_threshold (float)`       | Ignore threshold                                            | `0.5` |
+| `label_smooth (bool)`            | Boolean indicating whether to use label smoothing           | `False` |
+| `use_iou_loss (bool)`            | Boolean value indicating whether to use IoU Loss            | `True` |
+| `use_matrix_nms (bool)`          | Boolean indicating whether to use Matrix NMS                | `False` |
+| `nms_score_threshold (float)`    | NMS score threshold                                         | `0.005` |
+| `nms_topk (int)`                 | Number of bounding boxes to keep before NMS operation       | `1000` |
+| `nms_keep_topk (int)`            | Number of bounding boxes to keep after NMS operation        | `100` |
+| `nms_iou_threshold (float)`      | NMS IoU threshold                                           | `0.45` |
+
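The relationship between `anchors` and `anchor_masks` can be sketched in a few lines. The sketch assumes the usual YOLOv3 convention, in which each mask lists indices into `anchors` and each mask belongs to one detection head. The helper name `anchors_per_head` is hypothetical.

```python
from typing import List

# Default values taken from the PP-YOLO Tiny table above.
anchors = [[10, 15], [24, 36], [72, 42], [35, 87], [102, 96],
           [60, 170], [220, 125], [128, 222], [264, 266]]
anchor_masks = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]


def anchors_per_head(anchors: List[List[int]],
                     anchor_masks: List[List[int]]) -> List[List[List[int]]]:
    """Select the anchor boxes used by each detection head."""
    return [[anchors[index] for index in mask] for mask in anchor_masks]


per_head = anchors_per_head(anchors, anchor_masks)
for head_id, head_anchors in enumerate(per_head):
    print(head_id, head_anchors)
```

With these defaults the first head receives the three largest anchors and the last head the three smallest.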
+## `PP-YOLOv2`
+
+The PP-YOLOv2 implementation based on PaddlePaddle.
+
+| Parameter Name                   | Description | Default Value |
+|----------------------------------| --- | --- |
+| `num_classes (int)`              | Number of target classes | `80` |
+| `backbone (str)`                 | PPYOLO's backbone network | `'ResNet50_vd_dcn'` |
+| `anchors (list[list[float]])`    | Sizes of predefined anchor boxes| `[[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]]` |
+| `anchor_masks (list[list[int]])` | Masks of predefined anchor boxes | `[[6, 7, 8], [3, 4, 5], [0, 1, 2]]` |
+| `use_iou_aware (bool)`           | Whether to use IoU awareness | `True` |
+| `use_spp (bool)`                 | Whether to use spatial pyramid pooling (SPP) | `True` |
+| `use_drop_block (bool)`          | Whether to use DropBlock regularization | `True` |
+| `scale_x_y (float)`              | Parameter to scale each predicted box | `1.05` |
+| `ignore_threshold (float)`       | IoU threshold used to assign predicted boxes to ground truth boxes | `0.7` |
+| `label_smooth (bool)`            | Whether to use label smoothing | `False` |
+| `use_iou_loss (bool)`            | Whether to use IoU Loss | `True` |
+| `use_matrix_nms (bool)`          | Whether to use Matrix NMS | `True` |
+| `nms_score_threshold (float)`    | NMS score threshold | `0.01` |
+| `nms_topk (int)`                 | Maximum number of detections to keep before performing NMS | `-1` |
+| `nms_keep_topk (int)`            | Maximum number of prediction boxes to keep after NMS | `100`|
+| `nms_iou_threshold (float)`      | NMS IoU threshold | `0.45` |
+
+## `YOLOv3`
+
+The YOLOv3 implementation based on PaddlePaddle.
+
+| Parameter Name | Description                                                                                                                 | Default Value |
+| --- |-----------------------------------------------------------------------------------------------------------------------------| --- |
+| `num_classes (int)` | Number of target classes                                                                                                    | `80` |
+| `backbone (str)` | Name of the feature extraction network                                                                                      | `'MobileNetV1'` |
+| `anchors (list[list[int]])` | Sizes of all anchor boxes                                                                                                   | `[[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]]` |
+| `anchor_masks (list[list[int]])` | Which anchor boxes to use to predict the target box                                                                         | `[[6, 7, 8], [3, 4, 5], [0, 1, 2]]` |
+| `ignore_threshold (float)` | IoU threshold between the predicted box and the ground truth box; predicted boxes whose IoU falls below it are treated as background | `0.7` |
+| `nms_score_threshold (float)` | In non-maximum suppression, the score threshold below which boxes are discarded                                             | `0.01` |
+| `nms_topk (int)` | In non-maximum suppression, the maximum number of top-scoring boxes to keep. If `-1`, all boxes are kept                    | `1000` |
+| `nms_keep_topk (int)` | In non-maximum suppression, the maximum number of boxes to keep per image                                                   | `100` |
+| `nms_iou_threshold (float)` | In non-maximum suppression, the IoU threshold; boxes whose IoU with a higher-scoring box exceeds this threshold are discarded | `0.45` |
+| `label_smooth (bool)` | Whether to use label smoothing when computing loss                                                                          | `False` |
+
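+To make the roles of the four NMS parameters above concrete, here is a minimal pure-Python sketch of greedy non-maximum suppression. It is an illustration only, not PaddleRS code: the function names `iou` and `nms_sketch` are hypothetical, boxes are assumed to be in `(x1, y1, x2, y2)` format, and the actual implementation may order or combine these steps differently.
+
```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_sketch(boxes, scores, nms_score_threshold=0.01, nms_topk=1000,
               nms_keep_topk=100, nms_iou_threshold=0.45):
    """Hypothetical greedy NMS; returns the indices of the kept boxes."""
    # 1. Discard boxes whose score is below the score threshold.
    candidates = [i for i, s in enumerate(scores) if s >= nms_score_threshold]
    # 2. Sort by descending score; keep at most nms_topk candidates (-1 keeps all).
    candidates.sort(key=lambda i: scores[i], reverse=True)
    if nms_topk > -1:
        candidates = candidates[:nms_topk]
    keep = []
    for i in candidates:
        # 3. Drop a candidate that overlaps an already kept box too much.
        if all(iou(boxes[i], boxes[k]) <= nms_iou_threshold for k in keep):
            keep.append(i)
    # 4. Keep at most nms_keep_topk boxes per image.
    return keep[:nms_keep_topk]


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms_sketch(boxes, scores)
```
+
+In this toy input the second box overlaps the first with an IoU of about 0.68, which exceeds `nms_iou_threshold`, so only the first and third boxes survive.
+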
+##  `BiSeNet V2`
+
+The BiSeNet V2 implementation based on PaddlePaddle.
+
+| Parameter Name          | Description | Default Value |
+|-------------------------| --- |---------------|
+| `in_channels (int)`     | Number of channels of the input image | `3`           |
+| `num_classes (int)`     | Number of target classes | `2`           |
+| `use_mixed_loss (bool)` | Whether to use mixed loss function | `False`       |
+| `losses (list)`         | List of loss functions | `{}`          |
+| `align_corners (bool)`  | Whether to use the corner alignment method  | `False`       |
+
+##  `DeepLab V3+`
+
+The DeepLab V3+ implementation based on PaddlePaddle.
+
+| Parameter Name             | Description                                                                    | Default Value |
+|----------------------------|--------------------------------------------------------------------------------| --- |
+| `in_channels (int)`        | Number of channels of the input image                                          | `3` |
+| `num_classes (int)`        | Number of target classes                                                       | `2` |
+| `backbone (str)`           | Type of backbone network                                                       | `'ResNet50_vd'` |
+| `use_mixed_loss (bool)`    | Whether to use mixed loss function                                             | `False` |
+| `losses (list)`            | List of loss functions                                                         | `None` |
+| `output_stride (int)`      | Downsampling ratio of the output feature map relative to the input feature map | `8` |
+| `backbone_indices (tuple)` | Indices of the backbone network stages whose outputs are used                  | `(0, 3)` |
+| `aspp_ratios (tuple)`      | Dilation rates of the dilated convolutions in the ASPP module                  | `(1, 12, 24, 36)` |
+| `aspp_out_channels (int)`  | Number of ASPP module output channels                                          | `256` |
+| `align_corners (bool)`     | Whether to use the corner alignment method                                     | `False` |
+
+
+## `FactSeg`
+
+The FactSeg implementation based on PaddlePaddle.
+
+Reference: A. Ma, J. Wang, Y. Zhong and Z. Zheng, "FactSeg: Foreground Activation-Driven Small Object Semantic Segmentation in Large-Scale Remote Sensing Imagery," in IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1-16, 2022, Art no. 5606216.
+
+| Parameter Name          | Description                                                                                                      | Default Value |
+|-------------------------|------------------------------------------------------------------------------------------------------------------| --- |
+| `in_channels (int)`     | Number of channels of the input image                                                                                   | `3` |
+| `num_classes (int)`     | Number of target classes                                                                  | `2` |
+| `use_mixed_loss (bool)` | Whether to use mixed loss function                                                                                      | `False` |
+| `losses (list)`         | List of loss functions                                                                                | `None` |
+
+## `FarSeg`
+
+The FarSeg implementation based on PaddlePaddle.
+
+Reference: Zheng Z, Zhong Y, Wang J, et al. Foreground-aware relation network for geospatial object segmentation in high spatial resolution remote sensing imagery[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020: 4096-4105.
+
+| Parameter Name          | Description                                                                                                     | Default Value |
+|-------------------------|-----------------------------------------------------------------------------------------------------------------| --- |
+| `in_channels (int)`     | Number of channels of the input image                                                                           | `3` |
+| `num_classes (int)`     | Number of target classes                                                                                                                | `2` |
+| `use_mixed_loss (bool)` | Whether to use mixed loss function                                                                                      | `False` |
+| `losses (list)`         | List of loss functions                                                                               | `None` |
+
+##  `Fast-SCNN`
+
+The Fast-SCNN implementation based on PaddlePaddle.
+
+| Parameter Name          | Description                                    | Default Value        |
+|-------------------------|------------------------------------------------|----------------------|
+| `in_channels (int)`     | Number of channels of the input image          | `3`                  |
+| `num_classes (int)`     | Number of target classes                       | `2`                  |
+| `use_mixed_loss (bool)` | Whether to use mixed loss function             | `False`              |
+| `losses (list)`         | List of loss functions                         | `None`               |
+| `align_corners (bool)`  | Whether to use the corner alignment method     | `False`              |
+
+
+##  `HRNet`
+
+The HRNet implementation based on PaddlePaddle.
+
+| Parameter Name          | Description                                                                                                      | Default Value |
+|-------------------------|------------------------------------------------------------------------------------------------------------------| --- |
+| `in_channels (int)`     | Number of channels of the input image                                                                                 | `3` |
+| `num_classes (int)`     | Number of target classes                                                                  | `2` |
+| `width (int)`           | Initial number of channels for the network                                                                       | `48` |
+| `use_mixed_loss (bool)` | Whether to use mixed loss function                                                                               | `False` |
+| `losses (list)`         | List of loss functions                                                                                     | `None` |
+| `align_corners (bool)`  | Whether to use the corner alignment method                                                                       | `False` |

+ 239 - 0
docs/intro/transforms_cons_params_cn.md

@@ -0,0 +1,239 @@
+# PaddleRS Data Transformation Operator Construction Parameters
+
+This document describes the construction parameters of each PaddleRS data transformation operator in detail, including the operator name, the operator's purpose, and the name, type, meaning, and default value of each parameter.
+
+For the data transformation operators supported by PaddleRS, see https://github.com/PaddlePaddle/PaddleRS/blob/develop/docs/intro/transforms.md .
+
+## `AppendIndex`
+
+Compute a remote sensing index and append it to the input image(s).
+
+| Parameter Name | Description | Default Value |
+|---|---|---|
+|`index_type (str)`| Type of remote sensing index. See the supported index types in https://github.com/PaddlePaddle/PaddleRS/tree/develop/paddlers/transforms/indices.py . |      |
+|`band_indexes (dict, optional)`| Mapping of band names to band indices (starting from 1). See the band names in https://github.com/PaddlePaddle/PaddleRS/tree/develop/paddlers/transforms/indices.py . | `None` |
+|`satellite (str, optional)`| Type of satellite. If set, the corresponding band indices will be determined automatically. See the supported satellites in https://github.com/PaddlePaddle/PaddleRS/tree/develop/paddlers/transforms/satellites.py . | `None` |
+
+## `CenterCrop`
+
++ Crop the input image(s) at the center.
+    - 1. Locate the center of the image.
+    - 2. Crop the image.
+
+| Parameter Name | Description | Default Value |
+|---|---|---|
+|`crop_size (int, optional)`| Target size of the cropped image(s) | `224`  |
+
+## `Dehaze`
+
+Dehaze the input image(s).
+
+| Parameter Name | Description | Default Value |
+|---|---|---|
+|`gamma (bool, optional)`| Whether to use gamma correction | `False` |
+
+## `MatchRadiance`
+
+Perform relative radiometric correction between bi-temporal input images.
+
+| Parameter Name | Description | Default Value |
+|---|---|---|
+|`method (str, optional)`| Method used to match the radiance of the bi-temporal images. Choices are {`'hist'`, `'lsr'`, `'fft'`}. `'hist'` stands for histogram matching, `'lsr'` stands for least-squares regression, and `'fft'` replaces the low-frequency components of the image to match the reference image | `'hist'` |
+
+## `MixupImage`
+
+Mix two images (and their object detection annotations) together as a new sample.
+
+| Parameter Name | Description | Default Value |
+|---|---|---|
+|`alpha (float, optional)`| Alpha parameter of the beta distribution | `1.5` |
+|`beta (float, optional)` | Beta parameter of the beta distribution  | `1.5` |
+
+## `Normalize`
+
++ Apply normalization to the input image(s). The normalization steps are:
+  - 1. im = (im - min_value) * 1 / (max_value - min_value)
+  - 2. im = im - mean
+  - 3. im = im / std
+
+| Parameter Name | Description | Default Value |
+|---|---|---|
+| `mean (list[float] \| tuple[float], optional)`    | Mean of the input image(s)                          | `[0.485,0.456,0.406]` |
+| `std (list[float] \| tuple[float], optional)`     | Standard deviation of the input image(s)                         | `[0.229,0.224,0.225]` |
+| `min_val (list[float] \| tuple[float], optional)` | Minimum value of the input image(s). If `None`, use `0` for all channels   |    `None`      |
+| `max_val (list[float] \| tuple[float], optional)` | Maximum value of the input image(s). If `None`, use `255` for all channels |  `None`        |
+| `apply_to_tar (bool, optional)` | Whether to apply the transformation to the target image                  | `True`                         |
+
+## `Pad`
+
+Pad the input image(s) to a specified size.
+
+| Parameter Name | Description | Default Value |
+|---|---|---|
+| `target_size (list[int] \| tuple[int], optional)`    | Target size of the image(s)                                                 | `None`                |
+| `pad_mode (int, optional)` | Padding mode. Currently only four modes are supported: [-1, 0, 1, 2]. If `-1`, use the specified offsets. If `0`, only pad to the right and bottom. If `1`, pad according to the center. If `2`, only pad to the left and top | `0`                   |
+| `offset (list[int] \| None, optional)`                | Padding offsets                                                   | `None`                |
+| `im_padding_value (list[float] \| tuple[float])` | RGB value of the padded area                                            | `(127.5,127.5,127.5)` |
+| `label_padding_value (int, optional)` | Filling value for the mask                                                 | `255`                 |
+| `size_divisor (int)`     | The image width and height after padding will be a multiple of `size_divisor`             |                       |
+
+## `RandomBlur`
+
+Randomly blur the input.
+
+| Parameter Name | Description | Default Value |
+|---|---|---|
+|`prob (float)`| Probability of blurring |      |
+
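+## `RandomCrop`
+
++ Randomly crop the input image(s).
+
+  - 1. Compute the height and width of the cropped area according to `aspect_ratio` and `scaling`.
+  - 2. Locate the upper left corner of the cropped area randomly.
+  - 3. Crop the image(s).
+  - 4. Resize the cropped area to `crop_size` x `crop_size`.
+
+| Parameter Name | Description | Default Value |
+|---|---|---|
+| `crop_size (int \| list[int] \| tuple[int])` | Target size of the cropped area. If `None`, the cropped area will not be resized | `None`                    |
+| `aspect_ratio (list[float], optional)` | Aspect ratio of the cropped region in [min, max] format | `[.5, 2.]`                |
+| `thresholds (list[float], optional)` | IoU thresholds used to decide a valid bbox crop | `[.0,.1, .3, .5, .7, .9]` |
+| `scaling (list[float], optional)` | Ratio between the cropped region and the original image in [min, max] format | `[.3, 1.]`                |
+| `num_attempts (int, optional)` | Maximum number of attempts before giving up | `50`                      |
+| `allow_no_crop (bool, optional)` | Whether returning without cropping is allowed | `True`                    |
+| `cover_all_box (bool, optional)` | Whether to force the crop to cover the entire target box | `False`                   |
+
+## `RandomDistort`
+
+Apply random color distortion to the input.
+
+| Parameter Name | Description | Default Value |
+|---|---|---|
+| `brightness_range (float, optional)` | Range of brightness distortion                      | `.5`    |
+| `brightness_prob (float, optional)` | Probability of brightness distortion                     | `.5`    |
+| `contrast_range (float, optional)` | Range of contrast distortion                     | `.5`    |
+| `contrast_prob (float, optional)` | Probability of contrast distortion                    | `.5`    |
+| `saturation_range (float, optional)` | Range of saturation distortion                      | `.5`    |
+| `saturation_prob (float, optional)` | Probability of saturation distortion                    | `.5`    |
+| `hue_range (float, optional)` | Range of hue distortion                      | `.5`    |
+| `hue_prob (float, optional)`| Probability of hue distortion                     | `.5`    |
+| `random_apply (bool, optional)` | Whether to apply the transformations in random (YOLO) or fixed (SSD) order | `True`  |
+| `count (int, optional)`  | Count used to control the number of distortions          | `4`     |
+| `shuffle_channel (bool, optional)` | Whether to swap channels randomly                    | `False` |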
+
+## `RandomExpand`
+
+Expand the input image(s) according to random offsets.
+
+| Parameter Name | Description | Default Value |
+|---|---|---|
+| `upper_ratio (float, optional)`        | Maximum ratio to which the original image is expanded | `4`                   |
+| `prob (float, optional)`               | Probability of applying the expansion      | `.5`                  |
+| `im_padding_value (list[float] \| tuple[float], optional)` | RGB filling value for the image  | `(127.5,127.5,127.5)` |
+| `label_padding_value (int, optional)`  | Filling value for the mask       | `255`    |
+
+## `RandomHorizontalFlip`
+
+Randomly flip the input image(s) horizontally.
+
+| Parameter Name | Description | Default Value |
+|---|---|---|
+| `prob (float, optional)`                           | Probability of flipping the input   | `.5`                  |
+
+## `RandomResize`
+
+Resize the input image(s) to random sizes.
+
+| Parameter Name | Description | Default Value |
+|---|---|---|
+| `target_sizes (list[int] \| list[list \| tuple] \| tuple[list \| tuple])` | Multiple target sizes, each of which should be an `int`, `list`, or `tuple`       |            |
+| `interp (str, optional)`         | Interpolation method for resizing the image(s). One of {`'NEAREST'`, `'LINEAR'`, `'CUBIC'`, `'AREA'`, `'LANCZOS4'`, `'RANDOM'`} | `'LINEAR'` |
+
+## `RandomResizeByShort`
+
+Resize the input image(s) to random sizes while keeping the aspect ratio (the scaling factor is computed from the shorter side).
+
+| Parameter Name | Description | Default Value |
+|---|---|---|
+| `short_sizes (list[int])` | Target size of the shorter side of the image(s) |       |
+| `max_size (int, optional)`       | Upper bound of the longer side of the image(s). If `max_size` is `-1`, no upper bound is applied   | `-1`  |
+| `interp (str, optional)`         | Interpolation method for resizing the image(s). One of {`'NEAREST'`, `'LINEAR'`, `'CUBIC'`, `'AREA'`, `'LANCZOS4'`, `'RANDOM'`}         | `'LINEAR'` |
+
+## `RandomScaleAspect`
+
+Crop the input image(s) and rescale the result to the original size.
+
+| Parameter Name | Description | Default Value |
+|---|---|---|
+| `min_scale (float)`| Minimum ratio between the cropped region and the original image. If `0`, the image(s) will not be cropped | `0`     |
+| `aspect_ratio (float)`    | Aspect ratio of the cropped region  | `.33`    |
+
+## `RandomSwap`
+
+Randomly swap the bi-temporal input images.
+
+| Parameter Name | Description | Default Value |
+|---|---|---|
+|`prob (float, optional)`| Probability of swapping the input images | `0.2` |
+
+## `RandomVerticalFlip`
+
+Randomly flip the input image(s) vertically.
+
+| Parameter Name | Description | Default Value |
+|---|---|---|
+|`prob (float, optional)`| Probability of flipping the input | `.5`  |
+
+## `ReduceDim`
+
+Reduce the number of bands of the input image(s).
+
+| Parameter Name | Description | Default Value |
+|---|---|---|
+|`joblib_path (str)`| Path of the *.joblib file |      |
+|`apply_to_tar (bool, optional)` | Whether to apply the transformation to the target image | `True` |
+
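+## `Resize`
+
+Resize the input image(s).
+
+- If `target_size` is an int, resize the image(s) to (`target_size`, `target_size`).
+- If `target_size` is a list or tuple, resize the image(s) to `target_size`.
+
+Note: If `interp` is `'RANDOM'`, the interpolation method will be chosen randomly.
+
+| Parameter Name | Description | Default Value |
+|---|---|---|
+| `target_size (int \| list[int] \| tuple[int])` | Target size. If it is an integer, the target height and width will both be set to `target_size`. Otherwise, `target_size` represents [target height, target width] |          |
+| `interp (str, optional)`  | Interpolation method for resizing the image(s). One of {`'NEAREST'`, `'LINEAR'`, `'CUBIC'`, `'AREA'`, `'LANCZOS4'`, `'RANDOM'`} | `'LINEAR'` |
+| `keep_ratio (bool, optional)` | If `True`, the scaling factors of width and height will be set to the same value, and the height/width of the resized image will not be greater than the target height/width | `False`    |
+
+## `ResizeByLong`
+
+Resize the input image(s) while keeping the aspect ratio (the scaling factor is computed from the longer side).
+
+| Parameter Name | Description | Default Value |
+|---|---|---|
+| `long_size (int)`| Target size of the longer side of the image(s) |          |
+| `interp (str, optional)`                    | Interpolation method for resizing the image(s). One of {`'NEAREST'`, `'LINEAR'`, `'CUBIC'`, `'AREA'`, `'LANCZOS4'`, `'RANDOM'`}   | `'LINEAR'` |
+
+## `ResizeByShort`
+
+Resize the input image(s) while keeping the aspect ratio (the scaling factor is computed from the shorter side).
+
+| Parameter Name | Description | Default Value |
+|---|---|---|
+| `short_size (int)`    | Target size of the shorter side of the image(s) |          |
+| `max_size (int, optional)` | Upper bound of the longer side of the image(s). If `max_size` is `-1`, no upper bound is applied  | `-1`       |
+| `interp (str, optional)`      | Interpolation method for resizing the image(s). One of {`'NEAREST'`, `'LINEAR'`, `'CUBIC'`, `'AREA'`, `'LANCZOS4'`, `'RANDOM'`} | `'LINEAR'` |
+
+## `SelectBand`
+
+Select a set of bands of the input image(s).
+
+| Parameter Name | Description | Default Value |
+|---|---|---|
+| `band_list (list, optional)` | Bands to select (band indices start from 1) | `[1,2,3]`  |
+| `apply_to_tar (bool, optional)`| Whether to apply the transformation to the target image     | `True`     |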

+ 254 - 0
docs/intro/transforms_cons_params_en.md

@@ -0,0 +1,254 @@
+# PaddleRS Data Transformation Operator Construction Parameters
+
+This document describes the construction parameters of each PaddleRS data transformation operator in detail, including the operator name, the operator's purpose, and the name, type, meaning, and default value of each parameter.
+
+## `AppendIndex`
+
+Append remote sensing index to input image(s).
+
+| Parameter Name             | Description                                                                                                                                        | Default Value       |
+|-----------------|----------------------------------------------------------------------------------------------------------------------------------------------------|-----------|
+|`index_type (str)`| Type of remote sensing index. See supported index types in https://github.com/PaddlePaddle/PaddleRS/tree/develop/paddlers/transforms/indices.py . |           |
+|`band_indexes (dict,optional)`|Mapping of band names to band indices (starting from 1). See band names in https://github.com/PaddlePaddle/PaddleRS/tree/develop/paddlers/transforms/indices.py .                                           | `None`      |
+|`satellite (str,optional)`|Type of satellite. If set, band indices will be automatically determined accordingly. See supported satellites in https://github.com/PaddlePaddle/PaddleRS/tree/develop/paddlers/transforms/satellites.py .                               | `None`      |
+
+
+## `CenterCrop`
+
++ Crop the input image(s) at the center.
+  - 1. Locate the center of the image.
+  - 2. Crop the image.
+
+
+| Parameter Name             | Description                                                                                                       | Default Value  |
+|-----------------|----------------------------------------------------------------------------------------------------------|------|
+|`crop_size (int, optional)`| Target size of the cropped image(s)  | `224`  |
+
+## `Dehaze`
+
+Dehaze the input image(s).
+
+
+| Parameter Name             | Description                                   | Default Value   |
+|-----------------|---------------------------------------------------|-------|
+|`gamma (bool,optional)`| Whether to use gamma correction  | `False` |
+
+## `MatchRadiance`
+
+Perform relative radiometric correction between bi-temporal images.
+
+| Parameter Name             | Description                                                                                                                                                                                                                                                                 | Default Value |
+|-----------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------|
+|`method (str,optional)`| Method used to match the radiance of the bi-temporal images. Choices are {`'hist'`, `'lsr'`, `'fft'`}. `'hist'` stands for histogram matching, `'lsr'` stands for least-squares regression, and `'fft'` replaces the low-frequency components of the image to match the reference image. | `'hist'` |
+
+
+## `MixupImage`
+
+Mix up two images and their gt_bbox/gt_score.
+
+| Parameter Name             | Description                                     | Default Value |
+|-----------------|-----------------------------------------------------|-----|
+|`alpha (float,optional)`| Alpha parameter of beta distribution. | `1.5` |
+|`beta (float,optional)` |Beta parameter of beta distribution. | `1.5` |
+
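+As a rough illustration of how `alpha` and `beta` are used, the sketch below draws a blending factor from a Beta(`alpha`, `beta`) distribution and mixes two equally sized pixel sequences with it. This is a simplified assumption about the blending step only: the name `mixup_sketch` is hypothetical, images are represented as flat lists of pixel values, and the merging of bounding boxes and scores is not shown.
+
```python
import random


def mixup_sketch(pixels1, pixels2, alpha=1.5, beta=1.5, rng=random):
    """Blend two equally long pixel sequences with a Beta-distributed factor."""
    # The factor lies in [0, 1]; with alpha == beta it is symmetric around 0.5.
    factor = rng.betavariate(alpha, beta)
    mixed = [factor * p + (1.0 - factor) * q for p, q in zip(pixels1, pixels2)]
    return mixed, factor


im1 = [100.0] * 12  # stand-in for a tiny 2 x 2 RGB image
im2 = [200.0] * 12
mixed, factor = mixup_sketch(im1, im2, rng=random.Random(0))
```
+
+Every mixed value lies between the corresponding values of the two inputs, weighted by `factor` and `1 - factor`.
+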
+## `Normalize`
+
++ Apply normalization to the input image(s). The normalization steps are:
+
+  - 1. im = (im - min_value) * 1 / (max_value - min_value)
+  - 2. im = im - mean
+  - 3. im = im / std
+
+
+| Parameter Name      | Description                                                              | Default Value                          |
+|---------------------|--------------------------------------------------------------------------|------------------------------|
+| `mean (list[float] \| tuple[float],optional)`  | Mean of input image(s)                                                   | `[0.485,0.456,0.406]` |
+| `std (list[float] \| tuple[float],optional)`   | Standard deviation of input image(s)                                     | `[0.229,0.224,0.225]` |
+| `min_val (list[float] \| tuple[float],optional)` | Minimum value of input image(s). If `None`, use `0` for all channels.     |    `None`      |
+| `max_val (list[float] \| tuple[float],optional)` | Maximum value of input image(s). If `None`, use `255` for all channels. |  `None`        |
+| `apply_to_tar (bool,optional)` | Whether to apply transformation to the target image                      | `True`                         |
+
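+The three normalization steps listed above can be sketched for a single pixel as follows. This is a minimal pure-Python illustration rather than the operator's implementation; the helper name `normalize_pixel` is hypothetical, and the defaults simply mirror the table.
+
```python
def normalize_pixel(pixel, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225),
                    min_val=None, max_val=None):
    """Apply the three normalization steps to one pixel (one value per channel)."""
    n = len(pixel)
    # As documented, None means 0 (minimum) and 255 (maximum) for all channels.
    min_val = [0.0] * n if min_val is None else min_val
    max_val = [255.0] * n if max_val is None else max_val
    out = []
    for v, m, s, lo, hi in zip(pixel, mean, std, min_val, max_val):
        v = (v - lo) * 1 / (hi - lo)  # step 1: rescale to [0, 1]
        v = v - m                     # step 2: subtract the mean
        v = v / s                     # step 3: divide by the standard deviation
        out.append(v)
    return out


out = normalize_pixel((255.0, 0.0, 127.5))
```
+
+Note that `mean` and `std` are applied after the rescaling in step 1, so they are expressed on the [0, 1] scale rather than on the original value range.
+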
+## `Pad`
+
+Pad image to a specified size or multiple of `size_divisor`.
+
+| Parameter Name           | Description                                                                                                                                                                                                          | Default Value              |
+|--------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------|
+| `target_size (list[int] \| tuple[int],optional)`     | Image target size, if `None`, pad to multiple of size_divisor.                                                                                                                                                         | `None`               |
+| `pad_mode (int,optional)` | Padding mode. Currently only four modes are supported: [-1, 0, 1, 2]. If `-1`, use the specified offsets. If `0`, only pad to the right and bottom. If `1`, pad according to the center. If `2`, only pad to the left and top.   | `0`                  |
+| `offset (list[int] \| None,optional)`                |  Padding offsets.                                                                                                                                                                                                              | `None`               |
+| `im_padding_value (list[float] \| tuple[float])` | RGB value of padded area.                                                                                                                                                                                                        | `(127.5,127.5,127.5)` |
+| `label_padding_value (int,optional)` |Filling value for the mask.                                                                                                                                                                                                              | `255`                  |
+| `size_divisor (int)`     | Image width and height after padding will be a multiple of `size_divisor`.                                                                                                                                                                       |                      |
+
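+The following sketch shows one way the four `pad_mode` values and `size_divisor` could translate into padding widths. It is an assumption-laden illustration, not the PaddleRS implementation: the helper names `target_from_divisor` and `pad_widths` are hypothetical, and `offsets` is assumed to be given as (x, y), meaning (left, top).
+
```python
import math


def target_from_divisor(im_h, im_w, size_divisor):
    """Smallest (height, width) that is a multiple of size_divisor and fits the image."""
    return (math.ceil(im_h / size_divisor) * size_divisor,
            math.ceil(im_w / size_divisor) * size_divisor)


def pad_widths(im_h, im_w, target_h, target_w, pad_mode=0, offsets=None):
    """Return (left, top, right, bottom) padding widths for the given mode."""
    pad_h, pad_w = target_h - im_h, target_w - im_w
    if pad_h < 0 or pad_w < 0:
        raise ValueError("target size must not be smaller than the image size")
    if pad_mode == -1:
        left, top = offsets                 # user-specified offsets, assumed (x, y)
    elif pad_mode == 0:
        left, top = 0, 0                    # pad only to the right and bottom
    elif pad_mode == 1:
        left, top = pad_w // 2, pad_h // 2  # pad around the center
    elif pad_mode == 2:
        left, top = pad_w, pad_h            # pad only to the left and top
    else:
        raise ValueError("pad_mode must be one of -1, 0, 1, 2")
    return left, top, pad_w - left, pad_h - top


target_h, target_w = target_from_divisor(100, 80, 32)
widths = pad_widths(100, 80, target_h, target_w, pad_mode=1)
```
+
+For a 100 x 80 image and `size_divisor=32`, the padded size becomes 128 x 96, and `pad_mode=1` splits the extra rows and columns evenly between the two sides.
+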
+## `RandomBlur`
+
+Randomly blur input image(s).
+
+| Parameter Name             | Description                                     | Default Value  |
+|-----------------|-----------------------------------------------------|------|
+|`prob (float)`|Probability of blurring. |      |
+
+## `RandomCrop`
+
++ Randomly crop the input.
+
+  - 1. Compute the height and width of the cropped area according to `aspect_ratio` and `scaling`.
+  - 2. Locate the upper left corner of cropped area randomly.
+  - 3. Crop the image(s).
+  - 4. Resize the cropped area to `crop_size` x `crop_size`.
+
+| Parameter Name   | Description                                                                   | Default Value                     |
+|------------------|-------------------------------------------------------------------------------|-------------------------|
+| `crop_size (int \| list[int] \| tuple[int])` | Target size of the cropped area. If `None`, the cropped area will not be resized. | `None`                    |
+| `aspect_ratio (list[float],optional)` | Aspect ratio of cropped region in [min, max] format.                          | `[.5, 2.]`                |
+| `thresholds (list[float],optional)` | IoU thresholds to decide a valid bbox crop.                                   | `[.0,.1, .3, .5, .7, .9]` |
+| `scaling (list[float], optional)` | Ratio between the cropped region and the original image in [min, max] format. | `[.3, 1.]`                |
+| `num_attempts (int,optional)` | Max number of tries before giving up.                                         | `50`                      |
+| `allow_no_crop (bool,optional)` | Whether returning without doing crop is allowed.                              | `True`                    |
+| `cover_all_box (bool,optional)` | Whether to force to cover the entire target box.                              | `False`                   |
+
+## `RandomDistort`
+
+Random color distortion.
+
+| Parameter Name                       | Description                                                     | Default Value   |
+|----------------------------|-----------------------------------------------------------------|-------|
+| `brightness_range (float,optional)` | Range of brightness distortion.                                 | `.5`    |
+| `brightness_prob (float,optional)` | Probability of brightness distortion.                           | `.5`    |
+| `contrast_range (float, optional)` | Range of contrast distortion.                                   | `.5`    |
+| `contrast_prob (float, optional)` | Probability of contrast distortion.                             | `.5`    |
+| `saturation_range (float,optional)` | Range of saturation distortion.                                 | `.5`    |
+| `saturation_prob (float,optional)` | Probability of saturation distortion.                           | `.5`    |
+| `hue_range (float,optional)` | Range of hue distortion.                                        | `.5`    |
+| `hue_prob (float,optional)`| Probability of hue distortion.                                  | `.5`    |
+| `random_apply (bool,optional)` | Whether to apply the transformations in random (YOLO) or fixed (SSD) order. | `True`  |
+| `count (int,optional)`  | Count used to control the number of distortions.                | `4`     |
+| `shuffle_channel (bool,optional)` | Whether to swap channels randomly.                                           | `False` |
+
+
+## `RandomExpand`
+
+Randomly expand the input by padding according to random offsets.
+
+| Parameter Name                  | Description                                    | Default Value                 |
+|---------------------------------|----------------------------------------------------|---------------------|
+| `upper_ratio (float,optional)`  | Maximum ratio to which the original image is expanded. | `4`                   |
+| `prob (float,optional)`        | Probability of applying the expansion. | `.5`                  |
+| `im_padding_value (list[float] \| tuple[float],optional)` |  RGB filling value for the image  | `(127.5,127.5,127.5)` |
+| `label_padding_value (int,optional)` | Filling value for the mask.  | `255`    |
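
The parameters above can be read as a two-step procedure: decide with probability `prob` whether to expand, then draw an expansion ratio no larger than `upper_ratio` and a random offset for the original image inside the larger canvas. The sketch below illustrates that reading in plain Python. The function name `random_expand_canvas` and its return convention are hypothetical, and the sampling details are an assumption rather than PaddleRS's actual implementation.

```python
import random


def random_expand_canvas(height, width, upper_ratio=4.0, prob=0.5, rng=random):
    """Return (new_height, new_width, top, left) for a hypothetical expansion.

    (top, left) is where the original image would be pasted on the new canvas.
    This is an illustrative sketch, not the PaddleRS code.
    """
    # Skip the expansion with probability 1 - prob.
    if rng.random() >= prob:
        return height, width, 0, 0
    # Assumed sampling: ratio drawn uniformly from [1, upper_ratio].
    ratio = rng.uniform(1.0, upper_ratio)
    new_h, new_w = int(height * ratio), int(width * ratio)
    # Nothing to do if the canvas did not actually grow.
    if new_h <= height or new_w <= width:
        return height, width, 0, 0
    # Random offsets keep the original image fully inside the canvas.
    top = rng.randint(0, new_h - height)
    left = rng.randint(0, new_w - width)
    return new_h, new_w, top, left
```

The area outside the pasted image would then be filled with `im_padding_value` for the image and `label_padding_value` for the mask.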
+
+## `RandomHorizontalFlip`
+
+Randomly flip the input horizontally.
+
+| Parameter Name                                              | Description        | Default Value                |
+|--------------------------------------------------|-----------|---------------------|
+| `prob (float,optional)`                           | Probability of flipping the input.   | `.5`                  |
+
+## `RandomResize`
+
+Resize input to random sizes.
+
++ Attention: If `interp` is 'RANDOM', the interpolation method will be chosen randomly.
+
+| Parameter Name            | Description                                                          | Default Value                 |
+|---------------------------|----------------------------------------------------------------------|---------------------|
+| `target_sizes (list[int] \| list[list \| tuple] \| tuple[list \| tuple])` | Multiple target sizes, each of which should be int, list, or tuple.  |                   |
+| `interp (str,optional)`   | Interpolation method for resizing image(s). One of {`'NEAREST'`, `'LINEAR'`, `'CUBIC'`, `'AREA'`, `'LANCZOS4'`, `'RANDOM'`}. |   `'LINEAR'`                  |
+
+
+## `RandomResizeByShort`
+
+Resize input to random sizes while keeping the aspect ratio.
+
++ Attention: If `interp` is 'RANDOM', the interpolation method will be chosen randomly.
+
+| Parameter Name     | Description        | Default Value |
+|--------------------|-----------|-----|
+| `short_sizes (int \| list[int])` | Target size of the shorter side of the image(s).|   |
+| `max_size (int,optional)` |Upper bound of longer side of the image(s). If `max_size` is -1, no upper bound will be applied.    | `-1`  |
+| `interp (str,optional)` |  Interpolation method for resizing image(s). One of {`'NEAREST'`, `'LINEAR'`, `'CUBIC'`, `'AREA'`, `'LANCZOS4'`, `'RANDOM'`}.  | `'LINEAR'`    |
+
+## `RandomScaleAspect`
+
+Crop input image(s) and resize back to original sizes.
+
+
+| Parameter Name                                                               | Description                                                                                          | Default Value    |
+|-------------------------------------------------------------------|------------------------------------------------------------------------------------------------------|--------|
+| `min_scale (float)`| Minimum ratio between the cropped region and the original image. If `0`, image(s) will not be cropped. | `.5`      |
+| `aspect_ratio (float)`    | Aspect ratio of cropped region.                                                                                 | `.33`    |
+
+## `RandomSwap`
+
+Randomly swap multi-temporal images.
+
+
+| Parameter Name                                                               | Description        | Default Value |
+|-------------------------------------------------------------------|-----------|-----|
+|`prob (float,optional)`| Probability of swapping the input images.| `0.2` |
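
The probability argument used by `RandomSwap` (and by the random flip operators above) gates whether the transformation is applied to a given sample. A minimal sketch of that gating for a bi-temporal image pair, assuming a hypothetical helper `random_swap` that is not part of PaddleRS:

```python
import random


def random_swap(image_t1, image_t2, prob=0.2, rng=random):
    """Swap the two temporal images with probability `prob` (illustrative only)."""
    # rng.random() is uniform on [0, 1), so the branch is taken with probability prob.
    if rng.random() < prob:
        return image_t2, image_t1
    return image_t1, image_t2
```

With `prob=1.0` the pair is always swapped and with `prob=0.0` it never is, which is a convenient way to check a pipeline deterministically.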
+
+## `RandomVerticalFlip`
+Randomly flip the input vertically.
+
+
+| Parameter Name                                                              | Description        | Default Value |
+|------------------------------------------------------------------|-----------|-----|
+|`prob (float,optional)`| Probability of flipping the input| `.5`  |
+
+
+## `ReduceDim`
+Use PCA to reduce the dimension of input image(s).
+
+| Parameter Name                                                               | Description                                          | Default Value  |
+|-------------------------------------------------------------------|------------------------------------------------------|------|
+|`joblib_path (str)`| Path of *.joblib file of PCA                         |      |
+|`apply_to_tar (bool,optional)` | Whether to apply transformation to the target image. | `True` |
+
+
+## `Resize`
+Resize input.
+
+- If `target_size` is an int, resize the image(s) to (`target_size`, `target_size`).
+- If `target_size` is a list or tuple, resize the image(s) to `target_size`.
++ Attention: If `interp` is 'RANDOM', the interpolation method will be chosen randomly.
+
+| Parameter Name     | Description                                                                                                                                                          | Default Value      |
+|--------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|
+| `target_size (int \| list[int] \| tuple[int])` | Target size. If it is an integer, the target height and width will be both set to `target_size`. Otherwise,  `target_size` represents [target height, target width]. |          |
+| `interp (str,optional)` | Interpolation method for resizing image(s). One of {`'NEAREST'`, `'LINEAR'`, `'CUBIC'`, `'AREA'`, `'LANCZOS4'`, `'RANDOM'`}.                                         | `'LINEAR'` |
+| `keep_ratio (bool,optional)` | If `True`, the same scaling factor is used for width and height, and the height/width of the resized image will not be greater than the target height/width. | `False`    |
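
To make the interaction between `target_size` and `keep_ratio` concrete, the sketch below computes the output shape described in the table. The helper name `resized_shape` is hypothetical, and the rounding rule is an assumption; it is not taken from the PaddleRS source.

```python
def resized_shape(height, width, target_size, keep_ratio=False):
    """Return the (height, width) a resize would produce (illustrative only)."""
    # An int means a square target; otherwise [target height, target width].
    if isinstance(target_size, int):
        target_h, target_w = target_size, target_size
    else:
        target_h, target_w = target_size
    if not keep_ratio:
        return target_h, target_w
    # One shared scale factor, chosen so neither side exceeds its target.
    scale = min(target_h / height, target_w / width)
    new_h = min(target_h, int(round(height * scale)))
    new_w = min(target_w, int(round(width * scale)))
    return new_h, new_w
```

For a 300 x 400 input and `target_size=512`, this gives 512 x 512 without `keep_ratio` and 384 x 512 with it.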
+
+## `ResizeByLong`
+Resize the input image, keeping the aspect ratio unchanged (calculate the scaling factor based on the long side).
+
++ Attention: If `interp` is 'RANDOM', the interpolation method will be chosen randomly.
+
+
+| Parameter Name                                        | Description        | Default Value      |
+|--------------------------------------------|-----------|----------|
+| `long_size (int)`| Target size of the longer side of the image(s).|          |
+| `interp (str,optional)`                    | Interpolation method for resizing image(s). One of {`'NEAREST'`, `'LINEAR'`, `'CUBIC'`, `'AREA'`, `'LANCZOS4'`, `'RANDOM'`}.  | `'LINEAR'` |
+
+## `ResizeByShort`
+Resize input while keeping the aspect ratio.
+
++ Attention: If `interp` is 'RANDOM', the interpolation method will be chosen randomly.
+
+
+| Parameter Name              | Description                                                                                      | Default Value      |
+|------------------|--------------------------------------------------------------------------------------------------|----------|
+| `short_size (int)` | Target size of the shorter side of the image(s).                                                 |          |
+| `max_size (int,optional)` | Upper bound of longer side of the image(s). If `max_size` is -1, no upper bound will be applied. | `-1`       |
+| `interp (str,optional)`  | Interpolation method for resizing image(s). One of {`'NEAREST'`, `'LINEAR'`, `'CUBIC'`, `'AREA'`, `'LANCZOS4'`, `'RANDOM'`}.          | `'LINEAR'` |
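
The relationship between `short_size` and `max_size` can be sketched as follows: scale so that the shorter side reaches `short_size`, then shrink the scale if the longer side would exceed `max_size`. The helper `resize_by_short_shape` is hypothetical and the rounding is an assumption, so treat this as an illustration rather than the library's code.

```python
def resize_by_short_shape(height, width, short_size, max_size=-1):
    """Return the (height, width) after an aspect-preserving resize (illustrative only)."""
    short_side = min(height, width)
    long_side = max(height, width)
    # Scale so that the shorter side becomes short_size.
    scale = short_size / short_side
    # A positive max_size caps the longer side; -1 means no cap.
    if max_size > 0 and round(scale * long_side) > max_size:
        scale = max_size / long_side
    return int(round(height * scale)), int(round(width * scale))
```

For a 300 x 400 input and `short_size=600`, the result is 600 x 800; adding `max_size=700` reduces it to 525 x 700.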
+
+
+## `SelectBand`
+Select a set of bands of input image(s).
+
+| Parameter Name              | Description                                          | Default Value      |
+|------------------|------------------------------------------------------|----------|
+| `band_list (list,optional)` | Bands to select (band index starts from 1).          | `[1,2,3]`  |
+| `apply_to_tar (bool,optional)`| Whether to apply transformation to the target image. | `True`     |
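
Because `band_list` is 1-based, each entry has to be shifted by one before it is used as a channel index. A small sketch on a nested-list image (height x width x channels), using a hypothetical helper `select_bands` that is not part of PaddleRS:

```python
def select_bands(image, band_list=(1, 2, 3)):
    """Pick channels from an H x W x C nested list using 1-based band indices."""
    for band in band_list:
        if band < 1:
            raise ValueError("band indices start from 1")
    # Convert 1-based band numbers to 0-based channel indices.
    indices = [band - 1 for band in band_list]
    return [[[pixel[i] for i in indices] for pixel in row] for row in image]
```

The output keeps the order given in `band_list`, so `[4, 2]` returns the fourth band first and the second band after it.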

+ 46 - 0
docs/quick_start.md

@@ -0,0 +1,46 @@
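+# Quick Start
+
+## Environment Setup
+
+For environment setup, see [Tutorial: Training Models](../tutorials/train/README.md).
+
+## Model Training
+
++ Once PaddleRS is installed, you can start training models.
++ For model training, see [Tutorial: Training Models](../tutorials/train/README.md).
+
+## Model Accuracy Validation
+
+After training, validate the model's accuracy to make sure its predictions meet expectations. Taking the DeepLab V3+ image segmentation model as an example, validation can be run with the following code: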
+
+```python
+import paddlex as pdx
+
+# Load the model
+model = pdx.load_model('output/deeplabv3p/best_model')
+
+# Load the validation set
+dataset = pdx.datasets.SegDataset(
+    data_dir='dataset/val',
+    file_list='dataset/val/list.txt',
+    label_list='dataset/labels.txt',
+    transforms=model.eval_transforms)
+
+# Run validation
+result = model.evaluate(dataset, batch_size=1, epoch_id=None, return_details=True)
+
+print(result)
+```
+
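+In the code above, `pdx.load_model()` loads the trained DeepLabV3P model and `pdx.datasets.SegDataset()` loads the validation data. `model.evaluate()` takes the validation dataset, the batch size, the epoch number, and other arguments, and returns validation results that include the predictions and the evaluation metrics. Finally, the results are printed.
+
+
+## Model Deployment
+
+### Model Export
+
+For model export, see [Exporting Models for Deployment](../deploy/export/README.md).
+
+### Python Deployment
+
+For Python deployment, see [Python Deployment](../deploy/README.md).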

+ 24 - 0
paddlers/rs_models/clas/condensenetv2.py

@@ -287,6 +287,30 @@ class _Transition(nn.Layer):
 
 
 class CondenseNetV2(nn.Layer):
+    """
+    The CondenseNetV2 implementation based on PaddlePaddle.
+
+    The original article refers to
+        Yang L, Jiang H, Cai R, et al. "Condensenet v2: Sparse feature reactivation for deep networks"
+            (https://arxiv.org/abs/2104.04382)
+
+    Args:
+        stages (list[int]): Number of layers in the dense block of each stage.
+        growth (list[int]): Growth rate (output channels of the convolutional layers) of the dense block in each stage.
+        HS_start_block (int): Index of the dense block from which the Hard-Swish activation function is used.
+        SE_start_block (int): Index of the dense block from which the Squeeze-and-Excitation (SE) module is used.
+        fc_channel (int): Number of output channels of the fully connected layer.
+        group_1x1 (int): Number of groups in the 1x1 convolution layers.
+        group_3x3 (int): Number of groups in the 3x3 convolution layers.
+        group_trans (int): Number of groups in the 1x1 convolution layers of the transition layers.
+        bottleneck (bool): Whether to use a bottleneck structure in the dense blocks, i.e., a 1x1 convolution
+            layer that reduces the number of input channels before the 3x3 convolution.
+        last_se_reduction (int): Channel reduction ratio of the SE module in the last dense block.
+        in_channels (int): Number of channels of the input image. The default value is 3, which corresponds to
+            an RGB image.
+        class_num (int): Number of classes for the classification task.
+    """
+
     def __init__(
             self,
             stages,

+ 71 - 70
paddlers/transforms/operators.py

@@ -116,7 +116,7 @@ class Compose(object):
 
     def __call__(self, sample):
         """
-        This is equivalent to sequentially calling compose_obj.apply_transforms() 
+        This is equivalent to sequentially calling compose_obj.apply_transforms()
             and compose_obj.arrange_outputs().
         """
         if 'trans_info' not in sample:
@@ -193,20 +193,20 @@ class Transform(object):
 class DecodeImg(Transform):
     """
     Decode image(s) in input.
-    
+
     Args:
-        to_rgb (bool, optional): If True, convert input image(s) from BGR format to 
+        to_rgb (bool, optional): If True, convert input image(s) from BGR format to
             RGB format. Defaults to True.
-        to_uint8 (bool, optional): If True, quantize and convert decoded image(s) to 
+        to_uint8 (bool, optional): If True, quantize and convert decoded image(s) to
             uint8 type. Defaults to True.
-        decode_bgr (bool, optional): If True, automatically interpret a non-geo image 
+        decode_bgr (bool, optional): If True, automatically interpret a non-geo image
             (e.g., jpeg images) as a BGR image. Defaults to True.
-        decode_sar (bool, optional): If True, automatically interpret a single-channel 
-            geo image (e.g. geotiff images) as a SAR image, set this argument to 
+        decode_sar (bool, optional): If True, automatically interpret a single-channel
+            geo image (e.g. geotiff images) as a SAR image, set this argument to
             True. Defaults to True.
-        read_geo_info (bool, optional): If True, read geographical information from 
+        read_geo_info (bool, optional): If True, read geographical information from
             the image. Defaults to False.
-        use_stretch (bool, optional): Whether to apply 2% linear stretch. Valid only if 
+        use_stretch (bool, optional): Whether to apply 2% linear stretch. Valid only if
             `to_uint8` is True. Defaults to False.
     """
 
@@ -391,13 +391,13 @@ class Resize(Transform):
 
     Args:
         target_size (int | list[int] | tuple[int]): Target size. If it is an integer, the
-            target height and width will be both set to `target_size`. Otherwise, 
+            target height and width will be both set to `target_size`. Otherwise,
             `target_size` represents [target height, target width].
-        interp (str, optional): Interpolation method for resizing image(s). One of 
-            {'NEAREST', 'LINEAR', 'CUBIC', 'AREA', 'LANCZOS4', 'RANDOM'}. 
+        interp (str, optional): Interpolation method for resizing image(s). One of
+            {'NEAREST', 'LINEAR', 'CUBIC', 'AREA', 'LANCZOS4', 'RANDOM'}.
             Defaults to 'LINEAR'.
-        keep_ratio (bool, optional): If True, the scaling factor of width and height will 
-            be set to same value, and height/width of the resized image will be not 
+        keep_ratio (bool, optional): If True, the scaling factor of width and height will
+            be set to same value, and height/width of the resized image will be not
             greater than the target width/height. Defaults to False.
 
     Raises:
@@ -526,8 +526,8 @@ class RandomResize(Transform):
     Args:
         target_sizes (list[int] | list[list|tuple] | tuple[list|tuple]):
             Multiple target sizes, each of which should be int, list, or tuple.
-        interp (str, optional): Interpolation method for resizing image(s). One of 
-            {'NEAREST', 'LINEAR', 'CUBIC', 'AREA', 'LANCZOS4', 'RANDOM'}. 
+        interp (str, optional): Interpolation method for resizing image(s). One of
+            {'NEAREST', 'LINEAR', 'CUBIC', 'AREA', 'LANCZOS4', 'RANDOM'}.
             Defaults to 'LINEAR'.
 
     Raises:
@@ -566,8 +566,8 @@ class ResizeByShort(Transform):
         short_size (int): Target size of the shorter side of the image(s).
         max_size (int, optional): Upper bound of longer side of the image(s). If
             `max_size` is -1, no upper bound will be applied. Defaults to -1.
-        interp (str, optional): Interpolation method for resizing image(s). One of 
-            {'NEAREST', 'LINEAR', 'CUBIC', 'AREA', 'LANCZOS4', 'RANDOM'}. 
+        interp (str, optional): Interpolation method for resizing image(s). One of
+            {'NEAREST', 'LINEAR', 'CUBIC', 'AREA', 'LANCZOS4', 'RANDOM'}.
             Defaults to 'LINEAR'.
 
     Raises:
@@ -606,10 +606,10 @@ class RandomResizeByShort(Transform):
 
     Args:
         short_sizes (list[int]): Target size of the shorter side of the image(s).
-        max_size (int, optional): Upper bound of longer side of the image(s). 
+        max_size (int, optional): Upper bound of longer side of the image(s).
             If `max_size` is -1, no upper bound will be applied. Defaults to -1.
-        interp (str, optional): Interpolation method for resizing image(s). One of 
-            {'NEAREST', 'LINEAR', 'CUBIC', 'AREA', 'LANCZOS4', 'RANDOM'}. 
+        interp (str, optional): Interpolation method for resizing image(s). One of
+            {'NEAREST', 'LINEAR', 'CUBIC', 'AREA', 'LANCZOS4', 'RANDOM'}.
             Defaults to 'LINEAR'.
 
     Raises:
@@ -663,12 +663,12 @@ class RandomFlipOrRotate(Transform):
     Flip or Rotate an image in different directions with a certain probability.
 
     Args:
-        probs (list[float]): Probabilities of performing flipping and rotation. 
+        probs (list[float]): Probabilities of performing flipping and rotation.
             Default: [0.35,0.25].
-        probsf (list[float]): Probabilities of 5 flipping modes (horizontal, 
-            vertical, both horizontal and vertical, diagonal, anti-diagonal). 
+        probsf (list[float]): Probabilities of 5 flipping modes (horizontal,
+            vertical, both horizontal and vertical, diagonal, anti-diagonal).
             Default: [0.3, 0.3, 0.2, 0.1, 0.1].
-        probsr (list[float]): Probabilities of 3 rotation modes (90°, 180°, 270° 
+        probsr (list[float]): Probabilities of 3 rotation modes (90°, 180°, 270°
             clockwise). Default: [0.25, 0.5, 0.25].
 
     Examples:
@@ -938,13 +938,13 @@ class Normalize(Transform):
     3. im = im / std
 
     Args:
-        mean (list[float] | tuple[float], optional): Mean of input image(s). 
+        mean (list[float] | tuple[float], optional): Mean of input image(s).
             Defaults to [0.485, 0.456, 0.406].
-        std (list[float] | tuple[float], optional): Standard deviation of input 
+        std (list[float] | tuple[float], optional): Standard deviation of input
             image(s). Defaults to [0.229, 0.224, 0.225].
-        min_val (list[float] | tuple[float], optional): Minimum value of input 
+        min_val (list[float] | tuple[float], optional): Minimum value of input
             image(s). If None, use 0 for all channels. Defaults to None.
-        max_val (list[float] | tuple[float], optional): Maximum value of input 
+        max_val (list[float] | tuple[float], optional): Maximum value of input
             image(s). If None, use 255. for all channels. Defaults to None.
         apply_to_tar (bool, optional): Whether to apply transformation to the target
             image. Defaults to True.
@@ -1004,7 +1004,7 @@ class CenterCrop(Transform):
     2. Crop the image.
 
     Args:
-        crop_size (int, optional): Target size of the cropped image(s). 
+        crop_size (int, optional): Target size of the cropped image(s).
             Defaults to 224.
     """
 
@@ -1038,26 +1038,26 @@ class CenterCrop(Transform):
 class RandomCrop(Transform):
     """
     Randomly crop the input.
-    1. Compute the height and width of cropped area according to `aspect_ratio` and 
+    1. Compute the height and width of cropped area according to `aspect_ratio` and
         `scaling`.
     2. Locate the upper left corner of cropped area randomly.
     3. Crop the image(s).
     4. Resize the cropped area to `crop_size` x `crop_size`.
 
     Args:
-        crop_size (int | list[int] | tuple[int]): Target size of the cropped area. If 
+        crop_size (int | list[int] | tuple[int]): Target size of the cropped area. If
             None, the cropped area will not be resized. Defaults to None.
-        aspect_ratio (list[float], optional): Aspect ratio of cropped region in 
+        aspect_ratio (list[float], optional): Aspect ratio of cropped region in
             [min, max] format. Defaults to [.5, 2.].
-        thresholds (list[float], optional): Iou thresholds to decide a valid bbox 
+        thresholds (list[float], optional): Iou thresholds to decide a valid bbox
             crop. Defaults to [.0, .1, .3, .5, .7, .9].
-        scaling (list[float], optional): Ratio between the cropped region and the 
+        scaling (list[float], optional): Ratio between the cropped region and the
             original image in [min, max] format. Defaults to [.3, 1.].
-        num_attempts (int, optional): Max number of tries before giving up. 
+        num_attempts (int, optional): Max number of tries before giving up.
             Defaults to 50.
-        allow_no_crop (bool, optional): Whether returning without doing crop is 
+        allow_no_crop (bool, optional): Whether returning without doing crop is
             allowed. Defaults to True.
-        cover_all_box (bool, optional): Whether to ensure all bboxes be covered in 
+        cover_all_box (bool, optional): Whether to ensure all bboxes be covered in
             the final crop. Defaults to False.
     """
 
@@ -1251,7 +1251,7 @@ class RandomScaleAspect(Transform):
     """
     Crop input image(s) and resize back to original sizes.
 
-    Args: 
+    Args:
         min_scale (float): Minimum ratio between the cropped region and the original
             image. If 0, image(s) will not be cropped. Defaults to .5.
         aspect_ratio (float): Aspect ratio of cropped region. Defaults to .33.
@@ -1279,12 +1279,12 @@ class RandomExpand(Transform):
     Randomly expand the input by padding according to random offsets.
 
     Args:
-        upper_ratio (float, optional): Maximum ratio to which the original image 
+        upper_ratio (float, optional): Maximum ratio to which the original image
             is expanded. Defaults to 4..
         prob (float, optional): Probability of apply expanding. Defaults to .5.
-        im_padding_value (list[float] | tuple[float], optional): RGB filling value 
+        im_padding_value (list[float] | tuple[float], optional): RGB filling value
             for the image. Defaults to (127.5, 127.5, 127.5).
-        label_padding_value (int, optional): Filling value for the mask. 
+        label_padding_value (int, optional): Filling value for the mask.
             Defaults to 255.
 
     See Also:
@@ -1337,17 +1337,17 @@ class Pad(Transform):
         Pad image to a specified size or multiple of `size_divisor`.
 
         Args:
-            target_size (list[int] | tuple[int], optional): Image target size, if None, pad to 
+            target_size (list[int] | tuple[int], optional): Image target size, if None, pad to
                 multiple of size_divisor. Defaults to None.
             pad_mode (int, optional): Pad mode. Currently only four modes are supported:
                 [-1, 0, 1, 2]. if -1, use specified offsets. If 0, only pad to right and bottom
                 If 1, pad according to center. If 2, only pad left and top. Defaults to 0.
             offsets (list[int]|None, optional): Padding offsets. Defaults to None.
-            im_padding_value (list[float] | tuple[float]): RGB value of padded area. 
+            im_padding_value (list[float] | tuple[float]): RGB value of padded area.
                 Defaults to (127.5, 127.5, 127.5).
-            label_padding_value (int, optional): Filling value for the mask. 
+            label_padding_value (int, optional): Filling value for the mask.
                 Defaults to 255.
-            size_divisor (int): Image width and height after padding will be a multiple of 
+            size_divisor (int): Image width and height after padding will be a multiple of
                 `size_divisor`.
         """
         super(Pad, self).__init__()
@@ -1426,7 +1426,7 @@ class Pad(Transform):
             h, w = self.target_size
             assert (
                     im_h <= h and im_w <= w
-            ), 'target size ({}, {}) cannot be less than image size ({}, {})'\
+            ), 'target size ({}, {}) cannot be less than image size ({}, {})' \
                 .format(h, w, im_h, im_w)
         else:
             h = (np.ceil(im_h / self.size_divisor) *
@@ -1477,11 +1477,12 @@ class MixupImage(Transform):
         Mixup two images and their gt_bbox/gt_score.
 
         Args:
-            alpha (float, optional): Alpha parameter of beta distribution. 
+            alpha (float, optional): Alpha parameter of beta distribution.
                 Defaults to 1.5.
-            beta (float, optional): Beta parameter of beta distribution. 
+            beta (float, optional): Beta parameter of beta distribution.
                 Defaults to 1.5.
         """
+
         super(MixupImage, self).__init__()
         if alpha <= 0.0:
             raise ValueError("`alpha` should be positive in MixupImage.")
@@ -1558,24 +1559,24 @@ class RandomDistort(Transform):
     Random color distortion.
 
     Args:
-        brightness_range (float, optional): Range of brightness distortion. 
+        brightness_range (float, optional): Range of brightness distortion.
             Defaults to .5.
-        brightness_prob (float, optional): Probability of brightness distortion. 
+        brightness_prob (float, optional): Probability of brightness distortion.
             Defaults to .5.
-        contrast_range (float, optional): Range of contrast distortion. 
+        contrast_range (float, optional): Range of contrast distortion.
             Defaults to .5.
-        contrast_prob (float, optional): Probability of contrast distortion. 
+        contrast_prob (float, optional): Probability of contrast distortion.
             Defaults to .5.
-        saturation_range (float, optional): Range of saturation distortion. 
+        saturation_range (float, optional): Range of saturation distortion.
             Defaults to .5.
-        saturation_prob (float, optional): Probability of saturation distortion. 
+        saturation_prob (float, optional): Probability of saturation distortion.
             Defaults to .5.
         hue_range (float, optional): Range of hue distortion. Defaults to .5.
         hue_prob (float, optional): Probability of hue distortion. Defaults to .5.
         random_apply (bool, optional): Apply the transformation in random (yolo) or
             fixed (SSD) order. Defaults to True.
         count (int, optional): Number of distortions to apply. Defaults to 4.
-        shuffle_channel (bool, optional): Whether to swap channels randomly. 
+        shuffle_channel (bool, optional): Whether to swap channels randomly.
             Defaults to False.
     """
 
@@ -1722,7 +1723,7 @@ class RandomBlur(Transform):
     """
     Randomly blur input image(s).
 
-    Args: 
+    Args:
         prob (float): Probability of blurring.
     """
 
@@ -1758,7 +1759,7 @@ class Dehaze(Transform):
     """
     Dehaze input image(s).
 
-    Args: 
+    Args:
         gamma (bool, optional): Use gamma correction or not. Defaults to False.
     """
 
@@ -1781,7 +1782,7 @@ class ReduceDim(Transform):
     """
     Use PCA to reduce the dimension of input image(s).
 
-    Args: 
+    Args:
         joblib_path (str): Path of *.joblib file of PCA.
         apply_to_tar (bool, optional): Whether to apply transformation to the target
             image. Defaults to True.
@@ -1816,8 +1817,8 @@ class SelectBand(Transform):
     """
     Select a set of bands of input image(s).
 
-    Args: 
-        band_list (list, optional): Bands to select (band index starts from 1). 
+    Args:
+        band_list (list, optional): Bands to select (band index starts from 1).
             Defaults to [1, 2, 3].
         apply_to_tar (bool, optional): Whether to apply transformation to the target
             image. Defaults to True.
@@ -1935,7 +1936,7 @@ class RandomSwap(Transform):
     Randomly swap multi-temporal images.
 
     Args:
-        prob (float, optional): Probability of swapping the input images. 
+        prob (float, optional): Probability of swapping the input images.
             Default: 0.2.
     """
 
@@ -1967,15 +1968,15 @@ class AppendIndex(Transform):
     Append remote sensing index to input image(s).
 
     Args:
-        index_type (str): Type of remote sensing index. See supported 
-            index types in 
+        index_type (str): Type of remote sensing index. See supported
+            index types in
             https://github.com/PaddlePaddle/PaddleRS/tree/develop/paddlers/transforms/indices.py .
-        band_indices (dict, optional): Mapping of band names to band indices 
-            (starting from 1). See band names in 
+        band_indices (dict, optional): Mapping of band names to band indices
+            (starting from 1). See band names in
             https://github.com/PaddlePaddle/PaddleRS/tree/develop/paddlers/transforms/indices.py .
             Default: None.
-        satellite (str, optional): Type of satellite. If set, 
-            band indices will be automatically determined accordingly. See supported satellites in 
+        satellite (str, optional): Type of satellite. If set,
+            band indices will be automatically determined accordingly. See supported satellites in
             https://github.com/PaddlePaddle/PaddleRS/tree/develop/paddlers/transforms/satellites.py .
             Default: None.
     """
@@ -2025,8 +2026,8 @@ class MatchRadiance(Transform):
 
     Args:
         method (str, optional): Method used to match the radiance of the
-            bi-temporal images. Choices are {'hist', 'lsr', 'fft'}. 'hist' 
-            stands for histogram matching, 'lsr' stands for least-squares 
+            bi-temporal images. Choices are {'hist', 'lsr', 'fft'}. 'hist'
+            stands for histogram matching, 'lsr' stands for least-squares
             regression, and 'fft' replaces the low-frequency components of
             the image to match the reference image. Default: 'hist'.
     """