diff --git a/README.md b/README.md index d73bc462c..abed09648 100644 --- a/README.md +++ b/README.md @@ -1,213 +1,49 @@ -# Semantic Segmentation on PyTorch - -English | [简体中文](/README_zh-CN.md) - -[![python-image]][python-url] -[![pytorch-image]][pytorch-url] -[![lic-image]][lic-url] +# slightweight Segmentation This project aims at providing a concise, easy-to-use, modifiable reference implementation for semantic segmentation models using PyTorch. -

+stage 1: Installation -## Installation - -``` -# semantic-segmentation-pytorch dependencies +stage 2: install the dependencies pip install ninja tqdm -# follow PyTorch installation in https://pytorch.org/get-started/locally/ +stage 3: follow the PyTorch installation guide at https://pytorch.org/get-started/locally/ conda install pytorch torchvision -c pytorch -# install PyTorch Segmentation -git clone https://github.com/Tramac/awesome-semantic-segmentation-pytorch.git -``` - -## Usage -### Train ------------------ -- **Single GPU training** -``` -# for example, train fcn32_vgg16_pascal_voc: -python train.py --model fcn32s --backbone vgg16 --dataset pascal_voc --lr 0.0001 --epochs 50 -``` -- **Multi-GPU training** - -``` -# for example, train fcn32_vgg16_pascal_voc with 4 GPUs: -export NGPUS=4 -python -m torch.distributed.launch --nproc_per_node=$NGPUS train.py --model fcn32s --backbone vgg16 --dataset pascal_voc --lr 0.0001 --epochs 50 -``` - -### Evaluation ------------------ -- **Single GPU evaluating** -``` -# for example, evaluate fcn32_vgg16_pascal_voc -python eval.py --model fcn32s --backbone vgg16 --dataset pascal_voc -``` -- **Multi-GPU evaluating** -``` -# for example, evaluate fcn32_vgg16_pascal_voc with 4 GPUs: -export NGPUS=4 -python -m torch.distributed.launch --nproc_per_node=$NGPUS eval.py --model fcn32s --backbone vgg16 --dataset pascal_voc -``` +stage 4: train a model, for example swnet_resnet_citys: +python train.py --model swnet --backbone resnet --dataset citys --lr 0.0001 --epochs 50 + + +stage 5: evaluate a model, for example swnet_resnet_citys: +python eval.py --model swnet --backbone resnet --dataset citys + ### Demo -``` + cd ./scripts #for new users: -python demo.py --model fcn32s_vgg16_voc --input-pic ../tests/test_img.jpg +python demo.py --model swnet_resnet_citys --input-pic ../tests/test_img.jpg #you should add 'test.jpg' by yourself -python demo.py --model fcn32s_vgg16_voc --input-pic ../datasets/test.jpg -``` - -``` -.{SEG_ROOT} -├── scripts -│   ├── demo.py -│   ├── eval.py -│   └── train.py -``` - -## Support - -#### Model - -- [FCN](https://arxiv.org/abs/1411.4038) -- [ENet](https://arxiv.org/pdf/1606.02147) -- [PSPNet](https://arxiv.org/pdf/1612.01105) -- [ICNet](https://arxiv.org/pdf/1704.08545) -- [DeepLabv3](https://arxiv.org/abs/1706.05587) -- [DeepLabv3+](https://arxiv.org/pdf/1802.02611) -- [DenseASPP](http://openaccess.thecvf.com/content_cvpr_2018/papers/Yang_DenseASPP_for_Semantic_CVPR_2018_paper.pdf) -- [EncNet](https://arxiv.org/abs/1803.08904v1) -- [BiSeNet](https://arxiv.org/abs/1808.00897) -- [PSANet](https://hszhao.github.io/papers/eccv18_psanet.pdf) -- [DANet](https://arxiv.org/pdf/1809.02983) -- [OCNet](https://arxiv.org/pdf/1809.00916) -- [CGNet](https://arxiv.org/pdf/1811.08201) -- [ESPNetv2](https://arxiv.org/abs/1811.11431) -- [DUNet(DUpsampling)](https://arxiv.org/abs/1903.02120) -- [FastFCN(JPU)](https://arxiv.org/abs/1903.11816) -- [LEDNet](https://arxiv.org/abs/1905.02423) -- [Fast-SCNN](https://github.com/Tramac/Fast-SCNN-pytorch) -- [LightSeg](https://github.com/Tramac/Lightweight-Segmentation) -- [DFANet](https://arxiv.org/abs/1904.02216) - -[DETAILS](https://github.com/Tramac/awesome-semantic-segmentation-pytorch/blob/master/docs/DETAILS.md) for model & backbone. 
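For readers who prefer to call the demo from Python rather than through the command line above, the following is a minimal inference sketch of what `scripts/demo.py` roughly does: preprocess an image, run a forward pass, and take the per-pixel argmax. The `get_swnet_resnet_citys` factory and the checkpoint filename are illustrative assumptions, not the project's confirmed API.

```python
# Minimal single-image inference sketch. `get_swnet_resnet_citys` and the
# checkpoint filename are illustrative assumptions, not the project's confirmed API.
import torch
from PIL import Image
from torchvision import transforms

from core.models import get_swnet_resnet_citys  # hypothetical factory function

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

model = get_swnet_resnet_citys(pretrained=False)
model.load_state_dict(torch.load('swnet_resnet_citys.pth', map_location='cpu'))
model.eval()

image = Image.open('../tests/test_img.jpg').convert('RGB')
x = transform(image).unsqueeze(0)              # shape: 1 x 3 x H x W

with torch.no_grad():
    out = model(x)[0]                          # models in this repo return a tuple of outputs
pred = out.argmax(1).squeeze(0).cpu().numpy()  # H x W map of Cityscapes class indices
```

From there, the class-index map is typically colorized with the dataset palette before being saved to disk.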
-``` -.{SEG_ROOT} -├── core -│   ├── models -│   │   ├── bisenet.py -│   │   ├── danet.py -│   │   ├── deeplabv3.py -│ │ ├── deeplabv3+.py -│   │   ├── denseaspp.py -│   │   ├── dunet.py -│   │   ├── encnet.py -│   │   ├── fcn.py -│   │   ├── pspnet.py -│   │   ├── icnet.py -│   │   ├── enet.py -│   │   ├── ocnet.py -│   │   ├── psanet.py -│   │   ├── cgnet.py -│   │   ├── espnet.py -│   │   ├── lednet.py -│   │   ├── dfanet.py -│   │   ├── ...... -``` +python demo.py --model swnet_resnet_citys --input-pic ../datasets/test.jpg + +### Performance evaluation + +![image](https://user-images.githubusercontent.com/43395674/159203398-86f4874e-7b0f-48a3-8414-cdf662d56f99.png) +![image](https://user-images.githubusercontent.com/43395674/159203405-7b656176-2e93-4d67-98e6-6d650204b0d6.png) + +![image](https://user-images.githubusercontent.com/43395674/159203470-99a509cc-68cc-4fa4-be65-43e0c9204cb1.png) +![image](https://user-images.githubusercontent.com/43395674/159203480-10ff8f81-965f-419c-ab98-83fade7b3b65.png) + +### An experiment on a simulated scene based on a Jetson device +![image](https://user-images.githubusercontent.com/43395674/159203486-19980424-c6c4-4644-a44b-9f52085b2067.png) + #### Dataset You can run script to download dataset, such as: -``` + cd ./core/data/downloader python ade20k.py --download-dir ../datasets/ade -``` - -| Dataset | training set | validation set | testing set | -| :----------------------------------------------------------: | :----------: | :------------: | :---------: | -| [VOC2012](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar) | 1464 | 1449 | ✘ | -| [VOCAug](http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/semantic_contours/benchmark.tgz) | 11355 | 2857 | ✘ | -| [ADK20K](http://groups.csail.mit.edu/vision/datasets/ADE20K/) | 20210 | 2000 | ✘ | -| [Cityscapes](https://www.cityscapes-dataset.com/downloads/) | 2975 | 500 | ✘ | -| [COCO](http://cocodataset.org/#download) | | | | -| [SBU-shadow](http://www3.cs.stonybrook.edu/~cvl/content/datasets/shadow_db/SBU-shadow.zip) | 4085 | 638 | ✘ | -| [LIP(Look into Person)](http://sysu-hcp.net/lip/) | 30462 | 10000 | 10000 | - -``` -.{SEG_ROOT} -├── core -│   ├── data -│   │   ├── dataloader -│   │   │   ├── ade.py -│   │   │   ├── cityscapes.py -│   │   │   ├── mscoco.py -│   │   │   ├── pascal_aug.py -│   │   │   ├── pascal_voc.py -│   │   │   ├── sbu_shadow.py -│   │   └── downloader -│   │   ├── ade20k.py -│   │   ├── cityscapes.py -│   │   ├── mscoco.py -│   │   ├── pascal_voc.py -│   │   └── sbu_shadow.py -``` - -## Result -- **PASCAL VOC 2012** - -|Methods|Backbone|TrainSet|EvalSet|crops_size|epochs|JPU|Mean IoU|pixAcc| -|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:| -|FCN32s|vgg16|train|val|480|60|✘|47.50|85.39| -|FCN16s|vgg16|train|val|480|60|✘|49.16|85.98| -|FCN8s|vgg16|train|val|480|60|✘|48.87|85.02| -|FCN32s|resnet50|train|val|480|50|✘|54.60|88.57| -|PSPNet|resnet50|train|val|480|60|✘|63.44|89.78| -|DeepLabv3|resnet50|train|val|480|60|✘|60.15|88.36| - -Note: `lr=1e-4, batch_size=4, epochs=80`. - -## Overfitting Test -See [TEST](https://github.com/Tramac/Awesome-semantic-segmentation-pytorch/tree/master/tests) for details. 
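The performance-evaluation section above is presented as screenshots; in this codebase, results are conventionally summarized with pixel accuracy and mean IoU, the same metrics used in the removed results table. For reference, here is a small generic sketch of how those two metrics are commonly computed from a confusion matrix; it is not the evaluation code that produced the figures.

```python
# Generic reference for pixel accuracy and mean IoU; not the project's own
# evaluation code, just the standard confusion-matrix formulation.
import numpy as np

def confusion_matrix(pred, label, num_classes):
    """pred, label: integer arrays of equal shape with values in [0, num_classes)."""
    valid = (label >= 0) & (label < num_classes)      # skip void / ignore labels
    return np.bincount(
        num_classes * label[valid].astype(int) + pred[valid],
        minlength=num_classes ** 2,
    ).reshape(num_classes, num_classes)

def pixacc_miou(hist):
    pix_acc = np.diag(hist).sum() / hist.sum()
    iou = np.diag(hist) / (hist.sum(1) + hist.sum(0) - np.diag(hist))
    return pix_acc, np.nanmean(iou)                   # nanmean ignores absent classes
```

Accumulating `confusion_matrix` over all validation images and then calling `pixacc_miou` once gives dataset-level numbers comparable to those reported above.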
- -``` -.{SEG_ROOT} -├── tests -│   └── test_model.py -``` - -## To Do -- [x] add train script -- [ ] remove syncbn -- [ ] train & evaluate -- [x] test distributed training -- [x] fix syncbn ([Why SyncBN?](https://tramac.github.io/2019/02/25/%E8%B7%A8%E5%8D%A1%E5%90%8C%E6%AD%A5%20Batch%20Normalization[%E8%BD%AC]/)) -- [x] add distributed ([How DIST?]("https://tramac.github.io/2019/03/06/%E5%88%86%E5%B8%83%E5%BC%8F%E8%AE%AD%E7%BB%83-PyTorch/")) - -## References -- [PyTorch-Encoding](https://github.com/zhanghang1989/PyTorch-Encoding) -- [maskrcnn-benchmark](https://github.com/facebookresearch/maskrcnn-benchmark) -- [gloun-cv](https://github.com/dmlc/gluon-cv) -- [imagenet](https://github.com/pytorch/examples/tree/master/imagenet) - - - - - -[python-image]: https://img.shields.io/badge/Python-2.x|3.x-ff69b4.svg -[python-url]: https://www.python.org/ -[pytorch-image]: https://img.shields.io/badge/PyTorch-1.1-2BAF2B.svg -[pytorch-url]: https://pytorch.org/ -[lic-image]: https://img.shields.io/badge/Apache-2.0-blue.svg -[lic-url]: https://github.com/Tramac/Awesome-semantic-segmentation-pytorch/blob/master/LICENSE + +Acknowledgement: we thanks the code support from "awesome-semantic-segmentation-pytorch (https://github.com/Tramac/Awesome-semantic-segmentation-pytorch)". The swnet is a improvement from enet. + diff --git a/core/models/bisenet.py b/core/models/bisenet.py deleted file mode 100644 index fac6c0955..000000000 --- a/core/models/bisenet.py +++ /dev/null @@ -1,220 +0,0 @@ -"""Bilateral Segmentation Network""" -import torch -import torch.nn as nn -import torch.nn.functional as F - -from core.models.base_models.resnet import resnet18 -from core.nn import _ConvBNReLU - -__all__ = ['BiSeNet', 'get_bisenet', 'get_bisenet_resnet18_citys'] - - -class BiSeNet(nn.Module): - def __init__(self, nclass, backbone='resnet18', aux=False, jpu=False, pretrained_base=True, **kwargs): - super(BiSeNet, self).__init__() - self.aux = aux - self.spatial_path = SpatialPath(3, 128, **kwargs) - self.context_path = ContextPath(backbone, pretrained_base, **kwargs) - self.ffm = FeatureFusion(256, 256, 4, **kwargs) - self.head = _BiSeHead(256, 64, nclass, **kwargs) - if aux: - self.auxlayer1 = _BiSeHead(128, 256, nclass, **kwargs) - self.auxlayer2 = _BiSeHead(128, 256, nclass, **kwargs) - - self.__setattr__('exclusive', - ['spatial_path', 'context_path', 'ffm', 'head', 'auxlayer1', 'auxlayer2'] if aux else [ - 'spatial_path', 'context_path', 'ffm', 'head']) - - def forward(self, x): - size = x.size()[2:] - spatial_out = self.spatial_path(x) - context_out = self.context_path(x) - fusion_out = self.ffm(spatial_out, context_out[-1]) - outputs = [] - x = self.head(fusion_out) - x = F.interpolate(x, size, mode='bilinear', align_corners=True) - outputs.append(x) - - if self.aux: - auxout1 = self.auxlayer1(context_out[0]) - auxout1 = F.interpolate(auxout1, size, mode='bilinear', align_corners=True) - outputs.append(auxout1) - auxout2 = self.auxlayer2(context_out[1]) - auxout2 = F.interpolate(auxout2, size, mode='bilinear', align_corners=True) - outputs.append(auxout2) - return tuple(outputs) - - -class _BiSeHead(nn.Module): - def __init__(self, in_channels, inter_channels, nclass, norm_layer=nn.BatchNorm2d, **kwargs): - super(_BiSeHead, self).__init__() - self.block = nn.Sequential( - _ConvBNReLU(in_channels, inter_channels, 3, 1, 1, norm_layer=norm_layer), - nn.Dropout(0.1), - nn.Conv2d(inter_channels, nclass, 1) - ) - - def forward(self, x): - x = self.block(x) - return x - - -class SpatialPath(nn.Module): - """Spatial 
path""" - - def __init__(self, in_channels, out_channels, norm_layer=nn.BatchNorm2d, **kwargs): - super(SpatialPath, self).__init__() - inter_channels = 64 - self.conv7x7 = _ConvBNReLU(in_channels, inter_channels, 7, 2, 3, norm_layer=norm_layer) - self.conv3x3_1 = _ConvBNReLU(inter_channels, inter_channels, 3, 2, 1, norm_layer=norm_layer) - self.conv3x3_2 = _ConvBNReLU(inter_channels, inter_channels, 3, 2, 1, norm_layer=norm_layer) - self.conv1x1 = _ConvBNReLU(inter_channels, out_channels, 1, 1, 0, norm_layer=norm_layer) - - def forward(self, x): - x = self.conv7x7(x) - x = self.conv3x3_1(x) - x = self.conv3x3_2(x) - x = self.conv1x1(x) - - return x - - -class _GlobalAvgPooling(nn.Module): - def __init__(self, in_channels, out_channels, norm_layer, **kwargs): - super(_GlobalAvgPooling, self).__init__() - self.gap = nn.Sequential( - nn.AdaptiveAvgPool2d(1), - nn.Conv2d(in_channels, out_channels, 1, bias=False), - norm_layer(out_channels), - nn.ReLU(True) - ) - - def forward(self, x): - size = x.size()[2:] - pool = self.gap(x) - out = F.interpolate(pool, size, mode='bilinear', align_corners=True) - return out - - -class AttentionRefinmentModule(nn.Module): - def __init__(self, in_channels, out_channels, norm_layer=nn.BatchNorm2d, **kwargs): - super(AttentionRefinmentModule, self).__init__() - self.conv3x3 = _ConvBNReLU(in_channels, out_channels, 3, 1, 1, norm_layer=norm_layer) - self.channel_attention = nn.Sequential( - nn.AdaptiveAvgPool2d(1), - _ConvBNReLU(out_channels, out_channels, 1, 1, 0, norm_layer=norm_layer), - nn.Sigmoid() - ) - - def forward(self, x): - x = self.conv3x3(x) - attention = self.channel_attention(x) - x = x * attention - return x - - -class ContextPath(nn.Module): - def __init__(self, backbone='resnet18', pretrained_base=True, norm_layer=nn.BatchNorm2d, **kwargs): - super(ContextPath, self).__init__() - if backbone == 'resnet18': - pretrained = resnet18(pretrained=pretrained_base, **kwargs) - else: - raise RuntimeError('unknown backbone: {}'.format(backbone)) - self.conv1 = pretrained.conv1 - self.bn1 = pretrained.bn1 - self.relu = pretrained.relu - self.maxpool = pretrained.maxpool - self.layer1 = pretrained.layer1 - self.layer2 = pretrained.layer2 - self.layer3 = pretrained.layer3 - self.layer4 = pretrained.layer4 - - inter_channels = 128 - self.global_context = _GlobalAvgPooling(512, inter_channels, norm_layer) - - self.arms = nn.ModuleList( - [AttentionRefinmentModule(512, inter_channels, norm_layer, **kwargs), - AttentionRefinmentModule(256, inter_channels, norm_layer, **kwargs)] - ) - self.refines = nn.ModuleList( - [_ConvBNReLU(inter_channels, inter_channels, 3, 1, 1, norm_layer=norm_layer), - _ConvBNReLU(inter_channels, inter_channels, 3, 1, 1, norm_layer=norm_layer)] - ) - - def forward(self, x): - x = self.conv1(x) - x = self.bn1(x) - x = self.relu(x) - x = self.maxpool(x) - x = self.layer1(x) - - context_blocks = [] - context_blocks.append(x) - x = self.layer2(x) - context_blocks.append(x) - c3 = self.layer3(x) - context_blocks.append(c3) - c4 = self.layer4(c3) - context_blocks.append(c4) - context_blocks.reverse() - - global_context = self.global_context(c4) - last_feature = global_context - context_outputs = [] - for i, (feature, arm, refine) in enumerate(zip(context_blocks[:2], self.arms, self.refines)): - feature = arm(feature) - feature += last_feature - last_feature = F.interpolate(feature, size=context_blocks[i + 1].size()[2:], - mode='bilinear', align_corners=True) - last_feature = refine(last_feature) - context_outputs.append(last_feature) - - 
return context_outputs - - -class FeatureFusion(nn.Module): - def __init__(self, in_channels, out_channels, reduction=1, norm_layer=nn.BatchNorm2d, **kwargs): - super(FeatureFusion, self).__init__() - self.conv1x1 = _ConvBNReLU(in_channels, out_channels, 1, 1, 0, norm_layer=norm_layer, **kwargs) - self.channel_attention = nn.Sequential( - nn.AdaptiveAvgPool2d(1), - _ConvBNReLU(out_channels, out_channels // reduction, 1, 1, 0, norm_layer=norm_layer), - _ConvBNReLU(out_channels // reduction, out_channels, 1, 1, 0, norm_layer=norm_layer), - nn.Sigmoid() - ) - - def forward(self, x1, x2): - fusion = torch.cat([x1, x2], dim=1) - out = self.conv1x1(fusion) - attention = self.channel_attention(out) - out = out + out * attention - return out - - -def get_bisenet(dataset='citys', backbone='resnet18', pretrained=False, root='~/.torch/models', - pretrained_base=True, **kwargs): - acronyms = { - 'pascal_voc': 'pascal_voc', - 'pascal_aug': 'pascal_aug', - 'ade20k': 'ade', - 'coco': 'coco', - 'citys': 'citys', - } - from ..data.dataloader import datasets - model = BiSeNet(datasets[dataset].NUM_CLASS, backbone=backbone, pretrained_base=pretrained_base, **kwargs) - if pretrained: - from .model_store import get_model_file - device = torch.device(kwargs['local_rank']) - model.load_state_dict(torch.load(get_model_file('bisenet_%s_%s' % (backbone, acronyms[dataset]), root=root), - map_location=device)) - return model - - -def get_bisenet_resnet18_citys(**kwargs): - return get_bisenet('citys', 'resnet18', **kwargs) - - -if __name__ == '__main__': - img = torch.randn(2, 3, 224, 224) - model = BiSeNet(19, backbone='resnet18') - print(model.exclusive) diff --git a/core/models/cgnet.py b/core/models/cgnet.py deleted file mode 100644 index 9cae5c837..000000000 --- a/core/models/cgnet.py +++ /dev/null @@ -1,210 +0,0 @@ -"""Context Guided Network for Semantic Segmentation""" -import torch -import torch.nn as nn -import torch.nn.functional as F - -from core.nn import _ConvBNPReLU, _BNPReLU - -__all__ = ['CGNet', 'get_cgnet', 'get_cgnet_citys'] - - -class CGNet(nn.Module): - r"""CGNet - - Parameters - ---------- - nclass : int - Number of categories for the training dataset. - norm_layer : object - Normalization layer used in backbone network (default: :class:`nn.BatchNorm`; - for Synchronized Cross-GPU BachNormalization). - aux : bool - Auxiliary loss. - - Reference: - Tianyi Wu, et al. "CGNet: A Light-weight Context Guided Network for Semantic Segmentation." - arXiv preprint arXiv:1811.08201 (2018). 
- """ - - def __init__(self, nclass, backbone='', aux=False, jpu=False, pretrained_base=True, M=3, N=21, **kwargs): - super(CGNet, self).__init__() - # stage 1 - self.stage1_0 = _ConvBNPReLU(3, 32, 3, 2, 1, **kwargs) - self.stage1_1 = _ConvBNPReLU(32, 32, 3, 1, 1, **kwargs) - self.stage1_2 = _ConvBNPReLU(32, 32, 3, 1, 1, **kwargs) - - self.sample1 = _InputInjection(1) - self.sample2 = _InputInjection(2) - self.bn_prelu1 = _BNPReLU(32 + 3, **kwargs) - - # stage 2 - self.stage2_0 = ContextGuidedBlock(32 + 3, 64, dilation=2, reduction=8, down=True, residual=False, **kwargs) - self.stage2 = nn.ModuleList() - for i in range(0, M - 1): - self.stage2.append(ContextGuidedBlock(64, 64, dilation=2, reduction=8, **kwargs)) - self.bn_prelu2 = _BNPReLU(128 + 3, **kwargs) - - # stage 3 - self.stage3_0 = ContextGuidedBlock(128 + 3, 128, dilation=4, reduction=16, down=True, residual=False, **kwargs) - self.stage3 = nn.ModuleList() - for i in range(0, N - 1): - self.stage3.append(ContextGuidedBlock(128, 128, dilation=4, reduction=16, **kwargs)) - self.bn_prelu3 = _BNPReLU(256, **kwargs) - - self.head = nn.Sequential( - nn.Dropout2d(0.1, False), - nn.Conv2d(256, nclass, 1)) - - self.__setattr__('exclusive', ['stage1_0', 'stage1_1', 'stage1_2', 'sample1', 'sample2', - 'bn_prelu1', 'stage2_0', 'stage2', 'bn_prelu2', 'stage3_0', - 'stage3', 'bn_prelu3', 'head']) - - def forward(self, x): - size = x.size()[2:] - # stage1 - out0 = self.stage1_0(x) - out0 = self.stage1_1(out0) - out0 = self.stage1_2(out0) - - inp1 = self.sample1(x) - inp2 = self.sample2(x) - - # stage 2 - out0_cat = self.bn_prelu1(torch.cat([out0, inp1], dim=1)) - out1_0 = self.stage2_0(out0_cat) - for i, layer in enumerate(self.stage2): - if i == 0: - out1 = layer(out1_0) - else: - out1 = layer(out1) - out1_cat = self.bn_prelu2(torch.cat([out1, out1_0, inp2], dim=1)) - - # stage 3 - out2_0 = self.stage3_0(out1_cat) - for i, layer in enumerate(self.stage3): - if i == 0: - out2 = layer(out2_0) - else: - out2 = layer(out2) - out2_cat = self.bn_prelu3(torch.cat([out2_0, out2], dim=1)) - - outputs = [] - out = self.head(out2_cat) - out = F.interpolate(out, size, mode='bilinear', align_corners=True) - outputs.append(out) - return tuple(outputs) - - -class _ChannelWiseConv(nn.Module): - def __init__(self, in_channels, out_channels, dilation=1, **kwargs): - super(_ChannelWiseConv, self).__init__() - self.conv = nn.Conv2d(in_channels, out_channels, 3, 1, dilation, dilation, groups=in_channels, bias=False) - - def forward(self, x): - x = self.conv(x) - return x - - -class _FGlo(nn.Module): - def __init__(self, in_channels, reduction=16, **kwargs): - super(_FGlo, self).__init__() - self.gap = nn.AdaptiveAvgPool2d(1) - self.fc = nn.Sequential( - nn.Linear(in_channels, in_channels // reduction), - nn.ReLU(True), - nn.Linear(in_channels // reduction, in_channels), - nn.Sigmoid()) - - def forward(self, x): - n, c, _, _ = x.size() - out = self.gap(x).view(n, c) - out = self.fc(out).view(n, c, 1, 1) - return x * out - - -class _InputInjection(nn.Module): - def __init__(self, ratio): - super(_InputInjection, self).__init__() - self.pool = nn.ModuleList() - for i in range(0, ratio): - self.pool.append(nn.AvgPool2d(3, 2, 1)) - - def forward(self, x): - for pool in self.pool: - x = pool(x) - return x - - -class _ConcatInjection(nn.Module): - def __init__(self, in_channels, norm_layer=nn.BatchNorm2d, **kwargs): - super(_ConcatInjection, self).__init__() - self.bn = norm_layer(in_channels) - self.prelu = nn.PReLU(in_channels) - - def forward(self, x1, x2): - out = 
torch.cat([x1, x2], dim=1) - out = self.bn(out) - out = self.prelu(out) - return out - - -class ContextGuidedBlock(nn.Module): - def __init__(self, in_channels, out_channels, dilation=2, reduction=16, down=False, - residual=True, norm_layer=nn.BatchNorm2d, **kwargs): - super(ContextGuidedBlock, self).__init__() - inter_channels = out_channels // 2 if not down else out_channels - if down: - self.conv = _ConvBNPReLU(in_channels, inter_channels, 3, 2, 1, norm_layer=norm_layer, **kwargs) - self.reduce = nn.Conv2d(inter_channels * 2, out_channels, 1, bias=False) - else: - self.conv = _ConvBNPReLU(in_channels, inter_channels, 1, 1, 0, norm_layer=norm_layer, **kwargs) - self.f_loc = _ChannelWiseConv(inter_channels, inter_channels, **kwargs) - self.f_sur = _ChannelWiseConv(inter_channels, inter_channels, dilation, **kwargs) - self.bn = norm_layer(inter_channels * 2) - self.prelu = nn.PReLU(inter_channels * 2) - self.f_glo = _FGlo(out_channels, reduction, **kwargs) - self.down = down - self.residual = residual - - def forward(self, x): - out = self.conv(x) - loc = self.f_loc(out) - sur = self.f_sur(out) - - joi_feat = torch.cat([loc, sur], dim=1) - joi_feat = self.prelu(self.bn(joi_feat)) - if self.down: - joi_feat = self.reduce(joi_feat) - - out = self.f_glo(joi_feat) - if self.residual: - out = out + x - - return out - - -def get_cgnet(dataset='citys', backbone='', pretrained=False, root='~/.torch/models', pretrained_base=True, **kwargs): - acronyms = { - 'pascal_voc': 'pascal_voc', - 'pascal_aug': 'pascal_aug', - 'ade20k': 'ade', - 'coco': 'coco', - 'citys': 'citys', - } - from core.data.dataloader import datasets - model = CGNet(datasets[dataset].NUM_CLASS, backbone=backbone, pretrained_base=pretrained_base, **kwargs) - if pretrained: - from .model_store import get_model_file - device = torch.device(kwargs['local_rank']) - model.load_state_dict(torch.load(get_model_file('cgnet_%s' % (acronyms[dataset]), root=root), - map_location=device)) - return model - - -def get_cgnet_citys(**kwargs): - return get_cgnet('citys', '', **kwargs) - - -if __name__ == '__main__': - model = get_cgnet_citys() - print(model) diff --git a/core/models/danet.py b/core/models/danet.py deleted file mode 100644 index 0e8de740b..000000000 --- a/core/models/danet.py +++ /dev/null @@ -1,215 +0,0 @@ -"""Dual Attention Network""" -import torch -import torch.nn as nn -import torch.nn.functional as F - -from .segbase import SegBaseModel - -__all__ = ['DANet', 'get_danet', 'get_danet_resnet50_citys', - 'get_danet_resnet101_citys', 'get_danet_resnet152_citys'] - - -class DANet(SegBaseModel): - r"""Pyramid Scene Parsing Network - - Parameters - ---------- - nclass : int - Number of categories for the training dataset. - backbone : string - Pre-trained dilated backbone network type (default:'resnet50'; 'resnet50', - 'resnet101' or 'resnet152'). - norm_layer : object - Normalization layer used in backbone network (default: :class:`mxnet.gluon.nn.BatchNorm`; - for Synchronized Cross-GPU BachNormalization). - aux : bool - Auxiliary loss. - Reference: - Jun Fu, Jing Liu, Haijie Tian, Yong Li, Yongjun Bao, Zhiwei Fang,and Hanqing Lu. - "Dual Attention Network for Scene Segmentation." 
*CVPR*, 2019 - """ - - def __init__(self, nclass, backbone='resnet50', aux=True, pretrained_base=True, **kwargs): - super(DANet, self).__init__(nclass, aux, backbone, pretrained_base=pretrained_base, **kwargs) - self.head = _DAHead(2048, nclass, aux, **kwargs) - - self.__setattr__('exclusive', ['head']) - - def forward(self, x): - size = x.size()[2:] - _, _, c3, c4 = self.base_forward(x) - outputs = [] - x = self.head(c4) - x0 = F.interpolate(x[0], size, mode='bilinear', align_corners=True) - outputs.append(x0) - - if self.aux: - x1 = F.interpolate(x[1], size, mode='bilinear', align_corners=True) - x2 = F.interpolate(x[2], size, mode='bilinear', align_corners=True) - outputs.append(x1) - outputs.append(x2) - return outputs - - -class _PositionAttentionModule(nn.Module): - """ Position attention module""" - - def __init__(self, in_channels, **kwargs): - super(_PositionAttentionModule, self).__init__() - self.conv_b = nn.Conv2d(in_channels, in_channels // 8, 1) - self.conv_c = nn.Conv2d(in_channels, in_channels // 8, 1) - self.conv_d = nn.Conv2d(in_channels, in_channels, 1) - self.alpha = nn.Parameter(torch.zeros(1)) - self.softmax = nn.Softmax(dim=-1) - - def forward(self, x): - batch_size, _, height, width = x.size() - feat_b = self.conv_b(x).view(batch_size, -1, height * width).permute(0, 2, 1) - feat_c = self.conv_c(x).view(batch_size, -1, height * width) - attention_s = self.softmax(torch.bmm(feat_b, feat_c)) - feat_d = self.conv_d(x).view(batch_size, -1, height * width) - feat_e = torch.bmm(feat_d, attention_s.permute(0, 2, 1)).view(batch_size, -1, height, width) - out = self.alpha * feat_e + x - - return out - - -class _ChannelAttentionModule(nn.Module): - """Channel attention module""" - - def __init__(self, **kwargs): - super(_ChannelAttentionModule, self).__init__() - self.beta = nn.Parameter(torch.zeros(1)) - self.softmax = nn.Softmax(dim=-1) - - def forward(self, x): - batch_size, _, height, width = x.size() - feat_a = x.view(batch_size, -1, height * width) - feat_a_transpose = x.view(batch_size, -1, height * width).permute(0, 2, 1) - attention = torch.bmm(feat_a, feat_a_transpose) - attention_new = torch.max(attention, dim=-1, keepdim=True)[0].expand_as(attention) - attention - attention = self.softmax(attention_new) - - feat_e = torch.bmm(attention, feat_a).view(batch_size, -1, height, width) - out = self.beta * feat_e + x - - return out - - -class _DAHead(nn.Module): - def __init__(self, in_channels, nclass, aux=True, norm_layer=nn.BatchNorm2d, norm_kwargs=None, **kwargs): - super(_DAHead, self).__init__() - self.aux = aux - inter_channels = in_channels // 4 - self.conv_p1 = nn.Sequential( - nn.Conv2d(in_channels, inter_channels, 3, padding=1, bias=False), - norm_layer(inter_channels, **({} if norm_kwargs is None else norm_kwargs)), - nn.ReLU(True) - ) - self.conv_c1 = nn.Sequential( - nn.Conv2d(in_channels, inter_channels, 3, padding=1, bias=False), - norm_layer(inter_channels, **({} if norm_kwargs is None else norm_kwargs)), - nn.ReLU(True) - ) - self.pam = _PositionAttentionModule(inter_channels, **kwargs) - self.cam = _ChannelAttentionModule(**kwargs) - self.conv_p2 = nn.Sequential( - nn.Conv2d(inter_channels, inter_channels, 3, padding=1, bias=False), - norm_layer(inter_channels, **({} if norm_kwargs is None else norm_kwargs)), - nn.ReLU(True) - ) - self.conv_c2 = nn.Sequential( - nn.Conv2d(inter_channels, inter_channels, 3, padding=1, bias=False), - norm_layer(inter_channels, **({} if norm_kwargs is None else norm_kwargs)), - nn.ReLU(True) - ) - self.out = nn.Sequential( 
- nn.Dropout(0.1), - nn.Conv2d(inter_channels, nclass, 1) - ) - if aux: - self.conv_p3 = nn.Sequential( - nn.Dropout(0.1), - nn.Conv2d(inter_channels, nclass, 1) - ) - self.conv_c3 = nn.Sequential( - nn.Dropout(0.1), - nn.Conv2d(inter_channels, nclass, 1) - ) - - def forward(self, x): - feat_p = self.conv_p1(x) - feat_p = self.pam(feat_p) - feat_p = self.conv_p2(feat_p) - - feat_c = self.conv_c1(x) - feat_c = self.cam(feat_c) - feat_c = self.conv_c2(feat_c) - - feat_fusion = feat_p + feat_c - - outputs = [] - fusion_out = self.out(feat_fusion) - outputs.append(fusion_out) - if self.aux: - p_out = self.conv_p3(feat_p) - c_out = self.conv_c3(feat_c) - outputs.append(p_out) - outputs.append(c_out) - - return tuple(outputs) - - -def get_danet(dataset='citys', backbone='resnet50', pretrained=False, - root='~/.torch/models', pretrained_base=True, **kwargs): - r"""Dual Attention Network - - Parameters - ---------- - dataset : str, default pascal_voc - The dataset that model pretrained on. (pascal_voc, ade20k) - pretrained : bool or str - Boolean value controls whether to load the default pretrained weights for model. - String value represents the hashtag for a certain version of pretrained weights. - root : str, default '~/.torch/models' - Location for keeping the model parameters. - pretrained_base : bool or str, default True - This will load pretrained backbone network, that was trained on ImageNet. - Examples - -------- - >>> model = get_danet(dataset='pascal_voc', backbone='resnet50', pretrained=False) - >>> print(model) - """ - acronyms = { - 'pascal_voc': 'pascal_voc', - 'pascal_aug': 'pascal_aug', - 'ade20k': 'ade', - 'coco': 'coco', - 'citys': 'citys', - } - from ..data.dataloader import datasets - model = DANet(datasets[dataset].NUM_CLASS, backbone=backbone, pretrained_base=pretrained_base, **kwargs) - if pretrained: - from .model_store import get_model_file - device = torch.device(kwargs['local_rank']) - model.load_state_dict(torch.load(get_model_file('danet_%s_%s' % (backbone, acronyms[dataset]), root=root), - map_location=device)) - return model - - -def get_danet_resnet50_citys(**kwargs): - return get_danet('citys', 'resnet50', **kwargs) - - -def get_danet_resnet101_citys(**kwargs): - return get_danet('citys', 'resnet101', **kwargs) - - -def get_danet_resnet152_citys(**kwargs): - return get_danet('citys', 'resnet152', **kwargs) - - -if __name__ == '__main__': - img = torch.randn(2, 3, 480, 480) - model = get_danet_resnet50_citys() - outputs = model(img) diff --git a/core/models/deeplabv3.py b/core/models/deeplabv3.py deleted file mode 100644 index 98d0c02a3..000000000 --- a/core/models/deeplabv3.py +++ /dev/null @@ -1,185 +0,0 @@ -"""Pyramid Scene Parsing Network""" -import torch -import torch.nn as nn -import torch.nn.functional as F - -from .segbase import SegBaseModel -from .fcn import _FCNHead - -__all__ = ['DeepLabV3', 'get_deeplabv3', 'get_deeplabv3_resnet50_voc', 'get_deeplabv3_resnet101_voc', - 'get_deeplabv3_resnet152_voc', 'get_deeplabv3_resnet50_ade', 'get_deeplabv3_resnet101_ade', - 'get_deeplabv3_resnet152_ade'] - - -class DeepLabV3(SegBaseModel): - r"""DeepLabV3 - - Parameters - ---------- - nclass : int - Number of categories for the training dataset. - backbone : string - Pre-trained dilated backbone network type (default:'resnet50'; 'resnet50', - 'resnet101' or 'resnet152'). - norm_layer : object - Normalization layer used in backbone network (default: :class:`nn.BatchNorm`; - for Synchronized Cross-GPU BachNormalization). - aux : bool - Auxiliary loss. 
- - Reference: - Chen, Liang-Chieh, et al. "Rethinking atrous convolution for semantic image segmentation." - arXiv preprint arXiv:1706.05587 (2017). - """ - - def __init__(self, nclass, backbone='resnet50', aux=False, pretrained_base=True, **kwargs): - super(DeepLabV3, self).__init__(nclass, aux, backbone, pretrained_base=pretrained_base, **kwargs) - self.head = _DeepLabHead(nclass, **kwargs) - if self.aux: - self.auxlayer = _FCNHead(1024, nclass, **kwargs) - - self.__setattr__('exclusive', ['head', 'auxlayer'] if aux else ['head']) - - def forward(self, x): - size = x.size()[2:] - _, _, c3, c4 = self.base_forward(x) - outputs = [] - x = self.head(c4) - x = F.interpolate(x, size, mode='bilinear', align_corners=True) - outputs.append(x) - - if self.aux: - auxout = self.auxlayer(c3) - auxout = F.interpolate(auxout, size, mode='bilinear', align_corners=True) - outputs.append(auxout) - return tuple(outputs) - - -class _DeepLabHead(nn.Module): - def __init__(self, nclass, norm_layer=nn.BatchNorm2d, norm_kwargs=None, **kwargs): - super(_DeepLabHead, self).__init__() - self.aspp = _ASPP(2048, [12, 24, 36], norm_layer=norm_layer, norm_kwargs=norm_kwargs, **kwargs) - self.block = nn.Sequential( - nn.Conv2d(256, 256, 3, padding=1, bias=False), - norm_layer(256, **({} if norm_kwargs is None else norm_kwargs)), - nn.ReLU(True), - nn.Dropout(0.1), - nn.Conv2d(256, nclass, 1) - ) - - def forward(self, x): - x = self.aspp(x) - return self.block(x) - - -class _ASPPConv(nn.Module): - def __init__(self, in_channels, out_channels, atrous_rate, norm_layer, norm_kwargs): - super(_ASPPConv, self).__init__() - self.block = nn.Sequential( - nn.Conv2d(in_channels, out_channels, 3, padding=atrous_rate, dilation=atrous_rate, bias=False), - norm_layer(out_channels, **({} if norm_kwargs is None else norm_kwargs)), - nn.ReLU(True) - ) - - def forward(self, x): - return self.block(x) - - -class _AsppPooling(nn.Module): - def __init__(self, in_channels, out_channels, norm_layer, norm_kwargs, **kwargs): - super(_AsppPooling, self).__init__() - self.gap = nn.Sequential( - nn.AdaptiveAvgPool2d(1), - nn.Conv2d(in_channels, out_channels, 1, bias=False), - norm_layer(out_channels, **({} if norm_kwargs is None else norm_kwargs)), - nn.ReLU(True) - ) - - def forward(self, x): - size = x.size()[2:] - pool = self.gap(x) - out = F.interpolate(pool, size, mode='bilinear', align_corners=True) - return out - - -class _ASPP(nn.Module): - def __init__(self, in_channels, atrous_rates, norm_layer, norm_kwargs, **kwargs): - super(_ASPP, self).__init__() - out_channels = 256 - self.b0 = nn.Sequential( - nn.Conv2d(in_channels, out_channels, 1, bias=False), - norm_layer(out_channels, **({} if norm_kwargs is None else norm_kwargs)), - nn.ReLU(True) - ) - - rate1, rate2, rate3 = tuple(atrous_rates) - self.b1 = _ASPPConv(in_channels, out_channels, rate1, norm_layer, norm_kwargs) - self.b2 = _ASPPConv(in_channels, out_channels, rate2, norm_layer, norm_kwargs) - self.b3 = _ASPPConv(in_channels, out_channels, rate3, norm_layer, norm_kwargs) - self.b4 = _AsppPooling(in_channels, out_channels, norm_layer=norm_layer, norm_kwargs=norm_kwargs) - - self.project = nn.Sequential( - nn.Conv2d(5 * out_channels, out_channels, 1, bias=False), - norm_layer(out_channels, **({} if norm_kwargs is None else norm_kwargs)), - nn.ReLU(True), - nn.Dropout(0.5) - ) - - def forward(self, x): - feat1 = self.b0(x) - feat2 = self.b1(x) - feat3 = self.b2(x) - feat4 = self.b3(x) - feat5 = self.b4(x) - x = torch.cat((feat1, feat2, feat3, feat4, feat5), dim=1) - x = 
self.project(x) - return x - - -def get_deeplabv3(dataset='pascal_voc', backbone='resnet50', pretrained=False, root='~/.torch/models', - pretrained_base=True, **kwargs): - acronyms = { - 'pascal_voc': 'pascal_voc', - 'pascal_aug': 'pascal_aug', - 'ade20k': 'ade', - 'coco': 'coco', - 'citys': 'citys', - } - from ..data.dataloader import datasets - model = DeepLabV3(datasets[dataset].NUM_CLASS, backbone=backbone, pretrained_base=pretrained_base, **kwargs) - if pretrained: - from .model_store import get_model_file - device = torch.device(kwargs['local_rank']) - model.load_state_dict(torch.load(get_model_file('deeplabv3_%s_%s' % (backbone, acronyms[dataset]), root=root), - map_location=device)) - return model - - -def get_deeplabv3_resnet50_voc(**kwargs): - return get_deeplabv3('pascal_voc', 'resnet50', **kwargs) - - -def get_deeplabv3_resnet101_voc(**kwargs): - return get_deeplabv3('pascal_voc', 'resnet101', **kwargs) - - -def get_deeplabv3_resnet152_voc(**kwargs): - return get_deeplabv3('pascal_voc', 'resnet152', **kwargs) - - -def get_deeplabv3_resnet50_ade(**kwargs): - return get_deeplabv3('ade20k', 'resnet50', **kwargs) - - -def get_deeplabv3_resnet101_ade(**kwargs): - return get_deeplabv3('ade20k', 'resnet101', **kwargs) - - -def get_deeplabv3_resnet152_ade(**kwargs): - return get_deeplabv3('ade20k', 'resnet152', **kwargs) - - -if __name__ == '__main__': - model = get_deeplabv3_resnet50_voc() - img = torch.randn(2, 3, 480, 480) - output = model(img) diff --git a/core/models/deeplabv3_plus.py b/core/models/deeplabv3_plus.py deleted file mode 100644 index 9b5a70355..000000000 --- a/core/models/deeplabv3_plus.py +++ /dev/null @@ -1,142 +0,0 @@ -import torch -import torch.nn as nn -import torch.nn.functional as F - -from .base_models.xception import get_xception -from .deeplabv3 import _ASPP -from .fcn import _FCNHead -from ..nn import _ConvBNReLU - -__all__ = ['DeepLabV3Plus', 'get_deeplabv3_plus', 'get_deeplabv3_plus_xception_voc'] - - -class DeepLabV3Plus(nn.Module): - r"""DeepLabV3Plus - Parameters - ---------- - nclass : int - Number of categories for the training dataset. - backbone : string - Pre-trained dilated backbone network type (default:'xception'). - norm_layer : object - Normalization layer used in backbone network (default: :class:`nn.BatchNorm`; - for Synchronized Cross-GPU BachNormalization). - aux : bool - Auxiliary loss. - - Reference: - Chen, Liang-Chieh, et al. "Encoder-Decoder with Atrous Separable Convolution for Semantic - Image Segmentation." 
- """ - - def __init__(self, nclass, backbone='xception', aux=True, pretrained_base=True, dilated=True, **kwargs): - super(DeepLabV3Plus, self).__init__() - self.aux = aux - self.nclass = nclass - output_stride = 8 if dilated else 32 - - self.pretrained = get_xception(pretrained=pretrained_base, output_stride=output_stride, **kwargs) - - # deeplabv3 plus - self.head = _DeepLabHead(nclass, **kwargs) - if aux: - self.auxlayer = _FCNHead(728, nclass, **kwargs) - - def base_forward(self, x): - # Entry flow - x = self.pretrained.conv1(x) - x = self.pretrained.bn1(x) - x = self.pretrained.relu(x) - - x = self.pretrained.conv2(x) - x = self.pretrained.bn2(x) - x = self.pretrained.relu(x) - - x = self.pretrained.block1(x) - # add relu here - x = self.pretrained.relu(x) - low_level_feat = x - - x = self.pretrained.block2(x) - x = self.pretrained.block3(x) - - # Middle flow - x = self.pretrained.midflow(x) - mid_level_feat = x - - # Exit flow - x = self.pretrained.block20(x) - x = self.pretrained.relu(x) - x = self.pretrained.conv3(x) - x = self.pretrained.bn3(x) - x = self.pretrained.relu(x) - - x = self.pretrained.conv4(x) - x = self.pretrained.bn4(x) - x = self.pretrained.relu(x) - - x = self.pretrained.conv5(x) - x = self.pretrained.bn5(x) - x = self.pretrained.relu(x) - return low_level_feat, mid_level_feat, x - - def forward(self, x): - size = x.size()[2:] - c1, c3, c4 = self.base_forward(x) - outputs = list() - x = self.head(c4, c1) - x = F.interpolate(x, size, mode='bilinear', align_corners=True) - outputs.append(x) - if self.aux: - auxout = self.auxlayer(c3) - auxout = F.interpolate(auxout, size, mode='bilinear', align_corners=True) - outputs.append(auxout) - return tuple(outputs) - - -class _DeepLabHead(nn.Module): - def __init__(self, nclass, c1_channels=128, norm_layer=nn.BatchNorm2d, **kwargs): - super(_DeepLabHead, self).__init__() - self.aspp = _ASPP(2048, [12, 24, 36], norm_layer=norm_layer, **kwargs) - self.c1_block = _ConvBNReLU(c1_channels, 48, 3, padding=1, norm_layer=norm_layer) - self.block = nn.Sequential( - _ConvBNReLU(304, 256, 3, padding=1, norm_layer=norm_layer), - nn.Dropout(0.5), - _ConvBNReLU(256, 256, 3, padding=1, norm_layer=norm_layer), - nn.Dropout(0.1), - nn.Conv2d(256, nclass, 1)) - - def forward(self, x, c1): - size = c1.size()[2:] - c1 = self.c1_block(c1) - x = self.aspp(x) - x = F.interpolate(x, size, mode='bilinear', align_corners=True) - return self.block(torch.cat([x, c1], dim=1)) - - -def get_deeplabv3_plus(dataset='pascal_voc', backbone='xception', pretrained=False, root='~/.torch/models', - pretrained_base=True, **kwargs): - acronyms = { - 'pascal_voc': 'pascal_voc', - 'pascal_aug': 'pascal_aug', - 'ade20k': 'ade', - 'coco': 'coco', - 'citys': 'citys', - } - from ..data.dataloader import datasets - model = DeepLabV3Plus(datasets[dataset].NUM_CLASS, backbone=backbone, pretrained_base=pretrained_base, **kwargs) - if pretrained: - from .model_store import get_model_file - device = torch.device(kwargs['local_rank']) - model.load_state_dict( - torch.load(get_model_file('deeplabv3_plus_%s_%s' % (backbone, acronyms[dataset]), root=root), - map_location=device)) - return model - - -def get_deeplabv3_plus_xception_voc(**kwargs): - return get_deeplabv3_plus('pascal_voc', 'xception', **kwargs) - - -if __name__ == '__main__': - model = get_deeplabv3_plus_xception_voc() diff --git a/core/models/denseaspp.py b/core/models/denseaspp.py deleted file mode 100644 index bc0ef927b..000000000 --- a/core/models/denseaspp.py +++ /dev/null @@ -1,178 +0,0 @@ -import torch -import 
torch.nn as nn -import torch.nn.functional as F - -from .base_models.densenet import * -from .fcn import _FCNHead - -__all__ = ['DenseASPP', 'get_denseaspp', 'get_denseaspp_densenet121_citys', - 'get_denseaspp_densenet161_citys', 'get_denseaspp_densenet169_citys', 'get_denseaspp_densenet201_citys'] - - -class DenseASPP(nn.Module): - def __init__(self, nclass, backbone='densenet121', aux=False, jpu=False, - pretrained_base=True, dilate_scale=8, **kwargs): - super(DenseASPP, self).__init__() - self.nclass = nclass - self.aux = aux - self.dilate_scale = dilate_scale - if backbone == 'densenet121': - self.pretrained = dilated_densenet121(dilate_scale, pretrained=pretrained_base, **kwargs) - elif backbone == 'densenet161': - self.pretrained = dilated_densenet161(dilate_scale, pretrained=pretrained_base, **kwargs) - elif backbone == 'densenet169': - self.pretrained = dilated_densenet169(dilate_scale, pretrained=pretrained_base, **kwargs) - elif backbone == 'densenet201': - self.pretrained = dilated_densenet201(dilate_scale, pretrained=pretrained_base, **kwargs) - else: - raise RuntimeError('unknown backbone: {}'.format(backbone)) - in_channels = self.pretrained.num_features - - self.head = _DenseASPPHead(in_channels, nclass) - - if aux: - self.auxlayer = _FCNHead(in_channels, nclass, **kwargs) - - self.__setattr__('exclusive', ['head', 'auxlayer'] if aux else ['head']) - - def forward(self, x): - size = x.size()[2:] - features = self.pretrained.features(x) - if self.dilate_scale > 8: - features = F.interpolate(features, scale_factor=2, mode='bilinear', align_corners=True) - outputs = [] - x = self.head(features) - x = F.interpolate(x, size, mode='bilinear', align_corners=True) - outputs.append(x) - - if self.aux: - auxout = self.auxlayer(features) - auxout = F.interpolate(auxout, size, mode='bilinear', align_corners=True) - outputs.append(auxout) - return tuple(outputs) - - -class _DenseASPPHead(nn.Module): - def __init__(self, in_channels, nclass, norm_layer=nn.BatchNorm2d, norm_kwargs=None, **kwargs): - super(_DenseASPPHead, self).__init__() - self.dense_aspp_block = _DenseASPPBlock(in_channels, 256, 64, norm_layer, norm_kwargs) - self.block = nn.Sequential( - nn.Dropout(0.1), - nn.Conv2d(in_channels + 5 * 64, nclass, 1) - ) - - def forward(self, x): - x = self.dense_aspp_block(x) - return self.block(x) - - -class _DenseASPPConv(nn.Sequential): - def __init__(self, in_channels, inter_channels, out_channels, atrous_rate, - drop_rate=0.1, norm_layer=nn.BatchNorm2d, norm_kwargs=None): - super(_DenseASPPConv, self).__init__() - self.add_module('conv1', nn.Conv2d(in_channels, inter_channels, 1)), - self.add_module('bn1', norm_layer(inter_channels, **({} if norm_kwargs is None else norm_kwargs))), - self.add_module('relu1', nn.ReLU(True)), - self.add_module('conv2', nn.Conv2d(inter_channels, out_channels, 3, dilation=atrous_rate, padding=atrous_rate)), - self.add_module('bn2', norm_layer(out_channels, **({} if norm_kwargs is None else norm_kwargs))), - self.add_module('relu2', nn.ReLU(True)), - self.drop_rate = drop_rate - - def forward(self, x): - features = super(_DenseASPPConv, self).forward(x) - if self.drop_rate > 0: - features = F.dropout(features, p=self.drop_rate, training=self.training) - return features - - -class _DenseASPPBlock(nn.Module): - def __init__(self, in_channels, inter_channels1, inter_channels2, - norm_layer=nn.BatchNorm2d, norm_kwargs=None): - super(_DenseASPPBlock, self).__init__() - self.aspp_3 = _DenseASPPConv(in_channels, inter_channels1, inter_channels2, 3, 0.1, - 
norm_layer, norm_kwargs) - self.aspp_6 = _DenseASPPConv(in_channels + inter_channels2 * 1, inter_channels1, inter_channels2, 6, 0.1, - norm_layer, norm_kwargs) - self.aspp_12 = _DenseASPPConv(in_channels + inter_channels2 * 2, inter_channels1, inter_channels2, 12, 0.1, - norm_layer, norm_kwargs) - self.aspp_18 = _DenseASPPConv(in_channels + inter_channels2 * 3, inter_channels1, inter_channels2, 18, 0.1, - norm_layer, norm_kwargs) - self.aspp_24 = _DenseASPPConv(in_channels + inter_channels2 * 4, inter_channels1, inter_channels2, 24, 0.1, - norm_layer, norm_kwargs) - - def forward(self, x): - aspp3 = self.aspp_3(x) - x = torch.cat([aspp3, x], dim=1) - - aspp6 = self.aspp_6(x) - x = torch.cat([aspp6, x], dim=1) - - aspp12 = self.aspp_12(x) - x = torch.cat([aspp12, x], dim=1) - - aspp18 = self.aspp_18(x) - x = torch.cat([aspp18, x], dim=1) - - aspp24 = self.aspp_24(x) - x = torch.cat([aspp24, x], dim=1) - - return x - - -def get_denseaspp(dataset='citys', backbone='densenet121', pretrained=False, - root='~/.torch/models', pretrained_base=True, **kwargs): - r"""DenseASPP - - Parameters - ---------- - dataset : str, default citys - The dataset that model pretrained on. (pascal_voc, ade20k) - pretrained : bool or str - Boolean value controls whether to load the default pretrained weights for model. - String value represents the hashtag for a certain version of pretrained weights. - root : str, default '~/.torch/models' - Location for keeping the model parameters. - pretrained_base : bool or str, default True - This will load pretrained backbone network, that was trained on ImageNet. - Examples - -------- - >>> model = get_denseaspp(dataset='citys', backbone='densenet121', pretrained=False) - >>> print(model) - """ - acronyms = { - 'pascal_voc': 'pascal_voc', - 'pascal_aug': 'pascal_aug', - 'ade20k': 'ade', - 'coco': 'coco', - 'citys': 'citys', - } - from ..data.dataloader import datasets - model = DenseASPP(datasets[dataset].NUM_CLASS, backbone=backbone, pretrained_base=pretrained_base, **kwargs) - if pretrained: - from .model_store import get_model_file - device = torch.device(kwargs['local_rank']) - model.load_state_dict(torch.load(get_model_file('denseaspp_%s_%s' % (backbone, acronyms[dataset]), root=root), - map_location=device)) - return model - - -def get_denseaspp_densenet121_citys(**kwargs): - return get_denseaspp('citys', 'densenet121', **kwargs) - - -def get_denseaspp_densenet161_citys(**kwargs): - return get_denseaspp('citys', 'densenet161', **kwargs) - - -def get_denseaspp_densenet169_citys(**kwargs): - return get_denseaspp('citys', 'densenet169', **kwargs) - - -def get_denseaspp_densenet201_citys(**kwargs): - return get_denseaspp('citys', 'densenet201', **kwargs) - - -if __name__ == '__main__': - img = torch.randn(2, 3, 480, 480) - model = get_denseaspp_densenet121_citys() - outputs = model(img) diff --git a/core/models/dfanet.py b/core/models/dfanet.py deleted file mode 100644 index dd43bff0f..000000000 --- a/core/models/dfanet.py +++ /dev/null @@ -1,111 +0,0 @@ -""" Deep Feature Aggregation""" -import torch -import torch.nn as nn -import torch.nn.functional as F - -from core.models.base_models import Enc, FCAttention, get_xception_a -from core.nn import _ConvBNReLU - -__all__ = ['DFANet', 'get_dfanet', 'get_dfanet_citys'] - - -class DFANet(nn.Module): - def __init__(self, nclass, backbone='', aux=False, jpu=False, pretrained_base=False, **kwargs): - super(DFANet, self).__init__() - self.pretrained = get_xception_a(pretrained_base, **kwargs) - - self.enc2_2 = Enc(240, 48, 4, 
**kwargs) - self.enc3_2 = Enc(144, 96, 6, **kwargs) - self.enc4_2 = Enc(288, 192, 4, **kwargs) - self.fca_2 = FCAttention(192, **kwargs) - - self.enc2_3 = Enc(240, 48, 4, **kwargs) - self.enc3_3 = Enc(144, 96, 6, **kwargs) - self.enc3_4 = Enc(288, 192, 4, **kwargs) - self.fca_3 = FCAttention(192, **kwargs) - - self.enc2_1_reduce = _ConvBNReLU(48, 32, 1, **kwargs) - self.enc2_2_reduce = _ConvBNReLU(48, 32, 1, **kwargs) - self.enc2_3_reduce = _ConvBNReLU(48, 32, 1, **kwargs) - self.conv_fusion = _ConvBNReLU(32, 32, 1, **kwargs) - - self.fca_1_reduce = _ConvBNReLU(192, 32, 1, **kwargs) - self.fca_2_reduce = _ConvBNReLU(192, 32, 1, **kwargs) - self.fca_3_reduce = _ConvBNReLU(192, 32, 1, **kwargs) - self.conv_out = nn.Conv2d(32, nclass, 1) - - self.__setattr__('exclusive', ['enc2_2', 'enc3_2', 'enc4_2', 'fca_2', 'enc2_3', 'enc3_3', 'enc3_4', 'fca_3', - 'enc2_1_reduce', 'enc2_2_reduce', 'enc2_3_reduce', 'conv_fusion', 'fca_1_reduce', - 'fca_2_reduce', 'fca_3_reduce', 'conv_out']) - - def forward(self, x): - # backbone - stage1_conv1 = self.pretrained.conv1(x) - stage1_enc2 = self.pretrained.enc2(stage1_conv1) - stage1_enc3 = self.pretrained.enc3(stage1_enc2) - stage1_enc4 = self.pretrained.enc4(stage1_enc3) - stage1_fca = self.pretrained.fca(stage1_enc4) - stage1_out = F.interpolate(stage1_fca, scale_factor=4, mode='bilinear', align_corners=True) - - # stage2 - stage2_enc2 = self.enc2_2(torch.cat([stage1_enc2, stage1_out], dim=1)) - stage2_enc3 = self.enc3_2(torch.cat([stage1_enc3, stage2_enc2], dim=1)) - stage2_enc4 = self.enc4_2(torch.cat([stage1_enc4, stage2_enc3], dim=1)) - stage2_fca = self.fca_2(stage2_enc4) - stage2_out = F.interpolate(stage2_fca, scale_factor=4, mode='bilinear', align_corners=True) - - # stage3 - stage3_enc2 = self.enc2_3(torch.cat([stage2_enc2, stage2_out], dim=1)) - stage3_enc3 = self.enc3_3(torch.cat([stage2_enc3, stage3_enc2], dim=1)) - stage3_enc4 = self.enc3_4(torch.cat([stage2_enc4, stage3_enc3], dim=1)) - stage3_fca = self.fca_3(stage3_enc4) - - stage1_enc2_decoder = self.enc2_1_reduce(stage1_enc2) - stage2_enc2_docoder = F.interpolate(self.enc2_2_reduce(stage2_enc2), scale_factor=2, - mode='bilinear', align_corners=True) - stage3_enc2_decoder = F.interpolate(self.enc2_3_reduce(stage3_enc2), scale_factor=4, - mode='bilinear', align_corners=True) - fusion = stage1_enc2_decoder + stage2_enc2_docoder + stage3_enc2_decoder - fusion = self.conv_fusion(fusion) - - stage1_fca_decoder = F.interpolate(self.fca_1_reduce(stage1_fca), scale_factor=4, - mode='bilinear', align_corners=True) - stage2_fca_decoder = F.interpolate(self.fca_2_reduce(stage2_fca), scale_factor=8, - mode='bilinear', align_corners=True) - stage3_fca_decoder = F.interpolate(self.fca_3_reduce(stage3_fca), scale_factor=16, - mode='bilinear', align_corners=True) - fusion = fusion + stage1_fca_decoder + stage2_fca_decoder + stage3_fca_decoder - - outputs = list() - out = self.conv_out(fusion) - out = F.interpolate(out, scale_factor=4, mode='bilinear', align_corners=True) - outputs.append(out) - - return tuple(outputs) - - -def get_dfanet(dataset='citys', backbone='', pretrained=False, root='~/.torch/models', - pretrained_base=True, **kwargs): - acronyms = { - 'pascal_voc': 'pascal_voc', - 'pascal_aug': 'pascal_aug', - 'ade20k': 'ade', - 'coco': 'coco', - 'citys': 'citys', - } - from ..data.dataloader import datasets - model = DFANet(datasets[dataset].NUM_CLASS, backbone=backbone, pretrained_base=pretrained_base, **kwargs) - if pretrained: - from .model_store import get_model_file - device = 
torch.device(kwargs['local_rank']) - model.load_state_dict(torch.load(get_model_file('dfanet_%s' % (acronyms[dataset]), root=root), - map_location=device)) - return model - - -def get_dfanet_citys(**kwargs): - return get_dfanet('citys', **kwargs) - - -if __name__ == '__main__': - model = get_dfanet_citys() diff --git a/core/models/dunet.py b/core/models/dunet.py deleted file mode 100644 index ed1eb9cb1..000000000 --- a/core/models/dunet.py +++ /dev/null @@ -1,155 +0,0 @@ -"""Decoders Matter for Semantic Segmentation""" -import torch -import torch.nn as nn -import torch.nn.functional as F - -from .segbase import SegBaseModel -from .fcn import _FCNHead - -__all__ = ['DUNet', 'get_dunet', 'get_dunet_resnet50_pascal_voc', - 'get_dunet_resnet101_pascal_voc', 'get_dunet_resnet152_pascal_voc'] - - -# The model may be wrong because lots of details missing in paper. -class DUNet(SegBaseModel): - """Decoders Matter for Semantic Segmentation - - Reference: - Zhi Tian, Tong He, Chunhua Shen, and Youliang Yan. - "Decoders Matter for Semantic Segmentation: - Data-Dependent Decoding Enables Flexible Feature Aggregation." CVPR, 2019 - """ - - def __init__(self, nclass, backbone='resnet50', aux=True, pretrained_base=True, **kwargs): - super(DUNet, self).__init__(nclass, aux, backbone, pretrained_base=pretrained_base, **kwargs) - self.head = _DUHead(2144, **kwargs) - self.dupsample = DUpsampling(256, nclass, scale_factor=8, **kwargs) - if aux: - self.auxlayer = _FCNHead(1024, 256, **kwargs) - self.aux_dupsample = DUpsampling(256, nclass, scale_factor=8, **kwargs) - - self.__setattr__('exclusive', - ['dupsample', 'head', 'auxlayer', 'aux_dupsample'] if aux else ['dupsample', 'head']) - - def forward(self, x): - c1, c2, c3, c4 = self.base_forward(x) - outputs = [] - x = self.head(c2, c3, c4) - x = self.dupsample(x) - outputs.append(x) - - if self.aux: - auxout = self.auxlayer(c3) - auxout = self.aux_dupsample(auxout) - outputs.append(auxout) - return tuple(outputs) - - -class FeatureFused(nn.Module): - """Module for fused features""" - - def __init__(self, inter_channels=48, norm_layer=nn.BatchNorm2d, **kwargs): - super(FeatureFused, self).__init__() - self.conv2 = nn.Sequential( - nn.Conv2d(512, inter_channels, 1, bias=False), - norm_layer(inter_channels), - nn.ReLU(True) - ) - self.conv3 = nn.Sequential( - nn.Conv2d(1024, inter_channels, 1, bias=False), - norm_layer(inter_channels), - nn.ReLU(True) - ) - - def forward(self, c2, c3, c4): - size = c4.size()[2:] - c2 = self.conv2(F.interpolate(c2, size, mode='bilinear', align_corners=True)) - c3 = self.conv3(F.interpolate(c3, size, mode='bilinear', align_corners=True)) - fused_feature = torch.cat([c4, c3, c2], dim=1) - return fused_feature - - -class _DUHead(nn.Module): - def __init__(self, in_channels, norm_layer=nn.BatchNorm2d, **kwargs): - super(_DUHead, self).__init__() - self.fuse = FeatureFused(norm_layer=norm_layer, **kwargs) - self.block = nn.Sequential( - nn.Conv2d(in_channels, 256, 3, padding=1, bias=False), - norm_layer(256), - nn.ReLU(True), - nn.Conv2d(256, 256, 3, padding=1, bias=False), - norm_layer(256), - nn.ReLU(True) - ) - - def forward(self, c2, c3, c4): - fused_feature = self.fuse(c2, c3, c4) - out = self.block(fused_feature) - return out - - -class DUpsampling(nn.Module): - """DUsampling module""" - - def __init__(self, in_channels, out_channels, scale_factor=2, **kwargs): - super(DUpsampling, self).__init__() - self.scale_factor = scale_factor - self.conv_w = nn.Conv2d(in_channels, out_channels * scale_factor * scale_factor, 1, 
bias=False) - - def forward(self, x): - x = self.conv_w(x) - n, c, h, w = x.size() - - # N, C, H, W --> N, W, H, C - x = x.permute(0, 3, 2, 1).contiguous() - - # N, W, H, C --> N, W, H * scale, C // scale - x = x.view(n, w, h * self.scale_factor, c // self.scale_factor) - - # N, W, H * scale, C // scale --> N, H * scale, W, C // scale - x = x.permute(0, 2, 1, 3).contiguous() - - # N, H * scale, W, C // scale --> N, H * scale, W * scale, C // (scale ** 2) - x = x.view(n, h * self.scale_factor, w * self.scale_factor, c // (self.scale_factor * self.scale_factor)) - - # N, H * scale, W * scale, C // (scale ** 2) -- > N, C // (scale ** 2), H * scale, W * scale - x = x.permute(0, 3, 1, 2) - - return x - - -def get_dunet(dataset='pascal_voc', backbone='resnet50', pretrained=False, - root='~/.torch/models', pretrained_base=True, **kwargs): - acronyms = { - 'pascal_voc': 'pascal_voc', - 'pascal_aug': 'pascal_aug', - 'ade20k': 'ade', - 'coco': 'coco', - 'citys': 'citys', - } - from ..data.dataloader import datasets - model = DUNet(datasets[dataset].NUM_CLASS, backbone=backbone, pretrained_base=pretrained_base, **kwargs) - if pretrained: - from .model_store import get_model_file - device = torch.device(kwargs['local_rank']) - model.load_state_dict(torch.load(get_model_file('dunet_%s_%s' % (backbone, acronyms[dataset]), root=root), - map_location=device)) - return model - - -def get_dunet_resnet50_pascal_voc(**kwargs): - return get_dunet('pascal_voc', 'resnet50', **kwargs) - - -def get_dunet_resnet101_pascal_voc(**kwargs): - return get_dunet('pascal_voc', 'resnet101', **kwargs) - - -def get_dunet_resnet152_pascal_voc(**kwargs): - return get_dunet('pascal_voc', 'resnet152', **kwargs) - - -if __name__ == '__main__': - img = torch.randn(2, 3, 256, 256) - model = get_dunet_resnet50_pascal_voc() - outputs = model(img) diff --git a/core/models/encnet.py b/core/models/encnet.py deleted file mode 100644 index 585557bde..000000000 --- a/core/models/encnet.py +++ /dev/null @@ -1,212 +0,0 @@ -"""Context Encoding for Semantic Segmentation""" -import torch -import torch.nn as nn -import torch.nn.functional as F - -from .segbase import SegBaseModel -from .fcn import _FCNHead - -__all__ = ['EncNet', 'EncModule', 'get_encnet', 'get_encnet_resnet50_ade', - 'get_encnet_resnet101_ade', 'get_encnet_resnet152_ade'] - - -class EncNet(SegBaseModel): - def __init__(self, nclass, backbone='resnet50', aux=True, se_loss=True, lateral=False, - pretrained_base=True, **kwargs): - super(EncNet, self).__init__(nclass, aux, backbone, pretrained_base=pretrained_base, **kwargs) - self.head = _EncHead(2048, nclass, se_loss=se_loss, lateral=lateral, **kwargs) - if aux: - self.auxlayer = _FCNHead(1024, nclass, **kwargs) - - self.__setattr__('exclusive', ['head', 'auxlayer'] if aux else ['head']) - - def forward(self, x): - size = x.size()[2:] - features = self.base_forward(x) - - x = list(self.head(*features)) - x[0] = F.interpolate(x[0], size, mode='bilinear', align_corners=True) - if self.aux: - auxout = self.auxlayer(features[2]) - auxout = F.interpolate(auxout, size, mode='bilinear', align_corners=True) - x.append(auxout) - return tuple(x) - - -class _EncHead(nn.Module): - def __init__(self, in_channels, nclass, se_loss=True, lateral=True, - norm_layer=nn.BatchNorm2d, norm_kwargs=None, **kwargs): - super(_EncHead, self).__init__() - self.lateral = lateral - self.conv5 = nn.Sequential( - nn.Conv2d(in_channels, 512, 3, padding=1, bias=False), - norm_layer(512, **({} if norm_kwargs is None else norm_kwargs)), - nn.ReLU(True) - ) - if 
lateral: - self.connect = nn.ModuleList([ - nn.Sequential( - nn.Conv2d(512, 512, 1, bias=False), - norm_layer(512, **({} if norm_kwargs is None else norm_kwargs)), - nn.ReLU(True)), - nn.Sequential( - nn.Conv2d(1024, 512, 1, bias=False), - norm_layer(512, **({} if norm_kwargs is None else norm_kwargs)), - nn.ReLU(True)), - ]) - self.fusion = nn.Sequential( - nn.Conv2d(3 * 512, 512, 3, padding=1, bias=False), - norm_layer(512, **({} if norm_kwargs is None else norm_kwargs)), - nn.ReLU(True) - ) - self.encmodule = EncModule(512, nclass, ncodes=32, se_loss=se_loss, - norm_layer=norm_layer, norm_kwargs=norm_kwargs, **kwargs) - self.conv6 = nn.Sequential( - nn.Dropout(0.1, False), - nn.Conv2d(512, nclass, 1) - ) - - def forward(self, *inputs): - feat = self.conv5(inputs[-1]) - if self.lateral: - c2 = self.connect[0](inputs[1]) - c3 = self.connect[1](inputs[2]) - feat = self.fusion(torch.cat([feat, c2, c3], 1)) - outs = list(self.encmodule(feat)) - outs[0] = self.conv6(outs[0]) - return tuple(outs) - - -class EncModule(nn.Module): - def __init__(self, in_channels, nclass, ncodes=32, se_loss=True, - norm_layer=nn.BatchNorm2d, norm_kwargs=None, **kwargs): - super(EncModule, self).__init__() - self.se_loss = se_loss - self.encoding = nn.Sequential( - nn.Conv2d(in_channels, in_channels, 1, bias=False), - norm_layer(in_channels, **({} if norm_kwargs is None else norm_kwargs)), - nn.ReLU(True), - Encoding(D=in_channels, K=ncodes), - nn.BatchNorm1d(ncodes), - nn.ReLU(True), - Mean(dim=1) - ) - self.fc = nn.Sequential( - nn.Linear(in_channels, in_channels), - nn.Sigmoid() - ) - if self.se_loss: - self.selayer = nn.Linear(in_channels, nclass) - - def forward(self, x): - en = self.encoding(x) - b, c, _, _ = x.size() - gamma = self.fc(en) - y = gamma.view(b, c, 1, 1) - outputs = [F.relu_(x + x * y)] - if self.se_loss: - outputs.append(self.selayer(en)) - return tuple(outputs) - - -class Encoding(nn.Module): - def __init__(self, D, K): - super(Encoding, self).__init__() - # init codewords and smoothing factor - self.D, self.K = D, K - self.codewords = nn.Parameter(torch.Tensor(K, D), requires_grad=True) - self.scale = nn.Parameter(torch.Tensor(K), requires_grad=True) - self.reset_params() - - def reset_params(self): - std1 = 1. 
/ ((self.K * self.D) ** (1 / 2)) - self.codewords.data.uniform_(-std1, std1) - self.scale.data.uniform_(-1, 0) - - def forward(self, X): - # input X is a 4D tensor - assert (X.size(1) == self.D) - B, D = X.size(0), self.D - if X.dim() == 3: - # BxDxN -> BxNxD - X = X.transpose(1, 2).contiguous() - elif X.dim() == 4: - # BxDxHxW -> Bx(HW)xD - X = X.view(B, D, -1).transpose(1, 2).contiguous() - else: - raise RuntimeError('Encoding Layer unknown input dims!') - # assignment weights BxNxK - A = F.softmax(self.scale_l2(X, self.codewords, self.scale), dim=2) - # aggregate - E = self.aggregate(A, X, self.codewords) - return E - - def __repr__(self): - return self.__class__.__name__ + '(' \ - + 'N x' + str(self.D) + '=>' + str(self.K) + 'x' \ - + str(self.D) + ')' - - @staticmethod - def scale_l2(X, C, S): - S = S.view(1, 1, C.size(0), 1) - X = X.unsqueeze(2).expand(X.size(0), X.size(1), C.size(0), C.size(1)) - C = C.unsqueeze(0).unsqueeze(0) - SL = S * (X - C) - SL = SL.pow(2).sum(3) - return SL - - @staticmethod - def aggregate(A, X, C): - A = A.unsqueeze(3) - X = X.unsqueeze(2).expand(X.size(0), X.size(1), C.size(0), C.size(1)) - C = C.unsqueeze(0).unsqueeze(0) - E = A * (X - C) - E = E.sum(1) - return E - - -class Mean(nn.Module): - def __init__(self, dim, keep_dim=False): - super(Mean, self).__init__() - self.dim = dim - self.keep_dim = keep_dim - - def forward(self, input): - return input.mean(self.dim, self.keep_dim) - - -def get_encnet(dataset='pascal_voc', backbone='resnet50', pretrained=False, root='~/.torch/models', - pretrained_base=True, **kwargs): - acronyms = { - 'pascal_voc': 'pascal_voc', - 'pascal_aug': 'pascal_aug', - 'ade20k': 'ade', - 'coco': 'coco', - 'citys': 'citys', - } - from ..data.dataloader import datasets - model = EncNet(datasets[dataset].NUM_CLASS, backbone=backbone, pretrained_base=pretrained_base, **kwargs) - if pretrained: - from .model_store import get_model_file - device = torch.device(kwargs['local_rank']) - model.load_state_dict(torch.load(get_model_file('encnet_%s_%s' % (backbone, acronyms[dataset]), root=root), - map_location=device)) - return model - - -def get_encnet_resnet50_ade(**kwargs): - return get_encnet('ade20k', 'resnet50', **kwargs) - - -def get_encnet_resnet101_ade(**kwargs): - return get_encnet('ade20k', 'resnet101', **kwargs) - - -def get_encnet_resnet152_ade(**kwargs): - return get_encnet('ade20k', 'resnet152', **kwargs) - - -if __name__ == '__main__': - img = torch.randn(2, 3, 224, 224) - model = get_encnet_resnet50_ade() - outputs = model(img) diff --git a/core/models/espnet.py b/core/models/espnet.py deleted file mode 100644 index 051058c1e..000000000 --- a/core/models/espnet.py +++ /dev/null @@ -1,117 +0,0 @@ -"ESPNetv2: A Light-weight, Power Efficient, and General Purpose for Semantic Segmentation" -import torch -import torch.nn as nn -import torch.nn.functional as F - -from core.models.base_models import eespnet, EESP -from core.nn import _ConvBNPReLU, _BNPReLU - - -class ESPNetV2(nn.Module): - r"""ESPNetV2 - - Parameters - ---------- - nclass : int - Number of categories for the training dataset. - backbone : string - Pre-trained dilated backbone network type (default:'resnet50'; 'resnet50', - 'resnet101' or 'resnet152'). - norm_layer : object - Normalization layer used in backbone network (default: :class:`nn.BatchNorm`; - for Synchronized Cross-GPU BachNormalization). - aux : bool - Auxiliary loss. - - Reference: - Sachin Mehta, et al. "ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network." 
- arXiv preprint arXiv:1811.11431 (2018). - """ - - def __init__(self, nclass, backbone='', aux=False, jpu=False, pretrained_base=False, **kwargs): - super(ESPNetV2, self).__init__() - self.pretrained = eespnet(pretrained=pretrained_base, **kwargs) - self.proj_L4_C = _ConvBNPReLU(256, 128, 1, **kwargs) - self.pspMod = nn.Sequential( - EESP(256, 128, stride=1, k=4, r_lim=7, **kwargs), - _PSPModule(128, 128, **kwargs)) - self.project_l3 = nn.Sequential( - nn.Dropout2d(0.1), - nn.Conv2d(128, nclass, 1, bias=False)) - self.act_l3 = _BNPReLU(nclass, **kwargs) - self.project_l2 = _ConvBNPReLU(64 + nclass, nclass, 1, **kwargs) - self.project_l1 = nn.Sequential( - nn.Dropout2d(0.1), - nn.Conv2d(32 + nclass, nclass, 1, bias=False)) - - self.aux = aux - - self.__setattr__('exclusive', ['proj_L4_C', 'pspMod', 'project_l3', 'act_l3', 'project_l2', 'project_l1']) - - def forward(self, x): - size = x.size()[2:] - out_l1, out_l2, out_l3, out_l4 = self.pretrained(x, seg=True) - out_l4_proj = self.proj_L4_C(out_l4) - up_l4_to_l3 = F.interpolate(out_l4_proj, scale_factor=2, mode='bilinear', align_corners=True) - merged_l3_upl4 = self.pspMod(torch.cat([out_l3, up_l4_to_l3], 1)) - proj_merge_l3_bef_act = self.project_l3(merged_l3_upl4) - proj_merge_l3 = self.act_l3(proj_merge_l3_bef_act) - out_up_l3 = F.interpolate(proj_merge_l3, scale_factor=2, mode='bilinear', align_corners=True) - merge_l2 = self.project_l2(torch.cat([out_l2, out_up_l3], 1)) - out_up_l2 = F.interpolate(merge_l2, scale_factor=2, mode='bilinear', align_corners=True) - merge_l1 = self.project_l1(torch.cat([out_l1, out_up_l2], 1)) - - outputs = list() - merge1_l1 = F.interpolate(merge_l1, scale_factor=2, mode='bilinear', align_corners=True) - outputs.append(merge1_l1) - if self.aux: - # different from paper - auxout = F.interpolate(proj_merge_l3_bef_act, size, mode='bilinear', align_corners=True) - outputs.append(auxout) - - return tuple(outputs) - - -# different from PSPNet -class _PSPModule(nn.Module): - def __init__(self, in_channels, out_channels=1024, sizes=(1, 2, 4, 8), **kwargs): - super(_PSPModule, self).__init__() - self.stages = nn.ModuleList( - [nn.Conv2d(in_channels, in_channels, 3, 1, 1, groups=in_channels, bias=False) for _ in sizes]) - self.project = _ConvBNPReLU(in_channels * (len(sizes) + 1), out_channels, 1, 1, **kwargs) - - def forward(self, x): - size = x.size()[2:] - feats = [x] - for stage in self.stages: - x = F.avg_pool2d(x, kernel_size=3, stride=2, padding=1) - upsampled = F.interpolate(stage(x), size, mode='bilinear', align_corners=True) - feats.append(upsampled) - return self.project(torch.cat(feats, dim=1)) - - -def get_espnet(dataset='pascal_voc', backbone='', pretrained=False, root='~/.torch/models', - pretrained_base=False, **kwargs): - acronyms = { - 'pascal_voc': 'pascal_voc', - 'pascal_aug': 'pascal_aug', - 'ade20k': 'ade', - 'coco': 'coco', - 'citys': 'citys', - } - from core.data.dataloader import datasets - model = ESPNetV2(datasets[dataset].NUM_CLASS, backbone=backbone, pretrained_base=pretrained_base, **kwargs) - if pretrained: - from .model_store import get_model_file - device = torch.device(kwargs['local_rank']) - model.load_state_dict(torch.load(get_model_file('espnet_%s_%s' % (backbone, acronyms[dataset]), root=root), - map_location=device)) - return model - - -def get_espnet_citys(**kwargs): - return get_espnet('citys', **kwargs) - - -if __name__ == '__main__': - model = get_espnet_citys() diff --git a/core/models/hrnet.py b/core/models/hrnet.py deleted file mode 100644 index 8ad08e3f5..000000000 --- 
a/core/models/hrnet.py +++ /dev/null @@ -1,29 +0,0 @@ -"""High-Resolution Representations for Semantic Segmentation""" -import torch -import torch.nn as nn -import torch.nn.functional as F - -class HRNet(nn.Module): - """HRNet - - Parameters - ---------- - nclass : int - Number of categories for the training dataset. - backbone : string - Pre-trained dilated backbone network type (default:'resnet50'; 'resnet50', - 'resnet101' or 'resnet152'). - norm_layer : object - Normalization layer used in backbone network (default: :class:`nn.BatchNorm`; - for Synchronized Cross-GPU BachNormalization). - aux : bool - Auxiliary loss. - Reference: - Ke Sun. "High-Resolution Representations for Labeling Pixels and Regions." - arXiv preprint arXiv:1904.04514 (2019). - """ - def __init__(self, nclass, backbone='', aux=False, pretrained_base=False, **kwargs): - super(HRNet, self).__init__() - - def forward(self, x): - pass \ No newline at end of file diff --git a/core/models/icnet.py b/core/models/icnet.py deleted file mode 100644 index 94d03444f..000000000 --- a/core/models/icnet.py +++ /dev/null @@ -1,163 +0,0 @@ -"""Image Cascade Network""" -import torch -import torch.nn as nn -import torch.nn.functional as F - -from .segbase import SegBaseModel - -__all__ = ['ICNet', 'get_icnet', 'get_icnet_resnet50_citys', - 'get_icnet_resnet101_citys', 'get_icnet_resnet152_citys'] - - -class ICNet(SegBaseModel): - """Image Cascade Network""" - - def __init__(self, nclass, backbone='resnet50', aux=False, jpu=False, pretrained_base=True, **kwargs): - super(ICNet, self).__init__(nclass, aux, backbone, pretrained_base=pretrained_base, **kwargs) - self.conv_sub1 = nn.Sequential( - _ConvBNReLU(3, 32, 3, 2, **kwargs), - _ConvBNReLU(32, 32, 3, 2, **kwargs), - _ConvBNReLU(32, 64, 3, 2, **kwargs) - ) - - self.ppm = PyramidPoolingModule() - - self.head = _ICHead(nclass, **kwargs) - - self.__setattr__('exclusive', ['conv_sub1', 'head']) - - def forward(self, x): - # sub 1 - x_sub1 = self.conv_sub1(x) - - # sub 2 - x_sub2 = F.interpolate(x, scale_factor=0.5, mode='bilinear', align_corners=True) - _, x_sub2, _, _ = self.base_forward(x_sub2) - - # sub 4 - x_sub4 = F.interpolate(x, scale_factor=0.25, mode='bilinear', align_corners=True) - _, _, _, x_sub4 = self.base_forward(x_sub4) - # add PyramidPoolingModule - x_sub4 = self.ppm(x_sub4) - outputs = self.head(x_sub1, x_sub2, x_sub4) - - return tuple(outputs) - -class PyramidPoolingModule(nn.Module): - def __init__(self, pyramids=[1,2,3,6]): - super(PyramidPoolingModule, self).__init__() - self.pyramids = pyramids - - def forward(self, input): - feat = input - height, width = input.shape[2:] - for bin_size in self.pyramids: - x = F.adaptive_avg_pool2d(input, output_size=bin_size) - x = F.interpolate(x, size=(height, width), mode='bilinear', align_corners=True) - feat = feat + x - return feat - -class _ICHead(nn.Module): - def __init__(self, nclass, norm_layer=nn.BatchNorm2d, **kwargs): - super(_ICHead, self).__init__() - #self.cff_12 = CascadeFeatureFusion(512, 64, 128, nclass, norm_layer, **kwargs) - self.cff_12 = CascadeFeatureFusion(128, 64, 128, nclass, norm_layer, **kwargs) - self.cff_24 = CascadeFeatureFusion(2048, 512, 128, nclass, norm_layer, **kwargs) - - self.conv_cls = nn.Conv2d(128, nclass, 1, bias=False) - - def forward(self, x_sub1, x_sub2, x_sub4): - outputs = list() - x_cff_24, x_24_cls = self.cff_24(x_sub4, x_sub2) - outputs.append(x_24_cls) - #x_cff_12, x_12_cls = self.cff_12(x_sub2, x_sub1) - x_cff_12, x_12_cls = self.cff_12(x_cff_24, x_sub1) - 
outputs.append(x_12_cls) - - up_x2 = F.interpolate(x_cff_12, scale_factor=2, mode='bilinear', align_corners=True) - up_x2 = self.conv_cls(up_x2) - outputs.append(up_x2) - up_x8 = F.interpolate(up_x2, scale_factor=4, mode='bilinear', align_corners=True) - outputs.append(up_x8) - # 1 -> 1/4 -> 1/8 -> 1/16 - outputs.reverse() - - return outputs - - -class _ConvBNReLU(nn.Module): - def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1, dilation=1, - groups=1, norm_layer=nn.BatchNorm2d, bias=False, **kwargs): - super(_ConvBNReLU, self).__init__() - self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding, dilation, groups, bias) - self.bn = norm_layer(out_channels) - self.relu = nn.ReLU(True) - - def forward(self, x): - x = self.conv(x) - x = self.bn(x) - x = self.relu(x) - return x - - -class CascadeFeatureFusion(nn.Module): - """CFF Unit""" - - def __init__(self, low_channels, high_channels, out_channels, nclass, norm_layer=nn.BatchNorm2d, **kwargs): - super(CascadeFeatureFusion, self).__init__() - self.conv_low = nn.Sequential( - nn.Conv2d(low_channels, out_channels, 3, padding=2, dilation=2, bias=False), - norm_layer(out_channels) - ) - self.conv_high = nn.Sequential( - nn.Conv2d(high_channels, out_channels, 1, bias=False), - norm_layer(out_channels) - ) - self.conv_low_cls = nn.Conv2d(out_channels, nclass, 1, bias=False) - - def forward(self, x_low, x_high): - x_low = F.interpolate(x_low, size=x_high.size()[2:], mode='bilinear', align_corners=True) - x_low = self.conv_low(x_low) - x_high = self.conv_high(x_high) - x = x_low + x_high - x = F.relu(x, inplace=True) - x_low_cls = self.conv_low_cls(x_low) - - return x, x_low_cls - - -def get_icnet(dataset='citys', backbone='resnet50', pretrained=False, root='~/.torch/models', - pretrained_base=True, **kwargs): - acronyms = { - 'pascal_voc': 'pascal_voc', - 'pascal_aug': 'pascal_aug', - 'ade20k': 'ade', - 'coco': 'coco', - 'citys': 'citys', - } - from ..data.dataloader import datasets - model = ICNet(datasets[dataset].NUM_CLASS, backbone=backbone, pretrained_base=pretrained_base, **kwargs) - if pretrained: - from .model_store import get_model_file - device = torch.device(kwargs['local_rank']) - model.load_state_dict(torch.load(get_model_file('icnet_%s_%s' % (backbone, acronyms[dataset]), root=root), - map_location=device)) - return model - - -def get_icnet_resnet50_citys(**kwargs): - return get_icnet('citys', 'resnet50', **kwargs) - - -def get_icnet_resnet101_citys(**kwargs): - return get_icnet('citys', 'resnet101', **kwargs) - - -def get_icnet_resnet152_citys(**kwargs): - return get_icnet('citys', 'resnet152', **kwargs) - - -if __name__ == '__main__': - img = torch.randn(1, 3, 256, 256) - model = get_icnet_resnet50_citys() - outputs = model(img) diff --git a/core/models/lednet.py b/core/models/lednet.py deleted file mode 100644 index 5a6e6e5b6..000000000 --- a/core/models/lednet.py +++ /dev/null @@ -1,194 +0,0 @@ -"""LEDNet: A Lightweight Encoder-Decoder Network for Real-time Semantic Segmentation""" -import torch -import torch.nn as nn -import torch.nn.functional as F - -from core.nn import _ConvBNReLU - -__all__ = ['LEDNet', 'get_lednet', 'get_lednet_citys'] - -class LEDNet(nn.Module): - r"""LEDNet - - Parameters - ---------- - nclass : int - Number of categories for the training dataset. - backbone : string - Pre-trained dilated backbone network type (default:'resnet50'; 'resnet50', - 'resnet101' or 'resnet152'). 
- norm_layer : object - Normalization layer used in backbone network (default: :class:`nn.BatchNorm`; - for Synchronized Cross-GPU BachNormalization). - aux : bool - Auxiliary loss. - - Reference: - Yu Wang, et al. "LEDNet: A Lightweight Encoder-Decoder Network for Real-Time Semantic Segmentation." - arXiv preprint arXiv:1905.02423 (2019). - """ - - def __init__(self, nclass, backbone='', aux=False, jpu=False, pretrained_base=True, **kwargs): - super(LEDNet, self).__init__() - self.encoder = nn.Sequential( - Downsampling(3, 32), - SSnbt(32, **kwargs), SSnbt(32, **kwargs), SSnbt(32, **kwargs), - Downsampling(32, 64), - SSnbt(64, **kwargs), SSnbt(64, **kwargs), - Downsampling(64, 128), - SSnbt(128, **kwargs), - SSnbt(128, 2, **kwargs), - SSnbt(128, 5, **kwargs), - SSnbt(128, 9, **kwargs), - SSnbt(128, 2, **kwargs), - SSnbt(128, 5, **kwargs), - SSnbt(128, 9, **kwargs), - SSnbt(128, 17, **kwargs), - ) - self.decoder = APNModule(128, nclass) - - self.__setattr__('exclusive', ['encoder', 'decoder']) - - def forward(self, x): - size = x.size()[2:] - x = self.encoder(x) - x = self.decoder(x) - outputs = list() - x = F.interpolate(x, size, mode='bilinear', align_corners=True) - outputs.append(x) - - return tuple(outputs) - - -class Downsampling(nn.Module): - def __init__(self, in_channels, out_channels, **kwargs): - super(Downsampling, self).__init__() - self.conv1 = nn.Conv2d(in_channels, out_channels // 2, 3, 2, 2, bias=False) - self.conv2 = nn.Conv2d(in_channels, out_channels // 2, 3, 2, 2, bias=False) - self.pool = nn.MaxPool2d(kernel_size=2, stride=1) - - def forward(self, x): - x1 = self.conv1(x) - x1 = self.pool(x1) - - x2 = self.conv2(x) - x2 = self.pool(x2) - - return torch.cat([x1, x2], dim=1) - - -class SSnbt(nn.Module): - def __init__(self, in_channels, dilation=1, norm_layer=nn.BatchNorm2d, **kwargs): - super(SSnbt, self).__init__() - inter_channels = in_channels // 2 - self.branch1 = nn.Sequential( - nn.Conv2d(inter_channels, inter_channels, (3, 1), padding=(1, 0), bias=False), - nn.ReLU(True), - nn.Conv2d(inter_channels, inter_channels, (1, 3), padding=(0, 1), bias=False), - norm_layer(inter_channels), - nn.ReLU(True), - nn.Conv2d(inter_channels, inter_channels, (3, 1), padding=(dilation, 0), dilation=(dilation, 1), - bias=False), - nn.ReLU(True), - nn.Conv2d(inter_channels, inter_channels, (1, 3), padding=(0, dilation), dilation=(1, dilation), - bias=False), - norm_layer(inter_channels), - nn.ReLU(True)) - - self.branch2 = nn.Sequential( - nn.Conv2d(inter_channels, inter_channels, (1, 3), padding=(0, 1), bias=False), - nn.ReLU(True), - nn.Conv2d(inter_channels, inter_channels, (3, 1), padding=(1, 0), bias=False), - norm_layer(inter_channels), - nn.ReLU(True), - nn.Conv2d(inter_channels, inter_channels, (1, 3), padding=(0, dilation), dilation=(1, dilation), - bias=False), - nn.ReLU(True), - nn.Conv2d(inter_channels, inter_channels, (3, 1), padding=(dilation, 0), dilation=(dilation, 1), - bias=False), - norm_layer(inter_channels), - nn.ReLU(True)) - - self.relu = nn.ReLU(True) - - @staticmethod - def channel_shuffle(x, groups): - n, c, h, w = x.size() - - channels_per_group = c // groups - x = x.view(n, groups, channels_per_group, h, w) - x = torch.transpose(x, 1, 2).contiguous() - x = x.view(n, -1, h, w) - - return x - - def forward(self, x): - # channels split - x1, x2 = x.split(x.size(1) // 2, 1) - - x1 = self.branch1(x1) - x2 = self.branch2(x2) - - out = torch.cat([x1, x2], dim=1) - out = self.relu(out + x) - out = self.channel_shuffle(out, groups=2) - - return out - - -class 
APNModule(nn.Module): - def __init__(self, in_channels, nclass, norm_layer=nn.BatchNorm2d, **kwargs): - super(APNModule, self).__init__() - self.conv1 = _ConvBNReLU(in_channels, in_channels, 3, 2, 1, norm_layer=norm_layer) - self.conv2 = _ConvBNReLU(in_channels, in_channels, 5, 2, 2, norm_layer=norm_layer) - self.conv3 = _ConvBNReLU(in_channels, in_channels, 7, 2, 3, norm_layer=norm_layer) - self.level1 = _ConvBNReLU(in_channels, nclass, 1, norm_layer=norm_layer) - self.level2 = _ConvBNReLU(in_channels, nclass, 1, norm_layer=norm_layer) - self.level3 = _ConvBNReLU(in_channels, nclass, 1, norm_layer=norm_layer) - self.level4 = _ConvBNReLU(in_channels, nclass, 1, norm_layer=norm_layer) - self.level5 = nn.Sequential( - nn.AdaptiveAvgPool2d(1), - _ConvBNReLU(in_channels, nclass, 1)) - - def forward(self, x): - w, h = x.size()[2:] - branch3 = self.conv1(x) - branch2 = self.conv2(branch3) - branch1 = self.conv3(branch2) - - out = self.level1(branch1) - out = F.interpolate(out, ((w + 3) // 4, (h + 3) // 4), mode='bilinear', align_corners=True) - out = self.level2(branch2) + out - out = F.interpolate(out, ((w + 1) // 2, (h + 1) // 2), mode='bilinear', align_corners=True) - out = self.level3(branch3) + out - out = F.interpolate(out, (w, h), mode='bilinear', align_corners=True) - out = self.level4(x) * out - out = self.level5(x) + out - return out - - -def get_lednet(dataset='citys', backbone='', pretrained=False, root='~/.torch/models', - pretrained_base=True, **kwargs): - acronyms = { - 'pascal_voc': 'pascal_voc', - 'pascal_aug': 'pascal_aug', - 'ade20k': 'ade', - 'coco': 'coco', - 'citys': 'citys', - } - from ..data.dataloader import datasets - model = LEDNet(datasets[dataset].NUM_CLASS, backbone=backbone, pretrained_base=pretrained_base, **kwargs) - if pretrained: - from .model_store import get_model_file - device = torch.device(kwargs['local_rank']) - model.load_state_dict(torch.load(get_model_file('lednet_%s' % (acronyms[dataset]), root=root), - map_location=device)) - return model - - -def get_lednet_citys(**kwargs): - return get_lednet('citys', **kwargs) - - -if __name__ == '__main__': - model = get_lednet_citys() diff --git a/core/models/ocnet.py b/core/models/ocnet.py deleted file mode 100755 index 333294fd5..000000000 --- a/core/models/ocnet.py +++ /dev/null @@ -1,345 +0,0 @@ -""" Object Context Network for Scene Parsing""" -import torch -import torch.nn as nn -import torch.nn.functional as F - -from .segbase import SegBaseModel -from .fcn import _FCNHead - -__all__ = ['OCNet', 'get_ocnet', 'get_base_ocnet_resnet101_citys', - 'get_pyramid_ocnet_resnet101_citys', 'get_asp_ocnet_resnet101_citys'] - - -class OCNet(SegBaseModel): - r"""OCNet - - Parameters - ---------- - nclass : int - Number of categories for the training dataset. - backbone : string - Pre-trained dilated backbone network type (default:'resnet50'; 'resnet50', - 'resnet101' or 'resnet152'). - norm_layer : object - Normalization layer used in backbone network (default: :class:`nn.BatchNorm`; - for Synchronized Cross-GPU BachNormalization). - aux : bool - Auxiliary loss. - Reference: - Yuhui Yuan, Jingdong Wang. "OCNet: Object Context Network for Scene Parsing." - arXiv preprint arXiv:1809.00916 (2018). 
- """ - - def __init__(self, nclass, backbone='resnet101', oc_arch='base', aux=False, pretrained_base=True, **kwargs): - super(OCNet, self).__init__(nclass, aux, backbone, pretrained_base=pretrained_base, **kwargs) - self.head = _OCHead(nclass, oc_arch, **kwargs) - if self.aux: - self.auxlayer = _FCNHead(1024, nclass, **kwargs) - - self.__setattr__('exclusive', ['head', 'auxlayer'] if aux else ['head']) - - def forward(self, x): - size = x.size()[2:] - _, _, c3, c4 = self.base_forward(x) - outputs = [] - x = self.head(c4) - x = F.interpolate(x, size, mode='bilinear', align_corners=True) - outputs.append(x) - - if self.aux: - auxout = self.auxlayer(c3) - auxout = F.interpolate(auxout, size, mode='bilinear', align_corners=True) - outputs.append(auxout) - return tuple(outputs) - - -class _OCHead(nn.Module): - def __init__(self, nclass, oc_arch, norm_layer=nn.BatchNorm2d, **kwargs): - super(_OCHead, self).__init__() - if oc_arch == 'base': - self.context = nn.Sequential( - nn.Conv2d(2048, 512, 3, 1, padding=1, bias=False), - norm_layer(512), - nn.ReLU(True), - BaseOCModule(512, 512, 256, 256, scales=([1]), norm_layer=norm_layer, **kwargs)) - elif oc_arch == 'pyramid': - self.context = nn.Sequential( - nn.Conv2d(2048, 512, 3, 1, padding=1, bias=False), - norm_layer(512), - nn.ReLU(True), - PyramidOCModule(512, 512, 256, 512, scales=([1, 2, 3, 6]), norm_layer=norm_layer, **kwargs)) - elif oc_arch == 'asp': - self.context = ASPOCModule(2048, 512, 256, 512, norm_layer=norm_layer, **kwargs) - else: - raise ValueError("Unknown OC architecture!") - - self.out = nn.Conv2d(512, nclass, 1) - - def forward(self, x): - x = self.context(x) - return self.out(x) - - -class BaseAttentionBlock(nn.Module): - """The basic implementation for self-attention block/non-local block.""" - - def __init__(self, in_channels, out_channels, key_channels, value_channels, - scale=1, norm_layer=nn.BatchNorm2d, **kwargs): - super(BaseAttentionBlock, self).__init__() - self.scale = scale - self.key_channels = key_channels - self.value_channels = value_channels - if scale > 1: - self.pool = nn.MaxPool2d(scale) - - self.f_value = nn.Conv2d(in_channels, value_channels, 1) - self.f_key = nn.Sequential( - nn.Conv2d(in_channels, key_channels, 1), - norm_layer(key_channels), - nn.ReLU(True) - ) - self.f_query = self.f_key - self.W = nn.Conv2d(value_channels, out_channels, 1) - nn.init.constant_(self.W.weight, 0) - nn.init.constant_(self.W.bias, 0) - - def forward(self, x): - batch_size, c, w, h = x.size() - if self.scale > 1: - x = self.pool(x) - - value = self.f_value(x).view(batch_size, self.value_channels, -1).permute(0, 2, 1) - query = self.f_query(x).view(batch_size, self.key_channels, -1).permute(0, 2, 1) - key = self.f_key(x).view(batch_size, self.key_channels, -1) - - sim_map = torch.bmm(query, key) * (self.key_channels ** -.5) - sim_map = F.softmax(sim_map, dim=-1) - - context = torch.bmm(sim_map, value).permute(0, 2, 1).contiguous() - context = context.view(batch_size, self.value_channels, *x.size()[2:]) - context = self.W(context) - if self.scale > 1: - context = F.interpolate(context, size=(w, h), mode='bilinear', align_corners=True) - - return context - - -class BaseOCModule(nn.Module): - """Base-OC""" - - def __init__(self, in_channels, out_channels, key_channels, value_channels, - scales=([1]), norm_layer=nn.BatchNorm2d, concat=True, **kwargs): - super(BaseOCModule, self).__init__() - self.stages = nn.ModuleList([ - BaseAttentionBlock(in_channels, out_channels, key_channels, value_channels, scale, norm_layer, **kwargs) 
- for scale in scales]) - in_channels = in_channels * 2 if concat else in_channels - self.project = nn.Sequential( - nn.Conv2d(in_channels, out_channels, 1), - norm_layer(out_channels), - nn.ReLU(True), - nn.Dropout2d(0.05) - ) - self.concat = concat - - def forward(self, x): - priors = [stage(x) for stage in self.stages] - context = priors[0] - for i in range(1, len(priors)): - context += priors[i] - if self.concat: - context = torch.cat([context, x], 1) - out = self.project(context) - return out - - -class PyramidAttentionBlock(nn.Module): - """The basic implementation for pyramid self-attention block/non-local block""" - - def __init__(self, in_channels, out_channels, key_channels, value_channels, - scale=1, norm_layer=nn.BatchNorm2d, **kwargs): - super(PyramidAttentionBlock, self).__init__() - self.scale = scale - self.value_channels = value_channels - self.key_channels = key_channels - - self.f_value = nn.Conv2d(in_channels, value_channels, 1) - self.f_key = nn.Sequential( - nn.Conv2d(in_channels, key_channels, 1), - norm_layer(key_channels), - nn.ReLU(True) - ) - self.f_query = self.f_key - self.W = nn.Conv2d(value_channels, out_channels, 1) - nn.init.constant_(self.W.weight, 0) - nn.init.constant_(self.W.bias, 0) - - def forward(self, x): - batch_size, c, w, h = x.size() - - local_x = list() - local_y = list() - step_w, step_h = w // self.scale, h // self.scale - for i in range(self.scale): - for j in range(self.scale): - start_x, start_y = step_w * i, step_h * j - end_x, end_y = min(start_x + step_w, w), min(start_y + step_h, h) - if i == (self.scale - 1): - end_x = w - if j == (self.scale - 1): - end_y = h - local_x += [start_x, end_x] - local_y += [start_y, end_y] - - value = self.f_value(x) - query = self.f_query(x) - key = self.f_key(x) - - local_list = list() - local_block_cnt = (self.scale ** 2) * 2 - for i in range(0, local_block_cnt, 2): - value_local = value[:, :, local_x[i]:local_x[i + 1], local_y[i]:local_y[i + 1]] - query_local = query[:, :, local_x[i]:local_x[i + 1], local_y[i]:local_y[i + 1]] - key_local = key[:, :, local_x[i]:local_x[i + 1], local_y[i]:local_y[i + 1]] - - w_local, h_local = value_local.size(2), value_local.size(3) - value_local = value_local.contiguous().view(batch_size, self.value_channels, -1).permute(0, 2, 1) - query_local = query_local.contiguous().view(batch_size, self.key_channels, -1).permute(0, 2, 1) - key_local = key_local.contiguous().view(batch_size, self.key_channels, -1) - - sim_map = torch.bmm(query_local, key_local) * (self.key_channels ** -.5) - sim_map = F.softmax(sim_map, dim=-1) - - context_local = torch.bmm(sim_map, value_local).permute(0, 2, 1).contiguous() - context_local = context_local.view(batch_size, self.value_channels, w_local, h_local) - local_list.append(context_local) - - context_list = list() - for i in range(0, self.scale): - row_tmp = list() - for j in range(self.scale): - row_tmp.append(local_list[j + i * self.scale]) - context_list.append(torch.cat(row_tmp, 3)) - - context = torch.cat(context_list, 2) - context = self.W(context) - - return context - - -class PyramidOCModule(nn.Module): - """Pyramid-OC""" - - def __init__(self, in_channels, out_channels, key_channels, value_channels, - scales=([1]), norm_layer=nn.BatchNorm2d, **kwargs): - super(PyramidOCModule, self).__init__() - self.stages = nn.ModuleList([ - PyramidAttentionBlock(in_channels, out_channels, key_channels, value_channels, scale, norm_layer, **kwargs) - for scale in scales]) - self.up_dr = nn.Sequential( - nn.Conv2d(in_channels, in_channels * 
len(scales), 1), - norm_layer(in_channels * len(scales)), - nn.ReLU(True) - ) - self.project = nn.Sequential( - nn.Conv2d(in_channels * len(scales) * 2, out_channels, 1), - norm_layer(out_channels), - nn.ReLU(True), - nn.Dropout2d(0.05) - ) - - def forward(self, x): - priors = [stage(x) for stage in self.stages] - context = [self.up_dr(x)] - for i in range(len(priors)): - context += [priors[i]] - context = torch.cat(context, 1) - out = self.project(context) - return out - - -class ASPOCModule(nn.Module): - """ASP-OC""" - - def __init__(self, in_channels, out_channels, key_channels, value_channels, - atrous_rates=(12, 24, 36), norm_layer=nn.BatchNorm2d, **kwargs): - super(ASPOCModule, self).__init__() - self.context = nn.Sequential( - nn.Conv2d(in_channels, out_channels, 3, padding=1), - norm_layer(out_channels), - nn.ReLU(True), - BaseOCModule(out_channels, out_channels, key_channels, value_channels, ([2]), norm_layer, False, **kwargs)) - - rate1, rate2, rate3 = tuple(atrous_rates) - self.b1 = nn.Sequential( - nn.Conv2d(in_channels, out_channels, 3, padding=rate1, dilation=rate1, bias=False), - norm_layer(out_channels), - nn.ReLU(True)) - self.b2 = nn.Sequential( - nn.Conv2d(in_channels, out_channels, 3, padding=rate2, dilation=rate2, bias=False), - norm_layer(out_channels), - nn.ReLU(True)) - self.b3 = nn.Sequential( - nn.Conv2d(in_channels, out_channels, 3, padding=rate3, dilation=rate3, bias=False), - norm_layer(out_channels), - nn.ReLU(True)) - self.b4 = nn.Sequential( - nn.Conv2d(in_channels, out_channels, 1, bias=False), - norm_layer(out_channels), - nn.ReLU(True)) - - self.project = nn.Sequential( - nn.Conv2d(out_channels * 5, out_channels, 1, bias=False), - norm_layer(out_channels), - nn.ReLU(True), - nn.Dropout2d(0.1) - ) - - def forward(self, x): - feat1 = self.context(x) - feat2 = self.b1(x) - feat3 = self.b2(x) - feat4 = self.b3(x) - feat5 = self.b4(x) - out = torch.cat((feat1, feat2, feat3, feat4, feat5), dim=1) - out = self.project(out) - return out - - -def get_ocnet(dataset='citys', backbone='resnet50', oc_arch='base', pretrained=False, root='~/.torch/models', - pretrained_base=True, **kwargs): - acronyms = { - 'pascal_voc': 'pascal_voc', - 'pascal_aug': 'pascal_aug', - 'ade20k': 'ade', - 'coco': 'coco', - 'citys': 'citys', - } - from ..data.dataloader import datasets - model = OCNet(datasets[dataset].NUM_CLASS, backbone=backbone, oc_arch=oc_arch, - pretrained_base=pretrained_base, **kwargs) - if pretrained: - from .model_store import get_model_file - device = torch.device(kwargs['local_rank']) - model.load_state_dict(torch.load(get_model_file('%s_ocnet_%s_%s' % ( - oc_arch, backbone, acronyms[dataset]), root=root), - map_location=device)) - return model - - -def get_base_ocnet_resnet101_citys(**kwargs): - return get_ocnet('citys', 'resnet101', 'base', **kwargs) - - -def get_pyramid_ocnet_resnet101_citys(**kwargs): - return get_ocnet('citys', 'resnet101', 'pyramid', **kwargs) - - -def get_asp_ocnet_resnet101_citys(**kwargs): - return get_ocnet('citys', 'resnet101', 'asp', **kwargs) - - -if __name__ == '__main__': - img = torch.randn(1, 3, 256, 256) - model = get_asp_ocnet_resnet101_citys() - outputs = model(img) diff --git a/core/models/psanet.py b/core/models/psanet.py deleted file mode 100644 index c98ad4674..000000000 --- a/core/models/psanet.py +++ /dev/null @@ -1,162 +0,0 @@ -"""Point-wise Spatial Attention Network""" -import torch -import torch.nn as nn -import torch.nn.functional as F - -from core.nn import _ConvBNReLU -from core.models.segbase import SegBaseModel 
-from core.models.fcn import _FCNHead - -__all__ = ['PSANet', 'get_psanet', 'get_psanet_resnet50_voc', 'get_psanet_resnet101_voc', - 'get_psanet_resnet152_voc', 'get_psanet_resnet50_citys', 'get_psanet_resnet101_citys', - 'get_psanet_resnet152_citys'] - - -class PSANet(SegBaseModel): - r"""PSANet - - Parameters - ---------- - nclass : int - Number of categories for the training dataset. - backbone : string - Pre-trained dilated backbone network type (default:'resnet50'; 'resnet50', - 'resnet101' or 'resnet152'). - norm_layer : object - Normalization layer used in backbone network (default: :class:`nn.BatchNorm`; - for Synchronized Cross-GPU BachNormalization). - aux : bool - Auxiliary loss. - - Reference: - Hengshuang Zhao, et al. "PSANet: Point-wise Spatial Attention Network for Scene Parsing." - ECCV-2018. - """ - - def __init__(self, nclass, backbone='resnet', aux=False, pretrained_base=True, **kwargs): - super(PSANet, self).__init__(nclass, aux, backbone, pretrained_base=pretrained_base, **kwargs) - self.head = _PSAHead(nclass, **kwargs) - if aux: - self.auxlayer = _FCNHead(1024, nclass, **kwargs) - - self.__setattr__('exclusive', ['head', 'auxlayer'] if aux else ['head']) - - def forward(self, x): - size = x.size()[2:] - _, _, c3, c4 = self.base_forward(x) - outputs = list() - x = self.head(c4) - x = F.interpolate(x, size, mode='bilinear', align_corners=True) - outputs.append(x) - - if self.aux: - auxout = self.auxlayer(c3) - auxout = F.interpolate(auxout, size, mode='bilinear', align_corners=True) - outputs.append(auxout) - return tuple(outputs) - - -class _PSAHead(nn.Module): - def __init__(self, nclass, norm_layer=nn.BatchNorm2d, **kwargs): - super(_PSAHead, self).__init__() - # psa_out_channels = crop_size // 8 ** 2 - self.psa = _PointwiseSpatialAttention(2048, 3600, norm_layer) - - self.conv_post = _ConvBNReLU(1024, 2048, 1, norm_layer=norm_layer) - self.project = nn.Sequential( - _ConvBNReLU(4096, 512, 3, padding=1, norm_layer=norm_layer), - nn.Dropout2d(0.1, False), - nn.Conv2d(512, nclass, 1)) - - def forward(self, x): - global_feature = self.psa(x) - out = self.conv_post(global_feature) - out = torch.cat([x, out], dim=1) - out = self.project(out) - - return out - - -class _PointwiseSpatialAttention(nn.Module): - def __init__(self, in_channels, out_channels, norm_layer=nn.BatchNorm2d, **kwargs): - super(_PointwiseSpatialAttention, self).__init__() - reduced_channels = 512 - self.collect_attention = _AttentionGeneration(in_channels, reduced_channels, out_channels, norm_layer) - self.distribute_attention = _AttentionGeneration(in_channels, reduced_channels, out_channels, norm_layer) - - def forward(self, x): - collect_fm = self.collect_attention(x) - distribute_fm = self.distribute_attention(x) - psa_fm = torch.cat([collect_fm, distribute_fm], dim=1) - return psa_fm - - -class _AttentionGeneration(nn.Module): - def __init__(self, in_channels, reduced_channels, out_channels, norm_layer, **kwargs): - super(_AttentionGeneration, self).__init__() - self.conv_reduce = _ConvBNReLU(in_channels, reduced_channels, 1, norm_layer=norm_layer) - self.attention = nn.Sequential( - _ConvBNReLU(reduced_channels, reduced_channels, 1, norm_layer=norm_layer), - nn.Conv2d(reduced_channels, out_channels, 1, bias=False)) - - self.reduced_channels = reduced_channels - - def forward(self, x): - reduce_x = self.conv_reduce(x) - attention = self.attention(reduce_x) - n, c, h, w = attention.size() - attention = attention.view(n, c, -1) - reduce_x = reduce_x.view(n, self.reduced_channels, -1) - fm = 
torch.bmm(reduce_x, torch.softmax(attention, dim=1)) - fm = fm.view(n, self.reduced_channels, h, w) - - return fm - - -def get_psanet(dataset='pascal_voc', backbone='resnet50', pretrained=False, root='~/.torch/models', - pretrained_base=True, **kwargs): - acronyms = { - 'pascal_voc': 'pascal_voc', - 'pascal_aug': 'pascal_aug', - 'ade20k': 'ade', - 'coco': 'coco', - 'citys': 'citys', - } - from core.data.dataloader import datasets - model = PSANet(datasets[dataset].NUM_CLASS, backbone=backbone, pretrained_base=pretrained_base, **kwargs) - if pretrained: - from .model_store import get_model_file - device = torch.device(kwargs['local_rank']) - model.load_state_dict(torch.load(get_model_file('deeplabv3_%s_%s' % (backbone, acronyms[dataset]), root=root), - map_location=device)) - return model - - -def get_psanet_resnet50_voc(**kwargs): - return get_psanet('pascal_voc', 'resnet50', **kwargs) - - -def get_psanet_resnet101_voc(**kwargs): - return get_psanet('pascal_voc', 'resnet101', **kwargs) - - -def get_psanet_resnet152_voc(**kwargs): - return get_psanet('pascal_voc', 'resnet152', **kwargs) - - -def get_psanet_resnet50_citys(**kwargs): - return get_psanet('citys', 'resnet50', **kwargs) - - -def get_psanet_resnet101_citys(**kwargs): - return get_psanet('citys', 'resnet101', **kwargs) - - -def get_psanet_resnet152_citys(**kwargs): - return get_psanet('citys', 'resnet152', **kwargs) - - -if __name__ == '__main__': - model = get_psanet_resnet50_voc() - img = torch.randn(1, 3, 480, 480) - output = model(img) diff --git a/core/models/pspnet.py b/core/models/pspnet.py deleted file mode 100644 index efeae6135..000000000 --- a/core/models/pspnet.py +++ /dev/null @@ -1,168 +0,0 @@ -"""Pyramid Scene Parsing Network""" -import torch -import torch.nn as nn -import torch.nn.functional as F - -from .segbase import SegBaseModel -from .fcn import _FCNHead - -__all__ = ['PSPNet', 'get_psp', 'get_psp_resnet50_voc', 'get_psp_resnet50_ade', 'get_psp_resnet101_voc', - 'get_psp_resnet101_ade', 'get_psp_resnet101_citys', 'get_psp_resnet101_coco'] - - -class PSPNet(SegBaseModel): - r"""Pyramid Scene Parsing Network - - Parameters - ---------- - nclass : int - Number of categories for the training dataset. - backbone : string - Pre-trained dilated backbone network type (default:'resnet50'; 'resnet50', - 'resnet101' or 'resnet152'). - norm_layer : object - Normalization layer used in backbone network (default: :class:`nn.BatchNorm`; - for Synchronized Cross-GPU BachNormalization). - aux : bool - Auxiliary loss. - - Reference: - Zhao, Hengshuang, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. - "Pyramid scene parsing network." 
*CVPR*, 2017 - """ - - def __init__(self, nclass, backbone='resnet50', aux=False, pretrained_base=True, **kwargs): - super(PSPNet, self).__init__(nclass, aux, backbone, pretrained_base=pretrained_base, **kwargs) - self.head = _PSPHead(nclass, **kwargs) - if self.aux: - self.auxlayer = _FCNHead(1024, nclass, **kwargs) - - self.__setattr__('exclusive', ['head', 'auxlayer'] if aux else ['head']) - - def forward(self, x): - size = x.size()[2:] - _, _, c3, c4 = self.base_forward(x) - outputs = [] - x = self.head(c4) - x = F.interpolate(x, size, mode='bilinear', align_corners=True) - outputs.append(x) - - if self.aux: - auxout = self.auxlayer(c3) - auxout = F.interpolate(auxout, size, mode='bilinear', align_corners=True) - outputs.append(auxout) - return tuple(outputs) - - -def _PSP1x1Conv(in_channels, out_channels, norm_layer, norm_kwargs): - return nn.Sequential( - nn.Conv2d(in_channels, out_channels, 1, bias=False), - norm_layer(out_channels, **({} if norm_kwargs is None else norm_kwargs)), - nn.ReLU(True) - ) - - -class _PyramidPooling(nn.Module): - def __init__(self, in_channels, **kwargs): - super(_PyramidPooling, self).__init__() - out_channels = int(in_channels / 4) - self.avgpool1 = nn.AdaptiveAvgPool2d(1) - self.avgpool2 = nn.AdaptiveAvgPool2d(2) - self.avgpool3 = nn.AdaptiveAvgPool2d(3) - self.avgpool4 = nn.AdaptiveAvgPool2d(6) - self.conv1 = _PSP1x1Conv(in_channels, out_channels, **kwargs) - self.conv2 = _PSP1x1Conv(in_channels, out_channels, **kwargs) - self.conv3 = _PSP1x1Conv(in_channels, out_channels, **kwargs) - self.conv4 = _PSP1x1Conv(in_channels, out_channels, **kwargs) - - def forward(self, x): - size = x.size()[2:] - feat1 = F.interpolate(self.conv1(self.avgpool1(x)), size, mode='bilinear', align_corners=True) - feat2 = F.interpolate(self.conv2(self.avgpool2(x)), size, mode='bilinear', align_corners=True) - feat3 = F.interpolate(self.conv3(self.avgpool3(x)), size, mode='bilinear', align_corners=True) - feat4 = F.interpolate(self.conv4(self.avgpool4(x)), size, mode='bilinear', align_corners=True) - return torch.cat([x, feat1, feat2, feat3, feat4], dim=1) - - -class _PSPHead(nn.Module): - def __init__(self, nclass, norm_layer=nn.BatchNorm2d, norm_kwargs=None, **kwargs): - super(_PSPHead, self).__init__() - self.psp = _PyramidPooling(2048, norm_layer=norm_layer, norm_kwargs=norm_kwargs) - self.block = nn.Sequential( - nn.Conv2d(4096, 512, 3, padding=1, bias=False), - norm_layer(512, **({} if norm_kwargs is None else norm_kwargs)), - nn.ReLU(True), - nn.Dropout(0.1), - nn.Conv2d(512, nclass, 1) - ) - - def forward(self, x): - x = self.psp(x) - return self.block(x) - - -def get_psp(dataset='pascal_voc', backbone='resnet50', pretrained=False, root='~/.torch/models', - pretrained_base=True, **kwargs): - r"""Pyramid Scene Parsing Network - - Parameters - ---------- - dataset : str, default pascal_voc - The dataset that model pretrained on. (pascal_voc, ade20k) - pretrained : bool or str - Boolean value controls whether to load the default pretrained weights for model. - String value represents the hashtag for a certain version of pretrained weights. - root : str, default '~/.torch/models' - Location for keeping the model parameters. - pretrained_base : bool or str, default True - This will load pretrained backbone network, that was trained on ImageNet. 
- Examples - -------- - >>> model = get_psp(dataset='pascal_voc', backbone='resnet50', pretrained=False) - >>> print(model) - """ - acronyms = { - 'pascal_voc': 'pascal_voc', - 'pascal_aug': 'pascal_aug', - 'ade20k': 'ade', - 'coco': 'coco', - 'citys': 'citys', - } - from ..data.dataloader import datasets - model = PSPNet(datasets[dataset].NUM_CLASS, backbone=backbone, pretrained_base=pretrained_base, **kwargs) - if pretrained: - from .model_store import get_model_file - device = torch.device(kwargs['local_rank']) - model.load_state_dict(torch.load(get_model_file('psp_%s_%s' % (backbone, acronyms[dataset]), root=root), - map_location=device)) - return model - - -def get_psp_resnet50_voc(**kwargs): - return get_psp('pascal_voc', 'resnet50', **kwargs) - - -def get_psp_resnet50_ade(**kwargs): - return get_psp('ade20k', 'resnet50', **kwargs) - - -def get_psp_resnet101_voc(**kwargs): - return get_psp('pascal_voc', 'resnet101', **kwargs) - - -def get_psp_resnet101_ade(**kwargs): - return get_psp('ade20k', 'resnet101', **kwargs) - - -def get_psp_resnet101_citys(**kwargs): - return get_psp('citys', 'resnet101', **kwargs) - - -def get_psp_resnet101_coco(**kwargs): - return get_psp('coco', 'resnet101', **kwargs) - - -if __name__ == '__main__': - model = get_psp_resnet50_voc() - img = torch.randn(4, 3, 480, 480) - output = model(img) diff --git a/core/models/segbase.py b/core/models/segbase.py deleted file mode 100644 index f1560936b..000000000 --- a/core/models/segbase.py +++ /dev/null @@ -1,60 +0,0 @@ -"""Base Model for Semantic Segmentation""" -import torch.nn as nn - -from ..nn import JPU -from .base_models.resnetv1b import resnet50_v1s, resnet101_v1s, resnet152_v1s - -__all__ = ['SegBaseModel'] - - -class SegBaseModel(nn.Module): - r"""Base Model for Semantic Segmentation - - Parameters - ---------- - backbone : string - Pre-trained dilated backbone network type (default:'resnet50'; 'resnet50', - 'resnet101' or 'resnet152'). 
- """ - - def __init__(self, nclass, aux, backbone='resnet50', jpu=False, pretrained_base=True, **kwargs): - super(SegBaseModel, self).__init__() - dilated = False if jpu else True - self.aux = aux - self.nclass = nclass - if backbone == 'resnet50': - self.pretrained = resnet50_v1s(pretrained=pretrained_base, dilated=dilated, **kwargs) - elif backbone == 'resnet101': - self.pretrained = resnet101_v1s(pretrained=pretrained_base, dilated=dilated, **kwargs) - elif backbone == 'resnet152': - self.pretrained = resnet152_v1s(pretrained=pretrained_base, dilated=dilated, **kwargs) - else: - raise RuntimeError('unknown backbone: {}'.format(backbone)) - - self.jpu = JPU([512, 1024, 2048], width=512, **kwargs) if jpu else None - - def base_forward(self, x): - """forwarding pre-trained network""" - x = self.pretrained.conv1(x) - x = self.pretrained.bn1(x) - x = self.pretrained.relu(x) - x = self.pretrained.maxpool(x) - c1 = self.pretrained.layer1(x) - c2 = self.pretrained.layer2(c1) - c3 = self.pretrained.layer3(c2) - c4 = self.pretrained.layer4(c3) - - if self.jpu: - return self.jpu(c1, c2, c3, c4) - else: - return c1, c2, c3, c4 - - def evaluate(self, x): - """evaluating network with inputs and targets""" - return self.forward(x)[0] - - def demo(self, x): - pred = self.forward(x) - if self.aux: - pred = pred[0] - return pred diff --git a/core/models/enet.py b/core/models/swnet.py similarity index 59% rename from core/models/enet.py rename to core/models/swnet.py index 853fc6571..e23f36841 100644 --- a/core/models/enet.py +++ b/core/models/swnet.py @@ -1,8 +1,8 @@ -"""Efficient Neural Network""" +"""A improved slightweight model""" import torch import torch.nn as nn -__all__ = ['ENet', 'get_enet', 'get_enet_citys'] +__all__ = ['swnet', 'get_swnet', 'get_swnet_citys'] class ENet(nn.Module): @@ -11,48 +11,83 @@ class ENet(nn.Module): def __init__(self, nclass, backbone='', aux=False, jpu=False, pretrained_base=None, **kwargs): super(ENet, self).__init__() self.initial = InitialBlock(13, **kwargs) - +#block 1: self.bottleneck1_0 = Bottleneck(16, 16, 64, downsampling=True, **kwargs) self.bottleneck1_1 = Bottleneck(64, 16, 64, **kwargs) self.bottleneck1_2 = Bottleneck(64, 16, 64, **kwargs) self.bottleneck1_3 = Bottleneck(64, 16, 64, **kwargs) self.bottleneck1_4 = Bottleneck(64, 16, 64, **kwargs) - + self.bottleneck1_5 = Bottleneck(64, 16, 64, **kwargs) + self.bottleneck1_6 = Bottleneck(64, 16, 64, **kwargs) + self.bottleneck1_7 = Bottleneck(64, 16, 64, **kwargs) + self.bottleneck1_8 = Bottleneck(64, 16, 64, **kwargs) + self.bottleneck1_9 = Bottleneck(64, 16, 64, **kwargs) + self.bottleneck1_10 = Bottleneck(64, 16, 64, **kwargs) +#blcok 2: self.bottleneck2_0 = Bottleneck(64, 32, 128, downsampling=True, **kwargs) self.bottleneck2_1 = Bottleneck(128, 32, 128, **kwargs) self.bottleneck2_2 = Bottleneck(128, 32, 128, dilation=2, **kwargs) self.bottleneck2_3 = Bottleneck(128, 32, 128, asymmetric=True, **kwargs) self.bottleneck2_4 = Bottleneck(128, 32, 128, dilation=4, **kwargs) self.bottleneck2_5 = Bottleneck(128, 32, 128, **kwargs) - self.bottleneck2_6 = Bottleneck(128, 32, 128, dilation=8, **kwargs) - self.bottleneck2_7 = Bottleneck(128, 32, 128, asymmetric=True, **kwargs) - self.bottleneck2_8 = Bottleneck(128, 32, 128, dilation=16, **kwargs) - - self.bottleneck3_1 = Bottleneck(128, 32, 128, **kwargs) - self.bottleneck3_2 = Bottleneck(128, 32, 128, dilation=2, **kwargs) - self.bottleneck3_3 = Bottleneck(128, 32, 128, asymmetric=True, **kwargs) - self.bottleneck3_4 = Bottleneck(128, 32, 128, dilation=4, 
**kwargs) + self.bottleneck2_6 = Bottleneck(128, 32, 128, **kwargs) + self.bottleneck2_7 = Bottleneck(128, 32, 128, **kwargs) + self.bottleneck2_8 = Bottleneck(128, 32, 128, dilation=8, **kwargs) + self.bottleneck2_9 = Bottleneck(128, 32, 128, asymmetric=True, **kwargs) + self.bottleneck2_10 = Bottleneck(128, 32, 128, dilation=16, **kwargs) +#block 3: + self.bottleneck3_0 = Bottleneck(128, 32, 128, **kwargs) + self.bottleneck3_1 = Bottleneck(128, 32, 128, dilation=2, **kwargs) + self.bottleneck3_2 = Bottleneck(128, 32, 128, asymmetric=True, **kwargs) + self.bottleneck3_3 = Bottleneck(128, 32, 128, dilation=4, **kwargs) + self.bottleneck3_4 = Bottleneck(128, 32, 128, **kwargs) self.bottleneck3_5 = Bottleneck(128, 32, 128, **kwargs) - self.bottleneck3_6 = Bottleneck(128, 32, 128, dilation=8, **kwargs) - self.bottleneck3_7 = Bottleneck(128, 32, 128, asymmetric=True, **kwargs) - self.bottleneck3_8 = Bottleneck(128, 32, 128, dilation=16, **kwargs) - + self.bottleneck3_6 = Bottleneck(128, 32, 128, **kwargs) + self.bottleneck3_7 = Bottleneck(128, 32, 128, **kwargs) + self.bottleneck3_8 = Bottleneck(128, 32, 128, dilation=8, **kwargs) + self.bottleneck3_9 = Bottleneck(128, 32, 128, asymmetric=True, **kwargs) + self.bottleneck3_10 = Bottleneck(128, 32, 128, dilation=16, **kwargs) +#block 4: self.bottleneck4_0 = UpsamplingBottleneck(128, 16, 64, **kwargs) self.bottleneck4_1 = Bottleneck(64, 16, 64, **kwargs) self.bottleneck4_2 = Bottleneck(64, 16, 64, **kwargs) - + self.bottleneck4_3 = Bottleneck(64, 16, 64, **kwargs) + self.bottleneck4_4 = Bottleneck(64, 16, 64, **kwargs) + self.bottleneck4_5 = Bottleneck(64, 16, 64, **kwargs) + self.bottleneck4_6 = Bottleneck(64, 16, 64, **kwargs) + self.bottleneck4_7 = Bottleneck(64, 16, 64, **kwargs) + self.bottleneck4_8 = Bottleneck(64, 16, 64, **kwargs) + self.bottleneck4_9 = Bottleneck(64, 16, 64, **kwargs) + self.bottleneck4_10 = Bottleneck(64, 16, 64, **kwargs) +#block 5: self.bottleneck5_0 = UpsamplingBottleneck(64, 4, 16, **kwargs) self.bottleneck5_1 = Bottleneck(16, 4, 16, **kwargs) - + self.bottleneck5_2 = Bottleneck(16, 4, 16, **kwargs) + self.bottleneck5_3 = Bottleneck(16, 4, 16, **kwargs) + self.bottleneck5_4 = Bottleneck(16, 4, 16, **kwargs) + self.bottleneck5_5 = Bottleneck(16, 4, 16, **kwargs) + self.bottleneck5_6 = Bottleneck(16, 4, 16, **kwargs) + self.bottleneck5_7 = Bottleneck(16, 4, 16, **kwargs) + self.bottleneck5_8 = Bottleneck(16, 4, 16, **kwargs) + self.bottleneck5_9 = Bottleneck(16, 4, 16, **kwargs) + self.bottleneck5_10 = Bottleneck(16, 4, 16, **kwargs) +#block 6: self.fullconv = nn.ConvTranspose2d(16, nclass, 2, 2, bias=False) self.__setattr__('exclusive', ['bottleneck1_0', 'bottleneck1_1', 'bottleneck1_2', 'bottleneck1_3', - 'bottleneck1_4', 'bottleneck2_0', 'bottleneck2_1', 'bottleneck2_2', - 'bottleneck2_3', 'bottleneck2_4', 'bottleneck2_5', 'bottleneck2_6', - 'bottleneck2_7', 'bottleneck2_8', 'bottleneck3_1', 'bottleneck3_2', - 'bottleneck3_3', 'bottleneck3_4', 'bottleneck3_5', 'bottleneck3_6', - 'bottleneck3_7', 'bottleneck3_8', 'bottleneck4_0', 'bottleneck4_1', - 'bottleneck4_2', 'bottleneck5_0', 'bottleneck5_1', 'fullconv']) + 'bottleneck1_4', 'bottleneck1_5', 'bottleneck1_6', 'bottleneck1_7', + 'bottleneck1_8', 'bottleneck1_9', 'bottleneck1_10','bottleneck2_0', + 'bottleneck2_1', 'bottleneck2_2', 'bottleneck2_3', 'bottleneck2_4', + 'bottleneck2_5', 'bottleneck2_6', 'bottleneck2_7', 'bottleneck2_8', + 'bottleneck2_9', 'bottleneck2_10','bottleneck3_0', 'bottleneck3_1', + 'bottleneck3_2', 'bottleneck3_3', 'bottleneck3_4', 
'bottleneck3_5', + 'bottleneck3_6', 'bottleneck3_7', 'bottleneck3_8', 'bottleneck3_9', + 'bottleneck3_10','bottleneck4_0', 'bottleneck4_1', 'bottleneck4_2', + 'bottleneck4_3', 'bottleneck4_4', 'bottleneck4_5', 'bottleneck4_6', + 'bottleneck4_7', 'bottleneck4_8', 'bottleneck4_9', 'bottleneck4_10', + 'bottleneck5_0', 'bottleneck5_1', 'bottleneck5_2', 'bottleneck5_3', + 'bottleneck5_4', 'bottleneck5_5', 'bottleneck5_6', 'bottleneck5_7', + 'bottleneck5_8', 'bottleneck5_9', 'bottleneck5_10','fullconv']) def forward(self, x): # init @@ -64,7 +99,12 @@ def forward(self, x): x = self.bottleneck1_2(x) x = self.bottleneck1_3(x) x = self.bottleneck1_4(x) - + x = self.bottleneck1_5(x) + x = self.bottleneck1_6(x) + x = self.bottleneck1_7(x) + x = self.bottleneck1_8(x) + x = self.bottleneck1_9(x) + x = self.bottleneck1_10(x) # stage 2 x, max_indices2 = self.bottleneck2_0(x) x = self.bottleneck2_1(x) @@ -75,39 +115,59 @@ def forward(self, x): x = self.bottleneck2_6(x) x = self.bottleneck2_7(x) x = self.bottleneck2_8(x) - + x = self.bottleneck2_9(x) + x = self.bottleneck2_10(x) # stage 3 + x = self.bottleneck3_0(x) x = self.bottleneck3_1(x) x = self.bottleneck3_2(x) x = self.bottleneck3_3(x) x = self.bottleneck3_4(x) + x = self.bottleneck3_5(x) x = self.bottleneck3_6(x) x = self.bottleneck3_7(x) x = self.bottleneck3_8(x) + x = self.bottleneck3_9(x) + x = self.bottleneck3_10(x) # stage 4 x = self.bottleneck4_0(x, max_indices2) x = self.bottleneck4_1(x) x = self.bottleneck4_2(x) - + x = self.bottleneck4_3(x) + x = self.bottleneck4_4(x) + x = self.bottleneck4_5(x) + x = self.bottleneck4_6(x) + x = self.bottleneck4_7(x) + x = self.bottleneck4_8(x) + x = self.bottleneck4_9(x) + x = self.bottleneck4_10(x) # stage 5 x = self.bottleneck5_0(x, max_indices1) x = self.bottleneck5_1(x) - + x = self.bottleneck5_2(x) + x = self.bottleneck5_3(x) + x = self.bottleneck5_4(x) + x = self.bottleneck5_5(x) + x = self.bottleneck5_6(x) + x = self.bottleneck5_7(x) + x = self.bottleneck5_8(x) + x = self.bottleneck5_9(x) + x = self.bottleneck5_10(x) # out x = self.fullconv(x) return tuple([x]) class InitialBlock(nn.Module): - """ENet initial block""" + """swnet initial block""" def __init__(self, out_channels, norm_layer=nn.BatchNorm2d, **kwargs): super(InitialBlock, self).__init__() self.conv = nn.Conv2d(3, out_channels, 3, 2, 1, bias=False) self.maxpool = nn.MaxPool2d(2, 2) self.bn = norm_layer(out_channels + 3) - self.act = nn.PReLU() + self.act = nn.RReLU() def forward(self, x): x_conv = self.conv(x) @@ -135,14 +195,14 @@ def __init__(self, in_channels, inter_channels, out_channels, dilation=1, asymme self.conv1 = nn.Sequential( nn.Conv2d(in_channels, inter_channels, 1, bias=False), norm_layer(inter_channels), - nn.PReLU() + nn.RReLU() ) if downsampling: self.conv2 = nn.Sequential( nn.Conv2d(inter_channels, inter_channels, 2, stride=2, bias=False), norm_layer(inter_channels), - nn.PReLU() + nn.RReLU() ) else: if asymmetric: @@ -150,20 +210,20 @@ def __init__(self, in_channels, inter_channels, out_channels, dilation=1, asymme nn.Conv2d(inter_channels, inter_channels, (5, 1), padding=(2, 0), bias=False), nn.Conv2d(inter_channels, inter_channels, (1, 5), padding=(0, 2), bias=False), norm_layer(inter_channels), - nn.PReLU() + nn.RReLU() ) else: self.conv2 = nn.Sequential( nn.Conv2d(inter_channels, inter_channels, 3, dilation=dilation, padding=dilation, bias=False), norm_layer(inter_channels), - nn.PReLU() + nn.RReLU() ) self.conv3 = nn.Sequential( nn.Conv2d(inter_channels, out_channels, 1, bias=False), norm_layer(out_channels), 
             nn.Dropout2d(0.1)
         )
-        self.act = nn.PReLU()
+        self.act = nn.RReLU()

     def forward(self, x):
         identity = x
@@ -196,15 +256,15 @@ def __init__(self, in_channels, inter_channels, out_channels, norm_layer=nn.Batc
         self.block = nn.Sequential(
             nn.Conv2d(in_channels, inter_channels, 1, bias=False),
             norm_layer(inter_channels),
-            nn.PReLU(),
+            nn.RReLU(),
             nn.ConvTranspose2d(inter_channels, inter_channels, 2, 2, bias=False),
             norm_layer(inter_channels),
-            nn.PReLU(),
+            nn.RReLU(),
             nn.Conv2d(inter_channels, out_channels, 1, bias=False),
             norm_layer(out_channels),
             nn.Dropout2d(0.1)
         )
-        self.act = nn.PReLU()
+        self.act = nn.RReLU()

     def forward(self, x, max_indices):
         out_up = self.conv(x)
diff --git a/scripts/demo.py b/scripts/demo.py
index bc5773307..5b34c8134 100644
--- a/scripts/demo.py
+++ b/scripts/demo.py
@@ -14,8 +14,8 @@
 parser = argparse.ArgumentParser(
     description='Predict segmentation result from a given image')
-parser.add_argument('--model', type=str, default='fcn32s_vgg16_voc',
-                    help='model name (default: fcn32_vgg16)')
+parser.add_argument('--model', type=str, default='swnet_resnet50_city',
+                    help='model name (default: swnet_resnet50_city)')
 parser.add_argument('--dataset', type=str, default='pascal_aug',
                     choices=['pascal_voc, pascal_aug, ade20k, citys'],
                     help='dataset name (default: pascal_voc)')
 parser.add_argument('--save-folder', default='~/.torch/models',
diff --git a/scripts/fcn32s_vgg16_pascal_voc.sh b/scripts/fcn32s_vgg16_pascal_voc.sh
deleted file mode 100755
index 8e74c7584..000000000
--- a/scripts/fcn32s_vgg16_pascal_voc.sh
+++ /dev/null
@@ -1,6 +0,0 @@
-#!/usr/bin/env bash
-
-# train
-CUDA_VISIBLE_DEVICES=0 python train.py --model fcn32s \
-    --backbone vgg16 --dataset pascal_voc \
-    --lr 0.0001 --epochs 80
\ No newline at end of file
diff --git a/scripts/fcn32s_vgg16_pascal_voc_dist.sh b/scripts/fcn32s_vgg16_pascal_voc_dist.sh
deleted file mode 100755
index 5c826a44b..000000000
--- a/scripts/fcn32s_vgg16_pascal_voc_dist.sh
+++ /dev/null
@@ -1,7 +0,0 @@
-#!/usr/bin/env bash
-
-# train
-export NGPUS=4
-CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --model fcn32s \
-    --backbone vgg16 --dataset pascal_voc \
-    --lr 0.01 --epochs 80 --batch_size 16
\ No newline at end of file
diff --git a/scripts/swnet_resnet50_citys.sh b/scripts/swnet_resnet50_citys.sh
new file mode 100644
index 000000000..3cf9a0d48
--- /dev/null
+++ b/scripts/swnet_resnet50_citys.sh
@@ -0,0 +1,6 @@
+#!/usr/bin/env bash
+
+# train
+CUDA_VISIBLE_DEVICES=0 python train.py --model swnet \
+    --backbone resnet50 --dataset citys \
+    --lr 0.0001 --epochs 50
diff --git a/scripts/train.py b/scripts/train.py
index e57f43899..51723ccf9 100644
--- a/scripts/train.py
+++ b/scripts/train.py
@@ -28,14 +28,11 @@ def parse_args():
     parser = argparse.ArgumentParser(description='Semantic Segmentation Training With Pytorch')
     # model and dataset
     parser.add_argument('--model', type=str, default='fcn',
-                        choices=['fcn32s', 'fcn16s', 'fcn8s', 'fcn', 'psp', 'deeplabv3',
-                                 'deeplabv3_plus', 'danet', 'denseaspp', 'bisenet', 'encnet',
-                                 'dunet', 'icnet', 'enet', 'ocnet', 'psanet', 'cgnet', 'espnet',
-                                 'lednet', 'dfanet'],
+                        choices=['swnet'],
                         help='model name (default: fcn32s)')
     parser.add_argument('--backbone', type=str, default='resnet50',
-                        choices=['vgg16', 'resnet18', 'resnet50', 'resnet101', 'resnet152',
-                                 'densenet121', 'densenet161', 'densenet169', 'densenet201'],
+                        choices=['resnet50', 'resnet101', 'resnet152',
+                                 ],
                         help='backbone name (default: vgg16)')
     parser.add_argument('--dataset', type=str, default='pascal_voc',
                         choices=['pascal_voc',
'pascal_aug', 'ade20k', 'citys', 'sbu'],