diff --git a/README.md b/README.md index d73bc462c..abed09648 100644 --- a/README.md +++ b/README.md @@ -1,213 +1,49 @@ -# Semantic Segmentation on PyTorch - -English | [简体中文](/README_zh-CN.md) - -[![python-image]][python-url] -[![pytorch-image]][pytorch-url] -[![lic-image]][lic-url] +# slightweight Segmentation This project aims at providing a concise, easy-to-use, modifiable reference implementation for semantic segmentation models using PyTorch. -
+stage 1: # Installation -## Installation - -``` -# semantic-segmentation-pytorch dependencies +stage 2: # dependencies pip install ninja tqdm -# follow PyTorch installation in https://pytorch.org/get-started/locally/ +stage 3: # follow PyTorch installation in https://pytorch.org/get-started/locally/ conda install pytorch torchvision -c pytorch -# install PyTorch Segmentation -git clone https://github.com/Tramac/awesome-semantic-segmentation-pytorch.git -``` - -## Usage -### Train ------------------ -- **Single GPU training** -``` -# for example, train fcn32_vgg16_pascal_voc: -python train.py --model fcn32s --backbone vgg16 --dataset pascal_voc --lr 0.0001 --epochs 50 -``` -- **Multi-GPU training** - -``` -# for example, train fcn32_vgg16_pascal_voc with 4 GPUs: -export NGPUS=4 -python -m torch.distributed.launch --nproc_per_node=$NGPUS train.py --model fcn32s --backbone vgg16 --dataset pascal_voc --lr 0.0001 --epochs 50 -``` - -### Evaluation ------------------ -- **Single GPU evaluating** -``` -# for example, evaluate fcn32_vgg16_pascal_voc -python eval.py --model fcn32s --backbone vgg16 --dataset pascal_voc -``` -- **Multi-GPU evaluating** -``` -# for example, evaluate fcn32_vgg16_pascal_voc with 4 GPUs: -export NGPUS=4 -python -m torch.distributed.launch --nproc_per_node=$NGPUS eval.py --model fcn32s --backbone vgg16 --dataset pascal_voc -``` +stage 4: # for example, train swnet_resnet_citys: +python train.py --model swnet --backbone resnet --dataset citys --lr 0.0001 --epochs 50 + + +stage 5: # for example, evaluate swnet_resnet_citys +python eval.py --model swnet --backbone resnet --dataset citys + ### Demo -``` + cd ./scripts #for new users: -python demo.py --model fcn32s_vgg16_voc --input-pic ../tests/test_img.jpg +python demo.py --model swnet_resnet_citys --input-pic ../tests/test_img.jpg #you should add 'test.jpg' by yourself -python demo.py --model fcn32s_vgg16_voc --input-pic ../datasets/test.jpg -``` - -``` -.{SEG_ROOT} -├── scripts -│ ├── demo.py -│ 
├── eval.py -│ └── train.py -``` - -## Support - -#### Model - -- [FCN](https://arxiv.org/abs/1411.4038) -- [ENet](https://arxiv.org/pdf/1606.02147) -- [PSPNet](https://arxiv.org/pdf/1612.01105) -- [ICNet](https://arxiv.org/pdf/1704.08545) -- [DeepLabv3](https://arxiv.org/abs/1706.05587) -- [DeepLabv3+](https://arxiv.org/pdf/1802.02611) -- [DenseASPP](http://openaccess.thecvf.com/content_cvpr_2018/papers/Yang_DenseASPP_for_Semantic_CVPR_2018_paper.pdf) -- [EncNet](https://arxiv.org/abs/1803.08904v1) -- [BiSeNet](https://arxiv.org/abs/1808.00897) -- [PSANet](https://hszhao.github.io/papers/eccv18_psanet.pdf) -- [DANet](https://arxiv.org/pdf/1809.02983) -- [OCNet](https://arxiv.org/pdf/1809.00916) -- [CGNet](https://arxiv.org/pdf/1811.08201) -- [ESPNetv2](https://arxiv.org/abs/1811.11431) -- [DUNet(DUpsampling)](https://arxiv.org/abs/1903.02120) -- [FastFCN(JPU)](https://arxiv.org/abs/1903.11816) -- [LEDNet](https://arxiv.org/abs/1905.02423) -- [Fast-SCNN](https://github.com/Tramac/Fast-SCNN-pytorch) -- [LightSeg](https://github.com/Tramac/Lightweight-Segmentation) -- [DFANet](https://arxiv.org/abs/1904.02216) - -[DETAILS](https://github.com/Tramac/awesome-semantic-segmentation-pytorch/blob/master/docs/DETAILS.md) for model & backbone. -``` -.{SEG_ROOT} -├── core -│ ├── models -│ │ ├── bisenet.py -│ │ ├── danet.py -│ │ ├── deeplabv3.py -│ │ ├── deeplabv3+.py -│ │ ├── denseaspp.py -│ │ ├── dunet.py -│ │ ├── encnet.py -│ │ ├── fcn.py -│ │ ├── pspnet.py -│ │ ├── icnet.py -│ │ ├── enet.py -│ │ ├── ocnet.py -│ │ ├── psanet.py -│ │ ├── cgnet.py -│ │ ├── espnet.py -│ │ ├── lednet.py -│ │ ├── dfanet.py -│ │ ├── ...... 
-``` +python demo.py --model swnet_resnet_citys --input-pic ../datasets/test.jpg + +### Performance evaluation + +![image](https://user-images.githubusercontent.com/43395674/159203398-86f4874e-7b0f-48a3-8414-cdf662d56f99.png) +![image](https://user-images.githubusercontent.com/43395674/159203405-7b656176-2e93-4d67-98e6-6d650204b0d6.png) + +![image](https://user-images.githubusercontent.com/43395674/159203470-99a509cc-68cc-4fa4-be65-43e0c9204cb1.png) +![image](https://user-images.githubusercontent.com/43395674/159203480-10ff8f81-965f-419c-ab98-83fade7b3b65.png) + +### An experiment on a simulated scene based on Jetson +![image](https://user-images.githubusercontent.com/43395674/159203486-19980424-c6c4-4644-a44b-9f52085b2067.png) + #### Dataset You can run script to download dataset, such as: -``` + cd ./core/data/downloader python ade20k.py --download-dir ../datasets/ade -``` - -| Dataset | training set | validation set | testing set | -| :----------------------------------------------------------: | :----------: | :------------: | :---------: | -| [VOC2012](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar) | 1464 | 1449 | ✘ | -| [VOCAug](http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/semantic_contours/benchmark.tgz) | 11355 | 2857 | ✘ | -| [ADK20K](http://groups.csail.mit.edu/vision/datasets/ADE20K/) | 20210 | 2000 | ✘ | -| [Cityscapes](https://www.cityscapes-dataset.com/downloads/) | 2975 | 500 | ✘ | -| [COCO](http://cocodataset.org/#download) | | | | -| [SBU-shadow](http://www3.cs.stonybrook.edu/~cvl/content/datasets/shadow_db/SBU-shadow.zip) | 4085 | 638 | ✘ | -| [LIP(Look into Person)](http://sysu-hcp.net/lip/) | 30462 | 10000 | 10000 | - -``` -.{SEG_ROOT} -├── core -│ ├── data -│ │ ├── dataloader -│ │ │ ├── ade.py -│ │ │ ├── cityscapes.py -│ │ │ ├── mscoco.py -│ │ │ ├── pascal_aug.py -│ │ │ ├── pascal_voc.py -│ │ │ ├── sbu_shadow.py -│ │ └── downloader -│ │ ├── ade20k.py -│ │ ├── cityscapes.py -│ │ ├── mscoco.py -│
│ ├── pascal_voc.py -│ │ └── sbu_shadow.py -``` - -## Result -- **PASCAL VOC 2012** - -|Methods|Backbone|TrainSet|EvalSet|crops_size|epochs|JPU|Mean IoU|pixAcc| -|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:| -|FCN32s|vgg16|train|val|480|60|✘|47.50|85.39| -|FCN16s|vgg16|train|val|480|60|✘|49.16|85.98| -|FCN8s|vgg16|train|val|480|60|✘|48.87|85.02| -|FCN32s|resnet50|train|val|480|50|✘|54.60|88.57| -|PSPNet|resnet50|train|val|480|60|✘|63.44|89.78| -|DeepLabv3|resnet50|train|val|480|60|✘|60.15|88.36| - -Note: `lr=1e-4, batch_size=4, epochs=80`. - -## Overfitting Test -See [TEST](https://github.com/Tramac/Awesome-semantic-segmentation-pytorch/tree/master/tests) for details. - -``` -.{SEG_ROOT} -├── tests -│ └── test_model.py -``` - -## To Do -- [x] add train script -- [ ] remove syncbn -- [ ] train & evaluate -- [x] test distributed training -- [x] fix syncbn ([Why SyncBN?](https://tramac.github.io/2019/02/25/%E8%B7%A8%E5%8D%A1%E5%90%8C%E6%AD%A5%20Batch%20Normalization[%E8%BD%AC]/)) -- [x] add distributed ([How DIST?]("https://tramac.github.io/2019/03/06/%E5%88%86%E5%B8%83%E5%BC%8F%E8%AE%AD%E7%BB%83-PyTorch/")) - -## References -- [PyTorch-Encoding](https://github.com/zhanghang1989/PyTorch-Encoding) -- [maskrcnn-benchmark](https://github.com/facebookresearch/maskrcnn-benchmark) -- [gloun-cv](https://github.com/dmlc/gluon-cv) -- [imagenet](https://github.com/pytorch/examples/tree/master/imagenet) - - - - - -[python-image]: https://img.shields.io/badge/Python-2.x|3.x-ff69b4.svg -[python-url]: https://www.python.org/ -[pytorch-image]: https://img.shields.io/badge/PyTorch-1.1-2BAF2B.svg -[pytorch-url]: https://pytorch.org/ -[lic-image]: https://img.shields.io/badge/Apache-2.0-blue.svg -[lic-url]: https://github.com/Tramac/Awesome-semantic-segmentation-pytorch/blob/master/LICENSE + +Acknowledgement: we thank "awesome-semantic-segmentation-pytorch (https://github.com/Tramac/Awesome-semantic-segmentation-pytorch)" for the code support. swnet is an improvement on ENet.
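The new README passes `--model`, `--backbone` and `--dataset` as separate flags to `train.py` and `eval.py`, while `demo.py` takes them joined into one name (`swnet_resnet_citys`). The sketch below shows one way that relationship could be expressed with `argparse`. It is illustrative only: `build_parser` and `demo_model_name` are hypothetical helpers, the defaults simply mirror the values in the README commands, and the repository's real `train.py` may parse its options differently.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags in the README's train command."""
    parser = argparse.ArgumentParser(description="Illustrative training options")
    # Defaults are assumptions taken from the example command, not from train.py.
    parser.add_argument("--model", type=str, default="swnet")
    parser.add_argument("--backbone", type=str, default="resnet")
    parser.add_argument("--dataset", type=str, default="citys")
    parser.add_argument("--lr", type=float, default=1e-4)
    parser.add_argument("--epochs", type=int, default=50)
    return parser


def demo_model_name(args):
    """Join model, backbone and dataset in the order the demo command uses."""
    return "{}_{}_{}".format(args.model, args.backbone, args.dataset)


if __name__ == "__main__":
    command = "--model swnet --backbone resnet --dataset citys --lr 0.0001 --epochs 50"
    args = build_parser().parse_args(command.split())
    # The joined name is what the README passes to demo.py via --model.
    print(demo_model_name(args))
```

Keeping the three parts as separate flags for training and joining them only for the demo matches the `<model>_<backbone>_<dataset>` pattern visible in both the old and new README commands.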
+ diff --git a/core/models/bisenet.py b/core/models/bisenet.py deleted file mode 100644 index fac6c0955..000000000 --- a/core/models/bisenet.py +++ /dev/null @@ -1,220 +0,0 @@ -"""Bilateral Segmentation Network""" -import torch -import torch.nn as nn -import torch.nn.functional as F - -from core.models.base_models.resnet import resnet18 -from core.nn import _ConvBNReLU - -__all__ = ['BiSeNet', 'get_bisenet', 'get_bisenet_resnet18_citys'] - - -class BiSeNet(nn.Module): - def __init__(self, nclass, backbone='resnet18', aux=False, jpu=False, pretrained_base=True, **kwargs): - super(BiSeNet, self).__init__() - self.aux = aux - self.spatial_path = SpatialPath(3, 128, **kwargs) - self.context_path = ContextPath(backbone, pretrained_base, **kwargs) - self.ffm = FeatureFusion(256, 256, 4, **kwargs) - self.head = _BiSeHead(256, 64, nclass, **kwargs) - if aux: - self.auxlayer1 = _BiSeHead(128, 256, nclass, **kwargs) - self.auxlayer2 = _BiSeHead(128, 256, nclass, **kwargs) - - self.__setattr__('exclusive', - ['spatial_path', 'context_path', 'ffm', 'head', 'auxlayer1', 'auxlayer2'] if aux else [ - 'spatial_path', 'context_path', 'ffm', 'head']) - - def forward(self, x): - size = x.size()[2:] - spatial_out = self.spatial_path(x) - context_out = self.context_path(x) - fusion_out = self.ffm(spatial_out, context_out[-1]) - outputs = [] - x = self.head(fusion_out) - x = F.interpolate(x, size, mode='bilinear', align_corners=True) - outputs.append(x) - - if self.aux: - auxout1 = self.auxlayer1(context_out[0]) - auxout1 = F.interpolate(auxout1, size, mode='bilinear', align_corners=True) - outputs.append(auxout1) - auxout2 = self.auxlayer2(context_out[1]) - auxout2 = F.interpolate(auxout2, size, mode='bilinear', align_corners=True) - outputs.append(auxout2) - return tuple(outputs) - - -class _BiSeHead(nn.Module): - def __init__(self, in_channels, inter_channels, nclass, norm_layer=nn.BatchNorm2d, **kwargs): - super(_BiSeHead, self).__init__() - self.block = nn.Sequential( - 
_ConvBNReLU(in_channels, inter_channels, 3, 1, 1, norm_layer=norm_layer), - nn.Dropout(0.1), - nn.Conv2d(inter_channels, nclass, 1) - ) - - def forward(self, x): - x = self.block(x) - return x - - -class SpatialPath(nn.Module): - """Spatial path""" - - def __init__(self, in_channels, out_channels, norm_layer=nn.BatchNorm2d, **kwargs): - super(SpatialPath, self).__init__() - inter_channels = 64 - self.conv7x7 = _ConvBNReLU(in_channels, inter_channels, 7, 2, 3, norm_layer=norm_layer) - self.conv3x3_1 = _ConvBNReLU(inter_channels, inter_channels, 3, 2, 1, norm_layer=norm_layer) - self.conv3x3_2 = _ConvBNReLU(inter_channels, inter_channels, 3, 2, 1, norm_layer=norm_layer) - self.conv1x1 = _ConvBNReLU(inter_channels, out_channels, 1, 1, 0, norm_layer=norm_layer) - - def forward(self, x): - x = self.conv7x7(x) - x = self.conv3x3_1(x) - x = self.conv3x3_2(x) - x = self.conv1x1(x) - - return x - - -class _GlobalAvgPooling(nn.Module): - def __init__(self, in_channels, out_channels, norm_layer, **kwargs): - super(_GlobalAvgPooling, self).__init__() - self.gap = nn.Sequential( - nn.AdaptiveAvgPool2d(1), - nn.Conv2d(in_channels, out_channels, 1, bias=False), - norm_layer(out_channels), - nn.ReLU(True) - ) - - def forward(self, x): - size = x.size()[2:] - pool = self.gap(x) - out = F.interpolate(pool, size, mode='bilinear', align_corners=True) - return out - - -class AttentionRefinmentModule(nn.Module): - def __init__(self, in_channels, out_channels, norm_layer=nn.BatchNorm2d, **kwargs): - super(AttentionRefinmentModule, self).__init__() - self.conv3x3 = _ConvBNReLU(in_channels, out_channels, 3, 1, 1, norm_layer=norm_layer) - self.channel_attention = nn.Sequential( - nn.AdaptiveAvgPool2d(1), - _ConvBNReLU(out_channels, out_channels, 1, 1, 0, norm_layer=norm_layer), - nn.Sigmoid() - ) - - def forward(self, x): - x = self.conv3x3(x) - attention = self.channel_attention(x) - x = x * attention - return x - - -class ContextPath(nn.Module): - def __init__(self, backbone='resnet18', 
pretrained_base=True, norm_layer=nn.BatchNorm2d, **kwargs): - super(ContextPath, self).__init__() - if backbone == 'resnet18': - pretrained = resnet18(pretrained=pretrained_base, **kwargs) - else: - raise RuntimeError('unknown backbone: {}'.format(backbone)) - self.conv1 = pretrained.conv1 - self.bn1 = pretrained.bn1 - self.relu = pretrained.relu - self.maxpool = pretrained.maxpool - self.layer1 = pretrained.layer1 - self.layer2 = pretrained.layer2 - self.layer3 = pretrained.layer3 - self.layer4 = pretrained.layer4 - - inter_channels = 128 - self.global_context = _GlobalAvgPooling(512, inter_channels, norm_layer) - - self.arms = nn.ModuleList( - [AttentionRefinmentModule(512, inter_channels, norm_layer, **kwargs), - AttentionRefinmentModule(256, inter_channels, norm_layer, **kwargs)] - ) - self.refines = nn.ModuleList( - [_ConvBNReLU(inter_channels, inter_channels, 3, 1, 1, norm_layer=norm_layer), - _ConvBNReLU(inter_channels, inter_channels, 3, 1, 1, norm_layer=norm_layer)] - ) - - def forward(self, x): - x = self.conv1(x) - x = self.bn1(x) - x = self.relu(x) - x = self.maxpool(x) - x = self.layer1(x) - - context_blocks = [] - context_blocks.append(x) - x = self.layer2(x) - context_blocks.append(x) - c3 = self.layer3(x) - context_blocks.append(c3) - c4 = self.layer4(c3) - context_blocks.append(c4) - context_blocks.reverse() - - global_context = self.global_context(c4) - last_feature = global_context - context_outputs = [] - for i, (feature, arm, refine) in enumerate(zip(context_blocks[:2], self.arms, self.refines)): - feature = arm(feature) - feature += last_feature - last_feature = F.interpolate(feature, size=context_blocks[i + 1].size()[2:], - mode='bilinear', align_corners=True) - last_feature = refine(last_feature) - context_outputs.append(last_feature) - - return context_outputs - - -class FeatureFusion(nn.Module): - def __init__(self, in_channels, out_channels, reduction=1, norm_layer=nn.BatchNorm2d, **kwargs): - super(FeatureFusion, self).__init__() - 
self.conv1x1 = _ConvBNReLU(in_channels, out_channels, 1, 1, 0, norm_layer=norm_layer, **kwargs) - self.channel_attention = nn.Sequential( - nn.AdaptiveAvgPool2d(1), - _ConvBNReLU(out_channels, out_channels // reduction, 1, 1, 0, norm_layer=norm_layer), - _ConvBNReLU(out_channels // reduction, out_channels, 1, 1, 0, norm_layer=norm_layer), - nn.Sigmoid() - ) - - def forward(self, x1, x2): - fusion = torch.cat([x1, x2], dim=1) - out = self.conv1x1(fusion) - attention = self.channel_attention(out) - out = out + out * attention - return out - - -def get_bisenet(dataset='citys', backbone='resnet18', pretrained=False, root='~/.torch/models', - pretrained_base=True, **kwargs): - acronyms = { - 'pascal_voc': 'pascal_voc', - 'pascal_aug': 'pascal_aug', - 'ade20k': 'ade', - 'coco': 'coco', - 'citys': 'citys', - } - from ..data.dataloader import datasets - model = BiSeNet(datasets[dataset].NUM_CLASS, backbone=backbone, pretrained_base=pretrained_base, **kwargs) - if pretrained: - from .model_store import get_model_file - device = torch.device(kwargs['local_rank']) - model.load_state_dict(torch.load(get_model_file('bisenet_%s_%s' % (backbone, acronyms[dataset]), root=root), - map_location=device)) - return model - - -def get_bisenet_resnet18_citys(**kwargs): - return get_bisenet('citys', 'resnet18', **kwargs) - - -if __name__ == '__main__': - img = torch.randn(2, 3, 224, 224) - model = BiSeNet(19, backbone='resnet18') - print(model.exclusive) diff --git a/core/models/cgnet.py b/core/models/cgnet.py deleted file mode 100644 index 9cae5c837..000000000 --- a/core/models/cgnet.py +++ /dev/null @@ -1,210 +0,0 @@ -"""Context Guided Network for Semantic Segmentation""" -import torch -import torch.nn as nn -import torch.nn.functional as F - -from core.nn import _ConvBNPReLU, _BNPReLU - -__all__ = ['CGNet', 'get_cgnet', 'get_cgnet_citys'] - - -class CGNet(nn.Module): - r"""CGNet - - Parameters - ---------- - nclass : int - Number of categories for the training dataset. 
- norm_layer : object - Normalization layer used in backbone network (default: :class:`nn.BatchNorm`; - for Synchronized Cross-GPU BachNormalization). - aux : bool - Auxiliary loss. - - Reference: - Tianyi Wu, et al. "CGNet: A Light-weight Context Guided Network for Semantic Segmentation." - arXiv preprint arXiv:1811.08201 (2018). - """ - - def __init__(self, nclass, backbone='', aux=False, jpu=False, pretrained_base=True, M=3, N=21, **kwargs): - super(CGNet, self).__init__() - # stage 1 - self.stage1_0 = _ConvBNPReLU(3, 32, 3, 2, 1, **kwargs) - self.stage1_1 = _ConvBNPReLU(32, 32, 3, 1, 1, **kwargs) - self.stage1_2 = _ConvBNPReLU(32, 32, 3, 1, 1, **kwargs) - - self.sample1 = _InputInjection(1) - self.sample2 = _InputInjection(2) - self.bn_prelu1 = _BNPReLU(32 + 3, **kwargs) - - # stage 2 - self.stage2_0 = ContextGuidedBlock(32 + 3, 64, dilation=2, reduction=8, down=True, residual=False, **kwargs) - self.stage2 = nn.ModuleList() - for i in range(0, M - 1): - self.stage2.append(ContextGuidedBlock(64, 64, dilation=2, reduction=8, **kwargs)) - self.bn_prelu2 = _BNPReLU(128 + 3, **kwargs) - - # stage 3 - self.stage3_0 = ContextGuidedBlock(128 + 3, 128, dilation=4, reduction=16, down=True, residual=False, **kwargs) - self.stage3 = nn.ModuleList() - for i in range(0, N - 1): - self.stage3.append(ContextGuidedBlock(128, 128, dilation=4, reduction=16, **kwargs)) - self.bn_prelu3 = _BNPReLU(256, **kwargs) - - self.head = nn.Sequential( - nn.Dropout2d(0.1, False), - nn.Conv2d(256, nclass, 1)) - - self.__setattr__('exclusive', ['stage1_0', 'stage1_1', 'stage1_2', 'sample1', 'sample2', - 'bn_prelu1', 'stage2_0', 'stage2', 'bn_prelu2', 'stage3_0', - 'stage3', 'bn_prelu3', 'head']) - - def forward(self, x): - size = x.size()[2:] - # stage1 - out0 = self.stage1_0(x) - out0 = self.stage1_1(out0) - out0 = self.stage1_2(out0) - - inp1 = self.sample1(x) - inp2 = self.sample2(x) - - # stage 2 - out0_cat = self.bn_prelu1(torch.cat([out0, inp1], dim=1)) - out1_0 = 
self.stage2_0(out0_cat) - for i, layer in enumerate(self.stage2): - if i == 0: - out1 = layer(out1_0) - else: - out1 = layer(out1) - out1_cat = self.bn_prelu2(torch.cat([out1, out1_0, inp2], dim=1)) - - # stage 3 - out2_0 = self.stage3_0(out1_cat) - for i, layer in enumerate(self.stage3): - if i == 0: - out2 = layer(out2_0) - else: - out2 = layer(out2) - out2_cat = self.bn_prelu3(torch.cat([out2_0, out2], dim=1)) - - outputs = [] - out = self.head(out2_cat) - out = F.interpolate(out, size, mode='bilinear', align_corners=True) - outputs.append(out) - return tuple(outputs) - - -class _ChannelWiseConv(nn.Module): - def __init__(self, in_channels, out_channels, dilation=1, **kwargs): - super(_ChannelWiseConv, self).__init__() - self.conv = nn.Conv2d(in_channels, out_channels, 3, 1, dilation, dilation, groups=in_channels, bias=False) - - def forward(self, x): - x = self.conv(x) - return x - - -class _FGlo(nn.Module): - def __init__(self, in_channels, reduction=16, **kwargs): - super(_FGlo, self).__init__() - self.gap = nn.AdaptiveAvgPool2d(1) - self.fc = nn.Sequential( - nn.Linear(in_channels, in_channels // reduction), - nn.ReLU(True), - nn.Linear(in_channels // reduction, in_channels), - nn.Sigmoid()) - - def forward(self, x): - n, c, _, _ = x.size() - out = self.gap(x).view(n, c) - out = self.fc(out).view(n, c, 1, 1) - return x * out - - -class _InputInjection(nn.Module): - def __init__(self, ratio): - super(_InputInjection, self).__init__() - self.pool = nn.ModuleList() - for i in range(0, ratio): - self.pool.append(nn.AvgPool2d(3, 2, 1)) - - def forward(self, x): - for pool in self.pool: - x = pool(x) - return x - - -class _ConcatInjection(nn.Module): - def __init__(self, in_channels, norm_layer=nn.BatchNorm2d, **kwargs): - super(_ConcatInjection, self).__init__() - self.bn = norm_layer(in_channels) - self.prelu = nn.PReLU(in_channels) - - def forward(self, x1, x2): - out = torch.cat([x1, x2], dim=1) - out = self.bn(out) - out = self.prelu(out) - return out - - 
-class ContextGuidedBlock(nn.Module): - def __init__(self, in_channels, out_channels, dilation=2, reduction=16, down=False, - residual=True, norm_layer=nn.BatchNorm2d, **kwargs): - super(ContextGuidedBlock, self).__init__() - inter_channels = out_channels // 2 if not down else out_channels - if down: - self.conv = _ConvBNPReLU(in_channels, inter_channels, 3, 2, 1, norm_layer=norm_layer, **kwargs) - self.reduce = nn.Conv2d(inter_channels * 2, out_channels, 1, bias=False) - else: - self.conv = _ConvBNPReLU(in_channels, inter_channels, 1, 1, 0, norm_layer=norm_layer, **kwargs) - self.f_loc = _ChannelWiseConv(inter_channels, inter_channels, **kwargs) - self.f_sur = _ChannelWiseConv(inter_channels, inter_channels, dilation, **kwargs) - self.bn = norm_layer(inter_channels * 2) - self.prelu = nn.PReLU(inter_channels * 2) - self.f_glo = _FGlo(out_channels, reduction, **kwargs) - self.down = down - self.residual = residual - - def forward(self, x): - out = self.conv(x) - loc = self.f_loc(out) - sur = self.f_sur(out) - - joi_feat = torch.cat([loc, sur], dim=1) - joi_feat = self.prelu(self.bn(joi_feat)) - if self.down: - joi_feat = self.reduce(joi_feat) - - out = self.f_glo(joi_feat) - if self.residual: - out = out + x - - return out - - -def get_cgnet(dataset='citys', backbone='', pretrained=False, root='~/.torch/models', pretrained_base=True, **kwargs): - acronyms = { - 'pascal_voc': 'pascal_voc', - 'pascal_aug': 'pascal_aug', - 'ade20k': 'ade', - 'coco': 'coco', - 'citys': 'citys', - } - from core.data.dataloader import datasets - model = CGNet(datasets[dataset].NUM_CLASS, backbone=backbone, pretrained_base=pretrained_base, **kwargs) - if pretrained: - from .model_store import get_model_file - device = torch.device(kwargs['local_rank']) - model.load_state_dict(torch.load(get_model_file('cgnet_%s' % (acronyms[dataset]), root=root), - map_location=device)) - return model - - -def get_cgnet_citys(**kwargs): - return get_cgnet('citys', '', **kwargs) - - -if __name__ == 
'__main__': - model = get_cgnet_citys() - print(model) diff --git a/core/models/danet.py b/core/models/danet.py deleted file mode 100644 index 0e8de740b..000000000 --- a/core/models/danet.py +++ /dev/null @@ -1,215 +0,0 @@ -"""Dual Attention Network""" -import torch -import torch.nn as nn -import torch.nn.functional as F - -from .segbase import SegBaseModel - -__all__ = ['DANet', 'get_danet', 'get_danet_resnet50_citys', - 'get_danet_resnet101_citys', 'get_danet_resnet152_citys'] - - -class DANet(SegBaseModel): - r"""Pyramid Scene Parsing Network - - Parameters - ---------- - nclass : int - Number of categories for the training dataset. - backbone : string - Pre-trained dilated backbone network type (default:'resnet50'; 'resnet50', - 'resnet101' or 'resnet152'). - norm_layer : object - Normalization layer used in backbone network (default: :class:`mxnet.gluon.nn.BatchNorm`; - for Synchronized Cross-GPU BachNormalization). - aux : bool - Auxiliary loss. - Reference: - Jun Fu, Jing Liu, Haijie Tian, Yong Li, Yongjun Bao, Zhiwei Fang,and Hanqing Lu. - "Dual Attention Network for Scene Segmentation." 
*CVPR*, 2019 - """ - - def __init__(self, nclass, backbone='resnet50', aux=True, pretrained_base=True, **kwargs): - super(DANet, self).__init__(nclass, aux, backbone, pretrained_base=pretrained_base, **kwargs) - self.head = _DAHead(2048, nclass, aux, **kwargs) - - self.__setattr__('exclusive', ['head']) - - def forward(self, x): - size = x.size()[2:] - _, _, c3, c4 = self.base_forward(x) - outputs = [] - x = self.head(c4) - x0 = F.interpolate(x[0], size, mode='bilinear', align_corners=True) - outputs.append(x0) - - if self.aux: - x1 = F.interpolate(x[1], size, mode='bilinear', align_corners=True) - x2 = F.interpolate(x[2], size, mode='bilinear', align_corners=True) - outputs.append(x1) - outputs.append(x2) - return outputs - - -class _PositionAttentionModule(nn.Module): - """ Position attention module""" - - def __init__(self, in_channels, **kwargs): - super(_PositionAttentionModule, self).__init__() - self.conv_b = nn.Conv2d(in_channels, in_channels // 8, 1) - self.conv_c = nn.Conv2d(in_channels, in_channels // 8, 1) - self.conv_d = nn.Conv2d(in_channels, in_channels, 1) - self.alpha = nn.Parameter(torch.zeros(1)) - self.softmax = nn.Softmax(dim=-1) - - def forward(self, x): - batch_size, _, height, width = x.size() - feat_b = self.conv_b(x).view(batch_size, -1, height * width).permute(0, 2, 1) - feat_c = self.conv_c(x).view(batch_size, -1, height * width) - attention_s = self.softmax(torch.bmm(feat_b, feat_c)) - feat_d = self.conv_d(x).view(batch_size, -1, height * width) - feat_e = torch.bmm(feat_d, attention_s.permute(0, 2, 1)).view(batch_size, -1, height, width) - out = self.alpha * feat_e + x - - return out - - -class _ChannelAttentionModule(nn.Module): - """Channel attention module""" - - def __init__(self, **kwargs): - super(_ChannelAttentionModule, self).__init__() - self.beta = nn.Parameter(torch.zeros(1)) - self.softmax = nn.Softmax(dim=-1) - - def forward(self, x): - batch_size, _, height, width = x.size() - feat_a = x.view(batch_size, -1, height * 
width) - feat_a_transpose = x.view(batch_size, -1, height * width).permute(0, 2, 1) - attention = torch.bmm(feat_a, feat_a_transpose) - attention_new = torch.max(attention, dim=-1, keepdim=True)[0].expand_as(attention) - attention - attention = self.softmax(attention_new) - - feat_e = torch.bmm(attention, feat_a).view(batch_size, -1, height, width) - out = self.beta * feat_e + x - - return out - - -class _DAHead(nn.Module): - def __init__(self, in_channels, nclass, aux=True, norm_layer=nn.BatchNorm2d, norm_kwargs=None, **kwargs): - super(_DAHead, self).__init__() - self.aux = aux - inter_channels = in_channels // 4 - self.conv_p1 = nn.Sequential( - nn.Conv2d(in_channels, inter_channels, 3, padding=1, bias=False), - norm_layer(inter_channels, **({} if norm_kwargs is None else norm_kwargs)), - nn.ReLU(True) - ) - self.conv_c1 = nn.Sequential( - nn.Conv2d(in_channels, inter_channels, 3, padding=1, bias=False), - norm_layer(inter_channels, **({} if norm_kwargs is None else norm_kwargs)), - nn.ReLU(True) - ) - self.pam = _PositionAttentionModule(inter_channels, **kwargs) - self.cam = _ChannelAttentionModule(**kwargs) - self.conv_p2 = nn.Sequential( - nn.Conv2d(inter_channels, inter_channels, 3, padding=1, bias=False), - norm_layer(inter_channels, **({} if norm_kwargs is None else norm_kwargs)), - nn.ReLU(True) - ) - self.conv_c2 = nn.Sequential( - nn.Conv2d(inter_channels, inter_channels, 3, padding=1, bias=False), - norm_layer(inter_channels, **({} if norm_kwargs is None else norm_kwargs)), - nn.ReLU(True) - ) - self.out = nn.Sequential( - nn.Dropout(0.1), - nn.Conv2d(inter_channels, nclass, 1) - ) - if aux: - self.conv_p3 = nn.Sequential( - nn.Dropout(0.1), - nn.Conv2d(inter_channels, nclass, 1) - ) - self.conv_c3 = nn.Sequential( - nn.Dropout(0.1), - nn.Conv2d(inter_channels, nclass, 1) - ) - - def forward(self, x): - feat_p = self.conv_p1(x) - feat_p = self.pam(feat_p) - feat_p = self.conv_p2(feat_p) - - feat_c = self.conv_c1(x) - feat_c = self.cam(feat_c) - feat_c 
= self.conv_c2(feat_c) - - feat_fusion = feat_p + feat_c - - outputs = [] - fusion_out = self.out(feat_fusion) - outputs.append(fusion_out) - if self.aux: - p_out = self.conv_p3(feat_p) - c_out = self.conv_c3(feat_c) - outputs.append(p_out) - outputs.append(c_out) - - return tuple(outputs) - - -def get_danet(dataset='citys', backbone='resnet50', pretrained=False, - root='~/.torch/models', pretrained_base=True, **kwargs): - r"""Dual Attention Network - - Parameters - ---------- - dataset : str, default pascal_voc - The dataset that model pretrained on. (pascal_voc, ade20k) - pretrained : bool or str - Boolean value controls whether to load the default pretrained weights for model. - String value represents the hashtag for a certain version of pretrained weights. - root : str, default '~/.torch/models' - Location for keeping the model parameters. - pretrained_base : bool or str, default True - This will load pretrained backbone network, that was trained on ImageNet. - Examples - -------- - >>> model = get_danet(dataset='pascal_voc', backbone='resnet50', pretrained=False) - >>> print(model) - """ - acronyms = { - 'pascal_voc': 'pascal_voc', - 'pascal_aug': 'pascal_aug', - 'ade20k': 'ade', - 'coco': 'coco', - 'citys': 'citys', - } - from ..data.dataloader import datasets - model = DANet(datasets[dataset].NUM_CLASS, backbone=backbone, pretrained_base=pretrained_base, **kwargs) - if pretrained: - from .model_store import get_model_file - device = torch.device(kwargs['local_rank']) - model.load_state_dict(torch.load(get_model_file('danet_%s_%s' % (backbone, acronyms[dataset]), root=root), - map_location=device)) - return model - - -def get_danet_resnet50_citys(**kwargs): - return get_danet('citys', 'resnet50', **kwargs) - - -def get_danet_resnet101_citys(**kwargs): - return get_danet('citys', 'resnet101', **kwargs) - - -def get_danet_resnet152_citys(**kwargs): - return get_danet('citys', 'resnet152', **kwargs) - - -if __name__ == '__main__': - img = torch.randn(2, 3, 480, 
480) - model = get_danet_resnet50_citys() - outputs = model(img) diff --git a/core/models/deeplabv3.py b/core/models/deeplabv3.py deleted file mode 100644 index 98d0c02a3..000000000 --- a/core/models/deeplabv3.py +++ /dev/null @@ -1,185 +0,0 @@ -"""Pyramid Scene Parsing Network""" -import torch -import torch.nn as nn -import torch.nn.functional as F - -from .segbase import SegBaseModel -from .fcn import _FCNHead - -__all__ = ['DeepLabV3', 'get_deeplabv3', 'get_deeplabv3_resnet50_voc', 'get_deeplabv3_resnet101_voc', - 'get_deeplabv3_resnet152_voc', 'get_deeplabv3_resnet50_ade', 'get_deeplabv3_resnet101_ade', - 'get_deeplabv3_resnet152_ade'] - - -class DeepLabV3(SegBaseModel): - r"""DeepLabV3 - - Parameters - ---------- - nclass : int - Number of categories for the training dataset. - backbone : string - Pre-trained dilated backbone network type (default:'resnet50'; 'resnet50', - 'resnet101' or 'resnet152'). - norm_layer : object - Normalization layer used in backbone network (default: :class:`nn.BatchNorm`; - for Synchronized Cross-GPU BachNormalization). - aux : bool - Auxiliary loss. - - Reference: - Chen, Liang-Chieh, et al. "Rethinking atrous convolution for semantic image segmentation." - arXiv preprint arXiv:1706.05587 (2017). 
- """ - - def __init__(self, nclass, backbone='resnet50', aux=False, pretrained_base=True, **kwargs): - super(DeepLabV3, self).__init__(nclass, aux, backbone, pretrained_base=pretrained_base, **kwargs) - self.head = _DeepLabHead(nclass, **kwargs) - if self.aux: - self.auxlayer = _FCNHead(1024, nclass, **kwargs) - - self.__setattr__('exclusive', ['head', 'auxlayer'] if aux else ['head']) - - def forward(self, x): - size = x.size()[2:] - _, _, c3, c4 = self.base_forward(x) - outputs = [] - x = self.head(c4) - x = F.interpolate(x, size, mode='bilinear', align_corners=True) - outputs.append(x) - - if self.aux: - auxout = self.auxlayer(c3) - auxout = F.interpolate(auxout, size, mode='bilinear', align_corners=True) - outputs.append(auxout) - return tuple(outputs) - - -class _DeepLabHead(nn.Module): - def __init__(self, nclass, norm_layer=nn.BatchNorm2d, norm_kwargs=None, **kwargs): - super(_DeepLabHead, self).__init__() - self.aspp = _ASPP(2048, [12, 24, 36], norm_layer=norm_layer, norm_kwargs=norm_kwargs, **kwargs) - self.block = nn.Sequential( - nn.Conv2d(256, 256, 3, padding=1, bias=False), - norm_layer(256, **({} if norm_kwargs is None else norm_kwargs)), - nn.ReLU(True), - nn.Dropout(0.1), - nn.Conv2d(256, nclass, 1) - ) - - def forward(self, x): - x = self.aspp(x) - return self.block(x) - - -class _ASPPConv(nn.Module): - def __init__(self, in_channels, out_channels, atrous_rate, norm_layer, norm_kwargs): - super(_ASPPConv, self).__init__() - self.block = nn.Sequential( - nn.Conv2d(in_channels, out_channels, 3, padding=atrous_rate, dilation=atrous_rate, bias=False), - norm_layer(out_channels, **({} if norm_kwargs is None else norm_kwargs)), - nn.ReLU(True) - ) - - def forward(self, x): - return self.block(x) - - -class _AsppPooling(nn.Module): - def __init__(self, in_channels, out_channels, norm_layer, norm_kwargs, **kwargs): - super(_AsppPooling, self).__init__() - self.gap = nn.Sequential( - nn.AdaptiveAvgPool2d(1), - nn.Conv2d(in_channels, out_channels, 1, 
bias=False), - norm_layer(out_channels, **({} if norm_kwargs is None else norm_kwargs)), - nn.ReLU(True) - ) - - def forward(self, x): - size = x.size()[2:] - pool = self.gap(x) - out = F.interpolate(pool, size, mode='bilinear', align_corners=True) - return out - - -class _ASPP(nn.Module): - def __init__(self, in_channels, atrous_rates, norm_layer, norm_kwargs, **kwargs): - super(_ASPP, self).__init__() - out_channels = 256 - self.b0 = nn.Sequential( - nn.Conv2d(in_channels, out_channels, 1, bias=False), - norm_layer(out_channels, **({} if norm_kwargs is None else norm_kwargs)), - nn.ReLU(True) - ) - - rate1, rate2, rate3 = tuple(atrous_rates) - self.b1 = _ASPPConv(in_channels, out_channels, rate1, norm_layer, norm_kwargs) - self.b2 = _ASPPConv(in_channels, out_channels, rate2, norm_layer, norm_kwargs) - self.b3 = _ASPPConv(in_channels, out_channels, rate3, norm_layer, norm_kwargs) - self.b4 = _AsppPooling(in_channels, out_channels, norm_layer=norm_layer, norm_kwargs=norm_kwargs) - - self.project = nn.Sequential( - nn.Conv2d(5 * out_channels, out_channels, 1, bias=False), - norm_layer(out_channels, **({} if norm_kwargs is None else norm_kwargs)), - nn.ReLU(True), - nn.Dropout(0.5) - ) - - def forward(self, x): - feat1 = self.b0(x) - feat2 = self.b1(x) - feat3 = self.b2(x) - feat4 = self.b3(x) - feat5 = self.b4(x) - x = torch.cat((feat1, feat2, feat3, feat4, feat5), dim=1) - x = self.project(x) - return x - - -def get_deeplabv3(dataset='pascal_voc', backbone='resnet50', pretrained=False, root='~/.torch/models', - pretrained_base=True, **kwargs): - acronyms = { - 'pascal_voc': 'pascal_voc', - 'pascal_aug': 'pascal_aug', - 'ade20k': 'ade', - 'coco': 'coco', - 'citys': 'citys', - } - from ..data.dataloader import datasets - model = DeepLabV3(datasets[dataset].NUM_CLASS, backbone=backbone, pretrained_base=pretrained_base, **kwargs) - if pretrained: - from .model_store import get_model_file - device = torch.device(kwargs['local_rank']) - 
model.load_state_dict(torch.load(get_model_file('deeplabv3_%s_%s' % (backbone, acronyms[dataset]), root=root), - map_location=device)) - return model - - -def get_deeplabv3_resnet50_voc(**kwargs): - return get_deeplabv3('pascal_voc', 'resnet50', **kwargs) - - -def get_deeplabv3_resnet101_voc(**kwargs): - return get_deeplabv3('pascal_voc', 'resnet101', **kwargs) - - -def get_deeplabv3_resnet152_voc(**kwargs): - return get_deeplabv3('pascal_voc', 'resnet152', **kwargs) - - -def get_deeplabv3_resnet50_ade(**kwargs): - return get_deeplabv3('ade20k', 'resnet50', **kwargs) - - -def get_deeplabv3_resnet101_ade(**kwargs): - return get_deeplabv3('ade20k', 'resnet101', **kwargs) - - -def get_deeplabv3_resnet152_ade(**kwargs): - return get_deeplabv3('ade20k', 'resnet152', **kwargs) - - -if __name__ == '__main__': - model = get_deeplabv3_resnet50_voc() - img = torch.randn(2, 3, 480, 480) - output = model(img) diff --git a/core/models/deeplabv3_plus.py b/core/models/deeplabv3_plus.py deleted file mode 100644 index 9b5a70355..000000000 --- a/core/models/deeplabv3_plus.py +++ /dev/null @@ -1,142 +0,0 @@ -import torch -import torch.nn as nn -import torch.nn.functional as F - -from .base_models.xception import get_xception -from .deeplabv3 import _ASPP -from .fcn import _FCNHead -from ..nn import _ConvBNReLU - -__all__ = ['DeepLabV3Plus', 'get_deeplabv3_plus', 'get_deeplabv3_plus_xception_voc'] - - -class DeepLabV3Plus(nn.Module): - r"""DeepLabV3Plus - Parameters - ---------- - nclass : int - Number of categories for the training dataset. - backbone : string - Pre-trained dilated backbone network type (default:'xception'). - norm_layer : object - Normalization layer used in backbone network (default: :class:`nn.BatchNorm`; - for Synchronized Cross-GPU BachNormalization). - aux : bool - Auxiliary loss. - - Reference: - Chen, Liang-Chieh, et al. "Encoder-Decoder with Atrous Separable Convolution for Semantic - Image Segmentation." 
- """ - - def __init__(self, nclass, backbone='xception', aux=True, pretrained_base=True, dilated=True, **kwargs): - super(DeepLabV3Plus, self).__init__() - self.aux = aux - self.nclass = nclass - output_stride = 8 if dilated else 32 - - self.pretrained = get_xception(pretrained=pretrained_base, output_stride=output_stride, **kwargs) - - # deeplabv3 plus - self.head = _DeepLabHead(nclass, **kwargs) - if aux: - self.auxlayer = _FCNHead(728, nclass, **kwargs) - - def base_forward(self, x): - # Entry flow - x = self.pretrained.conv1(x) - x = self.pretrained.bn1(x) - x = self.pretrained.relu(x) - - x = self.pretrained.conv2(x) - x = self.pretrained.bn2(x) - x = self.pretrained.relu(x) - - x = self.pretrained.block1(x) - # add relu here - x = self.pretrained.relu(x) - low_level_feat = x - - x = self.pretrained.block2(x) - x = self.pretrained.block3(x) - - # Middle flow - x = self.pretrained.midflow(x) - mid_level_feat = x - - # Exit flow - x = self.pretrained.block20(x) - x = self.pretrained.relu(x) - x = self.pretrained.conv3(x) - x = self.pretrained.bn3(x) - x = self.pretrained.relu(x) - - x = self.pretrained.conv4(x) - x = self.pretrained.bn4(x) - x = self.pretrained.relu(x) - - x = self.pretrained.conv5(x) - x = self.pretrained.bn5(x) - x = self.pretrained.relu(x) - return low_level_feat, mid_level_feat, x - - def forward(self, x): - size = x.size()[2:] - c1, c3, c4 = self.base_forward(x) - outputs = list() - x = self.head(c4, c1) - x = F.interpolate(x, size, mode='bilinear', align_corners=True) - outputs.append(x) - if self.aux: - auxout = self.auxlayer(c3) - auxout = F.interpolate(auxout, size, mode='bilinear', align_corners=True) - outputs.append(auxout) - return tuple(outputs) - - -class _DeepLabHead(nn.Module): - def __init__(self, nclass, c1_channels=128, norm_layer=nn.BatchNorm2d, **kwargs): - super(_DeepLabHead, self).__init__() - self.aspp = _ASPP(2048, [12, 24, 36], norm_layer=norm_layer, **kwargs) - self.c1_block = _ConvBNReLU(c1_channels, 48, 3, 
padding=1, norm_layer=norm_layer) - self.block = nn.Sequential( - _ConvBNReLU(304, 256, 3, padding=1, norm_layer=norm_layer), - nn.Dropout(0.5), - _ConvBNReLU(256, 256, 3, padding=1, norm_layer=norm_layer), - nn.Dropout(0.1), - nn.Conv2d(256, nclass, 1)) - - def forward(self, x, c1): - size = c1.size()[2:] - c1 = self.c1_block(c1) - x = self.aspp(x) - x = F.interpolate(x, size, mode='bilinear', align_corners=True) - return self.block(torch.cat([x, c1], dim=1)) - - -def get_deeplabv3_plus(dataset='pascal_voc', backbone='xception', pretrained=False, root='~/.torch/models', - pretrained_base=True, **kwargs): - acronyms = { - 'pascal_voc': 'pascal_voc', - 'pascal_aug': 'pascal_aug', - 'ade20k': 'ade', - 'coco': 'coco', - 'citys': 'citys', - } - from ..data.dataloader import datasets - model = DeepLabV3Plus(datasets[dataset].NUM_CLASS, backbone=backbone, pretrained_base=pretrained_base, **kwargs) - if pretrained: - from .model_store import get_model_file - device = torch.device(kwargs['local_rank']) - model.load_state_dict( - torch.load(get_model_file('deeplabv3_plus_%s_%s' % (backbone, acronyms[dataset]), root=root), - map_location=device)) - return model - - -def get_deeplabv3_plus_xception_voc(**kwargs): - return get_deeplabv3_plus('pascal_voc', 'xception', **kwargs) - - -if __name__ == '__main__': - model = get_deeplabv3_plus_xception_voc() diff --git a/core/models/denseaspp.py b/core/models/denseaspp.py deleted file mode 100644 index bc0ef927b..000000000 --- a/core/models/denseaspp.py +++ /dev/null @@ -1,178 +0,0 @@ -import torch -import torch.nn as nn -import torch.nn.functional as F - -from .base_models.densenet import * -from .fcn import _FCNHead - -__all__ = ['DenseASPP', 'get_denseaspp', 'get_denseaspp_densenet121_citys', - 'get_denseaspp_densenet161_citys', 'get_denseaspp_densenet169_citys', 'get_denseaspp_densenet201_citys'] - - -class DenseASPP(nn.Module): - def __init__(self, nclass, backbone='densenet121', aux=False, jpu=False, - pretrained_base=True, 
dilate_scale=8, **kwargs): - super(DenseASPP, self).__init__() - self.nclass = nclass - self.aux = aux - self.dilate_scale = dilate_scale - if backbone == 'densenet121': - self.pretrained = dilated_densenet121(dilate_scale, pretrained=pretrained_base, **kwargs) - elif backbone == 'densenet161': - self.pretrained = dilated_densenet161(dilate_scale, pretrained=pretrained_base, **kwargs) - elif backbone == 'densenet169': - self.pretrained = dilated_densenet169(dilate_scale, pretrained=pretrained_base, **kwargs) - elif backbone == 'densenet201': - self.pretrained = dilated_densenet201(dilate_scale, pretrained=pretrained_base, **kwargs) - else: - raise RuntimeError('unknown backbone: {}'.format(backbone)) - in_channels = self.pretrained.num_features - - self.head = _DenseASPPHead(in_channels, nclass) - - if aux: - self.auxlayer = _FCNHead(in_channels, nclass, **kwargs) - - self.__setattr__('exclusive', ['head', 'auxlayer'] if aux else ['head']) - - def forward(self, x): - size = x.size()[2:] - features = self.pretrained.features(x) - if self.dilate_scale > 8: - features = F.interpolate(features, scale_factor=2, mode='bilinear', align_corners=True) - outputs = [] - x = self.head(features) - x = F.interpolate(x, size, mode='bilinear', align_corners=True) - outputs.append(x) - - if self.aux: - auxout = self.auxlayer(features) - auxout = F.interpolate(auxout, size, mode='bilinear', align_corners=True) - outputs.append(auxout) - return tuple(outputs) - - -class _DenseASPPHead(nn.Module): - def __init__(self, in_channels, nclass, norm_layer=nn.BatchNorm2d, norm_kwargs=None, **kwargs): - super(_DenseASPPHead, self).__init__() - self.dense_aspp_block = _DenseASPPBlock(in_channels, 256, 64, norm_layer, norm_kwargs) - self.block = nn.Sequential( - nn.Dropout(0.1), - nn.Conv2d(in_channels + 5 * 64, nclass, 1) - ) - - def forward(self, x): - x = self.dense_aspp_block(x) - return self.block(x) - - -class _DenseASPPConv(nn.Sequential): - def __init__(self, in_channels, 
inter_channels, out_channels, atrous_rate, - drop_rate=0.1, norm_layer=nn.BatchNorm2d, norm_kwargs=None): - super(_DenseASPPConv, self).__init__() - self.add_module('conv1', nn.Conv2d(in_channels, inter_channels, 1)), - self.add_module('bn1', norm_layer(inter_channels, **({} if norm_kwargs is None else norm_kwargs))), - self.add_module('relu1', nn.ReLU(True)), - self.add_module('conv2', nn.Conv2d(inter_channels, out_channels, 3, dilation=atrous_rate, padding=atrous_rate)), - self.add_module('bn2', norm_layer(out_channels, **({} if norm_kwargs is None else norm_kwargs))), - self.add_module('relu2', nn.ReLU(True)), - self.drop_rate = drop_rate - - def forward(self, x): - features = super(_DenseASPPConv, self).forward(x) - if self.drop_rate > 0: - features = F.dropout(features, p=self.drop_rate, training=self.training) - return features - - -class _DenseASPPBlock(nn.Module): - def __init__(self, in_channels, inter_channels1, inter_channels2, - norm_layer=nn.BatchNorm2d, norm_kwargs=None): - super(_DenseASPPBlock, self).__init__() - self.aspp_3 = _DenseASPPConv(in_channels, inter_channels1, inter_channels2, 3, 0.1, - norm_layer, norm_kwargs) - self.aspp_6 = _DenseASPPConv(in_channels + inter_channels2 * 1, inter_channels1, inter_channels2, 6, 0.1, - norm_layer, norm_kwargs) - self.aspp_12 = _DenseASPPConv(in_channels + inter_channels2 * 2, inter_channels1, inter_channels2, 12, 0.1, - norm_layer, norm_kwargs) - self.aspp_18 = _DenseASPPConv(in_channels + inter_channels2 * 3, inter_channels1, inter_channels2, 18, 0.1, - norm_layer, norm_kwargs) - self.aspp_24 = _DenseASPPConv(in_channels + inter_channels2 * 4, inter_channels1, inter_channels2, 24, 0.1, - norm_layer, norm_kwargs) - - def forward(self, x): - aspp3 = self.aspp_3(x) - x = torch.cat([aspp3, x], dim=1) - - aspp6 = self.aspp_6(x) - x = torch.cat([aspp6, x], dim=1) - - aspp12 = self.aspp_12(x) - x = torch.cat([aspp12, x], dim=1) - - aspp18 = self.aspp_18(x) - x = torch.cat([aspp18, x], dim=1) - - aspp24 = 
self.aspp_24(x) - x = torch.cat([aspp24, x], dim=1) - - return x - - -def get_denseaspp(dataset='citys', backbone='densenet121', pretrained=False, - root='~/.torch/models', pretrained_base=True, **kwargs): - r"""DenseASPP - - Parameters - ---------- - dataset : str, default citys - The dataset that model pretrained on. (pascal_voc, ade20k) - pretrained : bool or str - Boolean value controls whether to load the default pretrained weights for model. - String value represents the hashtag for a certain version of pretrained weights. - root : str, default '~/.torch/models' - Location for keeping the model parameters. - pretrained_base : bool or str, default True - This will load pretrained backbone network, that was trained on ImageNet. - Examples - -------- - >>> model = get_denseaspp(dataset='citys', backbone='densenet121', pretrained=False) - >>> print(model) - """ - acronyms = { - 'pascal_voc': 'pascal_voc', - 'pascal_aug': 'pascal_aug', - 'ade20k': 'ade', - 'coco': 'coco', - 'citys': 'citys', - } - from ..data.dataloader import datasets - model = DenseASPP(datasets[dataset].NUM_CLASS, backbone=backbone, pretrained_base=pretrained_base, **kwargs) - if pretrained: - from .model_store import get_model_file - device = torch.device(kwargs['local_rank']) - model.load_state_dict(torch.load(get_model_file('denseaspp_%s_%s' % (backbone, acronyms[dataset]), root=root), - map_location=device)) - return model - - -def get_denseaspp_densenet121_citys(**kwargs): - return get_denseaspp('citys', 'densenet121', **kwargs) - - -def get_denseaspp_densenet161_citys(**kwargs): - return get_denseaspp('citys', 'densenet161', **kwargs) - - -def get_denseaspp_densenet169_citys(**kwargs): - return get_denseaspp('citys', 'densenet169', **kwargs) - - -def get_denseaspp_densenet201_citys(**kwargs): - return get_denseaspp('citys', 'densenet201', **kwargs) - - -if __name__ == '__main__': - img = torch.randn(2, 3, 480, 480) - model = get_denseaspp_densenet121_citys() - outputs = model(img) diff 
--git a/core/models/dfanet.py b/core/models/dfanet.py deleted file mode 100644 index dd43bff0f..000000000 --- a/core/models/dfanet.py +++ /dev/null @@ -1,111 +0,0 @@ -""" Deep Feature Aggregation""" -import torch -import torch.nn as nn -import torch.nn.functional as F - -from core.models.base_models import Enc, FCAttention, get_xception_a -from core.nn import _ConvBNReLU - -__all__ = ['DFANet', 'get_dfanet', 'get_dfanet_citys'] - - -class DFANet(nn.Module): - def __init__(self, nclass, backbone='', aux=False, jpu=False, pretrained_base=False, **kwargs): - super(DFANet, self).__init__() - self.pretrained = get_xception_a(pretrained_base, **kwargs) - - self.enc2_2 = Enc(240, 48, 4, **kwargs) - self.enc3_2 = Enc(144, 96, 6, **kwargs) - self.enc4_2 = Enc(288, 192, 4, **kwargs) - self.fca_2 = FCAttention(192, **kwargs) - - self.enc2_3 = Enc(240, 48, 4, **kwargs) - self.enc3_3 = Enc(144, 96, 6, **kwargs) - self.enc3_4 = Enc(288, 192, 4, **kwargs) - self.fca_3 = FCAttention(192, **kwargs) - - self.enc2_1_reduce = _ConvBNReLU(48, 32, 1, **kwargs) - self.enc2_2_reduce = _ConvBNReLU(48, 32, 1, **kwargs) - self.enc2_3_reduce = _ConvBNReLU(48, 32, 1, **kwargs) - self.conv_fusion = _ConvBNReLU(32, 32, 1, **kwargs) - - self.fca_1_reduce = _ConvBNReLU(192, 32, 1, **kwargs) - self.fca_2_reduce = _ConvBNReLU(192, 32, 1, **kwargs) - self.fca_3_reduce = _ConvBNReLU(192, 32, 1, **kwargs) - self.conv_out = nn.Conv2d(32, nclass, 1) - - self.__setattr__('exclusive', ['enc2_2', 'enc3_2', 'enc4_2', 'fca_2', 'enc2_3', 'enc3_3', 'enc3_4', 'fca_3', - 'enc2_1_reduce', 'enc2_2_reduce', 'enc2_3_reduce', 'conv_fusion', 'fca_1_reduce', - 'fca_2_reduce', 'fca_3_reduce', 'conv_out']) - - def forward(self, x): - # backbone - stage1_conv1 = self.pretrained.conv1(x) - stage1_enc2 = self.pretrained.enc2(stage1_conv1) - stage1_enc3 = self.pretrained.enc3(stage1_enc2) - stage1_enc4 = self.pretrained.enc4(stage1_enc3) - stage1_fca = self.pretrained.fca(stage1_enc4) - stage1_out = F.interpolate(stage1_fca, 
scale_factor=4, mode='bilinear', align_corners=True) - - # stage2 - stage2_enc2 = self.enc2_2(torch.cat([stage1_enc2, stage1_out], dim=1)) - stage2_enc3 = self.enc3_2(torch.cat([stage1_enc3, stage2_enc2], dim=1)) - stage2_enc4 = self.enc4_2(torch.cat([stage1_enc4, stage2_enc3], dim=1)) - stage2_fca = self.fca_2(stage2_enc4) - stage2_out = F.interpolate(stage2_fca, scale_factor=4, mode='bilinear', align_corners=True) - - # stage3 - stage3_enc2 = self.enc2_3(torch.cat([stage2_enc2, stage2_out], dim=1)) - stage3_enc3 = self.enc3_3(torch.cat([stage2_enc3, stage3_enc2], dim=1)) - stage3_enc4 = self.enc3_4(torch.cat([stage2_enc4, stage3_enc3], dim=1)) - stage3_fca = self.fca_3(stage3_enc4) - - stage1_enc2_decoder = self.enc2_1_reduce(stage1_enc2) - stage2_enc2_docoder = F.interpolate(self.enc2_2_reduce(stage2_enc2), scale_factor=2, - mode='bilinear', align_corners=True) - stage3_enc2_decoder = F.interpolate(self.enc2_3_reduce(stage3_enc2), scale_factor=4, - mode='bilinear', align_corners=True) - fusion = stage1_enc2_decoder + stage2_enc2_docoder + stage3_enc2_decoder - fusion = self.conv_fusion(fusion) - - stage1_fca_decoder = F.interpolate(self.fca_1_reduce(stage1_fca), scale_factor=4, - mode='bilinear', align_corners=True) - stage2_fca_decoder = F.interpolate(self.fca_2_reduce(stage2_fca), scale_factor=8, - mode='bilinear', align_corners=True) - stage3_fca_decoder = F.interpolate(self.fca_3_reduce(stage3_fca), scale_factor=16, - mode='bilinear', align_corners=True) - fusion = fusion + stage1_fca_decoder + stage2_fca_decoder + stage3_fca_decoder - - outputs = list() - out = self.conv_out(fusion) - out = F.interpolate(out, scale_factor=4, mode='bilinear', align_corners=True) - outputs.append(out) - - return tuple(outputs) - - -def get_dfanet(dataset='citys', backbone='', pretrained=False, root='~/.torch/models', - pretrained_base=True, **kwargs): - acronyms = { - 'pascal_voc': 'pascal_voc', - 'pascal_aug': 'pascal_aug', - 'ade20k': 'ade', - 'coco': 'coco', - 'citys': 
'citys', - } - from ..data.dataloader import datasets - model = DFANet(datasets[dataset].NUM_CLASS, backbone=backbone, pretrained_base=pretrained_base, **kwargs) - if pretrained: - from .model_store import get_model_file - device = torch.device(kwargs['local_rank']) - model.load_state_dict(torch.load(get_model_file('dfanet_%s' % (acronyms[dataset]), root=root), - map_location=device)) - return model - - -def get_dfanet_citys(**kwargs): - return get_dfanet('citys', **kwargs) - - -if __name__ == '__main__': - model = get_dfanet_citys() diff --git a/core/models/dunet.py b/core/models/dunet.py deleted file mode 100644 index ed1eb9cb1..000000000 --- a/core/models/dunet.py +++ /dev/null @@ -1,155 +0,0 @@ -"""Decoders Matter for Semantic Segmentation""" -import torch -import torch.nn as nn -import torch.nn.functional as F - -from .segbase import SegBaseModel -from .fcn import _FCNHead - -__all__ = ['DUNet', 'get_dunet', 'get_dunet_resnet50_pascal_voc', - 'get_dunet_resnet101_pascal_voc', 'get_dunet_resnet152_pascal_voc'] - - -# The model may be wrong because lots of details missing in paper. -class DUNet(SegBaseModel): - """Decoders Matter for Semantic Segmentation - - Reference: - Zhi Tian, Tong He, Chunhua Shen, and Youliang Yan. - "Decoders Matter for Semantic Segmentation: - Data-Dependent Decoding Enables Flexible Feature Aggregation." 
CVPR, 2019 - """ - - def __init__(self, nclass, backbone='resnet50', aux=True, pretrained_base=True, **kwargs): - super(DUNet, self).__init__(nclass, aux, backbone, pretrained_base=pretrained_base, **kwargs) - self.head = _DUHead(2144, **kwargs) - self.dupsample = DUpsampling(256, nclass, scale_factor=8, **kwargs) - if aux: - self.auxlayer = _FCNHead(1024, 256, **kwargs) - self.aux_dupsample = DUpsampling(256, nclass, scale_factor=8, **kwargs) - - self.__setattr__('exclusive', - ['dupsample', 'head', 'auxlayer', 'aux_dupsample'] if aux else ['dupsample', 'head']) - - def forward(self, x): - c1, c2, c3, c4 = self.base_forward(x) - outputs = [] - x = self.head(c2, c3, c4) - x = self.dupsample(x) - outputs.append(x) - - if self.aux: - auxout = self.auxlayer(c3) - auxout = self.aux_dupsample(auxout) - outputs.append(auxout) - return tuple(outputs) - - -class FeatureFused(nn.Module): - """Module for fused features""" - - def __init__(self, inter_channels=48, norm_layer=nn.BatchNorm2d, **kwargs): - super(FeatureFused, self).__init__() - self.conv2 = nn.Sequential( - nn.Conv2d(512, inter_channels, 1, bias=False), - norm_layer(inter_channels), - nn.ReLU(True) - ) - self.conv3 = nn.Sequential( - nn.Conv2d(1024, inter_channels, 1, bias=False), - norm_layer(inter_channels), - nn.ReLU(True) - ) - - def forward(self, c2, c3, c4): - size = c4.size()[2:] - c2 = self.conv2(F.interpolate(c2, size, mode='bilinear', align_corners=True)) - c3 = self.conv3(F.interpolate(c3, size, mode='bilinear', align_corners=True)) - fused_feature = torch.cat([c4, c3, c2], dim=1) - return fused_feature - - -class _DUHead(nn.Module): - def __init__(self, in_channels, norm_layer=nn.BatchNorm2d, **kwargs): - super(_DUHead, self).__init__() - self.fuse = FeatureFused(norm_layer=norm_layer, **kwargs) - self.block = nn.Sequential( - nn.Conv2d(in_channels, 256, 3, padding=1, bias=False), - norm_layer(256), - nn.ReLU(True), - nn.Conv2d(256, 256, 3, padding=1, bias=False), - norm_layer(256), - nn.ReLU(True) - 
) - - def forward(self, c2, c3, c4): - fused_feature = self.fuse(c2, c3, c4) - out = self.block(fused_feature) - return out - - -class DUpsampling(nn.Module): - """DUsampling module""" - - def __init__(self, in_channels, out_channels, scale_factor=2, **kwargs): - super(DUpsampling, self).__init__() - self.scale_factor = scale_factor - self.conv_w = nn.Conv2d(in_channels, out_channels * scale_factor * scale_factor, 1, bias=False) - - def forward(self, x): - x = self.conv_w(x) - n, c, h, w = x.size() - - # N, C, H, W --> N, W, H, C - x = x.permute(0, 3, 2, 1).contiguous() - - # N, W, H, C --> N, W, H * scale, C // scale - x = x.view(n, w, h * self.scale_factor, c // self.scale_factor) - - # N, W, H * scale, C // scale --> N, H * scale, W, C // scale - x = x.permute(0, 2, 1, 3).contiguous() - - # N, H * scale, W, C // scale --> N, H * scale, W * scale, C // (scale ** 2) - x = x.view(n, h * self.scale_factor, w * self.scale_factor, c // (self.scale_factor * self.scale_factor)) - - # N, H * scale, W * scale, C // (scale ** 2) -- > N, C // (scale ** 2), H * scale, W * scale - x = x.permute(0, 3, 1, 2) - - return x - - -def get_dunet(dataset='pascal_voc', backbone='resnet50', pretrained=False, - root='~/.torch/models', pretrained_base=True, **kwargs): - acronyms = { - 'pascal_voc': 'pascal_voc', - 'pascal_aug': 'pascal_aug', - 'ade20k': 'ade', - 'coco': 'coco', - 'citys': 'citys', - } - from ..data.dataloader import datasets - model = DUNet(datasets[dataset].NUM_CLASS, backbone=backbone, pretrained_base=pretrained_base, **kwargs) - if pretrained: - from .model_store import get_model_file - device = torch.device(kwargs['local_rank']) - model.load_state_dict(torch.load(get_model_file('dunet_%s_%s' % (backbone, acronyms[dataset]), root=root), - map_location=device)) - return model - - -def get_dunet_resnet50_pascal_voc(**kwargs): - return get_dunet('pascal_voc', 'resnet50', **kwargs) - - -def get_dunet_resnet101_pascal_voc(**kwargs): - return get_dunet('pascal_voc', 
'resnet101', **kwargs) - - -def get_dunet_resnet152_pascal_voc(**kwargs): - return get_dunet('pascal_voc', 'resnet152', **kwargs) - - -if __name__ == '__main__': - img = torch.randn(2, 3, 256, 256) - model = get_dunet_resnet50_pascal_voc() - outputs = model(img) diff --git a/core/models/encnet.py b/core/models/encnet.py deleted file mode 100644 index 585557bde..000000000 --- a/core/models/encnet.py +++ /dev/null @@ -1,212 +0,0 @@ -"""Context Encoding for Semantic Segmentation""" -import torch -import torch.nn as nn -import torch.nn.functional as F - -from .segbase import SegBaseModel -from .fcn import _FCNHead - -__all__ = ['EncNet', 'EncModule', 'get_encnet', 'get_encnet_resnet50_ade', - 'get_encnet_resnet101_ade', 'get_encnet_resnet152_ade'] - - -class EncNet(SegBaseModel): - def __init__(self, nclass, backbone='resnet50', aux=True, se_loss=True, lateral=False, - pretrained_base=True, **kwargs): - super(EncNet, self).__init__(nclass, aux, backbone, pretrained_base=pretrained_base, **kwargs) - self.head = _EncHead(2048, nclass, se_loss=se_loss, lateral=lateral, **kwargs) - if aux: - self.auxlayer = _FCNHead(1024, nclass, **kwargs) - - self.__setattr__('exclusive', ['head', 'auxlayer'] if aux else ['head']) - - def forward(self, x): - size = x.size()[2:] - features = self.base_forward(x) - - x = list(self.head(*features)) - x[0] = F.interpolate(x[0], size, mode='bilinear', align_corners=True) - if self.aux: - auxout = self.auxlayer(features[2]) - auxout = F.interpolate(auxout, size, mode='bilinear', align_corners=True) - x.append(auxout) - return tuple(x) - - -class _EncHead(nn.Module): - def __init__(self, in_channels, nclass, se_loss=True, lateral=True, - norm_layer=nn.BatchNorm2d, norm_kwargs=None, **kwargs): - super(_EncHead, self).__init__() - self.lateral = lateral - self.conv5 = nn.Sequential( - nn.Conv2d(in_channels, 512, 3, padding=1, bias=False), - norm_layer(512, **({} if norm_kwargs is None else norm_kwargs)), - nn.ReLU(True) - ) - if lateral: - 
self.connect = nn.ModuleList([ - nn.Sequential( - nn.Conv2d(512, 512, 1, bias=False), - norm_layer(512, **({} if norm_kwargs is None else norm_kwargs)), - nn.ReLU(True)), - nn.Sequential( - nn.Conv2d(1024, 512, 1, bias=False), - norm_layer(512, **({} if norm_kwargs is None else norm_kwargs)), - nn.ReLU(True)), - ]) - self.fusion = nn.Sequential( - nn.Conv2d(3 * 512, 512, 3, padding=1, bias=False), - norm_layer(512, **({} if norm_kwargs is None else norm_kwargs)), - nn.ReLU(True) - ) - self.encmodule = EncModule(512, nclass, ncodes=32, se_loss=se_loss, - norm_layer=norm_layer, norm_kwargs=norm_kwargs, **kwargs) - self.conv6 = nn.Sequential( - nn.Dropout(0.1, False), - nn.Conv2d(512, nclass, 1) - ) - - def forward(self, *inputs): - feat = self.conv5(inputs[-1]) - if self.lateral: - c2 = self.connect[0](inputs[1]) - c3 = self.connect[1](inputs[2]) - feat = self.fusion(torch.cat([feat, c2, c3], 1)) - outs = list(self.encmodule(feat)) - outs[0] = self.conv6(outs[0]) - return tuple(outs) - - -class EncModule(nn.Module): - def __init__(self, in_channels, nclass, ncodes=32, se_loss=True, - norm_layer=nn.BatchNorm2d, norm_kwargs=None, **kwargs): - super(EncModule, self).__init__() - self.se_loss = se_loss - self.encoding = nn.Sequential( - nn.Conv2d(in_channels, in_channels, 1, bias=False), - norm_layer(in_channels, **({} if norm_kwargs is None else norm_kwargs)), - nn.ReLU(True), - Encoding(D=in_channels, K=ncodes), - nn.BatchNorm1d(ncodes), - nn.ReLU(True), - Mean(dim=1) - ) - self.fc = nn.Sequential( - nn.Linear(in_channels, in_channels), - nn.Sigmoid() - ) - if self.se_loss: - self.selayer = nn.Linear(in_channels, nclass) - - def forward(self, x): - en = self.encoding(x) - b, c, _, _ = x.size() - gamma = self.fc(en) - y = gamma.view(b, c, 1, 1) - outputs = [F.relu_(x + x * y)] - if self.se_loss: - outputs.append(self.selayer(en)) - return tuple(outputs) - - -class Encoding(nn.Module): - def __init__(self, D, K): - super(Encoding, self).__init__() - # init codewords and 
smoothing factor - self.D, self.K = D, K - self.codewords = nn.Parameter(torch.Tensor(K, D), requires_grad=True) - self.scale = nn.Parameter(torch.Tensor(K), requires_grad=True) - self.reset_params() - - def reset_params(self): - std1 = 1. / ((self.K * self.D) ** (1 / 2)) - self.codewords.data.uniform_(-std1, std1) - self.scale.data.uniform_(-1, 0) - - def forward(self, X): - # input X is a 4D tensor - assert (X.size(1) == self.D) - B, D = X.size(0), self.D - if X.dim() == 3: - # BxDxN -> BxNxD - X = X.transpose(1, 2).contiguous() - elif X.dim() == 4: - # BxDxHxW -> Bx(HW)xD - X = X.view(B, D, -1).transpose(1, 2).contiguous() - else: - raise RuntimeError('Encoding Layer unknown input dims!') - # assignment weights BxNxK - A = F.softmax(self.scale_l2(X, self.codewords, self.scale), dim=2) - # aggregate - E = self.aggregate(A, X, self.codewords) - return E - - def __repr__(self): - return self.__class__.__name__ + '(' \ - + 'N x' + str(self.D) + '=>' + str(self.K) + 'x' \ - + str(self.D) + ')' - - @staticmethod - def scale_l2(X, C, S): - S = S.view(1, 1, C.size(0), 1) - X = X.unsqueeze(2).expand(X.size(0), X.size(1), C.size(0), C.size(1)) - C = C.unsqueeze(0).unsqueeze(0) - SL = S * (X - C) - SL = SL.pow(2).sum(3) - return SL - - @staticmethod - def aggregate(A, X, C): - A = A.unsqueeze(3) - X = X.unsqueeze(2).expand(X.size(0), X.size(1), C.size(0), C.size(1)) - C = C.unsqueeze(0).unsqueeze(0) - E = A * (X - C) - E = E.sum(1) - return E - - -class Mean(nn.Module): - def __init__(self, dim, keep_dim=False): - super(Mean, self).__init__() - self.dim = dim - self.keep_dim = keep_dim - - def forward(self, input): - return input.mean(self.dim, self.keep_dim) - - -def get_encnet(dataset='pascal_voc', backbone='resnet50', pretrained=False, root='~/.torch/models', - pretrained_base=True, **kwargs): - acronyms = { - 'pascal_voc': 'pascal_voc', - 'pascal_aug': 'pascal_aug', - 'ade20k': 'ade', - 'coco': 'coco', - 'citys': 'citys', - } - from ..data.dataloader import datasets - 
model = EncNet(datasets[dataset].NUM_CLASS, backbone=backbone, pretrained_base=pretrained_base, **kwargs) - if pretrained: - from .model_store import get_model_file - device = torch.device(kwargs['local_rank']) - model.load_state_dict(torch.load(get_model_file('encnet_%s_%s' % (backbone, acronyms[dataset]), root=root), - map_location=device)) - return model - - -def get_encnet_resnet50_ade(**kwargs): - return get_encnet('ade20k', 'resnet50', **kwargs) - - -def get_encnet_resnet101_ade(**kwargs): - return get_encnet('ade20k', 'resnet101', **kwargs) - - -def get_encnet_resnet152_ade(**kwargs): - return get_encnet('ade20k', 'resnet152', **kwargs) - - -if __name__ == '__main__': - img = torch.randn(2, 3, 224, 224) - model = get_encnet_resnet50_ade() - outputs = model(img) diff --git a/core/models/espnet.py b/core/models/espnet.py deleted file mode 100644 index 051058c1e..000000000 --- a/core/models/espnet.py +++ /dev/null @@ -1,117 +0,0 @@ -"ESPNetv2: A Light-weight, Power Efficient, and General Purpose for Semantic Segmentation" -import torch -import torch.nn as nn -import torch.nn.functional as F - -from core.models.base_models import eespnet, EESP -from core.nn import _ConvBNPReLU, _BNPReLU - - -class ESPNetV2(nn.Module): - r"""ESPNetV2 - - Parameters - ---------- - nclass : int - Number of categories for the training dataset. - backbone : string - Pre-trained dilated backbone network type (default:'resnet50'; 'resnet50', - 'resnet101' or 'resnet152'). - norm_layer : object - Normalization layer used in backbone network (default: :class:`nn.BatchNorm`; - for Synchronized Cross-GPU BachNormalization). - aux : bool - Auxiliary loss. - - Reference: - Sachin Mehta, et al. "ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network." - arXiv preprint arXiv:1811.11431 (2018). 
- """ - - def __init__(self, nclass, backbone='', aux=False, jpu=False, pretrained_base=False, **kwargs): - super(ESPNetV2, self).__init__() - self.pretrained = eespnet(pretrained=pretrained_base, **kwargs) - self.proj_L4_C = _ConvBNPReLU(256, 128, 1, **kwargs) - self.pspMod = nn.Sequential( - EESP(256, 128, stride=1, k=4, r_lim=7, **kwargs), - _PSPModule(128, 128, **kwargs)) - self.project_l3 = nn.Sequential( - nn.Dropout2d(0.1), - nn.Conv2d(128, nclass, 1, bias=False)) - self.act_l3 = _BNPReLU(nclass, **kwargs) - self.project_l2 = _ConvBNPReLU(64 + nclass, nclass, 1, **kwargs) - self.project_l1 = nn.Sequential( - nn.Dropout2d(0.1), - nn.Conv2d(32 + nclass, nclass, 1, bias=False)) - - self.aux = aux - - self.__setattr__('exclusive', ['proj_L4_C', 'pspMod', 'project_l3', 'act_l3', 'project_l2', 'project_l1']) - - def forward(self, x): - size = x.size()[2:] - out_l1, out_l2, out_l3, out_l4 = self.pretrained(x, seg=True) - out_l4_proj = self.proj_L4_C(out_l4) - up_l4_to_l3 = F.interpolate(out_l4_proj, scale_factor=2, mode='bilinear', align_corners=True) - merged_l3_upl4 = self.pspMod(torch.cat([out_l3, up_l4_to_l3], 1)) - proj_merge_l3_bef_act = self.project_l3(merged_l3_upl4) - proj_merge_l3 = self.act_l3(proj_merge_l3_bef_act) - out_up_l3 = F.interpolate(proj_merge_l3, scale_factor=2, mode='bilinear', align_corners=True) - merge_l2 = self.project_l2(torch.cat([out_l2, out_up_l3], 1)) - out_up_l2 = F.interpolate(merge_l2, scale_factor=2, mode='bilinear', align_corners=True) - merge_l1 = self.project_l1(torch.cat([out_l1, out_up_l2], 1)) - - outputs = list() - merge1_l1 = F.interpolate(merge_l1, scale_factor=2, mode='bilinear', align_corners=True) - outputs.append(merge1_l1) - if self.aux: - # different from paper - auxout = F.interpolate(proj_merge_l3_bef_act, size, mode='bilinear', align_corners=True) - outputs.append(auxout) - - return tuple(outputs) - - -# different from PSPNet -class _PSPModule(nn.Module): - def __init__(self, in_channels, out_channels=1024, 
sizes=(1, 2, 4, 8), **kwargs): - super(_PSPModule, self).__init__() - self.stages = nn.ModuleList( - [nn.Conv2d(in_channels, in_channels, 3, 1, 1, groups=in_channels, bias=False) for _ in sizes]) - self.project = _ConvBNPReLU(in_channels * (len(sizes) + 1), out_channels, 1, 1, **kwargs) - - def forward(self, x): - size = x.size()[2:] - feats = [x] - for stage in self.stages: - x = F.avg_pool2d(x, kernel_size=3, stride=2, padding=1) - upsampled = F.interpolate(stage(x), size, mode='bilinear', align_corners=True) - feats.append(upsampled) - return self.project(torch.cat(feats, dim=1)) - - -def get_espnet(dataset='pascal_voc', backbone='', pretrained=False, root='~/.torch/models', - pretrained_base=False, **kwargs): - acronyms = { - 'pascal_voc': 'pascal_voc', - 'pascal_aug': 'pascal_aug', - 'ade20k': 'ade', - 'coco': 'coco', - 'citys': 'citys', - } - from core.data.dataloader import datasets - model = ESPNetV2(datasets[dataset].NUM_CLASS, backbone=backbone, pretrained_base=pretrained_base, **kwargs) - if pretrained: - from .model_store import get_model_file - device = torch.device(kwargs['local_rank']) - model.load_state_dict(torch.load(get_model_file('espnet_%s_%s' % (backbone, acronyms[dataset]), root=root), - map_location=device)) - return model - - -def get_espnet_citys(**kwargs): - return get_espnet('citys', **kwargs) - - -if __name__ == '__main__': - model = get_espnet_citys() diff --git a/core/models/hrnet.py b/core/models/hrnet.py deleted file mode 100644 index 8ad08e3f5..000000000 --- a/core/models/hrnet.py +++ /dev/null @@ -1,29 +0,0 @@ -"""High-Resolution Representations for Semantic Segmentation""" -import torch -import torch.nn as nn -import torch.nn.functional as F - -class HRNet(nn.Module): - """HRNet - - Parameters - ---------- - nclass : int - Number of categories for the training dataset. - backbone : string - Pre-trained dilated backbone network type (default:'resnet50'; 'resnet50', - 'resnet101' or 'resnet152'). 
- norm_layer : object - Normalization layer used in backbone network (default: :class:`nn.BatchNorm`; - for Synchronized Cross-GPU BachNormalization). - aux : bool - Auxiliary loss. - Reference: - Ke Sun. "High-Resolution Representations for Labeling Pixels and Regions." - arXiv preprint arXiv:1904.04514 (2019). - """ - def __init__(self, nclass, backbone='', aux=False, pretrained_base=False, **kwargs): - super(HRNet, self).__init__() - - def forward(self, x): - pass \ No newline at end of file diff --git a/core/models/icnet.py b/core/models/icnet.py deleted file mode 100644 index 94d03444f..000000000 --- a/core/models/icnet.py +++ /dev/null @@ -1,163 +0,0 @@ -"""Image Cascade Network""" -import torch -import torch.nn as nn -import torch.nn.functional as F - -from .segbase import SegBaseModel - -__all__ = ['ICNet', 'get_icnet', 'get_icnet_resnet50_citys', - 'get_icnet_resnet101_citys', 'get_icnet_resnet152_citys'] - - -class ICNet(SegBaseModel): - """Image Cascade Network""" - - def __init__(self, nclass, backbone='resnet50', aux=False, jpu=False, pretrained_base=True, **kwargs): - super(ICNet, self).__init__(nclass, aux, backbone, pretrained_base=pretrained_base, **kwargs) - self.conv_sub1 = nn.Sequential( - _ConvBNReLU(3, 32, 3, 2, **kwargs), - _ConvBNReLU(32, 32, 3, 2, **kwargs), - _ConvBNReLU(32, 64, 3, 2, **kwargs) - ) - - self.ppm = PyramidPoolingModule() - - self.head = _ICHead(nclass, **kwargs) - - self.__setattr__('exclusive', ['conv_sub1', 'head']) - - def forward(self, x): - # sub 1 - x_sub1 = self.conv_sub1(x) - - # sub 2 - x_sub2 = F.interpolate(x, scale_factor=0.5, mode='bilinear', align_corners=True) - _, x_sub2, _, _ = self.base_forward(x_sub2) - - # sub 4 - x_sub4 = F.interpolate(x, scale_factor=0.25, mode='bilinear', align_corners=True) - _, _, _, x_sub4 = self.base_forward(x_sub4) - # add PyramidPoolingModule - x_sub4 = self.ppm(x_sub4) - outputs = self.head(x_sub1, x_sub2, x_sub4) - - return tuple(outputs) - -class 
PyramidPoolingModule(nn.Module): - def __init__(self, pyramids=[1,2,3,6]): - super(PyramidPoolingModule, self).__init__() - self.pyramids = pyramids - - def forward(self, input): - feat = input - height, width = input.shape[2:] - for bin_size in self.pyramids: - x = F.adaptive_avg_pool2d(input, output_size=bin_size) - x = F.interpolate(x, size=(height, width), mode='bilinear', align_corners=True) - feat = feat + x - return feat - -class _ICHead(nn.Module): - def __init__(self, nclass, norm_layer=nn.BatchNorm2d, **kwargs): - super(_ICHead, self).__init__() - #self.cff_12 = CascadeFeatureFusion(512, 64, 128, nclass, norm_layer, **kwargs) - self.cff_12 = CascadeFeatureFusion(128, 64, 128, nclass, norm_layer, **kwargs) - self.cff_24 = CascadeFeatureFusion(2048, 512, 128, nclass, norm_layer, **kwargs) - - self.conv_cls = nn.Conv2d(128, nclass, 1, bias=False) - - def forward(self, x_sub1, x_sub2, x_sub4): - outputs = list() - x_cff_24, x_24_cls = self.cff_24(x_sub4, x_sub2) - outputs.append(x_24_cls) - #x_cff_12, x_12_cls = self.cff_12(x_sub2, x_sub1) - x_cff_12, x_12_cls = self.cff_12(x_cff_24, x_sub1) - outputs.append(x_12_cls) - - up_x2 = F.interpolate(x_cff_12, scale_factor=2, mode='bilinear', align_corners=True) - up_x2 = self.conv_cls(up_x2) - outputs.append(up_x2) - up_x8 = F.interpolate(up_x2, scale_factor=4, mode='bilinear', align_corners=True) - outputs.append(up_x8) - # 1 -> 1/4 -> 1/8 -> 1/16 - outputs.reverse() - - return outputs - - -class _ConvBNReLU(nn.Module): - def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1, dilation=1, - groups=1, norm_layer=nn.BatchNorm2d, bias=False, **kwargs): - super(_ConvBNReLU, self).__init__() - self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding, dilation, groups, bias) - self.bn = norm_layer(out_channels) - self.relu = nn.ReLU(True) - - def forward(self, x): - x = self.conv(x) - x = self.bn(x) - x = self.relu(x) - return x - - -class CascadeFeatureFusion(nn.Module): - 
"""CFF Unit""" - - def __init__(self, low_channels, high_channels, out_channels, nclass, norm_layer=nn.BatchNorm2d, **kwargs): - super(CascadeFeatureFusion, self).__init__() - self.conv_low = nn.Sequential( - nn.Conv2d(low_channels, out_channels, 3, padding=2, dilation=2, bias=False), - norm_layer(out_channels) - ) - self.conv_high = nn.Sequential( - nn.Conv2d(high_channels, out_channels, 1, bias=False), - norm_layer(out_channels) - ) - self.conv_low_cls = nn.Conv2d(out_channels, nclass, 1, bias=False) - - def forward(self, x_low, x_high): - x_low = F.interpolate(x_low, size=x_high.size()[2:], mode='bilinear', align_corners=True) - x_low = self.conv_low(x_low) - x_high = self.conv_high(x_high) - x = x_low + x_high - x = F.relu(x, inplace=True) - x_low_cls = self.conv_low_cls(x_low) - - return x, x_low_cls - - -def get_icnet(dataset='citys', backbone='resnet50', pretrained=False, root='~/.torch/models', - pretrained_base=True, **kwargs): - acronyms = { - 'pascal_voc': 'pascal_voc', - 'pascal_aug': 'pascal_aug', - 'ade20k': 'ade', - 'coco': 'coco', - 'citys': 'citys', - } - from ..data.dataloader import datasets - model = ICNet(datasets[dataset].NUM_CLASS, backbone=backbone, pretrained_base=pretrained_base, **kwargs) - if pretrained: - from .model_store import get_model_file - device = torch.device(kwargs['local_rank']) - model.load_state_dict(torch.load(get_model_file('icnet_%s_%s' % (backbone, acronyms[dataset]), root=root), - map_location=device)) - return model - - -def get_icnet_resnet50_citys(**kwargs): - return get_icnet('citys', 'resnet50', **kwargs) - - -def get_icnet_resnet101_citys(**kwargs): - return get_icnet('citys', 'resnet101', **kwargs) - - -def get_icnet_resnet152_citys(**kwargs): - return get_icnet('citys', 'resnet152', **kwargs) - - -if __name__ == '__main__': - img = torch.randn(1, 3, 256, 256) - model = get_icnet_resnet50_citys() - outputs = model(img) diff --git a/core/models/lednet.py b/core/models/lednet.py deleted file mode 100644 index 
5a6e6e5b6..000000000 --- a/core/models/lednet.py +++ /dev/null @@ -1,194 +0,0 @@ -"""LEDNet: A Lightweight Encoder-Decoder Network for Real-time Semantic Segmentation""" -import torch -import torch.nn as nn -import torch.nn.functional as F - -from core.nn import _ConvBNReLU - -__all__ = ['LEDNet', 'get_lednet', 'get_lednet_citys'] - -class LEDNet(nn.Module): - r"""LEDNet - - Parameters - ---------- - nclass : int - Number of categories for the training dataset. - backbone : string - Pre-trained dilated backbone network type (default:'resnet50'; 'resnet50', - 'resnet101' or 'resnet152'). - norm_layer : object - Normalization layer used in backbone network (default: :class:`nn.BatchNorm`; - for Synchronized Cross-GPU BachNormalization). - aux : bool - Auxiliary loss. - - Reference: - Yu Wang, et al. "LEDNet: A Lightweight Encoder-Decoder Network for Real-Time Semantic Segmentation." - arXiv preprint arXiv:1905.02423 (2019). - """ - - def __init__(self, nclass, backbone='', aux=False, jpu=False, pretrained_base=True, **kwargs): - super(LEDNet, self).__init__() - self.encoder = nn.Sequential( - Downsampling(3, 32), - SSnbt(32, **kwargs), SSnbt(32, **kwargs), SSnbt(32, **kwargs), - Downsampling(32, 64), - SSnbt(64, **kwargs), SSnbt(64, **kwargs), - Downsampling(64, 128), - SSnbt(128, **kwargs), - SSnbt(128, 2, **kwargs), - SSnbt(128, 5, **kwargs), - SSnbt(128, 9, **kwargs), - SSnbt(128, 2, **kwargs), - SSnbt(128, 5, **kwargs), - SSnbt(128, 9, **kwargs), - SSnbt(128, 17, **kwargs), - ) - self.decoder = APNModule(128, nclass) - - self.__setattr__('exclusive', ['encoder', 'decoder']) - - def forward(self, x): - size = x.size()[2:] - x = self.encoder(x) - x = self.decoder(x) - outputs = list() - x = F.interpolate(x, size, mode='bilinear', align_corners=True) - outputs.append(x) - - return tuple(outputs) - - -class Downsampling(nn.Module): - def __init__(self, in_channels, out_channels, **kwargs): - super(Downsampling, self).__init__() - self.conv1 = nn.Conv2d(in_channels, 
out_channels // 2, 3, 2, 2, bias=False) - self.conv2 = nn.Conv2d(in_channels, out_channels // 2, 3, 2, 2, bias=False) - self.pool = nn.MaxPool2d(kernel_size=2, stride=1) - - def forward(self, x): - x1 = self.conv1(x) - x1 = self.pool(x1) - - x2 = self.conv2(x) - x2 = self.pool(x2) - - return torch.cat([x1, x2], dim=1) - - -class SSnbt(nn.Module): - def __init__(self, in_channels, dilation=1, norm_layer=nn.BatchNorm2d, **kwargs): - super(SSnbt, self).__init__() - inter_channels = in_channels // 2 - self.branch1 = nn.Sequential( - nn.Conv2d(inter_channels, inter_channels, (3, 1), padding=(1, 0), bias=False), - nn.ReLU(True), - nn.Conv2d(inter_channels, inter_channels, (1, 3), padding=(0, 1), bias=False), - norm_layer(inter_channels), - nn.ReLU(True), - nn.Conv2d(inter_channels, inter_channels, (3, 1), padding=(dilation, 0), dilation=(dilation, 1), - bias=False), - nn.ReLU(True), - nn.Conv2d(inter_channels, inter_channels, (1, 3), padding=(0, dilation), dilation=(1, dilation), - bias=False), - norm_layer(inter_channels), - nn.ReLU(True)) - - self.branch2 = nn.Sequential( - nn.Conv2d(inter_channels, inter_channels, (1, 3), padding=(0, 1), bias=False), - nn.ReLU(True), - nn.Conv2d(inter_channels, inter_channels, (3, 1), padding=(1, 0), bias=False), - norm_layer(inter_channels), - nn.ReLU(True), - nn.Conv2d(inter_channels, inter_channels, (1, 3), padding=(0, dilation), dilation=(1, dilation), - bias=False), - nn.ReLU(True), - nn.Conv2d(inter_channels, inter_channels, (3, 1), padding=(dilation, 0), dilation=(dilation, 1), - bias=False), - norm_layer(inter_channels), - nn.ReLU(True)) - - self.relu = nn.ReLU(True) - - @staticmethod - def channel_shuffle(x, groups): - n, c, h, w = x.size() - - channels_per_group = c // groups - x = x.view(n, groups, channels_per_group, h, w) - x = torch.transpose(x, 1, 2).contiguous() - x = x.view(n, -1, h, w) - - return x - - def forward(self, x): - # channels split - x1, x2 = x.split(x.size(1) // 2, 1) - - x1 = self.branch1(x1) - x2 = 
self.branch2(x2) - - out = torch.cat([x1, x2], dim=1) - out = self.relu(out + x) - out = self.channel_shuffle(out, groups=2) - - return out - - -class APNModule(nn.Module): - def __init__(self, in_channels, nclass, norm_layer=nn.BatchNorm2d, **kwargs): - super(APNModule, self).__init__() - self.conv1 = _ConvBNReLU(in_channels, in_channels, 3, 2, 1, norm_layer=norm_layer) - self.conv2 = _ConvBNReLU(in_channels, in_channels, 5, 2, 2, norm_layer=norm_layer) - self.conv3 = _ConvBNReLU(in_channels, in_channels, 7, 2, 3, norm_layer=norm_layer) - self.level1 = _ConvBNReLU(in_channels, nclass, 1, norm_layer=norm_layer) - self.level2 = _ConvBNReLU(in_channels, nclass, 1, norm_layer=norm_layer) - self.level3 = _ConvBNReLU(in_channels, nclass, 1, norm_layer=norm_layer) - self.level4 = _ConvBNReLU(in_channels, nclass, 1, norm_layer=norm_layer) - self.level5 = nn.Sequential( - nn.AdaptiveAvgPool2d(1), - _ConvBNReLU(in_channels, nclass, 1)) - - def forward(self, x): - w, h = x.size()[2:] - branch3 = self.conv1(x) - branch2 = self.conv2(branch3) - branch1 = self.conv3(branch2) - - out = self.level1(branch1) - out = F.interpolate(out, ((w + 3) // 4, (h + 3) // 4), mode='bilinear', align_corners=True) - out = self.level2(branch2) + out - out = F.interpolate(out, ((w + 1) // 2, (h + 1) // 2), mode='bilinear', align_corners=True) - out = self.level3(branch3) + out - out = F.interpolate(out, (w, h), mode='bilinear', align_corners=True) - out = self.level4(x) * out - out = self.level5(x) + out - return out - - -def get_lednet(dataset='citys', backbone='', pretrained=False, root='~/.torch/models', - pretrained_base=True, **kwargs): - acronyms = { - 'pascal_voc': 'pascal_voc', - 'pascal_aug': 'pascal_aug', - 'ade20k': 'ade', - 'coco': 'coco', - 'citys': 'citys', - } - from ..data.dataloader import datasets - model = LEDNet(datasets[dataset].NUM_CLASS, backbone=backbone, pretrained_base=pretrained_base, **kwargs) - if pretrained: - from .model_store import get_model_file - device = 
torch.device(kwargs['local_rank']) - model.load_state_dict(torch.load(get_model_file('lednet_%s' % (acronyms[dataset]), root=root), - map_location=device)) - return model - - -def get_lednet_citys(**kwargs): - return get_lednet('citys', **kwargs) - - -if __name__ == '__main__': - model = get_lednet_citys() diff --git a/core/models/ocnet.py b/core/models/ocnet.py deleted file mode 100755 index 333294fd5..000000000 --- a/core/models/ocnet.py +++ /dev/null @@ -1,345 +0,0 @@ -""" Object Context Network for Scene Parsing""" -import torch -import torch.nn as nn -import torch.nn.functional as F - -from .segbase import SegBaseModel -from .fcn import _FCNHead - -__all__ = ['OCNet', 'get_ocnet', 'get_base_ocnet_resnet101_citys', - 'get_pyramid_ocnet_resnet101_citys', 'get_asp_ocnet_resnet101_citys'] - - -class OCNet(SegBaseModel): - r"""OCNet - - Parameters - ---------- - nclass : int - Number of categories for the training dataset. - backbone : string - Pre-trained dilated backbone network type (default:'resnet50'; 'resnet50', - 'resnet101' or 'resnet152'). - norm_layer : object - Normalization layer used in backbone network (default: :class:`nn.BatchNorm`; - for Synchronized Cross-GPU BachNormalization). - aux : bool - Auxiliary loss. - Reference: - Yuhui Yuan, Jingdong Wang. "OCNet: Object Context Network for Scene Parsing." - arXiv preprint arXiv:1809.00916 (2018). 
- """ - - def __init__(self, nclass, backbone='resnet101', oc_arch='base', aux=False, pretrained_base=True, **kwargs): - super(OCNet, self).__init__(nclass, aux, backbone, pretrained_base=pretrained_base, **kwargs) - self.head = _OCHead(nclass, oc_arch, **kwargs) - if self.aux: - self.auxlayer = _FCNHead(1024, nclass, **kwargs) - - self.__setattr__('exclusive', ['head', 'auxlayer'] if aux else ['head']) - - def forward(self, x): - size = x.size()[2:] - _, _, c3, c4 = self.base_forward(x) - outputs = [] - x = self.head(c4) - x = F.interpolate(x, size, mode='bilinear', align_corners=True) - outputs.append(x) - - if self.aux: - auxout = self.auxlayer(c3) - auxout = F.interpolate(auxout, size, mode='bilinear', align_corners=True) - outputs.append(auxout) - return tuple(outputs) - - -class _OCHead(nn.Module): - def __init__(self, nclass, oc_arch, norm_layer=nn.BatchNorm2d, **kwargs): - super(_OCHead, self).__init__() - if oc_arch == 'base': - self.context = nn.Sequential( - nn.Conv2d(2048, 512, 3, 1, padding=1, bias=False), - norm_layer(512), - nn.ReLU(True), - BaseOCModule(512, 512, 256, 256, scales=([1]), norm_layer=norm_layer, **kwargs)) - elif oc_arch == 'pyramid': - self.context = nn.Sequential( - nn.Conv2d(2048, 512, 3, 1, padding=1, bias=False), - norm_layer(512), - nn.ReLU(True), - PyramidOCModule(512, 512, 256, 512, scales=([1, 2, 3, 6]), norm_layer=norm_layer, **kwargs)) - elif oc_arch == 'asp': - self.context = ASPOCModule(2048, 512, 256, 512, norm_layer=norm_layer, **kwargs) - else: - raise ValueError("Unknown OC architecture!") - - self.out = nn.Conv2d(512, nclass, 1) - - def forward(self, x): - x = self.context(x) - return self.out(x) - - -class BaseAttentionBlock(nn.Module): - """The basic implementation for self-attention block/non-local block.""" - - def __init__(self, in_channels, out_channels, key_channels, value_channels, - scale=1, norm_layer=nn.BatchNorm2d, **kwargs): - super(BaseAttentionBlock, self).__init__() - self.scale = scale - 
self.key_channels = key_channels - self.value_channels = value_channels - if scale > 1: - self.pool = nn.MaxPool2d(scale) - - self.f_value = nn.Conv2d(in_channels, value_channels, 1) - self.f_key = nn.Sequential( - nn.Conv2d(in_channels, key_channels, 1), - norm_layer(key_channels), - nn.ReLU(True) - ) - self.f_query = self.f_key - self.W = nn.Conv2d(value_channels, out_channels, 1) - nn.init.constant_(self.W.weight, 0) - nn.init.constant_(self.W.bias, 0) - - def forward(self, x): - batch_size, c, w, h = x.size() - if self.scale > 1: - x = self.pool(x) - - value = self.f_value(x).view(batch_size, self.value_channels, -1).permute(0, 2, 1) - query = self.f_query(x).view(batch_size, self.key_channels, -1).permute(0, 2, 1) - key = self.f_key(x).view(batch_size, self.key_channels, -1) - - sim_map = torch.bmm(query, key) * (self.key_channels ** -.5) - sim_map = F.softmax(sim_map, dim=-1) - - context = torch.bmm(sim_map, value).permute(0, 2, 1).contiguous() - context = context.view(batch_size, self.value_channels, *x.size()[2:]) - context = self.W(context) - if self.scale > 1: - context = F.interpolate(context, size=(w, h), mode='bilinear', align_corners=True) - - return context - - -class BaseOCModule(nn.Module): - """Base-OC""" - - def __init__(self, in_channels, out_channels, key_channels, value_channels, - scales=([1]), norm_layer=nn.BatchNorm2d, concat=True, **kwargs): - super(BaseOCModule, self).__init__() - self.stages = nn.ModuleList([ - BaseAttentionBlock(in_channels, out_channels, key_channels, value_channels, scale, norm_layer, **kwargs) - for scale in scales]) - in_channels = in_channels * 2 if concat else in_channels - self.project = nn.Sequential( - nn.Conv2d(in_channels, out_channels, 1), - norm_layer(out_channels), - nn.ReLU(True), - nn.Dropout2d(0.05) - ) - self.concat = concat - - def forward(self, x): - priors = [stage(x) for stage in self.stages] - context = priors[0] - for i in range(1, len(priors)): - context += priors[i] - if self.concat: - context 
= torch.cat([context, x], 1) - out = self.project(context) - return out - - -class PyramidAttentionBlock(nn.Module): - """The basic implementation for pyramid self-attention block/non-local block""" - - def __init__(self, in_channels, out_channels, key_channels, value_channels, - scale=1, norm_layer=nn.BatchNorm2d, **kwargs): - super(PyramidAttentionBlock, self).__init__() - self.scale = scale - self.value_channels = value_channels - self.key_channels = key_channels - - self.f_value = nn.Conv2d(in_channels, value_channels, 1) - self.f_key = nn.Sequential( - nn.Conv2d(in_channels, key_channels, 1), - norm_layer(key_channels), - nn.ReLU(True) - ) - self.f_query = self.f_key - self.W = nn.Conv2d(value_channels, out_channels, 1) - nn.init.constant_(self.W.weight, 0) - nn.init.constant_(self.W.bias, 0) - - def forward(self, x): - batch_size, c, w, h = x.size() - - local_x = list() - local_y = list() - step_w, step_h = w // self.scale, h // self.scale - for i in range(self.scale): - for j in range(self.scale): - start_x, start_y = step_w * i, step_h * j - end_x, end_y = min(start_x + step_w, w), min(start_y + step_h, h) - if i == (self.scale - 1): - end_x = w - if j == (self.scale - 1): - end_y = h - local_x += [start_x, end_x] - local_y += [start_y, end_y] - - value = self.f_value(x) - query = self.f_query(x) - key = self.f_key(x) - - local_list = list() - local_block_cnt = (self.scale ** 2) * 2 - for i in range(0, local_block_cnt, 2): - value_local = value[:, :, local_x[i]:local_x[i + 1], local_y[i]:local_y[i + 1]] - query_local = query[:, :, local_x[i]:local_x[i + 1], local_y[i]:local_y[i + 1]] - key_local = key[:, :, local_x[i]:local_x[i + 1], local_y[i]:local_y[i + 1]] - - w_local, h_local = value_local.size(2), value_local.size(3) - value_local = value_local.contiguous().view(batch_size, self.value_channels, -1).permute(0, 2, 1) - query_local = query_local.contiguous().view(batch_size, self.key_channels, -1).permute(0, 2, 1) - key_local = 
key_local.contiguous().view(batch_size, self.key_channels, -1) - - sim_map = torch.bmm(query_local, key_local) * (self.key_channels ** -.5) - sim_map = F.softmax(sim_map, dim=-1) - - context_local = torch.bmm(sim_map, value_local).permute(0, 2, 1).contiguous() - context_local = context_local.view(batch_size, self.value_channels, w_local, h_local) - local_list.append(context_local) - - context_list = list() - for i in range(0, self.scale): - row_tmp = list() - for j in range(self.scale): - row_tmp.append(local_list[j + i * self.scale]) - context_list.append(torch.cat(row_tmp, 3)) - - context = torch.cat(context_list, 2) - context = self.W(context) - - return context - - -class PyramidOCModule(nn.Module): - """Pyramid-OC""" - - def __init__(self, in_channels, out_channels, key_channels, value_channels, - scales=([1]), norm_layer=nn.BatchNorm2d, **kwargs): - super(PyramidOCModule, self).__init__() - self.stages = nn.ModuleList([ - PyramidAttentionBlock(in_channels, out_channels, key_channels, value_channels, scale, norm_layer, **kwargs) - for scale in scales]) - self.up_dr = nn.Sequential( - nn.Conv2d(in_channels, in_channels * len(scales), 1), - norm_layer(in_channels * len(scales)), - nn.ReLU(True) - ) - self.project = nn.Sequential( - nn.Conv2d(in_channels * len(scales) * 2, out_channels, 1), - norm_layer(out_channels), - nn.ReLU(True), - nn.Dropout2d(0.05) - ) - - def forward(self, x): - priors = [stage(x) for stage in self.stages] - context = [self.up_dr(x)] - for i in range(len(priors)): - context += [priors[i]] - context = torch.cat(context, 1) - out = self.project(context) - return out - - -class ASPOCModule(nn.Module): - """ASP-OC""" - - def __init__(self, in_channels, out_channels, key_channels, value_channels, - atrous_rates=(12, 24, 36), norm_layer=nn.BatchNorm2d, **kwargs): - super(ASPOCModule, self).__init__() - self.context = nn.Sequential( - nn.Conv2d(in_channels, out_channels, 3, padding=1), - norm_layer(out_channels), - nn.ReLU(True), - 
BaseOCModule(out_channels, out_channels, key_channels, value_channels, ([2]), norm_layer, False, **kwargs)) - - rate1, rate2, rate3 = tuple(atrous_rates) - self.b1 = nn.Sequential( - nn.Conv2d(in_channels, out_channels, 3, padding=rate1, dilation=rate1, bias=False), - norm_layer(out_channels), - nn.ReLU(True)) - self.b2 = nn.Sequential( - nn.Conv2d(in_channels, out_channels, 3, padding=rate2, dilation=rate2, bias=False), - norm_layer(out_channels), - nn.ReLU(True)) - self.b3 = nn.Sequential( - nn.Conv2d(in_channels, out_channels, 3, padding=rate3, dilation=rate3, bias=False), - norm_layer(out_channels), - nn.ReLU(True)) - self.b4 = nn.Sequential( - nn.Conv2d(in_channels, out_channels, 1, bias=False), - norm_layer(out_channels), - nn.ReLU(True)) - - self.project = nn.Sequential( - nn.Conv2d(out_channels * 5, out_channels, 1, bias=False), - norm_layer(out_channels), - nn.ReLU(True), - nn.Dropout2d(0.1) - ) - - def forward(self, x): - feat1 = self.context(x) - feat2 = self.b1(x) - feat3 = self.b2(x) - feat4 = self.b3(x) - feat5 = self.b4(x) - out = torch.cat((feat1, feat2, feat3, feat4, feat5), dim=1) - out = self.project(out) - return out - - -def get_ocnet(dataset='citys', backbone='resnet50', oc_arch='base', pretrained=False, root='~/.torch/models', - pretrained_base=True, **kwargs): - acronyms = { - 'pascal_voc': 'pascal_voc', - 'pascal_aug': 'pascal_aug', - 'ade20k': 'ade', - 'coco': 'coco', - 'citys': 'citys', - } - from ..data.dataloader import datasets - model = OCNet(datasets[dataset].NUM_CLASS, backbone=backbone, oc_arch=oc_arch, - pretrained_base=pretrained_base, **kwargs) - if pretrained: - from .model_store import get_model_file - device = torch.device(kwargs['local_rank']) - model.load_state_dict(torch.load(get_model_file('%s_ocnet_%s_%s' % ( - oc_arch, backbone, acronyms[dataset]), root=root), - map_location=device)) - return model - - -def get_base_ocnet_resnet101_citys(**kwargs): - return get_ocnet('citys', 'resnet101', 'base', **kwargs) - - -def 
get_pyramid_ocnet_resnet101_citys(**kwargs): - return get_ocnet('citys', 'resnet101', 'pyramid', **kwargs) - - -def get_asp_ocnet_resnet101_citys(**kwargs): - return get_ocnet('citys', 'resnet101', 'asp', **kwargs) - - -if __name__ == '__main__': - img = torch.randn(1, 3, 256, 256) - model = get_asp_ocnet_resnet101_citys() - outputs = model(img) diff --git a/core/models/psanet.py b/core/models/psanet.py deleted file mode 100644 index c98ad4674..000000000 --- a/core/models/psanet.py +++ /dev/null @@ -1,162 +0,0 @@ -"""Point-wise Spatial Attention Network""" -import torch -import torch.nn as nn -import torch.nn.functional as F - -from core.nn import _ConvBNReLU -from core.models.segbase import SegBaseModel -from core.models.fcn import _FCNHead - -__all__ = ['PSANet', 'get_psanet', 'get_psanet_resnet50_voc', 'get_psanet_resnet101_voc', - 'get_psanet_resnet152_voc', 'get_psanet_resnet50_citys', 'get_psanet_resnet101_citys', - 'get_psanet_resnet152_citys'] - - -class PSANet(SegBaseModel): - r"""PSANet - - Parameters - ---------- - nclass : int - Number of categories for the training dataset. - backbone : string - Pre-trained dilated backbone network type (default:'resnet50'; 'resnet50', - 'resnet101' or 'resnet152'). - norm_layer : object - Normalization layer used in backbone network (default: :class:`nn.BatchNorm`; - for Synchronized Cross-GPU BachNormalization). - aux : bool - Auxiliary loss. - - Reference: - Hengshuang Zhao, et al. "PSANet: Point-wise Spatial Attention Network for Scene Parsing." - ECCV-2018. 
- """ - - def __init__(self, nclass, backbone='resnet', aux=False, pretrained_base=True, **kwargs): - super(PSANet, self).__init__(nclass, aux, backbone, pretrained_base=pretrained_base, **kwargs) - self.head = _PSAHead(nclass, **kwargs) - if aux: - self.auxlayer = _FCNHead(1024, nclass, **kwargs) - - self.__setattr__('exclusive', ['head', 'auxlayer'] if aux else ['head']) - - def forward(self, x): - size = x.size()[2:] - _, _, c3, c4 = self.base_forward(x) - outputs = list() - x = self.head(c4) - x = F.interpolate(x, size, mode='bilinear', align_corners=True) - outputs.append(x) - - if self.aux: - auxout = self.auxlayer(c3) - auxout = F.interpolate(auxout, size, mode='bilinear', align_corners=True) - outputs.append(auxout) - return tuple(outputs) - - -class _PSAHead(nn.Module): - def __init__(self, nclass, norm_layer=nn.BatchNorm2d, **kwargs): - super(_PSAHead, self).__init__() - # psa_out_channels = crop_size // 8 ** 2 - self.psa = _PointwiseSpatialAttention(2048, 3600, norm_layer) - - self.conv_post = _ConvBNReLU(1024, 2048, 1, norm_layer=norm_layer) - self.project = nn.Sequential( - _ConvBNReLU(4096, 512, 3, padding=1, norm_layer=norm_layer), - nn.Dropout2d(0.1, False), - nn.Conv2d(512, nclass, 1)) - - def forward(self, x): - global_feature = self.psa(x) - out = self.conv_post(global_feature) - out = torch.cat([x, out], dim=1) - out = self.project(out) - - return out - - -class _PointwiseSpatialAttention(nn.Module): - def __init__(self, in_channels, out_channels, norm_layer=nn.BatchNorm2d, **kwargs): - super(_PointwiseSpatialAttention, self).__init__() - reduced_channels = 512 - self.collect_attention = _AttentionGeneration(in_channels, reduced_channels, out_channels, norm_layer) - self.distribute_attention = _AttentionGeneration(in_channels, reduced_channels, out_channels, norm_layer) - - def forward(self, x): - collect_fm = self.collect_attention(x) - distribute_fm = self.distribute_attention(x) - psa_fm = torch.cat([collect_fm, distribute_fm], dim=1) - 
return psa_fm - - -class _AttentionGeneration(nn.Module): - def __init__(self, in_channels, reduced_channels, out_channels, norm_layer, **kwargs): - super(_AttentionGeneration, self).__init__() - self.conv_reduce = _ConvBNReLU(in_channels, reduced_channels, 1, norm_layer=norm_layer) - self.attention = nn.Sequential( - _ConvBNReLU(reduced_channels, reduced_channels, 1, norm_layer=norm_layer), - nn.Conv2d(reduced_channels, out_channels, 1, bias=False)) - - self.reduced_channels = reduced_channels - - def forward(self, x): - reduce_x = self.conv_reduce(x) - attention = self.attention(reduce_x) - n, c, h, w = attention.size() - attention = attention.view(n, c, -1) - reduce_x = reduce_x.view(n, self.reduced_channels, -1) - fm = torch.bmm(reduce_x, torch.softmax(attention, dim=1)) - fm = fm.view(n, self.reduced_channels, h, w) - - return fm - - -def get_psanet(dataset='pascal_voc', backbone='resnet50', pretrained=False, root='~/.torch/models', - pretrained_base=True, **kwargs): - acronyms = { - 'pascal_voc': 'pascal_voc', - 'pascal_aug': 'pascal_aug', - 'ade20k': 'ade', - 'coco': 'coco', - 'citys': 'citys', - } - from core.data.dataloader import datasets - model = PSANet(datasets[dataset].NUM_CLASS, backbone=backbone, pretrained_base=pretrained_base, **kwargs) - if pretrained: - from .model_store import get_model_file - device = torch.device(kwargs['local_rank']) - model.load_state_dict(torch.load(get_model_file('deeplabv3_%s_%s' % (backbone, acronyms[dataset]), root=root), - map_location=device)) - return model - - -def get_psanet_resnet50_voc(**kwargs): - return get_psanet('pascal_voc', 'resnet50', **kwargs) - - -def get_psanet_resnet101_voc(**kwargs): - return get_psanet('pascal_voc', 'resnet101', **kwargs) - - -def get_psanet_resnet152_voc(**kwargs): - return get_psanet('pascal_voc', 'resnet152', **kwargs) - - -def get_psanet_resnet50_citys(**kwargs): - return get_psanet('citys', 'resnet50', **kwargs) - - -def get_psanet_resnet101_citys(**kwargs): - return 
get_psanet('citys', 'resnet101', **kwargs) - - -def get_psanet_resnet152_citys(**kwargs): - return get_psanet('citys', 'resnet152', **kwargs) - - -if __name__ == '__main__': - model = get_psanet_resnet50_voc() - img = torch.randn(1, 3, 480, 480) - output = model(img) diff --git a/core/models/pspnet.py b/core/models/pspnet.py deleted file mode 100644 index efeae6135..000000000 --- a/core/models/pspnet.py +++ /dev/null @@ -1,168 +0,0 @@ -"""Pyramid Scene Parsing Network""" -import torch -import torch.nn as nn -import torch.nn.functional as F - -from .segbase import SegBaseModel -from .fcn import _FCNHead - -__all__ = ['PSPNet', 'get_psp', 'get_psp_resnet50_voc', 'get_psp_resnet50_ade', 'get_psp_resnet101_voc', - 'get_psp_resnet101_ade', 'get_psp_resnet101_citys', 'get_psp_resnet101_coco'] - - -class PSPNet(SegBaseModel): - r"""Pyramid Scene Parsing Network - - Parameters - ---------- - nclass : int - Number of categories for the training dataset. - backbone : string - Pre-trained dilated backbone network type (default:'resnet50'; 'resnet50', - 'resnet101' or 'resnet152'). - norm_layer : object - Normalization layer used in backbone network (default: :class:`nn.BatchNorm`; - for Synchronized Cross-GPU BachNormalization). - aux : bool - Auxiliary loss. - - Reference: - Zhao, Hengshuang, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. - "Pyramid scene parsing network." 
*CVPR*, 2017 - """ - - def __init__(self, nclass, backbone='resnet50', aux=False, pretrained_base=True, **kwargs): - super(PSPNet, self).__init__(nclass, aux, backbone, pretrained_base=pretrained_base, **kwargs) - self.head = _PSPHead(nclass, **kwargs) - if self.aux: - self.auxlayer = _FCNHead(1024, nclass, **kwargs) - - self.__setattr__('exclusive', ['head', 'auxlayer'] if aux else ['head']) - - def forward(self, x): - size = x.size()[2:] - _, _, c3, c4 = self.base_forward(x) - outputs = [] - x = self.head(c4) - x = F.interpolate(x, size, mode='bilinear', align_corners=True) - outputs.append(x) - - if self.aux: - auxout = self.auxlayer(c3) - auxout = F.interpolate(auxout, size, mode='bilinear', align_corners=True) - outputs.append(auxout) - return tuple(outputs) - - -def _PSP1x1Conv(in_channels, out_channels, norm_layer, norm_kwargs): - return nn.Sequential( - nn.Conv2d(in_channels, out_channels, 1, bias=False), - norm_layer(out_channels, **({} if norm_kwargs is None else norm_kwargs)), - nn.ReLU(True) - ) - - -class _PyramidPooling(nn.Module): - def __init__(self, in_channels, **kwargs): - super(_PyramidPooling, self).__init__() - out_channels = int(in_channels / 4) - self.avgpool1 = nn.AdaptiveAvgPool2d(1) - self.avgpool2 = nn.AdaptiveAvgPool2d(2) - self.avgpool3 = nn.AdaptiveAvgPool2d(3) - self.avgpool4 = nn.AdaptiveAvgPool2d(6) - self.conv1 = _PSP1x1Conv(in_channels, out_channels, **kwargs) - self.conv2 = _PSP1x1Conv(in_channels, out_channels, **kwargs) - self.conv3 = _PSP1x1Conv(in_channels, out_channels, **kwargs) - self.conv4 = _PSP1x1Conv(in_channels, out_channels, **kwargs) - - def forward(self, x): - size = x.size()[2:] - feat1 = F.interpolate(self.conv1(self.avgpool1(x)), size, mode='bilinear', align_corners=True) - feat2 = F.interpolate(self.conv2(self.avgpool2(x)), size, mode='bilinear', align_corners=True) - feat3 = F.interpolate(self.conv3(self.avgpool3(x)), size, mode='bilinear', align_corners=True) - feat4 = 
F.interpolate(self.conv4(self.avgpool4(x)), size, mode='bilinear', align_corners=True) - return torch.cat([x, feat1, feat2, feat3, feat4], dim=1) - - -class _PSPHead(nn.Module): - def __init__(self, nclass, norm_layer=nn.BatchNorm2d, norm_kwargs=None, **kwargs): - super(_PSPHead, self).__init__() - self.psp = _PyramidPooling(2048, norm_layer=norm_layer, norm_kwargs=norm_kwargs) - self.block = nn.Sequential( - nn.Conv2d(4096, 512, 3, padding=1, bias=False), - norm_layer(512, **({} if norm_kwargs is None else norm_kwargs)), - nn.ReLU(True), - nn.Dropout(0.1), - nn.Conv2d(512, nclass, 1) - ) - - def forward(self, x): - x = self.psp(x) - return self.block(x) - - -def get_psp(dataset='pascal_voc', backbone='resnet50', pretrained=False, root='~/.torch/models', - pretrained_base=True, **kwargs): - r"""Pyramid Scene Parsing Network - - Parameters - ---------- - dataset : str, default pascal_voc - The dataset that model pretrained on. (pascal_voc, ade20k) - pretrained : bool or str - Boolean value controls whether to load the default pretrained weights for model. - String value represents the hashtag for a certain version of pretrained weights. - root : str, default '~/.torch/models' - Location for keeping the model parameters. - pretrained_base : bool or str, default True - This will load pretrained backbone network, that was trained on ImageNet. 
- Examples - -------- - >>> model = get_psp(dataset='pascal_voc', backbone='resnet50', pretrained=False) - >>> print(model) - """ - acronyms = { - 'pascal_voc': 'pascal_voc', - 'pascal_aug': 'pascal_aug', - 'ade20k': 'ade', - 'coco': 'coco', - 'citys': 'citys', - } - from ..data.dataloader import datasets - model = PSPNet(datasets[dataset].NUM_CLASS, backbone=backbone, pretrained_base=pretrained_base, **kwargs) - if pretrained: - from .model_store import get_model_file - device = torch.device(kwargs['local_rank']) - model.load_state_dict(torch.load(get_model_file('psp_%s_%s' % (backbone, acronyms[dataset]), root=root), - map_location=device)) - return model - - -def get_psp_resnet50_voc(**kwargs): - return get_psp('pascal_voc', 'resnet50', **kwargs) - - -def get_psp_resnet50_ade(**kwargs): - return get_psp('ade20k', 'resnet50', **kwargs) - - -def get_psp_resnet101_voc(**kwargs): - return get_psp('pascal_voc', 'resnet101', **kwargs) - - -def get_psp_resnet101_ade(**kwargs): - return get_psp('ade20k', 'resnet101', **kwargs) - - -def get_psp_resnet101_citys(**kwargs): - return get_psp('citys', 'resnet101', **kwargs) - - -def get_psp_resnet101_coco(**kwargs): - return get_psp('coco', 'resnet101', **kwargs) - - -if __name__ == '__main__': - model = get_psp_resnet50_voc() - img = torch.randn(4, 3, 480, 480) - output = model(img) diff --git a/core/models/segbase.py b/core/models/segbase.py deleted file mode 100644 index f1560936b..000000000 --- a/core/models/segbase.py +++ /dev/null @@ -1,60 +0,0 @@ -"""Base Model for Semantic Segmentation""" -import torch.nn as nn - -from ..nn import JPU -from .base_models.resnetv1b import resnet50_v1s, resnet101_v1s, resnet152_v1s - -__all__ = ['SegBaseModel'] - - -class SegBaseModel(nn.Module): - r"""Base Model for Semantic Segmentation - - Parameters - ---------- - backbone : string - Pre-trained dilated backbone network type (default:'resnet50'; 'resnet50', - 'resnet101' or 'resnet152'). 
- """ - - def __init__(self, nclass, aux, backbone='resnet50', jpu=False, pretrained_base=True, **kwargs): - super(SegBaseModel, self).__init__() - dilated = False if jpu else True - self.aux = aux - self.nclass = nclass - if backbone == 'resnet50': - self.pretrained = resnet50_v1s(pretrained=pretrained_base, dilated=dilated, **kwargs) - elif backbone == 'resnet101': - self.pretrained = resnet101_v1s(pretrained=pretrained_base, dilated=dilated, **kwargs) - elif backbone == 'resnet152': - self.pretrained = resnet152_v1s(pretrained=pretrained_base, dilated=dilated, **kwargs) - else: - raise RuntimeError('unknown backbone: {}'.format(backbone)) - - self.jpu = JPU([512, 1024, 2048], width=512, **kwargs) if jpu else None - - def base_forward(self, x): - """forwarding pre-trained network""" - x = self.pretrained.conv1(x) - x = self.pretrained.bn1(x) - x = self.pretrained.relu(x) - x = self.pretrained.maxpool(x) - c1 = self.pretrained.layer1(x) - c2 = self.pretrained.layer2(c1) - c3 = self.pretrained.layer3(c2) - c4 = self.pretrained.layer4(c3) - - if self.jpu: - return self.jpu(c1, c2, c3, c4) - else: - return c1, c2, c3, c4 - - def evaluate(self, x): - """evaluating network with inputs and targets""" - return self.forward(x)[0] - - def demo(self, x): - pred = self.forward(x) - if self.aux: - pred = pred[0] - return pred diff --git a/core/models/enet.py b/core/models/swnet.py similarity index 59% rename from core/models/enet.py rename to core/models/swnet.py index 853fc6571..e23f36841 100644 --- a/core/models/enet.py +++ b/core/models/swnet.py @@ -1,8 +1,8 @@ -"""Efficient Neural Network""" +"""An improved slightweight model""" import torch import torch.nn as nn -__all__ = ['ENet', 'get_enet', 'get_enet_citys'] +__all__ = ['swnet', 'get_swnet', 'get_swnet_citys'] class ENet(nn.Module): @@ -11,48 +11,83 @@ class ENet(nn.Module): def __init__(self, nclass, backbone='', aux=False, jpu=False, pretrained_base=None, **kwargs): super(ENet, self).__init__() self.initial = 
InitialBlock(13, **kwargs) - +#block 1: self.bottleneck1_0 = Bottleneck(16, 16, 64, downsampling=True, **kwargs) self.bottleneck1_1 = Bottleneck(64, 16, 64, **kwargs) self.bottleneck1_2 = Bottleneck(64, 16, 64, **kwargs) self.bottleneck1_3 = Bottleneck(64, 16, 64, **kwargs) self.bottleneck1_4 = Bottleneck(64, 16, 64, **kwargs) - + self.bottleneck1_5 = Bottleneck(64, 16, 64, **kwargs) + self.bottleneck1_6 = Bottleneck(64, 16, 64, **kwargs) + self.bottleneck1_7 = Bottleneck(64, 16, 64, **kwargs) + self.bottleneck1_8 = Bottleneck(64, 16, 64, **kwargs) + self.bottleneck1_9 = Bottleneck(64, 16, 64, **kwargs) + self.bottleneck1_10 = Bottleneck(64, 16, 64, **kwargs) +#block 2: self.bottleneck2_0 = Bottleneck(64, 32, 128, downsampling=True, **kwargs) self.bottleneck2_1 = Bottleneck(128, 32, 128, **kwargs) self.bottleneck2_2 = Bottleneck(128, 32, 128, dilation=2, **kwargs) self.bottleneck2_3 = Bottleneck(128, 32, 128, asymmetric=True, **kwargs) self.bottleneck2_4 = Bottleneck(128, 32, 128, dilation=4, **kwargs) self.bottleneck2_5 = Bottleneck(128, 32, 128, **kwargs) - self.bottleneck2_6 = Bottleneck(128, 32, 128, dilation=8, **kwargs) - self.bottleneck2_7 = Bottleneck(128, 32, 128, asymmetric=True, **kwargs) - self.bottleneck2_8 = Bottleneck(128, 32, 128, dilation=16, **kwargs) - - self.bottleneck3_1 = Bottleneck(128, 32, 128, **kwargs) - self.bottleneck3_2 = Bottleneck(128, 32, 128, dilation=2, **kwargs) - self.bottleneck3_3 = Bottleneck(128, 32, 128, asymmetric=True, **kwargs) - self.bottleneck3_4 = Bottleneck(128, 32, 128, dilation=4, **kwargs) + self.bottleneck2_6 = Bottleneck(128, 32, 128, **kwargs) + self.bottleneck2_7 = Bottleneck(128, 32, 128, **kwargs) + self.bottleneck2_8 = Bottleneck(128, 32, 128, dilation=8, **kwargs) + self.bottleneck2_9 = Bottleneck(128, 32, 128, asymmetric=True, **kwargs) + self.bottleneck2_10 = Bottleneck(128, 32, 128, dilation=16, **kwargs) +#block 3: + self.bottleneck3_0 = Bottleneck(128, 32, 128, **kwargs) + self.bottleneck3_1 = 
Bottleneck(128, 32, 128, dilation=2, **kwargs) + self.bottleneck3_2 = Bottleneck(128, 32, 128, asymmetric=True, **kwargs) + self.bottleneck3_3 = Bottleneck(128, 32, 128, dilation=4, **kwargs) + self.bottleneck3_4 = Bottleneck(128, 32, 128, **kwargs) self.bottleneck3_5 = Bottleneck(128, 32, 128, **kwargs) - self.bottleneck3_6 = Bottleneck(128, 32, 128, dilation=8, **kwargs) - self.bottleneck3_7 = Bottleneck(128, 32, 128, asymmetric=True, **kwargs) - self.bottleneck3_8 = Bottleneck(128, 32, 128, dilation=16, **kwargs) - + self.bottleneck3_6 = Bottleneck(128, 32, 128, **kwargs) + self.bottleneck3_7 = Bottleneck(128, 32, 128, **kwargs) + self.bottleneck3_8 = Bottleneck(128, 32, 128, dilation=8, **kwargs) + self.bottleneck3_9 = Bottleneck(128, 32, 128, asymmetric=True, **kwargs) + self.bottleneck3_10 = Bottleneck(128, 32, 128, dilation=16, **kwargs) +#block 4: self.bottleneck4_0 = UpsamplingBottleneck(128, 16, 64, **kwargs) self.bottleneck4_1 = Bottleneck(64, 16, 64, **kwargs) self.bottleneck4_2 = Bottleneck(64, 16, 64, **kwargs) - + self.bottleneck4_3 = Bottleneck(64, 16, 64, **kwargs) + self.bottleneck4_4 = Bottleneck(64, 16, 64, **kwargs) + self.bottleneck4_5 = Bottleneck(64, 16, 64, **kwargs) + self.bottleneck4_6 = Bottleneck(64, 16, 64, **kwargs) + self.bottleneck4_7 = Bottleneck(64, 16, 64, **kwargs) + self.bottleneck4_8 = Bottleneck(64, 16, 64, **kwargs) + self.bottleneck4_9 = Bottleneck(64, 16, 64, **kwargs) + self.bottleneck4_10 = Bottleneck(64, 16, 64, **kwargs) +#block 5: self.bottleneck5_0 = UpsamplingBottleneck(64, 4, 16, **kwargs) self.bottleneck5_1 = Bottleneck(16, 4, 16, **kwargs) - + self.bottleneck5_2 = Bottleneck(16, 4, 16, **kwargs) + self.bottleneck5_3 = Bottleneck(16, 4, 16, **kwargs) + self.bottleneck5_4 = Bottleneck(16, 4, 16, **kwargs) + self.bottleneck5_5 = Bottleneck(16, 4, 16, **kwargs) + self.bottleneck5_6 = Bottleneck(16, 4, 16, **kwargs) + self.bottleneck5_7 = Bottleneck(16, 4, 16, **kwargs) + self.bottleneck5_8 = Bottleneck(16, 4, 16, 
**kwargs) + self.bottleneck5_9 = Bottleneck(16, 4, 16, **kwargs) + self.bottleneck5_10 = Bottleneck(16, 4, 16, **kwargs) +#block 6: self.fullconv = nn.ConvTranspose2d(16, nclass, 2, 2, bias=False) self.__setattr__('exclusive', ['bottleneck1_0', 'bottleneck1_1', 'bottleneck1_2', 'bottleneck1_3', - 'bottleneck1_4', 'bottleneck2_0', 'bottleneck2_1', 'bottleneck2_2', - 'bottleneck2_3', 'bottleneck2_4', 'bottleneck2_5', 'bottleneck2_6', - 'bottleneck2_7', 'bottleneck2_8', 'bottleneck3_1', 'bottleneck3_2', - 'bottleneck3_3', 'bottleneck3_4', 'bottleneck3_5', 'bottleneck3_6', - 'bottleneck3_7', 'bottleneck3_8', 'bottleneck4_0', 'bottleneck4_1', - 'bottleneck4_2', 'bottleneck5_0', 'bottleneck5_1', 'fullconv']) + 'bottleneck1_4', 'bottleneck1_5', 'bottleneck1_6', 'bottleneck1_7', + 'bottleneck1_8', 'bottleneck1_9', 'bottleneck1_10','bottleneck2_0', + 'bottleneck2_1', 'bottleneck2_2', 'bottleneck2_3', 'bottleneck2_4', + 'bottleneck2_5', 'bottleneck2_6', 'bottleneck2_7', 'bottleneck2_8', + 'bottleneck2_9', 'bottleneck2_10','bottleneck3_0', 'bottleneck3_1', + 'bottleneck3_2', 'bottleneck3_3', 'bottleneck3_4', 'bottleneck3_5', + 'bottleneck3_6', 'bottleneck3_7', 'bottleneck3_8', 'bottleneck3_9', + 'bottleneck3_10','bottleneck4_0', 'bottleneck4_1', 'bottleneck4_2', + 'bottleneck4_3', 'bottleneck4_4', 'bottleneck4_5', 'bottleneck4_6', + 'bottleneck4_7', 'bottleneck4_8', 'bottleneck4_9', 'bottleneck4_10', + 'bottleneck5_0', 'bottleneck5_1', 'bottleneck5_2', 'bottleneck5_3', + 'bottleneck5_4', 'bottleneck5_5', 'bottleneck5_6', 'bottleneck5_7', + 'bottleneck5_8', 'bottleneck5_9', 'bottleneck5_10','fullconv']) def forward(self, x): # init @@ -64,7 +99,12 @@ def forward(self, x): x = self.bottleneck1_2(x) x = self.bottleneck1_3(x) x = self.bottleneck1_4(x) - + x = self.bottleneck1_5(x) + x = self.bottleneck1_6(x) + x = self.bottleneck1_7(x) + x = self.bottleneck1_8(x) + x = self.bottleneck1_9(x) + x = self.bottleneck1_10(x) # stage 2 x, max_indices2 = self.bottleneck2_0(x) x = 
self.bottleneck2_1(x) @@ -75,39 +115,59 @@ def forward(self, x): x = self.bottleneck2_6(x) x = self.bottleneck2_7(x) x = self.bottleneck2_8(x) - + x = self.bottleneck2_9(x) + x = self.bottleneck2_10(x) # stage 3 + x = self.bottleneck3_0(x) x = self.bottleneck3_1(x) x = self.bottleneck3_2(x) x = self.bottleneck3_3(x) x = self.bottleneck3_4(x) + x = self.bottleneck3_5(x) x = self.bottleneck3_6(x) x = self.bottleneck3_7(x) x = self.bottleneck3_8(x) + x = self.bottleneck3_9(x) + x = self.bottleneck3_10(x) # stage 4 x = self.bottleneck4_0(x, max_indices2) x = self.bottleneck4_1(x) x = self.bottleneck4_2(x) - + x = self.bottleneck4_3(x) + x = self.bottleneck4_4(x) + x = self.bottleneck4_5(x) + x = self.bottleneck4_6(x) + x = self.bottleneck4_7(x) + x = self.bottleneck4_8(x) + x = self.bottleneck4_9(x) + x = self.bottleneck4_10(x) # stage 5 x = self.bottleneck5_0(x, max_indices1) x = self.bottleneck5_1(x) - + x = self.bottleneck5_2(x) + x = self.bottleneck5_3(x) + x = self.bottleneck5_4(x) + x = self.bottleneck5_5(x) + x = self.bottleneck5_6(x) + x = self.bottleneck5_7(x) + x = self.bottleneck5_8(x) + x = self.bottleneck5_9(x) + x = self.bottleneck5_10(x) # out x = self.fullconv(x) return tuple([x]) class InitialBlock(nn.Module): - """ENet initial block""" + """swnet initial block""" def __init__(self, out_channels, norm_layer=nn.BatchNorm2d, **kwargs): super(InitialBlock, self).__init__() self.conv = nn.Conv2d(3, out_channels, 3, 2, 1, bias=False) self.maxpool = nn.MaxPool2d(2, 2) self.bn = norm_layer(out_channels + 3) - self.act = nn.PReLU() + self.act = nn.RReLU() def forward(self, x): x_conv = self.conv(x) @@ -135,14 +195,14 @@ def __init__(self, in_channels, inter_channels, out_channels, dilation=1, asymme self.conv1 = nn.Sequential( nn.Conv2d(in_channels, inter_channels, 1, bias=False), norm_layer(inter_channels), - nn.PReLU() + nn.RReLU() ) if downsampling: self.conv2 = nn.Sequential( nn.Conv2d(inter_channels, inter_channels, 2, stride=2, bias=False), 
norm_layer(inter_channels), - nn.PReLU() + nn.RReLU() ) else: if asymmetric: @@ -150,20 +210,20 @@ def __init__(self, in_channels, inter_channels, out_channels, dilation=1, asymme nn.Conv2d(inter_channels, inter_channels, (5, 1), padding=(2, 0), bias=False), nn.Conv2d(inter_channels, inter_channels, (1, 5), padding=(0, 2), bias=False), norm_layer(inter_channels), - nn.PReLU() + nn.RReLU() ) else: self.conv2 = nn.Sequential( nn.Conv2d(inter_channels, inter_channels, 3, dilation=dilation, padding=dilation, bias=False), norm_layer(inter_channels), - nn.PReLU() + nn.RReLU() ) self.conv3 = nn.Sequential( nn.Conv2d(inter_channels, out_channels, 1, bias=False), norm_layer(out_channels), nn.Dropout2d(0.1) ) - self.act = nn.PReLU() + self.act = nn.RReLU() def forward(self, x): identity = x @@ -196,15 +256,15 @@ def __init__(self, in_channels, inter_channels, out_channels, norm_layer=nn.Batc self.block = nn.Sequential( nn.Conv2d(in_channels, inter_channels, 1, bias=False), norm_layer(inter_channels), - nn.PReLU(), + nn.RReLU(), nn.ConvTranspose2d(inter_channels, inter_channels, 2, 2, bias=False), norm_layer(inter_channels), - nn.PReLU(), + nn.RReLU(), nn.Conv2d(inter_channels, out_channels, 1, bias=False), norm_layer(out_channels), nn.Dropout2d(0.1) ) - self.act = nn.PReLU() + self.act = nn.RReLU() def forward(self, x, max_indices): out_up = self.conv(x) diff --git a/scripts/demo.py b/scripts/demo.py index bc5773307..5b34c8134 100644 --- a/scripts/demo.py +++ b/scripts/demo.py @@ -14,8 +14,8 @@ parser = argparse.ArgumentParser( description='Predict segmentation result from a given image') -parser.add_argument('--model', type=str, default='fcn32s_vgg16_voc', - help='model name (default: fcn32_vgg16)') +parser.add_argument('--model', type=str, default='swnet_resnet50_city', + help='model name (default: swnet_resnet50)') parser.add_argument('--dataset', type=str, default='pascal_aug', choices=['pascal_voc, pascal_aug, ade20k, citys'], help='dataset name (default: pascal_voc)') 
parser.add_argument('--save-folder', default='~/.torch/models', diff --git a/scripts/fcn32s_vgg16_pascal_voc.sh b/scripts/fcn32s_vgg16_pascal_voc.sh deleted file mode 100755 index 8e74c7584..000000000 --- a/scripts/fcn32s_vgg16_pascal_voc.sh +++ /dev/null @@ -1,6 +0,0 @@ -#!/usr/bin/env bash - -# train -CUDA_VISIBLE_DEVICES=0 python train.py --model fcn32s \ - --backbone vgg16 --dataset pascal_voc \ - --lr 0.0001 --epochs 80 \ No newline at end of file diff --git a/scripts/fcn32s_vgg16_pascal_voc_dist.sh b/scripts/fcn32s_vgg16_pascal_voc_dist.sh deleted file mode 100755 index 5c826a44b..000000000 --- a/scripts/fcn32s_vgg16_pascal_voc_dist.sh +++ /dev/null @@ -1,7 +0,0 @@ -#!/usr/bin/env bash - -# train -export NGPUS=4 -CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --model fcn32s \ - --backbone vgg16 --dataset pascal_voc \ - --lr 0.01 --epochs 80 --batch_size 16 \ No newline at end of file diff --git a/scripts/swnet_resnet50_citys.sh b/scripts/swnet_resnet50_citys.sh new file mode 100644 index 000000000..3cf9a0d48 --- /dev/null +++ b/scripts/swnet_resnet50_citys.sh @@ -0,0 +1,6 @@ +#!/usr/bin/env bash + +# train +CUDA_VISIBLE_DEVICES=0 python train.py --model swnet \ + --backbone resnet50 --dataset citys \ + --lr 0.0001 --epochs 50 diff --git a/scripts/train.py b/scripts/train.py index e57f43899..51723ccf9 100644 --- a/scripts/train.py +++ b/scripts/train.py @@ -28,14 +28,11 @@ def parse_args(): parser = argparse.ArgumentParser(description='Semantic Segmentation Training With Pytorch') # model and dataset parser.add_argument('--model', type=str, default='fcn', - choices=['fcn32s', 'fcn16s', 'fcn8s', 'fcn', 'psp', 'deeplabv3', - 'deeplabv3_plus', 'danet', 'denseaspp', 'bisenet', 'encnet', - 'dunet', 'icnet', 'enet', 'ocnet', 'psanet', 'cgnet', 'espnet', - 'lednet', 'dfanet'], + choices=['swnet'], help='model name (default: fcn32s)') parser.add_argument('--backbone', type=str, default='resnet50', - choices=['vgg16', 'resnet18', 'resnet50', 'resnet101', 'resnet152', - 
'densenet121', 'densenet161', 'densenet169', 'densenet201'], + choices=['resnet50', 'resnet101', 'resnet152', + ], help='backbone name (default: vgg16)') parser.add_argument('--dataset', type=str, default='pascal_voc', choices=['pascal_voc', 'pascal_aug', 'ade20k', 'citys', 'sbu'],