Neocognitron |
1979 |
A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position |
ConvNet |
1989 |
Used back-propagation to learn the convolution kernel coefficients directly from images of handwritten digits. |
LeNet |
December 1998 |
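Established the LeNet-5 design of alternating convolution and subsampling layers followed by fully connected layers, applied to handwritten digit recognition. |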
AlexNet |
September 2012 |
Introduced ReLU activation and Dropout to CNNs. Winner ILSVRC 2012. |
ZFNet |
2013 |
ZFNet is a classic convolutional neural network. The design was motivated by visualizing intermediate feature layers and the operation of the classifier. Compared to AlexNet, the filter sizes and the stride of the convolutions are reduced. |
GoogLeNet |
2014 |
A 22-layer incarnation of the Inception architecture, submitted to ILSVRC 2014 and assessed on classification and detection. |
VGG |
September 2014 |
Used a large number of small (3×3) filters in each layer to learn complex features. Placed first in localization and second in classification at ILSVRC 2014. |
Inception Net |
September 2014 |
Introduced Inception Modules consisting of multiple parallel convolutional layers, designed to recognize different features at multiple scales. |
HighwayNet |
2015 |
Introduced a new architecture designed to ease gradient-based training of very deep networks |
Inception Net v2 / Inception Net v3 |
December 2015 |
Design optimizations of the Inception modules, such as factorized convolutions, which improved accuracy and computational efficiency. |
ResNet |
December 2015 |
Introduced residual connections, which are shortcuts that bypass one or more layers in the network. Winner ILSVRC 2015. |
Inception Net v4 / Inception ResNet |
February 2016 |
Hybrid approach combining Inception Net and ResNet. |
DenseNet |
August 2016 |
Each layer receives input from all the previous layers, creating a dense network of connections between the layers that encourages feature reuse and lets the network learn more diverse features. |
DarkNet |
2016 |
A convolutional neural network that acts as a backbone for the YOLOv3 object detection approach. |
Xception |
October 2016 |
Based on InceptionV3 but uses depthwise separable convolutions instead of Inception modules. |
ResNeXt |
November 2016 |
Built over ResNet, introduces the concept of grouped convolutions, where the filters in a convolutional layer are divided into multiple groups. |
FractalNet |
2017 |
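An ultra-deep architecture generated by repeatedly applying a simple fractal expansion rule, showing that very deep networks can be trained without residual connections. |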
Capsule Networks |
2017 |
Proposed to improve the performance of CNNs, especially in terms of spatial hierarchies and rotation invariance. |
WideResNet |
2017 |
Decreases the depth and increases the width of residual networks, showing that wider, shallower ResNets can match or outperform very deep, thin ones. |
PolyNet |
2017 |
Introduces PolyInception modules, polynomial compositions of Inception residual units, to increase structural diversity rather than only depth or width. |
PyramidNet |
2017 |
A PyramidNet is a type of convolutional network where the key idea is to concentrate on the feature map dimension by increasing it gradually instead of by increasing it sharply at each residual unit with downsampling. In addition, the network architecture works as a mixture of both plain and residual networks by using zero-padded identity-mapping shortcut connections when increasing the feature map dimension. |
Squeeze and Excitation Nets |
2017 |
Focuses on channel relationships and proposes a novel architectural unit, termed the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels. These blocks can be stacked together to form SENet architectures that generalise effectively across different datasets. |
MobileNetV1 |
April 2017 |
Uses depthwise separable convolutions to reduce the number of parameters and computation required. |
CMPE-SE |
2018 |
Competitive squeeze and excitation networks |
RAN |
2018 |
Residual Attention Network, built by stacking Attention Modules that generate attention-aware features. The attention-aware features from different modules change adaptively as the layers go deeper. |
CB-CNN |
2018 |
Channel-boosted CNN. Channel boosting exploits both the channel dimension of a CNN (learning from multiple input channels) and transfer learning (TL). TL is used at two different stages: channel generation and channel exploitation. |
CBAM |
2018 |
Convolutional Block Attention Module, a simple yet effective attention module for feed-forward convolutional neural networks. Given an intermediate feature map, the module sequentially infers attention maps along two separate dimensions, channel and spatial, then the attention maps are multiplied to the input feature map for adaptive feature refinement. |
MobileNetV2 |
January 2018 |
Built upon the MobileNetV1 architecture, uses inverted residuals and linear bottlenecks. |
MobileNetV3 |
May 2019 |
Uses hardware-aware neural architecture search (NAS), complemented by the NetAdapt algorithm, to tune the architecture for mobile phone CPUs. |
EfficientNet |
May 2019 |
Uses a compound scaling method to scale the network's depth, width, and resolution to achieve a high accuracy with a relatively low computational cost. |
NoisyStudent |
2020 |
Noisy Student Training extends the idea of self-training and distillation with the use of equal-or-larger student models and noise added to the student during learning. On ImageNet, an EfficientNet model is first trained on labeled images and then used as a teacher to generate pseudo labels for 300M unlabeled images. |
Vision Transformer |
October 2020 |
Images are split into fixed-size patches, which are treated as tokens; the sequence of linear embeddings of these patches is fed to a Transformer. |
SwAV |
2020 |
Self-supervised approach that learns visual features by swapping cluster assignments between multiple augmented views of the same image. |
ResNeSt |
2022 |
A ResNet variant built from Split-Attention blocks, which apply channel-wise attention across groups of feature maps. |
DeiT |
December 2020 |
A convolution-free vision transformer that uses a teacher-student strategy with attention-based distillation tokens. |
Swin Transformer |
March 2021 |
A hierarchical vision transformer that uses shifted windows to address the challenges of adapting the Transformer model to computer vision. |
CaiT |
2021 |
Class-Attention in Image Transformers. Enables deeper vision transformers by introducing LayerScale and dedicated class-attention layers. |
T2T-ViT |
2021 |
Tokens-to-Token ViT. Progressively aggregates neighboring tokens into a single token to model local structure, improving vision transformers trained from scratch on ImageNet. |
TNT |
2021 |
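Transformer in Transformer. An inner transformer models sub-patches within each patch while an outer transformer models the patch sequence, capturing both local and global features. |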
BEiT |
June 2021 |
Uses a masked image modeling task inspired by BERT, involving image patches and visual tokens, to pretrain vision Transformers. |
MobileViT |
October 2021 |
A lightweight vision transformer designed for mobile devices, effectively combining the strengths of CNNs and ViTs. |
Masked AutoEncoder |
November 2021 |
An asymmetric encoder-decoder architecture that masks a high proportion of random image patches and reconstructs the missing pixels for self-supervision; the encoder operates only on the visible patches. |
CoAtNet |
2021 |
CoAtNets (Convolution and Self-Attention Networks) are hybrid models that stack depthwise convolution stages followed by self-attention stages. |
ConvNeXt |
January 2022 |
A pure convolutional network that modernizes the ResNet design with choices borrowed from vision transformers. It improves upon the designs of earlier CNNs. |
NFNet |
2021 |
Normalizer-free ResNets trained with adaptive gradient clipping, achieving high-performance large-scale image recognition without batch normalization. |
MLP-Mixer |
2021 |
Built entirely from MLPs: introduced token-mixing and channel-mixing layers as an alternative to convolutions and self-attention. |
gMLP |
2021 |
An MLP-based architecture with spatial gating units, proposed as an alternative to self-attention. |
ConvMixer |
January 2022 |
Processes image patches using standard convolutions for mixing spatial and channel dimensions. |
MViT |
2022 |
A multiview vision transformer, designed for processing videos, providing a way to integrate information from different frames efficiently. |
Shuffle Transformer |
2022 |
Combined shuffle units with transformer blocks for efficient processing |
BEiT |
2022 |
Introduces a BERT-style pre-training approach for image recognition, using masked image modeling. |
CrossViT |
2022 |
A dual-branch vision transformer that processes image patches of different sizes and fuses the branches with cross-attention. |
Masked Autoencoders (MAE) |
2022 |
A self-supervised learning method where the model learns to reconstruct images from partial inputs, improving efficiency and performance. |
RegNet |
2020 |
Introduced a design space exploration approach to network design, producing efficient and high-performing models for image classification and other tasks. |
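
Several entries in the table credit their efficiency to depthwise separable convolutions (Xception, MobileNet) or grouped convolutions (ResNeXt). The saving can be seen from a simple weight count. The sketch below is illustrative only: the helper names `conv_params` and `depthwise_separable_params` and the 256-channel, 3×3 layer are assumptions chosen for the example, not taken from any of the papers, and biases are ignored.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Number of weights in a k x k convolution (bias omitted).

    With `groups` > 1, each output filter only sees c_in / groups input channels.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k convolution followed by a pointwise 1 x 1 convolution."""
    depthwise = conv_params(c_in, c_in, k, groups=c_in)  # one k x k filter per channel
    pointwise = conv_params(c_in, c_out, 1)              # 1 x 1 conv mixes channels
    return depthwise + pointwise


# Hypothetical layer: 256 input channels, 256 output channels, 3 x 3 kernel.
standard = conv_params(256, 256, 3)
grouped = conv_params(256, 256, 3, groups=32)
separable = depthwise_separable_params(256, 256, 3)

print(f"standard:  {standard}")
print(f"grouped:   {grouped}")
print(f"separable: {separable} ({separable / standard:.1%} of standard)")
```

For this layer the separable version needs roughly 1/c_out + 1/k² of the weights of the standard convolution, which is where most of the parameter reduction in these architectures comes from.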