The paper proposes a systematic deep-learning-based approach that includes an effective Convolutional Neural Network (CNN) structure, a hierarchical training strategy, and a video-codec-oriented switchable mechanism. In brief, the contributions of this work are as follows:
- A novel network, named Squeeze-and-Excitation Filtering CNN (SEFCNN), is designed. It comprises two subnets: a Feature EXtracting (FEX) net and a Feature ENhancing (FEN) net. The FEX is a stack of convolutional layers that characterizes spatial and channel-wise correlation, while the FEN is a squeeze-and-excitation net that fully explores the relationships between channels.
- A hierarchical model training strategy is developed. During encoding, (a) different Quantization Parameters (QPs) cause different levels of artifacts, and (b) different frame types employ different coding tools and thus exhibit different artifact properties. In contrast to prior research that designs a single powerful network for all kinds of artifacts, we propose to hierarchically deploy the two subnets for different coding scenarios.
- When incorporating the CNN model into a video encoder, we introduce an adaptive mechanism that switches between the CNN-based and the traditional methods to selectively enhance some frames or some regions of a frame. Compared to previous work that applies one model to every single frame, our approach takes advantage of the coding reference structure and achieves better results in both encoder computational complexity and overall coding efficiency.
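
To make the channel reweighting in the first contribution concrete, the following is a minimal sketch of a generic squeeze-and-excitation step in plain Python. It is not the SEFCNN implementation: the function name `squeeze_excite`, the weight matrices `w1` and `w2`, the omission of bias terms, and the toy input are all assumptions made for illustration.

```python
import math

def squeeze_excite(feature_maps, w1, w2):
    """Generic squeeze-and-excitation gating (illustrative, biases omitted).

    feature_maps: list of C channels, each an H x W list of lists.
    w1: R x C weights of the reduction layer (hypothetical values).
    w2: C x R weights of the expansion layer (hypothetical values).
    """
    # Squeeze: global average pooling reduces each channel to one scalar.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_maps]
    # Excitation, part 1: fully connected layer followed by ReLU.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    # Excitation, part 2: fully connected layer followed by a sigmoid,
    # giving one gate in (0, 1) per channel.
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every sample of a channel by that channel's gate.
    scaled = [[[g * x for x in row] for row in ch]
              for g, ch in zip(gates, feature_maps)]
    return scaled, gates

# Toy input: two 2 x 2 channels with channel means 1.0 and 4.0.
features = [[[1.0, 1.0], [1.0, 1.0]],
            [[1.0, 3.0], [5.0, 7.0]]]
w1 = [[-1.0, 1.0]]       # reduce 2 channels to 1 hidden unit
w2 = [[0.0], [1.0]]      # expand 1 hidden unit back to 2 gates
scaled, gates = squeeze_excite(features, w1, w2)
print(gates)
```

In this toy case the first gate is exactly 0.5 because its expansion weight is zero, while the second gate depends on the pooled statistics of both channels. That dependence is the point of the sketch: each channel is rescaled using information gathered across all channels.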