Existing image inpainting methods have shown impressive completion results for low-resolution images. However, most of these algorithms fail at high resolutions and require powerful hardware, limiting their deployment on edge devices. Motivated by this, we propose the first baseline for REal-Time High-resolution image INpainting on Edge Devices (RETHINED), which can inpaint at ultra-high resolution and run in real-time on a wide variety of edge devices.
Given a high-resolution RGB image $y \in \mathbb{R}^{H_{HR} \times W_{HR} \times 3}$ (where $H_{HR}$ and $W_{HR}$ denote, respectively, the height and width of the high-resolution image in pixels) and a binary mask $m \in \{0, 1\}^{H_{HR} \times W_{HR}}$ marking the corrupted pixels (zero inside the holes), our goal is to fill in the masked image $x = y \odot m$ with plausible content.
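As a concrete illustration, the masking is an element-wise product that zeroes out the hole pixels. Below is a minimal sketch in PyTorch; the resolution values and mask ratio are arbitrary placeholders, not values from the paper:

```python
import torch

# Minimal sketch of the problem setup (resolutions and mask ratio are
# arbitrary placeholders).
H_HR, W_HR = 2048, 2048
y = torch.rand(H_HR, W_HR, 3)                 # high-resolution RGB image y
m = (torch.rand(H_HR, W_HR) > 0.2).float()    # binary mask: 0 inside holes

# Masked input x = y * m: corrupted pixels are zeroed out.
x = y * m.unsqueeze(-1)                       # broadcast mask over channels
```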
To achieve this goal, we first downsample $x$ to a lower resolution, obtaining $x_{LR} \in \mathbb{R}^{H \times W \times 3}$ (where $H < H_{HR}$ and $W < W_{HR}$), and feed it to the coarse model, which produces the coarse inpainted image $\hat{x}_{coarse}$ of size $H \times W$. Then, the NeuralPatchMatch module refines $\hat{x}_{coarse}$ by propagating known content from the input image $x_{LR}$, yielding $\hat{x}_{LR}$ and the corresponding attention map $A$.
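A sketch of this low-resolution stage is given below; `coarse_model` and `neural_patch_match` are hypothetical stand-ins for the corresponding modules, and the working resolution is an arbitrary choice:

```python
import torch.nn.functional as F

def coarse_stage(x, coarse_model, neural_patch_match, lr_size=(512, 512)):
    """Sketch of the low-resolution stage; module names are stand-ins.

    x: masked high-resolution input of shape (1, 3, H_HR, W_HR).
    """
    # Downsample the masked input to the working resolution H x W.
    x_lr = F.interpolate(x, size=lr_size, mode='bilinear', align_corners=False)
    # Coarse inpainting at low resolution.
    x_coarse = coarse_model(x_lr)
    # Refinement: propagate known content from x_lr into the holes,
    # also producing the attention map A reused later for upscaling.
    x_lr_hat, A = neural_patch_match(x_coarse, x_lr)
    return x_lr_hat, A
```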
Finally, our Attention Upscaling module uses the learned attention map $A$ together with $x$ to reconstruct the high-frequency texture details present in the input image, obtaining the high-resolution result $\hat{x}_{HR}$.
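The key idea is that attention weights computed at low resolution can be reused to mix the corresponding high-resolution patches, so real texture from known regions is copied into the holes. A minimal sketch, assuming square, non-overlapping patches that tile the image (all names and shapes are illustrative):

```python
import torch
import torch.nn.functional as F

def attention_upscale(A, x_hr, patch_hr=64):
    """Sketch of Attention Upscaling (names and shapes are assumptions).

    A:    (N, N) attention map from NeuralPatchMatch, N low-res patches.
    x_hr: (1, 3, H_HR, W_HR) masked high-resolution input, with H_HR and
          W_HR divisible by patch_hr so the N patches tile the image.
    """
    _, _, H, W = x_hr.shape
    # Cut the high-resolution image into the same N non-overlapping
    # patches that A was computed over (one high-res patch per token).
    patches = F.unfold(x_hr, patch_hr, stride=patch_hr)   # (1, 3*p*p, N)
    patches = patches.squeeze(0).t()                      # (N, 3*p*p)
    # Reuse the low-resolution attention weights on high-res patches,
    # transferring real texture from known regions into the holes.
    # (In practice only hole patches would be replaced; known ones kept.)
    mixed = A @ patches                                   # (N, 3*p*p)
    # Fold the mixed patches back into the full-resolution image.
    return F.fold(mixed.t().unsqueeze(0), (H, W), patch_hr, stride=patch_hr)
```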
Figure 3. Proposed NeuralPatchMatch Inpainting Module. (Corrupted patches are displayed in red and uncorrupted ones in green.) First, we project the image patches into an embedding space of dimension $d_k$ (Sect. 3.2). Then, token similarity is computed in a self-attention manner, obtaining the attention map $A$ (lighter colors correspond to large softmax values, darker colors to low values). Self-attention masking restricts inpainting to corrupted regions, preserving the high-frequency details of uncorrupted zones. To obtain the final inpainted image, we mix the tokens via a weighted sum based on the attention map $A$.
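The masked self-attention described in the caption can be sketched as follows; the single shared projection (e.g. `proj = torch.nn.Linear(d, d_k, bias=False)`), the mask convention, and the replace-only-holes compositing are assumptions for illustration:

```python
import torch

def masked_self_attention(tokens, corrupted, proj):
    """Sketch of the masked self-attention mixing shown in Figure 3.

    tokens:    (N, d) patch embeddings of the coarse result.
    corrupted: (N,) boolean, True where a patch overlaps a hole.
    proj:      linear projection into the d_k-dimensional embedding space.
    """
    q = k = proj(tokens)                         # project to dimension d_k
    d_k = q.shape[-1]
    logits = q @ k.t() / d_k ** 0.5              # token similarity

    # Masking: corrupted patches must not act as sources of content.
    logits[:, corrupted] = float('-inf')
    A = torch.softmax(logits, dim=-1)            # attention map A

    # Weighted sum of tokens fills the holes; uncorrupted tokens are kept
    # as-is, preserving their high-frequency details.
    mixed = A @ tokens
    out = torch.where(corrupted.unsqueeze(-1), mixed, tokens)
    return out, A
```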