Paper Notes: Locality-Sensitive Deconvolution Networks with Gated Fusion for RGB-D Indoor Semantic Segmentation

阿新 • Published: 2018-09-29




Abstract

Problem: indoor semantic segmentation using RGB-D data.

Motivation: there is still room for improvement in two aspects:

Boundary segmentation: DeconvNet aggregates a large context to predict the label of each pixel, which inherently limits the segmentation precision at object boundaries.

[Figure (a)]

RGB-D fusion: recent state-of-the-art methods generally fuse the RGB and depth networks by equal-weight score fusion, regardless of the varying contributions of the two modalities to delineating different categories in different scenes.

[Figure (b)]

Methods proposed to address these problems: a locality-sensitive DeconvNet (LS-DeconvNet) and a gated fusion layer.

Introduction

Kinect: captures high-quality, synchronized visual (RGB) and geometric (depth) cues to depict a scene.

DeconvNet: learns to upsample the low-resolution label map of an FCN to full resolution with more details.

Panels (a) and (b) of the figure above are produced by a two-stream DeconvNet followed by equal-weight score fusion, as in the FCN model [19]. They are examples of the two aspects that need improvement.

This paper aims to augment DeconvNet for indoor semantic segmentation with RGB-D data

Related work

Again divided into two aspects, corresponding to the motivation.

Refine Boundaries for Semantic Segmentation

Post-processing methods

[5, 9] apply superpixels generated by graph cuts to smooth the predictions.
[3, 4] adopt fully connected conditional random fields (CRFs) to optimize the holistic segmentation map.

Designing dedicated deep models for dense prediction

CRF is incorporated into FCN by [29, 17] to encourage spatial and appearance consistency in the labelling outputs.
Affinity CNNs [2, 20] (similarity CNNs?) embed an additional pixel-wise similarity loss into FCN for dense prediction.

[12] adds a data-driven pooling layer on top of DeconvNet to smooth the predictions within every superpixel.

Combine RGB and Depth Data for Semantic Segmentation

[23, 22, 10] simply concatenate handcrafted RGB and depth features to represent each pixel or superpixel.
[7, 15] incorporate both RGB and depth cues into graphical models such as MRFs or CRFs for semantic segmentation.
RNNs [16].

Three levels of fusion: early, middle, and late.

Early: [5] concatenates the RGB and depth images into a four-channel input.
Middle: [11] uses two CNNs to extract features from the RGB and depth images independently, then concatenates them.
Late: Long et al. [19] also learn two independent CNN models but directly predict the score map of each modality, followed by score fusion with an equal-weight sum.
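To make the difference between early and late fusion concrete, here is a minimal pure-Python sketch. It assumes images and score maps are nested lists indexed [channel][row][col]; the function names are hypothetical and the per-modality networks are omitted entirely, so this only illustrates the fusion step, not the code of [5] or [19].

```python
from typing import List

Map = List[List[float]]      # one H x W channel
Tensor = List[Map]           # C x H x W


def early_fusion(rgb: Tensor, depth: Map) -> Tensor:
    """Early fusion: stack 3 RGB channels and 1 depth channel into a 4-channel input."""
    return rgb + [depth]


def equal_weight_score_fusion(score_rgb: Tensor, score_depth: Tensor) -> Tensor:
    """Late fusion: sum the two per-class score maps with equal weight."""
    return [
        [[a + b for a, b in zip(row_r, row_d)] for row_r, row_d in zip(ch_r, ch_d)]
        for ch_r, ch_d in zip(score_rgb, score_depth)
    ]


def argmax_labels(scores: Tensor) -> List[List[int]]:
    """Per-pixel label = index of the class with the highest fused score."""
    num_classes, h, w = len(scores), len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][i][j]) for j in range(w)]
        for i in range(h)
    ]


# Usage: 2 classes on a 1 x 2 image.
fused = equal_weight_score_fusion([[[3, 1]], [[1, 2]]], [[[1, 0]], [[0, 4]]])
print(argmax_labels(fused))  # [[0, 1]]
```

Because both modalities get the same weight everywhere, a modality that is unreliable for some class or scene still contributes fully; this is the limitation the gated fusion layer targets.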

Proposed approach

LSD-GF

Overall architecture

[Figure: overall architecture of LSD-GF]

Overall, the network consists of three components:

FCN: learns a robust feature representation for each pixel by aggregating multi-scale contextual cues.

The FCN is the ASPP model [4] derived from VGG16.
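ASPP gathers multi-scale context by running several atrous (dilated) convolutions with different rates in parallel on the same features. The 1-D sketch below assumes, to my understanding of DeepLab-v2's ASPP, that the branch outputs are summed; the kernel values and rates are illustrative only, not those of [4].

```python
from typing import List, Sequence, Tuple


def atrous_conv1d(x: Sequence[float], w: Sequence[float], rate: int) -> List[float]:
    """1-D atrous (dilated) convolution with zero padding; output length == len(x).
    Kernel taps are spaced `rate` samples apart and centred on each position."""
    half = len(w) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for t, wt in enumerate(w):
            j = i + rate * (t - half)
            if 0 <= j < len(x):
                acc += wt * x[j]
        out.append(acc)
    return out


def aspp1d(x: Sequence[float], branches: List[Tuple[Sequence[float], int]]) -> List[float]:
    """Sum the outputs of parallel atrous branches, each given as (kernel, rate)."""
    total = [0.0] * len(x)
    for w, rate in branches:
        total = [a + b for a, b in zip(total, atrous_conv1d(x, w, rate))]
    return total


# An impulse reveals each branch's receptive field: rate 1 covers +-1, rate 2 covers +-2.
print(aspp1d([0, 0, 0, 1, 0, 0, 0], [([1, 1, 1], 1), ([1, 1, 1], 2)]))
```

Larger rates widen the receptive field without adding parameters, which is how the FCN aggregates large context cheaply.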

LS-DeconvNet: restores high-resolution, precise scene details from the coarse FCN map.

Gated fusion layer: fuses the RGB and depth cues effectively for accurate scene semantic segmentation.

It concatenates the RGB and depth prediction maps to learn a weighted gate array.

Locality-Sensitive DeconvNet

[Figure: Locality-Sensitive DeconvNet]

Locality-Sensitive Unpooling

Conventional unpooling is the inverse of max pooling. Although unpooling helps reconstruct detailed object boundaries, its capability is severely limited by its excessive dependence on the input response map, which has a large context.

The affinity matrix is computed from the RGB-D pixels.

The operation resembles a 2D linear interpolation that places more emphasis on neighboring, similar pixels.

[Figure: locality-sensitive unpooling]
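The contrast between conventional and locality-sensitive unpooling can be sketched as follows. This is a simplified illustration under an assumption, not the paper's exact formula: the binary affinity A is represented by an over-segment label map `seg` at the output resolution (A = 1 for two pixels in the same segment), and LS unpooling copies each pooled value to every pixel in its window that is similar to the window's argmax pixel. All function names are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]


def max_pool_with_indices(x: Grid, k: int = 2) -> Tuple[Grid, List[List[Tuple[int, int]]]]:
    """k x k max pooling (stride k) that also records each window's argmax position."""
    h, w = len(x) // k, len(x[0]) // k
    pooled = [[0.0] * w for _ in range(h)]
    idx = [[(0, 0)] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            cells = [(i * k + di, j * k + dj) for di in range(k) for dj in range(k)]
            r, c = max(cells, key=lambda rc: x[rc[0]][rc[1]])
            pooled[i][j], idx[i][j] = x[r][c], (r, c)
    return pooled, idx


def max_unpool(pooled: Grid, idx, k: int = 2) -> Grid:
    """Conventional unpooling: put each value back at its argmax, zeros elsewhere."""
    out = [[0.0] * (len(pooled[0]) * k) for _ in range(len(pooled) * k)]
    for i, row in enumerate(pooled):
        for j, v in enumerate(row):
            r, c = idx[i][j]
            out[r][c] = v
    return out


def ls_unpool(pooled: Grid, idx, seg: List[List[int]], k: int = 2) -> Grid:
    """Locality-sensitive unpooling (sketch): copy the value to every pixel in the
    window whose affinity with the argmax pixel is 1 (same over-segment)."""
    out = [[0.0] * (len(pooled[0]) * k) for _ in range(len(pooled) * k)]
    for i, row in enumerate(pooled):
        for j, v in enumerate(row):
            r, c = idx[i][j]
            for di in range(k):
                for dj in range(k):
                    p, q = i * k + di, j * k + dj
                    if seg[p][q] == seg[r][c]:  # A[(r, c), (p, q)] == 1
                        out[p][q] = v
    return out


x = [[1, 5, 0, 0], [2, 3, 0, 7], [0, 0, 4, 0], [0, 6, 0, 0]]
seg = [[0, 0, 1, 1], [1, 0, 1, 1], [2, 2, 3, 3], [2, 2, 3, 3]]
pooled, idx = max_pool_with_indices(x)
print(max_unpool(pooled, idx))      # sparse: one non-zero per window
print(ls_unpool(pooled, idx, seg))  # dense within segments; pixel (1, 0) stays 0
```

In the top-left window, pixel (1, 0) belongs to a different segment than the argmax pixel (0, 1), so it does not receive the value; that is how the affinity keeps responses from leaking across a boundary.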

Deconvolution

Deconvolution handles the discontinuous boundary responses, making up the missing details.

For details of the deconvolution operation, see [21].
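A minimal 1-D sketch of deconvolution (transposed convolution) without padding: every input value scatters a scaled copy of the kernel into the output. The kernel values are illustrative, not learned weights from the paper. With stride 1 the operation densifies a sparse unpooled signal; with stride greater than 1 it also upsamples.

```python
from typing import List, Sequence


def transposed_conv1d(x: Sequence[float], w: Sequence[float], stride: int = 1) -> List[float]:
    """1-D transposed convolution, no padding.
    Output length is (len(x) - 1) * stride + len(w); overlapping copies of the
    kernel are summed."""
    out = [0.0] * ((len(x) - 1) * stride + len(w))
    for i, v in enumerate(x):
        for t, wt in enumerate(w):
            out[i * stride + t] += v * wt
    return out


# A sparse, unpooled-style input is spread into neighbouring positions.
print(transposed_conv1d([0, 5, 0, 0, 0, 3], [1, 1, 1]))  # [0.0, 5.0, 5.0, 5.0, 0.0, 3.0, 3.0, 3.0]
```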

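Locality-Sensitive Average Pooling

It further promotes continuity among similar pixels.

Conventional average pooling has a drawback: it tends to blur object boundaries and result in an imprecise semantic segmentation map.

Using the affinity matrix, only similar pixels are included in the average pooling operation.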

[Figure: locality-sensitive average pooling]

This yields a consistent and robust feature representation for continuous object structures.
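A sketch of locality-sensitive average pooling, assuming stride 1, a k x k window, and the binary affinity represented by an over-segment label map `seg` (A = 1 when two pixels share a segment). The function name is hypothetical. Passing a label map with a single segment reduces it to conventional average pooling, which makes the boundary blurring easy to see.

```python
from typing import List


def ls_avg_pool(x: List[List[float]], seg: List[List[int]], k: int = 3) -> List[List[float]]:
    """Locality-sensitive average pooling (sketch, stride 1, same output size).
    Each output pixel averages only the window pixels whose affinity with the
    centre pixel is 1 (same over-segment); the centre itself always counts."""
    h, w, half = len(x), len(x[0]), k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [
                x[p][q]
                for p in range(max(0, i - half), min(h, i + half + 1))
                for q in range(max(0, j - half), min(w, j + half + 1))
                if seg[p][q] == seg[i][j]
            ]
            out[i][j] = sum(vals) / len(vals)
    return out


x = [[1, 1, 9, 9]]                       # a sharp boundary between columns 1 and 2
print(ls_avg_pool(x, [[0, 0, 1, 1]]))    # [[1.0, 1.0, 9.0, 9.0]]  boundary kept
print(ls_avg_pool(x, [[0, 0, 0, 0]]))    # plain averaging blurs columns 1 and 2
```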

Gated Fusion

Three layers: a concatenation layer, a convolution layer, and a sigmoid layer.

Block diagram:

[Figure: gated fusion layer]
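The three layers can be sketched in plain Python. Assumptions, not confirmed by these notes: the convolution is 1x1 (so each pixel's gate depends only on that pixel's 2C concatenated scores), and the gate G weights RGB while 1 - G weights depth. The weights and the function name are hypothetical; in the real network they are learned.

```python
import math
from typing import List

Tensor = List[List[List[float]]]  # C x H x W


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gated_fusion(p_rgb: Tensor, p_depth: Tensor,
                 weight: List[List[float]], bias: List[float]) -> Tensor:
    """Gated fusion sketch:
    1) concatenate the two C-channel prediction maps -> 2C channels;
    2) 1x1 convolution (weight: C x 2C, bias: C) -> C gate logits per pixel;
    3) sigmoid -> gate G in (0, 1);
    fused = G * p_rgb + (1 - G) * p_depth (assumed combination rule)."""
    cat = p_rgb + p_depth                     # step 1: channel concatenation
    c, h, w = len(p_rgb), len(p_rgb[0]), len(p_rgb[0][0])
    fused = [[[0.0] * w for _ in range(h)] for _ in range(c)]
    for o in range(c):
        for i in range(h):
            for j in range(w):
                # step 2: 1x1 convolution over the 2C concatenated channels
                z = bias[o] + sum(weight[o][m] * cat[m][i][j] for m in range(2 * c))
                g = sigmoid(z)                # step 3: per-class, per-pixel gate
                fused[o][i][j] = g * p_rgb[o][i][j] + (1.0 - g) * p_depth[o][i][j]
    return fused


# Zero weights give G = 0.5, i.e. equal-weight fusion (up to a factor of 2).
print(gated_fusion([[[2.0]]], [[[4.0]]], [[0.0, 0.0]], [0.0]))  # [[[3.0]]]
```

Unlike the fixed equal-weight sum, the gate is computed per class and per pixel, so the network can lean on depth where RGB is unreliable (e.g., weak lighting) and vice versa.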

Implementation Details

Preprocessing

Computing the affinity matrix A

Following the method of [10], low-level RGB-D features (gradients over visual and geometric cues) are extracted for each pixel, and gPb-ucm [1] is employed to generate over-segments. These over-segments are used to calculate A by checking whether a pair of pixels belongs to the same over-segment (similarity 1) or not (similarity 0). Note that A is scaled to match the resolution of the corresponding feature maps.
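Given an over-segment label map (the gPb-ucm step itself is not reproduced here), the binary affinity matrix follows directly. The sketch below builds the full N x N matrix only for illustration, since its size grows quadratically with the pixel count; a label map is a compact equivalent. How A is rescaled is not specified in these notes, so the nearest-neighbour subsampling is an assumption. Both function names are hypothetical.

```python
from typing import List


def affinity_from_segments(seg: List[List[int]]) -> List[List[int]]:
    """Binary affinity: A[p][q] = 1 if pixels p and q lie in the same over-segment,
    else 0. Pixels are flattened in row-major order."""
    labels = [label for row in seg for label in row]
    return [[1 if a == b else 0 for b in labels] for a in labels]


def downscale_segments(seg: List[List[int]], factor: int) -> List[List[int]]:
    """Nearest-neighbour subsampling of the segment map to a coarser feature-map
    resolution (assumed scaling scheme)."""
    return [row[::factor] for row in seg[::factor]]


print(affinity_from_segments([[0, 0], [1, 1]]))
# [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
```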

Optimization

Two stages:

In the first stage, two independent locality-sensitive DeconvNets are trained on RGB and depth, respectively, for semantic segmentation, without the gated fusion layer.
In the second stage, the gated fusion layer is added, and the whole network is fine-tuned on the synchronized RGB and depth data.

Experiments

Setup

Datasets: two benchmark RGB-D datasets, SUN RGB-D [25] and the popular NYU-Depth v2.

Metrics: pixel accuracy, mean accuracy, mean IoU, and frequency-weighted IoU.
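The four metrics are usually computed from a confusion matrix n, where n[i][j] counts pixels of ground-truth class i predicted as class j, and t_i is the number of pixels of class i. The sketch below uses the standard FCN-style definitions; skipping classes absent from the ground truth is an assumption (benchmarks differ in how they treat them), and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def confusion_matrix(gt: Sequence[int], pred: Sequence[int], num_classes: int) -> List[List[int]]:
    """cm[i][j] = number of pixels with ground truth i predicted as j."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        cm[g][p] += 1
    return cm


def segmentation_metrics(cm: List[List[int]]) -> Dict[str, float]:
    n = len(cm)
    t = [sum(row) for row in cm]                                   # t_i: GT pixels of class i
    pred_tot = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # pixels predicted as j
    diag = [cm[i][i] for i in range(n)]                            # correctly labelled pixels
    total = sum(t)
    present = [i for i in range(n) if t[i] > 0]                    # assumption: skip absent classes
    iou = {i: diag[i] / (t[i] + pred_tot[i] - diag[i]) for i in present}
    return {
        "pixel_acc": sum(diag) / total,
        "mean_acc": sum(diag[i] / t[i] for i in present) / len(present),
        "mean_iou": sum(iou.values()) / len(present),
        "fw_iou": sum(t[i] * iou[i] for i in present) / total,
    }


cm = confusion_matrix([0, 0, 0, 1, 1, 2], [0, 0, 1, 1, 1, 2], 3)
print(segmentation_metrics(cm))  # pixel_acc 5/6, mean_acc 8/9, mean_iou 7/9, fw_iou 13/18
```

Mean accuracy and mean IoU weight every class equally, so they expose weak recognition of small or rare categories that pixel accuracy and frequency-weighted IoU can hide.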

Overall Performance

[Table 1]

[Table 2]

Ablation study

Each component is removed or replaced independently, or both are removed or replaced together, for semantic segmentation on the NYU-Depth v2 dataset.

[Table 3]

We attribute the improvement to gated fusion, which accurately recognizes some hard objects in the scene, such as the box on the sofa and the chair in weak light.

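Analysis of the results:

Visualized comparisons

[Figure 4]

Experimental results on the NYU-Depth v2 dataset.

The analysis shows improvements both at object boundaries and in accurate object recognition.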

Conclusion

1) the locality-sensitive deconvolution networks, which are designed to simultaneously upsample the coarse fully convolutional maps and refine object boundaries; 2) gated fusion, which adapts to the varying contributions of RGB and depth for better fusion of the two modalities for object recognition.
