Inception V3 Network Architecture
1. Design Principles for Convolutional Network Architectures
[1] - Avoid representational bottlenecks, especially early in the network. A feed-forward network can be represented as an acyclic graph from the input layer to the classifier or regressor, which defines the direction of information flow. A representational bottleneck occurs when an intermediate layer heavily compresses the feature dimensions (e.g. via pooling), so that the feature size drops sharply from input to output and features are lost. In theory, because of this loss (for example of the correlation structure of the features), the information content cannot be captured by the output representation alone; it provides only a rough estimate. The network structure should therefore be optimized to reduce the feature loss caused by pooling and similar operations.
[2] - Higher-dimensional feature representations make the network easier to train to convergence. Increasing the activations per tile in a convolutional network allows more disentangled features, and the network trains faster. When the input information is decomposed so that correlations between sub-features are weak while correlations within each sub-feature are strong, aggregating the strongly correlated features makes the network converge more easily.
[3] - Spatial aggregation over low-dimensional features loses little or no representational power. For example, before a more spread-out (e.g. 3x3) convolution, a 1x1 convolution can reduce the dimension of the input features prior to the spatial aggregation. The presumed reason is that if the outputs are used in a spatial-aggregation context, the strong correlation between adjacent units means that dimensionality reduction causes little information loss.
[4] - Balance the width and depth of the network.
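Principle [3] can be made concrete with a small cost sketch (pure Python; the channel counts are illustrative assumptions, not values from the paper): a 1x1 convolution shrinks the channel dimension before the spatially spread-out 3x3 convolution.

```python
def conv_cost(kernel_h, kernel_w, c_in, c_out):
    """Multiply-accumulates per output position of a convolution."""
    return kernel_h * kernel_w * c_in * c_out

c_in, c_mid, c_out = 256, 64, 256  # illustrative channel counts

# Direct 3x3 convolution on the full 256-channel input.
direct = conv_cost(3, 3, c_in, c_out)

# 1x1 reduction to 64 channels, then the 3x3 on the reduced embedding.
reduced = conv_cost(1, 1, c_in, c_mid) + conv_cost(3, 3, c_mid, c_out)

# With these numbers the reduced path is 3.6x cheaper per output position.
```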
2. Factorizing Convolutions with Large Filter Size
2.1 Factorization into Smaller Convolutions
One 5x5 convolution is equivalent to two consecutive 3x3 convolutions.
Assuming the 5x5 convolution and the two consecutive 3x3 convolutions produce the same number of output features, the ratio of their computational costs is 5x5 / (3x3 + 3x3) = 25/18.
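The 25/18 ratio can be verified with a quick sketch (pure Python, counting multiply-accumulates per output position; the channel count k is an illustrative assumption):

```python
def conv_cost(kernel_h, kernel_w, c_in, c_out):
    """Multiply-accumulates per output position of a convolution."""
    return kernel_h * kernel_w * c_in * c_out

k = 64  # illustrative channel count

# One 5x5 convolution vs. two stacked 3x3 convolutions with the same
# number of input/output channels.
cost_5x5 = conv_cost(5, 5, k, k)
cost_two_3x3 = 2 * conv_cost(3, 3, k, k)

# The channel terms cancel, leaving exactly the 25/18 ratio from the text.
ratio = cost_5x5 / cost_two_3x3
```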
2.2 Spatial Factorization into Asymmetric Convolutions
A 3x3 convolution can in turn be factorized into a 3x1 convolution followed by a 1x3 convolution.
A 7x7 kernel is decomposed into two kernels (1x7, 7x1), and a 3x3 kernel into (1x3, 3x1).
This both speeds up computation and, by splitting one convolutional layer into two, deepens the network and increases its non-linearity.
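The savings from asymmetric factorization can be sketched the same way (pure Python; k is an illustrative channel count):

```python
def conv_cost(kernel_h, kernel_w, c_in, c_out):
    """Multiply-accumulates per output position of a convolution."""
    return kernel_h * kernel_w * c_in * c_out

k = 64  # illustrative channel count

# 7x7 factored into 1x7 followed by 7x1: 49 k^2 vs. 14 k^2 per position.
cost_7x7 = conv_cost(7, 7, k, k)
cost_factored = conv_cost(1, 7, k, k) + conv_cost(7, 1, k, k)

# 3x3 factored into 1x3 followed by 3x1: 9 k^2 vs. 6 k^2 per position.
cost_3x3 = conv_cost(3, 3, k, k)
cost_3x3_factored = conv_cost(1, 3, k, k) + conv_cost(3, 1, k, k)
```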
3. Efficient Grid Size Reduction
In general, CNNs use pooling operations to reduce the grid size of the feature maps. To avoid a representational bottleneck, the activation dimension of the network filters is expanded before applying max or average pooling.
For example, to go from a d x d grid with k filters to a (d/2) x (d/2) grid with 2k filters, one would first compute a stride-1 convolution with 2k filters and then apply pooling. The total cost is dominated by the 2d²k² operations of the convolution on the larger grid.
A cheaper alternative is to pool first and then convolve, which costs only 2(d/2)²k², reducing the computation to 1/4. However, this introduces a representational bottleneck: the representation drops to (d/2) x (d/2) x k, which weakens the network's expressive power (see Figure 9).
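The two orderings can be compared with a small sketch (pure Python; d and k are illustrative values, and only the dominant convolution term is counted, as in the text):

```python
def reduction_cost(d, k, conv_first):
    """Dominant convolution cost of halving a d x d grid with k filters
    while doubling the filter count to 2k."""
    if conv_first:
        # Stride-1 conv to 2k filters on the full d x d grid, then pool.
        return 2 * k * k * d * d
    # Pool down to (d/2) x (d/2) first, then convolve to 2k filters.
    return 2 * k * k * (d // 2) * (d // 2)

d, k = 36, 128  # illustrative grid size and filter count
conv_then_pool = reduction_cost(d, k, conv_first=True)
pool_then_conv = reduction_cost(d, k, conv_first=False)
# pool-then-conv is 1/4 the cost, but introduces a representational bottleneck
```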
The approach adopted here, shown in Figure 10, removes the representational bottleneck while also reducing computation: it uses two parallel stride-2 operations (a convolution branch and a pooling branch) whose outputs are concatenated.
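The parallel-branch reduction can be sketched in terms of output shapes (pure Python; the concrete numbers are illustrative, not taken from Figure 10):

```python
def parallel_reduction(d, conv_filters, pool_filters):
    """Output shape of concatenating a stride-2 conv branch and a
    stride-2 pool branch, each halving a d x d grid."""
    conv_branch = (d // 2, d // 2, conv_filters)  # stride-2 convolution
    pool_branch = (d // 2, d // 2, pool_filters)  # stride-2 pooling keeps its input channels
    # Channel-wise concat of the two branches.
    return (d // 2, d // 2, conv_branch[2] + pool_branch[2])

# e.g. a 36 x 36 grid with k = 128 channels: the conv branch outputs k
# filters and the pool branch passes through the original k channels,
# giving 2k filters at half the grid size in a single step.
shape = parallel_reduction(36, conv_filters=128, pool_filters=128)
```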
4. Inception V3 Network Architecture
The method in Figure 10 is used to reduce the grid size between Inception modules.
Zero-padded convolutions are used to maintain the grid size.
Inside the Inception modules, zero-padded convolutions are likewise used to keep the grid size unchanged.
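TensorFlow's output-size rules show why zero ('SAME') padding keeps the grid fixed inside a module while stride-2 'VALID' convolutions shrink it between modules (a minimal sketch of the sizing formulas, assuming TF's conventions):

```python
import math

def conv_out_size(in_size, kernel, stride, padding):
    # TensorFlow sizing rules:
    #   SAME  -> ceil(in / stride)                  (zero-padded)
    #   VALID -> ceil((in - kernel + 1) / stride)   (no padding)
    if padding == 'SAME':
        return math.ceil(in_size / stride)
    return math.ceil((in_size - kernel + 1) / stride)

# Inside an Inception block, a 3x3 SAME stride-1 conv keeps the 35 x 35
# grid, while VALID would shrink it to 33 x 33.
same = conv_out_size(35, 3, 1, 'SAME')
valid = conv_out_size(35, 3, 1, 'VALID')

# Between modules, a stride-2 VALID convolution halves the grid: 35 -> 17.
reduced = conv_out_size(35, 3, 2, 'VALID')
```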
5. Inception V3 Definition in TensorFlow Slim
"""
Inception V3 classification network definition.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from nets import inception_utils
slim = tf.contrib.slim
trunc_normal = lambda stddev: tf.truncated_normal_initializer(0.0, stddev)
def inception_v3_base(inputs,
final_endpoint='Mixed_7c',
min_depth=16,
depth_multiplier=1.0,
scope=None):
"""
Inception V3 基礎網路結構定義.
根據給定的輸入和最終網路節點構建 Inception V3 網路.
可以構建表格中從輸入到 inception 模組 Mixed_7c 的網路結構.
注:網路層的名字與論文裡的不對應,但,構建的網路相同.
old_names 到 new names 的對映:
Old name | New name
=======================================
conv0 | Conv2d_1a_3x3
conv1 | Conv2d_2a_3x3
conv2 | Conv2d_2b_3x3
pool1 | MaxPool_3a_3x3
conv3 | Conv2d_3b_1x1
conv4 | Conv2d_4a_3x3
pool2 | MaxPool_5a_3x3
mixed_35x35x256a | Mixed_5b
mixed_35x35x288a | Mixed_5c
mixed_35x35x288b | Mixed_5d
mixed_17x17x768a | Mixed_6a
mixed_17x17x768b | Mixed_6b
mixed_17x17x768c | Mixed_6c
mixed_17x17x768d | Mixed_6d
mixed_17x17x768e | Mixed_6e
mixed_8x8x1280a | Mixed_7a
mixed_8x8x2048a | Mixed_7b
mixed_8x8x2048b | Mixed_7c
Args:
inputs: a tensor of size [batch_size, height, width, channels].
final_endpoint: specifies the endpoint at which network construction ends, i.e. the network depth.
Can be one of: ['Conv2d_1a_3x3', 'Conv2d_2a_3x3', 'Conv2d_2b_3x3',
'MaxPool_3a_3x3', 'Conv2d_3b_1x1', 'Conv2d_4a_3x3',
'MaxPool_5a_3x3', 'Mixed_5b', 'Mixed_5c', 'Mixed_5d',
'Mixed_6a', 'Mixed_6b', 'Mixed_6c', 'Mixed_6d', 'Mixed_6e',
'Mixed_7a', 'Mixed_7b', 'Mixed_7c'].
min_depth: Minimum depth value (number of channels) for all convolution ops.
Enforced when depth_multiplier < 1;
not an active constraint when depth_multiplier >= 1.
depth_multiplier: Float multiplier for the depth (number of channels) of all convolution ops.
The value must be greater than zero.
Typically set to a float in (0, 1) to reduce the parameter count or the computational cost of the model.
scope: Optional variable_scope.
Returns:
tensor_out: output tensor corresponding to the network's final_endpoint.
end_points: a set of activations for external use, e.g. summaries and losses.
Raises:
ValueError: if final_endpoint is not set to one of the predefined values,
or depth_multiplier <= 0
"""
# end_points will collect relevant activations for external use, e.g. summaries or losses.
end_points = {}
if depth_multiplier <= 0:
raise ValueError('depth_multiplier is not greater than zero.')
depth = lambda d: max(int(d * depth_multiplier), min_depth)
with tf.variable_scope(scope, 'InceptionV3', [inputs]):
with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d], stride=1, padding='VALID'):
# 299 x 299 x 3
end_point = 'Conv2d_1a_3x3'
net = slim.conv2d(inputs, depth(32), [3, 3], stride=2, scope=end_point)
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 149 x 149 x 32
end_point = 'Conv2d_2a_3x3'
net = slim.conv2d(net, depth(32), [3, 3], scope=end_point)
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 147 x 147 x 32
end_point = 'Conv2d_2b_3x3'
net = slim.conv2d(net, depth(64), [3, 3], padding='SAME', scope=end_point)
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 147 x 147 x 64
end_point = 'MaxPool_3a_3x3'
net = slim.max_pool2d(net, [3, 3], stride=2, scope=end_point)
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 73 x 73 x 64
end_point = 'Conv2d_3b_1x1'
net = slim.conv2d(net, depth(80), [1, 1], scope=end_point)
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 73 x 73 x 80.
end_point = 'Conv2d_4a_3x3'
net = slim.conv2d(net, depth(192), [3, 3], scope=end_point)
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 71 x 71 x 192.
end_point = 'MaxPool_5a_3x3'
net = slim.max_pool2d(net, [3, 3], stride=2, scope=end_point)
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 35 x 35 x 192.
# Inception blocks
with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d], stride=1, padding='SAME'):
# mixed: 35 x 35 x 256.
end_point = 'Mixed_5b'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, depth(48), [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, depth(64), [5, 5], scope='Conv2d_0b_5x5')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0b_3x3')
branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0c_3x3')
with tf.variable_scope('Branch_3'):
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
branch_3 = slim.conv2d(branch_3, depth(32), [1, 1], scope='Conv2d_0b_1x1')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# mixed_1: 35 x 35 x 288.
end_point = 'Mixed_5c'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, depth(48), [1, 1], scope='Conv2d_0b_1x1')
branch_1 = slim.conv2d(branch_1, depth(64), [5, 5], scope='Conv_1_0c_5x5')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0b_3x3')
branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0c_3x3')
with tf.variable_scope('Branch_3'):
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
branch_3 = slim.conv2d(branch_3, depth(64), [1, 1], scope='Conv2d_0b_1x1')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# mixed_2: 35 x 35 x 288.
end_point = 'Mixed_5d'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, depth(48), [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, depth(64), [5, 5], scope='Conv2d_0b_5x5')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0b_3x3')
branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0c_3x3')
with tf.variable_scope('Branch_3'):
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
branch_3 = slim.conv2d(branch_3, depth(64), [1, 1], scope='Conv2d_0b_1x1')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# mixed_3: 17 x 17 x 768.
end_point = 'Mixed_6a'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, depth(384), [3, 3], stride=2,
padding='VALID', scope='Conv2d_1a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, depth(96), [3, 3], scope='Conv2d_0b_3x3')
branch_1 = slim.conv2d(branch_1, depth(96), [3, 3], stride=2,
padding='VALID', scope='Conv2d_1a_1x1')
with tf.variable_scope('Branch_2'):
branch_2 = slim.max_pool2d(net, [3, 3], stride=2,
padding='VALID', scope='MaxPool_1a_3x3')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2])
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# mixed4: 17 x 17 x 768.
end_point = 'Mixed_6b'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, depth(128), [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, depth(128), [1, 7], scope='Conv2d_0b_1x7')
branch_1 = slim.conv2d(branch_1, depth(192), [7, 1], scope='Conv2d_0c_7x1')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(net, depth(128), [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, depth(128), [7, 1], scope='Conv2d_0b_7x1')
branch_2 = slim.conv2d(branch_2, depth(128), [1, 7], scope='Conv2d_0c_1x7')
branch_2 = slim.conv2d(branch_2, depth(128), [7, 1], scope='Conv2d_0d_7x1')
branch_2 = slim.conv2d(branch_2, depth(192), [1, 7], scope='Conv2d_0e_1x7')
with tf.variable_scope('Branch_3'):
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
branch_3 = slim.conv2d(branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# mixed_5: 17 x 17 x 768.
end_point = 'Mixed_6c'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, depth(160), [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, depth(160), [1, 7], scope='Conv2d_0b_1x7')
branch_1 = slim.conv2d(branch_1