
Network Architectures: Inception V3

1. Design Principles for Convolutional Network Architectures

  • [1] - Avoid representational bottlenecks, especially early in the network. A feed-forward network can be represented as an acyclic graph from the input layer to a classifier or regressor, which defines the direction of information flow. A representational bottleneck occurs when an intermediate layer compresses the feature dimensions heavily (pooling and similar operations), so that the feature size shrinks sharply from input to output and information is lost. In theory, because of this loss (for example, of the correlation structure of the features), the information content cannot be captured by the output representation alone; it provides only a rough estimate of the features. The network structure should therefore be designed to reduce the feature loss caused by pooling and similar operations.

  • [2] - Higher-dimensional representations make the network easier to converge. Increasing the activations per tile in a convolutional network allows more disentangled features, and the network trains faster. When the input information is decomposed so that the correlation between sub-features is low while the correlation within each sub-feature is strong, aggregating the strongly correlated features makes convergence easier.

  • [3] - Spatial aggregation over lower-dimensional features does not cost much representational power. For example, before a more spread-out (e.g., 3x3) convolution, a 1x1 convolution can be used to reduce the dimension of the input features prior to spatial aggregation. The presumed reason is that if the output is used in a spatial-aggregation context, the strong correlation between adjacent units means dimension reduction loses little information.

  • [4] - Balance the width and depth of the network.

    Optimal network performance is reached only when width and depth are balanced.
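Principle [3] can be made concrete with a small weight-count sketch. The channel sizes below are illustrative assumptions, not values from the paper: inserting a 1x1 reduction from 256 to 64 channels before a 3x3 convolution cuts the weight count to under a third of a direct 3x3.

```python
def conv_weights(k, c_in, c_out):
    """Weights in a k x k convolution with c_in inputs and c_out outputs (bias ignored)."""
    return k * k * c_in * c_out

c_in = c_out = 256   # illustrative channel counts
c_mid = 64           # reduced dimension produced by the 1x1 convolution

direct = conv_weights(3, c_in, c_out)                            # plain 3x3
reduced = conv_weights(1, c_in, c_mid) + conv_weights(3, c_mid, c_out)

print(direct, reduced)  # 589824 163840
```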

2. Factorizing Convolutions with Large Filter Size

2.1 Factorization into Smaller Convolutions

A 5x5 convolution is equivalent (in receptive field) to two consecutive 3x3 convolutions. Assuming the 5x5 convolution and the two stacked 3x3 convolutions produce the same number of output features, the ratio of their computational costs is 5x5 / (3x3 + 3x3) = 25/18.
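As a sanity check on the 25/18 ratio, a minimal Python sketch (the channel count `c` is an illustrative assumption) compares the weight count of a single 5x5 kernel with two stacked 3x3 kernels:

```python
def conv_weights(k, c_in, c_out):
    """Weights in a k x k convolution layer (bias ignored)."""
    return k * k * c_in * c_out

c = 64  # illustrative channel count, same for input and output
single_5x5 = conv_weights(5, c, c)
stacked_3x3 = 2 * conv_weights(3, c, c)

print(single_5x5 / stacked_3x3)  # 25/18 ≈ 1.389
```

The same ratio holds for FLOPs when both options produce feature maps of the same spatial size, since every output position multiplies the same weight counts.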

2.2 Spatial Factorization into Asymmetric Convolutions

A 3x3 convolution can be factorized into a 3x1 convolution followed by a 1x3 convolution. In the same way, a 7x7 kernel is decomposed into the pair (1x7, 7x1), and a 3x3 kernel into (1x3, 3x1). This both speeds up computation and, by turning one convolutional layer into two, deepens the network and increases its non-linearity.
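Reusing the same counting idea, a sketch of the asymmetric case (the channel count is an illustrative assumption) shows the (1x7, 7x1) pair needing 14/49 = 2/7 of the weights of a full 7x7 kernel:

```python
def conv_weights(k_h, k_w, c_in, c_out):
    """Weights in a k_h x k_w convolution layer (bias ignored)."""
    return k_h * k_w * c_in * c_out

c = 192  # illustrative channel count
full_7x7 = conv_weights(7, 7, c, c)
asym_pair = conv_weights(1, 7, c, c) + conv_weights(7, 1, c, c)

print(asym_pair / full_7x7)  # 14/49 = 2/7 ≈ 0.286
```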

3. Efficient Grid Size Reduction

Typically, a CNN reduces the grid size of the feature maps with a pooling operation. To avoid a representational bottleneck, the activation dimension of the network filters is expanded before the max or average pooling is applied.

For example, to go from a d x d grid with k filters to a (d/2) x (d/2) grid with 2k filters, one would first compute a stride-1 convolution with 2k filters and then pool. The overall cost is dominated by the 2d²k² operations of the convolution on the larger grid.

A cheaper alternative is to pool first and then convolve, which costs only 2(d/2)²k², reducing the computation to 1/4. But this creates a representational bottleneck: the representation first drops to (d/2) x (d/2) x k, which weakens the network's expressive power (see Figure 9).

The approach adopted here, shown in Figure 10, removes the representational bottleneck while also reducing computation: two parallel stride-2 operations (a convolution branch and a pooling branch) are applied and their outputs concatenated.
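A minimal Python sketch of the trade-off, counting multiplies up to the kernel-size constant (the grid size and filter count are illustrative assumptions): pooling first is 4x cheaper but bottlenecked, while the parallel stride-2 scheme of Figure 10 is cheaper still, since its convolution branch only produces k of the 2k output filters.

```python
def conv_multiplies(out_grid, k, c_in, c_out):
    """Approximate multiplies of a k x k convolution producing an out_grid x out_grid map."""
    return out_grid * out_grid * k * k * c_in * c_out

d, filters = 32, 288  # illustrative grid size and filter count k

# (a) stride-1 conv to 2k filters, then pool: conv runs on the full d x d grid
conv_then_pool = conv_multiplies(d, 3, filters, 2 * filters)
# (b) pool first, then conv: 1/4 the cost, but a representational bottleneck
pool_then_conv = conv_multiplies(d // 2, 3, filters, 2 * filters)
# (c) Figure 10: a stride-2 conv branch producing k filters, concatenated
#     with a stride-2 pooling branch contributing the other k
parallel = conv_multiplies(d // 2, 3, filters, filters)

print(pool_then_conv / conv_then_pool, parallel / conv_then_pool)  # 0.25 0.125
```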

4. Inception V3 Network Architecture

The method of Figure 10 is used to reduce the grid size between Inception modules. Wherever the grid size must be preserved, 0-padded convolutions are used, including inside the Inception modules themselves.
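The grid-size bookkeeping can be checked with the standard output-size formulas for 'SAME' and 'VALID' padding. The helper below is a minimal sketch (not part of the Slim code) that reproduces the shape comments in the stem of the definition in Section 5:

```python
import math

def out_size(in_size, kernel, stride, padding):
    """Spatial output size of a convolution/pooling layer (TensorFlow convention)."""
    if padding == 'SAME':
        return math.ceil(in_size / stride)
    if padding == 'VALID':
        return math.ceil((in_size - kernel + 1) / stride)
    raise ValueError('unknown padding: %s' % padding)

# Stem of the network below: 299 -> 149 -> 147 -> 147 -> 73
sizes = [299]
for kernel, stride, padding in [(3, 2, 'VALID'), (3, 1, 'VALID'),
                                (3, 1, 'SAME'), (3, 2, 'VALID')]:
    sizes.append(out_size(sizes[-1], kernel, stride, padding))

print(sizes)  # [299, 149, 147, 147, 73]
```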

5. Inception V3 Definition in TensorFlow Slim

"""
Inception V3 分類網路定義.
"""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf

from nets import inception_utils

slim = tf.contrib.slim
trunc_normal = lambda stddev: tf.truncated_normal_initializer(0.0, stddev)


def inception_v3_base(inputs,
                      final_endpoint='Mixed_7c',
                      min_depth=16,
                      depth_multiplier=1.0,
                      scope=None):
  """
  Inception V3 基礎網路結構定義.
  
  根據給定的輸入和最終網路節點構建 Inception V3 網路. 
  可以構建表格中從輸入到 inception 模組 Mixed_7c 的網路結構.
  
  注:網路層的名字與論文裡的不對應,但,構建的網路相同.
  
  old_names 到 new names 的對映:
  Old name          | New name
  =======================================
  conv0             | Conv2d_1a_3x3
  conv1             | Conv2d_2a_3x3
  conv2             | Conv2d_2b_3x3
  pool1             | MaxPool_3a_3x3
  conv3             | Conv2d_3b_1x1
  conv4             | Conv2d_4a_3x3
  pool2             | MaxPool_5a_3x3
  mixed_35x35x256a  | Mixed_5b
  mixed_35x35x288a  | Mixed_5c
  mixed_35x35x288b  | Mixed_5d
  mixed_17x17x768a  | Mixed_6a
  mixed_17x17x768b  | Mixed_6b
  mixed_17x17x768c  | Mixed_6c
  mixed_17x17x768d  | Mixed_6d
  mixed_17x17x768e  | Mixed_6e
  mixed_8x8x1280a   | Mixed_7a
  mixed_8x8x2048a   | Mixed_7b
  mixed_8x8x2048b   | Mixed_7c

  Args:
    inputs: a tensor of size [batch_size, height, width, channels].
    final_endpoint: specifies the endpoint at which network construction
      stops, i.e., the network depth. Candidate values: ['Conv2d_1a_3x3',
      'Conv2d_2a_3x3', 'Conv2d_2b_3x3', 'MaxPool_3a_3x3', 'Conv2d_3b_1x1',
      'Conv2d_4a_3x3', 'MaxPool_5a_3x3', 'Mixed_5b', 'Mixed_5c', 'Mixed_5d',
      'Mixed_6a', 'Mixed_6b', 'Mixed_6c', 'Mixed_6d', 'Mixed_6e',
      'Mixed_7a', 'Mixed_7b', 'Mixed_7c'].
    min_depth: minimum depth value (number of channels) for all convolution
      ops. Enforced when depth_multiplier < 1; not an active constraint when
      depth_multiplier >= 1.
    depth_multiplier: float multiplier for the depth (number of channels) of
      all convolution ops. Must be greater than zero. Typically set to a
      value in (0, 1) to reduce the parameter count or the computational
      cost of the model.
    scope: optional variable_scope.

  Returns:
    tensor_out: output tensor corresponding to the network layer final_endpoint.
    end_points: a set of activations for external use, for example summaries
      or losses.

  Raises:
    ValueError: if final_endpoint is not set to one of the predefined values,
                or depth_multiplier <= 0
  """
  # end_points collects relevant activations for external use, for example
  # summaries or losses.
  end_points = {}

  if depth_multiplier <= 0:
    raise ValueError('depth_multiplier is not greater than zero.')
  depth = lambda d: max(int(d * depth_multiplier), min_depth)

  with tf.variable_scope(scope, 'InceptionV3', [inputs]):
    with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d], stride=1, padding='VALID'):
      # 299 x 299 x 3
      end_point = 'Conv2d_1a_3x3'
      net = slim.conv2d(inputs, depth(32), [3, 3], stride=2, scope=end_point)
      end_points[end_point] = net
      if end_point == final_endpoint: return net, end_points
      # 149 x 149 x 32
      end_point = 'Conv2d_2a_3x3'
      net = slim.conv2d(net, depth(32), [3, 3], scope=end_point)
      end_points[end_point] = net
      if end_point == final_endpoint: return net, end_points
      # 147 x 147 x 32
      end_point = 'Conv2d_2b_3x3'
      net = slim.conv2d(net, depth(64), [3, 3], padding='SAME', scope=end_point)
      end_points[end_point] = net
      if end_point == final_endpoint: return net, end_points
      # 147 x 147 x 64
      end_point = 'MaxPool_3a_3x3'
      net = slim.max_pool2d(net, [3, 3], stride=2, scope=end_point)
      end_points[end_point] = net
      if end_point == final_endpoint: return net, end_points
      # 73 x 73 x 64
      end_point = 'Conv2d_3b_1x1'
      net = slim.conv2d(net, depth(80), [1, 1], scope=end_point)
      end_points[end_point] = net
      if end_point == final_endpoint: return net, end_points
      # 73 x 73 x 80.
      end_point = 'Conv2d_4a_3x3'
      net = slim.conv2d(net, depth(192), [3, 3], scope=end_point)
      end_points[end_point] = net
      if end_point == final_endpoint: return net, end_points
      # 71 x 71 x 192.
      end_point = 'MaxPool_5a_3x3'
      net = slim.max_pool2d(net, [3, 3], stride=2, scope=end_point)
      end_points[end_point] = net
      if end_point == final_endpoint: return net, end_points
      # 35 x 35 x 192.

    # Inception blocks
    with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d], stride=1, padding='SAME'):
      # mixed: 35 x 35 x 256.
      end_point = 'Mixed_5b'
      with tf.variable_scope(end_point):
        with tf.variable_scope('Branch_0'):
          branch_0 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
        with tf.variable_scope('Branch_1'):
          branch_1 = slim.conv2d(net, depth(48), [1, 1], scope='Conv2d_0a_1x1')
          branch_1 = slim.conv2d(branch_1, depth(64), [5, 5], scope='Conv2d_0b_5x5')
        with tf.variable_scope('Branch_2'):
          branch_2 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
          branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0b_3x3')
          branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0c_3x3')
        with tf.variable_scope('Branch_3'):
          branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
          branch_3 = slim.conv2d(branch_3, depth(32), [1, 1], scope='Conv2d_0b_1x1')
        net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
      end_points[end_point] = net
      if end_point == final_endpoint: return net, end_points

      # mixed_1: 35 x 35 x 288.
      end_point = 'Mixed_5c'
      with tf.variable_scope(end_point):
        with tf.variable_scope('Branch_0'):
          branch_0 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
        with tf.variable_scope('Branch_1'):
          branch_1 = slim.conv2d(net, depth(48), [1, 1], scope='Conv2d_0b_1x1')
          branch_1 = slim.conv2d(branch_1, depth(64), [5, 5], scope='Conv_1_0c_5x5')
        with tf.variable_scope('Branch_2'):
          branch_2 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
          branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0b_3x3')
          branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0c_3x3')
        with tf.variable_scope('Branch_3'):
          branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
          branch_3 = slim.conv2d(branch_3, depth(64), [1, 1], scope='Conv2d_0b_1x1')
        net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
      end_points[end_point] = net
      if end_point == final_endpoint: return net, end_points

      # mixed_2: 35 x 35 x 288.
      end_point = 'Mixed_5d'
      with tf.variable_scope(end_point):
        with tf.variable_scope('Branch_0'):
          branch_0 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
        with tf.variable_scope('Branch_1'):
          branch_1 = slim.conv2d(net, depth(48), [1, 1], scope='Conv2d_0a_1x1')
          branch_1 = slim.conv2d(branch_1, depth(64), [5, 5], scope='Conv2d_0b_5x5')
        with tf.variable_scope('Branch_2'):
          branch_2 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
          branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0b_3x3')
          branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0c_3x3')
        with tf.variable_scope('Branch_3'):
          branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
          branch_3 = slim.conv2d(branch_3, depth(64), [1, 1], scope='Conv2d_0b_1x1')
        net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
      end_points[end_point] = net
      if end_point == final_endpoint: return net, end_points

      # mixed_3: 17 x 17 x 768.
      end_point = 'Mixed_6a'
      with tf.variable_scope(end_point):
        with tf.variable_scope('Branch_0'):
          branch_0 = slim.conv2d(net, depth(384), [3, 3], stride=2,
                                 padding='VALID', scope='Conv2d_1a_1x1')
        with tf.variable_scope('Branch_1'):
          branch_1 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
          branch_1 = slim.conv2d(branch_1, depth(96), [3, 3], scope='Conv2d_0b_3x3')
          branch_1 = slim.conv2d(branch_1, depth(96), [3, 3], stride=2,
                                 padding='VALID', scope='Conv2d_1a_1x1')
        with tf.variable_scope('Branch_2'):
          branch_2 = slim.max_pool2d(net, [3, 3], stride=2,
                                     padding='VALID', scope='MaxPool_1a_3x3')
        net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2])
      end_points[end_point] = net
      if end_point == final_endpoint: return net, end_points

      # mixed4: 17 x 17 x 768.
      end_point = 'Mixed_6b'
      with tf.variable_scope(end_point):
        with tf.variable_scope('Branch_0'):
          branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
        with tf.variable_scope('Branch_1'):
          branch_1 = slim.conv2d(net, depth(128), [1, 1], scope='Conv2d_0a_1x1')
          branch_1 = slim.conv2d(branch_1, depth(128), [1, 7], scope='Conv2d_0b_1x7')
          branch_1 = slim.conv2d(branch_1, depth(192), [7, 1], scope='Conv2d_0c_7x1')
        with tf.variable_scope('Branch_2'):
          branch_2 = slim.conv2d(net, depth(128), [1, 1], scope='Conv2d_0a_1x1')
          branch_2 = slim.conv2d(branch_2, depth(128), [7, 1], scope='Conv2d_0b_7x1')
          branch_2 = slim.conv2d(branch_2, depth(128), [1, 7], scope='Conv2d_0c_1x7')
          branch_2 = slim.conv2d(branch_2, depth(128), [7, 1], scope='Conv2d_0d_7x1')
          branch_2 = slim.conv2d(branch_2, depth(192), [1, 7], scope='Conv2d_0e_1x7')
        with tf.variable_scope('Branch_3'):
          branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
          branch_3 = slim.conv2d(branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1')
        net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
      end_points[end_point] = net
      if end_point == final_endpoint: return net, end_points

      # mixed_5: 17 x 17 x 768.
      end_point = 'Mixed_6c'
      with tf.variable_scope(end_point):
        with tf.variable_scope('Branch_0'):
          branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
        with tf.variable_scope('Branch_1'):
          branch_1 = slim.conv2d(net, depth(160), [1, 1], scope='Conv2d_0a_1x1')
          branch_1 = slim.conv2d(branch_1, depth(160), [1, 7], scope='Conv2d_0b_1x7')
          branch_1 = slim.conv2d(branch_1