
Deep Learning Notes - Improving Deep Neural Networks (Week 3 Assignment: Using TensorFlow)

0- Background:

Build a neural network and predict results using the TensorFlow framework.

1- Environment dependencies:

import math
import numpy as np
import h5py
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.python.framework import ops
from tf_utils import load_dataset, random_mini_batches, convert_to_one_hot, predict

%matplotlib inline
np.random.seed(1)

Note that on Windows, TensorFlow currently supports only 64-bit Python 3.
Installation is simply pip install tensorflow.
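
To confirm the installation works (these notes assume the TensorFlow 1.x session-based API throughout):

import tensorflow as tf
print(tf.__version__)  # e.g. 1.2.x; any 1.x version should work for these notes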

TensorFlow basics:

Computing a cost function:

$$loss = \mathcal{L}(\hat{y}, y) = (\hat{y}^{(i)} - y^{(i)})^2 \tag{1}$$
y_hat = tf.constant(36, name='y_hat')  # define a constant with value 36
y = tf.constant(39, name='y')          # define a constant with value 39

loss = tf.Variable((y - y_hat)**2, name='loss')  # define the variable: loss

init = tf.global_variables_initializer()  # add a node that initializes all the variables
with tf.Session() as session:             # create a session
    session.run(init)                     # initialize the variables; loss is now ready to be computed
    print(session.run(loss))              # print the loss

Output:

9

The TensorFlow workflow:
1. Create Tensors (variables) that are not yet executed/evaluated
2. Write operations between those Tensors to build the target function, e.g. the cost function
3. Initialize the Tensors
4. Create a Session
5. Run the Session; this step actually executes the target function

For example:

a = tf.constant(2)
b = tf.constant(10)
c = tf.multiply(a,b)
print(c)

Output:

Tensor("Mul:0", shape=(), dtype=int32)

The result is not 20 because the code above only builds the 'computation graph'; it does not execute any computation. There are two ways to run it: create a Session explicitly and close it later with sess.close() (the code comments below call this method 1), or use a with block so the session closes automatically (method 2). Using the first:

sess = tf.Session()
print(sess.run(c))

Output:

20

Using placeholders:
A placeholder lets you supply a value later, via a "feed dictionary", when you run the session. Use a placeholder whenever you declare a variable whose value will only be provided at run time.

# Change the value of x in the feed_dict
x = tf.placeholder(tf.int64, name = 'x')
print(sess.run(2 * x, feed_dict = {x: 3}))
sess.close()

Output:
6

1-1 Linear function:

Compute the output of the equation Y = WX + b, where W and X are random matrices and b is a random vector.

Assume W has shape (4, 3), X has shape (3, 1), and b has shape (4, 1). X is defined as follows:

X = tf.constant(np.random.randn(3,1), name = "X")

Definition of the linear function:

# GRADED FUNCTION: linear_function

def linear_function():
    """
    Implements a linear function: 
            Initializes W to be a random tensor of shape (4,3)
            Initializes X to be a random tensor of shape (3,1)
            Initializes b to be a random tensor of shape (4,1)
    Returns: 
    result -- runs the session for Y = WX + b 
    """

    np.random.seed(1)

    ### START CODE HERE ### (4 lines of code)
    X = tf.constant(np.random.randn(3,1), name = "x")
    #print(X)
    W = tf.Variable(np.random.randn(4,3), name = "w")  # could also be a constant, since a value is assigned here
    b = tf.Variable(np.random.randn(4,1), name = "b")
    Y = tf.add(tf.matmul(W,X), b)
    ### END CODE HERE ### 

    # Create the session using tf.Session() and run it with sess.run(...) on the variable you want to calculate

    ### START CODE HERE ###
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())  # not needed if W and b are defined as constants above
    result = sess.run(Y)
    ### END CODE HERE ### 

    # close the session 
    sess.close()

    return result

print( "result = " + str(linear_function()))

The output is as follows:

result = [[-2.15657382]
 [ 2.95891446]
 [-1.08926781]
 [-0.84538042]]
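
Since linear_function draws X, W, and b with the same seed and in the same order, the result can be cross-checked with plain numpy (a hypothetical verification, not part of the assignment):

import numpy as np

np.random.seed(1)
X = np.random.randn(3, 1)   # drawn in the same order as inside linear_function
W = np.random.randn(4, 3)
b = np.random.randn(4, 1)
print(np.dot(W, X) + b)     # should match the TensorFlow result above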

1-2 Computing the sigmoid

TensorFlow has common neural-network activation functions built in, such as tf.sigmoid and tf.nn.softmax.

# GRADED FUNCTION: sigmoid

def sigmoid(z):
    """
    Computes the sigmoid of z

    Arguments:
    z -- input value, scalar or vector

    Returns: 
    results -- the sigmoid of z
    """

    ### START CODE HERE ### ( approx. 4 lines of code)
    # create the placeholder x
    x = tf.placeholder(tf.float32, name = "x")

    # compute sigmoid(x)
    sigmoid = tf.sigmoid(x)  # create the sigmoid node

    # Create a session and run it, using the with-block form (method 2) so it closes automatically.
    # Use feed_dict to pass the value of z to x.
    with tf.Session() as sess:
        # Run session and call the output "result"
        result = sess.run(sigmoid, feed_dict = {x: z})

    ### END CODE HERE ###

    return result

Run:

print ("sigmoid(0) = " + str(sigmoid(0)))
print ("sigmoid(12) = " + str(sigmoid(12)))

Output:

sigmoid(0) = 0.5
sigmoid(12) = 0.999994

1-3 Computing the cost

TensorFlow provides built-in functions for computing a neural network's cost, so there is no need to code up the sum over the m examples of $a^{[2](i)}$ and $y^{(i)}$ for $i = 1 \ldots m$:

$$J = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log a^{[2](i)} + (1 - y^{(i)}) \log\left(1 - a^{[2](i)}\right) \right) \tag{2}$$

Instead, a single line of code suffices. For example, the cross entropy loss is computed with:
tf.nn.sigmoid_cross_entropy_with_logits(logits = ..., labels = ...)

tf.nn.sigmoid_cross_entropy_with_logits takes z (the logits) and y (the labels); it computes a = σ(z) internally and returns the cross entropy cost J:

$$-\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log \sigma(z^{[2](i)}) + (1 - y^{(i)}) \log\left(1 - \sigma(z^{[2](i)})\right) \right) \tag{2}$$

Implementation:

# GRADED FUNCTION: cost

def cost(logits, labels):
    """
    Computes the cost using the sigmoid cross entropy

    Arguments:
    logits -- vector containing z, output of the last linear unit (before the final sigmoid activation)
    labels -- vector of labels y (1 or 0) 

    Note: What we've been calling "z" and "y" in this class are respectively called "logits" and "labels" 
    in the TensorFlow documentation. So logits will feed into z, and labels into y. 

    Returns:
    cost -- runs the session of the cost (formula (2))
    """

    ### START CODE HERE ### 

    # Create the placeholders for "logits" (z) and "labels" (y) (approx. 2 lines)
    z = tf.placeholder(tf.float32, name = "logits")
    y = tf.placeholder(tf.float32, name = "labels")

    # define the cost function
    cost = tf.nn.sigmoid_cross_entropy_with_logits(logits=z, labels=y)

    # Create a session
    sess = tf.Session()    

    # Run the session (approx. 1 line).
    sess.run(tf.global_variables_initializer())  # optional here: there are no variables to initialize
    cost = sess.run(cost, feed_dict = {z:logits, y:labels})

    # Close the session (approx. 1 line). See method 1 above.
    sess.close() # Close the session

    ### END CODE HERE ###

    return cost

Run the test:

logits = sigmoid(np.array([0.2,0.4,0.7,0.9]))
cost = cost(logits, np.array([0,0,1,1]))
print ("cost = " + str(cost))

The test output is as follows:

cost = [ 1.00538719  1.03664076  0.41385433  0.39956617]
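
Formula (2) can be reproduced element-wise in plain numpy as a hypothetical cross-check (note that the test above passes sigmoid outputs in as logits, so the function applies a further sigmoid on top of them):

import numpy as np

z = 1 / (1 + np.exp(-np.array([0.2, 0.4, 0.7, 0.9])))  # the 'logits' fed in above
y = np.array([0, 0, 1, 1])
a = 1 / (1 + np.exp(-z))                                # the sigmoid the cost applies internally
print(-(y * np.log(a) + (1 - y) * np.log(1 - a)))       # matches the TensorFlow output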

1-4 One Hot encodings

Typically, the values in the y vector range from 0 to C-1, where C is the number of classes. For C = 4, y needs to be converted as follows:
(figure omitted: a label vector such as [1 2 3 0 2 1] converted into a matrix of 0s and 1s, one column per example, as in the example below)
This is the so-called "one hot" encoding, because exactly one entry in each column is "hot" (set to 1). In TensorFlow:
tf.one_hot(labels, depth, axis)

Complete "one hot" encoding implementation:

# GRADED FUNCTION: one_hot_matrix

def one_hot_matrix(labels, C):
    """
    Creates a matrix where the i-th row corresponds to the ith class number and the jth column
                     corresponds to the jth training example. So if example j had a label i. Then entry (i,j) 
                     will be 1. 

    Arguments:
    labels -- vector containing the labels 
    C -- number of classes, the depth of the one hot dimension

    Returns: 
    one_hot -- one hot matrix
    """

    ### START CODE HERE ###

    # Create a tf.constant equal to C (depth), name it 'C'. (approx. 1 line)
    C = tf.constant(C, name="C")

    # Use tf.one_hot, be careful with the axis (approx. 1 line)
    one_hot_matrix = tf.one_hot(labels, depth=C, axis=0)
    # choose the axis carefully, or the matrix may come out transposed

    # Create the session (approx. 1 line)
    sess = tf.Session()   

    # Run the session (approx. 1 line)
    one_hot = sess.run(one_hot_matrix)

    # Close the session (approx. 1 line). See method 1 above.
    sess.close()

    ### END CODE HERE ###

    return one_hot

Test code:

labels = np.array([1,2,3,0,2,1])
one_hot = one_hot_matrix(labels, C = 4)
print ("one_hot = " + str(one_hot))

The output is as follows:

one_hot = [[ 0.  0.  0.  1.  0.  0.]
 [ 1.  0.  0.  0.  0.  1.]
 [ 0.  1.  0.  0.  1.  0.]
 [ 0.  0.  1.  0.  0.  0.]]
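
The same conversion can be written in one line of plain numpy (a hypothetical equivalent, useful for checking the orientation of the matrix):

import numpy as np

labels = np.array([1, 2, 3, 0, 2, 1])
print(np.eye(4)[labels].T)  # rows index classes, columns index examples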

1-5 Initializing with zeros or ones

TensorFlow creates tensors of ones and zeros with tf.ones() and tf.zeros().
Both take a shape and return an array of that shape:

# GRADED FUNCTION: ones

def ones(shape):
    """
    Creates an array of ones of dimension shape

    Arguments:
    shape -- shape of the array you want to create

    Returns: 
    ones -- array containing only ones
    """

    ### START CODE HERE ###

    # Create "ones" tensor using tf.ones(...). (approx. 1 line)
    ones = tf.ones(shape)

    # Create the session (approx. 1 line)
    sess = tf.Session()

    # Run the session to compute 'ones' (approx. 1 line)
    ones = sess.run(ones)

    # Close the session (approx. 1 line). See method 1 above.
    sess.close()

    ### END CODE HERE ###
    return ones

Run the test:

print ("ones = " + str(ones([3])))
print ("ones = " + str(ones([3,2])))

Test output:

ones = [ 1.  1.  1.]
ones = [[ 1.  1.]
 [ 1.  1.]
 [ 1.  1.]]

2 Building a neural network with TensorFlow

2-1 The dataset:

The hand-sign image data are as follows:

  • Training set: 1080 images (64 by 64 pixels) of hand signs representing the digits 0 to 5 (180 images per digit)
  • Test set: 120 images (64 by 64 pixels) of hand signs representing the digits 0 to 5 (20 images per digit)

Note that this is a subset of the SIGNS dataset; the full SIGNS dataset contains many more signs.
Below are examples of the original images and the digits they represent:

Figure 1: SIGNS dataset

Load the data:

# Loading the dataset
X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()

Display one of the images:

# Example of a picture
index = 0
plt.imshow(X_train_orig[index])
print ("y = " + str(np.squeeze(Y_train_orig[:, index])))

Output:

y = 5

Next, we flatten and normalize the input images, and one-hot encode the y labels:

# Flatten the training and test images
X_train_flatten = X_train_orig.reshape(X_train_orig.shape[0], -1).T
X_test_flatten = X_test_orig.reshape(X_test_orig.shape[0], -1).T
# Normalize image vectors
X_train = X_train_flatten/255.
X_test = X_test_flatten/255.
# Convert training and test labels to one hot matrices
Y_train = convert_to_one_hot(Y_train_orig, 6)
Y_test = convert_to_one_hot(Y_test_orig, 6)

print ("Y_train_orig size = " + str(Y_train_orig.shape))
print ("number of training examples = " + str(X_train.shape[1]))
print ("number of test examples = " + str(X_test.shape[1]))
print ("X_train shape: " + str(X_train.shape))
print ("Y_train shape: " + str(Y_train.shape))
print ("X_test shape: " + str(X_test.shape))
print ("Y_test shape: " + str(Y_test.shape))

The output is as follows:

Y_train_orig size = (1, 1080)
number of training examples = 1080
number of test examples = 120
X_train shape: (12288, 1080)
Y_train shape: (6, 1080)
X_test shape: (12288, 120)
Y_test shape: (6, 120)
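
convert_to_one_hot is provided by the course's tf_utils module, whose source is not reproduced in these notes; a plausible numpy implementation, consistent with the one_hot_matrix function above and the (6, 1080) shape of Y_train, would be (an assumption, not the verified module code):

import numpy as np

def convert_to_one_hot(Y, C):
    # Y: row vector of labels with shape (1, m); returns a (C, m) one-hot matrix
    return np.eye(C)[Y.reshape(-1)].T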

where 12288 = 64 × 64 × 3.
The model of the neural network is:
LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX. Since this is a multi-class problem, the output layer uses the softmax activation; a sketch of the corresponding forward propagation is given below.
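
As a sketch (assuming the parameters dictionary built by initialize_parameters in section 2-3 below), the architecture can be expressed like this in TF1; note that TensorFlow folds the softmax into the cost function, so forward propagation stops at the last linear output Z3:

import tensorflow as tf

def forward_propagation(X, parameters):
    # LINEAR -> RELU -> LINEAR -> RELU -> LINEAR; the softmax is applied inside the cost
    Z1 = tf.add(tf.matmul(parameters["W1"], X), parameters["b1"])
    A1 = tf.nn.relu(Z1)
    Z2 = tf.add(tf.matmul(parameters["W2"], A1), parameters["b2"])
    A2 = tf.nn.relu(Z2)
    Z3 = tf.add(tf.matmul(parameters["W3"], A2), parameters["b3"])
    return Z3

def compute_cost(Z3, Y):
    # tf.nn.softmax_cross_entropy_with_logits expects shape (number of examples, number of classes),
    # so Z3 and Y are transposed first
    logits = tf.transpose(Z3)
    labels = tf.transpose(Y)
    return tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))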

2-2 Creating placeholders

Create placeholders for X and Y so that the training data can be passed in later.

# GRADED FUNCTION: create_placeholders

def create_placeholders(n_x, n_y):
    """
    Creates the placeholders for the tensorflow session.

    Arguments:
    n_x -- scalar, size of an image vector (num_px * num_px = 64 * 64 * 3 = 12288)
    n_y -- scalar, number of classes (from 0 to 5, so -> 6)

    Returns:
    X -- placeholder for the data input, of shape [n_x, None] and dtype "float"
    Y -- placeholder for the input labels, of shape [n_y, None] and dtype "float"

    Tips:
    - You will use None because it lets us be flexible about the number of examples used for the placeholders.
      In fact, the number of examples during test/train is different.
    """

    ### START CODE HERE ### (approx. 2 lines)
    X = tf.placeholder(tf.float32, shape=(n_x, None), name = "X")
    Y = tf.placeholder(tf.float32, shape=(n_y, None), name = "Y")
    ### END CODE HERE ###

    return X, Y

Run the test:

X, Y = create_placeholders(12288, 6)
print ("X = " + str(X))
print ("Y = " + str(Y))

Test output:

X = Tensor("X_4:0", shape=(12288, ?), dtype=float32)
Y = Tensor("Y_1:0", shape=(6, ?), dtype=float32)

2-3 Parameter initialization

The weight matrices W use Xavier initialization, and the bias vectors b are initialized to zeros:

W1 = tf.get_variable("W1", [25,12288], initializer = tf.contrib.layers.xavier_initializer(seed = 1))
b1 = tf.get_variable("b1", [25,1], initializer = tf.zeros_initializer())

Complete implementation:

# GRADED FUNCTION: initialize_parameters

def initialize_parameters():
    """
    Initializes parameters to build a neural network with tensorflow. The shapes are:
                        W1 : [25, 12288]
                        b1 : [25, 1]
                        W2 : [12, 25]
                        b2 : [12, 1]
                        W3 : [6, 12]
                        b3 : [6, 1]

    Returns:
    parameters -- a dictionary of tensors containing W1, b1, W2, b2, W3, b3
    """

    tf.set_random_seed(1)                   # so that your "random" numbers match ours

    ### START CODE HERE ### (approx. 6 lines of code)
    W1 = tf.get_variable("W1", [25,12288], initializer = tf.contrib.layers.xavier_initializer(seed = 1))
    b1 = tf.get_variable("b1", [25,1], initializer = tf.zeros_initializer())
    W2 = tf.get_variable("W2", [12,25], initializer = tf.contrib.layers.xavier_initializer(seed = 1))
    b2 = tf.get_variable("b2", [12,1], initializer = tf.zeros_initializer())
    W3 = tf.get_variable("W3", [6,12], initializer = tf.contrib.layers.xavier_initializer(seed = 1))
    b3 = tf.get_variable("b3", [6,1], initializer = tf.zeros_initializer())
    ### END CODE HERE ###

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2,
                  "W3": W3,
                  "b3": b3}

    return parameters

Run the test code:

tf.reset_default_graph()
with tf.Session() as sess:
    parameters = initialize_parameters()
    print("W1 = " + str(parameters["W1"]))
    print("b1 = " + str(parameters["b1"]))
    print("W2 = " + str(parameters["W2"]))
    print("b2 = " + str(parameters["b2"]))

The output is as follows:

W1 = <tf.Variable 'W1:0' shape=(25, 12288) dtype=float32_ref>
b1 = <tf.Variable 'b1:0' shape=(25, 1) dtype=float32_ref>
W2 = <tf.Variable 'W2:0' shape=(12, 25) dtype=float32_ref>
b2 = <tf.Variable 'b2:0' shape=(12, 1) dtype=float32_ref>