
Deep Learning Notes: The TensorFlow Deep Learning Framework (II)

I. Learning resources:

II. Tutorials:

Contents:
1. MNIST tutorial for machine learning beginners
2. Advanced MNIST tutorial for machine learning experts
3. TensorFlow usage guide (using MNIST as an example)
4. Simple machine learning with tf.contrib.learn

MNIST For ML Beginners:
This tutorial is intended for readers who are new to both machine learning and TensorFlow. If you already know what MNIST is and what softmax (multinomial logistic) regression is, you might prefer the faster-paced tutorial (https://www.tensorflow.org/versions/r0.12/tutorials/mnist/pros/index.html). Be sure to install TensorFlow before starting either tutorial.

When one learns how to program, there's a tradition that the first thing you do is print "Hello World." Just like programming has Hello World, machine learning has MNIST.

MNIST is a simple computer vision dataset. It consists of images of handwritten digits like these:
[Figure: sample images of handwritten digits from MNIST]

It also includes a label for each image, telling us which digit it is. For example, the labels for the above images are 5, 0, 4, and 1.

In this tutorial, we're going to train a model to look at images and predict what digits they are. Our goal isn't to train a really elaborate model that achieves state-of-the-art performance (although we'll give you code to do that later!) but rather to dip a toe into using TensorFlow. As such, we're going to start with a very simple model, called a Softmax Regression.

The actual code for this tutorial is very short, and all the interesting stuff happens in just three lines. However, it is very important to understand the ideas behind it: both how TensorFlow works and the core machine learning concepts. Because of this, we are going to work through the code very carefully.

Tutorial goals:
1. Learn about the MNIST data and softmax regression
2. Build a model that recognizes digits based on every pixel in an image
3. Use TensorFlow to train the model on thousands of examples to recognize digits
4. Check the model's accuracy

from tensorflow.examples.tutorials.mnist import input_data  
mnist = input_data.read_data_sets("MNIST_data/",one_hot=True)

I modified this step so that you only need to copy two files into your project to download the required data.
Data analysis: the MNIST data is split into three parts: mnist.train holds 55,000 training examples, mnist.test holds 10,000 test examples, and mnist.validation holds 5,000 validation examples. This split matters: in machine learning we must keep a separate test set that is never used for training, so that we can evaluate how well the model generalizes to data it has not seen.
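As a quick sanity check (a minimal sketch, assuming the standard input_data helper and a TensorFlow 1.x-style install), you can inspect the shapes of the three splits right after loading:

from tensorflow.examples.tutorials.mnist import input_data

# Download (or reuse) the data and load it with one-hot labels.
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Each split exposes .images and .labels as NumPy arrays.
print(mnist.train.images.shape)       # (55000, 784)
print(mnist.test.images.shape)        # (10000, 784)
print(mnist.validation.images.shape)  # (5000, 784)
print(mnist.train.labels.shape)       # (55000, 10), because one_hot=True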

As mentioned earlier, every MNIST data point has two parts: an image of a handwritten digit and a corresponding label. We'll call the images "x" and the labels "y". Both the training set and the test set contain images and their corresponding labels; for example, the training images are mnist.train.images and the training labels are mnist.train.labels.

Each image is 28 pixels by 28 pixels. We can interpret this as a big array of numbers:
[Figure: a digit image rendered as a 28x28 grid of pixel intensities]

We can flatten this array into a vector of 28x28 = 784 numbers. It doesn't matter how we flatten the array, as long as we're consistent between images. From this perspective, the MNIST images are just a bunch of points in a 784-dimensional vector space, with a very rich structure (warning: computationally intensive visualizations).
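For example (a minimal NumPy sketch, not part of the tutorial code), flattening just reshapes the 28x28 grid into one long vector:

import numpy as np

# A stand-in 28x28 "image"; in MNIST each entry is a pixel intensity in [0, 1].
image = np.random.rand(28, 28)

# Flatten row by row into a 784-dimensional vector. Any fixed ordering works,
# as long as every image is flattened the same way.
vector = image.reshape(784)
print(vector.shape)  # (784,)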

Flattening the data throws away information about the 2D structure of the image. Isn't that bad? Well, the best computer vision methods do exploit this structure, and we will in later tutorials. But the simple method we will be using here, a softmax regression (defined below), won't.

The result is that mnist.train.images is a tensor (an n-dimensional array) with a shape of [55000, 784]. The first dimension is an index into the list of images and the second dimension is the index for each pixel in each image. Each entry in the tensor is a pixel intensity between 0 and 1, for a particular pixel in a particular image.
[Figure: mnist.train.images visualized as a [55000, 784] tensor]
Each image in MNIST has a corresponding label, a number between 0 and 9 representing the digit drawn in the image.

For the purposes of this tutorial, we're going to want our labels as "one-hot vectors". A one-hot vector is a vector which is 0 in most dimensions and 1 in a single dimension. In this case, the nth digit will be represented as a vector which is 1 in the nth dimension (counting from 0). For example, 3 would be [0,0,0,1,0,0,0,0,0,0]. Consequently, mnist.train.labels is a [55000, 10] array of floats.
[Figure: mnist.train.labels visualized as a [55000, 10] tensor]
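A minimal sketch of that encoding (the one_hot helper below is just for illustration, not part of the tutorial code):

import numpy as np

def one_hot(digit, num_classes=10):
    # A 10-dimensional vector that is 1 only at the index of the digit.
    vec = np.zeros(num_classes)
    vec[digit] = 1.0
    return vec

print(one_hot(3))  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]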
We're now ready to actually make our model!

Softmax Regressions:
We know that every image in MNIST is of a handwritten digit between zero and nine. So there are only ten possible things that a given image can be. We want to be able to look at an image and give the probabilities for it being each digit. For example, our model might look at a picture of a nine and be 80% sure it's a nine, but give a 5% chance to it being an eight (because of the top loop) and a bit of probability to all the others, because it isn't 100% sure.

This is a classic case where a softmax regression is a natural, simple model. If you want to assign probabilities to an object being one of several different things, softmax is the thing to do, because softmax gives us a list of values between 0 and 1 that add up to 1. Even later on, when we train more sophisticated models, the final step will be a layer of softmax.

A softmax regression has two steps: first we add up the evidence of our input being in certain classes, and then we convert that evidence into probabilities.

To tally up the evidence that a given image is in a particular class, we do a weighted sum of the pixel intensities. The weight is negative if a pixel having a high intensity is evidence against the image being in that class, and positive if it is evidence in favor.

The following diagram shows the weights one model learned for each of these classes. Red represents negative weights, while blue represents positive weights.
[Figure: learned per-pixel weights for each digit class; red = negative, blue = positive]

We also add some extra evidence called a bias. Basically, we want to be able to say that some things are more likely independent of the input. The result is that the evidence for a class i, given an input x, is:

evidence_i = Σ_j W_{i,j} x_j + b_i
where W_i is the weights and b_i is the bias for class i, and j is an index for summing over the pixels in our input image x. We then convert the evidence tallies into our predicted probabilities y using the "softmax" function:

y = softmax(evidence)

Here softmax is serving as an "activation" or "link" function, shaping the output of our linear function into the form we want; in this case, a probability distribution over 10 cases. You can think of it as converting tallies of evidence into probabilities of our input being in each class. It's defined as:

softmax(x) = normalize(exp(x))

If you expand that equation out, you get:

softmax(x)_i = exp(x_i) / Σ_j exp(x_j)

But it's often more helpful to think of softmax the first way: exponentiating its inputs and then normalizing them. The exponentiation means that one more unit of evidence increases the weight given to any hypothesis multiplicatively. Conversely, having one less unit of evidence means that a hypothesis gets a fraction of its earlier weight. No hypothesis ever has zero or negative weight. Softmax then normalizes these weights, so that they add up to one, forming a valid probability distribution. (To get more intuition about the softmax function, check out the section on it in Michael Nielsen's book, complete with an interactive visualization: http://neuralnetworksanddeeplearning.com/chap3.html#softmax)
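As a quick numerical illustration (a minimal NumPy sketch, not part of the tutorial code), "exponentiate then normalize" looks like this; subtracting the maximum first is a common trick that does not change the result but avoids overflow:

import numpy as np

def softmax(logits):
    # Exponentiate, then normalize so the outputs sum to 1.
    exps = np.exp(logits - np.max(logits))
    return exps / np.sum(exps)

print(softmax(np.array([2.0, 1.0, 0.1])))  # ~[0.659 0.242 0.099]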

You can picture our softmax regression as looking something like the following, although with a lot more xs. For each output, we compute a weighted sum of the xs, add a bias, and then apply softmax.
[Figure: the softmax regression drawn as a graph of weighted sums plus biases feeding into softmax]

If we write that out as equations (for a toy example with three inputs and three classes), we get:

y_1 = softmax(W_{1,1} x_1 + W_{1,2} x_2 + W_{1,3} x_3 + b_1)
y_2 = softmax(W_{2,1} x_1 + W_{2,2} x_2 + W_{2,3} x_3 + b_2)
y_3 = softmax(W_{3,1} x_1 + W_{3,2} x_2 + W_{3,3} x_3 + b_3)
We can "vectorize" this procedure, turning it into a matrix multiplication and vector addition. This is helpful for computational efficiency. (It's also a useful way to think.)
[Figure: the same equations rewritten as one matrix multiplication plus a vector addition]
More compactly, we can just write:

y = softmax(Wx + b)
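In NumPy terms (a toy sketch with made-up values, not tutorial code), that compact form is one matrix multiply plus a bias. Note that here W has shape [10, 784], matching the equations above; the TensorFlow code below stores the transposed [784, 10] matrix and computes xW instead, which is the flip discussed later:

import numpy as np

def softmax(z):
    exps = np.exp(z - np.max(z))
    return exps / exps.sum()

W = np.random.randn(10, 784) * 0.01  # one row of weights per digit class
b = np.zeros(10)                     # one bias per class
x = np.random.rand(784)              # a single flattened image
y = softmax(W @ x + b)
print(y.shape, y.sum())              # (10,) and ~1.0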

Implementing the Regression:
To do efficient numerical computing in Python, we typically use libraries like NumPy that do expensive operations such as matrix multiplication outside Python, using highly efficient code implemented in another language. Unfortunately, there can still be a lot of overhead from switching back to Python for every operation. This overhead is especially bad if you want to run computations on GPUs or in a distributed manner, where there can be a high cost to transferring data.

TensorFlow also does its heavy lifting outside Python, but it takes things a step further to avoid this overhead. Instead of running a single expensive operation independently from Python, TensorFlow lets us describe a graph of interacting operations that run entirely outside Python. (Approaches like this can be seen in a few machine learning libraries.)

To use TensorFlow, first we need to import it:

import tensorflow as tf

We describe these interacting operations by manipulating symbolic variables. Let's create one:

x = tf.placeholder("float",[None,784])

x isn't a specific value. It's a placeholder, a value that we'll input when we ask TensorFlow to run a computation. We want to be able to input any number of MNIST images, each flattened into a 784-dimensional vector. We represent this as a 2-D tensor of floating-point numbers, with a shape of [None, 784]. (Here None means that a dimension can be of any length.)
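The None dimension just means the batch size is decided at run time. A small sketch (assuming the TensorFlow 1.x session API) showing the same graph accepting different batch sizes:

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])
doubled = x * 2.0

with tf.Session() as sess:
    # The same placeholder accepts a batch of 3 images or of 100 images.
    print(sess.run(doubled, feed_dict={x: np.zeros((3, 784))}).shape)    # (3, 784)
    print(sess.run(doubled, feed_dict={x: np.zeros((100, 784))}).shape)  # (100, 784)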

We also need the weights and biases for our model. We could imagine treating these like additional inputs, but TensorFlow has an even better way to handle them: Variable. A Variable is a modifiable tensor that lives in TensorFlow's graph of interacting operations. It can be used and even modified by the computation. For machine learning applications, one generally has the model parameters be Variables.

W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))

We create these Variables by giving tf.Variable the initial value of the Variable: in this case, we initialize both W and b as tensors full of zeros. Since we are going to learn W and b, it doesn't matter very much what they initially are.

Notice that W has a shape of [784, 10] because we want to multiply the 784-dimensional image vectors by it to produce 10-dimensional vectors of evidence for the different classes. b has a shape of [10] so we can add it to the output.

We can now implement our model. It only takes one line to define it!

y = tf.nn.softmax(tf.matmul(x,W)+b)

First, we multiply x by W with the expression tf.matmul(x, W). This is flipped from when we multiplied them in our equation, where we had Wx, as a small trick to deal with x being a 2-D tensor with multiple inputs. We then add b, and finally apply tf.nn.softmax.

That's it. It only took us one line to define our model, after a couple of short lines of setup. That isn't because TensorFlow is designed to make a softmax regression particularly easy: it's just a very flexible way to describe many kinds of numerical computations, from machine learning models to physics simulations. And once defined, our model can be run on different devices: your computer's CPU, GPUs, and even phones!

Training:
In order to train our model, we need to define what it means for the model to be good. Well, actually, in machine learning we typically define what it means for a model to be bad. We call this the cost, or the loss, and it represents how far off our model is from our desired outcome. We try to minimize that error, and the smaller the error margin, the better our model is.

One very common, very nice function to determine the loss of a model is called "cross-entropy." Cross-entropy arises from thinking about information-compressing codes in information theory, but it winds up being an important idea in lots of areas, from gambling to machine learning. It's defined as:

H_{y'}(y) = -Σ_i y'_i log(y_i)

Here y is our predicted probability distribution, and y' is the true distribution (the one-hot vector with the digit labels). In some rough sense, the cross-entropy measures how inefficient our predictions are for describing the truth. Going into more detail about cross-entropy is beyond the scope of this tutorial, but it's well worth understanding (http://colah.github.io/posts/2015-09-Visual-Information/).
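As a rough numerical illustration (a NumPy sketch, not tutorial code), with a one-hot label only the probability assigned to the true class contributes, so the cross-entropy is just its negative log:

import numpy as np

def cross_entropy(y_pred, y_true):
    # y_true is one-hot, so only the true-class term is non-zero.
    return -np.sum(y_true * np.log(y_pred))

y_true = np.array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0])    # the label "3"
y_pred = np.array([0.01] * 3 + [0.91] + [0.01] * 6)  # a confident, correct prediction
print(cross_entropy(y_pred, y_true))                 # ~0.094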

To implement cross-entropy we first need to add a new placeholder to input the correct answers:

y_ = tf.placeholder(tf.float32,[None,10])

Then we can implement the cross-entropy function, -Σ_i y'_i log(y_i):

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_*tf.log(y),reduction_indices=[1]))

First, tf.log computes the logarithm of each element of y. Next, we multiply each element of y_ with the corresponding element of tf.log(y). Then tf.reduce_sum adds the elements in the second dimension of y, due to the reduction_indices=[1] parameter. Finally, tf.reduce_mean computes the mean over all the examples in the batch.

# My notes on how reduce_sum behaves for each value of reduction_indices
# (reduction_indices is the older name for the axis argument)
import tensorflow as tf
x = [[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]]
sess = tf.Session()
print("0:")
print(sess.run(tf.reduce_sum(x,reduction_indices=0)))
print("1:")
print(sess.run(tf.reduce_sum(x,reduction_indices=1)))
print("2:")
print(sess.run(tf.reduce_sum(x,reduction_indices=2)))
sess.close()
=>
#0:
#[[ 8 10 12]
# [14 16 18]]
#1:
#[[ 5  7  9]
# [17 19 21]]
#2:
#[[ 6 15]
# [24 33]]
#

My understanding of the axes, using x = [[[1,2,3],[4,5,6]], [[7,8,9],[10,11,12]]] from the snippet above:

reduction_indices=0 sums over the outermost axis, pairing up the two 2x3 blocks element by element:
1+7=8, 2+8=10, 3+9=12
4+10=14, 5+11=16, 6+12=18

reduction_indices=1 sums over the middle axis, adding the two rows inside each block:
1+4=5, 2+5=7, 3+6=9
7+10=17, 8+11=19, 9+12=21

reduction_indices=2 sums over the innermost axis, adding the numbers inside each row:
1+2+3=6, 4+5+6=15
7+8+9=24, 10+11+12=33

(Note that in the source code, we don't use this formulation, because it is numerically unstable. Instead, we apply tf.nn.softmax_cross_entropy_with_logits on the unnormalized logits (e.g., we call softmax_cross_entropy_with_logits on tf.matmul(x, W) + b), because this more numerically stable function internally computes the softmax activation. In your code, consider using tf.nn.(sparse_)softmax_cross_entropy_with_logits instead.)
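Concretely, the more stable formulation keeps the raw logits and lets TensorFlow apply the softmax internally (a sketch assuming the TensorFlow 1.x API, reusing the x, W, b and y_ defined above):

# Keep the unnormalized logits; do not apply softmax or log yourself.
logits = tf.matmul(x, W) + b
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))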

Now that we know what we want our model to do, it's very easy to have TensorFlow train it to do so. Because TensorFlow knows the entire graph of your computations, it can automatically use the backpropagation algorithm (http://colah.github.io/posts/2015-08-Backprop/) to efficiently determine how your variables affect the loss you ask it to minimize. Then it can apply your choice of optimization algorithm to modify the variables and reduce the loss.

train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

In this case, we ask TensorFlow to minimize cross_entropy using the gradient descent algorithm (https://en.wikipedia.org/wiki/Gradient_descent) with a learning rate of 0.5. Gradient descent is a simple procedure, where TensorFlow simply shifts each variable a little bit in the direction that reduces the cost. But TensorFlow also provides many other optimization algorithms (https://www.tensorflow.org/versions/r0.12/api_docs/python/train.html#optimizers): using one is as simple as tweaking one line.
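For example (a hypothetical one-line swap, assuming the TF 1.x tf.train optimizers; the learning rate would need retuning), switching to Adam only changes the optimizer line:

# Gradient descent, as used in this tutorial:
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
# A drop-in alternative:
# train_step = tf.train.AdamOptimizer(1e-3).minimize(cross_entropy)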

What TensorFlow actually does here, behind the scenes, is add new operations to your graph which implement backpropagation and gradient descent. Then it gives you back a single operation which, when run, does a step of gradient descent training, slightly tweaking your variables to reduce the loss.

Now we have our model set up to train. One last thing before we launch it: we have to create an operation to initialize the variables we created. Note that this defines the operation but does not run it yet:

init = tf.initialize_all_variables()  # in newer TensorFlow versions: tf.global_variables_initializer()

We can now launch the model in a Session, and run the operation that initializes the variables:

sess = tf.Session()
sess.run(init)

Let's train. We'll run the training step 1000 times!

for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

Each step of the loop, we get a "batch" of one hundred random data points from our training set. We run train_step, feeding in the batch data to replace the placeholders.

Using small batches of random data is called stochastic training; in this case, stochastic gradient descent. Ideally, we'd like to use all our data for every step of training because that would give us a better sense of what we should be doing, but that's expensive. So, instead, we use a different subset every time. Doing this is cheap and has much of the same benefit.

Evaluating Our Model:

Well, first let's figure out where we predicted the correct label. tf.argmax is an extremely useful function which gives you the index of the highest entry in a tensor along some axis. For example, tf.argmax(y,1) is the label our model thinks is most likely for each input, while tf.argmax(y_,1) is the correct label. We can use tf.equal to check if our prediction matches the truth.

correct_prediction = tf.equal(tf.argmax(y,1),tf.argmax(y_,1))

That gives us a list of booleans. To determine what fraction are correct, we cast to floating point numbers and then take the mean. For example, [True, False, True, True] would become [1,0,1,1], which averages to 0.75.
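The same cast-and-average step in plain NumPy (a tiny sketch for illustration only):

import numpy as np

preds = np.array([True, False, True, True])
print(preds.astype(np.float32).mean())  # 0.75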

accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))

Finally, we ask for our accuracy on our test data.

print(sess.run(accuracy,feed_dict={x:mnist.test.images,y_:mnist.test.labels}))

This should be about 92%.

Is that good? Well, not really. In fact, it's pretty bad. This is because we're using a very simple model. With some small changes, we can get to 97%. The best models can get to over 99.7% accuracy! (For more information, have a look at this list of results: http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html)

What matters is that we learned from this model. Still, if you're feeling a bit down about these results, check out the next tutorial, where we do a lot better and learn how to build more sophisticated models using TensorFlow!
The complete code is shown below:

#coding=UTF-8
import tensorflow as tf
import data.input_data as input_data  # local copy of the MNIST input_data helper (see the note above)

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Model: a single softmax regression layer.
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
logits = tf.matmul(x, W) + b
y = tf.nn.softmax(logits)

# Loss: numerically stable cross-entropy computed from the raw logits.
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

sess = tf.InteractiveSession()

# Train
tf.global_variables_initializer().run()
for _ in range(1000):
    batch_x, batch_y = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_x, y_: batch_y})

# Evaluate on the test set
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))