Handwritten Digit Recognition with TensorFlow on Images Captured from a Webcam

阿新 • Published: 2019-01-11

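I spent an evening getting handwritten digit recognition to work on images captured from a webcam. The code is based on TensorFlow's MNIST example and was written as part of learning TensorFlow.

First, train the network on the MNIST dataset and save the model.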

import numpy as np
import tensorflow as tf 
import tensorflow.examples.tutorials.mnist.input_data as input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True) 
x_input = tf.placeholder(tf.float32, [None, 784])  
y_actual = tf.placeholder(tf.float32, shape=[None, 10]) 


def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],strides=[1, 2, 2, 1], padding='SAME')

# Dropout keep probability; defined at module level so the training loop can feed it.
keep_prob = tf.placeholder(tf.float32)

# input -> conv -> pool -> conv -> pool -> fc -> dropout -> softmax
def network(x):
    x_image = tf.reshape(x, [-1, 28, 28, 1])  # -1 lets TensorFlow infer the batch size
    W_conv1 = weight_variable([5, 5, 1, 32])
    b_conv1 = bias_variable([32])
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)      # conv1
    h_pool1 = max_pool(h_conv1)                                   # max_pool1

    W_conv2 = weight_variable([5, 5, 32, 64])
    b_conv2 = bias_variable([64])
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)      # conv2
    h_pool2 = max_pool(h_conv2)                                   # max_pool2

    W_fc1 = weight_variable([7 * 7 * 64, 1024])
    b_fc1 = bias_variable([1024])
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)    # fc1

    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)                  # dropout

    W_fc2 = weight_variable([1024, 10])
    b_fc2 = bias_variable([10])
    y_predicts = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)  # fc2 output
    return y_predicts

y_predict = network(x_input)

#see http://www.tensorfly.cn/tfdoc/tutorials/mnist_pros.html
cross_entropy = -tf.reduce_sum(y_actual*tf.log(y_predict))
train_step = tf.train.GradientDescentOptimizer(1e-3).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_predict,1), tf.argmax(y_actual,1))    
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float")) 

saver = tf.train.Saver()
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())  # replaces the deprecated tf.initialize_all_variables()
# 20000 steps * batch size 64 / 55000 training images is roughly 23 epochs
for i in range(20000):
    batch = mnist.train.next_batch(64)  # batch_size=64
    if i % 100 == 0:
        # report accuracy on the current batch with dropout disabled
        train_acc = accuracy.eval(feed_dict={x_input: batch[0], y_actual: batch[1], keep_prob: 1.0})
        print('step', i, 'training accuracy', train_acc)
    # run one optimisation step on every iteration, not only on every 100th
    train_step.run(feed_dict={x_input: batch[0], y_actual: batch[1], keep_prob: 0.5})
saver.save(sess, "./model_save.ckpt")  # save model; same path the prediction script restores from
# test accuracy on the mnist.test dataset
test_acc = accuracy.eval(feed_dict={x_input: mnist.test.images, y_actual: mnist.test.labels, keep_prob: 1.0})
print("test accuracy", test_acc)
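
The `7 * 7 * 64` input size of the first fully connected layer follows from the layer shapes: with `padding='SAME'`, a stride-1 convolution keeps the 28x28 spatial size, and each 2x2 max-pool with stride 2 halves it (rounding up). Here is a small sketch of that arithmetic; the helper name `same_out_size` is made up for illustration and is not part of TensorFlow.

```python
import math

def same_out_size(size, stride):
    """Spatial output size of a conv/pool layer with padding='SAME'."""
    return math.ceil(size / stride)

size = 28                          # MNIST images are 28x28
size = same_out_size(size, 1)      # conv1, stride 1 -> 28
size = same_out_size(size, 2)      # max_pool1, stride 2 -> 14
size = same_out_size(size, 1)      # conv2, stride 1 -> 14
size = same_out_size(size, 2)      # max_pool2, stride 2 -> 7
flat_features = size * size * 64   # 64 channels after conv2
print(size, flat_features)         # 7 3136
```

If the input resolution or the number of pooling layers changes, the shape of `W_fc1` has to change with it.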

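For prediction, OpenCV opens the webcam and captures frames, a region of interest (ROI) is marked in each frame, and the ROI image is fed into the CNN, with the trained parameters loaded, to recognize the digit.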
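
One thing to watch when feeding camera pixels to a network trained on MNIST is value range and polarity: `input_data.read_data_sets` delivers pixels scaled to [0, 1] with light digits on a dark background, whereas a webcam ROI is `uint8` in [0, 255] and often shows dark ink on light paper. Below is a minimal NumPy-only sketch of such a preprocessing step. `preprocess_roi` is a hypothetical helper, the nearest-neighbour resize merely stands in for `cv2.resize`, and the inversion assumes dark-on-light writing.

```python
import numpy as np

def preprocess_roi(roi_bgr, out_size=28, invert=True):
    """Turn a BGR uint8 ROI into a (1, out_size*out_size) float32 MNIST-style row."""
    roi = roi_bgr.astype(np.float32)
    # Luma-weighted grayscale; channel order is B, G, R as in OpenCV frames.
    gray = 0.114 * roi[..., 0] + 0.587 * roi[..., 1] + 0.299 * roi[..., 2]
    # Nearest-neighbour resize: pick one source row/column per output pixel.
    h, w = gray.shape
    rows = np.arange(out_size) * h // out_size
    cols = np.arange(out_size) * w // out_size
    small = gray[rows][:, cols]
    # Scale to [0, 1] like the MNIST loader; clip away float rounding error.
    small = np.clip(small / 255.0, 0.0, 1.0)
    if invert:
        small = 1.0 - small            # assumption: dark ink on light paper
    return small.reshape(1, out_size * out_size).astype(np.float32)

# Synthetic 70x70 ROI: white paper with a black vertical stroke.
roi = np.full((70, 70, 3), 255, dtype=np.uint8)
roi[10:60, 30:40] = 0
x = preprocess_roi(roi)
print(x.shape, x.dtype)
```

The resulting row has the `[None, 784]` shape that the training script's `x_input` placeholder expects.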

import numpy as np
import tensorflow as tf
import cv2


def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],strides=[1, 2, 2, 1], padding='SAME')

def network(x):
    x_image = tf.reshape(x, [-1, 28, 28, 1])  # -1 lets TensorFlow infer the batch size
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)  # conv1
    h_pool1 = max_pool(h_conv1)                               # max_pool1

    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)  # conv2
    h_pool2 = max_pool(h_conv2)                               # max_pool2

    h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)  # fc1

    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)              # dropout

    y_predict = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)  # fc2 output
    return y_predict

keep_prob = tf.placeholder("float")
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
sess=tf.InteractiveSession()
saver = tf.train.Saver()
saver.restore(sess, "./model_save.ckpt") #load model file must have ./ with tensorflow1.0

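# Build the inference graph once, outside the capture loop.
x_input = tf.placeholder(tf.float32, [None, 784])
y_predict = network(x_input)

cap = cv2.VideoCapture(1)  # camera index 1; use 0 for the default camera
while True:
    ret, frame = cap.read()
    if not ret:  # stop if no frame could be read
        break
    # copy the ROI before drawing so the red border is not part of the input
    roiImg = frame[200:270, 270:340].copy()
    cv2.rectangle(frame, (270, 200), (340, 270), (0, 0, 255), 2)
    cv2.imshow("capture", frame)
    img = cv2.resize(roiImg, (28, 28))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # scale to [0, 1] like the MNIST training data and flatten to shape (1, 784);
    # MNIST digits are light on dark, so use 1.0 - np_img for dark ink on light paper
    np_img = img.astype(np.float32).reshape(1, 784) / 255.0

    # keep_prob must be 1.0 at prediction time so that dropout is disabled
    predictions = sess.run(y_predict, feed_dict={x_input: np_img, keep_prob: 1.0})

    result = int(np.argmax(predictions[0]))  # index of the largest softmax output
    print('result num:', result)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()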
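
The loop above prints only the index of the largest softmax output. A slightly more informative readout also reports the winning probability, so that low-confidence frames (for example an empty ROI) can be ignored. This is a sketch only: `decode_prediction` and the 0.8 threshold are illustrative assumptions, not part of the original code.

```python
import numpy as np

def decode_prediction(probs, min_conf=0.8):
    """Return (digit, confidence), or (None, confidence) below the threshold."""
    probs = np.asarray(probs, dtype=np.float32).reshape(-1)
    digit = int(np.argmax(probs))
    conf = float(probs[digit])
    if conf < min_conf:
        return None, conf
    return digit, conf

# Example with a made-up softmax row shaped like the network output, (1, 10).
fake_output = np.array([[0.01, 0.01, 0.02, 0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]])
print(decode_prediction(fake_output))
```

A suitable threshold depends on lighting and handwriting, so the value would need tuning on real frames.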

Test images

I tried it on a few arbitrary digits, and recognition was fairly accurate.
