Hierarchical Attention Network for Document Classification: TensorFlow Implementation

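Last week we went through the model architecture of the paper Hierarchical Attention Network for Document Classification. This week I found time to implement it in TensorFlow, so this post walks through the code for a HAN text-classification model.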

Dataset

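First, the dataset. The paper uses several fairly large datasets, including IMDB movie ratings and Yelp restaurant reviews. I settled on Yelp 2013, and at first I was completely lost: every download link in the related papers and materials points to the official Yelp site, but I couldn't find the data there, and searching turned up nothing either. I finally found a copy on GitHub. (I'm not padding the word count here, just grumbling about how hard the data was to find.) The link: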

https://github.com/rekiksab/Yelp/tree/master/yelp_challenge/yelp_phoenix_academic_dataset
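The repository seems to contain more than one dataset: besides the reviews there are user, business, and a few others, which aren't needed here. Each line of the review file is one review stored as JSON, with the fields votes, user_id, review_id, stars, date, text, type, and business_id. For this task only the stars rating and the text of the review are needed, so I extract those and save them as the dataset. A sample record is shown first; the preprocessing code comes after it.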

{"votes": {"funny": 0, "useful": 5, "cool": 2}, "user_id": "rLtl8ZkDX5vH5nAx9C3q5Q", "review_id": "fWKvX83p0-ka4JS3dc6E5A", "stars": 5, "date": "2011-01-26", "text": "My wife took me here on my birthday for breakfast and it was excellent. The weather was perfect which made sitting outside overlooking their grounds an absolute pleasure. Our waitress was excellent and our food arrived quickly on the semi-busy Saturday morning. It looked like the place fills up pretty quickly so the earlier you get here the better.\n\nDo yourself a favor and get their Bloody Mary. It was phenomenal and simply the best I've ever had. I'm pretty sure they only use ingredients from their garden and blend them fresh when you order it. It was amazing.\n\nWhile EVERYTHING on the menu looks excellent, I had the white truffle scrambled eggs vegetable skillet and it was tasty and delicious. It came with 2 pieces of their griddled bread with was amazing and it absolutely made the meal complete. It was the best \"toast\" I've ever had.\n\nAnyway, I can't wait to go back!", "type": "review", "business_id": "9yKzy9PApeiPPOUJEtnvkg"}
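As a small illustration of reading one such line, the sketch below pulls out only the two fields this task needs. The record is a shortened, made-up stand-in for a real dataset line, and `extract_label_and_text` is a hypothetical helper name, not something from the original code.

```python
import json

# A shortened, made-up record in the same shape as the dataset lines (assumption).
sample_line = ('{"votes": {"funny": 0, "useful": 5, "cool": 2}, "stars": 5, '
               '"text": "Great breakfast. Try the Bloody Mary.", "type": "review"}')

def extract_label_and_text(line):
    """Return (stars, text) from one JSON-encoded review line."""
    review = json.loads(line)
    return int(review["stars"]), review["text"]

stars, text = extract_label_and_text(sample_line)
print(stars, text)
```

All other fields (votes, user_id, and so on) are simply ignored.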

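For preprocessing I simplified things and converted every review into a 30*30 matrix. This isn't strictly necessary: it would be enough to truncate anything longer than 30 and leave anything shorter than 30 unpadded, but then each batch needs its own maximum length and the real size of every sample has to be tracked. I haven't fully figured that part out yet; I'll add it when I have time. This will do for now.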
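The per-batch alternative mentioned above could look roughly like the following NumPy sketch: truncate at 30, then pad only up to the longest document and sentence in the current batch, and return the real sizes alongside. `pad_batch` and its return values are hypothetical names for illustration; this is not part of the original code.

```python
import numpy as np

def pad_batch(docs, max_sents=30, max_words=30):
    """Zero-pad a batch of ragged documents (lists of lists of word ids).

    Documents are truncated to max_sents sentences and max_words words per
    sentence, then padded only up to the largest sizes found in this batch.
    Returns the padded array plus the real sentence and word counts.
    """
    docs = [[sent[:max_words] for sent in doc[:max_sents]] for doc in docs]
    n_sents = max((len(doc) for doc in docs), default=1)
    n_words = max((len(sent) for doc in docs for sent in doc), default=1)

    batch = np.zeros((len(docs), n_sents, n_words), dtype=np.int32)
    sent_counts = np.zeros(len(docs), dtype=np.int32)
    word_counts = np.zeros((len(docs), n_sents), dtype=np.int32)
    for d, doc in enumerate(docs):
        sent_counts[d] = len(doc)
        for s, sent in enumerate(doc):
            batch[d, s, :len(sent)] = sent
            word_counts[d, s] = len(sent)
    return batch, sent_counts, word_counts

# Two toy documents: one with two sentences, one with a single sentence.
docs = [[[4, 7, 9], [3]], [[5, 2]]]
batch, sent_counts, word_counts = pad_batch(docs)
print(batch.shape, sent_counts, word_counts)
```

The returned counts are what a dynamic RNN would take as its sequence lengths, instead of inferring them from the padded input.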

#coding=utf-8
import json
import pickle
import nltk
from nltk.tokenize import WordPunctTokenizer
from collections import defaultdict

# Sentence and word tokenizers from nltk
sent_tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
word_tokenizer = WordPunctTokenizer()

# Count how often each word occurs
word_freq = defaultdict(int)

# Read the dataset, tokenize each review, and count word occurrences in word_freq
with open('yelp_academic_dataset_review.json', 'rb') as f:
    for line in f:
        review = json.loads(line)
        words = word_tokenizer.tokenize(review['text'])
        for word in words:
            word_freq[word] += 1

    print("load finished")

# Save the word-frequency table
with open('word_freq.pickle', 'wb') as g:
    pickle.dump(word_freq, g)
    print(len(word_freq))  # 159654
    print("word_freq save finished")

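num_classes = 5
# Sort words by frequency (descending) and print the 10 most and 10 least frequent for inspection
sort_words = list(sorted(word_freq.items(), key=lambda x:-x[1]))
print(sort_words[:10], sort_words[-10:])

# Build the vocabulary; words occurring 5 times or fewer are dropped and treated as UNKNOWN.
# Index 0 is shared by the unknown token and the zero padding.
vocab = {}
i = 1
vocab['UNKNOW_TOKEN'] = 0
for word, freq in word_freq.items():
    if freq > 5:
        vocab[word] = i
        i += 1
print(i)
UNKNOWN = 0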

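data_x = []
data_y = []
max_sent_in_doc = 30
max_word_in_sent = 30

# Convert every review into a 30*30 index matrix: 30 sentences per document, 30 words per sentence.
# Shorter ones are zero-padded, longer ones are truncated; the result is saved as the final dataset file.
with open('yelp_academic_dataset_review.json', 'rb') as f:
    for line in f:
        doc = []
        review = json.loads(line)
        sents = sent_tokenizer.tokenize(review['text'])
        for sent in sents[:max_sent_in_doc]:
            words = word_tokenizer.tokenize(sent)[:max_word_in_sent]
            word_to_index = [vocab.get(word, UNKNOWN) for word in words]
            # pad the sentence with zeros up to max_word_in_sent
            word_to_index += [0] * (max_word_in_sent - len(word_to_index))
            doc.append(word_to_index)
        # pad the document with all-zero sentences up to max_sent_in_doc
        doc += [[0] * max_word_in_sent for _ in range(max_sent_in_doc - len(doc))]

        # one-hot label: stars 1..5 -> index 0..4
        label = int(review['stars'])
        labels = [0] * num_classes
        labels[label-1] = 1
        data_y.append(labels)
        data_x.append(doc)
    with open('yelp_data', 'wb') as out:
        pickle.dump((data_x, data_y), out)
    print(len(data_x))  # 229907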
    # length = len(data_x)
    # train_x, dev_x = data_x[:int(length*0.9)], data_x[int(length*0.9)+1 :]
    # train_y, dev_y = data_y[:int(length*0.9)], data_y[int(length*0.9)+1 :]

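After preprocessing we have 229,907 documents in total, each a 30*30 matrix of word indices, so at read time the embedding matrix E turns the indices directly into word vectors, which saves a lot of trouble. We still need a function that loads the saved data into memory; it is just a pickle load followed by a 9:1 split into training and validation sets. I actually think 9:1 leaves too many validation samples (over 20,000), but the paper also holds out 10% for validation, so I'll follow that setting and not worry about this detail. The code: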

import pickle

def read_dataset():
    # Load the preprocessed data and split it 9:1 into training and validation sets
    with open('yelp_data', 'rb') as f:
        data_x, data_y = pickle.load(f)
    split = int(len(data_x) * 0.9)
    # slice at the same index on both sides so that no sample is skipped
    train_x, dev_x = data_x[:split], data_x[split:]
    train_y, dev_y = data_y[:split], data_y[split:]
    return train_x, train_y, dev_x, dev_y

With this function we can load the whole dataset in one call at training time. Next, the implementation of the model architecture.
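To make the earlier remark about the embedding matrix E concrete, here is a tiny NumPy sketch of what an embedding lookup does with an index matrix. The sizes and the random E are toy assumptions; in the model itself the lookup is done by tf.nn.embedding_lookup on a trainable variable.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embedding_size = 10, 4          # toy sizes (assumption)
E = rng.normal(size=(vocab_size, embedding_size))

# One toy document: 2 sentences x 3 words, where 0 is the padding/unknown index.
doc = np.array([[1, 5, 0],
                [2, 0, 0]])

# Fancy indexing replaces every word index by its row of E.
embedded = E[doc]
print(embedded.shape)
```

Each word index is replaced by a row of E, so an extra trailing dimension of size embedding_size appears.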

Model Implementation

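Following the architecture described in the previous post, and with the two figures below, it is easy to see that the model has two levels, sentence level and document level, each of which contains an encoder and an attention component.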
[Two figures: HAN model architecture]
The code is shown below. The bidirectional GRU is built with tf.nn.bidirectional_dynamic_rnn(), and the attention layer is simply an MLP followed by a softmax, which is also easy to follow.

#coding=utf8

import tensorflow as tf
from tensorflow.contrib import rnn
from tensorflow.contrib import layers

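def length(sequences):
    # Return the real (unpadded) length of every sequence in the batch.
    # A time step counts as used if any of its features is non-zero, so this
    # only detects padding whose feature vector is all zeros.
    used = tf.sign(tf.reduce_max(tf.abs(sequences), reduction_indices=2))
    seq_len = tf.reduce_sum(used, reduction_indices=1)
    return tf.cast(seq_len, tf.int32)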

class HAN():

    def __init__(self, vocab_size, num_classes, embedding_size=200, hidden_size=50):

        self.vocab_size = vocab_size
        self.num_classes = num_classes
        self.embedding_size = embedding_size
        self.hidden_size = hidden_size

        with tf.name_scope('placeholder'):
            self.max_sentence_num = tf.placeholder(tf.int32, name='max_sentence_num')
            self.max_sentence_length = tf.placeholder(tf.int32, name='max_sentence_length')
            self.batch_size = tf.placeholder(tf.int32, name='batch_size')
            # x has shape [batch_size, number of sentences, sentence length (number of words)]; these differ per sample, so they are left as None
            # y has shape [batch_size, num_classes]
            self.input_x = tf.placeholder(tf.int32, [None, None, None], name='input_x')
            self.input_y = tf.placeholder(tf.float32, [None, num_classes], name='input_y')

        # Build the model
        word_embedded = self.word2vec()
        sent_vec = self.sent2vec(word_embedded)
        doc_vec = self.doc2vec(sent_vec)
        out = self.classifer(doc_vec)

        self.out = out


    def word2vec(self):
        # Embedding layer
        with tf.name_scope("embedding"):
            embedding_mat = tf.Variable(tf.truncated_normal((self.vocab_size, self.embedding_size)))
            # shape: [batch_size, sent_in_doc, word_in_sent, embedding_size]
            word_embedded = tf.nn.embedding_lookup(embedding_mat, self.input_x)
        return word_embedded

    def sent2vec(self, word_embedded):
        with tf.name_scope("sent2vec"):
            # The GRU input tensor is [batch_size, max_time, ...]. When building sentence vectors, max_time should be
            # the sentence length, so batch_size * sent_in_doc is treated as the batch size. Each GRU step then
            # processes the embedding of one word, and attention finally merges all word encodings of a sentence
            # into one sentence vector.

            # shape: [batch_size*sent_in_doc, word_in_sent, embedding_size]
            word_embedded = tf.reshape(word_embedded, [-1, self.max_sentence_length, self.embedding_size])
            # shape: [batch_size*sent_in_doc, word_in_sent, hidden_size*2]
            word_encoded = self.BidirectionalGRUEncoder(word_embedded, name='word_encoder')
            # shape: [batch_size*sent_in_doc, hidden_size*2]
            sent_vec = self.AttentionLayer(word_encoded, name='word_attention')
            return sent_vec

    def doc2vec(self, sent_vec):
        # Same idea as sent2vec: build one document vector from the vectors of all sentences in the document
        with tf.name_scope("doc2vec"):
            sent_vec = tf.reshape(sent_vec, [-1, self.max_sentence_num, self.hidden_size*2])
            # shape: [batch_size, sent_in_doc, hidden_size*2]
            doc_encoded = self.BidirectionalGRUEncoder(sent_vec, name='sent_encoder')
            # shape: [batch_size, hidden_size*2]
            doc_vec = self.AttentionLayer(doc_encoded, name='sent_attention')
            return doc_vec

    def classifer(self, doc_vec):
        # Final output layer: a fully connected layer
        with tf.name_scope('doc_classification'):
            out = layers.fully_connected(inputs=doc_vec, num_outputs=self.num_classes, activation_fn=None)
            return out

    def BidirectionalGRUEncoder(self, inputs, name):
        # Bidirectional GRU encoder: encodes every word of a sentence (or every sentence vector of a document)
        # into a 2*hidden_size output vector; the attention layer then takes a weighted sum of these outputs
        # to get the final sentence/document vector.
        # inputs has shape [batch_size, max_time, input_size]
        with tf.variable_scope(name):
            GRU_cell_fw = rnn.GRUCell(self.hidden_size)
            GRU_cell_bw = rnn.GRUCell(self.hidden_size)
            # fw_outputs and bw_outputs both have shape [batch_size, max_time, hidden_size]
            ((fw_outputs, bw_outputs), (_, _)) = tf.nn.bidirectional_dynamic_rnn(cell_fw=GRU_cell_fw,
                                                                                 cell_bw=GRU_cell_bw,
                                                                                 inputs=inputs,
                                                                                 sequence_length=length(inputs),
                                                                                 dtype=tf.float32)
            # outputs has shape [batch_size, max_time, hidden_size*2]
            outputs = tf.concat((fw_outputs, bw_outputs), 2)
            return outputs

    def AttentionLayer(self, inputs, name):
        # inputs is the GRU output, shape [batch_size, max_time, encoder_size (hidden_size * 2)]
        with tf.variable_scope(name):
            # u_context is the context vector that scores how important each word/sentence is to its sentence/document.
            # Because the GRU is bidirectional, its length is 2*hidden_size
            u_context = tf.Variable(tf.truncated_normal([self.hidden_size * 2]), name='u_context')
            # A fully connected layer maps the GRU outputs to a hidden representation h of shape [batch_size, max_time, hidden_size * 2]
            h = layers.fully_connected(inputs, self.hidden_size * 2, activation_fn=tf.nn.tanh)
            # shape: [batch_size, max_time, 1]
            alpha = tf.nn.softmax(tf.reduce_sum(tf.multiply(h, u_context), axis=2, keep_dims=True), dim=1)
            # before reduce_sum the shape is [batch_size, max_time, hidden_size*2]; after it is [batch_size, hidden_size*2]
            atten_output = tf.reduce_sum(tf.multiply(inputs, alpha), axis=1)
            return atten_output

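That covers the main model architecture. The idea is quite simple; the main purpose was to get familiar with how some of these operations are used. Next comes training.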
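To spell out what the attention layer above computes, here is a minimal NumPy sketch of the same "MLP + softmax" pooling: a tanh projection, a dot product with a context vector, a softmax over time, and a weighted sum of the encoder outputs. The weights are random toy values and `attention_pool` is a hypothetical name; in the model these parameters are trained TensorFlow variables.

```python
import numpy as np

def attention_pool(h, W, b, u_context):
    """h: (batch, steps, dim) encoder outputs.

    Returns the pooled vectors (batch, dim) and attention weights (batch, steps).
    """
    u = np.tanh(h @ W + b)                               # hidden representation, (batch, steps, dim)
    scores = u @ u_context                               # similarity to the context vector, (batch, steps)
    scores = scores - scores.max(axis=1, keepdims=True)  # shift for numerical stability
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)     # softmax over the time axis
    pooled = (h * alpha[..., None]).sum(axis=1)          # weighted sum of the encoder outputs
    return pooled, alpha

rng = np.random.default_rng(0)
batch, steps, dim = 2, 5, 6          # toy sizes; dim plays the role of 2*hidden_size
h = rng.normal(size=(batch, steps, dim))
W = rng.normal(size=(dim, dim))
b = np.zeros(dim)
u_context = rng.normal(size=dim)

pooled, alpha = attention_pool(h, W, b, u_context)
print(pooled.shape, alpha.sum(axis=1))
```

The same function serves both levels: over words to get a sentence vector, and over sentence vectors to get a document vector.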

Model Training

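I originally planned to read the data with TFRecords, which I covered in an earlier post, but in practice I wasn't familiar enough with it: several attempts each had small errors. Even though I had understood other people's code, writing it myself was still harder than expected, so I need to study it more. In the end I went back to the old approach of feeding the data batch by batch, which is at least simple and easy to understand.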

Most of this part is repetitive code, so I won't go into detail; if something is unclear, see the explanation of the training code in my previous posts.

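One point worth stressing is gradient clipping. RNN models often run into exploding and vanishing gradients during training, so gradient clipping is commonly used to keep overly large gradients from breaking the update. Apart from that, the code basically follows dennizy's CNN code.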
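As a rough illustration of clipping by global norm, the NumPy sketch below rescales all gradients by one shared factor whenever their combined L2 norm exceeds the threshold. It follows the formula documented for tf.clip_by_global_norm (each gradient times clip_norm / max(global_norm, clip_norm)); the toy gradients are made up.

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    """Scale all gradients by one factor so their joint L2 norm is at most clip_norm."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = clip_norm / max(global_norm, clip_norm)   # equals 1.0 when no clipping is needed
    return [g * scale for g in grads], global_norm

# Toy gradients whose joint norm is sqrt(3^2 + 4^2 + 12^2) = 13.
grads = [np.array([3.0, 4.0]), np.array([12.0])]
clipped, norm = clip_by_global_norm(grads, clip_norm=5.0)
clipped_norm = np.sqrt(sum(np.sum(g ** 2) for g in clipped))
print(norm, clipped_norm)
```

Because every gradient is multiplied by the same factor, the direction of the overall update is preserved and only its size is limited.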

#coding=utf-8
import tensorflow as tf
import model
import time
import os
from load_data import read_dataset, batch_iter


# Data loading params
tf.flags.DEFINE_string("data_dir", "data/data.dat", "data directory")
tf.flags.DEFINE_integer("vocab_size", 46960, "vocabulary size")
tf.flags.DEFINE_integer("num_classes", 5, "number of classes")
tf.flags.DEFINE_integer("embedding_size", 200, "Dimensionality of word embedding (default: 200)")
tf.flags.DEFINE_integer("hidden_size", 50, "Dimensionality of GRU hidden layer (default: 50)")
tf.flags.DEFINE_integer("batch_size", 32, "Batch Size (default: 32)")
tf.flags.DEFINE_integer("num_epochs", 10, "Number of training epochs (default: 10)")
tf.flags.DEFINE_integer("checkpoint_every", 100, "Save model after this many steps (default: 100)")
tf.flags.DEFINE_integer("num_checkpoints", 5, "Number of checkpoints to store (default: 5)")
tf.flags.DEFINE_integer("evaluate_every", 100, "evaluate every this many batches")
tf.flags.DEFINE_float("learning_rate", 0.01, "learning rate")
tf.flags.DEFINE_float("grad_clip", 5.0, "gradient clipping threshold to prevent exploding gradients")

FLAGS = tf.flags.FLAGS

train_x, train_y, dev_x, dev_y = read_dataset()
print("data load finished")

with tf.Session() as sess:
    han = model.HAN(vocab_size=FLAGS.vocab_size,
                    num_classes=FLAGS.num_classes,
                    embedding_size=FLAGS.embedding_size,
                    hidden_size=FLAGS.hidden_size)

    with tf.name_scope('loss'):
        loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=han.input_y,
                                                                      logits=han.out,
                                                                      name='loss'))
    with tf.name_scope('accuracy'):
        predict = tf.argmax(han.out, axis=1, name='predict')
        label = tf.argmax(han.input_y, axis=1, name='label')
        acc = tf.reduce_mean(tf.cast(tf.equal(predict, label), tf.float32))

    timestamp = str(int(time.time()))
    out_dir = os.path.abspath(os.path.join(os.path.curdir, "runs", timestamp))
    print("Writing to {}\n".format(out_dir))

    global_step = tf.Variable(0, trainable=False)
    optimizer = tf.train.AdamOptimizer(FLAGS.learning_rate)
    # Gradient clipping, common for RNNs, keeps the gradients from growing too large
    tvars = tf.trainable_variables()
    grads, _ = tf.clip_by_global_norm(tf.gradients(loss, tvars), FLAGS.grad_clip)
    grads_and_vars = tuple(zip(grads, tvars))
    train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)

    # Keep track of gradient values and sparsity (optional)
    grad_summaries = []
    for g, v in grads_and_vars:
        if g is not None:
            grad_hist_summary = tf.summary.histogram("{}/grad/hist".format(v.name), g)
            grad_summaries.append(grad_hist_summary)

    grad_summaries_merged = tf.summary.merge(grad_summaries)

    loss_summary = tf.summary.scalar('loss', loss)
    acc_summary = tf.summary.scalar('accuracy', acc)


    train_summary_op = tf.summary.merge([loss_summary, acc_summary, grad_summaries_merged])
    train_summary_dir = os.path.join(out_dir, "summaries", "train")
    train_summary_writer = tf.summary.FileWriter(train_summary_dir, sess.graph)

    dev_summary_op = tf.summary.merge([loss_summary, acc_summary])
    dev_summary_dir = os.path.join(out_dir, "summaries", "dev")
    dev_summary_writer = tf.summary.FileWriter(dev_summary_dir, sess.graph)

    checkpoint_dir = os.path.abspath(os.path.join(out_dir, "checkpoints"))
    checkpoint_prefix = os.path.join(checkpoint_dir, "model")
    if not os.path.exists(checkpoint_dir):
        os.makedirs(checkpoint_dir)
    saver = tf.train.Saver(tf.global_variables(), max_to_keep=FLAGS.num_checkpoints)

    sess.run(tf.global_variables_initializer())

    def train_step(x_batch, y_batch):
        feed_dict = {
            han.input_x: x_batch,
            han.input_y: y_batch,
            han.max_sentence_num: 30,
            han.max_sentence_length: 30,
            han.batch_size: len(x_batch)
        }
        _, step, summaries, cost, accuracy = sess.run([train_op, global_step, train_summary_op, loss, acc], feed_dict)

        time_str = str(int(time.time()))
        print("{}: step {}, loss {:g}, acc {:g}".format(time_str, step, cost, accuracy))
        train_summary_writer.add_summary(summaries, step)

        return step

    def dev_step(x_batch, y_batch, writer=None):
        feed_dict = {
            han.input_x: x_batch,
            han.input_y: y_batch,
            han.max_sentence_num: 30,
            han.max_sentence_length: 30,
            han.batch_size: len(x_batch)
        }
        step, summaries, cost, accuracy = sess.run([global_step, dev_summary_op, loss, acc], feed_dict)
        time_str = str(int(time.time()))
        print("++++++++++++++++++dev++++++++++++++{}: step {}, loss {:g}, acc {:g}".format(time_str, step, cost, accuracy))
        if writer:
            writer.add_summary(summaries, step)

    for epoch in range(FLAGS.num_epochs):
        print('current epoch %s' % (epoch + 1))
        for i in range(0, len(train_x), FLAGS.batch_size):
            x = train_x[i:i + FLAGS.batch_size]
            y = train_y[i:i + FLAGS.batch_size]
            step = train_step(x, y)
            if step % FLAGS.evaluate_every == 0:
                dev_step(dev_x, dev_y, dev_summary_writer)

Once the model has been trained, we can open TensorBoard to see how training went.

Training Results

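Training is not slow, but not exactly fast either: on our lab server (64 GB of RAM) it runs about 3 batches every 2 seconds. I started the run last night and went back to the dorm; when I came back I found I had forgotten to write the dev data to the summary. There is also no shuffle at each epoch yet, the run was short, and I did no hyperparameter tuning, so the results only show a rough trend. In a few days, when I have time, I'll run it again and adjust the parameters to see whether things improve. For now, a few screenshots.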
[Four screenshots of the training results]
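Since the run above does not shuffle between epochs, one way to add it is a small batch generator that draws a fresh random order on each call. The training script imports a `batch_iter` from load_data that this post never shows, so the version below is only my guess at what such a helper could look like; the signature and the `seed` parameter are assumptions.

```python
import numpy as np

def batch_iter(x, y, batch_size, shuffle=True, seed=None):
    """Yield (x_batch, y_batch) pairs that cover the data exactly once."""
    n = len(x)
    order = np.random.RandomState(seed).permutation(n) if shuffle else np.arange(n)
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        # index x and y with the same positions so samples stay aligned with labels
        yield [x[i] for i in idx], [y[i] for i in idx]

# Toy data: 10 samples whose label equals the sample, so alignment is easy to check.
toy_x = list(range(10))
toy_y = list(range(10))
batches = list(batch_iter(toy_x, toy_y, batch_size=4, seed=0))
print([len(bx) for bx, _ in batches])
```

Calling it once per epoch (without a fixed seed) gives a different order each time, and the last batch may be smaller than batch_size.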
