Paper notes: Question Retrieval with Distributed Representations and Participant Reputation in Community QA

阿新 • Published: 2018-12-21


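Abstract

The main difficulty in community question answering is duplicate questions.
Question retrieval (QR) addresses this, and the main difficulty in QR is synonymy: the same question can be asked with different words.
The paper's method: 1) use the continuous bag-of-words (CBoW) model to learn distributed representations (word embeddings) of words; 2) compute the similarity between the given question and each archived question separately on the title field and the description field; 3) include user reputation in the ranking model.
The test dataset is the Asus's Republic of Gamers (ROG) forum.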

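Introduction

The difficulty in QR is synonymy. There are four approaches to handling it:

Language model information retrieval (LMIR): computes the probability of the word sequence of the given question under a language model of the candidate question.
Language model with category smoothing (LMC): represents the question category as one dimension of the vector space. The drawback of both LMIR and LMC is that they ignore the similarity between words.
Translation-based language modeling (TBLM): uses QA pairs to learn semantically related words in order to improve traditional IR models. Its drawback is that learning a translation table is too time-consuming.
Distributed-representation-based language modeling (DRLM): replaces the word-to-word translation probabilities in TBLM with distributed representations of the data, using word2vec to compute the probabilities.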
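The drawback noted for the language-model approaches can be made concrete with a small sketch. The code below is a generic unigram query-likelihood scorer with Jelinek-Mercer smoothing, used here as a stand-in for LMIR; it is not taken from the paper, and the function name `query_likelihood`, the smoothing weight `lam`, and the toy questions are all assumptions for illustration.

```python
from collections import Counter

def query_likelihood(query, candidate, collection, lam=0.2):
    """Unigram query likelihood with Jelinek-Mercer smoothing.

    For each query word w the score multiplies
        (1 - lam) * P(w | candidate) + lam * P(w | collection).
    `lam` is an illustrative value, not one reported in the paper.
    """
    cand_counts = Counter(candidate)
    coll_counts = Counter(collection)
    score = 1.0
    for w in query:
        p_cand = cand_counts[w] / len(candidate)
        p_coll = coll_counts[w] / len(collection)
        score *= (1 - lam) * p_cand + lam * p_coll
    return score

# Toy archive: two candidate questions, tokenized by whitespace.
cand_exact = "laptop will not boot".split()
cand_synonym = "notebook will not start".split()
collection = cand_exact + cand_synonym

query = "laptop will not boot".split()
score_exact = query_likelihood(query, cand_exact, collection)
score_synonym = query_likelihood(query, cand_synonym, collection)
```

The paraphrased candidate scores far lower than the exact match because "notebook" and "start" receive no credit for being close in meaning to "laptop" and "boot". This is the gap that the translation-based and embedding-based approaches try to close.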

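Proposed method

The method has three parts:

1) Word embedding learning: given the forum dataset, each question is treated as the basic unit, and every word in a question is converted into a word vector.

2) Score generation: once the word vectors are learned, question retrieval is performed by computing the similarity between the query question and each candidate question.

3) Use of reputation information: the ranking function is strengthened by introducing the reputation values of the participants in each archived question.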

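1. Word2vec

For background on word2vec, see the blog post "[NLP] 秒懂詞向量Word2vec的本質". Prior research shows that the CBoW model performs better on text classification, especially for documents that contain very few infrequent words, and that it trains faster than the skip-gram model. The paper therefore uses CBoW to learn the word vectors.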
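CBoW predicts a target word from the words around it, so its training data is a set of (context, target) pairs. The following minimal sketch shows how such pairs could be extracted from one tokenized question; the function name `cbow_pairs`, the window size, and the sample question are assumptions for illustration, not details from the paper.

```python
def cbow_pairs(tokens, window=2):
    """Return (context_words, target_word) pairs as used to train CBoW.

    For each position, the context is up to `window` words on each side
    and the target is the word at that position.
    """
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:  # skip one-word inputs, which have no context
            pairs.append((context, target))
    return pairs

question = "how to update bios".split()
pairs = cbow_pairs(question, window=1)
for context, target in pairs:
    print(context, "->", target)
```

In practice a library would handle both pair extraction and training. In gensim, for example, the `Word2Vec` class selects CBoW with `sg=0` and skip-gram with `sg=1`.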

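2. Ranking function for question title and description

Once word2vec has learned the word vectors, each question q is given a vector representation, where w ranges over the words in q and e is the value of each dimension of the vector. A similarity score is then computed between the query question q and a candidate question Q.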
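One common way to build a question vector from word vectors and compare two questions is sketched below. It assumes the question vector is the per-dimension mean of its word vectors and that similarity is the cosine of the two question vectors. Both choices are assumptions made for illustration, since these notes do not preserve the paper's formulas, and the toy embeddings are invented.

```python
import math

def question_vector(tokens, embeddings):
    """Average the vectors of the tokens that have an embedding."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None  # no known words, so no vector can be built
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical 2-dimensional embeddings in which synonyms lie close together.
toy_embeddings = {
    "laptop": [0.9, 0.1],
    "notebook": [0.8, 0.2],
    "boot": [0.1, 0.9],
    "start": [0.2, 0.8],
}
q = question_vector(["laptop", "boot"], toy_embeddings)
Q = question_vector(["notebook", "start"], toy_embeddings)
similarity = cosine(q, Q)
```

The two toy questions share no words, yet their vectors are nearly identical, so the similarity is close to 1.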

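A forum question has two parts, a title and a description. Unlike earlier work, the paper computes a similarity score for each part separately and combines the two scores using the hyperparameters α and β, where α + β = 1.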
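Because α + β = 1, the combination can be read as a weighted average of the two field scores. The sketch below assumes that α weights the title score and β the description score; that assignment, the function name `combined_score`, and the sample numbers are assumptions for illustration.

```python
def combined_score(title_sim, desc_sim, alpha):
    """Weighted sum of title and description similarity, with beta = 1 - alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    beta = 1.0 - alpha
    return alpha * title_sim + beta * desc_sim

# A candidate whose title matches well but whose description matches poorly.
score = combined_score(title_sim=0.8, desc_sim=0.4, alpha=0.7)
```

With only one free parameter, tuning reduces to a one-dimensional search over α on held-out data.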

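3. Using user reputation in the forum

The similarity score between the query question q and a candidate question Q is extended with a reputation term.

The score includes a further hyperparameter, and RPU(Q) is the sum of the reputation values of the users who took part in the discussion of Q. To keep a single forum user from contributing too much reputation, each participant's reputation value is added only once. To be fair to new posts, the mean of the reputation values is used.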
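The two safeguards described above, counting each participant once and averaging, can be sketched as follows. How the paper combines the reputation term with the similarity score is not preserved in these notes, so the additive form in `rank_score` and its `weight` parameter are hypothetical, as are the user names and reputation values.

```python
def average_reputation(participants, reputation):
    """Mean reputation over the distinct participants of a thread."""
    unique_users = set(participants)  # each participant counts only once
    if not unique_users:
        return 0.0
    total = sum(reputation.get(user, 0.0) for user in unique_users)
    return total / len(unique_users)

def rank_score(similarity, participants, reputation, weight=0.1):
    # Hypothetical additive combination; the form and the weight are
    # assumptions, not the paper's formula.
    return similarity + weight * average_reputation(participants, reputation)

reputation = {"alice": 0.9, "bob": 0.3}
# alice posted three times in the thread but is counted once.
thread_participants = ["alice", "bob", "alice", "alice"]
avg = average_reputation(thread_participants, reputation)
score = rank_score(0.5, thread_participants, reputation, weight=0.1)
```

Averaging keeps a long thread with many participants from outranking a new thread simply because it has accumulated more total reputation.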

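Experiments

In the results, Forum denotes the paper's method; -T uses the similarity of the question title, -C uses the similarity of the question description, and -R adds user reputation. The results table shows that the paper's method outperforms the other methods.

The paper also reports the optimal hyperparameter values in a table.

Wiki denotes training on Wiki data. Table 3 shows that Wiki performs worst, which indicates that in-domain data is more effective than out-of-domain data for training word2vec.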
