
NLTK: Installing punkt Offline

Tags: notes, nltk

NLTK 3.5 documentation

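The official documentation describes several installation methods, including the following guidance for command-line installation: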

Command line installation

The downloader will search for an existing nltk_data directory to install NLTK data. If one does not exist it will attempt to create one in a central location (when using an administrator account) or otherwise in the user’s filespace. If necessary, run the download command from an administrator account, or using sudo. The recommended system location is C:\nltk_data (Windows); /usr/local/share/nltk_data (Mac); and /usr/share/nltk_data (Unix). You can use the -d flag to specify a different location (but if you do this, be sure to set the NLTK_DATA environment variable accordingly).

Run the command python -m nltk.downloader all. To ensure central installation, run the command sudo python -m nltk.downloader -d /usr/local/share/nltk_data all.

Windows: Use the “Run…” option on the Start menu. Windows Vista users need to first turn on this option, using Start -> Properties -> Customize to check the box to activate the “Run…” option.

Test the installation: Check that the user environment and privileges are set correctly by logging in to a user account, starting the Python interpreter, and accessing the Brown Corpus (see the previous section).

On Windows, you can run python -m nltk.downloader -d C:\Users\Cui\AppData\Roaming\nltk_data to install the data into a specific directory.
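The quoted documentation notes that a custom location passed to -d should be matched by the NLTK_DATA environment variable. A minimal sketch of doing that from Python follows; the helper name point_nltk_at and the example directory are hypothetical, and the sketch assumes the variable is set before nltk is imported, since NLTK reads it when building its search path.

```python
import os
from pathlib import Path


def point_nltk_at(data_dir):
    """Set NLTK_DATA to data_dir and return the resolved path.

    Call this before `import nltk` so the variable is visible
    when NLTK builds its list of search directories.
    """
    resolved = Path(data_dir).expanduser().resolve()
    os.environ["NLTK_DATA"] = str(resolved)
    return resolved


if __name__ == "__main__":
    # Hypothetical location; substitute the directory you passed to -d.
    target = point_nltk_at(Path.home() / "nltk_data")
    print("NLTK_DATA =", target)
```

After this runs, importing nltk and printing nltk.data.path should list the chosen directory among the search locations. Setting the variable at the operating-system level achieves the same thing for every session.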

I. Problem

However, installing punkt ran into a problem:

>>> import nltk
>>> nltk.download('punkt')
[nltk_data] Error loading punkt: <urlopen error [WinError 10054]
[nltk_data]     An existing connection was forcibly closed by the remote host.>
False

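This post describes how to install punkt offline.

II. Solution

1. Manually download the NLTK data

For the download itself, see another author's blog post, 《解決nltk download(‘punkt‘) 連線嘗試失敗》 ("Fixing the failed connection attempt in nltk download('punkt')").

The package can also be downloaded from the official site: NLTK Corpora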

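2. Install punkt

Extract the downloaded punkt.zip package into nltk_data/tokenizers/.

Note: punkt belongs to the tokenizers category, so the tokenizers folder must be created first if it does not already exist.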
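The extraction step can also be scripted. The sketch below uses only the standard library; the function name install_punkt_offline and the example paths are hypothetical, and it assumes the archive contains a top-level punkt/ folder, so that extracting into tokenizers/ yields tokenizers/punkt/.

```python
import zipfile
from pathlib import Path


def install_punkt_offline(zip_path, nltk_data_dir):
    """Extract punkt.zip into <nltk_data_dir>/tokenizers.

    Creates the tokenizers folder if it is missing and returns
    the path where the punkt folder is expected to end up.
    """
    tokenizers_dir = Path(nltk_data_dir) / "tokenizers"
    tokenizers_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(tokenizers_dir)
    return tokenizers_dir / "punkt"


if __name__ == "__main__":
    # Hypothetical paths; adjust to where punkt.zip was saved
    # and to the nltk_data directory in use.
    zip_file = Path("punkt.zip")
    data_dir = Path.home() / "nltk_data"
    if zip_file.exists():
        print("Extracted to", install_punkt_offline(zip_file, data_dir))
    else:
        print("punkt.zip not found next to this script")
```

To confirm the result, start Python and call nltk.data.find('tokenizers/punkt'); it raises LookupError if NLTK still cannot locate the resource.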