Configuring the xlearn Machine Learning Library for Python on Ubuntu 16.04

I. Introduction to xlearn

See: https://www.zhihu.com/question/37256015/answer/268151326 and http://www.sohu.com/a/206728248_206784

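    In machine learning, deep learning and tree models (GBDT, RF) are not the only important topics. Handling high-dimensional sparse data efficiently matters just as much, and algorithms such as Sparse LR, FM and FFM are widely used in production systems and in Kaggle competitions. Existing open-source packages such as liblinear, libfm and libffm each support only one specific algorithm. None of them is strong on extensibility, flexibility or ease of use.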

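    Compared with existing software, xlearn is a distributed machine learning system built for large-scale data processing tasks. It should not be confused with XLearning, 360's deep learning scheduling platform; xlearn comes from Xiao Zhen's group at the School of EECS, Peking University. Its main advantages are: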

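    1. Generality: a single unified architecture covers the mainstream algorithms (LR, FM, FFM, etc.), so users no longer have to switch between different packages.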

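    2. Performance: xlearn is written in high-performance C++, provides cache-aware and lock-free learning, and is hand-optimized with SSE/AVX instructions. In tests on a single MacBook Pro, xlearn was up to 13 times faster than libfm and 5 times faster than libffm and liblinear (on the Criteo CTR benchmark).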

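    3. Ease of use and flexibility: xlearn provides a simple Python interface. It also bundles many features that are useful in machine learning competitions, such as cross-validation and early stopping. In addition, users can choose the optimization algorithm (e.g. SGD, AdaGrad, FTRL).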

    4. Scalability: xlearn supports out-of-core computation, which lets a single machine process 1 TB of data by working from external storage.
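The phrase "efficient handling of high-dimensional sparse data" is easier to see with a concrete model. Below is a minimal pure-Python sketch of the standard factorization machine (FM) score for one sparse example. It uses the well-known rewrite of the pairwise term, which costs O(k·nnz) instead of O(k·nnz²).

This is a textbook illustration, not xlearn's C++ implementation. The function names and all weights are made-up toy values.

```python
from typing import Dict, List

# A sparse example is {feature_index: value}; only non-zero features are stored.
SparseVec = Dict[int, float]


def fm_score(x: SparseVec, w0: float, w: Dict[int, float],
             v: Dict[int, List[float]], k: int) -> float:
    """Raw FM output w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j (no sigmoid)."""
    linear = w0 + sum(w.get(i, 0.0) * xi for i, xi in x.items())
    # Pairwise term via 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2].
    pairwise = 0.0
    for f in range(k):
        s = sum(v[i][f] * xi for i, xi in x.items())
        s_sq = sum((v[i][f] * xi) ** 2 for i, xi in x.items())
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_score_naive(x: SparseVec, w0: float, w: Dict[int, float],
                   v: Dict[int, List[float]], k: int) -> float:
    """Same model with an explicit loop over feature pairs, for cross-checking."""
    idx = sorted(x)
    total = w0 + sum(w.get(i, 0.0) * x[i] for i in idx)
    for a in range(len(idx)):
        for b in range(a + 1, len(idx)):
            i, j = idx[a], idx[b]
            dot = sum(v[i][f] * v[j][f] for f in range(k))
            total += dot * x[i] * x[j]
    return total


# Toy model with k = 2 latent factors (all numbers are made up).
X = {3: 1.0, 17: 1.0, 42: 0.5}
W0 = 0.1
W = {3: 0.2, 17: -0.1, 42: 0.4}
V = {3: [0.1, 0.2], 17: [0.3, -0.1], 42: [0.0, 0.5]}

if __name__ == "__main__":
    print(fm_score(X, W0, W, V, k=2), fm_score_naive(X, W0, W, V, k=2))
```

Both calls should print approximately 0.435. The first version touches each non-zero feature only once per factor, which is why FM stays cheap on very sparse inputs.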

II. Installing and configuring xlearn (pip)

   See: http://xlearn-doc.readthedocs.io/en/latest/index.html

    Install the xlearn library with pip:
~$ sudo pip install xlearn
[sudo] password for yuhuiliu:
Collecting xlearn
  Downloading https://files.pythonhosted.org/packages/1a/20/d2762ecfd0da63bf2f0ee95429c7cf8ad44ab8ad4adc48b405fa67a09848/xlearn-0.31a1.tar.gz (1.8MB)
    100% |████████████████████████████████| 1.9MB 4.2MB/s
Building wheels for collected packages: xlearn
  Running setup.py bdist_wheel for xlearn ... done
  Stored in directory: /home/yuhuiliu/.cache/pip/wheels/9c/46/1a/e7682af4ef3320ad6e106c633aea0ee46ffb353aaf31723bab
Successfully built xlearn
tensorflow-gpu 1.7.0 requires numpy>=1.13.3, which is not installed.
h5py 2.7.1 requires numpy>=1.7, which is not installed.
tensorflow-tensorboard 1.5.0 requires numpy>=1.12.0, which is not installed.
torchvision 0.2.1 requires numpy, which is not installed.
pandas 0.22.0 requires numpy>=1.9.0, which is not installed.
patsy 0.5.0 requires numpy>=1.4, which is not installed.
matplotlib 2.1.2 requires numpy>=1.7.1, which is not installed.
keras 2.1.3 requires numpy>=1.9.1, which is not installed.
opencv-python 3.4.0.12 requires numpy>=1.11.1, which is not installed.
tensorflow 1.5.0 requires numpy>=1.12.1, which is not installed.
tensorboard 1.7.0 requires numpy>=1.12.0, which is not installed.
scipy 1.0.0 requires numpy>=1.8.2, which is not installed.
Installing collected packages: xlearn
Successfully installed xlearn-0.31a1

    The output reports that the following requirements are missing or need updating:

tensorflow-gpu 1.7.0 requires numpy>=1.13.3, which is not installed.
h5py 2.7.1 requires numpy>=1.7, which is not installed.
tensorflow-tensorboard 1.5.0 requires numpy>=1.12.0, which is not installed.
torchvision 0.2.1 requires numpy, which is not installed.
pandas 0.22.0 requires numpy>=1.9.0, which is not installed.
patsy 0.5.0 requires numpy>=1.4, which is not installed.
matplotlib 2.1.2 requires numpy>=1.7.1, which is not installed.
keras 2.1.3 requires numpy>=1.9.1, which is not installed.
opencv-python 3.4.0.12 requires numpy>=1.11.1, which is not installed.
tensorflow 1.5.0 requires numpy>=1.12.1, which is not installed.
tensorboard 1.7.0 requires numpy>=1.12.0, which is not installed.
scipy 1.0.0 requires numpy>=1.8.2, which is not installed.
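Every one of these warnings concerns numpy. A single numpy version that meets the largest lower bound therefore satisfies all of them.

Below is a small sketch that parses the warning lines above and picks that bound. The regex assumes exactly the "requires numpy>=X, which is not installed." wording shown here. The torchvision line has no bound and is treated as "any version". The helper names are hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple

# Matches e.g. "scipy 1.0.0 requires numpy>=1.8.2, which is not installed."
# and "torchvision 0.2.1 requires numpy, which is not installed." (no bound).
WARNING_RE = re.compile(
    r"^(\S+) (\S+) requires numpy(?:>=([\d.]+))?, which is not installed\.$"
)


def version_key(version: str) -> Tuple[int, ...]:
    """Numeric comparison key: '1.13.3' > '1.4' (plain string order gets this wrong)."""
    return tuple(int(part) for part in version.split("."))


def strictest_numpy_bound(lines: Iterable[str]) -> Optional[str]:
    """Largest 'numpy>=X' lower bound among pip warning lines, or None if there is none."""
    best: Optional[str] = None
    for line in lines:
        m = WARNING_RE.match(line.strip())
        if m is None or m.group(3) is None:
            continue  # unrelated line, or a requirement without a version bound
        bound = m.group(3)
        if best is None or version_key(bound) > version_key(best):
            best = bound
    return best


WARNINGS = """\
tensorflow-gpu 1.7.0 requires numpy>=1.13.3, which is not installed.
h5py 2.7.1 requires numpy>=1.7, which is not installed.
tensorflow-tensorboard 1.5.0 requires numpy>=1.12.0, which is not installed.
torchvision 0.2.1 requires numpy, which is not installed.
pandas 0.22.0 requires numpy>=1.9.0, which is not installed.
patsy 0.5.0 requires numpy>=1.4, which is not installed.
matplotlib 2.1.2 requires numpy>=1.7.1, which is not installed.
keras 2.1.3 requires numpy>=1.9.1, which is not installed.
opencv-python 3.4.0.12 requires numpy>=1.11.1, which is not installed.
tensorflow 1.5.0 requires numpy>=1.12.1, which is not installed.
tensorboard 1.7.0 requires numpy>=1.12.0, which is not installed.
scipy 1.0.0 requires numpy>=1.8.2, which is not installed.
""".splitlines()

if __name__ == "__main__":
    print("numpy>=" + str(strictest_numpy_bound(WARNINGS)))
```

For this list the strictest bound is numpy>=1.13.3, coming from tensorflow-gpu 1.7.0. The numpy 1.14.3 reported in the upgrade output further down meets it. The simple integer split would need extending for pre-release tags such as "1.0rc1".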

    pip reports that numpy, which tensorflow-gpu and the other packages above depend on, is missing or too old, so numpy needs to be updated. The same command also upgrades tensorflow-gpu, h5py, scipy and scikit-learn:

~$ sudo pip install tensorflow-gpu h5py  numpy scipy scikit-learn -U
Collecting tensorflow-gpu
  Downloading https://files.pythonhosted.org/packages/f2/fa/01883fee1cdb4682bbd188edc26da5982c459e681543bb7f99299fca8800/tensorflow_gpu-1.8.0-cp35-cp35m-manylinux1_x86_64.whl (216.3MB)
    100% |████████████████████████████████| 216.3MB 219kB/s
Requirement already up-to-date: h5py in /usr/local/lib/python3.5/dist-packages (2.7.1)
Requirement already up-to-date: numpy in /usr/local/lib/python3.5/dist-packages (1.14.3)
Requirement already up-to-date: scipy in /usr/local/lib/python3.5/dist-packages (1.1.0)
Requirement already up-to-date: scikit-learn in /usr/local/lib/python3.5/dist-packages (0.19.1)
Requirement not upgraded as not directly required: wheel>=0.26 in /usr/local/lib/python3.5/dist-packages (from tensorflow-gpu) (0.31.0)
Requirement not upgraded as not directly required: termcolor>=1.1.0 in /usr/local/lib/python3.5/dist-packages (from tensorflow-gpu) (1.1.0)
Requirement not upgraded as not directly required: six>=1.10.0 in /usr/lib/python3/dist-packages (from tensorflow-gpu) (1.10.0)
Requirement not upgraded as not directly required: astor>=0.6.0 in /usr/local/lib/python3.5/dist-packages (from tensorflow-gpu) (0.6.2)
Requirement not upgraded as not directly required: protobuf>=3.4.0 in /usr/local/lib/python3.5/dist-packages (from tensorflow-gpu) (3.5.1)
Collecting tensorboard<1.9.0,>=1.8.0 (from tensorflow-gpu)
  Downloading https://files.pythonhosted.org/packages/59/a6/0ae6092b7542cfedba6b2a1c9b8dceaf278238c39484f3ba03b03f07803c/tensorboard-1.8.0-py3-none-any.whl (3.1MB)
    100% |████████████████████████████████| 3.1MB 1.8MB/s
Requirement not upgraded as not directly required: gast>=0.2.0 in /usr/local/lib/python3.5/dist-packages (from tensorflow-gpu) (0.2.0)
Requirement not upgraded as not directly required: grpcio>=1.8.6 in /usr/local/lib/python3.5/dist-packages (from tensorflow-gpu) (1.11.0)
Requirement not upgraded as not directly required: absl-py>=0.1.6 in /usr/local/lib/python3.5/dist-packages (from tensorflow-gpu) (0.1.9)
Requirement not upgraded as not directly required: setuptools in /usr/local/lib/python3.5/dist-packages (from protobuf>=3.4.0->tensorflow-gpu) (39.1.0)
Requirement not upgraded as not directly required: werkzeug>=0.11.10 in /usr/local/lib/python3.5/dist-packages (from tensorboard<1.9.0,>=1.8.0->tensorflow-gpu) (0.14.1)
Requirement not upgraded as not directly required: markdown>=2.6.8 in /usr/local/lib/python3.5/dist-packages (from tensorboard<1.9.0,>=1.8.0->tensorflow-gpu) (2.6.11)
Requirement not upgraded as not directly required: bleach==1.5.0 in /usr/local/lib/python3.5/dist-packages (from tensorboard<1.9.0,>=1.8.0->tensorflow-gpu) (1.5.0)
Requirement not upgraded as not directly required: html5lib==0.9999999 in /usr/local/lib/python3.5/dist-packages (from tensorboard<1.9.0,>=1.8.0->tensorflow-gpu) (0.9999999)
Installing collected packages: tensorboard, tensorflow-gpu
  Found existing installation: tensorboard 1.7.0
    Uninstalling tensorboard-1.7.0:
      Successfully uninstalled tensorboard-1.7.0
  Found existing installation: tensorflow-gpu 1.7.0
    Uninstalling tensorflow-gpu-1.7.0:
      Successfully uninstalled tensorflow-gpu-1.7.0
Successfully installed tensorboard-1.8.0 tensorflow-gpu-1.8.0

    Confirm that xlearn is installed:

~$ sudo pip install xlearn
[sudo] password for yuhuiliu:
Requirement already satisfied: xlearn in /usr/local/lib/python3.5/dist-packages (0.31a1)
~$ sudo pip list |grep xle
xlearn                        0.31a1

III. Creating a new environment with conda

    As shown above, xlearn was installed with pip, but it is only usable from the system's own Python environment. Checking with conda list |grep xlear shows no entry for xlearn:

~$ conda list |grep xlea
~$
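The empty result makes sense once you notice that sudo pip and conda list may be looking at two different Python installations. A quick check from inside any interpreter is to print where that interpreter lives and whether it can find a given package.

The helper name where_is is made up for this sketch. It only uses the standard library (sys.executable, sys.prefix, importlib.util.find_spec).

```python
import importlib.util
import sys


def where_is(module_name: str) -> dict:
    """Report which interpreter is running and where (if anywhere) it finds a top-level module."""
    spec = importlib.util.find_spec(module_name)  # looks the module up without importing it
    return {
        "executable": sys.executable,  # e.g. /usr/bin/python3 vs ~/anaconda3/bin/python
        "prefix": sys.prefix,          # root of this interpreter's installation/environment
        "found": spec is not None,
        "origin": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for name in ("xlearn", "json"):
        print(name, where_is(name))
```

Run this file once with /usr/bin/python3 and once with ~/anaconda3/bin/python. The two runs should show different executables and prefixes, and only the interpreter whose environment actually contains xlearn should report found=True.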

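    Following the introduction to conda at https://www.zhihu.com/question/58033789 and the fix described at https://www.jianshu.com/p/7e4c29a26f29: to avoid accidentally breaking a working setup while configuring third-party libraries, it is common to create a new environment for them.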

    1. First, list the existing environments:

~$ conda env list
# conda environments:
#
base                  *  /home/yuhuiliu/anaconda3
    2. Create an environment named ffm_baseline with Python 3.x (to install a specific version such as 3.6, use python=3.6 instead):
~$ conda create -n ffm_baseline python=3
Solving environment: done

## Package Plan ##

  environment location: /home/yuhuiliu/anaconda3/envs/ffm_baseline

  added / updated specs:
    - python=3


The following NEW packages will be INSTALLED:

    ca-certificates: 2018.03.07-0            defaults
    certifi:         2018.4.16-py36_0        defaults
    libedit:         3.1.20170329-h6b74fdf_2 defaults
    libffi:          3.2.1-hd88cf55_4        defaults
    libgcc-ng:       7.2.0-hdf63c60_3        defaults
    libstdcxx-ng:    7.2.0-hdf63c60_3        defaults
    ncurses:         6.1-hf484d3e_0          defaults
    openssl:         1.0.2o-h20670df_0       defaults
    pip:             10.0.1-py36_0           defaults
    python:          3.6.5-hc3d631a_2        defaults
    readline:        7.0-ha6073c6_4          defaults
    setuptools:      39.1.0-py36_0           defaults
    sqlite:          3.23.1-he433501_0       defaults
    tk:              8.6.7-hc745277_3        defaults
    wheel:           0.31.0-py36_0           defaults
    xz:              5.2.3-h5e939de_4        defaults
    zlib:            1.2.11-ha838bed_2       defaults

Proceed ([y]/n)? Y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use:
# > source activate ffm_baseline
#
# To deactivate an active environment, use:
# > source deactivate
#

~$

    3. Running conda env list again now shows an additional environment, ffm_baseline:

~$ conda env list
# conda environments:
#
base                  *  /home/yuhuiliu/anaconda3
ffm_baseline             /home/yuhuiliu/anaconda3/envs/ffm_baseline
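In this listing, the * marks the currently active environment, and the last column shows where each environment lives on disk. If you need that information in a script, a minimal parser for exactly the text format shown above could look like this. The function name is hypothetical, and other conda versions may format the listing differently.

```python
from typing import Dict, Optional, Tuple


def parse_env_list(text: str) -> Tuple[Dict[str, str], Optional[str]]:
    """Parse `conda env list` output into ({name: path}, active_env_name)."""
    envs: Dict[str, str] = {}
    active: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the "# conda environments:" header
        parts = line.split()
        name, path = parts[0], parts[-1]
        if "*" in parts[1:-1]:  # the active env has a "*" between name and path
            active = name
        envs[name] = path
    return envs, active


SAMPLE = """\
# conda environments:
#
base                  *  /home/yuhuiliu/anaconda3
ffm_baseline             /home/yuhuiliu/anaconda3/envs/ffm_baseline
"""

if __name__ == "__main__":
    print(parse_env_list(SAMPLE))
```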

     4. Run source activate ffm_baseline to enter the ffm_baseline environment, then list its packages with conda list:

~$ source activate ffm_baseline
(ffm_baseline) ~$ conda list
# packages in environment at /home/yuhuiliu/anaconda3/envs/ffm_baseline:
#
# Name                    Version                   Build  Channel
ca-certificates           2018.03.07                    0    defaults
certifi                   2018.4.16                py36_0    defaults
libedit                   3.1.20170329         h6b74fdf_2    defaults
libffi                    3.2.1                hd88cf55_4    defaults
libgcc-ng                 7.2.0                hdf63c60_3    defaults
libstdcxx-ng              7.2.0                hdf63c60_3    defaults
ncurses                   6.1                  hf484d3e_0    defaults
openssl                   1.0.2o               h20670df_0    defaults
pip                       10.0.1                   py36_0    defaults
python                    3.6.5                hc3d631a_2    defaults
readline                  7.0                  ha6073c6_4    defaults
setuptools                39.1.0                   py36_0    defaults
sqlite                    3.23.1               he433501_0    defaults
tk                        8.6.7                hc745277_3    defaults
wheel                     0.31.0                   py36_0    defaults
xz                        5.2.3                h5e939de_4    defaults
zlib                      1.2.11               ha838bed_2    defaults
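     The packages listed here are the basic ones a new conda environment starts with. This set is much leaner than the base environment that comes with the default Anaconda installation, which makes it easier to configure. Base is larger because Anaconda features such as Jupyter need many additional packages.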
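Activation is what adds (ffm_baseline) to the prompt, and it also decides which python and pip a plain command resolves to. A script can check the same state.

This sketch assumes the activation script exports CONDA_DEFAULT_ENV and CONDA_PREFIX, as current conda versions do. The function names are hypothetical.

```python
import os
import sys
from typing import Mapping, Optional


def active_conda_env(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Name of the activated conda environment, or None if none is activated."""
    # Assumption: activation exports CONDA_DEFAULT_ENV (current conda versions do).
    return environ.get("CONDA_DEFAULT_ENV")


def interpreter_in_active_env(environ: Mapping[str, str] = os.environ,
                              prefix: str = sys.prefix) -> bool:
    """True if the running Python belongs to the activated env (CONDA_PREFIX == sys.prefix)."""
    env_prefix = environ.get("CONDA_PREFIX")
    if env_prefix is None:
        return False
    return os.path.realpath(env_prefix) == os.path.realpath(prefix)


if __name__ == "__main__":
    print("active env:", active_conda_env())
    print("this python is inside it:", interpreter_in_active_env())
```

Suppose ffm_baseline is activated but the script is started with an absolute path to another interpreter. The second check would then print False, which is exactly the kind of mismatch that makes installed packages seem to "disappear".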

IV. Using the third-party library xlearn in a conda environment

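    After some searching online, the cause turned out to be this. Once Anaconda takes over the system's Python setup, the pip command still points by default to the pip of the system's original Python. As a result, pip install xlearn only installs into the system environment: the package shows up in pip list, not in conda list. If pip instead points to the pip of an Anaconda-managed environment, pip install xlearn installs into that conda environment, and the package shows up in conda list.

    Note also that the commands above were run with sudo. On Ubuntu, sudo normally replaces PATH with its own secure_path, which does not include ~/anaconda3/bin. So sudo pip resolves to the system pip (here, the one serving /usr/local/lib/python3.5/dist-packages), even when Anaconda comes first on the user's PATH.

    For comparison, here is the same install without sudo on another machine, sinc-server: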

~$ pip install xlearn
Collecting xlearn
  Downloading https://files.pythonhosted.org/packages/1a/20/d2762ecfd0da63bf2f0ee95429c7cf8ad44ab8ad4adc48b405fa67a09848/xlearn-0.31a1.tar.gz (1.8MB)
    100% |████████████████████████████████| 1.9MB 620kB/s
Building wheels for collected packages: xlearn
  Running setup.py bdist_wheel for xlearn ... done
  Stored in directory: /home/yuhuiliu/.cache/pip/wheels/9c/46/1a/e7682af4ef3320ad6e106c633aea0ee46ffb353aaf31723bab
Successfully built xlearn
Installing collected packages: xlearn
Successfully installed xlearn-0.31a1
You are using pip version 9.0.1, however version 10.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
~$ conda list |grep xlearn
xlearn                    0.31a1                    <pip>
~$ python
Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 18:10:19)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import xlearn
>>>
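A general way to avoid guessing which pip comes first on PATH (or on sudo's PATH) is to run pip as a module of the exact interpreter you want: python -m pip install ... installs into that interpreter's environment. This is standard pip usage rather than the route taken in this post. The sketch below only builds the command for the current interpreter and leaves running it to you; the function name is hypothetical.

```python
import shlex
import sys
from typing import List


def pip_install_cmd(package: str, python: str = sys.executable) -> List[str]:
    """argv that installs `package` into the environment owning the `python` interpreter."""
    return [python, "-m", "pip", "install", package]


if __name__ == "__main__":
    cmd = pip_install_cmd("xlearn")
    # Shell-ready form; to actually run it: subprocess.check_call(cmd)
    print(" ".join(shlex.quote(part) for part in cmd))
```

If this is run with the ffm_baseline environment activated, sys.executable would be that environment's python. The printed command would then target ffm_baseline rather than base or the system Python.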

Based on advice found online, here are several suggestions for using the third-party library xlearn inside a conda environment:

    1. Install xlearn following https://blog.csdn.net/sinat_39338078/article/details/78749724:

(ffm_baseline) ~$ sudo pip install xlearn
[sudo] password for yuhuiliu:
Requirement already satisfied: xlearn in /usr/local/lib/python3.5/dist-packages (0.31a1)

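    Then take the libraries originally installed in /usr/local/lib/python3.5/dist-packages/. Most files there were installed with pip, so the freshly installed xlearn is there too, along with some of its dependencies. Copy everything in that folder into the corresponding ~/anaconda3/lib/python3.5/site-packages/ directory, replacing any duplicates.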

    Checking the target directory:

~$ ls ~/anaconda3/lib/p
pkgconfig/ python3.6/

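    The libraries under anaconda3/lib are for Python 3.6, and there is no python3.5 directory at all. Packages installed for Python 3.5, especially ones with compiled extensions, are not guaranteed to work under Python 3.6. Forcibly merging the 3.5 libraries into the 3.6 directory therefore does not seem reliable.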
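The mismatch is visible in the paths themselves: each site-packages or dist-packages directory is named after the Python X.Y it serves. Below is a small check along those lines. The helper names are hypothetical, the two paths come from this post, and the regex only covers the Linux-style layout shown here.

```python
import re
import sys
from typing import Optional

# ".../python3.5/dist-packages" or ".../python3.6/site-packages", optional trailing slash.
SITE_DIR_RE = re.compile(r"/python(\d+\.\d+)/(?:site|dist)-packages/?$")


def site_dir_version(path: str) -> Optional[str]:
    """The 'X.Y' Python version encoded in a package directory path, if any."""
    m = SITE_DIR_RE.search(path)
    return m.group(1) if m else None


def same_python(src: str, dst: str) -> bool:
    """True only when both directories belong to the same Python X.Y."""
    a, b = site_dir_version(src), site_dir_version(dst)
    return a is not None and a == b


SRC = "/usr/local/lib/python3.5/dist-packages/"
DST = "/home/yuhuiliu/anaconda3/lib/python3.6/site-packages"

if __name__ == "__main__":
    print(site_dir_version(SRC), site_dir_version(DST), same_python(SRC, DST))
    print("running interpreter: %d.%d" % sys.version_info[:2])
```

For the two paths above, the first line prints 3.5 3.6 False.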

    2. Following https://segmentfault.com/q/1010000012539647, the answer by "史密斯" at https://www.zhihu.com/question/41974592, and the https://www.jianshu.com/p/7e4c29a26f29 post mentioned above: first uninstall xlearn with sudo pip uninstall xlearn, then run the pip under /home/yuhuiliu/anaconda3/bin directly:

(ffm_baseline) ~$ sudo pip uninstall xlearn
[sudo] password for yuhuiliu:
Uninstalling xlearn-0.31a1:
  Would remove:
    /usr/local/lib/python3.5/dist-packages/xlearn-0.31a1.dist-info/*
    /usr/local/lib/python3.5/dist-packages/xlearn/*
Proceed (y/n)? Y
  Successfully uninstalled xlearn-0.31a1
(ffm_baseline) ~$ sudo /home/yuhuiliu/anaconda3/bin/pip install xlearn
Collecting xlearn
  Using cached https://files.pythonhosted.org/packages/1a/20/d2762ecfd0da63bf2f0ee95429c7cf8ad44ab8ad4adc48b405fa67a09848/xlearn-0.31a1.tar.gz
Building wheels for collected packages: xlearn
  Running setup.py bdist_wheel for xlearn ... done
  Stored in directory: /home/yuhuiliu/.cache/pip/wheels/9c/46/1a/e7682af4ef3320ad6e106c633aea0ee46ffb353aaf31723bab
Successfully built xlearn
distributed 1.21.8 requires msgpack, which is not installed.
Installing collected packages: xlearn
Successfully installed xlearn-0.31a1
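    The output reports a dependency warning (distributed requires msgpack). Ignore it for now and check xlearn's entry with conda list |grep xlear:
(ffm_baseline) ~$ conda list |grep xlear
(ffm_baseline) ~$ 
    Hmm, nothing there. Running the pip under /home/yuhuiliu/anaconda3/bin directly installs into the default base environment, not into ffm_baseline. Switch to the base environment and check xlearn's entry with conda list |grep xlear:
(ffm_baseline) ~$ conda env list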
# conda environments:
#
base                     /home/yuhuiliu/anaconda3
ffm_baseline          *  /home/yuhuiliu/anaconda3/envs/ffm_baseline

(ffm_baseline) ~$ source activate base
(base) ~$ conda list |grep xlear
xlearn                    0.31a1                    <pip>
(base) ~$ python
Python 3.6.5 |Anaconda custom (64-bit)| (default, Apr 29 2018, 16:14:56)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import xlearn as xl
>>> xl.hello()
----------------------------------------------------------------------------------------------
           _
          | |
     __  _| |     ___  __ _ _ __ _ __
     \ \/ / |    / _ \/ _` | '__| '_ \
      >  <| |___|  __/ (_| | |  | | | |
     /_/\_\_____/\___|\__,_|_|  |_| |_|

        xLearn   -- 0.31 Version --
----------------------------------------------------------------------------------------------

>>>

   At this point, xlearn can be used in conda's base environment.
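The install landed in base because /home/yuhuiliu/anaconda3/bin is the base environment's own bin directory. By contrast, conda create -n ffm_baseline placed the new environment under /home/yuhuiliu/anaconda3/envs/ffm_baseline, as the environment location line earlier shows. That environment has its own bin/ directory and, since pip was among the packages installed into it, its own pip.

The sketch below computes which pip belongs to which environment. The function name is hypothetical, and it assumes the Linux directory layout seen in this post.

```python
import os


def env_tool(conda_root: str, env: str, tool: str = "pip") -> str:
    """Path to `tool` in a conda env (Linux layout): base -> <root>/bin, others -> <root>/envs/<env>/bin."""
    if env == "base":
        return os.path.join(conda_root, "bin", tool)
    return os.path.join(conda_root, "envs", env, "bin", tool)


if __name__ == "__main__":
    root = os.path.expanduser("~/anaconda3")
    for env in ("base", "ffm_baseline"):
        path = env_tool(root, env)
        print(env, path, "exists" if os.path.exists(path) else "missing")
```

Following this reasoning, two commands should target ffm_baseline: /home/yuhuiliu/anaconda3/envs/ffm_baseline/bin/pip install xlearn, or plain pip install xlearn with ffm_baseline activated and without sudo. Neither variant was tried in this post.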

    3. Resolve the dependency warning from the install above (msgpack, required by the distributed package):

(base) ~$ sudo /home/yuhuiliu/anaconda3/bin/pip install msgpack
[sudo] password for yuhuiliu:
Collecting msgpack
  Downloading https://files.pythonhosted.org/packages/22/4e/dcf124fd97e5f5611123d6ad9f40ffd6eb979d1efdc1049e28a795672fcd/msgpack-0.5.6-cp36-cp36m-manylinux1_x86_64.whl (315kB)
    100% |████████████████████████████████| 317kB 11kB/s
Installing collected packages: msgpack
Successfully installed msgpack-0.5.6
(base) ~$