System Installation Notes and Deep Learning Environment Setup

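1. Problems installing Ubuntu 16.04 on a Dell AL

1) SSD not detected

Dell ships its machines with the disk mode set to RAID ON in the BIOS. In this mode the SSD may not be detected correctly, or may not reach its full performance. The steps below switch the mode from RAID to AHCI.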

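Boot into Windows, press Win+R, and type msconfig. On the Boot tab, tick "Safe boot" and select "Minimal".

Click Restart. During startup, press F2 to enter the BIOS, go to the Advanced page, select SATA Operation, press Enter, and choose AHCI. A warning says the operating system may have to be reinstalled; ignore it and choose YES. Press F10, choose YES, and reboot. The machine now starts Windows in safe mode. Press Win+R again, type msconfig, and on the Boot tab clear the Safe boot options ticked earlier.

Click OK, then choose Restart. If Windows boots normally, AHCI mode is active.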

2) Wrong touchscreen driver

sudo su
echo 'blacklist i2c_hid' >> /etc/modprobe.d/blacklist.conf
depmod -a
update-initramfs -u

and reboot
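The echo command above appends a new line every time it is run. As a minimal sketch, assuming a helper named add_blacklist (a hypothetical name, not part of any package), the append can be made safe to repeat:

```shell
# Append "blacklist <module>" to a modprobe config file only if it is missing.
# add_blacklist is a hypothetical helper name used for illustration.
add_blacklist() {
    local module="$1"
    local conf="$2"
    local line="blacklist ${module}"
    # -x matches the whole line, -F treats the pattern as a fixed string
    if ! grep -qxF "$line" "$conf" 2>/dev/null; then
        printf '%s\n' "$line" >> "$conf"
    fi
}

# Intended use on the real system (as root), followed by the rebuild steps:
#   add_blacklist i2c_hid /etc/modprobe.d/blacklist.conf
#   depmod -a && update-initramfs -u
```

Running the helper twice leaves a single blacklist line in the file.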

 

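3) Black screen

After installing Ubuntu 16.04 the machine may boot to a black screen. Two ways to fix it:

Method 1:

  1. At the GRUB menu during boot, press "e" to enter GRUB's edit mode
  2. Find "quiet splash" and append "nomodeset" after it; this disables kernel mode setting, which works around the problem with the NVIDIA card
  3. Press Ctrl+X or F10 to boot the system
  4. Once logged in, edit /etc/default/grub with administrator privileges
  5. Find GRUB_CMDLINE_LINUX_DEFAULT="quiet splash" and change it to GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nomodeset"
  6. Update GRUB with sudo update-grub, then reboot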

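Method 2: If the system does boot after installation, log in and run:

sudo nano /etc/default/grub

Find this line:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
and change it to:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nomodeset"

Save with Ctrl+O and exit with Ctrl+X (nano lists the keys at the bottom of the screen), then update GRUB:

sudo update-grub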
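The manual edit of /etc/default/grub can also be scripted. The following is a rough sketch that assumes GNU sed and a hypothetical helper named add_nomodeset; it only touches the GRUB_CMDLINE_LINUX_DEFAULT line and does nothing if nomodeset is already there:

```shell
# Add "nomodeset" to GRUB_CMDLINE_LINUX_DEFAULT in the given grub defaults file.
# add_nomodeset is a hypothetical helper name; "sed -i" assumes GNU sed.
add_nomodeset() {
    local grub_file="$1"
    # Nothing to do if the option is already on the default command line.
    if grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=.*nomodeset' "$grub_file"; then
        return 0
    fi
    # Insert " nomodeset" just before the closing double quote of that line.
    sed -i 's/^\(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*\)"/\1 nomodeset"/' "$grub_file"
}

# Intended use on the real system (as root):
#   cp /etc/default/grub /etc/default/grub.bak
#   add_nomodeset /etc/default/grub && update-grub
```

Keeping a backup copy, as in the usage comment, makes it easy to undo the change if the boot entry is edited incorrectly.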

 

Environment setup

1. Install dependencies

sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler  
  
sudo apt-get install --no-install-recommends libboost-all-dev  
  
sudo apt-get install libopenblas-dev liblapack-dev libatlas-base-dev  
  
sudo apt-get install libgflags-dev libgoogle-glog-dev liblmdb-dev  
  
sudo apt-get install git cmake build-essential 

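2. Install the graphics driver

Ubuntu 16.04 uses the nouveau graphics driver by default. nouveau cannot be used with CUDA, so it has to be disabled and replaced with the NVIDIA driver.

1) First disable the nouveau driver that ships with Ubuntu 16.04 by adding blacklist entries to /etc/modprobe.d/blacklist-nouveau.conf: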

sudo gedit /etc/modprobe.d/blacklist-nouveau.conf 

The file is empty when first opened. Add:

    blacklist nouveau  
    options nouveau modeset=0  

Save and close the file. The following command must still be run for the nouveau blacklist to actually take effect:

 

 sudo update-initramfs -u

 

To check whether the blacklist has taken effect, run the following; after a reboot it should print nothing:

lsmod | grep nouveau 
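The same check can be written as a function that returns a status code instead of relying on reading grep output by eye. This is a small sketch; module_loaded is a hypothetical helper name, and it reads /proc/modules, the file from which lsmod gets its listing:

```shell
# Return 0 if <module> appears in a /proc/modules-style listing, 1 otherwise.
# module_loaded is a hypothetical helper name used for illustration.
module_loaded() {
    local module="$1"
    local listing="${2:-/proc/modules}"
    # The first whitespace-separated field of each line is the module name.
    awk -v m="$module" '$1 == m { found = 1 } END { exit !found }' "$listing"
}

# Intended use after rebooting:
#   if module_loaded nouveau; then
#       echo "nouveau is still loaded"
#   else
#       echo "nouveau is not loaded"
#   fi
```

Matching the first field exactly avoids false positives from other modules that merely list nouveau as a dependency.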

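Now install the NVIDIA driver:

The driver I downloaded is NVIDIA_Linux-x86_64-415.13.run; place it in your home directory.

Switch to a text console with Ctrl+Alt+F1 and stop the desktop service there: sudo service lightdm stop. (To uninstall a previously installed NVIDIA driver, use sudo apt-get purge nvidia*.) Change to the directory that holds the driver and run:

sudo sh NVIDIA_Linux-x86_64-415.13.run --no-opengl-files    # replace the file name with the one you downloaded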

Prompts like the following appear during installation (the version numbers depend on the installer you run):

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 387.26?
(y)es/(n)o/(q)uit: y
do you want to run nvidia-xconfig?
(y)es/(n)o/(q)uit: n

Install the CUDA 9.1 Samples?
(y)es/(n)o/(q)uit: n
Install the CUDA 9.1 Toolkit?
(y)es/(n)o/(q)uit: n

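Then reboot; the driver installation is complete. Verify it with nvidia-settings and nvidia-smi.

Next install CUDA 10 (the version reported by nvidia-smi). Download the installer; mine is named cuda_10.0.130_410.48_linux.run.

Run it as follows. The transcript below was captured with the CUDA 9.1 installer, so with CUDA 10.0 the version numbers and default paths will read 10.0 instead:

sudo sh cuda_10.0.130_410.48_linux.run --no-opengl-libs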

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 387.26?  
(y)es/(n)o/(q)uit: n  
  
Install the CUDA 9.1 Toolkit?  
(y)es/(n)o/(q)uit: y  
  
Enter Toolkit Location  
 [ default is /usr/local/cuda-9.1 ]:   
  
Do you want to install a symbolic link at /usr/local/cuda?  
(y)es/(n)o/(q)uit: y  
  
Install the CUDA 9.1 Samples?  
(y)es/(n)o/(q)uit: y  
  
Enter CUDA Samples Location  
 [ default is /home/ccem ]:   
  
Installing the CUDA Toolkit in /usr/local/cuda-9.1 ...  
Installing the CUDA Samples in /home/ccem ...  
Copying samples to /home/ccem/NVIDIA_CUDA-9.1_Samples now...  
Finished copying samples.  
  
===========  
= Summary =  
===========  
  
Driver:   Not Selected  
Toolkit:  Installed in /usr/local/cuda-9.1  
Samples:  Installed in /home/ccem  
  
Please make sure that  
 -   PATH includes /usr/local/cuda-9.1/bin  
 -   LD_LIBRARY_PATH includes /usr/local/cuda-9.1/lib64, or, add /usr/local/cuda-9.1/lib64 to /etc/ld.so.conf and run ldconfig as root  
  
To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-9.1/bin  
  
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-9.1/doc/pdf for detailed information on setting up CUDA.  
  
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 384.00 is required for CUDA 9.1 functionality to work.  
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:  
 sudo <CudaInstaller>.run -silent -driver  
  
Logfile is /tmp/cuda_install_36731.log 

If output like the following appears, dependency libraries are missing:

Installing the CUDA Toolkit in /usr/local/cuda-9.1 …   
Missing recommended library: libGLU.so   
Missing recommended library: libX11.so   
Missing recommended library: libXi.so   
Missing recommended library: libXmu.so
In that case, install the corresponding libraries:
sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev 

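After installation, configure the CUDA environment variables. To configure them for the current user:

gedit ~/.bashrc
export PATH=/usr/local/cuda/bin:$PATH     # /usr/local/cuda is a symbolic link to /usr/local/cuda-10.0, so both name the same directory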
  
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

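Run source ~/.bashrc to apply the change. To configure the variables for all users instead:

$ sudo vim /etc/profile
export PATH=/usr/local/cuda/bin:${PATH} # required
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH} # optional; the ld.so.conf method mentioned in the installer output also works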
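Each time the file is sourced, the plain export lines prepend the same directory again, and when LD_LIBRARY_PATH starts out empty they leave a trailing colon, which the dynamic loader treats as the current directory. A possible guard, sketched with a hypothetical helper named prepend_path:

```shell
# Print <current> with <dir> prepended, unless <dir> is already one of its entries.
# prepend_path is a hypothetical helper name used for illustration.
prepend_path() {
    local dir="$1"
    local current="$2"
    case ":${current}:" in
        *":${dir}:"*) printf '%s\n' "$current" ;;                   # already present
        *) printf '%s\n' "${dir}${current:+:${current}}" ;;         # no trailing colon when empty
    esac
}

export PATH="$(prepend_path /usr/local/cuda/bin "$PATH")"
export LD_LIBRARY_PATH="$(prepend_path /usr/local/cuda/lib64 "${LD_LIBRARY_PATH:-}")"
```

The function and the two export lines can go in ~/.bashrc in place of the plain exports; sourcing the file repeatedly then leaves both variables unchanged after the first time.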

To check that CUDA was installed correctly, run:

cd /usr/local/cuda-10.0/samples/1_Utilities/deviceQuery  
  
sudo make  
  
./deviceQuery 
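deviceQuery ends its report with the line "Result = PASS" when it finds a usable CUDA device. For use in a script, a tiny filter can turn that line into an exit status; device_query_passed is a hypothetical helper name:

```shell
# Read deviceQuery output on stdin; succeed only if it reports "Result = PASS".
# device_query_passed is a hypothetical helper name used for illustration.
device_query_passed() {
    grep -q 'Result = PASS'
}

# Intended use from the deviceQuery sample directory:
#   ./deviceQuery | device_query_passed && echo "CUDA device detected"
```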

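Next, install cuDNN v7. The version I downloaded is cudnn-10.0-linux-x64-v7.4.1.5.tgz. It can be extracted anywhere; I extracted it under /usr/local/cudnn. The extracted folder is named cuda and contains two subfolders, include and lib64. Add the lib64 folder to the library path; this step is important. Open ~/.bashrc (gedit ~/.bashrc) and add: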

export LD_LIBRARY_PATH=/your/path/to/cudnn/lib64:$LD_LIBRARY_PATH

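Here /your/path/to/cudnn/lib64 is the lib64 folder inside the directory where the .tgz was extracted. Save, exit, source the file, and reopen the terminal; the cuDNN library files are now configured. The last step is to copy cudnn.h from the include folder of the extracted cuDNN directory (usually named cuda), that is /your/path/to/cudnn/include, into /usr/local/cuda/include. Because this is a system path, the copy needs administrator privileges.

   cd cuda/include
   sudo cp *.h /usr/local/cuda/include/
Afterwards, make cudnn.h readable by all users: sudo chmod a+r /usr/local/cuda/include/cudnn.h. cuDNN is now fully set up.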
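One way to confirm that the copied header is the expected release is to read the version macros out of cudnn.h. The sketch below assumes the cuDNN 7 layout, where CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL are defined in cudnn.h itself; cudnn_version is a hypothetical helper name:

```shell
# Print the cuDNN version (major.minor.patch) found in the given header file.
# cudnn_version is a hypothetical helper name; the macro layout is that of cuDNN 7.
cudnn_version() {
    local header="$1"
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END {
            if (major == "") exit 1      # no version macros found
            print major "." minor "." patch
        }
    ' "$header"
}

# Intended use:
#   cudnn_version /usr/local/cuda/include/cudnn.h
```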

Next, install TensorFlow. I chose to build it from source, following https://github.com/jikexueyuanwiki/tensorflow-zh/blob/master/SOURCE/get_started/os_setup.md and https://blog.csdn.net/a446712385/article/details/79149977

Enter the following command in a terminal:

$ git clone --recurse-submodules https://github.com/tensorflow/tensorflow

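The --recurse-submodules option is required; it fetches the protobuf library that TensorFlow depends on. Place the clone in your home directory. Next, download and install Bazel.

The installer I downloaded is bazel-0.15.2-installer-linux-x86_64.sh

Install the other dependencies: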

sudo apt-get update
sudo apt-get install python-pip python-numpy swig python-dev python-wheel
sudo apt-get install pkg-config zip g++ zlib1g-dev unzip
sudo apt-get install default-jdk
 

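# For Python 2.7:
sudo apt-get install python-numpy swig python-dev python-wheel

# For Python 3.x:
$ sudo apt-get install python3-numpy swig python3-dev python3-wheel

Python 3 is used here.

export PATH=/usr/bin:$PATH    # puts the system Python directory first on PATH

chmod +x bazel-0.15.2-installer-linux-x86_64.sh
./bazel-0.15.2-installer-linux-x86_64.sh --user
With --user the installer places the bazel executable in $HOME/bin. Once that directory is on $PATH the bazel tool can be used, so add the following to ~/.bashrc:
export PATH=$HOME/bin:$PATH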
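TensorFlow's configure script refuses to run with a Bazel that is too old, so it can help to compare versions before starting a long build. The sketch below relies on sort -V from GNU coreutils. The helper name version_ge is hypothetical, and the minimum in MIN_BAZEL is a placeholder assumption, not a value taken from this setup; check the requirement of your own TensorFlow checkout.

```shell
# Succeed if version $1 is greater than or equal to version $2.
# version_ge is a hypothetical helper name; it relies on GNU "sort -V".
version_ge() {
    local have="$1"
    local need="$2"
    # After a version sort, the smaller version comes first.
    [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}

# Placeholder minimum (an assumption); replace with the value your checkout requires.
MIN_BAZEL="0.15.0"

# Intended use once bazel is on PATH:
#   installed="$(bazel version | awk '/^Build label:/ { print $3 }')"
#   version_ge "$installed" "$MIN_BAZEL" || echo "bazel $installed is too old"
```

A version sort is needed here because a plain string comparison would rank 0.3.2 above 0.15.0.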

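Next, configure TensorFlow.

Go into its directory and run ./configure

This step configures TensorFlow; afterwards a whl package is built and installed.
Installing directly with pip installs the prebuilt whl package from the official site, whereas a source install compiles with bazel, generates a whl package, and then installs that.

(To enable GPU support, CUDA and cuDNN must be configured at this step. On a machine whose GPU compute capability is too low to enable GPU support, CUDA and cuDNN do not need to be installed beforehand.)

1) Configuration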

You have bazel 0.17.2 installed.
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3.5


Found possible Python library paths:
  /usr/local/lib/python3.5/dist-packages
  /usr/lib/python3/dist-packages
Please input the desired Python library path to use.  Default is [/usr/local/lib/python3.5/dist-packages]

Do you wish to build TensorFlow with Apache Ignite support? [Y/n]: n
No Apache Ignite support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [Y/n]: n
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with ROCm support? [y/N]: n
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 9.0]: 


Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 


Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]: 


Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 


Do you wish to build TensorFlow with TensorRT support? [y/N]: n
No TensorRT support will be enabled for TensorFlow.

Please specify the locally installed NCCL version you want to use. [Default is to use https://github.com/nvidia/nccl]: 


Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 6.1]: 


Do you want to use clang as CUDA compiler? [y/N]: n
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: 


Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: 


Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
    --config=mkl            # Build with MKL support.
    --config=monolithic     # Config for mostly static monolithic build.
    --config=gdr            # Build with GDR support.
    --config=verbs          # Build with libverbs support.
    --config=ngraph         # Build with Intel nGraph support.
Configuration finished

Part of the output above is taken from https://www.cnblogs.com/seniusen/p/9756302.html
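The prompts of ./configure can, as far as I know, be pre-answered through environment variables, which makes the configuration repeatable. The variable names below are the ones I understand configure.py in the TensorFlow 1.x series to read; treat them as assumptions and check them against the configure.py in your checkout. The values follow this setup (Python 3.5, CUDA 10.0, cuDNN 7, compute capability 6.1).

```shell
# Pre-set answers for TensorFlow's ./configure (variable names are assumptions
# based on configure.py in TensorFlow 1.x; verify them in your checkout).
export PYTHON_BIN_PATH=/usr/bin/python3.5
export TF_NEED_CUDA=1                       # build with CUDA support
export TF_CUDA_VERSION=10.0                 # matches the toolkit installed above
export CUDA_TOOLKIT_PATH=/usr/local/cuda
export TF_CUDNN_VERSION=7
export CUDNN_INSTALL_PATH=/usr/local/cuda   # configure looks for the cuDNN header and library here
export TF_CUDA_COMPUTE_CAPABILITIES=6.1     # adjust to your GPU
export TF_ENABLE_XLA=0
export TF_NEED_ROCM=0
export TF_NEED_TENSORRT=0
export TF_NEED_MPI=0

# Then, from the tensorflow source directory:
#   ./configure
```

Any question that has no matching variable is still asked interactively.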

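Errors may occur during the configuration above. Here I switched the system default from Python 2 to Python 3.5, as follows.

Back up the original python2 symlink with sudo mv /usr/bin/python /usr/bin/python.2-bak, then run sudo ln -s /usr/local/bin/python3.5 /usr/bin/python. Running python --version confirms the switch. Compiling TensorFlow then fails with "No module named keras.preprocessing"; the fix is sudo pip install keras. That command in turn fails with ModuleNotFoundError: No module named 'pip._internal', which is fixed by reinstalling pip: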

 wget https://bootstrap.pypa.io/get-pip.py  --no-check-certificate
sudo python get-pip.py

Then test with pip -V; the error should be gone.

Now compile.

In the tensorflow directory, run the following three commands in turn:

bazel build -c opt //tensorflow/tools/pip_package:build_pip_package

The build takes a long time. When it finishes, run

bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

The whl package is written to /tmp/tensorflow_pkg. Its name depends on the machine and on the TensorFlow version being built; mine is tensorflow-1.12.0rc0-cp35-cp35m-linux_x86_64.whl

Copy it to your home folder for installation

sudo pip install tensorflow-1.12.0rc0-cp35-cp35m-linux_x86_64.whl
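Because the whl file name changes with the TensorFlow version and the machine, a script can pick up whatever was built most recently instead of hard-coding the name. A rough sketch, with newest_wheel as a hypothetical helper name:

```shell
# Print the most recently modified tensorflow wheel in the given directory.
# newest_wheel is a hypothetical helper name used for illustration.
newest_wheel() {
    local dir="$1"
    # "ls -t" sorts by modification time, newest first.
    ls -t "$dir"/tensorflow-*.whl 2>/dev/null | head -n 1
}

# Intended use after build_pip_package has finished:
#   sudo pip install "$(newest_wheel /tmp/tensorflow_pkg)"
```

This simple form assumes the wheel path contains no newlines, which holds for the names that build_pip_package produces.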

When the installation finishes, test it with the following commands; if none of them reports an error, the installation succeeded.

python
# the Python version banner is printed here
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
# GPU details are printed here, showing that TensorFlow is running on the GPU
>>> sess.run(hello)
b'Hello, TensorFlow!'
>>> a = tf.constant(10)
>>> b = tf.constant(22)
>>> sess.run(a+b)
32
>>>

The TensorFlow C++ interface setup is covered at https://www.cnblogs.com/seniusen/p/9756302.html