
A Spark Example Written in Python: the Error and the Fix

The relevant environment variables:

#java
export JAVA_HOME=/usr/local/jdk1.8.0_181
export PATH=$JAVA_HOME/bin:$PATH
#python
export PYTHON_HOME=/usr/local/python3
export PATH=$PYTHON_HOME/bin:$PATH
#spark
export SPARK_HOME=/usr/local/spark
export PATH=$SPARK_HOME/bin:$PATH
#add spark to python
export PYTHONPATH=/usr/local/spark/python
#add pyspark to jupyter
export PYSPARK_PYTHON=/usr/local/python3/bin/python3 # Two versions of Python are installed, so PYSPARK_PYTHON must be set explicitly; otherwise pyspark fails at runtime.
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook --allow-root'
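After these exports take effect (e.g. after `source ~/.bashrc`), you can check from Python that the directory set in PYTHONPATH is actually visible for imports. A minimal sketch, assuming the paths above:

```python
import os
import sys

# Paths taken from the exports above
spark_home = os.environ.get("SPARK_HOME", "/usr/local/spark")
spark_python = os.path.join(spark_home, "python")

# PYTHONPATH entries appear in sys.path at interpreter startup;
# inserting the directory at runtime has the same effect for imports.
if spark_python not in sys.path:
    sys.path.insert(0, spark_python)

print(spark_python in sys.path)  # True
```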

A Spark example written in Python:

# -*- coding: utf-8 -*-
from __future__ import print_function
from pyspark import *

if __name__ == '__main__':
    sc = SparkContext("local[4]")
    sc.setLogLevel("WARN")
    # The input is already split into words, so map each word to a
    # (word, 1) pair, sum the counts per key, and print each pair.
    rdd = sc.parallelize("hello Pyspark world".split(" "))
    rdd.map(lambda word: (word, 1)) \
       .reduceByKey(lambda a, b: a + b) \
       .foreach(print)
    sc.stop()
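For reference, the same word count can be reproduced on the driver alone with `collections.Counter`. This is a local illustration of what the `map`/`reduceByKey` pipeline computes, not Spark code:

```python
from collections import Counter

words = "hello Pyspark world".split(" ")
# Same result as mapping to (word, 1) and summing per key, computed locally
counts = Counter(words)
for pair in sorted(counts.items()):
    print(pair)
# ('Pyspark', 1)
# ('hello', 1)
# ('world', 1)
```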

The following error appears:

Traceback (most recent call last):
  File "test1.py", line 3, in <module>
    from pyspark import *
  File "/usr/local/spark/python/pyspark/__init__.py", line 46, in <module>
    from pyspark.context import SparkContext
  File "/usr/local/spark/python/pyspark/context.py", line 29, in <module>
    from py4j.protocol import Py4JError
ImportError: No module named py4j.protocol
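The error means the `py4j` package that pyspark depends on cannot be found on Python's import path. A quick diagnostic (my own sketch, not from the original post) is to ask `importlib` where, if anywhere, `py4j` would be loaded from:

```python
import importlib.util
import sys

# If find_spec returns None, `import py4j` would raise exactly this ImportError
spec = importlib.util.find_spec("py4j")
if spec is None:
    print("py4j not found; sys.path searched:")
    for p in sys.path:
        print(" ", p)
else:
    print("py4j found at", spec.origin)
```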

Solution:

# cd into the Python site-packages directory
cd /usr/local/python3/lib/python3.6/site-packages

# copy the bundled py4j package over from Spark
cp /usr/local/spark/python/lib/py4j-0.10.7-src.zip ./

# unzip it so the py4j module becomes importable
unzip py4j-0.10.7-src.zip
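Alternatively (not covered in the original post), copying is not strictly necessary: Python can import straight from a zip archive that sits on `sys.path` (the zipimport mechanism), so appending the bundled zip to PYTHONPATH also resolves the error:

```python
import sys

# Equivalent shell fix:
#   export PYTHONPATH=$PYTHONPATH:/usr/local/spark/python/lib/py4j-0.10.7-src.zip
# Zip archives on sys.path are importable via zipimport.
py4j_zip = "/usr/local/spark/python/lib/py4j-0.10.7-src.zip"
if py4j_zip not in sys.path:
    sys.path.append(py4j_zip)
print(py4j_zip in sys.path)  # True
```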