Submitting an Apache Spark Job Using the REST API

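When working with Apache Spark, you sometimes need to trigger a Spark job on demand from outside the cluster. There are two ways to submit an Apache Spark job to a cluster.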

  • Using spark-submit from within the Spark cluster

To submit a Spark job from inside the Spark cluster, we use spark-submit. Below is a sample shell script that submits a Spark job. Most of the parameters are self-explanatory.

<code>#!/bin/bash
# Submit the batch application to the standalone master in cluster mode.
# The last two arguments are the application's input and output paths.

$SPARK_HOME/bin/spark-submit \
  --class com.nitendragautam.sparkbatchapp.main.Boot \
  --master spark://192.168.133.128:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 4G \
  --driver-memory 4G \
  --total-executor-cores 2 \
  /home/hduser/sparkbatchapp.jar \
  /home/hduser/NDSBatchApp/input \
  /home/hduser/NDSBatchApp/output/
</code>
  • Using the REST API from outside the Spark cluster

In this post I will explain how to trigger a Spark job with the help of the REST API. Before submitting the Spark job, please make sure the Spark cluster is running.

Figure: Apache Spark Master

Triggering a Spark batch job using a shell script

Create a shell script named submit_spark_job.sh with the following content.

<code>#!/bin/bash
# POST a CreateSubmissionRequest to the REST endpoint of the Spark master (port 6066).

curl -X POST http://192.168.133.128:6066/v1/submissions/create \
  --header "Content-Type:application/json;charset=UTF-8" \
  --data '{
  "appResource": "/home/hduser/sparkbatchapp.jar",
  "sparkProperties": {
    "spark.executor.memory": "4g",
    "spark.master": "spark://192.168.133.128:7077",
    "spark.driver.memory": "4g",
    "spark.driver.cores": "2",
    "spark.eventLog.enabled": "false",
    "spark.app.name": "Spark REST API201804291717022",
    "spark.submit.deployMode": "cluster",
    "spark.jars": "/home/hduser/sparkbatchapp.jar",
    "spark.driver.supervise": "true"
  },
  "clientSparkVersion": "2.0.1",
  "mainClass": "com.nitendragautam.sparkbatchapp.main.Boot",
  "environmentVariables": {
    "SPARK_ENV_LOADED": "1"
  },
  "action": "CreateSubmissionRequest",
  "appArgs": [
    "/home/hduser/NDSBatchApp/input",
    "/home/hduser/NDSBatchApp/output/"
  ]
}'
</code>
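The request body above is hard-coded. As a minimal sketch, the same payload could be generated from variables so that the input and output paths can change per run. The names `MASTER_HOST`, `APP_JAR`, `MAIN_CLASS`, `APP_NAME`, `build_payload` and `submit_job` are illustrative and not part of the original script, and the sketch assumes the paths contain no characters that need JSON escaping.

```shell
#!/bin/bash
# Sketch: build the CreateSubmissionRequest body from variables.
# All variable and function names here are hypothetical.

MASTER_HOST="192.168.133.128"
APP_JAR="/home/hduser/sparkbatchapp.jar"
MAIN_CLASS="com.nitendragautam.sparkbatchapp.main.Boot"
APP_NAME="Spark REST API"

build_payload() {
  # $1 = input path, $2 = output path; both are passed through as appArgs.
  # No JSON escaping is done, so quotes or backslashes in the paths would break the body.
  cat <<EOF
{
  "appResource": "${APP_JAR}",
  "sparkProperties": {
    "spark.executor.memory": "4g",
    "spark.master": "spark://${MASTER_HOST}:7077",
    "spark.driver.memory": "4g",
    "spark.driver.cores": "2",
    "spark.eventLog.enabled": "false",
    "spark.app.name": "${APP_NAME}",
    "spark.submit.deployMode": "cluster",
    "spark.jars": "${APP_JAR}",
    "spark.driver.supervise": "true"
  },
  "clientSparkVersion": "2.0.1",
  "mainClass": "${MAIN_CLASS}",
  "environmentVariables": { "SPARK_ENV_LOADED": "1" },
  "action": "CreateSubmissionRequest",
  "appArgs": [ "$1", "$2" ]
}
EOF
}

submit_job() {
  # Sends the generated body to the same endpoint used in the script above.
  curl -s -X POST "http://${MASTER_HOST}:6066/v1/submissions/create" \
    --header "Content-Type:application/json;charset=UTF-8" \
    --data "$(build_payload "$1" "$2")"
}
```

With these functions sourced, `submit_job /home/hduser/NDSBatchApp/input /home/hduser/NDSBatchApp/output/` would send the same request as the hard-coded script, apart from the application name.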

Once the Spark job has been submitted successfully, you will see output like the following.

<code>
$ sh submit_spark_job.sh
{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20180429125849-0001",
  "serverSparkVersion" : "2.0.1",
  "submissionId" : "driver-20180429125849-0001",
  "success" : true
}
</code>
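The `submissionId` in this response is what the status check needs, so a script that submits jobs will usually want to capture it. The following is a rough sketch that extracts it with `sed` and `grep` only. The helper names `json_string_field` and `submission_succeeded` are hypothetical, and the pattern matching assumes the simple, flat response shape shown above rather than arbitrary JSON.

```shell
#!/bin/bash
# Sketch: pull fields out of a CreateSubmissionResponse without extra tools.
# json_string_field and submission_succeeded are hypothetical helper names.

json_string_field() {
  # $1 = key; prints the first string value found for that key in the JSON on stdin.
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

submission_succeeded() {
  # Exits with status 0 when the JSON on stdin contains "success" : true.
  grep -q '"success"[[:space:]]*:[[:space:]]*true'
}
```

For example, after `response="$(sh submit_spark_job.sh)"`, the ID could be read with `printf '%s\n' "$response" | json_string_field submissionId`. A real JSON parser would be more robust if one is available on the machine.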

Checking the status of a Spark job using the REST API

To check the status of the Spark job, use the submission ID with the command below.

<code>
$ curl http://192.168.133.128:6066/v1/submissions/status/driver-20180429125849-0001
{
  "action" : "SubmissionStatusResponse",
  "driverState" : "FINISHED",
  "serverSparkVersion" : "2.0.1",
  "submissionId" : "driver-20180429125849-0001",
  "success" : true,
  "workerHostPort" : "192.168.133.128:38451",
  "workerId" : "worker-20180429124356-192.168.133.128-38451"
}
</code>
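A single status call only shows the state at that moment. As a sketch, the call can be wrapped in a loop that polls until the driver stops. The names `MASTER_REST`, `driver_state`, `is_terminal_state` and `wait_for_driver` are illustrative. Treating FINISHED, FAILED, KILLED and ERROR as the final states is an assumption here, since only FINISHED appears in the output above. The loop has no retry limit, so it keeps polling if the master is unreachable.

```shell
#!/bin/bash
# Sketch: poll the status endpoint until the driver reaches a final state.
# All names below are hypothetical; the list of final states is an assumption.

MASTER_REST="http://192.168.133.128:6066"

driver_state() {
  # Reads a SubmissionStatusResponse on stdin and prints its driverState value.
  sed -n 's/.*"driverState"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

is_terminal_state() {
  # Returns 0 for states assumed to be final, 1 for anything else.
  case "$1" in
    FINISHED|FAILED|KILLED|ERROR) return 0 ;;
    *) return 1 ;;
  esac
}

wait_for_driver() {
  # $1 = submission ID, $2 = seconds between polls (default 10).
  # Returns 0 if the driver ends in FINISHED, 1 for any other final state.
  local id="$1" interval="${2:-10}" state
  while true; do
    state="$(curl -s "${MASTER_REST}/v1/submissions/status/${id}" | driver_state)"
    echo "driverState=${state:-unknown}"
    if is_terminal_state "$state"; then
      if [ "$state" = "FINISHED" ]; then return 0; else return 1; fi
    fi
    sleep "$interval"
  done
}
```

With these functions sourced, `wait_for_driver driver-20180429125849-0001 5` would print the state every five seconds and stop once the driver is done.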