Spark Source Code Analysis: ResultTask Processing

阿新 • Published: 2018-12-09


Videos

Spark source code analysis: ResultTask principles, illustrated (bilibili video): https://www.bilibili.com/video/av37442139/?p=24
Spark source code analysis: ResultTask processing (bilibili video): https://www.bilibili.com/video/av37442139/?p=25
Spark source code analysis: ResultTask principles, illustrated (youtube video): https://youtu.be/8LwOIfxjNqU
Spark source code analysis: ResultTask processing (youtube video): https://youtu.be/1r7hzIXO11Y

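Overview

A ResultTask computes one partition of the final stage. It first fetches the data for its partition from the ShuffleMapTask output, pulling that partition's data from every ShuffleMapTask, and then applies the user-defined function passed to reduceByKey.

Finally, the outputs of all ResultTasks are merged and returned as the result of the job.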
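The flow described above can be sketched in plain Scala, without Spark. This is a minimal sketch under stated assumptions: ReduceSideSketch, MapOutput, resultTask and runJob are hypothetical names, map outputs are in-memory collections, and values are Int. It only illustrates the order of the steps, not Spark's real data path.

```scala
object ReduceSideSketch {
  // Hypothetical stand-in: output(r) holds the records one map task wrote
  // for reduce partition r.
  type MapOutput = Vector[Seq[(String, Int)]]

  // One "ResultTask": pull this partition's records from EVERY map output,
  // then apply the user-defined reduce function per key.
  def resultTask(
      mapOutputs: Seq[MapOutput],
      partition: Int,
      reduce: (Int, Int) => Int): Map[String, Int] =
    mapOutputs
      .flatMap(output => output(partition))
      .groupBy(_._1)
      .map { case (key, records) => key -> records.map(_._2).reduce(reduce) }

  // The "driver": run one task per partition and merge the task outputs.
  def runJob(
      mapOutputs: Seq[MapOutput],
      numPartitions: Int,
      reduce: (Int, Int) => Int): Map[String, Int] =
    (0 until numPartitions)
      .map(p => resultTask(mapOutputs, p, reduce))
      .foldLeft(Map.empty[String, Int])(_ ++ _)
}
```

Because a key is assumed to live in exactly one reduce partition, merging the per-task maps with `++` never overwrites a key.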

Diagram

The ResultTask.scala class

/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.scheduler

import java.nio.ByteBuffer

import java.io._

import org.apache.spark._
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.rdd.RDD

/**
 * A task that sends back the output to the driver application.
 *
 * See [[Task]] for more information.
 *
 * @param stageId id of the stage this task belongs to
 * @param taskBinary broadcasted version of the serialized RDD and the function to apply on each
 *                   partition of the given RDD. Once deserialized, the type should be
 *                   (RDD[T], (TaskContext, Iterator[T]) => U).
 * @param partition partition of the RDD this task is associated with
 * @param locs preferred task execution locations for locality scheduling
 * @param outputId index of the task in this job (a job can launch tasks on only a subset of the
 *                 input RDD's partitions).
 */
private[spark] class ResultTask[T, U](
    stageId: Int,
    stageAttemptId: Int,
    taskBinary: Broadcast[Array[Byte]],
    partition: Partition,
    locs: Seq[TaskLocation],
    val outputId: Int,
    internalAccumulators: Seq[Accumulator[Long]])
  extends Task[U](stageId, stageAttemptId, partition.index, internalAccumulators)
  with Serializable {

  @transient private[this] val preferredLocs: Seq[TaskLocation] = {
    if (locs == null) Nil else locs.toSet.toSeq
  }

  override def runTask(context: TaskContext): U = {
    // Deserialize the RDD and the func using the broadcast variables.
    val deserializeStartTime = System.currentTimeMillis()
    val ser = SparkEnv.get.closureSerializer.newInstance()
    val (rdd, func) = ser.deserialize[(RDD[T], (TaskContext, Iterator[T]) => U)](
      ByteBuffer.wrap(taskBinary.value), Thread.currentThread.getContextClassLoader)
    _executorDeserializeTime = System.currentTimeMillis() - deserializeStartTime

    metrics = Some(context.taskMetrics)
    func(context, rdd.iterator(partition, context))
  }

  // This is only callable on the driver side.
  override def preferredLocations: Seq[TaskLocation] = preferredLocs

  override def toString: String = "ResultTask(" + stageId + ", " + partitionId + ")"
}

runTask() deserializes the task binary, which yields the RDD and the function func
The value of taskBinary is serialized in DAGScheduler.submitMissingTasks()

val ser = SparkEnv.get.closureSerializer.newInstance()
    val (rdd, func) = ser.deserialize[(RDD[T], (TaskContext, Iterator[T]) => U)](
      ByteBuffer.wrap(taskBinary.value), Thread.currentThread.getContextClassLoader)

How DAGScheduler serializes the taskBinary: Broadcast parameter

 var taskBinary: Broadcast[Array[Byte]] = null
    try {
      // For ShuffleMapTask, serialize and broadcast (rdd, shuffleDep).
      // For ResultTask, serialize and broadcast (rdd, func).
      val taskBinaryBytes: Array[Byte] = stage match {
        case stage: ShuffleMapStage =>
          closureSerializer.serialize((stage.rdd, stage.shuffleDep): AnyRef).array()
        case stage: ResultStage =>
          closureSerializer.serialize((stage.rdd, stage.func): AnyRef).array()
      }

      taskBinary = sc.broadcast(taskBinaryBytes)
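The serialize-on-the-driver, deserialize-in-the-task round trip can be tried out with plain Java serialization. This is a minimal sketch, assuming a Vector stands in for the RDD and a String for TaskContext; TaskBinarySketch, SumFunc, serialize, deserialize and roundTrip are hypothetical names, and Spark's closureSerializer and broadcast machinery are not used here.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object TaskBinarySketch {
  // Hypothetical per-partition function; a named Serializable class keeps
  // the example independent of how lambdas are serialized.
  final class SumFunc extends ((String, Iterator[Int]) => Int) with Serializable {
    def apply(context: String, iter: Iterator[Int]): Int = iter.sum
  }

  // "Driver side": turn the (data, func) pair into bytes.
  def serialize(obj: AnyRef): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    bytes.toByteArray
  }

  // "Task side": rebuild the pair from the bytes.
  def deserialize[A](bytes: Array[Byte]): A = {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject().asInstanceOf[A] finally in.close()
  }

  def roundTrip(): Int = {
    val data = Vector(1, 2, 3) // stand-in for the RDD
    val taskBinary = serialize((data, new SumFunc))
    val (rdd, func) =
      deserialize[(Vector[Int], (String, Iterator[Int]) => Int)](taskBinary)
    func("context", rdd.iterator) // mirrors func(context, rdd.iterator(...))
  }
}
```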

The ResultTask.runTask() method
For RDD.collect(), func turns the partition's Iterator into an array: (iter: Iterator[T]) => iter.toArray
The whole ResultTask computation happens inside rdd.iterator(partition, context)
The RDD here is a ShuffledRDD, so rdd.iterator() resolves to ShuffledRDD.iterator(), which in turn calls ShuffledRDD.compute()

func(context, rdd.iterator(partition, context))

The RDD.collect() method

  /**
   * Return an array that contains all of the elements in this RDD.
   */
  def collect(): Array[T] = withScope {
    val results = sc.runJob(this, (iter: Iterator[T]) => iter.toArray)
    Array.concat(results: _*)
  }
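The shape of collect() (one array per partition, then a concatenation on the driver) can be reproduced with ordinary collections. This is a minimal sketch, assuming partitions are in-memory Seqs; CollectSketch and its runJob are hypothetical stand-ins, not SparkContext.runJob.

```scala
import scala.reflect.ClassTag

object CollectSketch {
  // Hypothetical stand-in for sc.runJob: apply func to every partition and
  // return one result per partition.
  def runJob[T, U: ClassTag](partitions: Seq[Seq[T]], func: Iterator[T] => U): Array[U] =
    partitions.map(p => func(p.iterator)).toArray

  // Same two steps as RDD.collect(): toArray per partition, then concat.
  def collect[T: ClassTag](partitions: Seq[Seq[T]]): Array[T] = {
    val results = runJob(partitions, (iter: Iterator[T]) => iter.toArray)
    Array.concat(results: _*)
  }
}
```

Partition order is preserved, so the concatenated array keeps the elements in partition order.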

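The ShuffledRDD.compute() method
Take the ShuffleDependency from the RDD's dependencies and read dep.shuffleHandle, the handle that identifies this shuffle (the user-defined reduceByKey() function itself is carried by dep.aggregator)
SparkEnv.get.shuffleManager returns the default SortShuffleManager
Call SortShuffleManager.getReader()
Call read() on the returned reader, which is BlockStoreShuffleReader.read()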

  override def compute(split: Partition, context: TaskContext): Iterator[(K, C)] = {
    val dep = dependencies.head.asInstanceOf[ShuffleDependency[K, V, C]]
    SparkEnv.get.shuffleManager.getReader(dep.shuffleHandle, split.index, split.index + 1, context)
      .read()
      .asInstanceOf[Iterator[(K, C)]]
  }

The SortShuffleManager.getReader() method
Returns a BlockStoreShuffleReader object

  /**
   * Get a reader for a range of reduce partitions (startPartition to endPartition-1, inclusive).
   * Called on executors by reduce tasks.
   */
  override def getReader[K, C](
      handle: ShuffleHandle,
      startPartition: Int,
      endPartition: Int,
      context: TaskContext): ShuffleReader[K, C] = {
    new BlockStoreShuffleReader(
      handle.asInstanceOf[BaseShuffleHandle[K, _, C]], startPartition, endPartition, context)
  }
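compute() passes split.index and split.index + 1, so the half-open range covers exactly one reduce partition. The sketch below illustrates how such a range could select one block per map task. It is only an illustration under assumptions: BlockLookupSketch, MapStatus and blocksFor are hypothetical names, and only the block name pattern shuffle_<shuffleId>_<mapId>_<reduceId> follows Spark's ShuffleBlockId naming.

```scala
object BlockLookupSketch {
  // Hypothetical record of one map task: where it ran and how many bytes it
  // wrote for each reduce partition.
  final case class MapStatus(executorId: String, sizes: Vector[Long])

  // For reduce partitions in [startPartition, endPartition), list the blocks
  // to fetch, grouped by the executor that holds them.
  def blocksFor(
      shuffleId: Int,
      statuses: Vector[MapStatus],
      startPartition: Int,
      endPartition: Int): Map[String, Seq[(String, Long)]] = {
    val blocks = for {
      (status, mapId) <- statuses.zipWithIndex
      reduceId <- startPartition until endPartition
    } yield (status.executorId, (s"shuffle_${shuffleId}_${mapId}_${reduceId}", status.sizes(reduceId)))
    blocks.groupBy(_._1).map { case (executor, entries) => executor -> entries.map(_._2) }
  }
}
```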

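The BlockStoreShuffleReader.read() method
This method fetches the data written by the ShuffleMapTasks. ShuffleBlockFetcherIterator yields the output blocks of every ShuffleMapTask (only the ones that belong to the current partition), and the data is deserialized into the iterator recordIter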

 /** Read the combined key-values for this reduce task */
  override def read(): Iterator[Product2[K, C]] = {
    val streamWrapper: (BlockId, InputStream) => InputStream = { (blockId, in) =>
      blockManager.wrapForCompression(blockId,
        CryptoStreamUtils.wrapForEncryption(in, blockManager.conf))
    }

    val wrappedStreams = new ShuffleBlockFetcherIterator(
      context,
      blockManager.shuffleClient,
      blockManager,
      mapOutputTracker.getMapSizesByExecutorId(handle.shuffleId, startPartition, endPartition),
      streamWrapper,
      // Note: we use getSizeAsMb when no suffix is provided for backwards compatibility
      SparkEnv.get.conf.getSizeAsMb("spark.reducer.maxSizeInFlight", "48m") * 1024 * 1024,
      SparkEnv.get.conf.getBoolean("spark.shuffle.detectCorrupt", true))

    val ser = Serializer.getSerializer(dep.serializer)
    val serializerInstance = ser.newInstance()

    // Create a key/value iterator for each stream
    val recordIter = wrappedStreams.flatMap { case (blockId, wrappedStream) =>
      // Note: the asKeyValueIterator below wraps a key/value iterator inside of a
      // NextIterator. The NextIterator makes sure that close() is called on the
      // underlying InputStream when all records have been read.
      serializerInstance.deserializeStream(wrappedStream).asKeyValueIterator
    }

    // Update the context task metrics for each record read.
    val readMetrics = context.taskMetrics.createShuffleReadMetricsForDependency()
    val metricIter = CompletionIterator[(Any, Any), Iterator[(Any, Any)]](
      recordIter.map(record => {
        readMetrics.incRecordsRead(1)
        record
      }),
      context.taskMetrics().updateShuffleReadMetrics())

    // An interruptible iterator must be used here in order to support task cancellation
    val interruptibleIter = new InterruptibleIterator[(Any, Any)](context, metricIter)

    val aggregatedIter: Iterator[Product2[K, C]] = if (dep.aggregator.isDefined) {
      if (dep.mapSideCombine) {
        // We are reading values that are already combined
        val combinedKeyValuesIterator = interruptibleIter.asInstanceOf[Iterator[(K, C)]]
        dep.aggregator.get.combineCombinersByKey(combinedKeyValuesIterator, context)
      } else {
        // We don't know the value type, but also don't care -- the dependency *should*
        // have made sure its compatible w/ this aggregator, which will convert the value
        // type to the combined type C
        val keyValuesIterator = interruptibleIter.asInstanceOf[Iterator[(K, Nothing)]]
        dep.aggregator.get.combineValuesByKey(keyValuesIterator, context)
      }
    } else {
      require(!dep.mapSideCombine, "Map-side combine without Aggregator specified!")
      interruptibleIter.asInstanceOf[Iterator[Product2[K, C]]]
    }

    // Sort the output if there is a sort ordering defined.
    dep.keyOrdering match {
      case Some(keyOrd: Ordering[K]) =>
        // Create an ExternalSorter to sort the data. Note that if spark.shuffle.spill is disabled,
        // the ExternalSorter won't spill to disk.
        val sorter =
          new ExternalSorter[K, C, C](context, ordering = Some(keyOrd), serializer = Some(ser))
        sorter.insertAll(aggregatedIter)
        context.taskMetrics().incMemoryBytesSpilled(sorter.memoryBytesSpilled)
        context.taskMetrics().incDiskBytesSpilled(sorter.diskBytesSpilled)
        context.internalMetricsToAccumulators(
          InternalAccumulator.PEAK_EXECUTION_MEMORY).add(sorter.peakMemoryUsedBytes)
        CompletionIterator[Product2[K, C], Iterator[Product2[K, C]]](sorter.iterator, sorter.stop())
      case None =>
        aggregatedIter
    }
  }

BlockStoreShuffleReader.read() in detail
The first half of the method fetches the ShuffleMapTask output: ShuffleBlockFetcherIterator yields every ShuffleMapTask's blocks for the current partition, and each block is deserialized into the iterator recordIter

val streamWrapper: (BlockId, InputStream) => InputStream = { (blockId, in) =>
      blockManager.wrapForCompression(blockId,
        CryptoStreamUtils.wrapForEncryption(in, blockManager.conf))
    }

    val wrappedStreams = new ShuffleBlockFetcherIterator(
      context,
      blockManager.shuffleClient,
      blockManager,
      mapOutputTracker.getMapSizesByExecutorId(handle.shuffleId, startPartition, endPartition),
      streamWrapper,
      // Note: we use getSizeAsMb when no suffix is provided for backwards compatibility
      SparkEnv.get.conf.getSizeAsMb("spark.reducer.maxSizeInFlight", "48m") * 1024 * 1024,
      SparkEnv.get.conf.getBoolean("spark.shuffle.detectCorrupt", true))

    val ser = Serializer.getSerializer(dep.serializer)
    val serializerInstance = ser.newInstance()

    // Create a key/value iterator for each stream
    val recordIter = wrappedStreams.flatMap { case (blockId, wrappedStream) =>
      // Note: the asKeyValueIterator below wraps a key/value iterator inside of a
      // NextIterator. The NextIterator makes sure that close() is called on the
      // underlying InputStream when all records have been read.
      serializerInstance.deserializeStream(wrappedStream).asKeyValueIterator
    }
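The flatMap over fetched streams can be imitated with java.io streams. This is a minimal sketch, assuming a made-up record encoding (a record count followed by UTF key / Int value pairs); RecordIterSketch, encode, asKeyValueIterator and recordIter are hypothetical names and do not reproduce Spark's serializer. It shows the lazy, one-block-at-a-time behaviour and the close-on-exhaustion idea mentioned in the NextIterator comment.

```scala
import java.io.{ByteArrayOutputStream, DataInputStream, DataOutputStream, InputStream}

object RecordIterSketch {
  // Hypothetical block format: record count, then (UTF key, Int value) pairs.
  def encode(records: Seq[(String, Int)]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    out.writeInt(records.size)
    records.foreach { case (k, v) => out.writeUTF(k); out.writeInt(v) }
    out.close()
    bytes.toByteArray
  }

  // Lazily decode one stream; close it once the last record has been read.
  def asKeyValueIterator(in: InputStream): Iterator[(String, Int)] = {
    val data = new DataInputStream(in)
    val count = data.readInt()
    if (count == 0) data.close()
    Iterator.range(0, count).map { i =>
      val record = (data.readUTF(), data.readInt())
      if (i == count - 1) data.close()
      record
    }
  }

  // One key/value iterator per fetched block, flattened into a single iterator.
  def recordIter(blocks: Iterator[(String, InputStream)]): Iterator[(String, Int)] =
    blocks.flatMap { case (_, stream) => asKeyValueIterator(stream) }
}
```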

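BlockStoreShuffleReader.read() in detail
Wrap recordIter in metricIter, which counts every record read (all of the ShuffleMapTask output records pass through it)
Pass metricIter to the InterruptibleIterator constructor and assign the result to interruptibleIter
Cast interruptibleIter to the iterator combinedKeyValuesIterator
Pass that iterator to dep.aggregator.get.combineCombinersByKey(combinedKeyValuesIterator, context) and assign the result to aggregatedIter (when there is no map-side combine, combineValuesByKey is called instead)
Check whether dep.keyOrdering defines an ordering; if it does not, return aggregatedIter directly
If dep.keyOrdering defines an ordering, sort the data with ExternalSorter and return the sorted result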

  // Create a key/value iterator for each stream
    val recordIter = wrappedStreams.flatMap { case (blockId, wrappedStream) =>
      // Note: the asKeyValueIterator below wraps a key/value iterator inside of a
      // NextIterator. The NextIterator makes sure that close() is called on the
      // underlying InputStream when all records have been read.
      serializerInstance.deserializeStream(wrappedStream).asKeyValueIterator
    }

    // Update the context task metrics for each record read.
    val readMetrics = context.taskMetrics.createShuffleReadMetricsForDependency()
    val metricIter = CompletionIterator[(Any, Any), Iterator[(Any, Any)]](
      recordIter.map(record => {
        readMetrics.incRecordsRead(1)
        record
      }),
      context.taskMetrics().updateShuffleReadMetrics())

    // An interruptible iterator must be used here in order to support task cancellation
    val interruptibleIter = new InterruptibleIterator[(Any, Any)](context, metricIter)

    val aggregatedIter: Iterator[Product2[K, C]] = if (dep.aggregator.isDefined) {
      if (dep.mapSideCombine) {
        // We are reading values that are already combined
        val combinedKeyValuesIterator = interruptibleIter.asInstanceOf[Iterator[(K, C)]]
        dep.aggregator.get.combineCombinersByKey(combinedKeyValuesIterator, context)
      } else {
        // We don't know the value type, but also don't care -- the dependency *should*
        // have made sure its compatible w/ this aggregator, which will convert the value
        // type to the combined type C
        val keyValuesIterator = interruptibleIter.asInstanceOf[Iterator[(K, Nothing)]]
        dep.aggregator.get.combineValuesByKey(keyValuesIterator, context)
      }
    } else {
      require(!dep.mapSideCombine, "Map-side combine without Aggregator specified!")
      interruptibleIter.asInstanceOf[Iterator[Product2[K, C]]]
    }
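The branching in the second half of read() can be condensed into a small, Spark-free model. This is a minimal sketch under assumptions: AggregateSketch and its Aggregator are hypothetical simplifications (no TaskContext, no spilling), and an in-memory sortBy stands in for ExternalSorter.

```scala
import scala.collection.mutable

object AggregateSketch {
  // Simplified stand-in for an aggregator: three user-supplied functions.
  final case class Aggregator[K, V, C](
      createCombiner: V => C,
      mergeValue: (C, V) => C,
      mergeCombiners: (C, C) => C) {

    // Used when the map side did NOT combine: records are raw values.
    def combineValuesByKey(iter: Iterator[(K, V)]): Iterator[(K, C)] = {
      val combiners = mutable.LinkedHashMap.empty[K, C]
      iter.foreach { case (k, v) =>
        val merged = combiners.get(k) match {
          case Some(c) => mergeValue(c, v)
          case None => createCombiner(v)
        }
        combiners.update(k, merged)
      }
      combiners.iterator
    }

    // Used when the map side already combined: records are combiners.
    def combineCombinersByKey(iter: Iterator[(K, C)]): Iterator[(K, C)] = {
      val combiners = mutable.LinkedHashMap.empty[K, C]
      iter.foreach { case (k, c) =>
        val merged = combiners.get(k) match {
          case Some(existing) => mergeCombiners(existing, c)
          case None => c
        }
        combiners.update(k, merged)
      }
      combiners.iterator
    }
  }

  // Same decision tree as the excerpt: aggregate (if defined), then sort (if ordered).
  def read[K, V, C](
      records: Iterator[(K, Any)],
      aggregator: Option[Aggregator[K, V, C]],
      mapSideCombine: Boolean,
      keyOrdering: Option[Ordering[K]]): Iterator[(K, C)] = {
    val aggregated: Iterator[(K, C)] = aggregator match {
      case Some(agg) if mapSideCombine =>
        agg.combineCombinersByKey(records.asInstanceOf[Iterator[(K, C)]])
      case Some(agg) =>
        agg.combineValuesByKey(records.asInstanceOf[Iterator[(K, V)]])
      case None =>
        require(!mapSideCombine, "Map-side combine without Aggregator specified!")
        records.asInstanceOf[Iterator[(K, C)]]
    }
    keyOrdering match {
      case Some(ord) => aggregated.toVector.sortBy(_._1)(ord).iterator
      case None => aggregated
    }
  }
}
```

For a reduceByKey-style sum, all three aggregator functions collapse to "keep the value" and "add two values", which is why both branches give the same totals.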
