
HADOOP YARN (3): YARN Hands-On Cases (1)

1 Configuring Core YARN Parameters for a Production Environment

1) Requirement: count the number of occurrences of each word in 1 GB of data. The cluster has 3 servers, each with 4 GB of memory and 4 CPU cores (4 threads).

2) Analysis:

1 GB / 128 MB = 8 MapTasks, plus 1 ReduceTask and 1 MrAppMaster, i.e. 10 containers in total.

On average each node runs 10 / 3 ≈ 3 tasks (distributed as 4, 3, and 3).
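Before tuning, you can stage the 1 GB input in HDFS so there is something to run against (a minimal sketch; the /input directory and word.txt file name are assumptions, not from the original setup):

[atguigu@hadoop102 hadoop-3.1.3]$ hadoop fs -mkdir /input
[atguigu@hadoop102 hadoop-3.1.3]$ hadoop fs -put word.txt /input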

3) Modify the yarn-site.xml configuration parameters as follows:

<!-- Which scheduler to use; the Capacity Scheduler is the default -->
<property>
    <description>The class to use as the resource scheduler.</description>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>

<!-- Number of threads the ResourceManager uses to handle scheduler requests; default 50.
     If more than 50 jobs are submitted concurrently this can be raised, but it must not
     exceed 3 nodes * 4 threads = 12 threads (in practice at most 8, leaving headroom for
     other processes) -->
<property>
    <description>Number of threads to handle scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.client.thread-count</name>
    <value>8</value>
</property>

<!-- Whether YARN auto-detects the hardware for its configuration; default false.
     If the node runs many other applications, configure manually; otherwise
     auto-detection is acceptable -->
<property>
    <description>Enable auto-detection of node capabilities such as
    memory and CPU.
    </description>
    <name>yarn.nodemanager.resource.detect-hardware-capabilities</name>
    <value>false</value>
</property>

<!-- Whether to count logical processors as CPU cores; default false, i.e. use the
     physical core count -->
<property>
    <description>Flag to determine if logical processors(such as
    hyperthreads) should be counted as cores. Only applicable on Linux
    when yarn.nodemanager.resource.cpu-vcores is set to -1 and
    yarn.nodemanager.resource.detect-hardware-capabilities is true.
    </description>
    <name>yarn.nodemanager.resource.count-logical-processors-as-cores</name>
    <value>false</value>
</property>

<!-- Multiplier from physical cores to vcores; default 1.0 -->
<property>
    <description>Multiplier to determine how to convert physical cores to
    vcores. This value is used if yarn.nodemanager.resource.cpu-vcores
    is set to -1(which implies auto-calculate vcores) and
    yarn.nodemanager.resource.detect-hardware-capabilities is set to true.
    The number of vcores will be calculated as number of CPUs * multiplier.
    </description>
    <name>yarn.nodemanager.resource.pcores-vcores-multiplier</name>
    <value>1.0</value>
</property>

<!-- Memory the NodeManager may use; default 8 GB, changed to 4 GB here -->
<property>
    <description>Amount of physical memory, in MB, that can be allocated
    for containers. If set to -1 and
    yarn.nodemanager.resource.detect-hardware-capabilities is true, it is
    automatically calculated(in case of Windows and Linux).
    In other cases, the default is 8192MB.
    </description>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>4096</value>
</property>

<!-- NodeManager vcore count; default 8 when not auto-detected from the hardware,
     changed to 4 here -->
<property>
    <description>Number of vcores that can be allocated
    for containers. This is used by the RM scheduler when allocating
    resources for containers. This is not used to limit the number of
    CPUs used by YARN containers. If it is set to -1 and
    yarn.nodemanager.resource.detect-hardware-capabilities is true,
    it is automatically determined from the hardware in case of Windows
    and Linux. In other cases, number of vcores is 8 by default.
    </description>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>4</value>
</property>

<!-- Minimum container memory; default 1 GB -->
<property>
    <description>The minimum allocation for every container request at the
    RM in MBs. Memory requests lower than this will be set to the value of
    this property. Additionally, a node manager that is configured to have
    less memory than this value will be shut down by the resource manager.
    </description>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
</property>

<!-- Maximum container memory; default 8 GB, changed to 2 GB here -->
<property>
    <description>The maximum allocation for every container request at the
    RM in MBs. Memory requests higher than this will throw an
    InvalidResourceRequestException.
    </description>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2048</value>
</property>

<!-- Minimum container vcores; default 1 -->
<property>
    <description>The minimum allocation for every container request at the
    RM in terms of virtual CPU cores. Requests lower than this will be set
    to the value of this property. Additionally, a node manager that is
    configured to have fewer virtual cores than this value will be shut
    down by the resource manager.
    </description>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
</property>

<!-- Maximum container vcores; default 4, changed to 2 here -->
<property>
    <description>The maximum allocation for every container request at the
    RM in terms of virtual CPU cores. Requests higher than this will throw
    an InvalidResourceRequestException.
    </description>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>2</value>
</property>

<!-- Virtual memory check; enabled by default, disabled here -->
<property>
    <description>Whether virtual memory limits will be enforced for
    containers.</description>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>

<!-- Ratio of virtual memory to physical memory; default 2.1 -->
<property>
    <description>Ratio between virtual memory to physical memory when
    setting memory limits for containers. Container allocations are
    expressed in terms of physical memory, and virtual memory usage is
    allowed to exceed this allocation by this ratio.
    </description>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
</property>
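4) Distribute yarn-site.xml to the other nodes, restart YARN, and run the word-count job against the 1 GB input to verify the container layout (a sketch; xsync is the cluster sync script used elsewhere in this series, and the /input and /output paths are assumptions):

[atguigu@hadoop102 hadoop]$ xsync yarn-site.xml
[atguigu@hadoop103 hadoop-3.1.3]$ sbin/stop-yarn.sh
[atguigu@hadoop103 hadoop-3.1.3]$ sbin/start-yarn.sh
[atguigu@hadoop102 hadoop-3.1.3]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /input /output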

2 Submitting to Multiple Queues with the Capacity Scheduler

1) How are queues created in a production environment?

(1) The scheduler ships with just one queue, default, which cannot meet production needs.

(2) By framework: hive / spark / flink, with each framework's jobs going to a dedicated queue (not used all that widely in industry).

(3) By business module: login/registration, shopping cart, order placement, business department 1, business department 2.

2) What are the benefits of multiple queues?

(1) Containment: if an employee accidentally writes infinitely recursive or looping code, it can only exhaust the resources of its own queue rather than the whole cluster.

(2) Task degradation: during special periods (e.g., the 11.11 and 6.18 sales events), the queues for important tasks are guaranteed sufficient resources, for example:

business department 1 (critical) => business department 2 (fairly important) => order placement (normal) => shopping cart (normal) => login/registration (minor)

2.1 Requirements

Requirement 1: the default queue holds 40% of total memory with a maximum capacity of 60% of total resources; the hive queue holds 60% of total memory with a maximum capacity of 80% of total resources.
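On the 3-node cluster from section 1 (3 x 4 GB = 12 GB of NodeManager memory in total, assuming the yarn-site.xml settings above), this works out to: default queue 12 GB x 40% = 4.8 GB rated, elastic up to 12 GB x 60% = 7.2 GB; hive queue 12 GB x 60% = 7.2 GB rated, elastic up to 12 GB x 80% = 9.6 GB.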

Requirement 2: configure queue priority.

2.2 Configuring a Multi-Queue Capacity Scheduler

1) Configure capacity-scheduler.xml as follows:

(1) Modify the following settings:

<!-- Declare the queues, adding a hive queue -->
<property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,hive</value>
    <description>
      The queues at this level (root is the root queue).
    </description>
</property>

<!-- Lower the default queue's rated capacity to 40% (default 100%) -->
<property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>40</value>
</property>

<!-- Lower the default queue's maximum capacity to 60% (default 100%) -->
<property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>60</value>
</property>

(2) Add the required properties for the new queue:

<!-- Rated capacity of the hive queue -->
<property>
    <name>yarn.scheduler.capacity.root.hive.capacity</name>
    <value>60</value>
</property>

<!-- How much of the queue a single user may use at most; 1 means one user can occupy at most the queue's full rated capacity -->
<property>
    <name>yarn.scheduler.capacity.root.hive.user-limit-factor</name>
    <value>1</value>
</property>

<!-- Maximum capacity of the hive queue -->
<property>
    <name>yarn.scheduler.capacity.root.hive.maximum-capacity</name>
    <value>80</value>
</property>

<!-- Set the hive queue's state to running -->
<property>
    <name>yarn.scheduler.capacity.root.hive.state</name>
    <value>RUNNING</value>
</property>

<!-- Which users may submit jobs to the queue -->
<property>
    <name>yarn.scheduler.capacity.root.hive.acl_submit_applications</name>
    <value>*</value>
</property>

<!-- Which users may administer the queue (view/kill applications) -->
<property>
    <name>yarn.scheduler.capacity.root.hive.acl_administer_queue</name>
    <value>*</value>
</property>

<!-- Which users may configure the priority of submitted applications -->
<property>
    <name>yarn.scheduler.capacity.root.hive.acl_application_max_priority</name>
    <value>*</value>
</property>
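In production these ACLs are usually narrowed from * to explicit users and groups (a sketch; the atguigu user and hadoop group are assumptions; the value format is a comma-separated user list, a space, then a comma-separated group list):

<!-- Example: allow only user atguigu and members of group hadoop to submit -->
<property>
    <name>yarn.scheduler.capacity.root.hive.acl_submit_applications</name>
    <value>atguigu hadoop</value>
</property>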

<!-- Application timeout, set with: yarn application -appId <ApplicationId> -updateLifetime <Timeout>
Reference: https://blog.cloudera.com/enforcing-application-lifetime-slas-yarn/ -->

<!-- If an application specifies a timeout, the maximum timeout an application
submitted to this queue may specify cannot exceed this value. -->
<property>
    <name>yarn.scheduler.capacity.root.hive.maximum-application-lifetime</name>
    <value>-1</value>
</property>

<!-- If an application does not specify a timeout, default-application-lifetime is used as the default -->
<property>
    <name>yarn.scheduler.capacity.root.hive.default-application-lifetime</name>
    <value>-1</value>
</property>
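Once a queue enforces lifetimes, a running application's remaining lifetime can be adjusted with the command mentioned in the comment above (a sketch; the application ID and the 3600-second value are illustrative only):

[atguigu@hadoop102 hadoop-3.1.3]$ yarn application -appId application_1611133087930_0009 -updateLifetime 3600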

2) Distribute the configuration file.

3) Restart YARN, or run yarn rmadmin -refreshQueues to refresh the queues; both queues are then visible:

[atguigu@hadoop102 hadoop-3.1.3]$ yarn rmadmin -refreshQueues
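2.3 Submitting Jobs to the hive Queue

New jobs still go to the default queue unless told otherwise; to target the hive queue, set mapreduce.job.queuename at submission time (a sketch using the bundled wordcount example; the /input and /output paths are assumptions):

[atguigu@hadoop102 hadoop-3.1.3]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount -D mapreduce.job.queuename=hive /input /output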

2.4 Task Priority

The Capacity Scheduler supports configuring task priorities: when resources are tight, higher-priority tasks obtain resources first. By default, YARN limits all task priorities to 0, so this limit must be raised before priorities can be used.

1) Modify yarn-site.xml and add the following parameter:

<!-- Maximum priority an application may be given; default 0 -->
<property>
    <name>yarn.cluster.max-application-priority</name>
    <value>5</value>
</property>

2) Distribute the configuration and restart YARN:

[atguigu@hadoop102 hadoop]$ xsync yarn-site.xml
[atguigu@hadoop103 hadoop-3.1.3]$ sbin/stop-yarn.sh
[atguigu@hadoop103 hadoop-3.1.3]$ sbin/start-yarn.sh

3) To simulate a resource-starved environment, submit the following job repeatedly until newly submitted jobs can no longer obtain resources:

[atguigu@hadoop102 hadoop-3.1.3]$ hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar pi 5 2000000

4) Then submit a job with a higher priority:

[atguigu@hadoop102 hadoop-3.1.3]$ hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar pi -D mapreduce.job.priority=5 5 2000000

5) The priority of a running application can also be changed with the following command:

yarn application -appID <ApplicationID> -updatePriority <Priority>

[atguigu@hadoop102 hadoop-3.1.3]$ yarn application -appID application_1611133087930_0009 -updatePriority 5

This article is from cnblogs, by 秋華. When reposting, please cite the original link: https://www.cnblogs.com/qiu-hua/p/15229175.html