Prometheus tutorial: understanding Prometheus in one article

阿新 • Published: 2022-02-19

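As the de facto default monitoring system of the cloud-native ecosystem, Prometheus is attracting more and more attention. This tutorial walks through the design ideas behind Prometheus and looks at how such a simple design supports such rich functionality.

First, consider what is hard about building a monitoring system like Prometheus. For example:

Every service has different monitoring needs, so how should the data model be designed to balance ease of use against generality?
How should the large volume of data be stored?
How can all kinds of complex reports be produced?
...

With these questions in mind, let's look at how Prometheus is designed.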

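History

Let's start with the history. Prometheus was originally developed at SoundCloud and later donated to the open source community. In 2016 it joined the CNCF (Cloud Native Computing Foundation) as the foundation's second project, after Kubernetes. That alone shows how important Prometheus is within the cloud-native ecosystem, and it has gradually become the de facto standard for cloud-native monitoring.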

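Core design ideas

A monitoring system really has just three core problems to solve:

How to represent metrics
How to collect and store metrics
How to turn metrics into reports

Prometheus offers an elegant solution to each of the three.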

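Data model

In short, the Prometheus data model is time-series metric data. A metric is a measured value, and "time series" means that each metric keeps producing new data points over time.

Each metric is identified by a unique name and can carry any number of labels, which are used for filtering and aggregation. The format is as follows.

<metric name>{<label name>=<label value>, ...}

With this, the monitoring data of any business can be expressed in one uniform metric format. For Prometheus this keeps the design simple, because it only has to handle a single data format, while the format is still flexible enough to cover a wide variety of business scenarios.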
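As a rough illustration of this model (a sketch, not Prometheus's own code), a series can be identified by its metric name plus its label set, and every sample is a (timestamp, value) pair appended to that series. The names `series_key`, `SeriesStore`, and the metric `http_requests_total` are made up for this example, and label values are quoted the way they appear in scraped output.

```python
from collections import defaultdict


def series_key(name, labels):
    """Render a metric name and label dict as <metric name>{<label name>="<label value>",...}."""
    if not labels:
        return name
    # Sort labels so that the same label set always yields the same key.
    body = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{body}}}"


class SeriesStore:
    """Toy in-memory store: one list of (timestamp, value) samples per series."""

    def __init__(self):
        self._series = defaultdict(list)

    def append(self, name, labels, timestamp, value):
        self._series[series_key(name, labels)].append((timestamp, value))

    def samples(self, name, labels):
        return list(self._series.get(series_key(name, labels), []))


store = SeriesStore()
store.append("http_requests_total", {"method": "GET", "status": "200"}, 1000, 5.0)
# Same label set in a different order: this lands in the same series.
store.append("http_requests_total", {"status": "200", "method": "GET"}, 1015, 9.0)
# A different label set is a different series, even under the same metric name.
store.append("http_requests_total", {"method": "POST", "status": "500"}, 1015, 1.0)

print(series_key("http_requests_total", {"method": "GET", "status": "200"}))
print(store.samples("http_requests_total", {"method": "GET", "status": "200"}))
```

Because the key is just name plus labels, filtering and aggregating by label amounts to selecting and combining series whose keys match.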

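Prometheus provides four core metric types: counter, gauge, histogram, and summary. The distinction only shows up in the client libraries and in PromQL, though. As of this writing (2021.11), the Prometheus server itself does not treat the different metric types any differently.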
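To make the client-side difference concrete, here is a minimal sketch of a counter and a gauge. These classes are hypothetical and are not the API of any official client library. They only show that a counter may only go up while a gauge may move in either direction, and that both end up as a plain number when scraped.

```python
class Counter:
    """Monotonically increasing value, e.g. total requests served."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("a counter can only increase")
        self.value += amount


class Gauge:
    """Value that can go up and down, e.g. current number of threads."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = float(value)

    def inc(self, amount=1.0):
        self.value += amount

    def dec(self, amount=1.0):
        self.value -= amount


requests_total = Counter()
requests_total.inc()
requests_total.inc(2)

pool_threads = Gauge()
pool_threads.set(70)
pool_threads.dec(5)

# Both are exposed as a bare number; the type only matters to the client and to queries.
print(requests_total.value, pool_threads.value)
```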

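Data collection and storage

The Prometheus server periodically scrapes data from an HTTP endpoint exposed by each monitored service, which is a typical pull model.

Compared with a push model, pulling has some advantages: it is easier to tell whether a given node is healthy, and easier to debug locally. That said, whether a monitoring system pushes or pulls is not really a central concern.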
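The pull model can be sketched with nothing but the Python standard library. In the example below, the monitored service exposes its current values over HTTP and the monitoring side fetches them. A failed fetch is itself a signal that the target is unhealthy. The `/metrics` path, the metric `demo_requests_total`, and the names `MetricsHandler` and `scrape` are assumptions made for this illustration, not part of Prometheus.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical application state that the service wants to expose.
REQUESTS_SERVED = 42

# Ignore any system proxy settings: the demo target is local.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class MetricsHandler(BaseHTTPRequestHandler):
    """The monitored service: answers GET /metrics with its current values."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = f"demo_requests_total {float(REQUESTS_SERVED)}\n".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


def scrape(url, timeout=2.0):
    """The monitoring side: pull the metrics text; None means the target is down."""
    try:
        with _opener.open(url, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except OSError:
        return None


# Port 0 lets the OS pick a free port for the demo target.
server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
target = f"http://127.0.0.1:{server.server_port}/metrics"

scraped = scrape(target)
server.shutdown()
server.server_close()
after_shutdown = scrape(target)

print(repr(scraped))
print(after_shutdown)
```

The same endpoint can be opened in a browser or fetched with any HTTP client, which is what makes local debugging convenient.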

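Prometheus data is typical time-series data, and Prometheus itself stores it on local disk. Note that local storage is neither replicated nor clustered, so it cannot scale beyond a single node and is not durable if the local disk or node fails. Local storage is therefore best treated as a short-lived sliding window of recent data.

Prometheus does not actually try to solve long-term persistent storage itself. Instead, it defines standard read and write interfaces so that data can be stored in any third-party storage system that integrates with them.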

Generating reports

Prometheus defines the powerful PromQL query language, which can handle all kinds of complex queries. For details, see https://prometheus.io/docs/prometheus/latest/querying/basics/
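To give a feel for what a PromQL function does with time-series data, the sketch below computes a per-second increase from counter samples, which is the idea behind the `rate()` function. It is a deliberately simplified illustration: the real `rate()` also extrapolates to the edges of the query window, which is omitted here, and `simple_rate` is a made-up name.

```python
def simple_rate(samples):
    """Per-second increase of a counter over a list of (timestamp_seconds, value) samples.

    A drop in value is treated as a counter reset (for example a process restart),
    so the counter is assumed to have started again from zero.
    """
    if len(samples) < 2:
        return None  # not enough data points to compute a rate
    increase = 0.0
    previous = samples[0][1]
    for _, value in samples[1:]:
        if value >= previous:
            increase += value - previous
        else:
            increase += value  # reset: count everything since the restart
        previous = value
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    return increase / elapsed


# Counter scraped every 15 seconds, with a restart between t=30 and t=45.
samples = [(0, 100.0), (15, 130.0), (30, 160.0), (45, 20.0), (60, 50.0)]
print(simple_rate(samples))
```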

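Ecosystem

An open source project cannot grow without the ecosystem around it, and Prometheus already has a mature one. Mainstream languages such as Java, Go, and Python all have solid client libraries. In Spring, it is easy to instrument many kinds of components, which the hands-on section below covers in detail. In Kubernetes, Prometheus can easily be configured to scrape data from each node automatically. And with tools such as Grafana, you can build a wide range of dashboards.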

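Hands-on

In the rest of this tutorial, we use a Spring Boot project as an example to see Prometheus in action.

The core idea is to use Spring Boot Actuator to set up monitoring for the Spring Boot application and expose the metrics in Prometheus format.

First, add the dependencies: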

implementation("org.springframework.boot:spring-boot-starter-actuator")
implementation("io.micrometer:micrometer-registry-prometheus")

Then add the Spring configuration:

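management:
  endpoints:
    web:
      exposure:
        # Expose the /actuator/prometheus endpoint over HTTP.
        include: "prometheus"
  metrics:
    distribution:
      # Bucket boundaries for HTTP request timings.
      sla:
        http:
          server:
            requests: "100ms,150ms,250ms,500ms,1s"
      # Publish histogram buckets for HTTP request timings.
      percentiles-histogram:
        http:
          server:
            requests: true
    web:
      server:
        request:
          autotime:
            # Time HTTP requests automatically.
            enabled: true
    export:
      prometheus:
        enabled: true
    tags:
      # Added as a label to every metric; replace "name" with your application's name.
      application: name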

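This configuration does several things: it exposes the data in Prometheus format, automatically adds histogram monitoring for HTTP requests, and adds an application tag that appears as a label on every metric.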
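The effect of the configured boundaries can be sketched as follows. A Prometheus histogram is exposed as cumulative buckets: each bucket counts the observations less than or equal to its upper bound `le`, and there is also a `+Inf` bucket, a sum, and a count. The code is only an illustration using the five boundaries from the configuration above. `observe_all` and the sample latencies are made up, and the real Micrometer output will likely contain additional buckets when `percentiles-histogram` is enabled.

```python
import math

# Bucket upper bounds in seconds: 100ms, 150ms, 250ms, 500ms, 1s, plus the catch-all +Inf.
BOUNDS = [0.1, 0.15, 0.25, 0.5, 1.0, math.inf]


def observe_all(durations):
    """Return (cumulative bucket counts keyed by upper bound, sum, count)."""
    buckets = {bound: 0 for bound in BOUNDS}
    for duration in durations:
        for bound in BOUNDS:
            if duration <= bound:
                buckets[bound] += 1  # cumulative: every bucket at or above the value
    return buckets, sum(durations), len(durations)


# Hypothetical request latencies in seconds.
latencies = [0.05, 0.12, 0.2, 0.4, 0.4, 2.0]
buckets, total, count = observe_all(latencies)

for bound, hits in buckets.items():
    le = "+Inf" if math.isinf(bound) else str(bound)
    print(f'http_server_requests_seconds_bucket{{le="{le}"}} {hits}')
print(f"http_server_requests_seconds_sum {total}")
print(f"http_server_requests_seconds_count {count}")
```

With cumulative buckets, a question such as "what share of requests finished within 250ms" is simply the `le="0.25"` bucket divided by the total count.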

Next, start the Spring Boot project and visit the /actuator/prometheus path. You will see a large number of metrics, for example:

# HELP executor_pool_size_threads The current number of threads in the pool
# TYPE executor_pool_size_threads gauge
executor_pool_size_threads{application="ads-programad",name="asyncExecutor",} 0.0
# HELP tomcat_servlet_request_seconds  
# TYPE tomcat_servlet_request_seconds summary
tomcat_servlet_request_seconds_count{application="ads-programad",name="dispatcherServlet",} 1.0
tomcat_servlet_request_seconds_sum{application="ads-programad",name="dispatcherServlet",} 0.0
# HELP executor_pool_core_threads The core number of threads for the pool
# TYPE executor_pool_core_threads gauge
executor_pool_core_threads{application="ads-programad",name="asyncExecutor",} 70.0
# HELP jvm_classes_unloaded_classes_total The total number of classes unloaded since the Java virtual machine has started execution
# TYPE jvm_classes_unloaded_classes_total counter
jvm_classes_unloaded_classes_total{application="ads-programad",} 0.0
# HELP executor_completed_tasks_total The approximate total number of tasks that have completed execution
# TYPE executor_completed_tasks_total counter
executor_completed_tasks_total{application="ads-programad",name="asyncExecutor",} 0.0
# HELP tomcat_threads_config_max_threads  
# TYPE tomcat_threads_config_max_threads gauge
tomcat_threads_config_max_threads{application="ads-programad",name="http-nio-9000",} 500.0
# HELP process_cpu_usage The "recent cpu usage" for the Java Virtual Machine process
# TYPE process_cpu_usage gauge
process_cpu_usage{application="ads-programad",} 0.0
# HELP tomcat_sessions_active_current_sessions  
# TYPE tomcat_sessions_active_current_sessions gauge
tomcat_sessions_active_current_sessions{application="ads-programad",} 0.0
# HELP jvm_memory_committed_bytes The amount of memory in bytes that is committed for the Java virtual machine to use
# TYPE jvm_memory_committed_bytes gauge
jvm_memory_committed_bytes{application="ads-programad",area="heap",id="G1 Eden Space",} 3.5651584E7
jvm_memory_committed_bytes{application="ads-programad",area="heap",id="G1 Old Gen",} 4.6137344E7
jvm_memory_committed_bytes{application="ads-programad",area="nonheap",id="Compressed Class Space",} 5767168.0
jvm_memory_committed_bytes{application="ads-programad",area="nonheap",id="CodeHeap 'non-profiled nmethods'",} 8847360.0
jvm_memory_committed_bytes{application="ads-programad",area="nonheap",id="CodeHeap 'non-nmethods'",} 2555904.0
jvm_memory_committed_bytes{application="ads-programad",area="nonheap",id="Metaspace",} 4.2287104E7
jvm_memory_committed_bytes{application="ads-programad",area="heap",id="G1 Survivor Space",} 4194304.0
# HELP tomcat_servlet_request_max_seconds  
# TYPE tomcat_servlet_request_max_seconds gauge
tomcat_servlet_request_max_seconds{application="ads-programad",name="dispatcherServlet",} 0.0
# HELP tomcat_connections_current_connections  
# TYPE tomcat_connections_current_connections gauge
tomcat_connections_current_connections{application="ads-programad",name="http-nio-9000",} 3.0
# HELP tomcat_sessions_active_max_sessions  
# TYPE tomcat_sessions_active_max_sessions gauge
...

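Besides the HTTP monitoring we configured explicitly, the output also contains a lot of basic monitoring information, such as JVM and machine-load metrics.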
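Each non-comment line in the output above has the shape `name{labels} value`. As a sketch of how such a line maps back onto the data model, the snippet below parses sample lines with a regular expression. It is a simplification written for this article: it assumes label values contain no escaped quotes and ignores optional timestamps, and `parse_line` is not part of any Prometheus library.

```python
import re

SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(?P<key>[a-zA-Z_][a-zA-Z0-9_]*)="(?P<value>[^"]*)"')


def parse_line(line):
    """Parse one line into (name, labels, value); comments and blank lines give None."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = SAMPLE_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognised line: {line!r}")
    label_text = match.group("labels") or ""
    labels = {m.group("key"): m.group("value") for m in LABEL_RE.finditer(label_text)}
    return match.group("name"), labels, float(match.group("value"))


text = """\
# HELP jvm_memory_committed_bytes The amount of memory in bytes that is committed for the Java virtual machine to use
# TYPE jvm_memory_committed_bytes gauge
jvm_memory_committed_bytes{application="ads-programad",area="heap",id="G1 Eden Space",} 3.5651584E7
process_cpu_usage{application="ads-programad",} 0.0
"""

for raw in text.splitlines():
    parsed = parse_line(raw)
    if parsed is not None:
        print(parsed)
```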

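Monitoring for other components, such as thread pools, HTTP connection pools, and custom metrics, is just as easy to add. See https://github.com/lcy362/springboot-prometheus-demo for reference.

This way, however the Spring Boot project is deployed, whether as a plain Java process, in Docker, or on Kubernetes, its metrics are easy to collect.

Original article: http://lichuanyang.top/posts/28288/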
