SQL Server Always On Latency Monitoring and Alerting
阿新 • Published: 2021-12-10
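Latency is one of AlwaysON's biggest enemies. AlwaysON's first goal is to keep the data latency between the primary replica and the secondary replicas as low as possible (it can never be eliminated entirely), so that the replicas stay "in sync". The lower the synchronization latency between the primary and secondary replicas, the fresher the data that read-only access on the secondaries sees, and the smaller the database's RTO (recovery time objective; see "Estimating Failover Time") and RPO (recovery point objective; see "Estimating Potential Data Loss").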
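Both estimates are usually derived from the queue sizes and rates exposed by sys.dm_hadr_database_replica_states. The sketch below follows the approach described in Microsoft's availability-group performance-monitoring guidance. T_detection (time to detect the failure) and T_overhead (time to fail over the cluster and bring the databases online) depend on the environment and are not reported by the DMV.

```latex
\begin{aligned}
T_{\text{failover}} &\approx T_{\text{detection}} + T_{\text{overhead}} + T_{\text{redo}},
  \qquad T_{\text{redo}} = \frac{\text{redo\_queue\_size (KB)}}{\text{redo\_rate (KB/s)}} \\
T_{\text{data\_loss}} &\approx \frac{\text{log\_send\_queue\_size (KB)}}{\text{log\_send\_rate (KB/s)}}
\end{aligned}
```

For example, with hypothetical values, a 10,240 KB redo queue draining at 512 KB/s adds about 10240 / 512 = 20 s of redo to the failover time. Potential data loss is mainly a concern for asynchronous-commit replicas. These two ratios are what the alerting procedure later in this article reports as Redo延遲(秒) and Log傳送延遲(秒).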
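However, latency can creep in at any stage of AlwaysON synchronization. When analyzing latency, first understand how AlwaysON synchronizes data. Then break the process into its individual stages and monitor and analyze each one.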
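The six steps of AlwaysON synchronization
My previous article, "AlwaysON Synchronization Principles and Synchronization Modes", walked through the AlwaysON synchronization process. In summary, it consists of the following six steps: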
① log flush(primary)
② log capture(primary)
③ send(primary and secondary)
④ log receive and cache(secondary)
⑤ log hardened(secondary)
⑥ redo(secondary)
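The first two steps take place on the primary replica and the last three on the secondary replica. Step 3, in between, spans the primary and the secondary.
Synchronous-commit mode adds one more step. After step 5, the secondary sends a log-hardened acknowledgment back to the primary before it moves on to redo. The primary waits for this acknowledgment before it completes the transaction's commit.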
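To see where each database currently sits in this pipeline, you can query the per-database timestamps and queues in sys.dm_hadr_database_replica_states. They map roughly onto the steps above. The following is a minimal sketch. The step mapping in the comments is an approximation. Run it on each replica: depending on which replica you query and whether a row describes a primary or a secondary database, some of these columns are NULL.

```sql
-- Snapshot of AlwaysON data movement per database and replica.
-- Step numbers in the comments refer to the six-step list above (approximate mapping).
SELECT ar.replica_server_name,
       DB_NAME(drs.database_id)  AS database_name,
       drs.is_local,
       drs.last_sent_time,        -- step 3: last log block sent
       drs.last_received_time,    -- step 4: last log block received by the secondary
       drs.last_hardened_time,    -- step 5: last log block hardened on the secondary
       drs.last_redone_time,      -- step 6: last log record redone on the secondary
       drs.log_send_queue_size,   -- KB of log not yet sent to the secondary (steps 2-3)
       drs.redo_queue_size        -- KB of hardened log not yet redone (step 6)
FROM sys.dm_hadr_database_replica_states AS drs
INNER JOIN sys.availability_replicas AS ar
        ON ar.replica_id = drs.replica_id
ORDER BY database_name, ar.replica_server_name;
```

For the same secondary database, the gap between last_hardened_time and last_redone_time gives a rough view of how far redo (step 6) trails hardening (step 5).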
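An Extended Events trace lets us follow every step a log block takes and pinpoint exactly where transaction latency comes from.
Latency usually comes from three parts. The step numbers below refer to a finer-grained, event-by-event breakdown of log block movement, not to the six steps listed above.
- Log hardening time on the primary: the time between log_flush_start (step 2) and log_flush_complete (step 3).
- Log hardening time on the secondary: the time between log_flush_start (step 10) and log_flush_complete (step 11).
- Network transfer time: the sum of two intervals. The first is primary:hadr_log_block_send_complete -> secondary:hadr_transport_receive_log_block_message (steps 6-7). The second is secondary:hadr_lsn_send_complete -> primary:hadr_receive_harden_lsn_message (steps 12-13). The second interval is the hardening acknowledgment, which applies only to synchronous commit.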
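The log-hardening components above can be approximated from the event_file target. This is a minimal sketch that assumes the AlwaysOn_Data_Movement_Tracing session defined in this article has written its files to c:\mslog on the replica you query. It pairs each log_flush_start with the next log_flush_complete for the same database and sums the gaps.

The sketch rests on three assumptions:
- The events carry a database_id payload field. If they do not, the value is NULL and all flushes are grouped together.
- Flushes for one database do not overlap, so treat the result as an estimate.
- The replica you query determines the component: run it on the primary for the primary-side time and on a secondary for the secondary-side time.

The network component compares timestamps taken on two different servers, so it is only meaningful if their clocks are closely synchronized.

```sql
-- Estimate log flush (hardening) time per database from the trace files on this replica.
WITH xe_raw AS (
    SELECT CAST(f.event_data AS XML) AS x
    FROM sys.fn_xe_file_target_read_file(
             N'c:\mslog\AlwaysOn_Data_Movement_Tracing*.xel', NULL, NULL, NULL) AS f
),
ev AS (
    SELECT x.value('(event/@name)[1]', 'nvarchar(128)')   AS event_name,
           x.value('(event/@timestamp)[1]', 'datetime2(7)') AS event_time,  -- UTC
           -- ASSUMPTION: the payload has a database_id field; .value() yields NULL if absent
           x.value('(event/data[@name="database_id"]/value)[1]', 'int') AS database_id
    FROM xe_raw
),
flush_pairs AS (
    SELECT event_name, event_time, database_id,
           LEAD(event_name) OVER (PARTITION BY database_id ORDER BY event_time) AS next_name,
           LEAD(event_time) OVER (PARTITION BY database_id ORDER BY event_time) AS next_time
    FROM ev
    WHERE event_name IN (N'log_flush_start', N'log_flush_complete')
)
SELECT database_id,
       COUNT(*) AS flushes,
       SUM(CAST(DATEDIFF(MICROSECOND, event_time, next_time) AS BIGINT)) / 1000.0 AS total_flush_ms,
       AVG(DATEDIFF(MICROSECOND, event_time, next_time) * 1.0) / 1000.0 AS avg_flush_ms
FROM flush_pairs
-- keep only a start immediately followed by a complete (assumes non-overlapping flushes)
WHERE event_name = N'log_flush_start'
  AND next_name = N'log_flush_complete'
GROUP BY database_id;
```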
Create the Extended Events session:
/* Note: this trace can generate a very large amount of data very quickly,
   depending on the actual transaction rate. On a busy server it can grow by several GB per minute,
   so do not leave it running for long, to avoid impacting the production server.
   The folder c:\mslog must already exist on every replica where the session is created. */
CREATE EVENT SESSION [AlwaysOn_Data_Movement_Tracing] ON SERVER
ADD EVENT sqlserver.file_write_completed,
ADD EVENT sqlserver.file_write_enqueued,
ADD EVENT sqlserver.hadr_apply_log_block,
ADD EVENT sqlserver.hadr_apply_vlfheader,
ADD EVENT sqlserver.hadr_capture_compressed_log_cache,
ADD EVENT sqlserver.hadr_capture_filestream_wait,
ADD EVENT sqlserver.hadr_capture_log_block,
ADD EVENT sqlserver.hadr_capture_vlfheader,
ADD EVENT sqlserver.hadr_db_commit_mgr_harden,
ADD EVENT sqlserver.hadr_db_commit_mgr_harden_still_waiting,
ADD EVENT sqlserver.hadr_db_commit_mgr_update_harden,
ADD EVENT sqlserver.hadr_filestream_processed_block,
ADD EVENT sqlserver.hadr_log_block_compression,
ADD EVENT sqlserver.hadr_log_block_decompression,
ADD EVENT sqlserver.hadr_log_block_group_commit,
ADD EVENT sqlserver.hadr_log_block_send_complete,
ADD EVENT sqlserver.hadr_lsn_send_complete,
ADD EVENT sqlserver.hadr_receive_harden_lsn_message,
ADD EVENT sqlserver.hadr_send_harden_lsn_message,
ADD EVENT sqlserver.hadr_transport_flow_control_action,
ADD EVENT sqlserver.hadr_transport_receive_log_block_message,
ADD EVENT sqlserver.log_block_pushed_to_logpool,
ADD EVENT sqlserver.log_flush_complete,
ADD EVENT sqlserver.log_flush_start,
ADD EVENT sqlserver.recovery_unit_harden_log_timestamps
-- At most 4 rollover files of 500 MB each
ADD TARGET package0.event_file(SET filename=N'c:\mslog\AlwaysOn_Data_Movement_Tracing.xel',max_file_size=(500),max_rollover_files=(4))
WITH (MAX_MEMORY=4096 KB,
EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,
MAX_DISPATCH_LATENCY=30 SECONDS,
MAX_EVENT_SIZE=0 KB,
MEMORY_PARTITION_MODE=NONE,
TRACK_CAUSALITY=OFF,
-- OFF so that this heavy trace does not restart automatically after a service restart
STARTUP_STATE=OFF
)
GO
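CREATE EVENT SESSION only defines the session; it does not start collecting. The following is a minimal sketch of a bounded capture window, to be run on each replica you want to trace. The one-minute WAITFOR is a hypothetical window. Given the data-volume warning in the script above, size it to your workload.

```sql
-- Start collecting on this replica
ALTER EVENT SESSION [AlwaysOn_Data_Movement_Tracing] ON SERVER STATE = START;
GO
-- Hypothetical capture window: reproduce or wait for the latency you want to analyze
WAITFOR DELAY '00:01:00';
GO
-- Stop collecting; the .xel files under c:\mslog stay on disk for analysis
ALTER EVENT SESSION [AlwaysOn_Data_Movement_Tracing] ON SERVER STATE = STOP;
GO
-- Running sessions are listed in sys.dm_xe_sessions; no row means the trace is stopped
SELECT name, create_time
FROM sys.dm_xe_sessions
WHERE name = N'AlwaysOn_Data_Movement_Tracing';
GO
-- Optional clean-up once the analysis is done:
-- DROP EVENT SESSION [AlwaysOn_Data_Movement_Tracing] ON SERVER;
```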
Always On alerting
IF EXISTS (SELECT 1 FROM sys.objects o
           WHERE o.[object_id] = OBJECT_ID('[dbo].[Proc_DBA_AlwaysonWarning]')
             AND o.[type] IN (N'P', N'PC'))
    DROP PROC [dbo].[Proc_DBA_AlwaysonWarning]
GO
-- =============================================
-- Description: Always On alerting
-- Remark     : every parameter describes a condition that raises a warning
-- =============================================
CREATE PROCEDURE dbo.Proc_DBA_AlwaysonWarning
    @syncMode            BIT    = NULL,   -- commit mode: 1 = warn if SYNCHRONOUS_COMMIT, 0 = warn if not; NULL = skip (default)
    @syncStateIsFinished BIT    = 0,      -- sync state: 1 = warn if SYNCHRONIZED, 0 = warn if not SYNCHRONIZED (default); NULL = skip
    @syncHealth          BIT    = 0,      -- sync health: 1 = warn if HEALTHY, 0 = warn if not HEALTHY (default); NULL = skip
    @redoDelaySeconds    INT    = 600,    -- warn if the estimated redo delay (s) exceeds this; default 600 s
    @logDelaySeconds     INT    = 600,    -- warn if the estimated log send delay (s) exceeds this; default 600 s
    @redoWaitQueueKB     BIGINT = 10240,  -- warn if the redo queue (KB) exceeds this; default 10240 (10 MB)
    @logWaitQueueKB      BIGINT = 524288  -- warn if the log send queue (KB) exceeds this; default 524288 (512 MB)
AS
BEGIN
    SET NOCOUNT ON;

    ;WITH t AS (
        SELECT ar.replica_server_name          AS [副本名稱],      -- replica name
               ar.availability_mode_desc       AS [同步模式],      -- availability (commit) mode
               DB_NAME(dbr.database_id)        AS [資料庫名稱],    -- database name
               dbr.database_state_desc         AS [資料庫狀態],    -- database state
               dbr.synchronization_state_desc  AS [同步狀態],      -- synchronization state
               dbr.synchronization_health_desc AS [同步健康狀態],  -- synchronization health
               -- estimated redo delay (s) = redo queue (KB) / redo rate (KB/s); -1 if the rate is 0 or NULL
               ISNULL(CASE dbr.redo_rate WHEN 0 THEN -1
                      ELSE CAST(dbr.redo_queue_size AS FLOAT) / dbr.redo_rate END, -1) AS [Redo延遲(秒)],
               -- estimated log send delay (s) = send queue (KB) / send rate (KB/s); -1 if the rate is 0 or NULL
               ISNULL(CASE dbr.log_send_rate WHEN 0 THEN -1
                      ELSE CAST(dbr.log_send_queue_size AS FLOAT) / dbr.log_send_rate END, -1) AS [Log傳送延遲(秒)],
               dbr.redo_queue_size     AS [Redo等待佇列(KB)],     -- redo queue (KB)
               dbr.redo_rate           AS [Redo速率(KB/S)],       -- redo rate (KB/s)
               dbr.log_send_queue_size AS [Log傳送等待佇列(KB)],  -- log send queue (KB)
               dbr.log_send_rate       AS [Log傳送速率(KB/S)]     -- log send rate (KB/s)
        FROM [master].sys.availability_replicas AS ar
        INNER JOIN [master].sys.dm_hadr_database_replica_states AS dbr
                ON ar.replica_id = dbr.replica_id
        WHERE dbr.redo_queue_size IS NOT NULL  -- skip primary-database rows, which report NULL here
    )
    SELECT CASE WHEN ((@syncMode = 0 AND [同步模式] != 'SYNCHRONOUS_COMMIT')
                   OR (@syncMode = 1 AND [同步模式] =  'SYNCHRONOUS_COMMIT'))
                  OR ((@syncStateIsFinished = 0 AND [同步狀態] != 'SYNCHRONIZED')
                   OR (@syncStateIsFinished = 1 AND [同步狀態] =  'SYNCHRONIZED'))
                  OR ((@syncHealth = 0 AND [同步健康狀態] != 'HEALTHY')
                   OR (@syncHealth = 1 AND [同步健康狀態] =  'HEALTHY'))
                  OR [Redo延遲(秒)]        > @redoDelaySeconds
                  OR [Log傳送延遲(秒)]     > @logDelaySeconds
                  OR [Redo等待佇列(KB)]    > @redoWaitQueueKB
                  OR [Log傳送等待佇列(KB)] > @logWaitQueueKB
                THEN 1 ELSE 0 END AS Warning,
           [副本名稱], [同步模式], [資料庫名稱], [資料庫狀態], [同步狀態], [同步健康狀態],
           [Redo延遲(秒)], [Log傳送延遲(秒)], [Redo等待佇列(KB)], [Redo速率(KB/S)],
           [Log傳送等待佇列(KB)], [Log傳送速率(KB/S)]
    FROM t;
END
GO
EXEC sys.sp_addextendedproperty @name = N'Version', @value = N'2.0',
     @level0type = N'SCHEMA',    @level0name = N'dbo',
     @level1type = N'PROCEDURE', @level1name = N'Proc_DBA_AlwaysonWarning';
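The procedure returns one row per secondary database. A caller, such as a scheduled SQL Server Agent job step, can capture that result with INSERT ... EXEC and act only on the flagged rows. This is a usage sketch. The table-variable column names and the thresholds passed in are hypothetical choices for illustration. The column order and types follow the procedure's SELECT list.

```sql
-- Hypothetical column names; order and types mirror the procedure's result set
DECLARE @r TABLE (
    Warning        INT,
    replica        NVARCHAR(256),
    avail_mode     NVARCHAR(60),
    db             NVARCHAR(128),
    db_state       NVARCHAR(60),
    sync_state     NVARCHAR(60),
    sync_health    NVARCHAR(60),
    redo_delay_s   FLOAT,
    log_delay_s    FLOAT,
    redo_queue_kb  BIGINT,
    redo_rate_kbps BIGINT,
    log_queue_kb   BIGINT,
    log_rate_kbps  BIGINT);

-- Hypothetical thresholds: skip the sync-state check, tighten redo delay to 60 s
INSERT INTO @r
EXEC dbo.Proc_DBA_AlwaysonWarning @syncStateIsFinished = NULL, @redoDelaySeconds = 60;

IF EXISTS (SELECT 1 FROM @r WHERE Warning = 1)
BEGIN
    SELECT * FROM @r WHERE Warning = 1;
    -- Hook a notification here, e.g. msdb.dbo.sp_send_dbmail if Database Mail is configured
END
```

Secondary databases in asynchronous-commit mode normally report SYNCHRONIZING rather than SYNCHRONIZED. With the default @syncStateIsFinished = 0 they would therefore always be flagged, which is why the example passes NULL to skip that check.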