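A strange DG Broker validation failure (part 2) (r8 notes, day 51)
I did some simple analysis and troubleshooting on the problem I raised yesterday, which gives it some closure. The previous post was "A strange DG Broker validation failure".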
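I went through the recent logs and found that one entry from about half a month ago caught my attention.
Thu Mar 03 17:32:12 2016
ALTER SYSTEM SET log_archive_dest_state_2='DEFER' SCOPE=BOTH;
This DEFER setting reminded me of a change I had made earlier.
The original primary suffered a hardware power failure. After the backup power supply was enabled, it barely held on for a few hours. The database had only been protected by logical backups to another host, so a restore would have taken some time. Instead I found a machine, built a Data Guard standby on it, and did a switchover to move the database to the new server. I then built a matching Data Guard environment on a new standby. Before the new Data Guard was built, the machine with the power failure was still usable, but because the hardware was out of warranty the server was simply returned. To keep later archive checks from failing, I set log_archive_dest_state_2=DEFER, and once the new Data Guard was built successfully I removed the old server from DG Broker.
So across this unremarkable sequence, log_archive_dest_state switched between three states: defer, reset, and enable.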
Can we reproduce the problem in a simple way? Yes, and this same environment can be used to simulate it.
First, the DG Broker check shows no problems.
DGMGRL> show configuration
Configuration - testdb_dg
Protection Mode: MaxPerformance
Databases:
sactvdb - Primary database
s2actvdb - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
SQL> ALTER SYSTEM SET log_archive_dest_state_2=DEFER;
System altered.
If we check again at this point, DG Broker reports the following state.
DGMGRL> show configuration;
Configuration - testdb_dg
Protection Mode: MaxPerformance
Databases:
sactvdb - Primary database
Error: ORA-16764: redo transport service to a standby database is not running
s2actvdb - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
ERROR
The primary's alert log records the following:
ALTER SYSTEM SET log_archive_dest_state_2='DEFER' SCOPE=BOTH;
Sat Mar 26 20:35:54 2016
***********************************************************************
Fatal NI connect error 12528, connecting to:
(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=xxxx)(PORT=1528)))(CONNECT_DATA=(SERVICE_NAME=s2actvdb_DGB)(CID=(PROGRAM=oracle)(HOST=testdb2.test.com)(USER=oracle))))
VERSION INFORMATION:
TNS for Linux: Version 11.2.0.3.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
Time: 26-MAR-2016 20:35:54
Tracing not turned on.
Tns error struct:
ns main err code: 12564
TNS-12564: TNS:connection refused
ns secondary err code: 0
nt main err code: 0
nt secondary err code: 0
nt OS err code: 0
***********************************************************************
Fatal NI connect error 12528, connecting to:
(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=10.127.130.58)(PORT=1528)))(CONNECT_DATA=(SERVICE_NAME=s2actvdb_DGB)(CID=(PROGRAM=oracle)(HOST=testdb2.test.com)(USER=oracle))))
VERSION INFORMATION:
TNS for Linux: Version 11.2.0.3.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
Time: 26-MAR-2016 20:35:54
Tracing not turned on.
Tns error struct:
ns main err code: 12564
TNS-12564: TNS:connection refused
ns secondary err code: 0
nt main err code: 0
nt secondary err code: 0
nt OS err code: 0
***********************************************************************
...
LNS: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (3135)
LNS: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned
Error 3135 for archive log file 2 to 's2actvdb'
Errors in file /U01/app/oracle/diag/rdbms/sactvdb/actvdb/trace/actvdb_nsa2_20231.trc:
ORA-03135: connection lost contact
LNS: Failed to archive log 2 thread 1 sequence 10137 (3135)
Sat Mar 26 20:36:04 2016
***********************************************************************
...
TNS-12564: TNS:connection refused
ns secondary err code: 0
nt main err code: 0
nt secondary err code: 0
nt OS err code: 0
Sat Mar 26 20:36:04 2016
ALTER SYSTEM SET log_archive_dest_state_2='RESET' SCOPE=BOTH;
I then restarted the standby to the open state, waited a little, and checked the DG Broker status. Everything was back to normal.
DGMGRL> show configuration;
Configuration - testdb_dg
Protection Mode: MaxPerformance
Databases:
sactvdb - Primary database
s2actvdb - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
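The picture is now fairly clear: the archive destination state went from defer to reset, and then to enable.
If defer is never set, does the reset still happen, and can it be reproduced? One more quick test answers that.
I stopped the standby again and then checked the primary's alert log: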
Fatal NI connect error 12514, connecting to:
(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=10.127.130.58)(PORT=1528)))(CONNECT_DATA=(SERVICE_NAME=s2actvdb_DGB)(CID=(PROGRAM=oracle)(HOST=testdb2.test.com)(USER=oracle))))
VERSION INFORMATION:
TNS for Linux: Version 11.2.0.3.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
Time: 26-MAR-2016 20:41:48
Tracing not turned on.
Tns error struct:
ns main err code: 12564
TNS-12564: TNS:connection refused
ns secondary err code: 0
nt main err code: 0
nt secondary err code: 0
nt OS err code: 0
Sat Mar 26 20:41:48 2016
ALTER SYSTEM SET log_archive_dest_state_2='RESET' SCOPE=BOTH;
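As you can see, RESET appears at the end of the log, so it has no direct connection to the earlier DEFER. Under a DG Broker configuration this is an automatic state change.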
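To compare runs like the two above, it can help to pull every log_archive_dest_state change out of an alert log together with the timestamp line that precedes it. The following is a minimal Python sketch, not part of the original troubleshooting. The function name dest_state_changes is hypothetical, and it assumes the 11.2-style alert-log layout seen in the excerpts here, where a timestamp sits on its own line before the entry.

```python
import re

# Matches alert-log entries such as:
#   ALTER SYSTEM SET log_archive_dest_state_2='RESET' SCOPE=BOTH;
STATE_RE = re.compile(
    r"ALTER SYSTEM SET log_archive_dest_state_(\d+)\s*=\s*'?(\w+)'?",
    re.IGNORECASE,
)
# Matches timestamp lines such as: Sat Mar 26 20:36:04 2016
TS_RE = re.compile(
    r"^[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}$"
)


def dest_state_changes(lines):
    """Return (timestamp, dest_id, state) for each state change found.

    timestamp is the most recent timestamp line seen before the entry,
    or None if no timestamp line has been seen yet.
    """
    changes = []
    last_ts = None
    for raw in lines:
        line = raw.strip()
        if TS_RE.match(line):
            last_ts = line
            continue
        m = STATE_RE.search(line)
        if m:
            changes.append((last_ts, int(m.group(1)), m.group(2).upper()))
    return changes


if __name__ == "__main__":
    # Sample lines copied from the alert-log excerpts in this post.
    sample = [
        "Thu Mar 03 17:32:12 2016",
        "ALTER SYSTEM SET log_archive_dest_state_2='DEFER' SCOPE=BOTH;",
        "Sat Mar 26 20:36:04 2016",
        "ALTER SYSTEM SET log_archive_dest_state_2='RESET' SCOPE=BOTH;",
    ]
    for ts, dest_id, state in dest_state_changes(sample):
        print(ts, dest_id, state)
```

To run it against a real alert log, pass the open file object in place of the sample list. The parser only reports what the log contains and does not say whether a change was issued by the broker or by hand.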
So after the standby has been stopped and restarted, does DG Broker show the standby as disabled?
DGMGRL> show configuration;
Configuration - testdb_dg
Protection Mode: MaxPerformance
Databases:
sactvdb - Primary database
s2actvdb - Physical standby database (disabled)
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
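The problem is easy to reproduce: it appears when the standby is in the nomount state. In this 11g environment the mount and open states show no such problem, because in those states the standby's RFS and MRP processes can work normally.
The reproduction steps are as follows:
Restart the standby to the nomount state.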
DGMGRL> show configuration;
Configuration - testdb_dg
Protection Mode: MaxPerformance
Databases:
sactvdb - Primary database
s2actvdb - Physical standby database (disabled)
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
Start the standby to the mount state.
DGMGRL> show configuration;
Configuration - testdb_dg
Protection Mode: MaxPerformance
Databases:
sactvdb - Primary database
s2actvdb - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
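So the tests above show that the problem is not really strange. The standby had been restarted and was sitting in the nomount stage, and that produced the odd symptom. As for Data Guard itself, the archive destination state can be defer, reset, or enable, and reset may serve as the pivot point for the transitions between them.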
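Since the symptom is easy to miss (the overall status is still SUCCESS while the standby is marked disabled), a small check over the text of "show configuration" can flag it. The Python sketch below is only an illustration. The function name parse_show_configuration is hypothetical, and the parsing assumes the 11.2 output layout pasted in this post; other versions may format the output differently.

```python
import re

# Matches database lines such as:
#   s2actvdb - Physical standby database (disabled)
DB_RE = re.compile(r"^(\S+)\s+-\s+(.+?) database(\s+\(disabled\))?$")


def parse_show_configuration(text):
    """Parse DGMGRL 'show configuration' text (layout assumed from this post)."""
    databases = []
    status = None
    expect_status = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if expect_status:
            # The status value is on the line after "Configuration Status:".
            status = line
            expect_status = False
            continue
        if line.startswith("Configuration Status:"):
            rest = line[len("Configuration Status:"):].strip()
            if rest:
                status = rest
            else:
                expect_status = True
            continue
        m = DB_RE.match(line)
        if m:
            databases.append({
                "name": m.group(1),
                "role": m.group(2),
                "disabled": bool(m.group(3)),
                "errors": [],
            })
        elif line.startswith(("Error:", "Warning:")) and databases:
            # Attach the message to the database listed just above it.
            databases[-1]["errors"].append(line)
    return {"status": status, "databases": databases}


if __name__ == "__main__":
    # Output copied from the nomount case in this post.
    sample = """Configuration - testdb_dg
Protection Mode: MaxPerformance
Databases:
sactvdb - Primary database
s2actvdb - Physical standby database (disabled)
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS"""
    result = parse_show_configuration(sample)
    disabled = [d["name"] for d in result["databases"] if d["disabled"]]
    print(result["status"], disabled)
```

The function takes the output as plain text, so how that text is captured from dgmgrl is left to the reader.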