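1. Description:
Running crs_stat -t to check the RAC services fails immediately with CRS-0184: Cannot communicate with the CRS daemon.
Oddly, the database itself is fine: sqlplus / as sysdba still logs in and works normally.
2. Error analysis:
Start with the Clusterware alert log. The errors begin on 2016/07/13: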
/grid/11.2.0/log/phars1/alertphars1.log
2016-07-13 16:04:49.616: [crsd(21419)]CRS-2765:Resource 'ora.VOTDG.dg' has failed on server 'phars1'.
2016-07-13 16:04:49.702: [crsd(21419)]CRS-2878:Failed to restart resource 'ora.VOTDG.dg'
2016-07-13 16:04:49.703: [crsd(21419)]CRS-2769:Unable to failover resource 'ora.VOTDG.dg'.
2016-07-13 19:39:38.436: [crsd(21419)]CRS-1006:The OCR location +VOTDG is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:39:38.437: [crsd(21419)]CRS-1006:The OCR location +VOTDG is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:39:53.742: [/grid/11.2.0/bin/oraagent.bin(30612)]CRS-5822:Agent '/grid/11.2.0/bin/oraagent_oracle' disconnected from server. Details at (:CRSAGF00117:) {0:11:9490} in /grid/11.2.0/log/phars1/agent/crsd/oraagent_oracle/oraagent_oracle.log.
2016-07-13 19:39:53.742: [/grid/11.2.0/bin/orarootagent.bin(21814)]CRS-5822:Agent '/grid/11.2.0/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:3:36} in /grid/11.2.0/log/phars1/agent/crsd/orarootagent_root/orarootagent_root.log.
2016-07-13 19:39:53.743: [/grid/11.2.0/bin/oraagent.bin(21774)]CRS-5822:Agent '/grid/11.2.0/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:5:10} in /grid/11.2.0/log/phars1/agent/crsd/oraagent_grid/oraagent_grid.log.
2016-07-13 19:39:53.743: [/grid/11.2.0/bin/scriptagent.bin(1919)]CRS-5822:Agent '/grid/11.2.0/bin/scriptagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:13:12} in /grid/11.2.0/log/phars1/agent/crsd/scriptagent_grid/scriptagent_grid.log.
2016-07-13 19:39:53.745: [ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:39:55.153: [crsd(16165)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:39:55.162: [crsd(16165)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage ]. Details at (:CRSD00111:) in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:39:55.774: [ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:39:57.201: [crsd(16185)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:39:57.210: [crsd(16185)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage ]. Details at (:CRSD00111:) in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:39:57.814: [ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:39:59.206: [crsd(16210)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:39:59.214: [crsd(16210)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage ]. Details at (:CRSD00111:) in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:39:59.843: [ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:40:01.237: [crsd(16223)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:01.245: [crsd(16223)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage ]. Details at (:CRSD00111:) in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:01.872: [ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:40:03.263: [crsd(16238)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:03.273: [crsd(16238)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage ]. Details at (:CRSD00111:) in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:03.900: [ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:40:05.293: [crsd(16254)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:05.302: [crsd(16254)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage ]. Details at (:CRSD00111:) in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:05.929: [ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:40:07.325: [crsd(16271)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:07.335: [crsd(16271)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage ]. Details at (:CRSD00111:) in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:07.956: [ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:40:09.346: [crsd(16290)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:09.355: [crsd(16290)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage ]. Details at (:CRSD00111:) in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:09.985: [ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:40:11.376: [crsd(16327)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:11.386: [crsd(16327)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage ]. Details at (:CRSD00111:) in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:12.013: [ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:40:13.401: [crsd(16340)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:13.411: [crsd(16340)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage ]. Details at (:CRSD00111:) in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:14.053: [ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:40:14.053: [ohasd(20149)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
2016-07-13 19:40:14.053: [ohasd(20149)]CRS-2769:Unable to failover resource 'ora.crsd'.
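The sequence in this log is: resource 'ora.VOTDG.dg' failed => Clusterware tried to restart it => the restart failed => the OCR location +VOTDG became inaccessible => crsd aborted because it could not access the physical storage => ohasd kept restarting crsd until the maximum number of restart attempts was reached, then gave up => crsd stayed down.
Taken together, this shows that the CRS failure was caused by the VOTDG disk group being inaccessible.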
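The chain of events above can be read off the alert log mechanically. The following is a minimal Python sketch, not part of the original troubleshooting: it assumes each alert-log entry sits on one line in the form shown above, and the names `parse_alert`, `first_seen` and `SAMPLE_ALERT` are hypothetical. It lists the CRS message codes in order of first appearance, together with how often each occurs.

```python
import re
from collections import Counter

# One alert-log entry, as seen in the excerpt above:
# "<timestamp>: [<process>(<pid>)]CRS-<code>:<message>"
ENTRY = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"\[(?P<proc>[^\]]+?)\((?P<pid>\d+)\)\]"
    r"(?P<code>CRS-\d+):(?P<msg>.*)"
)

def parse_alert(lines):
    """Yield (timestamp, process, code, message) for every matching line."""
    for line in lines:
        m = ENTRY.match(line.strip())
        if m:
            yield m["ts"], m["proc"], m["code"], m["msg"].strip()

def first_seen(entries):
    """Return (code, count) pairs ordered by the first appearance of each code."""
    counts = Counter(code for _, _, code, _ in entries)
    order = []
    for _, _, code, _ in entries:
        if code not in order:
            order.append(code)
    return [(code, counts[code]) for code in order]

# A few entries copied from the alert log above (hypothetical variable name).
SAMPLE_ALERT = [
    "2016-07-13 16:04:49.616: [crsd(21419)]CRS-2765:Resource 'ora.VOTDG.dg' has failed on server 'phars1'.",
    "2016-07-13 16:04:49.702: [crsd(21419)]CRS-2878:Failed to restart resource 'ora.VOTDG.dg'",
    "2016-07-13 19:39:38.436: [crsd(21419)]CRS-1006:The OCR location +VOTDG is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.",
    "2016-07-13 19:39:53.745: [ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.",
    "2016-07-13 19:39:55.162: [crsd(16165)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage ].",
    "2016-07-13 19:40:14.053: [ohasd(20149)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.",
]

entries = list(parse_alert(SAMPLE_ALERT))
for code, count in first_seen(entries):
    print(code, count)
```

Run against the full alert log instead of the short sample, the same summary would make the repeated CRS-1013 / CRS-0804 / CRS-2765 restart loop stand out before the final CRS-2771.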
Next, look at /grid/11.2.0/log/phars1/crsd/crsd.log:
2016-07-13 16:04:49.615: [ AGFW][4118722304]{0:5:6} Agfw Proxy Server received the message: RESOURCE_STATUS[Proxy] ID 20481:162956
2016-07-13 16:04:49.615: [ AGFW][4118722304]{0:5:6} Verifying msg rid = ora.VOTDG.dg phars1 1
2016-07-13 16:04:49.615: [ AGFW][4118722304]{0:5:6} Received state change for ora.VOTDG.dg phars1 1 [old state = ONLINE, new state = OFFLINE] --ora.VOTDG.dg goes OFFLINE here
2016-07-13 16:04:49.615: [ AGFW][4118722304]{0:5:6} Agfw Proxy Server sending message to PE, Contents = [MIDTo:2|OpID:3|FromA:{Invalid|Node:0|Process:0|Type:0}|ToA:{Invalid|Node:-1|Process:-1|Type:-1}|MIDFrom:0|Type:4|Pri2|Id:287142:Ver:2]
2016-07-13 16:04:49.615: [ AGFW][4118722304]{0:5:6} Agfw Proxy Server replying to the message: RESOURCE_STATUS[Proxy] ID 20481:162956
2016-07-13 16:04:49.616: [ CRSPE][4108216064]{0:5:6} State change received from phars1 for ora.VOTDG.dg phars1 1
2016-07-13 16:04:49.616: [ CRSPE][4108216064]{0:5:6} Processing PE command id=13336. Description: [Resource State Change (ora.VOTDG.dg phars1 1) : 0x7fb470104850]
2016-07-13 16:04:49.616: [ CRSPE][4108216064]{0:5:6} RI [ora.VOTDG.dg phars1 1] new external state [OFFLINE] old value: [ONLINE] on phars1 label = []
2016-07-13 16:04:49.616: [ CRSD][4108216064]{0:5:6} {0:5:6} Resource Resource Instance ID[ora.VOTDG.dg phars1 1]. Values: STATE=OFFLINE TARGET=ONLINE LAST_SERVER=phars1 CURRENT_RCOUNT=0 LAST_RESTART=0 FAILURE_COUNT=0 FAILURE_HISTORY= STATE_DETAILS= INCARNATION=0 STATE_CHANGE_VERS=0 LAST_FAULT=0 LAST_STATE_CHANGE=1468397089 INTERNAL_STATE=0 DEGREE_ID=1 ID=ora.VOTDG.dg phars1 1 Lock Info: Write Locks:none ReadLocks:|STATE INITED||ONLINE STATERECOVERED| has failed!
2016-07-13 16:04:49.616: [ CRSPE][4108216064]{0:5:6} Processing unplanned state change for [ora.VOTDG.dg phars1 1]
2016-07-13 16:04:49.617: [ CRSPE][4108216064]{0:5:6} Scheduled local recovery for [ora.VOTDG.dg phars1 1]
2016-07-13 16:04:49.617: [ CRSRPT][4106114816]{0:5:6} Published to EVM CRS_RESOURCE_STATE_CHANGE for ora.VOTDG.dg
2016-07-13 16:04:49.617: [ CRSPE][4108216064]{0:5:6} Op 0x7fb4700c89d0 has 5 WOs
2016-07-13 16:04:49.618: [ CRSPE][4108216064]{0:5:6} RI [ora.VOTDG.dg phars1 1] new internal state: [STARTING] old value: [STABLE]
2016-07-13 16:04:49.618: [ CRSPE][4108216064]{0:5:6} Sending message to agfw: id = 287144
2016-07-13 16:04:49.618: [ CRSPE][4108216064]{0:5:6} CRS-2672: Attempting to start 'ora.VOTDG.dg' on 'phars1'
2016-07-13 16:04:49.618: [ AGFW][4118722304]{0:5:6} Agfw Proxy Server received the message: RESOURCE_START[ora.VOTDG.dg phars1 1] ID 4098:287144
2016-07-13 16:04:49.619: [ AGFW][4118722304]{0:5:6} Agfw Proxy Server forwarding the message: RESOURCE_START[ora.VOTDG.dg phars1 1] ID 4098:287144 to the agent /grid/11.2.0/bin/oraagent_grid
2016-07-13 16:04:49.673: [ AGFW][4118722304]{0:5:6} Received the reply to the message: RESOURCE_START[ora.VOTDG.dg phars1 1] ID 4098:287145 from the agent /grid/11.2.0/bin/oraagent_grid
2016-07-13 16:04:49.673: [ AGFW][4118722304]{0:5:6} Agfw Proxy Server sending the reply to PE for message:RESOURCE_START[ora.VOTDG.dg phars1 1] ID 4098:287144
2016-07-13 16:04:49.673: [ CRSPE][4108216064]{0:5:6} Received reply to action [Start] message ID: 287144
2016-07-13 16:04:49.701: [ AGFW][4118722304]{0:5:6} Received the reply to the message: RESOURCE_START[ora.VOTDG.dg phars1 1] ID 4098:287145 from the agent /grid/11.2.0/bin/oraagent_grid
2016-07-13 16:04:49.701: [ AGFW][4118722304]{0:5:6} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_START[ora.VOTDG.dg phars1 1] ID 4098:287144
2016-07-13 16:04:49.701: [ CRSPE][4108216064]{0:5:6} Received reply to action [Start] message ID: 287144
2016-07-13 16:04:49.701: [ CRSPE][4108216064]{0:5:6} RI [ora.VOTDG.dg phars1 1] new internal state: [STABLE] old value: [STARTING]
2016-07-13 16:04:49.701: [ CRSPE][4108216064]{0:5:6} CRS-2674: Start of 'ora.VOTDG.dg' on 'phars1' failed
This log tells the same story: 'ora.VOTDG.dg' went offline and could not be restarted, and that is what brought CRS down.
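Most of the crsd.log entries are framework chatter; only the state change and the result of the start attempt matter here. A minimal Python sketch of pulling out just those two kinds of entries follows. It assumes one entry per line with the wording shown in the excerpt, and `resource_events` and `SAMPLE_CRSD` are hypothetical names introduced for illustration.

```python
import re

# "Received state change for <resource> <server> <n> [old state = X, new state = Y]"
STATE_CHANGE = re.compile(
    r"Received state change for (?P<res>\S+) (?P<server>\S+) \d+ "
    r"\[old state = (?P<old>\w+), new state = (?P<new>\w+)\]"
)
# CRS-2674 reports a failed start, CRS-2676 a successful one.
START_RESULT = re.compile(
    r"CRS-267[46]: Start of '(?P<res>[^']+)' on '(?P<server>[^']+)' "
    r"(?P<result>failed|succeeded)"
)

def resource_events(lines):
    """Return (resource, server, event) tuples for state changes and start results."""
    events = []
    for line in lines:
        m = STATE_CHANGE.search(line)
        if m:
            events.append((m["res"], m["server"], f"{m['old']} -> {m['new']}"))
            continue
        m = START_RESULT.search(line)
        if m:
            events.append((m["res"], m["server"], f"start {m['result']}"))
    return events

# Three entries copied from the crsd.log excerpt (hypothetical variable name).
SAMPLE_CRSD = [
    "2016-07-13 16:04:49.615: [ AGFW][4118722304]{0:5:6} Received state change for ora.VOTDG.dg phars1 1 [old state = ONLINE, new state = OFFLINE]",
    "2016-07-13 16:04:49.618: [ CRSPE][4108216064]{0:5:6} CRS-2672: Attempting to start 'ora.VOTDG.dg' on 'phars1'",
    "2016-07-13 16:04:49.701: [ CRSPE][4108216064]{0:5:6} CRS-2674: Start of 'ora.VOTDG.dg' on 'phars1' failed",
]

for event in resource_events(SAMPLE_CRSD):
    print(event)
```

The "Attempting to start" line is deliberately ignored, since only the outcome of the attempt is of interest.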
3. Resolution:
① The first symptom was that the CRS daemon could not be reached, so I started with the alert log and the CRS log.
② crsd.log also contained this line:
2016-07-15 10:17:24.000: [ OCRASM][992749344]proprasmo: The ASM disk group VOTDG is not found or not mounted
It says that my votedisk disk group (VOTDG) was not found or not mounted.
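When crsd.log is large, this message can be searched for directly. The sketch below is an illustration only: it assumes the message wording shown above, and `unmounted_groups` and `SAMPLE_OCRASM` are hypothetical names. It returns the disk group names that crsd reported as not found or not mounted.

```python
import re

# Wording taken from the OCRASM entry quoted above.
NOT_MOUNTED = re.compile(
    r"The ASM disk group (?P<dg>\w+) is not found or not mounted"
)

def unmounted_groups(lines):
    """Return disk group names reported as not found or not mounted, without duplicates."""
    found = []
    for line in lines:
        m = NOT_MOUNTED.search(line)
        if m and m["dg"] not in found:
            found.append(m["dg"])
    return found

# The entry from crsd.log, repeated once to show de-duplication (hypothetical variable name).
SAMPLE_OCRASM = [
    "2016-07-15 10:17:24.000: [ OCRASM][992749344]proprasmo: The ASM disk group VOTDG is not found or not mounted",
    "2016-07-15 10:17:24.000: [ OCRASM][992749344]proprasmo: The ASM disk group VOTDG is not found or not mounted",
]

print(unmounted_groups(SAMPLE_OCRASM))
```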
③ Since the database was still running normally, I checked the state of the votedisk disk group from it:
SQL> select name,state from v$asm_diskgroup;
NAME                           STATE
------------------------------ -----------
BACKUPDG                       CONNECTED
DATADG                         CONNECTED
SYSDG                          CONNECTED
VOTDG                          DISMOUNTED
VOTDG does indeed show as DISMOUNTED here. In normal operation it must be mounted.
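The same check can be scripted against the query output. This is a minimal Python sketch under stated assumptions: the output is the two-column NAME/STATE listing of v$asm_diskgroup captured as text, and a disk group counts as usable when its state is MOUNTED (as seen from the ASM instance) or CONNECTED (as seen from a database instance that is using it). The names `parse_diskgroups`, `unavailable` and `SAMPLE_OUTPUT` are hypothetical.

```python
# States treated as healthy in this sketch (an assumption, see the note above).
OK_STATES = {"MOUNTED", "CONNECTED"}

def parse_diskgroups(text):
    """Parse NAME/STATE rows from the SQL*Plus output into a dict."""
    states = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the header row and the dashed separator.
        if len(parts) != 2 or parts[0] == "NAME" or set(parts[0]) == {"-"}:
            continue
        states[parts[0]] = parts[1]
    return states

def unavailable(states):
    """Return the names of disk groups whose state is not in OK_STATES."""
    return sorted(name for name, state in states.items() if state not in OK_STATES)

# Query output as shown in this post (hypothetical variable name).
SAMPLE_OUTPUT = """
NAME                           STATE
------------------------------ -----------
BACKUPDG                       CONNECTED
DATADG                         CONNECTED
SYSDG                          CONNECTED
VOTDG                          DISMOUNTED
"""

print(unavailable(parse_diskgroups(SAMPLE_OUTPUT)))
```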
Mount the votedisk disk group manually:
grid@phars1: /home/grid> sqlplus / as sysasm --note: log in as the grid user with the sysasm privilege
SQL*Plus: Release 11.2.0.4.0 Production on Fri Jul 15 11:38:40 2016
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> alter diskgroup VOTDG mount; --mount the votedisk disk group manually
Diskgroup altered.
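This has to be done on both nodes.
Then restart the cluster services and everything is back to normal. Note that a restart has no effect while the disk group is still unmounted; the cluster can only be stopped and started cleanly after it has been mounted.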
[root@phaws1 ~]# crsctl stop cluster -all
CRS-2796: The command may not proceed when Cluster Ready Services is not running
CRS-2796: The command may not proceed when Cluster Ready Services is not running
CRS-4704: Shutdown of Clusterware failed on node phaws1.
CRS-4704: Shutdown of Clusterware failed on node phaws2.
CRS-4000: Command Stop failed, or completed with errors.
[root@phaws1 ~]# crsctl start cluster -all
CRS-2672: Attempting to start 'ora.crsd' on 'phaws1'
CRS-2672: Attempting to start 'ora.crsd' on 'phaws2'
CRS-2676: Start of 'ora.crsd' on 'phaws1' succeeded
CRS-2676: Start of 'ora.crsd' on 'phaws2' succeeded
Summary: the CRS failure was caused mainly by the votedisk disk group being inaccessible. The key is to analyze the logs and work out the right fix from what they show.