We have an Exadata X6-2 environment where one of our flash disks was showing poor performance status.
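To identify a poor performance flash disk, use the following command:
CellCLI> LIST PHYSICALDISK WHERE DISKTYPE=flashdisk AND STATUS= 'warning - poor performance' DETAIL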
This flash disk is in poor performance status.
Recommended Action
The flash disk has entered poor performance status. A white cell locator LED has been lit to help locate the affected cell. Please replace the flash disk.
If the flash disk is used for flash cache, then flash cache will be disabled on this disk thus reducing the effective flash cache size. If the flash disk is used for flash log, then flash log will be disabled on this disk thus reducing the effective flash log size. If the flash disk is used for grid disks, then Oracle ASM rebalance will automatically restore the data redundancy.
Sun Oracle Exadata Storage Server is equipped with four PCIe cards. Each card has four flash disks (FDOMs), for a total of 16 flash disks. The four PCIe cards are present in PCI slot numbers 1, 2, 4, and 5. The PCIe cards are not hot-pluggable, so the Exadata cell must be powered down before replacing the flash disks or cards.
Hence the data center team replaced the flash disk in coordination with us (the DBA team), because the flash disk was in poor performance status.
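To tell the data center team which part to pull, it helps to turn the flash disk name into a PCI slot and FDOM number. The following is a minimal Python sketch, assuming the FLASH_x_y names seen in the sample output later in this post encode PCI slot x and FDOM y; the function and type names are hypothetical and not part of any Oracle tool.

```python
import re
from typing import NamedTuple


class FlashLocation(NamedTuple):
    pci_slot: int
    fdom: int


# Assumption: flash disk names follow the pattern FLASH_<pci slot>_<fdom>.
_FLASH_NAME = re.compile(r"^FLASH_(\d+)_(\d+)$")


def parse_flash_name(name: str) -> FlashLocation:
    """Split a flash disk name such as FLASH_4_1 into slot and FDOM numbers."""
    match = _FLASH_NAME.match(name.strip())
    if match is None:
        raise ValueError("unexpected flash disk name: %r" % name)
    return FlashLocation(int(match.group(1)), int(match.group(2)))
```

For example, parse_flash_name("FLASH_4_1") yields PCI slot 4 and FDOM 1. Always confirm the location against the slot information reported by the cell itself before anyone opens the chassis.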
1. Shut down the cell.
The following procedure describes how to power down the Exadata cell.
Run the following command to check if there are offline disks on other cells that are mirrored with disks on this cell:
CellCLI> LIST GRIDDISK ATTRIBUTES name WHERE asmdeactivationoutcome != 'Yes'
If any grid disks are returned, then it is not safe to take the storage server offline because proper Oracle ASM disk group redundancy will not be intact. Taking the storage server offline when one or more grid disks are in this state will cause Oracle ASM to dismount the affected disk group, causing the databases to shut down abruptly.
When the Oracle Exadata Storage Server is safe to take offline, inactivate all the grid disks using the following command:
CellCLI> ALTER GRIDDISK ALL INACTIVE
The preceding command will complete once all disks are inactive and offline. Depending on the storage server activity, it may take several minutes for this command to complete.
Verify that all grid disks are INACTIVE to allow safe storage server shutdown by running the following command:
CellCLI> LIST GRIDDISK
If all grid disks are INACTIVE, then the storage server can be shut down without affecting database availability.
Stop the cell services using the following command:
CellCLI> ALTER CELL SHUTDOWN SERVICES ALL
Shut down the cell.
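The two checks in step 1 can be scripted if you capture the CellCLI output as text. The following Python sketch is only an illustration under stated assumptions: it expects whitespace-separated listing lines, it assumes LIST GRIDDISK prints the name followed by a status word such as inactive, and all function names are hypothetical.

```python
from typing import List


def parse_rows(output: str) -> List[List[str]]:
    """Split listing text into fields, skipping blank lines and echoed prompts."""
    rows = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("CellCLI"):
            continue
        rows.append(line.split())
    return rows


def blocking_grid_disks(filtered_output: str) -> List[str]:
    """Names returned by the asmdeactivationoutcome != 'Yes' query.

    Any entry means the cell is not safe to take offline.
    """
    return [row[0] for row in parse_rows(filtered_output)]


def not_inactive(list_griddisk_output: str) -> List[str]:
    """Grid disks whose status column is missing or is not 'inactive'."""
    return [
        row[0]
        for row in parse_rows(list_griddisk_output)
        if len(row) < 2 or row[1].lower() != "inactive"
    ]
```

Proceed with the shutdown only when blocking_grid_disks returns an empty list before the ALTER GRIDDISK ALL INACTIVE command, and not_inactive returns an empty list after it.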
2. Replace the failed flash disk based on the PCI number and FDOM number.
3. Power up the cell. The cell services will be started automatically.
4. Bring all grid disks online using the following command:
CellCLI> ALTER GRIDDISK ALL ACTIVE
5. Verify that all grid disks have been successfully put online using the following command:
CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus
Wait until asmmodestatus changes from SYNCING to ONLINE for all grid disks.
The following is an example of the output:
CellCLI> list physicaldisk where disktype=flashdisk
FLASH_1_1 S2T7NA0J304430 normal
FLASH_2_1 S2T7NAAH409309 normal
FLASH_4_1 S2T7NA0J304420 normal
FLASH_5_1 S2T7NA0JB00348 normal
CellCLI> list griddisk attributes name, asmmodestatus
DATAC1_CD_00_ABC ONLINE
DATAC1_CD_01_ABC ONLINE
DATAC1_CD_02_ABC ONLINE
DATAC1_CD_03_ABC ONLINE
DATAC1_CD_04_ABC ONLINE
DATAC1_CD_05_ABC ONLINE
DATAC1_CD_06_ABC ONLINE
DATAC1_CD_07_ABC ONLINE
DATAC1_CD_08_ABC ONLINE
DATAC1_CD_09_ABC ONLINE
DATAC1_CD_10_ABC ONLINE
DATAC1_CD_11_ABC ONLINE
DBFS_DG_CD_02_ABC ONLINE
DBFS_DG_CD_03_ABC ONLINE
DBFS_DG_CD_04_ABC ONLINE
DBFS_DG_CD_05_ABC ONLINE
DBFS_DG_CD_06_ABC ONLINE
DBFS_DG_CD_07_ABC ONLINE
DBFS_DG_CD_08_ABC ONLINE
DBFS_DG_CD_09_ABC ONLINE
DBFS_DG_CD_10_ABC ONLINE
DBFS_DG_CD_11_ABC ONLINE
RECOC1_CD_00_ABC ONLINE
RECOC1_CD_01_ABC ONLINE
RECOC1_CD_02_ABC ONLINE
RECOC1_CD_03_ABC ONLINE
RECOC1_CD_04_ABC ONLINE
RECOC1_CD_05_ABC ONLINE
RECOC1_CD_06_ABC ONLINE
RECOC1_CD_07_ABC ONLINE
RECOC1_CD_08_ABC ONLINE
RECOC1_CD_09_ABC ONLINE
RECOC1_CD_10_ABC ONLINE
RECOC1_CD_11_ABC ONLINE
Oracle ASM synchronization is only complete when all grid disks show attribute asmmodestatus=ONLINE. Before taking another storage server offline, Oracle ASM synchronization must complete on the restarted Oracle Exadata Storage Server. If synchronization is not complete, then the check performed on another storage server will fail.
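Waiting for every grid disk to reach ONLINE can be automated by polling the listing shown above. This is a rough Python sketch, not an Oracle-provided tool: fetch_output is a hypothetical callable that returns the text of LIST GRIDDISK ATTRIBUTES name, asmmodestatus (for example by running CellCLI on the cell), and the interval and poll count are arbitrary defaults.

```python
import time
from typing import Callable, Dict


def parse_statuses(output: str) -> Dict[str, str]:
    """Map grid disk name to asmmodestatus; lines without exactly two fields are skipped."""
    statuses = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue
        name, status = fields
        statuses[name] = status
    return statuses


def all_online(output: str) -> bool:
    """True only if at least one grid disk was parsed and every one is ONLINE."""
    statuses = parse_statuses(output)
    return bool(statuses) and all(s.upper() == "ONLINE" for s in statuses.values())


def wait_until_online(
    fetch_output: Callable[[], str],
    interval_s: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until all grid disks are ONLINE; return False if max_polls is exhausted."""
    for attempt in range(max_polls):
        if all_online(fetch_output()):
            return True
        if attempt < max_polls - 1:
            sleep(interval_s)
    return False
```

An empty listing is deliberately treated as not online, so a failed or empty command output never looks like a completed synchronization.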
The new flash disk will be automatically used by the system. If the flash disk is used for flash cache, then the effective cache size will increase. If the flash disk is used for grid disks, then the grid disks will be recreated on the new flash disk. If those grid disks were part of an Oracle ASM disk group, then they will be added back to the disk group and the data will be rebalanced on them based on the disk group redundancy and the asm_power_limit parameter.
Oracle ASM rebalance occurs when dropping or adding a disk. To check the status of the rebalance, do the following:
- The rebalance operation may have been successfully run. Check the Oracle ASM alert logs to confirm.
- The rebalance operation may be currently running. Check the GV$ASM_OPERATION view to determine if the rebalance operation is still running.
- The rebalance operation may have failed. Check the ERROR_CODE column of the GV$ASM_OPERATION view to determine if the rebalance operation failed.
- Rebalance operations from multiple disk groups can be done on different Oracle ASM instances in the same cluster if the physical disk being replaced contains ASM disks from multiple disk groups. One Oracle ASM instance can run one rebalance operation at a time. If all Oracle ASM instances are busy, then rebalance operations will be queued.
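The cases above can be told apart from the rows of GV$ASM_OPERATION. The sketch below is a hedged illustration in Python: the query text uses standard columns of that view, but the row format (dictionaries keyed by upper-case column name) and the function name are assumptions, and you would fetch the rows with whatever database client you already use.

```python
from typing import Iterable, Mapping, Optional

# Query to run on an Oracle ASM instance; fetching the rows is left to your client.
REBALANCE_QUERY = (
    "SELECT inst_id, group_number, operation, state, power, "
    "sofar, est_work, est_minutes, error_code "
    "FROM gv$asm_operation"
)


def rebalance_summary(rows: Iterable[Mapping[str, Optional[str]]]) -> str:
    """Classify rebalance activity as 'none', 'failed', 'running', or 'waiting'."""
    rebal = [r for r in rows if r.get("OPERATION") == "REBAL"]
    if not rebal:
        # No rows: finished or never started; confirm in the ASM alert log.
        return "none"
    if any(r.get("ERROR_CODE") or r.get("STATE") == "ERRS" for r in rebal):
        return "failed"
    if any(r.get("STATE") == "RUN" for r in rebal):
        return "running"
    # WAIT and any other non-running state are lumped together here.
    return "waiting"
```

A result of "none" does not by itself prove success, which is why the alert log check in the first bullet still applies.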
Doc IDs referred:
HALRT-02011: Flash disk poor performance status (Doc ID 1206015.1)
Steps to shut down or reboot an Exadata storage cell without affecting ASM (Doc ID 1188080.1)