Friday, January 10, 2020

Flash Disk Replacement due to poor performance in Exadata X6-2 Environment

In our Exadata X6-2 environment, one of the flash disks was flagged with poor performance status.

To identify a poor performance flash disk, use the following command:

CellCLI> LIST PHYSICALDISK WHERE disktype=flashdisk AND status='warning - poor performance' DETAIL

This flash disk is in poor performance status.
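
As a rough illustration, a flash disk listing in the three-column form shown later in this post (name, serial number, status) can be scanned for disks whose status is anything other than normal. This is only a sketch: the function name is hypothetical, and the 'warning - poor performance' status text used in the usage note is an assumption about what an affected disk reports.

```python
def flash_disks_needing_attention(listing: str) -> dict[str, str]:
    """Map flash disk name -> status for every flash disk that is not 'normal'."""
    flagged = {}
    for raw in listing.splitlines():
        # Each data row is: name, serial number, status (status may contain spaces).
        fields = raw.strip().split(None, 2)
        if len(fields) != 3 or not fields[0].startswith("FLASH_"):
            continue  # skip blank lines and the echoed CellCLI prompt line
        name, _serial, status = fields
        if status.lower() != "normal":
            flagged[name] = status
    return flagged
```

Fed a listing in which FLASH_4_1 reports 'warning - poor performance', the function returns only that disk; an all-normal listing yields an empty dictionary.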

Recommended Action  

The flash disk has entered poor performance status. A white cell locator LED has been lit to help locate the affected cell. Please replace the flash disk.
If the flash disk is used for flash cache, then flash cache will be disabled on this disk thus reducing the effective flash cache size. If the flash disk is used for flash log, then flash log will be disabled on this disk thus reducing the effective flash log size. If the flash disk is used for grid disks, then Oracle ASM rebalance will automatically restore the data redundancy.

Sun Oracle Exadata Storage Server is equipped with four PCIe cards, located in PCI slots 1, 2, 4, and 5. Each card has four flash disks (FDOMs), for a total of 16 flash disks. The PCIe cards are not hot-pluggable, so the Exadata cell must be powered down before replacing flash disks or cards. (On our X6-2 cell, the flash disk listing shown later in this post contains four flash disks, one per slot: FLASH_1_1, FLASH_2_1, FLASH_4_1, and FLASH_5_1.)
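
The flash disk names in the listing later in this post (FLASH_1_1, FLASH_2_1, FLASH_4_1, FLASH_5_1) line up with PCI slots 1, 2, 4, and 5, which suggests the name encodes the PCI slot and the FDOM number. Assuming that FLASH_<slot>_<fdom> convention holds, a small helper (hypothetical name) can turn a disk name into the location to hand to the data center team:

```python
import re

# Assumed naming convention: FLASH_<PCI slot>_<FDOM number>
_FLASH_NAME = re.compile(r"^FLASH_(\d+)_(\d+)$")


def flash_location(name: str) -> tuple[int, int]:
    """Return (pci_slot, fdom_number) parsed from a flash disk name."""
    match = _FLASH_NAME.match(name.strip())
    if match is None:
        raise ValueError(f"not a flash disk name: {name!r}")
    return int(match.group(1)), int(match.group(2))
```

For example, flash_location("FLASH_4_1") gives slot 4, FDOM 1. Confirm the physical location against the disk's detail output before pulling any hardware.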

The data center team therefore replaced the flash disk in coordination with us (the DBAs), using the steps below.

1. Shut down the cell.

The following procedure describes how to power down the Exadata cell. First, run the following command to check whether there are offline disks on other cells that are mirrored with disks on this cell:

CellCLI> LIST GRIDDISK ATTRIBUTES name WHERE asmdeactivationoutcome != 'Yes'

If any grid disks are returned, then it is not safe to take the storage server offline because proper Oracle ASM disk group redundancy will not be intact. Taking the storage server offline when one or more grid disks are in this state will cause Oracle ASM to dismount the affected disk group, causing the databases to shut down abruptly.
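
If this check is scripted, the decision reduces to "did the command return any rows?". A minimal sketch, assuming the command output has already been captured as text (for instance via cellcli -e) and contains one grid disk name per line; both function names are hypothetical:

```python
def unsafe_grid_disks(cellcli_output: str) -> list[str]:
    """Return the grid disk names reported by the asmdeactivationoutcome check."""
    names = []
    for raw in cellcli_output.splitlines():
        line = raw.strip()
        # Ignore blank lines and an echoed prompt line, if the capture has one.
        if not line or line.startswith("CellCLI"):
            continue
        names.append(line.split()[0])
    return names


def safe_to_take_offline(cellcli_output: str) -> bool:
    """The cell is safe to take offline only when no grid disks are returned."""
    return not unsafe_grid_disks(cellcli_output)
```

An empty result means every grid disk reported asmdeactivationoutcome = 'Yes'; any returned name is a reason to stop and investigate before continuing.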

Once it is safe to take the storage server offline, inactivate all the grid disks using the following command:

CellCLI> ALTER GRIDDISK ALL INACTIVE

The preceding command will complete once all disks are inactive and offline. Depending on the storage server activity, it may take several minutes for this command to complete.

Verify that all grid disks are INACTIVE, so that the storage server can be shut down safely, by running the following command:

CellCLI> LIST GRIDDISK

If all grid disks are INACTIVE, then the storage server can be shut down without affecting database availability.
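
The same verification can be sketched in Python, assuming the LIST GRIDDISK output is available as text with two columns per row (grid disk name, then status). The function name is hypothetical.

```python
def grid_disks_not_inactive(listing: str) -> list[str]:
    """Return names of grid disks whose status is anything other than inactive."""
    remaining = []
    for raw in listing.splitlines():
        fields = raw.split()
        # Data rows have exactly two columns: grid disk name and status.
        if len(fields) == 2 and fields[1].lower() != "inactive":
            remaining.append(fields[0])
    return remaining
```

An empty list means every parsed grid disk is inactive. Note that an empty listing also yields an empty list, so a real script should first check that the listing contains the expected disks.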

Stop the cell services using the following command:

CellCLI> ALTER CELL SHUTDOWN SERVICES ALL

Shut down the cell.

2. Replace the failed flash disk based on the PCI number and FDOM number.

3. Power up the cell. The cell services will be started automatically.

4. Bring all grid disks online using the following command:

CellCLI> ALTER GRIDDISK ALL ACTIVE

5. Verify that all grid disks have been successfully put online using the following command:

CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus

        Wait until asmmodestatus changes from SYNCING to ONLINE for all grid disks.
        The following is an example of the output:


CellCLI> list physicaldisk where disktype=flashdisk
         FLASH_1_1       S2T7NA0J304430  normal
         FLASH_2_1       S2T7NAAH409309  normal
         FLASH_4_1       S2T7NA0J304420  normal
         FLASH_5_1       S2T7NA0JB00348  normal

CellCLI> list griddisk attributes name, asmmodestatus
         DATAC1_CD_00_ABC   ONLINE
         DATAC1_CD_01_ABC   ONLINE
         DATAC1_CD_02_ABC   ONLINE
         DATAC1_CD_03_ABC   ONLINE
         DATAC1_CD_04_ABC   ONLINE
         DATAC1_CD_05_ABC   ONLINE
         DATAC1_CD_06_ABC   ONLINE
         DATAC1_CD_07_ABC   ONLINE
         DATAC1_CD_08_ABC   ONLINE
         DATAC1_CD_09_ABC   ONLINE
         DATAC1_CD_10_ABC   ONLINE
         DATAC1_CD_11_ABC   ONLINE
         DBFS_DG_CD_02_ABC  ONLINE
         DBFS_DG_CD_03_ABC  ONLINE
         DBFS_DG_CD_04_ABC  ONLINE
         DBFS_DG_CD_05_ABC  ONLINE
         DBFS_DG_CD_06_ABC  ONLINE
         DBFS_DG_CD_07_ABC  ONLINE
         DBFS_DG_CD_08_ABC  ONLINE
         DBFS_DG_CD_09_ABC  ONLINE
         DBFS_DG_CD_10_ABC  ONLINE
         DBFS_DG_CD_11_ABC  ONLINE
         RECOC1_CD_00_ABC   ONLINE
         RECOC1_CD_01_ABC   ONLINE
         RECOC1_CD_02_ABC   ONLINE
         RECOC1_CD_03_ABC   ONLINE
         RECOC1_CD_04_ABC   ONLINE
         RECOC1_CD_05_ABC   ONLINE
         RECOC1_CD_06_ABC   ONLINE
         RECOC1_CD_07_ABC   ONLINE
         RECOC1_CD_08_ABC   ONLINE
         RECOC1_CD_09_ABC   ONLINE
         RECOC1_CD_10_ABC   ONLINE
         RECOC1_CD_11_ABC   ONLINE


Oracle ASM synchronization is only complete when all grid disks show attribute asmmodestatus=ONLINE. Before taking another storage server offline, Oracle ASM synchronization must complete on the restarted Oracle Exadata Storage Server. If synchronization is not complete, then the check performed on another storage server will fail.
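
Waiting for synchronization is a natural candidate for a polling loop. The sketch below assumes a caller-supplied fetch function (hypothetical) that returns the text of LIST GRIDDISK ATTRIBUTES name, asmmodestatus; how that text is obtained, the polling interval, and the poll limit are all assumptions to adapt.

```python
import time
from typing import Callable


def asm_mode_status(listing: str) -> dict[str, str]:
    """Parse 'name status' rows into a {grid disk name: status} mapping."""
    statuses = {}
    for raw in listing.splitlines():
        fields = raw.split()
        if len(fields) == 2:  # skips blank lines and the echoed prompt line
            statuses[fields[0]] = fields[1].upper()
    return statuses


def all_online(listing: str) -> bool:
    """True only if at least one disk was parsed and every disk is ONLINE."""
    statuses = asm_mode_status(listing)
    return bool(statuses) and all(s == "ONLINE" for s in statuses.values())


def wait_until_online(fetch: Callable[[], str],
                      interval_s: float = 60.0,
                      max_polls: int = 120) -> bool:
    """Poll until all grid disks are ONLINE; return False if polls run out."""
    for _ in range(max_polls):
        if all_online(fetch()):
            return True
        time.sleep(interval_s)
    return False
```

Requiring at least one parsed disk avoids reporting success on an empty or failed capture. Only after this returns True would it make sense to start the safety check on another storage server.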

The new flash disk will be automatically used by the system. If the flash disk is used for flash cache, then the effective cache size will increase. If the flash disk is used for grid disks, then the grid disks will be recreated on the new flash disk. If those grid disks were part of an Oracle ASM disk group, then they will be added back to the disk group and the data will be rebalanced on them based on the disk group redundancy and the asm_power_limit parameter.

Oracle ASM rebalance occurs when dropping or adding a disk. To check the status of the rebalance, do the following:

    • The rebalance operation may have been successfully run. Check the Oracle ASM alert logs to confirm.
    • The rebalance operation may be currently running. Check the GV$ASM_OPERATION view to determine if the rebalance operation is still running.
    • The rebalance operation may have failed. Check the ERROR_CODE column of the GV$ASM_OPERATION view to determine if the rebalance operation failed.
    • Rebalance operations from multiple disk groups can be done on different Oracle ASM instances in the same cluster if the physical disk being replaced contains ASM disks from multiple disk groups. One Oracle ASM instance can run one rebalance operation at a time. If all Oracle ASM instances are busy, then rebalance operations will be queued.
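
The three cases above can be told apart from the rows of GV$ASM_OPERATION. The sketch below assumes the rows have already been fetched by whatever database driver is in use and converted to dictionaries keyed by upper-case column name; the query text, the function name, and the use of the STATE value ERRS and the ERROR_CODE column as failure indicators reflect my reading of the view and should be verified against your database version.

```python
# Illustrative query; run it from an ASM or database instance with access to the view.
REBALANCE_QUERY = """
SELECT inst_id, group_number, operation, state, power,
       sofar, est_work, est_minutes, error_code
  FROM gv$asm_operation
"""


def rebalance_status(rows: list[dict]) -> str:
    """Classify fetched GV$ASM_OPERATION rows as 'idle', 'running', or 'failed'."""
    if not rows:
        # No rows: nothing is running now; the ASM alert log confirms completion.
        return "idle"
    if any(row.get("ERROR_CODE") or row.get("STATE") == "ERRS" for row in rows):
        return "failed"
    return "running"
```

An "idle" result only says that no operation is in progress at that moment, so the ASM alert log remains the place to confirm that the rebalance actually completed.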

Doc IDs referenced:

HALRT-02011: Flash disk poor performance status (Doc ID 1206015.1)

Steps to shut down or reboot an Exadata storage cell without affecting ASM (Doc ID 1188080.1)
