Full Log/Data Disk

Business and Database Symptoms

Full data disk

OceanBase Cloud Platform (OCP) reports a data disk alert, such as the alert for ob_host_data_path_disk_percent.

The business application reports an error. OBServer nodes may fail to perform minor compactions, major compactions, or memory release, resulting in a failure to write data to the cluster.

Full log disk

OCP reports a log disk alert, such as ob_host_log_disk_percent_over_threshold.

The business application reports an error. OBServer nodes may not function properly, affecting the election process.

The methods described in the preceding topics are applicable when other programs generate a large amount of data in the log disk or data disk of an OBServer node. Do not manually delete log files or data files on OBServer nodes. Otherwise, the system may fail to be restored.

Troubleshooting Approach

Full data disk

Connect to the oceanbase database through the sys tenant and query information about the data disk usage of each node in the cluster, including the total allocated size, occupied size, and remaining size.

SELECT b.zone, a.svr_ip, a.svr_port,
        ROUND(a.total_size/1024/1024/1024,3) total_size_GB,
        ROUND((a.total_size-a.free_size)/1024/1024/1024,3) used_size_GB,
        ROUND(a.free_size/1024/1024/1024,3) free_size_GB,
        ROUND((a.total_size-a.free_size)/total_size,2)*100 disk_used_percentage
FROM oceanbase.__all_virtual_disk_stat a
INNER JOIN oceanbase.__all_server b
  ON a.svr_ip=b.svr_ip AND a.svr_port=b.svr_port
ORDER BY zone\G

*************************** 1. row ***************************
                zone: zone1
              svr_ip: 1.2.3.4
            svr_port: 22602
       total_size_GB: 8.000
        used_size_GB: 0.307
        free_size_GB: 7.693
disk_used_percentage: 4.00

If the disk_used_percentage value exceeds the default alert threshold, which is 97%, the alert reported by OCP is valid.
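The percentage check above can be reproduced by hand. The following is a minimal Python sketch, not part of OceanBase or OCP: the function names are hypothetical, the sizes are the byte values behind total_size and free_size, and the 97% threshold is the default quoted above.

```python
GIB = 1024 ** 3
DEFAULT_ALERT_THRESHOLD = 97  # percent; the default threshold quoted in the text


def disk_used_percentage(total_size: int, free_size: int) -> int:
    """Whole-number percentage of the data file in use (sizes in bytes).

    Mirrors ROUND((total - free) / total, 2) * 100 from the query.
    Python's round() resolves exact .5 ties differently from SQL ROUND.
    """
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    return round((total_size - free_size) / total_size * 100)


def alert_is_valid(total_size: int, free_size: int,
                   threshold: int = DEFAULT_ALERT_THRESHOLD) -> bool:
    """True when usage strictly exceeds the alert threshold."""
    return disk_used_percentage(total_size, free_size) > threshold


# Sample row from the output above: 8.000 GB total, 7.693 GB free.
total = 8 * GIB
free = int(7.693 * GIB)
print(disk_used_percentage(total, free), alert_is_valid(total, free))
```

For the sample row the usage is about 4%, well below the threshold, so an alert for that node would not be confirmed by this check.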

Full log disk

The issue of a full log disk rarely occurs because expired logs are recycled.

Check whether the alert reported by OCP is valid on the affected node. The cluster and host information for the node can be found in the alert.

Log in to the sys tenant and connect to the oceanbase database. Run the following command to check whether the log disk usage in a tenant exceeds the threshold.

select a.svr_ip,a.svr_port,a.tenant_id,b.tenant_name,
    CAST(a.data_disk_in_use/1024/1024/1024 as DECIMAL(15,2)) data_disk_use_G, 
    CAST(a.log_disk_size/1024/1024/1024 as DECIMAL(15,2)) log_disk_size, 
    CAST(a.log_disk_in_use/1024/1024/1024 as DECIMAL(15,2)) log_disk_use_G,
    log_disk_in_use/log_disk_size 'usage%'
from oceanbase.__all_virtual_unit a,dba_ob_tenants b 
where a.tenant_id=b.tenant_id\G

*************************** 1. row ***************************
         svr_ip: 1.2.3.4
       svr_port: 22602
      tenant_id: 1
    tenant_name: sys
data_disk_use_G: 0.10
  log_disk_size: 2.00
 log_disk_use_G: 1.54
         usage%: 0.7693
*************************** 2. row ***************************
         svr_ip: 1.2.3.4
       svr_port: 22602
      tenant_id: 1001
    tenant_name: META$1002
data_disk_use_G: 0.07
  log_disk_size: 0.60
 log_disk_use_G: 0.43
         usage%: 0.7174
*************************** 3. row ***************************
         svr_ip: 1.2.3.4
       svr_port: 22602
      tenant_id: 1002
    tenant_name: mysql
data_disk_use_G: 0.05
  log_disk_size: 5.40
 log_disk_use_G: 3.13
         usage%: 0.5789
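To compare the per-tenant rows against a threshold, the ratio log_disk_in_use / log_disk_size can be checked programmatically. The sketch below is illustrative only: the class and function names are hypothetical, the rows are copied from the sample output above, and the 0.75 threshold is an arbitrary value chosen for the example, not an OceanBase default.

```python
from typing import List, NamedTuple


class TenantLogUsage(NamedTuple):
    tenant_name: str
    log_disk_size_gb: float
    log_disk_use_gb: float

    @property
    def usage(self) -> float:
        """Fraction of the tenant's log disk that is in use."""
        return self.log_disk_use_gb / self.log_disk_size_gb


def tenants_over(rows: List[TenantLogUsage], threshold: float) -> List[str]:
    """Names of tenants whose log disk usage ratio exceeds the threshold."""
    return [r.tenant_name for r in rows if r.usage > threshold]


# Rows from the sample output. Ratios computed from the rounded GB values
# differ slightly from the usage% column, which is computed from bytes.
rows = [
    TenantLogUsage("sys", 2.00, 1.54),
    TenantLogUsage("META$1002", 0.60, 0.43),
    TenantLogUsage("mysql", 5.40, 3.13),
]
print(tenants_over(rows, 0.75))  # 0.75 is an illustrative threshold
```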

Troubleshooting Procedure

Full data disk

Check whether the operating system disk that stores data files has enough remaining capacity for allocation.

If so, modify the cluster-level parameter datafile_size or datafile_disk_percentage to increase the available capacity of the data disk for the database.

Run the following commands to check the values of the datafile_size and datafile_disk_percentage parameters:

show parameters like 'datafile_size'\G
*************************** 1. row ***************************
         zone: zone1
     svr_type: observer
       svr_ip: 1.2.3.4
     svr_port: 22602
         name: datafile_size
    data_type: CAPACITY
        value: 2G
         info: size of the data file. Range: [0, +∞)
      section: SSTABLE
        scope: CLUSTER
       source: DEFAULT
   edit_level: DYNAMIC_EFFECTIVE
default_value: 0M
    isdefault: 0
1 row in set (0.006 sec)

The default value of the datafile_size parameter is 0M. If the default value is used, the available capacity of the data disk is controlled by the datafile_disk_percentage parameter.

show parameters like 'datafile_disk_percentage'\G
*************************** 1. row ***************************
         zone: zone1
     svr_type: observer
       svr_ip: 11.158.31.20
     svr_port: 22602
         name: datafile_disk_percentage
    data_type: INT
        value: 0
         info: the percentage of disk space used by the data files. Range: [0,99] in integer
      section: SSTABLE
        scope: CLUSTER
       source: DEFAULT
   edit_level: DYNAMIC_EFFECTIVE
default_value: 0
    isdefault: 1
1 row in set (0.006 sec)

The default value of the datafile_disk_percentage parameter is 0. If the default value is used, the system automatically calculates the percentage of the total disk space occupied by the data file in Shared-Nothing (SN) mode or local cache in Shared-Storage (SS) mode based on whether the logs and data share the same disk.

If the same disk is shared, the percentage of the total disk space occupied by data files or local cache is 60%.
If the disk is not shared, the percentage of the total disk space occupied by data files or local cache is 90%.

If this parameter and the datafile_size parameter are both specified, the value of the datafile_size parameter prevails.
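The precedence between the two parameters can be summarized in a short Python sketch. It restates the rules described above and is not the server's actual implementation; the function and argument names are hypothetical, and it does not model any other limits the server may apply.

```python
GIB = 1024 ** 3


def effective_datafile_bytes(disk_total: int,
                             datafile_size: int = 0,
                             datafile_disk_percentage: int = 0,
                             shares_disk_with_logs: bool = False) -> int:
    """Data file capacity in bytes under the rules described in the text."""
    # A nonzero datafile_size prevails over the percentage setting.
    if datafile_size > 0:
        return datafile_size
    if datafile_disk_percentage > 0:
        pct = datafile_disk_percentage
    else:
        # Both parameters at their default of 0: 60% when logs and data
        # share a disk, 90% when they are on separate disks.
        pct = 60 if shares_disk_with_logs else 90
    return disk_total * pct // 100


# A 100 GB disk with default settings, with and without a shared log disk.
print(effective_datafile_bytes(100 * GIB) // GIB)
print(effective_datafile_bytes(100 * GIB, shares_disk_with_logs=True) // GIB)
```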

Run the following commands to set the parameters to greater values:

-- Set the size of the data file to 80 GB.
obclient> ALTER SYSTEM SET datafile_size = '80G';

-- Alternatively, set the percentage of the total disk space occupied by the data file to 90%.
obclient> ALTER SYSTEM SET datafile_disk_percentage = 90;

OceanBase Database supports auto scale-out of disk space available for data files based on the actual situation. We recommend that you set the datafile_next and datafile_maxsize parameters for auto scale-out when you deploy a cluster. For more information, see Configure automatic scale-out of disk space for data files.

In a production environment, we recommend that the size of the log disk be at least three times the memory size of the host. To avoid I/O performance issues, we recommend that you mount the data directory and the log directory to different disks.

If the system disk has no remaining capacity for allocation, add nodes to the zone and migrate resource units to evenly distribute data across nodes.

You can perform GUI-based operations in OCP to add nodes to the zone and migrate resource units. For more information, see Migrate a resource unit from an OceanBase Database tenant.

If an OBServer node hosts resource units of multiple tenants, migrate the resource units to evenly distribute data across nodes.

If an OBServer node hosts only one user tenant, purge the recycle bin to delete redundant data.

Full log disk

Check whether the log disk of the cluster has remaining capacity for allocation. If the value of the log_disk_free column in the following query result is 0, the log disk has no remaining capacity for allocation.

select zone,concat(SVR_IP,':',SVR_PORT) observer,
    cpu_capacity_max cpu_total,cpu_assigned_max cpu_assigned,
    cpu_capacity_max-cpu_assigned_max as cpu_free,
    round(memory_limit/1024/1024/1024,2) as memory_total,
    round((memory_limit-mem_capacity)/1024/1024/1024,2) as system_memory,
    round(mem_assigned/1024/1024/1024,2) as mem_assigned,
    round((mem_capacity-mem_assigned)/1024/1024/1024,2) as memory_free,
    round(log_disk_capacity/1024/1024/1024,2) as log_disk_capacity,
    round(log_disk_assigned/1024/1024/1024,2) as log_disk_assigned,
    round((log_disk_capacity-log_disk_assigned)/1024/1024/1024,2) as log_disk_free,
    round((data_disk_capacity/1024/1024/1024),2) as data_disk,
    round((data_disk_in_use/1024/1024/1024),2) as data_disk_used,
    round((data_disk_capacity-data_disk_in_use)/1024/1024/1024,2) as data_disk_free
from gv$ob_servers\G
*************************** 1. row ***************************
             zone: zone1
         observer: 11.158.31.20:22602
        cpu_total: 8
     cpu_assigned: 4
         cpu_free: 4
     memory_total: 8.00
    system_memory: 1.00
     mem_assigned: 3.00
      memory_free: 4.00
log_disk_capacity: 21.00
log_disk_assigned: 8.00
    log_disk_free: 13.00
        data_disk: 8.00
   data_disk_used: 0.27
   data_disk_free: 7.73
1 row in set (0.006 sec)

If the value of the log_disk_free column is not 0, the log disk has remaining capacity for allocation. In this case, you can set the log disk size to a greater value for the resource unit of the tenant.

ALTER RESOURCE UNIT unit_name LOG_DISK_SIZE = '10G';

If the value of the log_disk_free column is 0, the log disk space allocated to the cluster is used up. Run the df -h command to check whether all the capacity of the disk on which the log directory of the node is mounted has been allocated to the log directory.

If the disk has remaining capacity for allocation, you can modify the cluster-level parameter log_disk_size or log_disk_percentage to increase the log disk size available for the cluster. Then, run the ALTER RESOURCE UNIT command to increase the log disk size available for the tenant.

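-- Set the log disk size available for the cluster to 40 GB.
ALTER SYSTEM SET log_disk_size = '40G';

-- Alternatively, set the percentage of the total disk space occupied by log files to 90%.
ALTER SYSTEM SET log_disk_percentage = 90;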

If the operating system disk has no remaining capacity for allocation, stop the write operations to the abnormal tenant. Otherwise, the space temporarily released from the clog disk can be quickly used up again. At the same time, modify the log_disk_utilization_limit_threshold parameter to increase the threshold of the clog disk usage from 95% to 98%.

ALTER SYSTEM SET log_disk_utilization_limit_threshold = 98;

Wait until the clogs have been synchronized. The database system then recycles the capacity automatically, and the cluster recovers without further intervention.
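The branching in the full log disk procedure can be summarized as a small decision function. This is a sketch for orientation only: the function name, its arguments, and the action strings are hypothetical, and the two inputs stand for the log_disk_free value from the query and the result of the df -h check.

```python
from typing import List


def log_disk_actions(log_disk_free_gb: float,
                     os_disk_has_unallocated_space: bool) -> List[str]:
    """Ordered actions for a full log disk, following the procedure above."""
    if log_disk_free_gb > 0:
        # The cluster still has unassigned log disk space.
        return ["increase LOG_DISK_SIZE for the tenant's resource unit"]
    if os_disk_has_unallocated_space:
        # The cluster allocation is used up, but the OS disk has room.
        return [
            "increase log_disk_size or log_disk_percentage for the cluster",
            "increase LOG_DISK_SIZE for the tenant's resource unit",
        ]
    # No capacity anywhere: relieve pressure and let clogs be recycled.
    return [
        "stop write operations to the abnormal tenant",
        "raise log_disk_utilization_limit_threshold from 95 to 98",
        "wait for clogs to synchronize and be recycled",
    ]


# Sample node from the output above: log_disk_free is 13.00 GB.
for step in log_disk_actions(13.00, os_disk_has_unallocated_space=True):
    print(step)
```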
