Skip to main content

FAQ

1. How do I make sure that a resource is ready?

Assume that you want to view the resource status of a cluster. Run the following command:

kubectl get obclusters.oceanbase.oceanbase.com test -n oceanbase

If the status is running in the response, the resource is ready.

# desired output
NAME STATUS AGE
test running 6m2s

2. How do I view the O&M status of a resource?

Assume that you want to view the resource status of a cluster. Run the following command:

kubectl get obclusters.oceanbase.oceanbase.com test -n oceanbase -o yaml

You can check the status and progress of O&M tasks based on values of parameters in the operationContext section in the response.

status:
image: oceanbase/oceanbase-cloud-native:4.2.0.0-101000032023091319
obzones:
- status: delete observer
zone: obcluster-1-zone1
- status: delete observer
zone: obcluster-1-zone2
- status: delete observer
zone: obcluster-1-zone3
operationContext:
failureRule:
failureStatus: running
failureStrategy: retry over
retryCount: 0
idx: 2
name: modify obzone replica
targetStatus: running
task: wait obzone topology match
taskId: c04aeb28-01e7-4f85-b390-8d855b9f30e3
taskStatus: running
tasks:
- modify obzone replica
- wait obzone topology match
- wait obzone running
parameters: []
status: modify obzone replica

3. How do I do troubleshooting for ob-operator and OceanBase?

  • Generally, you need to first analyze the logs of ob-operator to locate an error. Run the following command to view the logs of ob-operator:
kubectl logs oceanbase-controller-manager-86cfc8f7bf-js95z -n oceanbase-system -c manager  | less
  • View the logs of the OBServer node
# Log on to the container of the OBServer node.
kubectl exec -it obcluster-1-zone1-8ab645f4d0f9 -n oceanbase -c observer -- bash

# The directory where the log files are located.
cd /home/admin/oceanbase/log

4. How do I fix a "stuck" resource in ob-operator?

tip

OBResourceRescue is a feature that is only available in ob-operator 2.2.0 and later. You can fix a "stuck" resource through patching the resource with K8s API before 2.2.0. Refer to kubectl patch for more information.

As ob-operator uses a state machine and task flow to manage custom resources (CRs) and their O&M operations, there may be situations where CRs are in an unexpected state. This could include continuously retrying a task flow that is bound to fail, failing to delete a resource, or mistakenly deleting a resource that needs to be recovered. In cases where a CR cannot be restored to normal through regular operations, you can use the OBResourceRescue resource to rescue the problematic CR. The OBResourceRescue resource includes four types of operations: reset, delete, retry, and skip.

A typical OBResourceRescue CR configuration is as follows:

apiVersion: oceanbase.oceanbase.com/v1alpha1
kind: OBResourceRescue
metadata:
generateName: rescue-reset- # generateName needs to be used with kubectl create -f
spec:
type: reset
targetKind: OBCluster
targetResName: test
targetStatus: running # The target status needs to be filled in when the type is reset

The key configurations are explained in the following table:

Configuration itemOptional valuesDescription
typereset, delete, retry, skipThe type of the resource rescue action
targetKindOBCluster, OBZone, OBTenant, and other CRD kind managed by ob-operatorThe kind of the resource to be rescued
targetResName/The name of the resource to be rescued
targetStatus/This field needs to be filled in when the type is reset, indicating the status of the resource after the reset

Reset

The configuration example of the typical CR above is a reset type of resource rescue. After creating this resource in the K8s cluster using the kubectl create -f command, ob-operator sets the status.status of the resource whose kind is OBCluster and name is test to running (the targetStatus set in the configuration file), and sets the status.operationContext of the resource to empty.

Delete

The configuration example of the delete type of rescue action is as follows. After creating this resource in the cluster, ob-operator clears the finalizers field of the target resource and sets the deletionTimestamp of the resource to the current time.

# ...
spec:
type: delete
targetKind: OBCluster
targetResName: test

Retry

The configuration example of the retry type of rescue action is as follows. After creating this resource in the cluster, ob-operator sets the status.operationContext.retryCount of the target resource to 0 and sets the status.operationContext.taskStatus to pending. Resources in this state will retry the current task.

# ...
spec:
type: retry
targetKind: OBCluster
targetResName: test

Skip

The configuration example of the skip type of rescue action is as follows. After creating this resource in the cluster, ob-operator directly sets the status.operationContext.taskStatus of the target resource to successful. After receiving this message, the task manager will execute the next task in the tasks field.

# ...
spec:
type: skip
targetKind: OBCluster
targetResName: test