FAQ
1. How do I make sure that a resource is ready?
Assume that you want to view the resource status of a cluster. Run the following command:
kubectl get obclusters.oceanbase.oceanbase.com test -n oceanbase
If the status is running
in the response, the resource is ready.
# desired output
NAME STATUS AGE
test running 6m2s
2. How do I view the O&M status of a resource?
Assume that you want to view the resource status of a cluster. Run the following command:
kubectl get obclusters.oceanbase.oceanbase.com test -n oceanbase -o yaml
You can check the status and progress of O&M tasks based on values of parameters in the operationContext
section in the response.
status:
image: oceanbase/oceanbase-cloud-native:4.2.0.0-101000032023091319
obzones:
- status: delete observer
zone: obcluster-1-zone1
- status: delete observer
zone: obcluster-1-zone2
- status: delete observer
zone: obcluster-1-zone3
operationContext:
failureRule:
failureStatus: running
failureStrategy: retry over
retryCount: 0
idx: 2
name: modify obzone replica
targetStatus: running
task: wait obzone topology match
taskId: c04aeb28-01e7-4f85-b390-8d855b9f30e3
taskStatus: running
tasks:
- modify obzone replica
- wait obzone topology match
- wait obzone running
parameters: []
status: modify obzone replica
3. How do I do troubleshooting for ob-operator and OceanBase?
- Generally, you need to first analyze the logs of ob-operator to locate an error. Run the following command to view the logs of ob-operator:
kubectl logs oceanbase-controller-manager-86cfc8f7bf-js95z -n oceanbase-system -c manager | less
- View the logs of the OBServer node
# Log on to the container of the OBServer node.
kubectl exec -it obcluster-1-zone1-8ab645f4d0f9 -n oceanbase -c observer -- bash
# The directory where the log files are located.
cd /home/admin/oceanbase/log
4. How do I fix a "stuck" resource in ob-operator?
OBResourceRescue
is a feature that is only available in ob-operator 2.2.0 and later. You can fix a "stuck" resource through patching the resource with K8s API before 2.2.0. Refer to kubectl patch for more information.
As ob-operator uses a state machine and task flow to manage custom resources (CRs) and their O&M operations, there may be situations where CRs are in an unexpected state. This could include continuously retrying a task flow that is bound to fail, failing to delete a resource, or mistakenly deleting a resource that needs to be recovered. In cases where a CR cannot be restored to normal through regular operations, you can use the OBResourceRescue
resource to rescue the problematic CR. The OBResourceRescue
resource includes four types of operations: reset
, delete
, retry
, and skip
.
A typical OBResourceRescue
CR configuration is as follows:
apiVersion: oceanbase.oceanbase.com/v1alpha1
kind: OBResourceRescue
metadata:
generateName: rescue-reset- # generateName needs to be used with kubectl create -f
spec:
type: reset
targetKind: OBCluster
targetResName: test
targetStatus: running # The target status needs to be filled in when the type is reset
The key configurations are explained in the following table:
Configuration item | Optional values | Description |
---|---|---|
type | reset , delete , retry , skip | The type of the resource rescue action |
targetKind | OBCluster , OBZone , OBTenant , and other CRD kind managed by ob-operator | The kind of the resource to be rescued |
targetResName | / | The name of the resource to be rescued |
targetStatus | / | This field needs to be filled in when the type is reset, indicating the status of the resource after the reset |
Reset
The configuration example of the typical CR above is a reset type of resource rescue. After creating this resource in the K8s cluster using the kubectl create -f
command, ob-operator sets the status.status
of the resource whose kind is OBCluster
and name is test
to running
(the targetStatus
set in the configuration file), and sets the status.operationContext
of the resource to empty.
Delete
The configuration example of the delete type of rescue action is as follows. After creating this resource in the cluster, ob-operator clears the finalizers
field of the target resource and sets the deletionTimestamp
of the resource to the current time.
# ...
spec:
type: delete
targetKind: OBCluster
targetResName: test
Retry
The configuration example of the retry type of rescue action is as follows. After creating this resource in the cluster, ob-operator sets the status.operationContext.retryCount
of the target resource to 0 and sets the status.operationContext.taskStatus
to pending
. Resources in this state will retry the current task.
# ...
spec:
type: retry
targetKind: OBCluster
targetResName: test
Skip
The configuration example of the skip type of rescue action is as follows. After creating this resource in the cluster, ob-operator directly sets the status.operationContext.taskStatus
of the target resource to successful
. After receiving this message, the task manager will execute the next task in the tasks
field.
# ...
spec:
type: skip
targetKind: OBCluster
targetResName: test