Troubleshoot ODP Connection Issues

When OceanBase Database Proxy (ODP) is used, the execution link of a request is as follows: a client sends a request to ODP, ODP routes the request to the corresponding OBServer node, the OBServer node processes the request and returns a response to ODP, and ODP forwards the response to the client.

A disconnection may occur on the link in the following cases: the client does not receive a response from ODP due to a long request processing time, login fails due to incorrect cluster or tenant information, or an internal error occurs in ODP or OceanBase Database.

Troubleshooting Procedure

We recommend that you carefully read this section. You can handle most exceptions according to logs retrieved based on trace IDs.

The troubleshooting procedure for ODP is similar to that for OBServer nodes.

For example, if a connection error occurs, check the error log.

[xiaofeng.lby@sqaobnoxdn011161204091.sa128 /home/xiaofeng.lby]
$obclient -h 127.0.0.1 -P2883 -uroot@sys#xiaofeng_91_435 -Dtest
ERROR 4669 (HY000): cluster not exist

This example is deliberately simple: the returned error "cluster not exist" already tells you what the problem is.

If you are not sure whether the problem is caused by ODP, you can run the grep "ret=-4669" * command in the /oceanbase/log directory. If the log directory contains many log files, replace the asterisk (*) with a more specific file name pattern such as observer.log.202412171219*, based on the actual situation.

[xiaofeng.lby@sqaobnoxdn011161204091.sa128 /home/xiaofeng.lby/oceanbase/log]
$sudo grep "ret=-4669" *

If the grep command does not find any error information in the OBServer log directory, the problem is probably caused by ODP. In this case, run the grep command in the /obproxy/log directory.

[xiaofeng.lby@sqaobnoxdn011161204091.sa128 /home/xiaofeng.lby/obproxy/log]
$grep "ret=-4669" *

# The output here is originally one line, and is manually split into multiple lines for display on the web page.
obproxy.log:[2024-12-17 14:34:14.024891]
WDIAG [PROXY.SM] setup_get_cluster_resource (ob_mysql_sm.cpp:1625)
[125907][Y0-00007F630AAA2A70] [lt=0] [dc=0]
cluster does not exist, this connection will disconnect
(sm_id=26403246, is_clustername_from_default=false, cluster_name=xiaofeng_91_435, ret=-4669)

You can see in the log that the cluster named xiaofeng_91_435 does not exist. Therefore, the disconnection occurs.
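The search order described above (OBServer logs first, then ODP logs) can be sketched in Python. This is a minimal sketch, not an OceanBase tool: the function names `grep_dir` and `find_error_source` are hypothetical, and the matching is a plain substring test like the grep commands shown, so a search for ret=-4669 would also match a longer code that starts with the same digits.

```python
from pathlib import Path


def grep_dir(log_dir, needle):
    """Return (file name, line) pairs for lines in log_dir that contain needle."""
    hits = []
    for path in sorted(Path(log_dir).iterdir()):
        if not path.is_file():
            continue
        # Log files may contain bytes that are not valid UTF-8, so decode leniently.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line in handle:
                if needle in line:
                    hits.append((path.name, line.rstrip("\n")))
    return hits


def find_error_source(observer_log_dir, obproxy_log_dir, error_code):
    """Look for ret=<error_code> in the OBServer logs first, then in the ODP logs."""
    needle = f"ret={error_code}"
    for component, log_dir in (("observer", observer_log_dir),
                               ("obproxy", obproxy_log_dir)):
        hits = grep_dir(log_dir, needle)
        if hits:
            return component, hits
    return None, []
```

With the directories from this example, the call would look like `find_error_source("/oceanbase/log", "/obproxy/log", -4669)`; the first element of the result names the component whose logs contain the error.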

You can also obtain the trace ID [Y0-00007F630AAA2A70] from the preceding log. You can run the grep command based on this trace ID to obtain all logs related to this operation.

[xiaofeng.lby@sqaobnoxdn011161204091.sa128 /home/xiaofeng.lby/obproxy/log]
$grep Y0-00007F630AAA2A70 *

obproxy_diagnosis.log:[2024-12-17 14:34:14.024938] [125907][Y0-00007F630AAA2A70] [LOGIN](trace_type="LOGIN_TRACE", connection_diagnosis={cs_id:278640, ss_id:0, proxy_session_id:0, server_session_id:0, client_addr:"127.0.0.1:9988", server_addr:"*Not IP address [0]*:0", cluster_name:"xiaofeng_91_435", tenant_name:"sys", user_name:"root", error_code:-4669, error_msg:"cluster does not exist", request_cmd:"OB_MYSQL_COM_LOGIN", sql_cmd:"OB_MYSQL_COM_LOGIN", req_total_time(us):196}{internal_sql:"", login_result:"failed"})

obproxy_error.log:2024-12-17 14:34:14.024960,xiaofeng_cluster_430_proxy,,,,xiaofeng_91_435:sys:,OB_MYSQL,,,OB_MYSQL_COM_LOGIN,,failed,-4669,,194us,0us,0us,0us,Y0-00007F630AAA2A70,,127.0.0.1:9988,,0,,cluster not exist,

obproxy.log:[2024-12-17 14:34:13.584801] INFO [PROXY.NET] accept (ob_mysql_session_accept.cpp:36) [125907][Y0-00007F630AAA2A70] [lt=0] [dc=0] [ObMysqlSessionAccept:main_event] accepted connection(netvc=0x7f630aa7d2e0, client_ip={127.0.0.1:9980})
...
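The grep output above lists matches file by file, not in time order. As a minimal sketch, the lines for one trace ID can be sorted into a timeline by the first timestamp found in each line. The function name `timeline` is hypothetical, and the timestamp pattern is an assumption based only on the sample lines shown here.

```python
import re

# Timestamp layout seen in the sample logs: 2024-12-17 14:34:14.024891
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}")


def timeline(grep_lines, trace_id):
    """Return the lines that mention trace_id, ordered by their first timestamp."""
    stamped = []
    for line in grep_lines:
        if trace_id not in line:
            continue
        match = TIMESTAMP.search(line)
        # Lines without a recognizable timestamp get an empty key and sort first.
        stamped.append((match.group(0) if match else "", line))
    # The fixed-width timestamp format sorts correctly as plain text.
    stamped.sort(key=lambda pair: pair[0])
    return [line for _, line in stamped]
```

Applied to the output above, the accepted-connection line from obproxy.log (14:34:13.584801) would come first, followed by the diagnosis and error log entries.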

After a disconnection occurs, ODP generates a disconnection log in the obproxy_diagnosis.log file to record detailed information about the disconnection. The following is a disconnection log that records a login failure due to an incorrect tenant name.

# The output here is originally one line, and is manually split into multiple lines for display on the web page.
[2023-08-23 20:11:08.567425]
[109316][Y0-00007F285BADB4E0] [CONNECTION]
(trace_type="LOGIN_TRACE",
connection_diagnosis={
cs_id:1031798792, ss_id:0, proxy_session_id:0, server_session_id:0,
client_addr:"10.10.10.1:58218", server_addr:"*Not IP address [0]*:0",
cluster_name:"undefined", tenant_name:"test", user_name:"root",
error_code:-4043,
error_msg:"dummy entry is empty, please check if the tenant exists", request_cmd:"COM_SLEEP", sql_cmd:"COM_LOGIN"}{internal_sql:""})

The "please check if the tenant exists" part of the error message points to the likely reason for the disconnection.

The rest of this topic is reference ("dictionary") content that you can consult when necessary. Skim it once to get a general idea of what it covers.

We recommend that you add this topic to your favorites first. If an error occurs, pull the error log and search for the corresponding solution based on the key information provided in the log.

The general fields in an obproxy_diagnosis log are described as follows:

  • LOG_TIME: the time when the log was recorded, which is 2023-08-23 20:11:08.567425 in this example.

  • TID: the ID of the thread, which is 109316 in this example.

  • TRACE_ID: the trace ID, which is Y0-00007F285BADB4E0 in this example. You can associate the log with other logs based on the trace ID.

  • CONNECTION: indicates that this log is related to connection diagnostics.

  • trace_type: the diagnostic type, which varies based on the cause of disconnection. Valid values:

    • LOGIN_TRACE: indicates that the disconnection is caused by a login failure.

    • SERVER_INTERNAL_TRACE: indicates that the disconnection is caused by an internal error in OceanBase Database.

    • PROXY_INTERNAL_TRACE: indicates that the disconnection is caused by an internal error in ODP.

    • CLIENT_VC_TRACE: indicates that the disconnection is actively initiated by the client.

    • SERVER_VC_TRACE: indicates that the disconnection is actively initiated by OceanBase Database.

    • TIMEOUT_TRACE: indicates that the disconnection is caused by an execution timeout of the ODP process.

  • CS_ID: the internal ID used by ODP to identify the client connection.

  • SS_ID: the internal ID used by ODP to identify the connection between ODP and OceanBase Database.

  • PROXY_SS_ID: the ID generated by ODP to identify the client connection. This ID is passed to OceanBase Database and can be used to filter OceanBase Database logs or the sql_audit table.

  • SERVER_SS_ID: the ID generated by OceanBase Database to identify the connection between ODP and OceanBase Database.

  • CLIENT_ADDR: the IP address of the client.

  • SERVER_ADDR: the IP address of the OBServer node where an error or disconnection occurs.

  • CLUSTER_NAME: the name of the cluster.

  • TENANT_NAME: the name of the tenant.

  • USER_NAME: the username.

  • ERROR_CODE: the error code.

  • ERROR_MSG: the error message, which is the key information for diagnosing disconnections.

  • REQUEST_CMD: the type of the statement being processed by ODP, which can be an internal request.

  • SQL_CMD: the type of the user statement.

Besides the preceding general information, a diagnostic log can contain additional diagnostic information, which is subject to the diagnostic type.
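The general fields described above can be pulled out of a diagnosis log line with two regular expressions. This is a minimal sketch based only on the sample lines in this topic: the function name `parse_diagnosis_line` is hypothetical, the line must be in its original single-line form, and quoted values are assumed not to contain double quotation marks.

```python
import re

# Header: [LOG_TIME] [TID][TRACE_ID] [TAG](trace_type="...")
HEADER = re.compile(
    r'^\[(?P<log_time>[^\]]+)\]\s*'
    r'\[(?P<tid>\d+)\]\[(?P<trace_id>[^\]]+)\]\s*'
    r'\[(?P<tag>[^\]]+)\]\s*'
    r'\(trace_type="(?P<trace_type>[^"]+)"'
)
# key:value pairs; a value is either a quoted string or an integer.
FIELD = re.compile(r'(?P<key>\w+(?:\(us\))?):(?P<value>"[^"]*"|-?\d+)')


def parse_diagnosis_line(line):
    """Return the fields of one obproxy_diagnosis log line as a dict, or None."""
    text = line.strip()
    header = HEADER.match(text)
    if header is None:
        return None
    record = header.groupdict()
    record["tid"] = int(record["tid"])
    # Scan only the part after the header so the timestamp is not read as a field.
    for field in FIELD.finditer(text, header.end()):
        value = field["value"]
        record[field["key"]] = value[1:-1] if value.startswith('"') else int(value)
    return record
```

For the tenant-name example above, the returned dict would contain `error_code` as the integer -4043 and `tenant_name` as "test", which makes it straightforward to filter or group many diagnosis lines.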

General Disconnection Scenarios

This section describes several common disconnection scenarios and how to locate and resolve these disconnections.

Disconnection upon a login failure

The diagnostic type is LOGIN_TRACE. Here is a sample diagnostic log that records a disconnection caused by an incorrect tenant name during login.

[2023-09-08 10:37:21.028960] [90663][Y0-00007F8EB76544E0] [CONNECTION](trace_type="LOGIN_TRACE", connection_diagnosis={cs_id:1031798785, ss_id:0, proxy_session_id:0, server_session_id:0, client_addr:"10.10.10.1:44018", server_addr:"*Not IP address [0]*:0", cluster_name:"undefined", tenant_name:"sys", user_name:"root", error_code:-10018, error_msg:"fail to check observer version, empty result", request_cmd:"COM_SLEEP", sql_cmd:"COM_LOGIN"}{internal_sql:"SELECT ob_version() AS cluster_version"})

The additional diagnostic information is internal_sql, which indicates that an internal request is being processed by ODP.

The causes of a disconnection upon a login failure are complex. This section describes the causes and solutions from the perspectives of user operations and OceanBase Database.

The following table describes the disconnection scenarios of user operations and the corresponding solutions.

| Scenario | Error code | Error message | Solution |
| --- | --- | --- | --- |
| The cluster name is incorrect. | 4669 | cluster xxx does not exist | Make sure that the corresponding cluster exists and the cluster name is correct. You can directly connect to the OBServer node and run the show parameters like 'cluster'; command for verification. The value column in the output is the name of the cluster to connect to. |
| The tenant name is incorrect. | 4043 | dummy entry is empty, please check if the tenant exists | Make sure that the corresponding tenant exists. You can directly connect to the OBServer node as the root@sys user and execute the SELECT * FROM DBA_OB_TENANTS; statement to view all tenants in the cluster. |
| ODP allowlist verification fails. | 8205 | user xxx@xxx can not pass white list | Check whether ODP allowlists are correctly configured in the console. For more information, see "Allowlists" in OceanBase Cloud documentation. |
| OceanBase Database allowlist verification fails. | 1227 | Access denied | View the ob_tcp_invited_nodes variable to check whether OceanBase Database allowlists are correctly configured. |
| The number of client connections reaches the upper limit. | 5059 | too many sessions | Execute the ALTER proxyconfig SET <var_name> = <var_value>; statement to modify the ODP parameter client_max_connections to work around this issue. |
| ODP is configured to use the SSL protocol but a user request is initiated by using a non-SSL protocol. | 8004 | obproxy is configured to use ssl connection | Change the value of the enable_client_ssl parameter to false, which specifies not to use SSL for connections, or initiate an SSL access request. |
| The proxyro@sys user is used to directly access OceanBase Database. | 10021 | user proxyro is rejected while proxyro_check on | You cannot directly access OceanBase Database as the proxyro@sys user. |
| A cloud user uses a username in the three-segment format for access when enable_cloud_full_user_name is disabled. | 10021 | connection with cluster name and tenant name is rejected while cloud_full_user_name_check off | A cloud user cannot use a username in the three-segment format for access when enable_cloud_full_user_name is disabled. You can enable the enable_cloud_full_user_name parameter or use a regular username that is not in the three-segment format. |
| The password of the proxyro user is incorrect. | 10018 | fail to check observer version, proxyro@sys access denied, error resp { code:1045, msg:Access denied for user xxx } | If the default password for the proxyro user is retained, this error will not occur. If you manually change the password of the proxyro@sys user in OceanBase Database, make sure that the value of the ODP parameter observer_sys_password is the same as the new password of the proxyro@sys user. |
| The configured RootService list is unavailable when ODP is started. | 10018 | fail to check observer version, empty result | Directly connect to the OBServer node and execute the SHOW PARAMETERS LIKE 'rootservice_list'; statement to view the RootService list of OceanBase Database and check whether the server IP addresses configured when ODP is started are available. |

The following table describes the disconnection scenarios of OceanBase Database and the corresponding solutions.

| Scenario | Error code | Error message | Solution |
| --- | --- | --- | --- |
| The return result of a cluster information query is empty. | 4669 | cluster info is empty | Directly connect to OceanBase Database and execute an SQL statement. Then, view the internal_sql column in the output to check whether the cluster information returned from OceanBase Database is empty. |
| Cluster information query fails. | 10018 | fail to check observer version; fail to check cluster info; fail to init server state | Directly connect to OceanBase Database and execute an SQL statement. Then, view the internal_sql column in the output to check whether the cluster information returned from OceanBase Database is empty. |
| Information query on the config server fails. | 10301 | fail to fetch root server list from config server; fail to fetch root server list from local | Manually pull the config server URL specified by the obproxy_config_server_url parameter at startup to check whether the information returned by the config server is normal. |

Disconnection upon timeout

The diagnostic type is TIMEOUT_TRACE. Here is a sample diagnostic log that records a disconnection caused by the timeout of cluster information.

[2023-08-17 17:10:46.834897] [119826][Y0-00007FBF120324E0] [CONNECTION](trace_type="TIMEOUT_TRACE", connection_diagnosis={cs_id:1031798785, ss_id:7, proxy_session_id:7230691830869983235, server_session_id:3221504994, client_addr:"10.10.10.1:42468", server_addr:"10.10.10.1:21100", cluster_name:"undefined", tenant_name:"sys", user_name:"root", error_code:-10022, error_msg:"OBProxy inactivity timeout", request_cmd:"COM_SLEEP", sql_cmd:"COM_END"}{timeout:1, timeout_event:"CLIENT_DELETE_CLUSTER_RESOURCE", total_time(us):21736})

The additional fields are described as follows:

  • timeout_event: indicates the timeout event.

  • total_time: indicates the request execution time.

The following table describes how to resolve disconnections caused by different timeout events.

| Timeout event | Scenario | Error code | Related parameter | Solution |
| --- | --- | --- | --- | --- |
| CLIENT_DELETE_CLUSTER_RESOURCE | The cluster information is changed. | 10022 | ODP parameter cluster_expire_time | Execute the ALTER proxyconfig SET <var_name> = <var_value>; statement to modify the ODP parameter cluster_expire_time to work around this issue. The default value of cluster_expire_time is 1 day. The modification takes effect for new requests. |
| CLIENT_INTERNAL_CMD_TIMEOUT | The execution of an internal request times out. | 10022 | Fixed value of 30s | This timeout event is abnormal. We recommend that you contact OceanBase Technical Support for help. |
| CLIENT_CONNECT_TIMEOUT | The connection establishment between the client and ODP times out. | 10022 | Fixed value of 10s | This timeout event is abnormal. We recommend that you contact OceanBase Technical Support for help. |
| CLIENT_NET_READ_TIMEOUT | A timeout event occurs when ODP waits for requested data. | 10022 | System variable net_read_timeout of OceanBase Database | Modify the system variable net_read_timeout. Note that the modification of a global system variable does not take effect for existing connections. |
| CLIENT_NET_WRITE_TIMEOUT | A timeout event occurs when ODP waits for a response packet. | 10022 | System variable net_write_timeout of OceanBase Database | Modify the system variable net_write_timeout. Note that the modification of a global system variable does not take effect for existing connections. |
| CLIENT_WAIT_TIMEOUT | The client connection times out after being left idle for a long period during a user request. | 10022 | System variable wait_timeout of OceanBase Database | Modify the system variable wait_timeout to work around this issue. |
| SERVER_QUERY_TIMEOUT | A user query request times out. | 10022 | System variable ob_query_timeout of OceanBase Database and query_timeout specified in a hint | Modify the ob_query_timeout system variable to work around this issue. |
| SERVER_TRX_TIMEOUT | The transaction execution times out. | 10022 | System variable ob_trx_timeout of OceanBase Database | Modify the ob_trx_timeout system variable to work around this issue. |
| SERVER_WAIT_TIMEOUT | The connection to OceanBase Database times out after being left idle for a long period during a user request. | 10022 | System variable wait_timeout of OceanBase Database | Modify the wait_timeout system variable to work around this issue. |
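The lookup from timeout event to related parameter in the table above can be sketched as a small Python helper. This is only an illustration: the names `RELATED_PARAMETER` and `timeout_hint` are hypothetical, and the mapping is copied from the table, not from ODP itself.

```python
import re

# Related parameter for each timeout event, copied from the table above.
RELATED_PARAMETER = {
    "CLIENT_DELETE_CLUSTER_RESOURCE": "ODP parameter cluster_expire_time",
    "CLIENT_INTERNAL_CMD_TIMEOUT": "fixed value of 30s",
    "CLIENT_CONNECT_TIMEOUT": "fixed value of 10s",
    "CLIENT_NET_READ_TIMEOUT": "system variable net_read_timeout",
    "CLIENT_NET_WRITE_TIMEOUT": "system variable net_write_timeout",
    "CLIENT_WAIT_TIMEOUT": "system variable wait_timeout",
    "SERVER_QUERY_TIMEOUT": "system variable ob_query_timeout or the query_timeout hint",
    "SERVER_TRX_TIMEOUT": "system variable ob_trx_timeout",
    "SERVER_WAIT_TIMEOUT": "system variable wait_timeout",
}

TIMEOUT_EVENT = re.compile(r'timeout_event:"([^"]+)"')


def timeout_hint(log_line):
    """Return (timeout_event, related parameter), or None if the line has no timeout_event."""
    match = TIMEOUT_EVENT.search(log_line)
    if match is None:
        return None
    event = match.group(1)
    return event, RELATED_PARAMETER.get(event, "not listed in the table")
```

For the sample TIMEOUT_TRACE log in this section, the helper would return the CLIENT_DELETE_CLUSTER_RESOURCE event together with the cluster_expire_time parameter.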

Disconnection initiated by OceanBase Database

The diagnostic type is SERVER_VC_TRACE. Here is a sample diagnostic log that records a disconnection when ODP fails to establish a connection with OceanBase Database.

[2023-08-10 23:35:00.132805] [32339][Y0-00007F74C9A244E0] [CONNECTION](trace_type="SERVER_VC_TRACE", connection_diagnosis={cs_id:838860809, ss_id:0, proxy_session_id:7230691830869983240, server_session_id:0, client_addr:"10.10.10.1:45765", server_addr:"", cluster_name:"undefined", tenant_name:"sys", user_name:"root", error_code:-10013, error_msg:"Fail to build connection to observer", request_cmd:"COM_QUERY", sql_cmd:"COM_HANDSHAKE"}{vc_event:"unknown event", total_time(us):2952626, user_sql:"select 1 from dual"})

The additional fields are described as follows:

  • vc_event: indicates the disconnection event. You do not need to be concerned about this field.

  • total_time: indicates the request execution time.

  • user_sql: indicates a user request.

The following table describes the scenarios of disconnection actively initiated by OceanBase Database and the corresponding solutions.

| Scenario | Error code | Error message | Solution |
| --- | --- | --- | --- |
| ODP fails to establish a connection with OceanBase Database. | 10013 | Fail to build connection to observer | Perform diagnostics based on relevant logs of OceanBase Database. |
| The connection is disconnected when ODP transmits a request to OceanBase Database. | 10016 | An EOS event received while proxy transferring request | Perform diagnostics based on relevant logs of OceanBase Database. |
| The connection is disconnected when ODP transmits the packet returned from OceanBase Database. | 10014 | An EOS event received while proxy reading response | Perform diagnostics based on relevant logs of OceanBase Database. |

Note

When OceanBase Database actively disconnects from ODP, ODP cannot collect detailed information. If the status of the OBServer node configured in ODP is normal, you need to perform diagnostics based on the relevant logs of OceanBase Database.

Disconnection initiated by the client

The diagnostic type is CLIENT_VC_TRACE. Here is a sample diagnostic log that records a disconnection initiated by the client when ODP reads the request.

[2023-08-10 23:28:24.699168] [32339][Y0-00007F74C9A244E0] [CONNECTION](trace_type="CLIENT_VC_TRACE", connection_diagnosis={cs_id:838860807, ss_id:26, proxy_session_id:7230691830869983239, server_session_id:3221698209, client_addr:"10.10.10.1:44701", server_addr:"10.10.10.1:21100", cluster_name:"undefined", tenant_name:"sys", user_name:"root", error_code:-10010, error_msg:"An EOS event received from client while obproxy reading request", request_cmd:"COM_SLEEP", sql_cmd:"COM_END"}{vc_event:"VC_EVENT_EOS", total_time(us):57637, user_sql:""})

The additional fields are described as follows:

  • vc_event: indicates the disconnection event. You do not need to be concerned about this field.

  • total_time: indicates the request execution time.

  • user_sql: indicates a user request.

The following table describes the scenarios of disconnection actively initiated by the client.

| Scenario | Error code | Error message | Solution |
| --- | --- | --- | --- |
| The client actively disconnects from ODP when ODP receives or sends a request. | 10010 | An EOS event received from client while obproxy reading request | Perform diagnostics based on relevant logs of the client. |
| The client actively disconnects from ODP when ODP processes a request. | 10011 | An EOS event received from client while obproxy handling response | Perform diagnostics based on relevant logs of the client. |
| The client actively disconnects from ODP when ODP returns a packet. | 10012 | An EOS event received from client while obproxy transferring response | Perform diagnostics based on relevant logs of the client. |

Note

When the client is disconnected from ODP, ODP cannot collect detailed information and records only the action of the client to actively disconnect from ODP. Active disconnections can be triggered by driver timeout, initiated by middleware such as Druid, HikariCP, and Nginx, or caused by network jitters. You can perform diagnostics based on relevant logs of the client.

Disconnection upon internal errors of ODP or OceanBase Database

The diagnostic type is PROXY_INTERNAL_TRACE for disconnections caused by internal errors of ODP, and is SERVER_INTERNAL_TRACE for disconnections caused by internal errors of OceanBase Database. Here is a sample diagnostic log that records a disconnection caused by an internal error of ODP.

[2023-08-10 23:26:12.558201] [32339][Y0-00007F74C9A244E0] [CONNECTION](trace_type="PROXY_INTERNAL_TRACE", connection_diagnosis={cs_id:838860805, ss_id:0, proxy_session_id:7230691830869983237, server_session_id:0, client_addr:"10.10.10.1:44379", server_addr:"", cluster_name:"undefined", tenant_name:"sys", user_name:"root", error_code:-10019, error_msg:"OBProxy reached the maximum number of retrying request", request_cmd:"COM_QUERY", sql_cmd:"COM_QUERY"}{user_sql:"USE `ý<8f>ý<91>ý<92>`"})

user_sql is an additional field that indicates the user request SQL.

The following table describes the scenarios of disconnections caused by internal errors of ODP or OceanBase Database and the corresponding solutions.

| Diagnostic type | Scenario | Error code | Error message | Solution |
| --- | --- | --- | --- | --- |
| PROXY_INTERNAL_TRACE | The query for tenant partition information fails. | 4664 | dummy entry is empty, disconnect | This is an unexpected error scenario. You can contact OceanBase Technical Support for help or submit your question in the Q&A forum of the OceanBase community. |
| PROXY_INTERNAL_TRACE | The execution of some internal requests of ODP fails. | 10018 | proxy execute internal request failed, received error resp, error_type: xxx | This is an unexpected error scenario. You can contact OceanBase Technical Support for help or submit your question in the Q&A forum of the OceanBase community. |
| PROXY_INTERNAL_TRACE | The number of retries in ODP reaches the upper limit. | 10019 | OBProxy reached the maximum number of retrying request | This is an unexpected error scenario. You can contact OceanBase Technical Support for help or submit your question in the Q&A forum of the OceanBase community. |
| PROXY_INTERNAL_TRACE | The target session is closed in ODP. | 10001 | target session is closed, disconnect | This is an unexpected error scenario. You can contact OceanBase Technical Support for help or submit your question in the Q&A forum of the OceanBase community. |
| PROXY_INTERNAL_TRACE | Other unexpected error scenarios | 10001 | The diagnostic information is empty. | This is an unexpected error scenario. You can contact OceanBase Technical Support for help or submit your question in the Q&A forum of the OceanBase community. |
| SERVER_INTERNAL_TRACE | A checksum verification error occurs. | 10001 | ora fatal error | This is an unexpected error scenario. You can contact OceanBase Technical Support for help or submit your question in the Q&A forum of the OceanBase community. |
| SERVER_INTERNAL_TRACE | A primary/standby switchover is performed. | 10001 | primary cluster switchover to standby, disconnect | During a primary/standby switchover, a disconnection is expected. |

Other scenarios

Besides the preceding scenarios, the following disconnection scenarios are expected and recorded in diagnostic logs. The diagnostic type is PROXY_INTERNAL_TRACE.

| Scenario | Error code | Error message | Remarks |
| --- | --- | --- | --- |
| The current session is killed. | 5065 | connection was killed by user self, cs_id: xxx | This is an expected scenario and is recorded in diagnostic logs. |
| Other sessions are killed. | 5065 | connection was killed by user session xxx | This is an expected scenario and is recorded in diagnostic logs. |

Here is a sample diagnostic log. user_sql is an additional field that indicates the user request SQL.

[2023-08-10 23:27:15.107427] [32339][Y0-00007F74CAAE84E0] [CONNECTION](trace_type="PROXY_INTERNAL_TRACE", connection_diagnosis={cs_id:838860806, ss_id:21, proxy_session_id:7230691830869983238, server_session_id:3221695443, client_addr:"10.10.10.1:44536", server_addr:"10.10.10.1:21100", cluster_name:"undefined", tenant_name:"sys", user_name:"", error_code:-5065, error_msg:"connection was killed by user self, cs_id: 838860806", request_cmd:"COM_QUERY", sql_cmd:"COM_QUERY"}{user_sql:"kill 838860806"})

Examples

The following figure shows the general links of requests initiated by a client to OceanBase Database.

Link diagram

A request initiated by a client to OceanBase Database needs to pass multiple nodes. The client connection can be disconnected when an error occurs on any node. Therefore, when a connection is disconnected but the client does not receive any explicit error packet to indicate the cause of the disconnection, identify the node where the disconnection occurs and then find the cause based on the relevant logs on this node. Specifically, perform the following operations:

Step 1: Identify the node where the disconnection occurs

If the current ODP is capable of connection diagnostics, you can quickly identify the node where the disconnection occurs based on the obproxy_diagnosis.log file. You can quickly find the disconnection log based on information such as the username, tenant name, cluster name, thread ID (corresponding to cs_id in the log file) obtained from the driver, and the time when the disconnection occurred. Then, determine the node where the disconnection occurs based on the trace_type field. Valid values of trace_type are as follows:

  • CLIENT_VC_TRACE: indicates that the disconnection is initiated by the client.

  • SERVER_VC_TRACE: indicates that the disconnection is initiated by OceanBase Database.

  • SERVER_INTERNAL_TRACE: indicates that the disconnection is caused by an internal error in OceanBase Database.

  • PROXY_INTERNAL_TRACE: indicates that the disconnection is caused by an internal error in ODP.

  • LOGIN_TRACE: indicates that the disconnection is caused by a login failure.

  • TIMEOUT_TRACE: indicates that the disconnection is caused by a timeout.
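Finding the relevant disconnection log by cs_id, tenant name, or username, and then reading its trace_type, can be sketched as follows. This is a minimal sketch with a hypothetical function name (`find_disconnections`); it assumes the field layout of the sample diagnosis lines in this topic and does not filter by time.

```python
import re

TRACE_TYPE = re.compile(r'trace_type="([^"]+)"')


def find_disconnections(lines, cs_id=None, tenant_name=None, user_name=None):
    """Return (trace_type, line) pairs for diagnosis lines that match every given filter."""
    wanted = []
    if cs_id is not None:
        wanted.append(re.compile(rf'\bcs_id:{int(cs_id)}\b'))
    if tenant_name is not None:
        wanted.append(re.compile(rf'tenant_name:"{re.escape(tenant_name)}"'))
    if user_name is not None:
        wanted.append(re.compile(rf'user_name:"{re.escape(user_name)}"'))
    results = []
    for line in lines:
        trace = TRACE_TYPE.search(line)
        # Keep only diagnosis lines for which all requested filters match.
        if trace and all(pattern.search(line) for pattern in wanted):
            results.append((trace.group(1), line))
    return results
```

The first element of each returned pair is the trace_type value, which tells you which of the cases listed above applies.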

Step 2: Identify the cause of disconnection

You can identify the cause of the disconnection based on the node where the disconnection occurs.

Disconnection initiated by the client

The default value of socketTimeout is 0 for Java Database Connectivity (JDBC), which indicates that socket timeouts will not occur. However, some clients such as Druid and MyBatis have a socket timeout control parameter. If a disconnection occurs due to a long request execution time, you can first check the value of the socket timeout control parameter. For more information, see Database connection pool configuration in OceanBase Database documentation.

  1. View basic information about the disconnection in the connection diagnostic logs of ODP.

    [2023-09-07 15:59:52.308553] [122701][Y0-00007F7071D194E0] [CONNECTION](trace_type="CLIENT_VC_TRACE", connection_diagnosis={cs_id:524328, ss_id:0, proxy_session_id:7230691833961840700, server_session_id:0, client_addr:"10.10.10.1:38877", server_addr:"10.10.10.2:50110", cluster_name:"ob1.changluo.cc.10.10.10.2", tenant_name:"mysql", user_name:"root", error_code:-10011, error_msg:"An unexpected connection event received from client while obproxy handling request", request_cmd:"COM_QUERY", sql_cmd:"COM_QUERY"}{vc_event:"VC_EVENT_EOS", total_time(us):5016353, user_sql:"select sleep(20) from dual"})

    The fields in the diagnostic information are described as follows:

    • trace_type: the diagnostic type, which is CLIENT_VC_TRACE in this example, indicating that the disconnection is initiated by the client.

    • error_msg: the error message, which is An unexpected connection event received from client while obproxy handling request in this example, indicating that the client initiates a disconnection when ODP processes a request.

    • total_time: the request execution time, which is 5016353 in this example, indicating that the total request execution time is about 5s. You can check the timeout value on the client.

  2. View the JDBC stack of the client.

    The last packet successfully received from the server was 5,016 milliseconds ago.  The last packet sent successfully to the server was 5,011 milliseconds ago.
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
    at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1129)
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3720)
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3609)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4160)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2617)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2778)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2819)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2768)
    at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:949)
    at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:795)
    at odp.Main.main(Main.java:12)
    Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:170)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:114)
    at com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:161)
    at com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:189)
    at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3163)
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3620)
    ... 9 more

    The stack and packet sending and receiving time indicate that the client initiates a disconnection due to socket timeout.
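The comparison between total_time in the ODP log and the timeout configured on the client can be sketched as a small check. This is an illustration under stated assumptions: the function name `looks_like_socket_timeout` and the 100 ms tolerance are hypothetical, and the 5000 ms socket timeout in the usage note is an assumed client setting that the logs above do not state.

```python
def looks_like_socket_timeout(total_time_us, socket_timeout_ms, tolerance_ms=100):
    """Return True if the request ran for roughly the client's socket timeout."""
    if socket_timeout_ms <= 0:
        # A socketTimeout of 0 means that socket timeouts will not occur.
        return False
    elapsed_ms = total_time_us / 1000  # total_time is logged in microseconds
    return abs(elapsed_ms - socket_timeout_ms) <= tolerance_ms
```

For example, `looks_like_socket_timeout(5016353, 5000)` is true: the request ran for about 5016 ms, which is within the tolerance of an assumed 5000 ms socket timeout and is consistent with the "5,016 milliseconds ago" message in the JDBC stack.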

Disconnection initiated by ODP

ODP reads the net_write_timeout value from OceanBase Database to control the timeout value for packet transmission. The default value is 60s. Under extreme network conditions, or if OceanBase Database does not return a packet for a long time, ODP may close the connection due to a timeout. The following example describes the scenario where a timeout occurs when ODP waits for a response packet from OceanBase Database.

Determine the node where the disconnection occurs based on the diagnostic logs of ODP.

[2023-09-08 01:22:17.229436] [81506][Y0-00007F455197E4E0] [CONNECTION](trace_type="TIMEOUT_TRACE", connection_diagnosis={cs_id:1031798827, ss_id:342, proxy_session_id:7230691830869983244, server_session_id:3221753829, client_addr:"10.10.10.1:34901", server_addr:"10.10.10.1:21102", cluster_name:"undefined", tenant_name:"mysql", user_name:"root", error_code:-10022, error_msg:"OBProxy inactivity timeout", request_cmd:"COM_QUERY", sql_cmd:"COM_QUERY"}{timeout(us):6000000, timeout_event:"CLIENT_NET_WRITE_TIMEOUT", total_time(us):31165295})

The fields in the diagnostic information are described as follows:

  • trace_type: the diagnostic type, which is TIMEOUT_TRACE in this example, indicating that the disconnection occurs due to an execution timeout of ODP.

  • timeout_event: the timeout event, which is CLIENT_NET_WRITE_TIMEOUT in this example, indicating that a timeout occurs when ODP waits for a response packet from OceanBase Database.

The diagnostic information indicates that net_write_timeout is triggered. The client connection is disconnected after being left idle for more than 6s (which is not the default value). In this case, you can change the timeout period to a larger value to work around this issue.
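The microsecond values in the log above are easier to compare with the 60s default after conversion to seconds. The following is a minimal sketch; the function name `timeout_summary` is hypothetical, and it reads only the timeout(us) and total_time(us) fields as they appear in this sample.

```python
import re

# Matches timeout(us):<n> and total_time(us):<n>; timeout_event is not matched.
MICROSECOND_FIELD = re.compile(r'\b(timeout|total_time)\(us\):(\d+)')

DEFAULT_NET_WRITE_TIMEOUT_S = 60  # default value stated in this section


def timeout_summary(log_line):
    """Return the timeout and total time in seconds, plus a flag for a non-default timeout."""
    seconds = {name: int(raw) / 1_000_000
               for name, raw in MICROSECOND_FIELD.findall(log_line)}
    if "timeout" in seconds:
        seconds["non_default"] = seconds["timeout"] != DEFAULT_NET_WRITE_TIMEOUT_S
    return seconds
```

For the sample line, this yields a timeout of 6 seconds, flagged as different from the default, and a total time of about 31.2 seconds.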

Login failure

This section provides two scenarios.

  • Scenario 1: The OBServer node specified in the RootService list is unavailable. Here is a sample diagnostic log.

    [2023-09-08 10:37:21.028960] [90663][Y0-00007F8EB76544E0] [CONNECTION](trace_type="LOGIN_TRACE", connection_diagnosis={cs_id:1031798785, ss_id:0, proxy_session_id:0, server_session_id:0, client_addr:"10.10.10.1:44018", server_addr:"*Not IP address [0]*:0", cluster_name:"undefined", tenant_name:"sys", user_name:"root", error_code:-10018, error_msg:"fail to check observer version, empty result", request_cmd:"COM_SLEEP", sql_cmd:"COM_LOGIN"}{internal_sql:"SELECT ob_version() AS cluster_version"})

    The fields in the diagnostic information are described as follows:

    • trace_type: the diagnostic type, which is LOGIN_TRACE in this example, indicating that the disconnection is caused by a login failure.

    • internal_sql: the internal request being processed by ODP, which is SELECT ob_version() AS cluster_version in this example, indicating that ODP fails to execute this internal request during login.

    • error_msg: the error message, which is fail to check observer version, empty result in this example, indicating that the request execution failure is caused by an empty result set.

    To sum up, ODP fails to execute the internal request SELECT ob_version() AS cluster_version because the result set is empty. The SQL statement SELECT ob_version() AS cluster_version is a request for ODP to query the cluster version. ODP executes this request to verify the cluster information when you log in for the first time. If the RootService list configured when ODP is started is incorrect or if the OBServer node breaks down, the query will fail, thereby causing a login failure.

  • Scenario 2: The number of client connections reaches the upper limit of ODP.

    You can troubleshoot the issue by using the following methods:

    • Method 1: Check the connection diagnostic logs.

      [2023-09-08 11:19:26.617385] [110562][Y0-00007FE1F06AC4E0] [CONNECTION](trace_type="LOGIN_TRACE", connection_diagnosis={cs_id:1031798805, ss_id:0, proxy_session_id:0, server_session_id:0, client_addr:"127.0.0.1:40004", server_addr:"*Not IP address [0]*:0", cluster_name:"undefined", tenant_name:"sys", user_name:"root", error_code:-5059, error_msg:"Too many sessions", request_cmd:"COM_SLEEP", sql_cmd:"COM_LOGIN"}{internal_sql:""})

      The fields in the diagnostic information are described as follows:

      • trace_type: the diagnostic type, which is LOGIN_TRACE in this example, indicating that the disconnection is caused by a login failure.

      • error_msg: the error message, which is Too many sessions in this example, indicating that the login fails because the number of connections reaches the upper limit.

    • Method 2: Check the error message. The error message Too many sessions is returned when you run a connection command.

      $ obclient -h127.0.0.1 -P2899 -uroot@sys -Dtest -A -c 
      ERROR 1203 (42000): Too many sessions