Skip to main content

SQL Development Specifications

Note:

The specifications in this topic are only recommended. You can determine whether to follow these specifications based on your business requirements.

Specifications for the SELECT Statement

  • For an SQL statement that uses a partitioned table, we recommend that you include the partitioning key in the predicate to avoid unnecessary full table scans.

  • We recommend that the columns specified in the ORDER BY clause be unique or unique in combination.

    • Unlike a standalone database, a distributed database may produce unstable sorting results when sorting data by a non-unique column, as it implements sorting in a fundamentally different way. For example, the result of ORDER BY c1 may be as follows in a distributed database:

      c1c2
          1        a    
      1b
      2c
      2d

      The result may also be as follows:

      c1c2
          1        b    
      1a
      2d
      2c
  • We recommend that you do not use an IN operator. Instead, try to rewrite a subquery with an IN operator by using a JOIN operator. If an IN operator is required, do not specify more than 100 collection elements after the operator.

  • If data deduplication is not required, we recommend that you use the UNION ALL operator instead of the UNION operator. The UNION ALL operator can reduce unnecessary resource consumption for sorting.

  • COUNT(*) counts the rows with a NULL value, whereas COUNT(column name) does not count rows with a NULL value. Select a proper aggregation method based on your business requirements.

  • We recommend that you do not compare or join columns of different data types. Otherwise, implicit data type conversions may be performed and indexes cannot be used. If you must compare or join columns of different data types, you can use the CAST or CONVERT function to implement explicit data type conversions.

  • We do not recommend the WHERE (c1, c2) = ('abc', 'def') syntax. First, it is not standard SQL syntax and is applicable only to specific databases, which can cause compatibility issues. Second, although the syntax is supported by specific databases, it cannot utilize indexes or be optimized, compromising query performance. We recommend that you rewrite the preceding filter condition as WHERE c1 = 'abc' AND c2 = 'def'.

  • When you set the degree of parallelism (DOP) for a query, the query shows the best performance if the DOP equals the number of table shards, and shows the second-best performance if the DOP generated by the auto DOP feature is used.

  • We recommend that you do not add a read lock by using SELECT ... FOR UPDATE (WITH CS). Otherwise, it can cause a lock wait and impact the business in the case of highly concurrent large transactions.

  • We recommend that you specify column names in the SELECT statement as needed. Try to avoid using the SELECT * syntax, as it may lead to the following issues:

    • The parsing cost of the query analyzer is increased and additional I/O costs are generated.
    • Table access by index primary key may occur and compromise query performance.
    • Adding or removing columns may cause inconsistencies with the upstream and downstream resultMap configurations.

Specifications for DML Statements

  • We recommend that you specify column names in an INSERT statement instead of using the INSERT INTO TBL VALUES(......) syntax. Otherwise, unexpected issues may occur when you add or remove columns in a table.

  • A change made by a DML statement is persisted only when the transaction is committed. A single DML statement can be a transaction. You can configure the autocommit system variable to specify whether to enable automatic commit of transactions. The default value of the system variable is 1, indicating that a DML statement is taken as a transaction and automatically committed.

  • When you execute an UPDATE statement that involves a large number of data records without using the WHERE clause, a large transaction is generated. If the transaction times out, the UPDATE statement may fail to be executed.

    • We recommend that you use a WHERE clause to control the number of rows to be updated in an UPDATE statement. This ensures that the transaction is in a proper size.
    • We recommend that you specify a WHERE clause in a DELETE statement to delete data in batches. Alternatively, you can use the TRUNCATE TABLE statement to delete all data from the table. This can improve data deletion performance. However, the TRUNCATE statement is a DDL statement and cannot be executed in another transaction.
    • You can set greater values for the ob_query_timeout and ob_trx_timeout variables to ensure that large transactions can be executed.
  • If you frequently insert, delete, or update data in a table, the buffer table issue may occur. You can enable the adaptive major compaction feature or set the table_mode parameter to troubleshoot the issue. For more information, see Adaptive major compactions.

Specifications for DDL statements

  • For more information about the online and offline DDL operations supported by OceanBase Database, see Online and offline DDL operations. When offline DDL operations are performed on a table, the table is locked and you cannot perform DML operations on the table.

  • We recommend that you perform DDL operations during off-peak hours to reduce the impact on the system.

Loading...