SQL optimization in partitioned table scenarios

SQL optimization in partitioned table scenarios

Guide

One table is partitioned, one partition per day.

There is a query on this table. It often only queries the data of a certain day in the table, but almost all data in the entire partition is scanned every time. Is there any way to optimize it?

Scene to be optimized

There is a large table, the amount of data generated every day is about 1 million, so the table partitioning scheme is adopted, one partition per day.

Here is the DDL of the table:

CREATE TABLE `t1` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `date` date NOT NULL,
  `kid` int(11) DEFAULT '0',
  `uid` int(11) NOT NULL,
  `iid` int(11) DEFAULT '0',
  `icnt` int(8) DEFAULT '0',
  `tst` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `countp` smallint(11) DEFAULT '1',
  `isr` int(2) NOT NULL DEFAULT '0',
  `clv` int(5) NOT NULL DEFAULT '1',
  PRIMARY KEY (`id`,`date`),
  UNIQUE KEY `date` (`date`,`uid`,`iid`),
  KEY `date_2` (`date`,`kid`)
) ENGINE=InnoDB AUTO_INCREMENT=3180686682 DEFAULT CHARSET=utf8mb4
/*!50500 PARTITION BY RANGE COLUMNS(`date`)
(PARTITION p20161201 VALUES LESS THAN ('2016-12-02') ENGINE = InnoDB,
 PARTITION p20161202 VALUES LESS THAN ('2016-12-03') ENGINE = InnoDB,
 PARTITION p20161203 VALUES LESS THAN ('2016-12-04') ENGINE = InnoDB,
...

The following slow queries often occur on this table:

SELECT ... FROM `t1` WHERE `date` = '2017-04-01' AND `icnt`> 300 AND `id` = '801301';

SQL optimization road

SQL optimization ideas

Want to optimize a SQL, it is to look at the general implementation plan, to observe whether or not to use the index as much as possible , at the same time pay attention to the number of scanning lines is expected , as well as whether or not a temporary table (Using temporary) or whether you need to sort (Using filesort ) , find a way to eliminate these situations.

Further optimization strategies may require adjustment of program code logic, or even technical architecture or business requirements. This action is relatively large. Generally, core problems on non-core systems will not be so violent. In most cases, you still need to rely on DBA as much as possible. Use your ingenuity to solve it.

SQL performance bottleneck positioning

Now, let's look at the execution plan of this SQL:

yejr@imysql.com[myDB]> EXPLAIN PARTITIONS SELECT ... FROM `t1` WHERE 
  `date` = '2017-03-02' AND `icnt`> 100 AND `iid` = '502302'\G
*************************** 1. row ******************** *******
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: p20170302
         type: range
possible_keys: date,date_2
          key: date
      key_len: 3
          ref: const
         rows: 9384602
        Extra: Using where

This execution plan looks good, with indexes available, no temporary tables, and no filesort. However, we also noticed that the expected number of rows to be scanned is still quite large rows: 9384602, and all data in the entire partition is scanned. No wonder the efficiency is not high, it is always SLOW QUERY.

Optimizing thinking

We noticed that this SQL always needs to query the data of a certain day. This table has been partitioned by day. Is it possible to ignore the time condition in the WHERE clause ?

Also, since the date condition has been removed, looking at the table DDL, it seems that there is no suitable index for the remaining conditions, right?

So, we try to create a new index:

yejr@imysql.com[myDB]> ALTER TABLE t1 ADD INDEX iid (iid, icnt);

Then, transform the SQL into the following, and then look at the execution plan:

yejr@imysql.com[myDB]> EXPLAIN PARTITIONS SELECT ... FROM `t1` partition(p2017030) WHERE 
  `icnt`> 100 AND `iid` = '502302'\G
*************************** 1. row ******************** *******
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: p20170302
         type: ref
possible_keys: date,date_2,iid
          key: iid
      key_len: 10
          ref: const
         rows: 7800
        Extra: Using where
This optimizes the effect, leveraging.

In fact, if the partition is not mandatory, the optimization effect can be achieved:

yejr@imysql.com[myDB]> EXPLAIN PARTITIONS SELECT ... FROM `t1` WHERE 
  `date` = '2017-03-02' AND `icnt`> 100 AND `iid` = '502302'\G
*************************** 1. row ******************** *******
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: p20170302
         type: ref
possible_keys: date,date_2,iid
          key: iid
      key_len: 10
          ref: NULL
         rows: 7800
        Extra: Using where

postscript

The vast majority of SQL is completed by simple methods such as adding indexes and appropriately adjusting the SQL code (for example, adjusting the order of driving tables).

To say a few more words, when you encounter SQL optimization performance bottlenecks and want to ask the technical group for advice, please provide a few necessary information first:

  • Table DDL
  • General table statistics, you can execute SHOW TABLE STATUS LIKE't1' to view
  • Table index distribution information, execute SHOW INDEX FROM t1 to view
  • If the SQL in question and the corresponding execution plan don't have this information, don't bother others.
Reference: https://cloud.tencent.com/developer/article/1005390 SQL optimization in the partition table scenario-Cloud + Community-Tencent Cloud