Optimization case: CASE WHEN for SQL rewriting optimization

Guide

Today I will share a somewhat unusual SQL optimization case, one solved through SQL rewriting.

The scenario to be optimized

The following record was found in the slow query log:

...
# Query_time: 59.503827 Lock_time: 0.000198 Rows_sent: 641227 Rows_examined: 13442472 Rows_affected: 0
...
select uid,sum(power) powerup from t1 where 
date>='2017-03-31' and 
UNIX_TIMESTAMP(STR_TO_DATE(concat(date,' ',hour),'%Y-%m-%d %H'))>=1490965200 and 
UNIX_TIMESTAMP(STR_TO_DATE(concat(date,' ',hour),'%Y-%m-%d %H'))<1492174801 and 
aType in (1,6,9) group by uid;
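To see what time window the query actually targets, the two epoch-second boundaries in the WHERE clause can be decoded. A quick sketch; note that MySQL's UNIX_TIMESTAMP() interprets the datetime in the session time zone, so the wall-clock values below assume UTC and may differ on the actual server:

```python
from datetime import datetime, timezone

# Decode the two epoch-second boundaries from the WHERE clause.
# NOTE: MySQL's UNIX_TIMESTAMP() uses the session time zone; UTC is
# assumed here, so the server's wall-clock hour may differ.
start = datetime.fromtimestamp(1490965200, tz=timezone.utc)
end   = datetime.fromtimestamp(1492174801, tz=timezone.utc)

print(start.strftime("%Y-%m-%d %H:%M:%S"))  # 2017-03-31 13:00:00
print(end.strftime("%Y-%m-%d %H:%M:%S"))    # 2017-04-14 13:00:01
print(end - start)                          # 14 days, 0:00:01
```

So the query sums over a fourteen-day window that starts and ends mid-day, which is exactly why the hour column gets dragged into the condition.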

To be honest, I couldn't help cursing when I saw this SQL. Who on earth designed it this way?

The date and the hour of a single point in time are split into two separate columns, which are then concatenated back together in the query condition. Words fail me.

Complaints aside, though, the work still has to be done. We are DBAs after all, and SQL optimization is our specialty, isn't it~

The road to SQL optimization

The SQL optimization approach

At the risk of repetition, let me restate the general approach to SQL optimization.

To optimize a SQL statement, generally speaking, first look at the execution plan: check whether indexes are used as much as possible, pay attention to the estimated number of rows to be scanned, and watch for a temporary table (Using temporary) or extra sorting (Using filesort), then find ways to eliminate these.

Locating the SQL performance bottleneck

Without a doubt, the first things to look at are the table DDL and the execution plan:

CREATE TABLE `t1` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `date` date NOT NULL DEFAULT '0000-00-00',
  `hour` char(2) NOT NULL DEFAULT '00',
  `kid` int(4) NOT NULL DEFAULT '0',
  `uid` int(11) NOT NULL DEFAULT '0',
  `aType` tinyint(2) NOT NULL DEFAULT '0',
  `src` tinyint(2) NOT NULL DEFAULT '1',
  `aid` int(11) NOT NULL DEFAULT '1',
  `acount` int(11) NOT NULL DEFAULT '1',
  `power` decimal(20,2) DEFAULT '0.00',
  PRIMARY KEY (`id`,`date`),
  UNIQUE KEY `did` (`date`,`hour`,`kid`,`uid`,`aType`,`src`,`aid`)
) ENGINE=InnoDB AUTO_INCREMENT=50486620 DEFAULT CHARSET=utf8mb4
/*!50500 PARTITION BY RANGE COLUMNS(`date`)
(PARTITION p20170316 VALUES LESS THAN ('2017-03-17') ENGINE = InnoDB,
 PARTITION p20170317 VALUES LESS THAN ('2017-03-18') ENGINE = InnoDB
...

yejr@imysql.com[myDB]> EXPLAIN select uid,sum(power) powerup from t1 where 
date>='2017-03-31' and 
UNIX_TIMESTAMP(STR_TO_DATE(concat(date,' ',hour),'%Y-%m-%d %H'))>=1490965200 and 
UNIX_TIMESTAMP(STR_TO_DATE(concat(date,' ',hour),'%Y-%m-%d %H'))<1492174801 and 
aType in (1,6,9) group by uid\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: p20170324,p20170325,... (all partitions)
         type: ALL
possible_keys: did
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 25005577
     filtered: 15.00
        Extra: Using where; Using temporary; Using filesort

Obviously this SQL is very inefficient: a full table scan, no index, a temporary table, and extra sorting. Every unlucky thing that could happen to a query has happened here.

Thinking about the optimization

This SQL sums the power column over the rows that meet the conditions. Although the date column has an index, the WHERE clause applies a function to it and combines it with the hour column, so the index cannot be used.
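The effect is easy to reproduce in miniature. The sketch below uses SQLite rather than MySQL (so the plan text differs, and the table here is a made-up stand-in), but the principle of a sargable vs. non-sargable predicate is the same: a bare range condition on an indexed column can use the index, while wrapping the column in an expression forces a full scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t1 (date TEXT, hour TEXT, power REAL)")
cur.execute("CREATE INDEX idx_date ON t1(date)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the plan description in the last column
    return " ".join(row[3] for row in cur.execute("EXPLAIN QUERY PLAN " + sql))

# Bare column comparison: the index on `date` is usable (SEARCH ... USING INDEX)
print(plan("SELECT * FROM t1 WHERE date >= '2017-03-31'"))

# Column wrapped in an expression: the index cannot be used (full SCAN)
print(plan("SELECT * FROM t1 WHERE date || ' ' || hour >= '2017-03-31 13'"))
```

The same rule holds in MySQL: once date is fed through concat()/STR_TO_DATE()/UNIX_TIMESTAMP(), the optimizer can no longer map the condition onto the index's sort order.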

Fortunately, a clever colleague (who is, in fact, quite good at SQL optimization~) had a flash of inspiration: use CASE WHEN to transform the SQL into the following:

select uid,sum(powerup+powerup1) from
(
   select uid,
          case when concat(date,' ',hour) >='2017-03-24 13:00' then power else '0' end as powerup,
          case when concat(date,' ',hour) <'2017-03-25 13:00' then power else '0' end as powerup1
   from t1
   where date>='2017-03-24' 
   and date <'2017-03-25'
   and aType in (1,6,9)
) a group by uid;
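The pattern can be verified on a toy dataset. The sketch below uses SQLite with made-up rows, and folds the two CASE branches into a single expression for clarity (so it is the general technique, not the article's exact query): a plain range on the bare date column does the coarse, index-friendly filtering, and CASE WHEN trims the sub-day boundary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t1 (uid INT, date TEXT, hour TEXT, power REAL)")
cur.executemany("INSERT INTO t1 VALUES (?,?,?,?)", [
    (1, '2017-03-24', '12', 10.0),   # before 13:00 on the 24th -> excluded
    (1, '2017-03-24', '13', 20.0),   # in window
    (2, '2017-03-24', '23',  5.0),   # in window
    (2, '2017-03-25', '12',  7.0),   # in window
    (2, '2017-03-25', '13',  9.0),   # at/after 13:00 on the 25th -> excluded
])

# Coarse filter on the bare (indexable) date column; CASE WHEN handles the
# hour boundary inside each edge day.  Because hour is zero-padded CHAR(2),
# plain string comparison orders correctly.
rows = cur.execute("""
    SELECT uid,
           SUM(CASE WHEN date || ' ' || hour >= '2017-03-24 13'
                     AND date || ' ' || hour <  '2017-03-25 13'
                    THEN power ELSE 0 END) AS powerup
    FROM t1
    WHERE date >= '2017-03-24' AND date <= '2017-03-25'
    GROUP BY uid
""").fetchall()
print(rows)  # [(1, 20.0), (2, 12.0)]
```

Only the WHERE clause has to be index-friendly; the CASE WHEN inside the SELECT list is evaluated per row after the index has already narrowed the scan.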

Clever, isn't it? CASE WHEN transforms the condition that could not use an index into one that can. Look at the new execution plan:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: p20170324
         type: range
possible_keys: did
          key: idx2_date_addRedType
      key_len: 4
          ref: NULL
         rows: 876375
     filtered: 30.00
        Extra: Using index condition; Using temporary; Using filesort

Look at the execution cost of this SQL:

+-----------------------+---------+
| Variable_name         | Value   |
+-----------------------+---------+
| Handler_read_first    | 1       |
| Handler_read_key      | 1834590 |
| Handler_read_last     | 0       |
| Handler_read_next     | 1834589 |
| Handler_read_prev     | 0       |
| Handler_read_rnd      | 232276  |
| Handler_read_rnd_next | 232277  |
+-----------------------+---------+

And the information recorded in the slow query log:

# Query_time: 6.381254 Lock_time: 0.000166 Rows_sent: 232276 Rows_examined: 2299141 Rows_affected: 0
# Bytes_sent: 4237347 Tmp_tables: 1 Tmp_disk_tables: 0 Tmp_table_sizes: 4187168
# InnoDB_trx_id: 0
# QC_Hit: No Full_scan: No Full_join: No Tmp_table: Yes Tmp_table_on_disk: No
# Filesort: Yes Filesort_on_disk: No Merge_passes: 0
# InnoDB_IO_r_ops: 0 InnoDB_IO_r_bytes: 0 InnoDB_IO_r_wait: 0.000000
# InnoDB_rec_lock_wait: 0.000000 InnoDB_queue_wait: 0.000000
# InnoDB_pages_distinct: 9311

Still not ideal. Although the full table scan is gone, the temporary table and the extra sorting remain. Let's find a way to eliminate them and then compare.

You may have noticed a change: the slow query log now records much more information. That is because this is supported only in the Percona branch. The feature is genuinely useful; it can even record detailed profiling information. Highly recommended.

Let's add a new index on the uid column to eliminate the temporary table and the sorting, and see whether the overall cost comes out lower.

yejr@imysql.com[myDB]> ALTER TABLE t1 ADD INDEX idx_uid(uid);
yejr@imysql.com[myDB]> EXPLAIN select uid,sum(powerup+powerup1) from
(
   select uid,
          case when concat(date,' ',hour) >='2017-03-24 13:00' then power else '0' end as powerup,
          case when concat(date,' ',hour) <'2017-03-25 13:00' then power else '0' end as powerup1
   from t1
   where date>='2017-03-24' 
   and date <'2017-03-25'
   and aType in (1,6,9)
) a group by uid\G

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: if_date_hour_army_count
   partitions: p20170331,p20170401...
         type: index
possible_keys: did,idx_uid
          key: idx_uid
      key_len: 4
          ref: NULL
         rows: 12701520
     filtered: 15.00
        Extra: Using where

Look at the execution cost of the SQL after adding the index:

+-----------------------+---------+
| Variable_name         | Value   |
+-----------------------+---------+
| Handler_read_first    | 1       |
| Handler_read_key      | 1       |
| Handler_read_last     | 0       |
| Handler_read_next     | 1834589 |
| Handler_read_prev     | 0       |
| Handler_read_rnd      | 0       |
| Handler_read_rnd_next | 0       |
+-----------------------+---------+

And the information recorded in the slow query log:

# Query_time: 5.772286 Lock_time: 0.000330 Rows_sent: 232276 Rows_examined: 1834589 Rows_affected: 0
# Bytes_sent: 4215071 Tmp_tables: 0 Tmp_disk_tables: 0 Tmp_table_sizes: 0
# InnoDB_trx_id: 0
# QC_Hit: No Full_scan: Yes Full_join: No Tmp_table: No Tmp_table_on_disk: No
# Filesort: No Filesort_on_disk: No Merge_passes: 0
# InnoDB_IO_r_ops: 0 InnoDB_IO_r_bytes: 0 InnoDB_IO_r_wait: 0.000000
# InnoDB_rec_lock_wait: 0.000000 InnoDB_queue_wait: 0.000000
# InnoDB_pages_distinct: 11470

Note that although the SQL scans more data pages after the uid index is added (InnoDB_pages_distinct: 11470 vs 9311), it actually executes faster, because the temporary table and the extra sort are eliminated. This is also clear from the Handler_read% results: there is more sequential I/O and less random I/O, so even though more data pages have to be scanned, execution ends up faster.
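The effect of an index on the GROUP BY column can also be reproduced in miniature. Again this is SQLite, so the plan wording differs from MySQL, and for the sake of a clean demo the sketch uses a covering (uid, power) index rather than the article's plain uid index: without a suitable index, GROUP BY needs a temporary sort structure; with one, the groups arrive in index order and no sort is required.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t1 (uid INT, power REAL)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the plan description in the last column
    return " ".join(row[3] for row in cur.execute("EXPLAIN QUERY PLAN " + sql))

q = "SELECT uid, SUM(power) FROM t1 GROUP BY uid"

before = plan(q)   # no index: grouping needs "USE TEMP B-TREE FOR GROUP BY"
cur.execute("CREATE INDEX idx_uid ON t1(uid, power)")
after = plan(q)    # covering index: rows already arrive in uid order

print(before)
print(after)
```

This is the same trade the article makes: scan in index order (more sequential reads) in exchange for dropping the temporary table and the filesort.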

Postscript

Finally, is there still room to optimize this SQL? Obviously yes: redesign the table and merge the date and hour columns into a single column, so that conditions no longer need to be stitched together and the index can be used directly.
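As a sketch of that redesign (SQLite again, and the table name t1_new, column dt, and schema are made up for illustration): with one combined column, the original time window becomes a plain range that the index serves directly, with no CASE WHEN gymnastics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Hypothetical redesigned table: date + hour merged into a single column
cur.execute("CREATE TABLE t1_new (uid INT, dt TEXT, aType INT, power REAL)")
cur.execute("CREATE INDEX idx_dt ON t1_new(dt)")

# The original 14-day window is now a bare range on the indexed column
plan = " ".join(row[3] for row in cur.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT uid, SUM(power) FROM t1_new "
    "WHERE dt >= '2017-03-31 13:00:00' AND dt < '2017-04-14 13:00:01' "
    "GROUP BY uid"))
print(plan)  # the range predicate is served by idx_dt
```

In MySQL the equivalent would be a DATETIME column with a composite index, which would also keep partition pruning effective.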

Reference: https://cloud.tencent.com/developer/article/1005389 (Optimization case: CASE WHEN for SQL rewriting optimization, Tencent Cloud Developer Community)