Today I'd like to share a somewhat unusual SQL optimization case, solved by rewriting the SQL.
I found the following record in the SLOW QUERY LOG:
...
# Query_time: 59.503827  Lock_time: 0.000198  Rows_sent: 641227  Rows_examined: 13442472  Rows_affected: 0
...
select uid, sum(power) powerup from t1
where date >= '2017-03-31'
and UNIX_TIMESTAMP(STR_TO_DATE(concat(date, ' ', hour), '%Y-%m-%d %H')) >= 1490965200
and UNIX_TIMESTAMP(STR_TO_DATE(concat(date, ' ', hour), '%Y-%m-%d %H')) < 1492174801
and aType in (1,6,9)
group by uid;
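In case you're wondering what those Unix timestamps actually mean, they can be decoded with FROM_UNIXTIME (the result depends on the server time zone; in UTC they work out as shown in the comment):

select from_unixtime(1490965200), from_unixtime(1492174801);
-- in UTC: '2017-03-31 13:00:00' and '2017-04-14 13:00:01',
-- i.e. the query sums power per uid over roughly a two-week window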
To be honest, I couldn't help cursing the moment I saw this SQL. What kind of brain-dead design is this?
The date and the hour of a single point in time are stored in two separate columns, and then concatenated back together at query time to build the filter condition. Words fail me.
Rant aside, the work still has to get done. Who told us to be DBAs? SQL optimization is our specialty, isn't it~
Let me repeat the general approach to SQL optimization once more.
To optimize a SQL statement, generally speaking, first look at its execution plan: check whether indexes are used wherever possible, pay attention to the number of rows expected to be scanned, and watch for a temporary table (Using temporary) or an extra sort (Using filesort); then find a way to eliminate those situations.
So, before optimizing anything, first look at the table DDL and the execution plan:
CREATE TABLE `t1` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `date` date NOT NULL DEFAULT '0000-00-00',
  `hour` char(2) NOT NULL DEFAULT '00',
  `kid` int(4) NOT NULL DEFAULT '0',
  `uid` int(11) NOT NULL DEFAULT '0',
  `aType` tinyint(2) NOT NULL DEFAULT '0',
  `src` tinyint(2) NOT NULL DEFAULT '1',
  `aid` int(11) NOT NULL DEFAULT '1',
  `acount` int(11) NOT NULL DEFAULT '1',
  `power` decimal(20,2) DEFAULT '0.00',
  PRIMARY KEY (`id`,`date`),
  UNIQUE KEY `did` (`date`,`hour`,`kid`,`uid`,`aType`,`src`,`aid`)
) ENGINE=InnoDB AUTO_INCREMENT=50486620 DEFAULT CHARSET=utf8mb4
/*!50500 PARTITION BY RANGE COLUMNS(`date`)
(PARTITION p20170316 VALUES LESS THAN ('2017-03-17') ENGINE = InnoDB,
 PARTITION p20170317 VALUES LESS THAN ('2017-03-18') ENGINE = InnoDB,
 ...

yejr@imysql.com[myDB]> EXPLAIN select uid, sum(power) powerup from t1
    where date >= '2017-03-31'
    and UNIX_TIMESTAMP(STR_TO_DATE(concat(date, ' ', hour), '%Y-%m-%d %H')) >= 1490965200
    and UNIX_TIMESTAMP(STR_TO_DATE(concat(date, ' ', hour), '%Y-%m-%d %H')) < 1492174801
    and aType in (1,6,9) group by uid\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: p20170324,p20170325,... (all partitions)
         type: ALL
possible_keys: did
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 25005577
     filtered: 15.00
        Extra: Using where; Using temporary; Using filesort
Obviously this SQL is very inefficient: a full table scan, no index, a temporary table, plus an extra sort. Every unfortunate thing at once.
This SQL sums the power column over all rows matching the conditions. Although the date column is indexed (it is the leading column of the unique key did), the WHERE clause wraps it in functions, concatenating it with the hour column, so the index cannot be used at all.
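To see the difference plainly, compare the two forms of the condition (a minimal illustration of the sargability problem, not taken from the original query plan):

-- NOT sargable: the indexed column is buried inside function calls,
-- so the optimizer cannot do a range scan on the index:
where UNIX_TIMESTAMP(STR_TO_DATE(concat(date, ' ', hour), '%Y-%m-%d %H')) >= 1490965200

-- Sargable: the bare column is compared against a constant,
-- so the leading column of the `did` index can be range-scanned:
where date >= '2017-03-31'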
Fortunately, a clever girl on the team suddenly had an idea (she is, in fact, very good at SQL optimization~): use CASE WHEN to transform the SQL, rewriting it as follows:
select uid, sum(powerup + powerup1)
from
(
  select uid,
    case when concat(date, ' ', hour) >= '2017-03-24 13:00' then power else '0' end as powerup,
    case when concat(date, ' ', hour) < '2017-03-25 13:00' then power else '0' end as powerup1
  from t1
  where date >= '2017-03-24'
  and date < '2017-03-25'
  and aType in (1,6,9)
) a
group by uid;
Talented, isn't it? The conditions that couldn't use the index are moved into CASE WHEN expressions, leaving only plain range conditions on the date column in the WHERE clause. Look at the new execution plan:
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: p20170324
         type: range
possible_keys: did
          key: idx2_date_addRedType
      key_len: 4
          ref: NULL
         rows: 876375
     filtered: 30.00
        Extra: Using index condition; Using temporary; Using filesort
Look at the execution cost of this SQL:
+----------------------------+---------+
| Variable_name              | Value   |
+----------------------------+---------+
| Handler_read_first         | 1       |
| Handler_read_key           | 1834590 |
| Handler_read_last          | 0       |
| Handler_read_next          | 1834589 |
| Handler_read_prev          | 0       |
| Handler_read_rnd           | 232276  |
| Handler_read_rnd_next      | 232277  |
+----------------------------+---------+
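By the way, these per-query handler counters are usually captured by resetting the session counters right before the statement; a minimal sketch (the exact commands used aren't shown above):

FLUSH STATUS;                                 -- zero the session status counters
-- ... run the query under test here ...
SHOW SESSION STATUS LIKE 'Handler_read%';     -- read back the per-query costs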
And the corresponding record in the SLOW QUERY LOG:
# Query_time: 6.381254  Lock_time: 0.000166  Rows_sent: 232276  Rows_examined: 2299141  Rows_affected: 0
# Bytes_sent: 4237347  Tmp_tables: 1  Tmp_disk_tables: 0  Tmp_table_sizes: 4187168
# InnoDB_trx_id: 0
# QC_Hit: No  Full_scan: No  Full_join: No  Tmp_table: Yes  Tmp_table_on_disk: No
# Filesort: Yes  Filesort_on_disk: No  Merge_passes: 0
# InnoDB_IO_r_ops: 0  InnoDB_IO_r_bytes: 0  InnoDB_IO_r_wait: 0.000000
# InnoDB_rec_lock_wait: 0.000000  InnoDB_queue_wait: 0.000000
# InnoDB_pages_distinct: 9311
Still not ideal. The full table scan is gone, but the temporary table and the extra sort remain. Let's find a way to eliminate them and then compare again.
By the way, did you notice a change? The SLOW QUERY LOG now records much more information. That's because this extended slow log is only supported in the Percona Server branch. It's a genuinely useful feature, and it can even record profiling details. Highly recommended.
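For reference, in Percona Server this extra detail is controlled by the log_slow_verbosity option; a minimal my.cnf sketch (option names as documented by Percona, the threshold is just an example):

[mysqld]
slow_query_log     = ON
long_query_time    = 1
log_slow_verbosity = full    # full = microtime,query_plan,innodb; 'profiling' can be added too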
Let's create a new index on the uid column to get rid of the temporary table and the sort, and see whether the overall cost turns out lower.
yejr@imysql.com[myDB]> ALTER TABLE t1 ADD INDEX idx_uid(uid);

yejr@imysql.com[myDB]> EXPLAIN select uid, sum(powerup + powerup1) from (
    select uid,
      case when concat(date, ' ', hour) >= '2017-03-24 13:00' then power else '0' end as powerup,
      case when concat(date, ' ', hour) < '2017-03-25 13:00' then power else '0' end as powerup1
    from t1
    where date >= '2017-03-24' and date < '2017-03-25' and aType in (1,6,9)
    ) a group by uid\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: if_date_hour_army_count
   partitions: p20170331,p20170401...
         type: index
possible_keys: did,idx_uid
          key: idx_uid
      key_len: 4
          ref: NULL
         rows: 12701520
     filtered: 15.00
        Extra: Using where
Look at the execution cost of the SQL after adding the index:
+----------------------------+---------+
| Variable_name              | Value   |
+----------------------------+---------+
| Handler_read_first         | 1       |
| Handler_read_key           | 1       |
| Handler_read_last          | 0       |
| Handler_read_next          | 1834589 |
| Handler_read_prev          | 0       |
| Handler_read_rnd           | 0       |
| Handler_read_rnd_next      | 0       |
+----------------------------+---------+
And the corresponding record in the SLOW QUERY LOG:
# Query_time: 5.772286  Lock_time: 0.000330  Rows_sent: 232276  Rows_examined: 1834589  Rows_affected: 0
# Bytes_sent: 4215071  Tmp_tables: 0  Tmp_disk_tables: 0  Tmp_table_sizes: 0
# InnoDB_trx_id: 0
# QC_Hit: No  Full_scan: Yes  Full_join: No  Tmp_table: No  Tmp_table_on_disk: No
# Filesort: No  Filesort_on_disk: No  Merge_passes: 0
# InnoDB_IO_r_ops: 0  InnoDB_IO_r_bytes: 0  InnoDB_IO_r_wait: 0.000000
# InnoDB_rec_lock_wait: 0.000000  InnoDB_queue_wait: 0.000000
# InnoDB_pages_distinct: 11470
Notice that with the index on uid, the query touches more distinct data pages (InnoDB_pages_distinct: 11470 vs 9311), yet it actually runs faster, because the temporary table and the extra sort have been eliminated. The Handler_read% results show the same thing clearly: the new plan does more sequential I/O (Handler_read_next) and less random I/O (Handler_read_rnd and Handler_read_rnd_next are now zero). So even though more data pages have to be scanned, execution is faster.
Finally, is there still room to optimize this SQL? Obviously, yes: redesign the table and merge the date and hour columns into a single column, so that the query no longer has to stitch conditions together and the index can be used directly, as sketched below.
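For example, something along these lines (only a sketch with a hypothetical column name; the primary key and the partitioning by date would have to be reworked to match):

-- merge date + hour into a single indexed datetime column
ALTER TABLE t1 ADD COLUMN logtime datetime NOT NULL DEFAULT '1970-01-01 00:00:00';
UPDATE t1 SET logtime = STR_TO_DATE(concat(date, ' ', hour), '%Y-%m-%d %H');
ALTER TABLE t1 ADD INDEX idx_logtime (logtime);

-- the query becomes a plain index range scan, with no functions wrapping the column
select uid, sum(power) powerup
from t1
where logtime >= '2017-03-31 13:00:00'    -- boundaries assume the server time zone
  and logtime <  '2017-04-14 13:00:01'
  and aType in (1,6,9)
group by uid;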