SQL Optimisation Techniques

Database is an essential software component for any business. It contains organisational data and allows businesses to build features to serve their customers. Here at Halodoc, we extensively use Relational Database Management Systems(RDBMS) as well NoSQL databases for our specific needs.
We have setup active and passive monitoring systems for checking slow queries in the database. We have enabled slow log and we have a performance insight monitor for passive monitoring. For active monitoring, we have alerts on CPU, memory and application behaviour. We have also created custom scripts to trigger alerts when query execution time reaches a particular threshold. These scripts also terminate queries based on predefined criteria.
This blog provides query optimisation techniques for designing good logic and extracting maximum performance from the database. Query logic impacts database server, application server, network, IO and end-user experience.
Query optimisation has many benefits, some of which are listed below.
- Minimize production issues: Every slow query requires high CPU, memory, and IOPS. We have seen most production database issues are caused by non-optimised queries.
- Performance issues: Slow query means slower response time of applications using the query, which results in poor end-user experience. It is very important to test logic/query with sufficient data before running it in production.
- Save infra cost: Unoptimised query requires more CPU, IOPS and memory. Additional load can be put on the same server if queries are optimised.
Query optimisation techniques and Common mistakes
Avoid Using Different Character Encoding:
It requires implicit conversion while making join on tables with different charset. Most of time optimiser does on perform proper implicit conversion of charset and not utilise index. We should use the same encoding across tables to utilise the better indexing feature of the DB.
Both tables below uses different CHARSETS
Table USER
CREATE TABLE `USER` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_name` varchar(25),
`phone_no` varchar(15),
PRIMARY KEY (`id`),
KEY `name_idx` (`user_name`),
KEY `phone_idx` (`phone_no`)
) ENGINE = InnoDB DEFAULT CHARSET = latin1CREATE TABLE `USER` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_name` varchar(25),
`phone_no` varchar(15),
PRIMARY KEY (`id`),
KEY `name_idx` (`user_name`),
KEY `phone_idx` (`phone_no`)
) ENGINE = InnoDB DEFAULT CHARSET = latin1Table CUSTOMER
CREATE TABLE `CUSTOMER` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`customer_name` varchar(25),
`order_date` datetime,
`mobile_no` bigint,
PRIMARY KEY (`id`),
KEY `name_idx` (`customer_name`),
KEY `mobile_idx` (`mobile_no`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8CREATE TABLE `CUSTOMER` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`customer_name` varchar(25),
`order_date` datetime,
`mobile_no` bigint,
PRIMARY KEY (`id`),
KEY `name_idx` (`customer_name`),
KEY `mobile_idx` (`mobile_no`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8In the example below a.user_name = b.customer_name compares with different encoding and hence it may not utilise indexes even-though index is present and datatype are same for user_name and customer_name.
select a.id,a.user_name,b.customer_name
from USER a inner join CUSTOMER b
on a.user_name = b.customer_nameselect a.id,a.user_name,b.customer_name
from USER a inner join CUSTOMER b
on a.user_name = b.customer_nameIt requires to alter table to fix charset or need to covert charset as show in below query
select a.id,a.user_name,b.customer_name
from USER a inner join CUSTOMER b
on convert(a.user_name using utf8)= b.customer_nameselect a.id,a.user_name,b.customer_name
from USER a inner join CUSTOMER b
on convert(a.user_name using utf8)= b.customer_nameUnderstanding of comparing columns datatypes:
In the above tables, phone_no is stored as VARCHAR in the first table and mobile_no is BIGINT in the second table. When two columns with different datatypes get compared in a query it might not utilise the index.
select a.user_name,a.phone_no, b.customer_name
from USER as a
inner join CUSTOMER as b
on a.phone_no = b.mobile_noselect a.user_name,a.phone_no, b.customer_name
from USER as a
inner join CUSTOMER as b
on a.phone_no = b.mobile_noIt is required to alter one of the column's datatype so they match.
Understanding of column datatype comparing with values:
In queries, one should avoid comparing different datatype column and value. When a query compares different datatypes, it uses implicit datatype conversion.
For example in the query below, mobile_no is a BIGINT datatype but query comparing as CHAR value.
select customer_name from CUSTOMER where mobile_no=’9876543210’select customer_name from CUSTOMER where mobile_no=’9876543210’For better utilisation of indexes, comparison should be done as the same data type as an integer to integer or varchar to varchar.
Avoid using Function-based clause in where condition:
When a query uses a function in the where or join clause on the column, it would not utilise the index.
where date(mail_sent) ='2018-01-29'
where lower(user_name)='abc'
where concat(first_name,' ',last_name)='jon bon'
inner join lower(first_name)=lower(user_name)
date(sent_time)=date(now)where date(mail_sent) ='2018-01-29'
where lower(user_name)='abc'
where concat(first_name,' ',last_name)='jon bon'
inner join lower(first_name)=lower(user_name)
date(sent_time)=date(now)One way to optimise query below:
select
count(email_sent)
from
EMAIL_TABLE
where
date(sent_time)= date(now())select
count(email_sent)
from
EMAIL_TABLE
where
date(sent_time)= date(now())is by changing date(mail_sent)=date(now) as following:
select
count(email_sent)
from
EMAIL_TABLE
where
sent_time between str_to_date(
concat(current_date, ' 00:00:00'),
'%Y-%m-%d %H:%i:%s'
)
and str_to_date(
concat(current_date, ' 23:59:59'),
'%Y-%m-%d %H:%i:%s'
)select
count(email_sent)
from
EMAIL_TABLE
where
sent_time between str_to_date(
concat(current_date, ' 00:00:00'),
'%Y-%m-%d %H:%i:%s'
)
and str_to_date(
concat(current_date, ' 23:59:59'),
'%Y-%m-%d %H:%i:%s'
)Avoid using DISTINCT and GROUP BY at the same time:
The query below has performance overhead when it uses Distinct and Group By together, GROUP BY itself makes rows distinct. Using GROUP BY as in the below query also has one drawback, it will give random values for those columns(b.customer_name) that do not have a group function.
select
distinct a.user_name,
b.customer_name,
count(a.id)
from
USER a
inner join CUSTOMER b on a.user_name = b.customer_name
group by
a.user_nameselect
distinct a.user_name,
b.customer_name,
count(a.id)
from
USER a
inner join CUSTOMER b on a.user_name = b.customer_name
group by
a.user_nameThere is no need to use DISTINCT and GROUP BY together. This query could be re-written as follows
select
a.user_name,
group_concat(b.customer_name),
count(a.id)
from
USER a
inner join CUSTOMER b on a.user_name = b.customer_name
group by
a.user_nameselect
a.user_name,
group_concat(b.customer_name),
count(a.id)
from
USER a
inner join CUSTOMER b on a.user_name = b.customer_name
group by
a.user_nameAvoid using UNION and DISTINCT at the same time:
Union itself makes distinct records, so we need not use DISTINCT with UNION as shown in the query below.
select distinct user_name from USER
UNION
select distinct customer_name from CUSTOMERselect distinct user_name from USER
UNION
select distinct customer_name from CUSTOMERAvoid selecting unnecessary columns:
Selecting unnecessary columns would be a waste of memory, CPU cycle and network. Every query should select only the required columns for better query performance.
As example query below which select all records:
select * from CUSTOMER where id = 123;select * from CUSTOMER where id = 123;It is better to select column name instead of * or extra columns.
select customer_name from CUSTOMER where id = 123;select customer_name from CUSTOMER where id = 123;Avoid using Long-running transaction:
When a DML query is executed on a large data range, during the execution time the transaction may lock rows or may lock the full table. During this lock time the database disallows concurrent transactions and other DML queries would be in a wait state, this similar to small downtime on the table access. Breaking long running transactions into smaller transaction can be more optimised while making an UPDATE or Delete query.
delete from CUSTOMER where order_date between ‘1990-01-01’ and ‘2020-12-01’delete from CUSTOMER where order_date between ‘1990-01-01’ and ‘2020-12-01’update CUSTOMER set mobile_no=18273278291 where upper(customer_name)=’abc’;update CUSTOMER set mobile_no=18273278291 where upper(customer_name)=’abc’;In the above queries the 1st transaction is deleting huge data because of long date range and the 2nd query is not utilising index and responding slower and locking the table during execution time.
Avoid subquery whenever possible:
MySQL and many databases perform slower when there is a subquery in SQL statement query. Subquery creates temp tables to store data and sometimes it creates temp tables on the disk thereby slowing the query execution. We should avoid such queries as much as possible as it is not scalable.
select
a.user_name,
b.customer_name
from
USER a
join (
select
customer_name
from
CUSTOMER
where
order_date = ‘2020 - 10 - 02’
) b on a.user_name = b.customer_nameselect
a.user_name,
b.customer_name
from
USER a
join (
select
customer_name
from
CUSTOMER
where
order_date = ‘2020 - 10 - 02’
) b on a.user_name = b.customer_nameAvoid using Order by in the Subquery:
Using Order By in subquery gives overhead as it unnecessarily file sort subquery data.
select
user_name
from
USER
where
user_name IN (
select
customer_name
from
CUSTOMER
where
order_date > ‘2020 - 10 - 02’
order by
order_date
)select
user_name
from
USER
where
user_name IN (
select
customer_name
from
CUSTOMER
where
order_date > ‘2020 - 10 - 02’
order by
order_date
)Such Order By makes sense when limiting rows is required.
select
user_name
from
USER
where
user_name IN (
select
customer_name
from
CUSTOMER
where
order_date > ‘2020 - 10 - 02’
order by
order_date desc
limit
10
)select
user_name
from
USER
where
user_name IN (
select
customer_name
from
CUSTOMER
where
order_date > ‘2020 - 10 - 02’
order by
order_date desc
limit
10
)Avoid using Subquery in IN and NOT IN:
MySQL and many databases perform slower when there is a subquery in SQL statement query. Subquery creates temp tables to store data and sometimes it creates temp tables on the disk thereby slowing the query execution.
In the query below NOT IN (select id from t2) creates a temp table.
select
user_name
from
USER
where
id NOT IN (
select
id
from
CUSTOMER
where
order_date = ‘2020 - 10 - 02’
)select
user_name
from
USER
where
id NOT IN (
select
id
from
CUSTOMER
where
order_date = ‘2020 - 10 - 02’
)We can optimise the above query in two ways.
select
group_concat(id)
from
CUSTOMER
where
order_date = ‘2020 - 10 - 02’select
group_concat(id)
from
CUSTOMER
where
order_date = ‘2020 - 10 - 02’select
user_name
from
USER
where
id NOT IN (
id get
from
1st query
)select
user_name
from
USER
where
id NOT IN (
id get
from
1st query
)Avoid joining tables with OR condition:
Joining tables with OR condition may convert the query into a cross join like query and this query would take a long time and high resources to execute.
select
a.user,
b.employee_name,
b.manager_name
from
EVENTS a
join EMPLOYEE b on a.user = b.employee_name
OR a.user = b.manager_nameselect
a.user,
b.employee_name,
b.manager_name
from
EVENTS a
join EMPLOYEE b on a.user = b.employee_name
OR a.user = b.manager_namecan be optimised by making changes to query as shown below by changing OR to UNION ALL.
select
a.user,
a.event_name,
b.employee_name,
b.manager_name
from
EVENTS a
join EMPLOYEE b on a.user = b.employee_name
UNION ALL
select
a.user,
b.employee_name,
b.manager_name
from
EVENTS a
join EMPLOYEE b on a.user = b.manager_nameselect
a.user,
a.event_name,
b.employee_name,
b.manager_name
from
EVENTS a
join EMPLOYEE b on a.user = b.employee_name
UNION ALL
select
a.user,
b.employee_name,
b.manager_name
from
EVENTS a
join EMPLOYEE b on a.user = b.manager_nameAvoid Full-Text Search
select
user_name
from
USER
where
user_name like ‘% abc %’select
user_name
from
USER
where
user_name like ‘% abc %’It is recommended to search using like 'keyword%' instead of searching using like ‘%keyword%’ so as to utilise the indexes as shown below. For a full-text search, it is advisable to use the Full-Text Index.
select
user_name
from
USER
where
user_name like ‘abc % ’select
user_name
from
USER
where
user_name like ‘abc % ’