SQL Optimisation Techniques

Database is an essential software component for any business. It contains organisational data and allows businesses to build features to serve their customers. Here at Halodoc, we extensively use Relational Database Management Systems(RDBMS) as well NoSQL databases for our specific needs.

We have setup active and passive monitoring systems for checking slow queries in the database. We have enabled slow log and we have a performance insight monitor for passive monitoring. For active monitoring, we have alerts on CPU, memory and application behaviour. We have also created custom scripts to trigger alerts when query execution time reaches a particular threshold. These scripts also terminate queries based on predefined criteria.

This blog provides query optimisation techniques for designing good logic and extracting maximum performance from the database. Query logic impacts database server, application server, network, IO and end-user experience.

Query optimisation has many benefits, some of which are listed below.

Minimize production issues: Every slow query requires high CPU, memory, and IOPS. We have seen most production database issues are caused by non-optimised queries.
Performance issues: Slow query means slower response time of applications using the query, which results in poor end-user experience. It is very important to test logic/query with sufficient data before running it in production.
Save infra cost: Unoptimised query requires more CPU, IOPS and memory. Additional load can be put on the same server if queries are optimised.

Query optimisation techniques and Common mistakes

Avoid Using Different Character Encoding:

It requires implicit conversion while making join on tables with different charset. Most of time optimiser does on perform proper implicit conversion of charset and not utilise index. We should use the same encoding across tables to utilise the better indexing feature of the DB.

Both tables below uses different CHARSETS

Table `USER`

sql

CREATE TABLE `USER` (
  `id` int(11) NOT NULL AUTO_INCREMENT, 
  `user_name` varchar(25), 
  `phone_no` varchar(15), 
  PRIMARY KEY (`id`), 
  KEY `name_idx` (`user_name`), 
  KEY `phone_idx` (`phone_no`)
) ENGINE = InnoDB DEFAULT CHARSET = latin1

CREATE TABLE `USER` (
  `id` int(11) NOT NULL AUTO_INCREMENT, 
  `user_name` varchar(25), 
  `phone_no` varchar(15), 
  PRIMARY KEY (`id`), 
  KEY `name_idx` (`user_name`), 
  KEY `phone_idx` (`phone_no`)
) ENGINE = InnoDB DEFAULT CHARSET = latin1

Table `CUSTOMER`

sql

CREATE TABLE `CUSTOMER` (
  `id` int(11) NOT NULL AUTO_INCREMENT, 
  `customer_name` varchar(25), 
  `order_date` datetime, 
  `mobile_no` bigint, 
  PRIMARY KEY (`id`), 
  KEY `name_idx` (`customer_name`), 
  KEY `mobile_idx` (`mobile_no`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8

CREATE TABLE `CUSTOMER` (
  `id` int(11) NOT NULL AUTO_INCREMENT, 
  `customer_name` varchar(25), 
  `order_date` datetime, 
  `mobile_no` bigint, 
  PRIMARY KEY (`id`), 
  KEY `name_idx` (`customer_name`), 
  KEY `mobile_idx` (`mobile_no`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8

In the example below a.user_name = b.customer_name compares with different encoding and hence it may not utilise indexes even-though index is present and datatype are same for user_name and customer_name.

sql

select a.id,a.user_name,b.customer_name
from USER a inner join CUSTOMER b
on a.user_name = b.customer_name

select a.id,a.user_name,b.customer_name
from USER a inner join CUSTOMER b
on a.user_name = b.customer_name

It requires to alter table to fix charset or need to covert charset as show in below query

sql

select a.id,a.user_name,b.customer_name
from USER a inner join CUSTOMER b
on convert(a.user_name using utf8)= b.customer_name

select a.id,a.user_name,b.customer_name
from USER a inner join CUSTOMER b
on convert(a.user_name using utf8)= b.customer_name

Understanding of comparing columns datatypes:

In the above tables, phone_no is stored as VARCHAR in the first table and mobile_no is BIGINT in the second table. When two columns with different datatypes get compared in a query it might not utilise the index.

sql

select a.user_name,a.phone_no, b.customer_name
from  USER as a 
inner join CUSTOMER as b
on a.phone_no = b.mobile_no

select a.user_name,a.phone_no, b.customer_name
from  USER as a 
inner join CUSTOMER as b
on a.phone_no = b.mobile_no

It is required to alter one of the column's datatype so they match.

Understanding of column datatype comparing with values:

In queries, one should avoid comparing different datatype column and value. When a query compares different datatypes, it uses implicit datatype conversion.

For example in the query below, mobile_no is a BIGINT datatype but query comparing as CHAR value.

sql

select customer_name from CUSTOMER where mobile_no=’9876543210’

select customer_name from CUSTOMER where mobile_no=’9876543210’

For better utilisation of indexes, comparison should be done as the same data type as an integer to integer or varchar to varchar.

Avoid using Function-based clause in where condition:

When a query uses a function in the where or join clause on the column, it would not utilise the index.

sql

where date(mail_sent) ='2018-01-29'

where lower(user_name)='abc'

where concat(first_name,' ',last_name)='jon bon'

inner join lower(first_name)=lower(user_name)

date(sent_time)=date(now)

where date(mail_sent) ='2018-01-29'

where lower(user_name)='abc'

where concat(first_name,' ',last_name)='jon bon'

inner join lower(first_name)=lower(user_name)

date(sent_time)=date(now)

One way to optimise query below:

sql

select 
  count(email_sent) 
from 
  EMAIL_TABLE 
where 
  date(sent_time)= date(now())

select 
  count(email_sent) 
from 
  EMAIL_TABLE 
where 
  date(sent_time)= date(now())

is by changing date(mail_sent)=date(now) as following:

sql

select 
  count(email_sent) 
from 
  EMAIL_TABLE 
where 
  sent_time between str_to_date(
    concat(current_date, ' 00:00:00'), 
    '%Y-%m-%d %H:%i:%s'
  ) 
  and str_to_date(
    concat(current_date, ' 23:59:59'), 
    '%Y-%m-%d %H:%i:%s'
  )

select 
  count(email_sent) 
from 
  EMAIL_TABLE 
where 
  sent_time between str_to_date(
    concat(current_date, ' 00:00:00'), 
    '%Y-%m-%d %H:%i:%s'
  ) 
  and str_to_date(
    concat(current_date, ' 23:59:59'), 
    '%Y-%m-%d %H:%i:%s'
  )

Avoid using `DISTINCT` and `GROUP BY` at the same time:

The query below has performance overhead when it uses Distinct and Group By together, GROUP BY itself makes rows distinct. Using GROUP BY as in the below query also has one drawback, it will give random values for those columns(b.customer_name) that do not have a group function.

sql

select 
  distinct a.user_name, 
  b.customer_name, 
  count(a.id) 
from 
  USER a 
  inner join CUSTOMER b on a.user_name = b.customer_name 
group by 
  a.user_name

select 
  distinct a.user_name, 
  b.customer_name, 
  count(a.id) 
from 
  USER a 
  inner join CUSTOMER b on a.user_name = b.customer_name 
group by 
  a.user_name

There is no need to use DISTINCT and GROUP BY together. This query could be re-written as follows

sql

select 
  a.user_name, 
  group_concat(b.customer_name), 
  count(a.id) 
from 
  USER a 
  inner join CUSTOMER b on a.user_name = b.customer_name 
group by 
  a.user_name

select 
  a.user_name, 
  group_concat(b.customer_name), 
  count(a.id) 
from 
  USER a 
  inner join CUSTOMER b on a.user_name = b.customer_name 
group by 
  a.user_name

Avoid using `UNION` and `DISTINCT` at the same time:

Union itself makes distinct records, so we need not use DISTINCT with UNION as shown in the query below.

sql

select distinct user_name from USER 
UNION
select distinct customer_name from CUSTOMER

select distinct user_name from USER 
UNION
select distinct customer_name from CUSTOMER

Avoid selecting unnecessary columns:

Selecting unnecessary columns would be a waste of memory, CPU cycle and network. Every query should select only the required columns for better query performance.

As example query below which select all records:

sql

select * from CUSTOMER where id = 123;

select * from CUSTOMER where id = 123;

It is better to select column name instead of * or extra columns.

sql

select customer_name from CUSTOMER where id = 123;

select customer_name from CUSTOMER where id = 123;

Avoid using Long-running transaction:

When a DML query is executed on a large data range, during the execution time the transaction may lock rows or may lock the full table. During this lock time the database disallows concurrent transactions and other DML queries would be in a wait state, this similar to small downtime on the table access. Breaking long running transactions into smaller transaction can be more optimised while making an UPDATE or Delete query.

sql

delete from CUSTOMER where order_date between ‘1990-01-01’ and ‘2020-12-01’

delete from CUSTOMER where order_date between ‘1990-01-01’ and ‘2020-12-01’

sql

update CUSTOMER set mobile_no=18273278291 where upper(customer_name)=’abc’;

update CUSTOMER set mobile_no=18273278291 where upper(customer_name)=’abc’;

In the above queries the 1st transaction is deleting huge data because of long date range and the 2nd query is not utilising index and responding slower and locking the table during execution time.

Avoid subquery whenever possible:

MySQL and many databases perform slower when there is a subquery in SQL statement query. Subquery creates temp tables to store data and sometimes it creates temp tables on the disk thereby slowing the query execution. We should avoid such queries as much as possible as it is not scalable.

sql

select 
  a.user_name, 
  b.customer_name 
from 
  USER a 
  join (
    select 
      customer_name 
    from 
      CUSTOMER 
    where 
      order_date = ‘2020 - 10 - 02’
  ) b on a.user_name = b.customer_name

select 
  a.user_name, 
  b.customer_name 
from 
  USER a 
  join (
    select 
      customer_name 
    from 
      CUSTOMER 
    where 
      order_date = ‘2020 - 10 - 02’
  ) b on a.user_name = b.customer_name

Avoid using `Order by` in the Subquery:

Using Order By in subquery gives overhead as it unnecessarily file sort subquery data.

sql

select 
  user_name 
from 
  USER 
where 
  user_name IN (
    select 
      customer_name 
    from 
      CUSTOMER 
    where 
      order_date > ‘2020 - 10 - 02’ 
    order by 
      order_date
  )

select 
  user_name 
from 
  USER 
where 
  user_name IN (
    select 
      customer_name 
    from 
      CUSTOMER 
    where 
      order_date > ‘2020 - 10 - 02’ 
    order by 
      order_date
  )

Such Order By makes sense when limiting rows is required.

sql

select 
  user_name 
from 
  USER 
where 
  user_name IN (
    select 
      customer_name 
    from 
      CUSTOMER 
    where 
      order_date > ‘2020 - 10 - 02’ 
    order by 
      order_date desc 
    limit 
      10
  )

select 
  user_name 
from 
  USER 
where 
  user_name IN (
    select 
      customer_name 
    from 
      CUSTOMER 
    where 
      order_date > ‘2020 - 10 - 02’ 
    order by 
      order_date desc 
    limit 
      10
  )

Avoid using Subquery in `IN` and `NOT IN`:

In the query below NOT IN (select id from t2) creates a temp table.

sql

select 
  user_name 
from 
  USER 
where 
  id NOT IN (
    select 
      id 
    from 
      CUSTOMER 
    where 
      order_date = ‘2020 - 10 - 02’
  )

select 
  user_name 
from 
  USER 
where 
  id NOT IN (
    select 
      id 
    from 
      CUSTOMER 
    where 
      order_date = ‘2020 - 10 - 02’
  )

We can optimise the above query in two ways.

sql

select 
  group_concat(id) 
from 
  CUSTOMER 
where 
  order_date = ‘2020 - 10 - 02’

select 
  group_concat(id) 
from 
  CUSTOMER 
where 
  order_date = ‘2020 - 10 - 02’

sql

select 
  user_name 
from 
  USER 
where 
  id NOT IN (
    id get 
    from 
      1st query
  )

select 
  user_name 
from 
  USER 
where 
  id NOT IN (
    id get 
    from 
      1st query
  )

Avoid joining tables with `OR` condition:

Joining tables with OR condition may convert the query into a cross join like query and this query would take a long time and high resources to execute.

sql

select 
  a.user, 
  b.employee_name, 
  b.manager_name 
from 
  EVENTS a 
  join EMPLOYEE b on a.user = b.employee_name 
  OR a.user = b.manager_name

select 
  a.user, 
  b.employee_name, 
  b.manager_name 
from 
  EVENTS a 
  join EMPLOYEE b on a.user = b.employee_name 
  OR a.user = b.manager_name

can be optimised by making changes to query as shown below by changing OR to UNION ALL.

sql

select 
  a.user, 
  a.event_name, 
  b.employee_name, 
  b.manager_name 
from 
  EVENTS a 
  join EMPLOYEE b on a.user = b.employee_name 
UNION ALL 
select 
  a.user, 
  b.employee_name, 
  b.manager_name 
from 
  EVENTS a 
  join EMPLOYEE b on a.user = b.manager_name

select 
  a.user, 
  a.event_name, 
  b.employee_name, 
  b.manager_name 
from 
  EVENTS a 
  join EMPLOYEE b on a.user = b.employee_name 
UNION ALL 
select 
  a.user, 
  b.employee_name, 
  b.manager_name 
from 
  EVENTS a 
  join EMPLOYEE b on a.user = b.manager_name

Avoid Full-Text Search

sql

select 
  user_name 
from 
  USER 
where 
  user_name like ‘% abc %’

select 
  user_name 
from 
  USER 
where 
  user_name like ‘% abc %’

It is recommended to search using like 'keyword%' instead of searching using like ‘%keyword%’ so as to utilise the indexes as shown below. For a full-text search, it is advisable to use the Full-Text Index.

sql

select 
  user_name 
from 
  USER 
where 
  user_name like ‘abc % ’

select 
  user_name 
from 
  USER 
where 
  user_name like ‘abc % ’

SQL Optimisation Techniques ​

Query optimisation techniques and Common mistakes ​

Avoid Using Different Character Encoding: ​

Table USER ​

Table CUSTOMER ​

Understanding of comparing columns datatypes: ​

Avoid using Function-based clause in where condition: ​

Avoid using DISTINCT and GROUP BY at the same time: ​

Avoid using UNION and DISTINCT at the same time: ​

Avoid selecting unnecessary columns: ​

Avoid subquery whenever possible: ​

Avoid using Order by in the Subquery: ​

Avoid using Subquery in IN and NOT IN: ​

Avoid joining tables with OR condition: ​

Avoid Full-Text Search ​