Introduction
In this article I am trying to discuss related to BETWEEN clause and >= AND <= comparisons operators and which one is best.
Most of the junior developer has some bad concepts related to BETWEEN clause and they told, not to use it. In this article I am demon staring the BETWEEN clause and how good it is.
What is the Difference
Let's take an example to understand it properly.
Step-1 [ Create the Base Table ]
CREATE TABLE my_TestTab
(ID INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
VAL DATETIME NOT NULL);
Step-2 [ Inserting Records ]
INSERT INTO my_TestTab
(VAL)
VALUES('05-01-2012'),
('05-07-2012'),
('05-11-2012'),
('05-15-2012'),
('05-22-2012'),
('05-23-2012'),
('05-25-2012'),
('05-27-2012'),
('05-28-2012');
Step-3 [ Now use Between Clause ]
SELECT *
FROM my_TestTab
WHERE VAL BETWEEN '05-01-2012' AND '05-28-2012';
Output:
ID VAL
1 2012-05-01 00:00:00.000
2 2012-05-07 00:00:00.000
3 2012-05-11 00:00:00.000
4 2012-05-15 00:00:00.000
5 2012-05-22 00:00:00.000
6 2012-05-23 00:00:00.000
7 2012-05-25 00:00:00.000
8 2012-05-27 00:00:00.000
9 2012-05-28 00:00:00.000
Step-4 [ Now using >= AND <= ]
SELECT *
FROM my_TestTab
WHERE VAL >='05-01-2012' AND VAL<='05-28-2012';
Output:
ID VAL
1 2012-05-01 00:00:00.000
2 2012-05-07 00:00:00.000
3 2012-05-11 00:00:00.000
4 2012-05-15 00:00:00.000
5 2012-05-22 00:00:00.000
6 2012-05-23 00:00:00.000
7 2012-05-25 00:00:00.000
8 2012-05-27 00:00:00.000
9 2012-05-28 00:00:00.000
To observe the difference between Step-3 and Step-4 just execute the SQL of Step-3 Again and see the actual execution plan.
SELECT *
FROM my_TestTab
WHERE VAL BETWEEN '05-01-2012' AND '05-28-2012';
If we see the execution plan we find that
SELECT * FROM my_TestTab WHERE [VAL] >= @1 AND [VAL]<=@2
So there is no difference between Step-3 and Step-4. Actually internally the BETWEEN clause is converted to >= and <= logical operators.
Summary
As per me using BETWEEN clause is much easier then the >= and <= operators and it looks great in complex query.
It actually depends on developer and there writing style of T-SQL.
It also depends on having a clustered index on the column you are comparing. Notice that your query plan shows a clustered index scan (effectively a table scan).
ReplyDeleteIf you had a clustered index on the column being compared, you would have an index seek, not a scan. This is one of many reasons to NOT have a clustered index on your primary key, but to instead use clustered indexes where the engine would most benefit from it.
Your example does benefit from having the dates inserted in pre-sorted order. In an OLTP system, it is highly unlikely that your dates will be entered in pre-sorted order.
First of all thanks for your interest.
DeleteI am not getting what you mean by INDEXING. Here I am trying to figure out the difference between [ >= AND <= ] and BETWEEN clause. Which one is good? As per me both are the same and if we think about the performance factors both are same (Here in this example I am not considering the pre sorted data).
If you think that there is some difference between them in term of performance, please explain it.
A blast from the past :). This one made me recollect one of my older posts: http://beyondrelational.com/modules/2/blogs/77/posts/11353/date-comparison-dissecting-between-use-between-or-comparison-greater-thanless-than-operators-with-eq.aspx
ReplyDeleteThanks "Nakul"
DeleteCan u tell me which one is better performance wise?
This question seemed to grab my attention so I set up a real world test it and wanted to share my results.
ReplyDeleteThe following test scenario is using a query to select created dates for users in one my databases.
The table is about 0.969 MB and has a good load of data for this example with 4750 rows spanning 5 years
The tables index set up is 1 clustered on the PK and 5 non-clustered indexes exist but none on the create_timestamp column and exactly as Marc pointed out in real world it is unsorted data.
Query tuning reflected that SQL shows this query missing an index on the date column. For performance the addition of the non-clustered index on this column is recommended. I did not add it but I wanted to point out that it is a recommended change and here is the details of the impact.
Missing Index Details -
The Query Processor estimates that implementing the following index could improve the query cost by 94.1297%.
CREATE NONCLUSTERED INDEX []
ON [dbo].[user_mstr] ([create_timestamp])
select create_timestamp from user_mstr
where create_timestamp between '2007-02-19 11:01:32.727' and '2011-01-28 11:55:42.313'
This returned 3683 rows.
Profiler showed the following.
CPU: 15
Reads: 132
Write: 0
Duration: 4 <=====
Same query.
This returned 3683 rows.
select create_timestamp from user_mstr
where create_timestamp >= '2007-02-19 11:01:32.727'
and create_timestamp <= '2011-01-28 11:55:42.313'
Profiler showed the following.
CPU: 0
Reads: 132
Write: 0
Duration: 6 <=====
It’s important to note that behavior was only significantly different on the first execution (prior to caching) afterward, the execution of the two were virtually the same.
Here are 4 subsequent executions.
Using Between
CPU: 0
Reads: 126
Write: 0
Duration: 2 <=====
Using >=/<=
CPU: 0
Reads: 126
Write: 0
Duration: 3 <=====
Using Between
CPU: 0
Reads: 126
Write: 0
Duration: 3 <=====
Using >=/<=
CPU: 0
Reads: 126
Write: 0
Duration: 2 <=====
In the end I agree with Nakul as using between as a best practice for readability and asthetics but with the above results it seems that Between is also a better choice for performance.
Thanks "Jeremiah Davis"
DeleteThank you for your guideline.