Are you tired of doing self joins, just to get the value of a column in the previus/subsequent row, or maybe even worse you are doing multiple self joins. This is where you will start smiling, let’s have a look at an exampel.

I’ll start by creating some demo data:

  1. CREATE TABLE MonthlyProfit  
  2. (  
  3.     yearid int,  
  4.     Monthid int,  
  5.     Profit bigint  
  6. )  
  7. INSERT INTO MonthlyProfit (yearid, Monthid, Profit)  
  8. VALUES  
  9. (2010,1,1766), (2010,2,100), (2010,3,1500), (2010,4,15000), (2010,5,900), (2010,6,45),  
  10. (2010,7,1766), (2010,8,9100), (2010,9,-100), (2010,10,50), (2010,11,900000), (2010,12,6575)  

What I would like to do now is to write a select statement that gives me Yearid, MonthId, ProfitThisMonth and ProfitLastMonth, so that I’m able to compare my profit with last month. Before SQL Server Denali CTP3 my SQL statement would look something like this:

  1. SELECT  
  2.  t1.yearid,   
  3.  t1.Monthid,   
  4.  t1.Profit as ProfitThisMonth,   
  5.  t2.Profit as ProfitLastMonth  
  6. FROM MonthlyProfit as t1  
  7. LEFT JOIN MonthlyProfit as t2 on (t1.yearid = t2.yearid) AND (t1.Monthid = t2.Monthid+1)  


A pretty simple statement by looking at it, but as mentioned earlier this is not possible without the nasty self join.

SQL Server Denali CTP3 finally gives us LAG and LEAD. Here is the description directly stolen from BOL:

Accesses data from a previous row in the same result set without the use of a self-join in Microsoft SQL Server Code-Named “Denali”, Community Technology Preview 3 (CTP 3). LAG provides access to a row at a given physical offset that comes before the current row. Use this analytic function in a SELECT statement to compare values in the current row with values in a previous row.

The syntax for LAG and LEAD is the same

LAG (scalar_expression [,offset] [,default]) OVER ( [ partition_by_clause ] order_by_clause)

Let’s have a look at my statement now:

  1. SELECT  
  2.  yearid,   
  3.  monthid,   
  4.  Profit as ProfitThisMonth,  
  5.  LAG(Profit, 1, 0) OVER(Order by yearid, monthid) as ProfitLastMonth  
  6. FROM MonthlyProfit   

No more self joins, and a much more simple statement. Read the full documentation to the LAG and LEAD in BOL.

What about performance?

As described in an earlier blog post, I set IO statistics on and run both queries again. Here is the results:

  1. (12 row(s) affected)  
  2. Table 'MonthlyProfit'. Scan count 2, logical reads 13, physical reads 0, read-ahead reads 0,…  
  3.   
  4. (12 row(s) affected)  
  5. Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0,…  
  6. Table 'MonthlyProfit'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0,  
  7. So the conclusion is, we get a much simpler query that also does fewer reads, that's what I call a WIN-WIN situation.  

@ms1333

 
Geniiius ApS