The Insider: Journeys in SQL

We all know what an RBAR is. It’s the single most effective way of killing performance in a database. But there are cases where something seemingly innocuous, like a user defined function, can cause SQL Server to process something differently “under the hood” which equate to the same thing. One of the major offenders is the Scalar UDF – a much over-used construct for abstracting away functionality…

Isn’t abstraction good?

In general computing terms, yes, abstraction is good. In fact, in life in general, abstraction is good. For example – the screen you’re reading this text on now – do you care how it works? Or do you just care that it works? Abstraction allows us to treat things as ‘black boxes’, in that we don’t need to care how they function, just that they function.

So what’s the problem with Scalar UDFs?

A scalar UDF is much like a function in any procedural language. It takes some input data, and gives some output data based on some logic. However, there is a major difference between the way that compiled languages and SQL treat their usage. In compiled languages, small functions will often be inlined by the compiler – meaning that the control flow at the instruction level is the same as what it would be (there or thereabouts) if the function had been written directly into the code that was calling it. However, functions are not inlined by SQL Server when executing. Let’s consider a reasonably simple UDF:

CREATE FUNCTION [dbo].[func_TestDescriptionPlain](@TypeCode int)
RETURNS varchar(25)
AS
BEGIN  
  RETURN 'Test Value ' + CONVERT(varchar, @TypeCode)
END

This UDF is relatively pointless, of course, but for the purposes of showing how it affects query performance, is just what we want. Let’s consider how it might be used, and it’s inlined equivalent:

Using scalar function:

SELECT SUM(LEN(idat.[Description])) FROM (
  SELECT tdata.ID, tdata.TypeCode, [dbo].[func_TestDescriptionPlain](tdata.TypeCode) AS Description 
    FROM [dbo].[testdata] tdata 
) idat

Inlined equivalent:

SELECT SUM(LEN(idat.[Description])) FROM (
  SELECT tdata.ID, tdata.TypeCode, 'Test Value ' + CONVERT(varchar, tdata.TypeCode) AS Description
    FROM [dbo].[testdata] tdata
) idat

These two queries produce exactly the same result. The only difference being that the inlined equivalent does not make use of the scalar function, it just does the same inline.

Let’s have a look at executing the two (please note that the scripts used are available at the end of the article).

Scalar function:
CPU time = 6015 ms, elapsed time = 6346 ms.

Inlined equivalent:
CPU time = 1125 ms, elapsed time = 1263 ms.

Ok, so that’s quite a big difference – there must be a big difference in the execution plan for those – let’s take a look:

Scalar Function:

Inline Scalar Function:

So – not a massive difference – just an extra sub-tree with no cost associated with it. In fact, SSMS won’t even show the extra tree – the plans look identical there.

So what explains the big difference?

Query Optimization. Or lack of it. Even though the scalar function we are calling is intensely simple, SQL Server calls the user defined function once per row, rather than inlining the definition. There is not much documentation available about what the overhead is in terms of calling a user-defined function, but, at the very least, the same operations involved with standard procedural code would cause an overhead. I won’t go into detail about how function calls are invoked in general computing here, that’s beyond the scope of the article.

The Insider: Journeys in SQL

Monday 25 January 2010

No comments:

Post a Comment

Atlantis Links