Wednesday 10 March 2010

Objects and schema scoping

Most common objects in SQL Server are schema-scoped, that is they belong to a schema. In SQL Server 2000, schemas and database users were effectively the same thing, so objects could be called ‘user-scoped’. However, starting with SQL Server 2005 schemas and users (database principals) were separated. This means that schemas are now no longer tied directly to the users in the database.

So, schema scoped objects are those that belong directly to a schema. A schema can be thought of as a logical partition within a database which directly serves two purposes:


  • To separate logical areas of the database such that users have a clear security constraints within your databases. understanding of what objects interact with which other objects. This can make your database a lot easier to understand for new people coming in and trying to understand your schemas. When you need to hire help, it’s probably because you are busy. Do you want to spend your time explaining your schema to your new help? Probably not so much.

  • To enable security control to be applied to those logical groups to individual database users. This makes it both more convenient to apply permissions to logical groups of objects, and easier to understand the security picture in your databases.


Schema scoped objects can be found generally in the [sys].[objects] table. This table contains the following object types:

Aggregate function (CLR)
CHECK constraint
DEFAULT (constraint or stand-alone)
FOREIGN KEY constraint
SQL scalar function
Assembly (CLR) scalar-function
Assembly (CLR) table-valued function
SQL inline table-valued function
SQL Stored Procedure
Assembly (CLR) stored-procedure
PRIMARY KEY constraint
Rule (old-style, stand-alone)
Replication-filter-procedure
Synonym
Service queue
Assembly (CLR) DML trigger
SQL table-valued-function
SQL DML trigger
Table type
Table (user-defined)
UNIQUE constraint
View
Extended stored procedure

In this list, some of the object types are directly schema-scoped (i.e. they have a schema because you have chosen to place them in a schema for a reason) and some are indirectly schema-scoped (i.e. they have a schema because they are a child object of another object that belongs to a schema). While indirectly schema-scoped objects belong to their schema, they cannot have a schema different to that of their parent object.

So, the directly schema scoped objects in [sys].[objects] are:

Aggregate function (CLR)
Assembly (CLR) scalar-function
SQL scalar function
Assembly (CLR) table-valued function
SQL table-valued-function
SQL inline table-valued function
Assembly (CLR) stored-procedure
SQL Stored Procedure
Default (old-style, stand-alone)
Rule (old-style, stand-alone)
Replication-filter-procedure
Synonym
Service queue
Table (user-defined)
View
Table type
Extended stored procedure

And the indirectly schema scoped objects in [sys].[objects] are:

CHECK constraint
DEFAULT constraint
FOREIGN KEY constraint
PRIMARY KEY constraint
Assembly (CLR) DML trigger
SQL DML trigger
UNIQUE constraint

However, there are other object types which are schema-scoped that are not present in [sys].[objects]. Hang on, doesn’t the MSDN description for [sys].[objects] say:

Contains a row for each user-defined, schema-scoped object that is created within a database.

Yes it does, but XML Schema Collections, user defined SQL types and user defined CLR types are all schema-scoped, yet do not appear in [sys].[objects]. Hey, that’s life (and also why I had a hard time writing my schema comparison tool). In the next article, I will explain what the columns in [sys].[objects] mean, and begin to explain the meta-data for the various supplementary system views that provide extra information that is relevant to each different type.

Monday 25 January 2010

We all know what an RBAR is. It’s the single most effective way of killing performance in a database. But there are cases where something seemingly innocuous, like a user defined function, can cause SQL Server to process something differently “under the hood” which equate to the same thing. One of the major offenders is the Scalar UDF – a much over-used construct for abstracting away functionality…

Isn’t abstraction good?

In general computing terms, yes, abstraction is good. In fact, in life in general, abstraction is good. For example – the screen you’re reading this text on now – do you care how it works? Or do you just care that it works? Abstraction allows us to treat things as ‘black boxes’, in that we don’t need to care how they function, just that they function.

So what’s the problem with Scalar UDFs?

A scalar UDF is much like a function in any procedural language. It takes some input data, and gives some output data based on some logic. However, there is a major difference between the way that compiled languages and SQL treat their usage. In compiled languages, small functions will often be inlined by the compiler – meaning that the control flow at the instruction level is the same as what it would be (there or thereabouts) if the function had been written directly into the code that was calling it. However, functions are not inlined by SQL Server when executing. Let’s consider a reasonably simple UDF:

CREATE FUNCTION [dbo].[func_TestDescriptionPlain](@TypeCode int)
RETURNS varchar(25)
AS
BEGIN
RETURN 'Test Value ' + CONVERT(varchar, @TypeCode)
END


This UDF is relatively pointless, of course, but for the purposes of showing how it affects query performance, is just what we want. Let’s consider how it might be used, and it’s inlined equivalent:

Using scalar function:

SELECT SUM(LEN(idat.[Description])) FROM (
SELECT tdata.ID, tdata.TypeCode, [dbo].[func_TestDescriptionPlain](tdata.TypeCode) AS Description
FROM [dbo].[testdata] tdata
) idat


Inlined equivalent:

SELECT SUM(LEN(idat.[Description])) FROM (
SELECT tdata.ID, tdata.TypeCode, 'Test Value ' + CONVERT(varchar, tdata.TypeCode) AS Description
FROM [dbo].[testdata] tdata
) idat


These two queries produce exactly the same result. The only difference being that the inlined equivalent does not make use of the scalar function, it just does the same inline.

Let’s have a look at executing the two (please note that the scripts used are available at the end of the article).

Scalar function:
CPU time = 6015 ms, elapsed time = 6346 ms.

Inlined equivalent:
CPU time = 1125 ms, elapsed time = 1263 ms.

Ok, so that’s quite a big difference – there must be a big difference in the execution plan for those – let’s take a look:

Scalar Function:



Inline Scalar Function:



So – not a massive difference – just an extra sub-tree with no cost associated with it. In fact, SSMS won’t even show the extra tree – the plans look identical there.

So what explains the big difference?

Query Optimization. Or lack of it. Even though the scalar function we are calling is intensely simple, SQL Server calls the user defined function once per row, rather than inlining the definition. There is not much documentation available about what the overhead is in terms of calling a user-defined function, but, at the very least, the same operations involved with standard procedural code would cause an overhead. I won’t go into detail about how function calls are invoked in general computing here, that’s beyond the scope of the article.