27 October 2011

Multiple values in one column (aka GROUP_CONCAT for T-SQL)

There are times when you need create SELECT which includes column with multiple values from different table. Apparently there are functions for that for other databases like MySQL or Oracle. MS SQL does not have similar feature so we have to create one ourselves.

There are various way to do this but first of all SQL tables. I have own structure already in place with some thousands of records (some might say not enough, but we will see clear speed difference anyway):

Table #1 - page
id int
id_hash nvarchar(40)
title nvarchar(100)

Typical data look like:
1    004AAEAE62F8B...   Title XYZ
2    00EAF730FC50B...   Title 123

Table #2 - page_modules
id int
id_parent nvarchar(40)
module_type nvarchar(20)
module_id nvarchar(40)

Typical data look like:
1    004AAEAE62F8B...   GALLERY      01D2255A09E....
2    004AAEAE62F8B...   ARTICLE      045A6F30D05....
3    00EAF730FC50B...   ARTICLE      068FC352B6B....

And now what we want to have:
1    004AAEAE62F8B...   Title XYZ   GALLERY|ARTICLE
2    00EAF730FC50B...   Title 123   ARTICLE      

This is just example what I want to have in my SELECT results. Original table #1 has over 15 fields and table #2 is pretty much same. Table #1 has around 1000 records, table #2 has around 3 times more because of modules linked to pages.

I have tried 3 different method how to get result I need. 

Method #1: Creating T-SQL function with COALESCE

CREATE FUNCTION [dbo].[GetModules]
    @parent nvarchar(40)
RETURNS varchar(max)
    declare @output varchar(max)
    select @output = COALESCE(@output + ' ', '') + module_type
    from vwPagesModules
    where parent = @parent

    return @output

and test SELECT:

SELECT *, dbo.GetModules(id_hash) FROM vwPages AS p

                                10 runs average
Client processing time          98.2000
Total execution time            107.3000
Wait time on server replies 9.1000

Method #2: Creating SELECT within SELECT

    modules = (
        select module_type + ' ' from vwPagesModules pm
        where pm.parent = p.id_hash
        for xml path('')
from vwPages p
                                10 runs average
Client processing time          31.9000
Total execution time            46.2000
Wait time on server replies 14.3000

Method #3: Creating SELECT with CROSS APPLY

SELECT *, LEFT(module_type , LEN(module_type )) AS module_type
FROM vwPages AS p
    SELECT module_type + ' '
    FROM vwPagesModules m
    WHERE m.parent = p.id_hash
    FOR XML PATH('')
) pre_trimmed (module_type)

                                10 runs average
Client processing time          32.3000
Total execution time            49.0000
Wait time on server replies 16.7000

These tests are by no means scientific tests, I would need more data and probably optimize tables a bit but I hope that it can quickly show you what methods can be used for this functionality and how it behaves. Personally I like method #1 because you dont have to write complicated Selects but it's the slowest of pack and I believe having 100 000 records the difference would be even bigger. Method #2 and #3 are pretty much same speed.

I am planning to repeat these tests on much larger database and also implement some WHERE filtering to see how these methods cope.

Finally, all tests have been done directly on MS SQL server machine locally, using MS SQL Server Management Studio.

Please leave comments with your opinions, ideas or just to say thanks.


Anonymous said...

Hi there. There's another way to do the concat: you can use a variable and then 'select @var += a.col + ',' from table as a'. After that you just output the variable. This approach is suitable for a SQL scalar function.

Stephanie Burch said...

Grateful for you writing this blog