We all know the code (all over the internet) that is used to split a csv string, based on the delimiter. I good example is:

ALTER FUNCTION [dbo].[ToDelimitedTable] (@Value NVARCHAR(MAX), @Delimiter NVARCHAR(3) = ',')
RETURNS @t TABLE (id INT IDENTITY(1, 1), val NVARCHAR(MAX))
AS
    BEGIN
        DECLARE @xml XML;
        SET @xml = N'<root><r>' + REPLACE(@Value, @Delimiter, '</r><r>') + '</r></root>';
        INSERT INTO @t (val)
        SELECT
            RTRIM(LTRIM(r.value('.', 'NVARCHAR(max)'))) AS item
        FROM @xml.nodes('//root/r') AS records(r);
        RETURN;
    END;

If you run this SQL:

SELECT * FROM [dbo].[ToDelimitedTable]('aaaa,bbbb,cccc,dddd', ',');

then your output is:

id val
1 aaaa
2 bbbb
3 cccc
4 dddd

Extracting Tokens

The next challenge is extracting all tokens within a text string. What I mean with tokens is ([this]):
“Dear [Firstname]. How are you today? We would like to confirm [Product] is working as expected. Regards [Sender]

Now I know the following code is slow CHARINDEX functions, but I had to get a solution quickly and the code where I use it is within an offline process, so it’s not that I’m trying to get millisecond improvements.

ALTER FUNCTION [dbo].[ExtractTokens]
(
@Source NVARCHAR(MAX)
,@StartCharacter NVARCHAR(1)
,@EndCharacter NVARCHAR(1)
)
RETURNS @t TABLE (id INT IDENTITY(1, 1), val NVARCHAR(MAX))
AS
    BEGIN
        DECLARE @RemainingLength INT = LEN(@Source);
        DECLARE @StartPosition INT;
        DECLARE @EndPosition INT;
        DECLARE @RemainingText NVARCHAR(MAX);
        SELECT @RemainingText = @Source;
        WHILE @RemainingLength > 0
            BEGIN
                SET @StartPosition = CHARINDEX(@StartCharacter, @RemainingText, 1);
                IF (@StartPosition = 0) SET @RemainingLength = 0;
                ELSE
                    BEGIN
                        SET @RemainingText = SUBSTRING(@RemainingText, @StartPosition, LEN(@RemainingText));
                        SET @EndPosition = CHARINDEX(@EndCharacter, @RemainingText, 1);
                        IF @EndPosition = 0 SET @RemainingLength = 0;
                        ELSE
                            BEGIN
                                INSERT INTO @t (val) SELECT SUBSTRING(@RemainingText, 1, @EndPosition);
                                SET @RemainingText = SUBSTRING(@RemainingText, @EndPosition, LEN(@RemainingText));
                            END;
                    END;
            END;
        RETURN;
    END;
GO

SELECT val FROM [dbo].[ExtractTokens] ('This is some [sample] [text] wich include [tokens] wrapped in square brackets','[',']')

This is the results:

val
[sample]
[text]
[tokens]

I use this in a reporting tool, where the user can create there own reports from pre-defined views. Only showing specific columns, their conditions and sort. My Views are version controller (schema change trigger), so when db data guy change the view, the users’ version of this view my become broken, because of invalid columns. My DB version trigger will flag those user copies to be “dirty” (just due to version differences). I then come afterwards with a integrity checker, looking at the uses’s chosen columns, order by and where parts, and then see if some columns are invalid. My user query builder is enforcing square brackets around [Column Names], so without having to worry about details of all the in-between SQL, I can extract the columns, get it in a table, and then compare it with the column names I extract from sys.Columns and sys.Views.

Author

Write A Comment