Bluedog67

Random Thoughts About SQL Server and My Dog Blue

About Bluedog67

My name is Stephen Horne. I am a well-trained owner of an Australian Cattle Dog named Blue. I also develop software mainly using SQL Server, ASP.NET, and C#. I love data, databases, T-SQL, and turning raw data into actionable information. My plan for this blog is to write about SQL Server and sometimes my dog Blue. Please contact me at stephen at bluedog67 dot com. Follow me on Twitter at bluedog67.

SQL Script To Find Variable Character Values with Leading/Trailing Spaces

I had always assumed that trailing spaces were always trimmed from variable characters columns in SQL Server (both varchar and nvarchar). This may have been the default behavior in previous versions of SQL Server. However, in SQL Server 2008, variable character columns do not automatically have trailing spaces trimmed. The behavior between nvarchar and varchar columns is different. For nvarchar columns, trailing spaces are never trimmed. For varchar columns it depends upon the value of ANSI_PADDING when the varchar column was created. If ANSI_PADDING was ON when the varchar column was created, trailing spaces are not trimmed (same behavior as nvarchar). If ANSI_PADDING was off when the varchar column was created, trailing spaces are trimmed (old-style behavior).

Trailing spaces in variable character columns can cause problems in areas such as equivalence matching, character length versus data length, and storage. My strong preference is that there be no trailing (or leading) spaces in variable character columns. This is a 3-step problem. First step is finding the variable character column values with trailing (and leading) spaces. That is what this blog post is about. The next step is fixing the trailing space problem. This will be my next post. And third, is stopping trailing and leading spaces in the future. This will be another future blog post.

For more information about this topic you can look in SQL BOL and at an excellent post by Jack Corbett (registration required).

The following script can be used to list out all the variable character columns (varchar and nvarchar) in a SQL Server database and the count of non-null values in each variable character column that contains leading and/or trailing spaces. The list includes the SchemaName, TableName, ColumnName, ColumnDataType, TrailingSpacesCount, and TableCount. Please note that this is fairly quick and dirty and could be enhanced and tweaked a great deal. I am working on a quick followup script that could be used to remove leading/trailing spaces. Also, I may add a script to convert empty strings in variable character columns to NULLs.

Please note that this script does NOT modify any production data.

If you wish to restrict to a single schema, table, or column you have some flexibility by setting the values of the @SchemaName, @TableName, and/or @ColumnName variables. To show all use the % (percent) value.

USE AdventureWorks;

SET NOCOUNT ON;

CREATE TABLE #RESULTS
(SchemaName nvarchar(4000),
TableName nvarchar(4000),
ColumnName nvarchar(4000),
ColumnDataType nvarchar(4000),
TrailingSpacesCount int,
TableCount int);

DECLARE @SQL2Execute nvarchar(4000);

-- % means all.
DECLARE @SchemaName nvarchar(4000) = '%';
DECLARE @TableName nvarchar(4000) = '%';
DECLARE @ColumnName nvarchar(4000) = '%';

DECLARE MyCursor CURSOR
FOR SELECT SQLSTR = 'INSERT INTO #RESULTS ' +
'SELECT ''' + C.TABLE_SCHEMA + ''' AS SchemaName, ' +
'''' + C.TABLE_NAME + ''' AS TableName, ' +
'''' + C.COLUMN_NAME + ''' AS ColumnName, ' +
'''' + C.DATA_TYPE + '''' + ' AS ColumnDataType, ' +
'COUNT(*) TrailingSpacesCount, ' +
'(SELECT COUNT(*) FROM [' + C.TABLE_SCHEMA + '].[' + C.TABLE_NAME + '] (NOLOCK)) TableCount ' +
'FROM [' + C.TABLE_SCHEMA + '].[' + C.TABLE_NAME + '] (NOLOCK) ' +
'WHERE [' + C.COLUMN_NAME + '] + ''X'' != ' +
'LTRIM(RTRIM([' + C.COLUMN_NAME + '])) + ''X'' ' +
'AND [' + C.COLUMN_NAME + '] IS NOT NULL;'
FROM INFORMATION_SCHEMA.TABLES T JOIN INFORMATION_SCHEMA.COLUMNS C
ON T.TABLE_CATALOG = C.TABLE_CATALOG
AND T.TABLE_SCHEMA = C.TABLE_SCHEMA
AND T.TABLE_NAME = C.TABLE_NAME
WHERE T.TABLE_TYPE = 'BASE TABLE'
AND C.TABLE_CATALOG = DB_NAME()
AND C.DATA_TYPE IN ('varchar', 'nvarchar')
AND ((@SchemaName = '%') OR (@SchemaName <> '%' AND C.TABLE_SCHEMA = @SchemaName))
AND ((@TableName = '%') OR (@TableName <> '%' AND C.TABLE_Name = @TableName))
AND ((@ColumnName = '%') OR (@ColumnName <> '%' AND C.Column_Name = @ColumnName))
ORDER BY C.TABLE_SCHEMA,
C.TABLE_NAME,
C.ORDINAL_POSITION;

OPEN MyCursor;

FETCH NEXT FROM MyCursor
INTO @SQL2Execute;

WHILE @@FETCH_STATUS = 0
BEGIN

EXECUTE(@SQL2Execute);
--PRINT @SQL2Execute;

FETCH NEXT FROM MyCursor
INTO @SQL2Execute;

END;

CLOSE MyCursor;
DEALLOCATE MyCursor;

SELECT *
FROM #RESULTS R
ORDER BY R.SchemaName,
R.TableName,
R.ColumnName;

DROP TABLE #RESULTS;

Following is a sample subset using the AdventureWorks2008 sample database. I dummied some of the data and added trailing spaces to some of the variable character column values.

SchemaNameTableNameColumnNameColumnDataTypeTrailingSpacesCountTableCount
SalesSpecialOfferCategorynvarchar516
SalesSpecialOfferDescriptionnvarchar216
SalesSpecialOfferTypenvarchar1116
SalesStoreNamenvarchar4701

Please note that if you switch the EXECUTE line over to the PRINT line in order to see the listing of INSERT SELECT queries you may need to switch over to Results to Text and increase the number of characters displayed by going in SSMS to Tools => Options => Query Results => SQL Server => Results to Text and bumping up "Maximum number of characters displayed in each column."

There may be better way to implement this. Will think about the problem further plus listen and integrate other ideas. Please share any feedback or ideas that you have. Thanks!

Permalink | Comments (0) | Post RSSRSS comment feed

Comments