In an earlier post,
SQL Script To Find Variable Character Values with Leading/Trailing Spaces, I talked about finding values stored in variable character columns in SQL Server 2008 (varchar and nvarchar) that contained leading and/or trailing spaces. This post is a followup that shows a possible way to fix these values and remove leading/trailing spaces.
Fair Warning: Please note that these scripts produce SQL that modifies data. Please be careful using these scripts and resulting SQL in a production enviroment. Please test, retest, and reretest before running in any production environment. Depending upon numerous factors including database server performance, data storage performance, number of rows, use of partitioning, transaction rate, etc., it may not be appropriate to eliminate leading/trailing spaces through this method. Sometimes with VLDB and high transaction databases, large set-based operations can cause significant performance and locking issues and should be avoided. In these cases, stepping out and performing per-row operations, while overall taking longer to complete, can result in much lower performance impact and much lower chance of locking.
Let's get to it.
The "Fix Script" is largely the same as the previous "Find Script" found
here. The two differences are the removal of the commented-out 'PRINT' line and the change of the final SELECT statement. The new SELECT statement at the end of the script rather than returning the results of the leading/trailing space search, returns zero to many UPDATE statements on a per varchar/nvarchar Schema.Table.Column basis that if executed would remove leading/trailing spaces. This is an example of using a SELECT statment to generate executable SQL.
There are some selectability options (see the @SchemaName, @TableName, and @ColumnName variables) built into the script where you can either show all schemas or just one particular schema, plus the same with tables and columns. To show all, use the '%' value for the @SchemaName, @TableName, and @ColumnName variables.
Please note that this script does NOT modify data in user tables - a temporary table is used for temporary storage of information. The UPDATE SQL statements generated by this script, if executed, WILL modify data and need to be tested thoroughly.
USE AdventureWorks;
SET NOCOUNT ON;
CREATE TABLE #RESULTS
(SchemaName nvarchar(4000),
TableName nvarchar(4000),
ColumnName nvarchar(4000),
ColumnDataType nvarchar(4000),
TrailingSpacesCount int,
TableCount int);
DECLARE @SQL2Execute nvarchar(4000);
-- % means all.
DECLARE @SchemaName nvarchar(4000) = '%';
DECLARE @TableName nvarchar(4000) = '%';
DECLARE @ColumnName nvarchar(4000) = '%';
DECLARE MyCursor CURSOR
FOR SELECT SQLSTR = 'INSERT INTO #RESULTS ' +
'SELECT ''' + C.TABLE_SCHEMA + ''' AS SchemaName, ' +
'''' + C.TABLE_NAME + ''' AS TableName, ' +
'''' + C.COLUMN_NAME + ''' AS ColumnName, ' +
'''' + C.DATA_TYPE + '''' + ' AS ColumnDataType, ' +
'COUNT(*) TrailingSpacesCount, ' +
'(SELECT COUNT(*) FROM [' + C.TABLE_SCHEMA + '].[' + C.TABLE_NAME + '] (NOLOCK)) TableCount ' +
'FROM [' + C.TABLE_SCHEMA + '].[' + C.TABLE_NAME + '] (NOLOCK) ' +
'WHERE [' + C.COLUMN_NAME + '] + ''X'' != ' +
'LTRIM(RTRIM([' + C.COLUMN_NAME + '])) + ''X'' ' +
'AND [' + C.COLUMN_NAME + '] IS NOT NULL;'
FROM INFORMATION_SCHEMA.TABLES T JOIN INFORMATION_SCHEMA.COLUMNS C
ON T.TABLE_CATALOG = C.TABLE_CATALOG
AND T.TABLE_SCHEMA = C.TABLE_SCHEMA
AND T.TABLE_NAME = C.TABLE_NAME
WHERE T.TABLE_TYPE = 'BASE TABLE'
AND C.TABLE_CATALOG = DB_NAME()
AND C.DATA_TYPE IN ('varchar', 'nvarchar')
AND ((@SchemaName = '%') OR (@SchemaName <> '%' AND C.TABLE_SCHEMA = @SchemaName))
AND ((@TableName = '%') OR (@TableName <> '%' AND C.TABLE_Name = @TableName))
AND ((@ColumnName = '%') OR (@ColumnName <> '%' AND C.Column_Name = @ColumnName))
ORDER BY C.TABLE_SCHEMA,
C.TABLE_NAME,
C.ORDINAL_POSITION;
OPEN MyCursor;
FETCH NEXT FROM MyCursor
INTO @SQL2Execute;
WHILE @@FETCH_STATUS = 0
BEGIN
EXECUTE(@SQL2Execute);
FETCH NEXT FROM MyCursor
INTO @SQL2Execute;
END;
CLOSE MyCursor;
DEALLOCATE MyCursor;
-- The following statement will create an UPDATE statement
-- for any varchar or nvarchar columns that contain 1 or more
-- values with leading/trailing spaces. If a varchar or nvarchar
-- column's values do not contain any leading/trailing spaces,
-- then no UPDATE statement is generated.
SELECT 'UPDATE [' + R.SchemaName + '].[' + R.TableName + '] ' +
'SET [' + R.ColumnName + '] = LTRIM(RTRIM([' + R.ColumnName + '])) ' +
'WHERE [' + R.ColumnName + '] + ''X'' != ' +
'LTRIM(RTRIM([' + R.ColumnName + '])) + ''X'' ' +
'AND [' + R.ColumnName + '] IS NOT NULL;'
FROM #RESULTS R
WHERE R.TrailingSpacesCount > 0
ORDER BY R.SchemaName,
R.TableName,
R.ColumnName;
DROP TABLE #RESULTS;
Similar to the
previous post, you may need to change the results of this script to Text and expand the maximum width of text displayed in the results pane.
Following is a sample of the output produced by the above script.
UPDATE [Sales].[SpecialOffer] SET [Category] = LTRIM(RTRIM([Category])) WHERE [Category] + 'X' != LTRIM(RTRIM([Category])) + 'X' AND [Category] IS NOT NULL;
UPDATE [Sales].[SpecialOffer] SET [Description] = LTRIM(RTRIM([Description])) WHERE [Description] + 'X' != LTRIM(RTRIM([Description])) + 'X' AND [Description] IS NOT NULL;
UPDATE [Sales].[SpecialOffer] SET [Type] = LTRIM(RTRIM([Type])) WHERE [Type] + 'X' != LTRIM(RTRIM([Type])) + 'X' AND [Type] IS NOT NULL;
UPDATE [Sales].[Store] SET [Name] = LTRIM(RTRIM([Name])) WHERE [Name] + 'X' != LTRIM(RTRIM([Name])) + 'X' AND [Name] IS NOT NULL;
You can copy and paste this code from the results into the query pane. At this point I would again strongly suggest testing on a non-production system to confirm proper operation and that you are not affecting performance in any significant way. I would also strongly suggest executing the UPDATE scripts one-at-a-time rather than as a large batch - at least initially. You may also wish to wrap each UPDATE in a explicit transaction that you can ROLLBACK or COMMIT as appropriate (the downside to this would be possibly creating locking issues). I would also suggest setting NOCOUNT back to OFF so that you are able to see the number of rows affected.
After running an UPDATE you can go back to the original find script
here, and you should see the count for that particular varchar or nvarchar Schema.Table.Column have their respective TrailingSpacesCount value drop to zero.
| SchemaName | TableName | ColumnName | ColumnDataType | TrailingSpacesCount | TableCount |
| Sales | SpecialOffer | Category | nvarchar | 0 | 16 |
| Sales | SpecialOffer | Description | nvarchar | 0 | 16 |
| Sales | SpecialOffer | Type | nvarchar | 0 | 16 |
| Sales | Store | Name | nvarchar | 0 | 701 |
As noted before these generated UPDATE statements are set-based operations and can be problematic with larger tables and with high transaction tables. If this is the case switching to a per-row update methodology may be more appropriate. The SELECT statement that generates the UPDATE statements would need to be modified to convert the WHERE clause of the UPDATE statement to use the Primary Key column(s) for the given Schema.Table which would allow for per-row operations rather than the current set-based WHERE clause. I will write this up in a future blog post.
In closing, I believe that I need to do some serious refactoring on this above code - it is functional but needs some work and enhancement.
I plan to shortly add a posting about possible ways to stop leading/trailing spaces in varchar and nvarchar columns in SQL Server 2008 from ever even happening.
If anyone has any feedback, criticism, ideas, or other information please share. Will definitely be appreciated! Thanks!