Thursday, December 6, 2007
SQL Server Support with VMWare
As of November 2007, Microsoft says they do not support third party software virtualization. They require that you reproduce the issue in a non-virtualized environment before they will help you. Here's the link confirming that:
http://support.microsoft.com/kb/897615
Sunday, November 25, 2007
Tracing Events Appropriately
Completed events have all necessary performance information. The starting events do not have important performance statistics because they haven't been collected yet. This will reduce the load of tracing and give you less data to filter out later.
Sunday, November 11, 2007
Performance Tuning and the Heisenberg Uncertainty Principle
I'm reluctant to immediately resort to the SQL Server tracing utilities because of the performance load that they cause. Sometimes, I explain to the Heisenberg Uncertainty Principle:
http://en.wikipedia.org/wiki/Uncertainty_principle
and similarly the Observer Effect:
http://en.wikipedia.org/wiki/Observer_effect
Developed by Werner Heisenberg in 1927, this principal states that when you measure something, there is uncertainty caused by the measuring process. For instance, if we write our trace logs to a file and we're looking for disk contention, the tracing itself might be contributing to it. The uncertainty might also be caused by the measuring logic. We might be looking at excessive disk contention and determine that we need faster disks. But what if memory paging is causing the pages to be written to disk? In that case, our measuring logic was flawed, and we would need to add more RAM to resolve the performance bottlenecks.
So before I start SQL tracing, I like to use other tools first. Task Manager, System Monitor, Activity Monitor, and DMVs will tell us an awful lot about a server's performance before tracing is even needed. This allows me to observe the server's performance with an open mind, before using invasive tracing utilities.
Sunday, October 21, 2007
Max Column Length
DECLARE ColumnList CURSOR
FOR
SELECT c.[name] AS ColumnName
FROM sys.Columns c
JOIN sys.Tables t
ON c.Object_ID = t.Object_ID
WHERE t.Name = 'reyAASerWIP'
DECLARE @ColumnName varchar(30)
OPEN ColumnList
FETCH NEXT FROM ColumnList INTO @ColumnName
WHILE @@FETCH_STATUS = 0
BEGIN
PRINT @ColumnName
EXECUTE('SELECT MAX(LEN(RTRIM(' + @ColumnName+ '))) AS ' + @ColumnName + ' FROM reyAASerWIP ')
FETCH NEXT FROM ColumnList INTO @ColumnName
END
CLOSE ColumnList
DEALLOCATE ColumnList
Sunday, October 14, 2007
Importing CSVs with SSIS
The CSV import process is much more difficult with SSIS than it was for DTS. Scanning the MS newsgroups it looks like many of the problems will be corrected in SQL Server 2008. That's a long ways off, so let's look at some of the issues and how we can solve them.
Our CSV looks like this:
1- Name, Address, PhoneNumber, InvoiceAmount
2- "Fred Flintstone", "365 Stone Ave, BedRock, StoneAge", "928-555-1212", "25.00"
3- "Barney Rubble, "152 Granite Street, BedRock, StoneAge", "928.555.6326"
4- "Mister Slate", "52 Diamond Road", BedRock, StoneAge", "928.555.5332", "14.00"
The first problem is that there are commas enclosed by quotation marks in the Address field. SSIS will put BedRock as the PhoneNumber and StoneAge as the InvoiceAmount. This is very easy to solve. In the properties window of the Flat File Connection Manager, you will see a textbox labeled "Text qualifier." Just put a quotation mark in that field and it will see that the address column contains commas.
The next issue is that record #3 does not have the appropriate number of columns. It is one column short. DTS would handle this by inserting NULLs in the missing fields and then moving on to the next record. SSIS chokes on this. It will attempt to put "Mister Slate" in the InvoiceAmount field of record #3.
To solve this issue, I wrote the following script. I had to write it in VB.NET, since SSIS does not allow C# quite yet.
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
Imports System.IO
Imports System.Text
Public Class ScriptMain
Public Sub Main()
Dim myLineCount As Int32 = 0
Dim myFilePath As String = Dts.Connections("FlintstoneCSV").ConnectionString
Dim myNewFilePath As String = Dts.Connections("NewFlintstoneCSV").ConnectionString
Dim myNewFile As FileInfo = New FileInfo(myNewFilePath)
Dim myNewFileStream As FileStream = myNewFile.Create()
Dim myFileStream As FileStream = New FileStream(myFilePath, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.ReadWrite)
Dim myStreamReader As New StreamReader(myFileStream)
Dim myStreamWriter As StreamWriter = New StreamWriter(myNewFileStream, Encoding.ASCII)
Dim myLine As String
Try
myLine = myStreamReader.ReadLine
While Not myStreamReader.EndOfStream
Dim myOffSet As Int32 = 1
Dim myBeginIndex As Int32
Dim myEndIndex As Int32
Dim myCount As Int32 = 0
Dim myColumnCount As Int32 = 41
Dim myCommaCount As Int32 = 0
Dim myChar As Char
'myLineCount += 1
'Dts.Events.FireInformation(0, "Cleaning CSV", "Processing Line " + myLineCount.ToString(), "", 0, True)
Dim myLineChars As Char() = myLine.ToCharArray()
While myCount <= myLineChars.Length - 1 If myLineChars(myCount) = """" Then myCount = InStr(myCount + 2, myLine, """") Else If myLineChars(myCount) = Convert.ToChar(",") Then myCommaCount += 1 End If myCount += 1 End If End While Dim myCommas As String = "" If myColumnCount - myCommaCount > 0 Then
myCommas = New String(Convert.ToChar(","), myColumnCount - myCommaCount)
End If
myStreamWriter.WriteLine(myLine.Replace(vbCrLf, "") + myCommas)
myLine = myStreamReader.ReadLine
End While
Dts.TaskResult = Dts.Results.Success
Catch ex As Exception
Dts.Events.FireError(0, "Cleaning CSV", "Encountered Error " + ex.Message _
+ " While Processing Line " + myLine, "", 0)
Dts.TaskResult = Dts.Results.Failure
Finally
myStreamReader.Close()
myStreamReader.Dispose()
myStreamWriter.Flush()
myStreamWriter.Close()
myFileStream.Close()
End Try
End Sub
End Class
Originally, I put a call to DTS.Events.FireInformational to tell us how many records have completed processing. It turns out that calls to DTS.Events are very, very slow. I removed those lines and processing was much, much faster.
I recently imported a CSV file with very dirty data. It included fields that will stretch across multiple rows and fields with quotes in the data. After struggling with SSIS, I finally decided to write my own CSV importer. As I was beginning my project and doing CSV research, I stumbled upon this project, called "A Fast CSV Reader."
http://www.codeproject.com/cs/database/CsvReader.asp
I used this code, along with a SQLBulkCopy object from DotNet 2.0. It imported 9 million large rows of dirty data in just a few minutes. I was amazed. What cleaning it didn't do, I will do in a SPROC.
Here is the extent of the code I wrote myself:
using (LumenWorks.Framework.IO.Csv.CsvReader
myCSVReader = new LumenWorks.Framework.IO.Csv.CsvReader(
new StreamReader("c:\\Data\\Flintstone.csv", Encoding.ASCII), true,
Convert.ToChar(",")
, Convert.ToChar("\""), Convert.ToChar("\\"), Convert.ToChar("#"), true))
{
myCSVReader.DefaultParseErrorAction = ParseErrorAction.ThrowException;
myCSVReader.MissingFieldAction = MissingFieldAction.ReplaceByEmpty;
myCSVReader.SkipEmptyLines = true;
myCSVReader.SupportsMultiline = true;
using (SqlBulkCopy myBulkCopy = new SqlBulkCopy(SqlHelper.ConnectionString))
{
myBulkCopy.DestinationTableName = "Destination";
myBulkCopy.BulkCopyTimeout = 30;
myBulkCopy.BatchSize = 10000;
myBulkCopy.WriteToServer(myCSVReader);
}
}
MessageBox.Show("Import Success");
Saturday, September 22, 2007
Format dates and times in T-SQL
This function allows you to do date/time formatting just like you do it in C# or VB.NET.
I find it enormously useful. My code is littered with calls to it.
USE [ncu]
GO
/****** Object: UserDefinedFunction [dbo].[FormatDateTime] Script Date: 01/27/2011 17:31:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER FUNCTION [dbo].[FormatDateTime]
(@f_datetime DATETIME
, @f_format VARCHAR (60))
RETURNS VARCHAR (60)
AS
/*
||======================================================================
|| Purpose: Provide a more intuitive method of converting DATETIME
|| values to character. Only the parts specified within
|| the input format request are sent back, and any other
|| characters are returned in their respective places.
||
|| Here are the special-request symbols:
||
|| [YY] or [YYYY] -> year
|| [MM] or [M] -> month (numeric; 'M' means one digit
|| where possible)
|| [DD] or [D] -> day
|| [DWEEK] -> day of week
|| [DWK] -> 3-characters day of week abbreviation
|| [MON] -> 3-character month abbreviation
|| [MONTH] -> entire month name
|| [HH] or [H] -> hour
|| [24HH] or [24H] -> 24-hour clock
|| [mm] or [m] -> minute
|| [NN] or [N] -> minute
|| [SS] or [S] -> second
|| [AA] or [PP] -> 'AM' or 'PM'
|| [AM] or [PM] -> 'AM' or 'PM'
||
|| Every other character is considered part of the output.
||======================================================================
*/
BEGIN
/*
||====================================================================
|| Data declarations...
||====================================================================
*/
DECLARE @datetime VARCHAR (60)
, @symbol VARCHAR (7)
, @translation VARCHAR (9)
, @hhmmss CHAR (8)
, @year INT
, @month INT
, @day INT
, @hour INT
, @minute INT
, @second INT
, @pos INT
, @pos2 INT
, @len INT
/*
||====================================================================
|| Validations...
||====================================================================
*/
/*
||====================================================================
|| A NULL input earns a NULL output...
||====================================================================
*/
IF @f_datetime IS NULL
OR @f_format IS NULL
--THEN
GOTO FNC_EXIT
--END IF
/*
||====================================================================
|| End of validations...
||====================================================================
*/
/*
||====================================================================
|| Main logic...
||====================================================================
*/
/*
||====================================================================
|| Acquire the value of each of the date-time components in integer
|| form...
||====================================================================
*/
SELECT @hhmmss = CONVERT (VARCHAR, @f_datetime, 8)
, @datetime = @f_format
SELECT @year = YEAR (@f_datetime)
, @month = MONTH (@f_datetime)
, @day = DAY (@f_datetime)
, @hour = CONVERT (INT, LEFT (@hhmmss, 2))
, @minute = CONVERT (INT, SUBSTRING (@hhmmss, 4, 2))
, @second = CONVERT (INT, RIGHT (@hhmmss, 2))
/*
||====================================================================
|| For users' convenience, we allow small-case m to denote minutes
|| large-case M to denote month; or they can use the unintuitive
|| 'N' to denote minutes.
||
|| However, in order to support case-seensitivity even in a case-
|| insensitive environment, the code below locates each special symbol
|| within the specified format and replaces small-case '[m]' and
|| '[mm]' with '[N]' and '[NN]', or else all other small-case letters
|| with their large-case equivalents...
||====================================================================
*/
SELECT @pos = CHARINDEX ('[', @datetime)
WHILE @pos > 0
BEGIN
SELECT @pos2 = CHARINDEX (']', @datetime, @pos)
IF @pos2 = 0
--THEN
SELECT @pos = 0
ELSE
BEGIN
SELECT @len = @pos2 - @pos + 1
, @symbol = SUBSTRING (@datetime, @pos, @len)
IF CONVERT (VARBINARY (120), @symbol)
= CONVERT (VARBINARY (120), '[m]')
--THEN
SELECT @symbol = '[N]'
ELSE IF CONVERT (VARBINARY (120), @symbol)
= CONVERT (VARBINARY (120), '[mm]')
--THEN
SELECT @symbol = '[NN]'
ELSE
SELECT @symbol = UPPER (@symbol)
--END IF
SELECT @translation =
CASE @symbol
WHEN '[YYYY]' -- year
THEN CONVERT (CHAR (4), @year)
WHEN '[YY]' -- year
THEN RIGHT (CONVERT (CHAR (4), @year), 2)
WHEN '[MONTH]'
THEN UPPER (DATENAME (MM, @f_datetime))
WHEN '[MON]'
THEN LEFT (UPPER (DATENAME (MM, @f_datetime)), 3)
WHEN '[M]'
THEN CONVERT (VARCHAR, @month)
WHEN '[MM]'
THEN RIGHT ('0' + CONVERT (VARCHAR, @month), 2)
WHEN '[D]'
THEN CONVERT (VARCHAR, @day)
WHEN '[DD]'
THEN RIGHT ('0' + CONVERT (VARCHAR, @day), 2)
WHEN '[DWEEK]'
THEN UPPER (DATENAME (DW, @f_datetime))
WHEN '[DWK]'
THEN UPPER (LEFT (DATENAME (DW, @f_datetime), 3))
WHEN '[H]'
THEN
CASE
WHEN @hour BETWEEN 1 AND 12
THEN CONVERT (VARCHAR, @hour)
ELSE CONVERT (VARCHAR, ABS (@hour - 12))
END
WHEN '[HH]'
THEN
CASE
WHEN @hour BETWEEN 1 AND 12
THEN RIGHT ('0' + CONVERT (VARCHAR, @hour), 2)
ELSE RIGHT ('0' + CONVERT (VARCHAR, ABS (@hour - 12)), 2)
END
WHEN '[24H]'
THEN CONVERT (VARCHAR, @hour)
WHEN '[24HH]'
THEN RIGHT ('0' + CONVERT (VARCHAR, @hour), 2)
WHEN '[N]'
THEN CONVERT (VARCHAR, @minute)
WHEN '[NN]'
THEN RIGHT ('0' + CONVERT (VARCHAR, @minute), 2)
WHEN '[S]'
THEN CONVERT (VARCHAR, @second)
WHEN '[SS]'
THEN RIGHT ('0' + CONVERT (VARCHAR, @second), 2)
ELSE
CASE
WHEN @symbol IN ('[AM]', '[PM]', '[AA]', '[PP]')
THEN
CASE
WHEN @hour < 12
THEN 'AM'
ELSE 'PM'
END
ELSE @symbol
END
END
SELECT @datetime = STUFF (@datetime, @pos, @len, @translation)
, @pos = CHARINDEX ('[', @datetime, @pos + LEN (@translation))
END
--END IF
END
FNC_EXIT:
RETURN @datetime
END
Thursday, September 20, 2007
Do Your Foreign Key Columns Have Indices?
I wrote this script to tell you if you have indices on all of your foreign key columns:
SELECT o.[Name] AS TableName, c.[Name] AS ColumnName
FROM sysForeignKeys f
JOIN sysObjects o
ON f.fkeyID = o.[id]
JOIN sysColumns c
ON c.[ID] = f.fkeyID AND c.[colid] = f.fkey
LEFT OUTER JOIN sysIndexKeys i
ON f.[fkeyid] = i.[id] AND f.[fkey] = i.[colid] AND f.[keyno] = i.[keyno]
WHERE i.[id] IS NULL
Even if referential integrity is being enforced at the application layer (GASP!), you should still have indices created on those virtual foreign key columns. Unfortunately, that will have to be audited manually by the application developer, which pretty much means it will never happen.
Sunday, September 16, 2007
Hard Disk Configuration
Hard drives are often the source of performance bottlenecks. Any dime spent on a high-performing hard drives will not go to waste.
Disks are also the component that will most likely fail. Matt, the Microsoft blogger, claims that disks have a failure rate of 2% per year per disk. If that's the case, then an investment in RAID arrays would be very wise.
As far as specific drive components go, it's difficult to make a recommendation without knowing the purpose of the SQL Server.
Here's Matt's post:
http://blogs.msdn.com/matt_hollingsworth/archive/2007/04/02/mtbf-is-useless-how-to-decide-when-to-use-raid-for-database-high-availability.aspx
Friday, September 7, 2007
No More DBCC SHOWCONTIG
Pay attention to the fragmentation levels in the output. This is reporting more accurately than in SQL Server 2000.
Here's a Books Online link explaining the new function:
http://msdn2.microsoft.com/en-us/library/ms188917.aspx
And a link on index fragmentation:
http://msdn2.microsoft.com/en-us/library/ms189858.aspx
Wednesday, September 5, 2007
Dedicated Administrator Connection
That all changed with the DAC. Here's the Microsoft article on how to use it:
http://msdn2.microsoft.com/en-us/library/ms189595.aspx
Kalen Delaney (Inside SQL Server 2005) has this great query for determining the SPID of the DAC:
SELECT t2.session_id
FROM sys.tcp_endpoints as t1
JOIN sys.dm_exec_sessions as t2
ON t1.endpoint_id = t2.endpoint_id
WHERE t1.name='Dedicated Admin Connection'; GO
Tuesday, September 4, 2007
Question of the Week
Ready for the answer?
There is no real maximum, only a supported maximum. On the Enterprise Edition, Microsoft supports 50 instances in a non-clustered environment. Microsoft supports 25 instances on a cluster. Other instances of SQL Server have a supported maximum of 16.
Here a link:
http://www.microsoft.com/technet/technetmag/issues/2006/03/SQLQA/
Monday, September 3, 2007
32-bit vs 64-bit SQL
It's important to note that the 32-bit version of Windows can only address 4GB of memory. By default, it reserves 2GB for itself, however this can be reduced to 1GB by using an operating system switch. Memory beyond the 4GB addressed by the OS can be accessed by SQL Server indirectly using the PAE switch.
http://msdn2.microsoft.com/en-us/library/ms190673.aspx
The 64-bit version of SQL Server can address an incredible amount of memory directly, with no need of these work-arounds.
Saturday, September 1, 2007
Free SQL Server 2005
http://www.microsoft.com/technet/prodtechnol/eval/sqlsrv05/default.mspx