Ike's Spikes: 2007

Thursday, December 6, 2007

SQL Server Support with VMWare

I have a client who is running SQL Server 2005 Standard Edition on a VMWare ESX operating system. I wondered if Microsoft supported this environment and did some digging. I had some difficulty finding current information, so once I found it, I thought I'd share it with you.

As of November 2007, Microsoft says they do not support third party software virtualization. They require that you reproduce the issue in a non-virtualized environment before they will help you. Here's the link confirming that:

http://support.microsoft.com/kb/897615

Sunday, November 25, 2007

Tracing Events Appropriately

When using SQL Server Profiler, it is generally a good idea to only trace COMPLETED events, instead of starting events.

Completed events have all necessary performance information. The starting events do not have important performance statistics because they haven't been collected yet. This will reduce the load of tracing and give you less data to filter out later.

Sunday, November 11, 2007

Performance Tuning and the Heisenberg Uncertainty Principle

Often when I visit a new client, they say, "We need you to start SQL Server Profiler and tell us why our server is so slow."

I'm reluctant to immediately resort to the SQL Server tracing utilities because of the performance load that they cause. Sometimes, I explain to the Heisenberg Uncertainty Principle:

http://en.wikipedia.org/wiki/Uncertainty_principle

and similarly the Observer Effect:

http://en.wikipedia.org/wiki/Observer_effect

Developed by Werner Heisenberg in 1927, this principal states that when you measure something, there is uncertainty caused by the measuring process. For instance, if we write our trace logs to a file and we're looking for disk contention, the tracing itself might be contributing to it. The uncertainty might also be caused by the measuring logic. We might be looking at excessive disk contention and determine that we need faster disks. But what if memory paging is causing the pages to be written to disk? In that case, our measuring logic was flawed, and we would need to add more RAM to resolve the performance bottlenecks.

So before I start SQL tracing, I like to use other tools first. Task Manager, System Monitor, Activity Monitor, and DMVs will tell us an awful lot about a server's performance before tracing is even needed. This allows me to observe the server's performance with an open mind, before using invasive tracing utilities.

Sunday, October 21, 2007

Max Column Length

This script will return all the columns in a table and the maximum length of all data in them.

DECLARE ColumnList CURSOR
FOR
SELECT c.[name] AS ColumnName
FROM sys.Columns c
JOIN sys.Tables t
ON c.Object_ID = t.Object_ID
WHERE t.Name = 'reyAASerWIP'
DECLARE @ColumnName varchar(30)
OPEN ColumnList
FETCH NEXT FROM ColumnList INTO @ColumnName
WHILE @@FETCH_STATUS = 0
BEGIN
PRINT @ColumnName
EXECUTE('SELECT MAX(LEN(RTRIM(' + @ColumnName+ '))) AS ' + @ColumnName + ' FROM reyAASerWIP ')
FETCH NEXT FROM ColumnList INTO @ColumnName
END
CLOSE ColumnList
DEALLOCATE ColumnList

Sunday, October 14, 2007

Importing CSVs with SSIS

The CSV import process is much more difficult with SSIS than it was for DTS. Scanning the MS newsgroups it looks like many of the problems will be corrected in SQL Server 2008. That's a long ways off, so let's look at some of the issues and how we can solve them.

Our CSV looks like this:

1- Name, Address, PhoneNumber, InvoiceAmount
2- "Fred Flintstone", "365 Stone Ave, BedRock, StoneAge", "928-555-1212", "25.00"
3- "Barney Rubble, "152 Granite Street, BedRock, StoneAge", "928.555.6326"
4- "Mister Slate", "52 Diamond Road", BedRock, StoneAge", "928.555.5332", "14.00"

The first problem is that there are commas enclosed by quotation marks in the Address field. SSIS will put BedRock as the PhoneNumber and StoneAge as the InvoiceAmount. This is very easy to solve. In the properties window of the Flat File Connection Manager, you will see a textbox labeled "Text qualifier." Just put a quotation mark in that field and it will see that the address column contains commas.

The next issue is that record #3 does not have the appropriate number of columns. It is one column short. DTS would handle this by inserting NULLs in the missing fields and then moving on to the next record. SSIS chokes on this. It will attempt to put "Mister Slate" in the InvoiceAmount field of record #3.

To solve this issue, I wrote the following script. I had to write it in VB.NET, since SSIS does not allow C# quite yet.

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
Imports System.IO
Imports System.Text
Public Class ScriptMain
Public Sub Main()
Dim myLineCount As Int32 = 0
Dim myFilePath As String = Dts.Connections("FlintstoneCSV").ConnectionString
Dim myNewFilePath As String = Dts.Connections("NewFlintstoneCSV").ConnectionString
Dim myNewFile As FileInfo = New FileInfo(myNewFilePath)
Dim myNewFileStream As FileStream = myNewFile.Create()
Dim myFileStream As FileStream = New FileStream(myFilePath, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.ReadWrite)
Dim myStreamReader As New StreamReader(myFileStream)
Dim myStreamWriter As StreamWriter = New StreamWriter(myNewFileStream, Encoding.ASCII)
Dim myLine As String
Try
myLine = myStreamReader.ReadLine
While Not myStreamReader.EndOfStream
Dim myOffSet As Int32 = 1
Dim myBeginIndex As Int32
Dim myEndIndex As Int32
Dim myCount As Int32 = 0
Dim myColumnCount As Int32 = 41
Dim myCommaCount As Int32 = 0
Dim myChar As Char
'myLineCount += 1
'Dts.Events.FireInformation(0, "Cleaning CSV", "Processing Line " + myLineCount.ToString(), "", 0, True)
Dim myLineChars As Char() = myLine.ToCharArray()
While myCount <= myLineChars.Length - 1 If myLineChars(myCount) = """" Then myCount = InStr(myCount + 2, myLine, """") Else If myLineChars(myCount) = Convert.ToChar(",") Then myCommaCount += 1 End If myCount += 1 End If End While Dim myCommas As String = "" If myColumnCount - myCommaCount > 0 Then
myCommas = New String(Convert.ToChar(","), myColumnCount - myCommaCount)
End If
myStreamWriter.WriteLine(myLine.Replace(vbCrLf, "") + myCommas)
myLine = myStreamReader.ReadLine
End While

Dts.TaskResult = Dts.Results.Success
Catch ex As Exception
Dts.Events.FireError(0, "Cleaning CSV", "Encountered Error " + ex.Message _
+ " While Processing Line " + myLine, "", 0)
Dts.TaskResult = Dts.Results.Failure
Finally
myStreamReader.Close()
myStreamReader.Dispose()
myStreamWriter.Flush()
myStreamWriter.Close()
myFileStream.Close()
End Try
End Sub
End Class

Originally, I put a call to DTS.Events.FireInformational to tell us how many records have completed processing. It turns out that calls to DTS.Events are very, very slow. I removed those lines and processing was much, much faster.

I recently imported a CSV file with very dirty data. It included fields that will stretch across multiple rows and fields with quotes in the data. After struggling with SSIS, I finally decided to write my own CSV importer. As I was beginning my project and doing CSV research, I stumbled upon this project, called "A Fast CSV Reader."

http://www.codeproject.com/cs/database/CsvReader.asp

I used this code, along with a SQLBulkCopy object from DotNet 2.0. It imported 9 million large rows of dirty data in just a few minutes. I was amazed. What cleaning it didn't do, I will do in a SPROC.

Here is the extent of the code I wrote myself:

using (LumenWorks.Framework.IO.Csv.CsvReader
myCSVReader = new LumenWorks.Framework.IO.Csv.CsvReader(
new StreamReader("c:\\Data\\Flintstone.csv", Encoding.ASCII), true,
Convert.ToChar(",")
, Convert.ToChar("\""), Convert.ToChar("\\"), Convert.ToChar("#"), true))
{
myCSVReader.DefaultParseErrorAction = ParseErrorAction.ThrowException;
myCSVReader.MissingFieldAction = MissingFieldAction.ReplaceByEmpty;
myCSVReader.SkipEmptyLines = true;
myCSVReader.SupportsMultiline = true;
using (SqlBulkCopy myBulkCopy = new SqlBulkCopy(SqlHelper.ConnectionString))
{
myBulkCopy.DestinationTableName = "Destination";
myBulkCopy.BulkCopyTimeout = 30;
myBulkCopy.BatchSize = 10000;
myBulkCopy.WriteToServer(myCSVReader);
}
}
MessageBox.Show("Import Success");

Saturday, September 22, 2007

Format dates and times in T-SQL

I stole this script off of the internet a few years ago. I wish I could give the author credit for it, but I can't remember who he is. If anyone knows, let me know and I'll add him to this post.

This function allows you to do date/time formatting just like you do it in C# or VB.NET.

I find it enormously useful. My code is littered with calls to it.


USE [ncu]
GO
/****** Object:  UserDefinedFunction [dbo].[FormatDateTime]    Script Date: 01/27/2011 17:31:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER FUNCTION [dbo].[FormatDateTime]
                      (@f_datetime DATETIME
                     , @f_format   VARCHAR (60))
RETURNS VARCHAR (60)
AS
/*
||======================================================================
|| Purpose:  Provide a more intuitive method of converting DATETIME
||           values to character.  Only the parts specified within
||           the input format request are sent back, and any other
||           characters are returned in their respective places.
||
||           Here are the special-request symbols:
||
||           [YY] or [YYYY]   -> year
||           [MM] or [M]      -> month (numeric; 'M' means one digit
||                                      where possible)
||           [DD] or [D]      -> day
||           [DWEEK]          -> day of week
||           [DWK]            -> 3-characters day of week abbreviation
||           [MON]            -> 3-character month abbreviation
||           [MONTH]          -> entire month name
||           [HH] or [H]      -> hour
||           [24HH] or [24H]  -> 24-hour clock
||           [mm] or [m]      -> minute
||           [NN] or [N]      -> minute
||           [SS] or [S]      -> second
||           [AA] or [PP]     -> 'AM' or 'PM'
||           [AM] or [PM]     -> 'AM' or 'PM'
||
||           Every other character is considered part of the output.
||======================================================================
*/
BEGIN
  /*
  ||====================================================================
  || Data declarations...
  ||====================================================================
  */
  DECLARE @datetime    VARCHAR (60)
        , @symbol      VARCHAR (7)
        , @translation VARCHAR (9)
        , @hhmmss      CHAR (8)
        , @year        INT
        , @month       INT
        , @day         INT
        , @hour        INT
        , @minute      INT
        , @second      INT
        , @pos         INT
        , @pos2        INT
        , @len         INT

  /*
  ||====================================================================
  || Validations...
  ||====================================================================
  */
  /*
  ||====================================================================
  || A NULL input earns a NULL output...
  ||====================================================================
  */
  IF @f_datetime IS NULL
  OR @f_format   IS NULL
--THEN
    GOTO FNC_EXIT
--END IF

  /*
  ||====================================================================
  || End of validations...
  ||====================================================================
  */
  /*
  ||====================================================================
  || Main logic...
  ||====================================================================
  */
  /*
  ||====================================================================
  || Acquire the value of each of the date-time components in integer
  || form...
  ||====================================================================
  */
  SELECT @hhmmss   = CONVERT (VARCHAR, @f_datetime, 8)
       , @datetime = @f_format
  SELECT @year     = YEAR  (@f_datetime)
       , @month    = MONTH (@f_datetime)
       , @day      = DAY   (@f_datetime)
       , @hour     = CONVERT (INT, LEFT (@hhmmss, 2))
       , @minute   = CONVERT (INT, SUBSTRING (@hhmmss, 4, 2))
       , @second   = CONVERT (INT, RIGHT (@hhmmss, 2))

  /*
  ||====================================================================
  || For users' convenience, we allow small-case m to denote minutes
  || large-case M to denote month; or they can use the unintuitive
  || 'N' to denote minutes.
  ||
  || However, in order to support case-seensitivity even in a case-
  || insensitive environment, the code below locates each special symbol
  || within the specified format and replaces small-case '[m]' and
  || '[mm]' with '[N]' and '[NN]', or else all other small-case letters
  || with their large-case equivalents...
  ||====================================================================
  */
  SELECT @pos = CHARINDEX ('[', @datetime)

  WHILE @pos > 0
  BEGIN
    SELECT @pos2 = CHARINDEX (']', @datetime, @pos)

    IF @pos2 = 0
  --THEN
      SELECT @pos = 0
    ELSE
      BEGIN
        SELECT @len    = @pos2 - @pos + 1
             , @symbol = SUBSTRING (@datetime, @pos, @len)

        IF CONVERT (VARBINARY (120), @symbol)
         = CONVERT (VARBINARY (120), '[m]')
      --THEN
          SELECT @symbol = '[N]'

        ELSE IF CONVERT (VARBINARY (120), @symbol)
              = CONVERT (VARBINARY (120), '[mm]')
      --THEN
          SELECT @symbol = '[NN]'
        ELSE
          SELECT @symbol = UPPER (@symbol)
      --END IF

        SELECT @translation =
          CASE @symbol
            WHEN '[YYYY]' -- year
            THEN CONVERT (CHAR (4), @year)
            WHEN '[YY]'   -- year
            THEN RIGHT (CONVERT (CHAR (4), @year), 2)
            WHEN '[MONTH]'
            THEN UPPER (DATENAME (MM, @f_datetime))
            WHEN '[MON]'
            THEN LEFT (UPPER (DATENAME (MM, @f_datetime)), 3)
            WHEN '[M]'
            THEN CONVERT (VARCHAR, @month)
            WHEN '[MM]'
            THEN RIGHT ('0' + CONVERT (VARCHAR, @month), 2)
            WHEN '[D]'
            THEN CONVERT (VARCHAR, @day)
            WHEN '[DD]'
            THEN RIGHT ('0' + CONVERT (VARCHAR, @day), 2)
            WHEN '[DWEEK]'
            THEN UPPER (DATENAME (DW, @f_datetime))
            WHEN '[DWK]'
            THEN UPPER (LEFT (DATENAME (DW, @f_datetime), 3))
            WHEN '[H]'
            THEN
              CASE
                WHEN @hour BETWEEN 1 AND 12
                THEN CONVERT (VARCHAR, @hour)
                ELSE CONVERT (VARCHAR, ABS (@hour - 12))
              END
            WHEN '[HH]'
            THEN
              CASE
                WHEN @hour BETWEEN 1 AND 12
                THEN RIGHT ('0' + CONVERT (VARCHAR, @hour), 2)
                ELSE RIGHT ('0' + CONVERT (VARCHAR, ABS (@hour - 12)), 2)
              END
            WHEN '[24H]'
            THEN CONVERT (VARCHAR, @hour)
            WHEN '[24HH]'
            THEN RIGHT ('0' + CONVERT (VARCHAR, @hour), 2)
            WHEN '[N]'
            THEN CONVERT (VARCHAR, @minute)
            WHEN '[NN]'
            THEN RIGHT ('0' + CONVERT (VARCHAR, @minute), 2)
            WHEN '[S]'
            THEN CONVERT (VARCHAR, @second)
            WHEN '[SS]'
            THEN RIGHT ('0' + CONVERT (VARCHAR, @second), 2)  
            ELSE
              CASE
                WHEN @symbol IN ('[AM]', '[PM]', '[AA]', '[PP]')
                THEN
                  CASE
                    WHEN @hour < 12
                    THEN 'AM'
                    ELSE 'PM'
                  END
                ELSE @symbol
              END
          END

        SELECT @datetime = STUFF (@datetime, @pos, @len, @translation)
             , @pos      = CHARINDEX ('[', @datetime, @pos + LEN (@translation))
      END
  --END IF
  END

FNC_EXIT:
  RETURN @datetime
END

Thursday, September 20, 2007

Do Your Foreign Key Columns Have Indices?

When I do a database audit, I always check for the existence of foreign keys that enforce referential integrity. That is often a great sign on whether or not a data-centric developer has been involved in the project. Once foreign keys have been created, indices must manually be configured, for SQL Server does not create them automatically.

I wrote this script to tell you if you have indices on all of your foreign key columns:

SELECT o.[Name] AS TableName, c.[Name] AS ColumnName
FROM sysForeignKeys f
JOIN sysObjects o
ON f.fkeyID = o.[id]
JOIN sysColumns c
ON c.[ID] = f.fkeyID AND c.[colid] = f.fkey
LEFT OUTER JOIN sysIndexKeys i
ON f.[fkeyid] = i.[id] AND f.[fkey] = i.[colid] AND f.[keyno] = i.[keyno]
WHERE i.[id] IS NULL

Even if referential integrity is being enforced at the application layer (GASP!), you should still have indices created on those virtual foreign key columns. Unfortunately, that will have to be audited manually by the application developer, which pretty much means it will never happen.

Sunday, September 16, 2007

Hard Disk Configuration

When asked where hardware dollars are best spent, I always recommend a great disk subsystem.

Hard drives are often the source of performance bottlenecks. Any dime spent on a high-performing hard drives will not go to waste.

Disks are also the component that will most likely fail. Matt, the Microsoft blogger, claims that disks have a failure rate of 2% per year per disk. If that's the case, then an investment in RAID arrays would be very wise.

As far as specific drive components go, it's difficult to make a recommendation without knowing the purpose of the SQL Server.

Here's Matt's post:

http://blogs.msdn.com/matt_hollingsworth/archive/2007/04/02/mtbf-is-useless-how-to-decide-when-to-use-raid-for-database-high-availability.aspx

Friday, September 7, 2007

No More DBCC SHOWCONTIG

In SQL Server 2005, DBCC SHOWCONTIG has been replaced with the function sys.dm_db_index_physical_stats. Use this function if you need to diagnose fragmentation and page usage issues.

Pay attention to the fragmentation levels in the output. This is reporting more accurately than in SQL Server 2000.

Here's a Books Online link explaining the new function:
http://msdn2.microsoft.com/en-us/library/ms188917.aspx

And a link on index fragmentation:
http://msdn2.microsoft.com/en-us/library/ms189858.aspx

Wednesday, September 5, 2007

Dedicated Administrator Connection

In previous version of SQL, it was possible for the server to tie up so many resources, that one more connection was no longer possible. For instance, you could have one process that was blocking all the others, and have no way to open the SQL Server tools to kill it.

That all changed with the DAC. Here's the Microsoft article on how to use it:

http://msdn2.microsoft.com/en-us/library/ms189595.aspx

Kalen Delaney (Inside SQL Server 2005) has this great query for determining the SPID of the DAC:

SELECT t2.session_id
FROM sys.tcp_endpoints as t1
JOIN sys.dm_exec_sessions as t2
ON t1.endpoint_id = t2.endpoint_id
WHERE t1.name='Dedicated Admin Connection'; GO

Tuesday, September 4, 2007

Question of the Week

What is the maximum number of SQL Server instances that can be loaded on one physical server?

Ready for the answer?

There is no real maximum, only a supported maximum. On the Enterprise Edition, Microsoft supports 50 instances in a non-clustered environment. Microsoft supports 25 instances on a cluster. Other instances of SQL Server have a supported maximum of 16.

Here a link:

http://www.microsoft.com/technet/technetmag/issues/2006/03/SQLQA/

Monday, September 3, 2007

32-bit vs 64-bit SQL

When it comes to hardware, most SQL Servers I have seen are over-powered. Memory and CPU are most often underuntilized. So when I'm asked whether a customer needs the 64-bit version of SQL server, I think it comes down to one question. Do you need more than 3GB of memory addressed?

It's important to note that the 32-bit version of Windows can only address 4GB of memory. By default, it reserves 2GB for itself, however this can be reduced to 1GB by using an operating system switch. Memory beyond the 4GB addressed by the OS can be accessed by SQL Server indirectly using the PAE switch.

http://msdn2.microsoft.com/en-us/library/ms190673.aspx

The 64-bit version of SQL Server can address an incredible amount of memory directly, with no need of these work-arounds.

Saturday, September 1, 2007

Free SQL Server 2005

Just a friendly reminder that you can evaluate SQL Server 2005 for free for 120 days. This is an excellent way to learn the features of the product before purchasing it. It is the full Enterprise edition. You can download it here:

http://www.microsoft.com/technet/prodtechnol/eval/sqlsrv05/default.mspx

Bio

Ike Ellis is a Microsoft Data Platform MVP and Friend of Redgate. He's first earned his MVP in 2011 and has been renewed every year since then.

He specializes in Power BI, DAX, T-SQL, Python, R, Machine Learning SSRS, SSIS, and SSAS Tabular. He has a strong interest in Microsoft Azure Data products, including CosmosDB, Azure Data Lake Analytics, and Azure SQL DW. Mostly, he likes helping organizations wrangle their data, prepare it, make it fast and accurate, and visualize it in a way that makes a deep impact on business users and decision makers.

He's spoken at PASS Summit (2014, 2015, 2017), Redgate SQL in the City (2012, 2013,Save 2015, 2018), SQL Bits (2018), and TechEd Europe (2012). He wrote the data portion of Developing Azure Solutions in 2015. The second edition was released in 2018.