Tuesday, November 19, 2013

Simplify your on-call hours with automated response from SQL Sentry

For many DBAs and other IT professionals, on-call hours are part of the job.  Keeping critical servers and applications running 24/7 requires the ability to respond to problems at any time of day.  This usually means that a call or page to the on-call administrator when something does go wrong off-hours.  With SQL Sentry's extensive automated alerting and response capabilities, this isn't always necessary.  Let's take a look at how it works.

Automated response types:

There are four different responses that I'm going to focus on for this post: Execute Job, Execute Process, Execute SQL, and Run QuickTrace.  We will also look at using some notifications alongside them.  Let's start by examining the capabilites of each in detail.

Execute Job

The Execute Job action can be used to execute a SQL Agent job on the current server, or any other watched server in your enterprise. This action effectively enables you to create simple "one-to-one" job chains by associating it with Job Completed, Success, or Failure conditions.  SQL Sentry Event Manager users can also create more advanced job workflows using Event Chaining, but that's a topic for another post.

Execute Job is my favorite response type because I can wrap almost anything in a job, and the job itself is monitored by Event Manager.  This allows me to not only automate a response, but very easily configure a Send Email or Send Page notification in case of a failure in my automated response.

Execute Process

The Execute Process action can be used to execute any executable or batch file against any of your monitored servers.  This option does require xp_cmdshell to be enabled on the server, so this is not always an option.

Execute SQL

The Execute SQL action executes any T-SQL statement(s) against any of your monitored servers.  This option is very flexible, able to respond to pretty much anything you can anticipate and write a script or stored procedure to resolve.  The upside to this versus Execute Job is that you do not have to create an object on the server.  The downside is that it is a little bit more complicated to build a notification to respond to issues with your automated response. 

Run QuickTrace

Sometimes you have a recurring issue that some extra trace data would be helpful to resolve, but it is hard to predict when the issue will happen.  Run QuickTrace is for this kind of scenario, as it allows you to run a short-term trace in response to the event occurring.  This allows you to minimize the amount of time the trace has to run against the server, and the amount of data that needs to be sifted through to analyze the problem.

The Quick Trace runs for a short time, I usually set its duration for 10-30 seconds, and grabs everything running against the server at either the batch or statement event level.  You can also specify a rowcount limit for the trace, limiting the amount of data collected.

The QuickTrace is unfiltered and collects a broad range of data columns.  This is something we would typically think of as being fairly high-impact from a standpoint of overhead on the server,and certainly something we would be very resistant to firing off from Profiler as a client-side trace.  SQL Sentry has a number of different safeguards and optimizations to reduce that overhead.  I won't bore you with the details here, but they're all available in the SQL Sentry User Guide.

Configuring an example in SQL Sentry:

Let's take a look at one possible example using my demo environment. 

First, we need a problem to solve. For our example, I have built an automated response to handle a failure of my nightly full backup job on one of my monitored SQL Server instances.  I created a second backup job for my automated response, and it backs up to an alternate storage location.  This way, if my primary backup location runs out of space, or there is some other issue, we can retry to an alternate location.  In this case, I wouldn't disable the failed job email alert that I have configured globally, because I do want to know that my backup isn't going to be in its usual location.  Let's take a look at how I configured it:

Click to enlarge.  On the left, I've configured the Execute Job action on failure to start the alternate backup job.  On the right, I set a Send Page alert targeting both of my call groups and the help desk so a high-priority notification goes out if the alternate backup should fail as well.

This is just a simple example, but outlines one of the many ways you can have SQL Sentry automate response to events to save aggravation and ensure that problems are resolved quickly.