A recent power failure in one of our data centers exposed a flaw in the way we had configured alerts on our SQL servers. We have alerts configured to send emails for all critical and fatal events, plus a few other business-related events. However, the email account used by SQL Server was also affected by the blackout, and none of the emails were sent until the next morning.
We have two data centers, each with a number of SQL database servers and an Exchange server. That let us come up with a simple method of monitoring whether a SQL instance is still running: set up an email account on each Exchange server, then configure the SQL email profiles in each data center to use the account on the opposite data center's Exchange server. When an alert condition is met, the email is sent through that account, so a power failure at one site no longer takes out both the SQL server and the mail server that carries its alerts.
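The routing rule can be sketched outside SQL Server to make it concrete. The Python below is only an illustration under stated assumptions, not our actual mail profile configuration: the site names, host names, and the `opposite_site`, `build_alert`, and `send_alert` helpers are all hypothetical, and it talks to the mail server over plain SMTP rather than through a SQL email profile.

```python
import smtplib
from email.message import EmailMessage

# Hypothetical site names and mail hosts; substitute your own.
MAIL_HOSTS = {
    "dc1": "exchange-dc1.example.com",
    "dc2": "exchange-dc2.example.com",
}


def opposite_site(site):
    """Return the other data center in a two-site setup."""
    others = [name for name in MAIL_HOSTS if name != site]
    if site not in MAIL_HOSTS or len(others) != 1:
        raise ValueError(f"unknown site or not a two-site setup: {site!r}")
    return others[0]


def build_alert(site, sender, recipient, subject, body):
    """Build the alert and pick the mail host in the opposite data center."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return MAIL_HOSTS[opposite_site(site)], msg


def send_alert(site, sender, recipient, subject, body):
    """Send the alert through the opposite site's mail server (assumes SMTP on port 25)."""
    host, msg = build_alert(site, sender, recipient, subject, body)
    with smtplib.SMTP(host, 25, timeout=10) as smtp:
        smtp.send_message(msg)
```

Keeping the host selection in `build_alert` separate from the network call in `send_alert` means the routing rule can be checked without a mail server.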
However, this only goes so far. A power outage can take down network connectivity as well as the servers themselves, in which case the affected SQL servers cannot reach the opposite Exchange server either. So we took one more step.
On a SQL server in one location, we set up a job that runs a query against a SQL server in the second location. If the query executes with no problems, the second SQL server is running and reachable. If the query fails, the job sends us an email. We set this up so that every SQL server instance is queried by a server in the opposite data center. This way we can at least tell when there is a connection issue and take appropriate steps.
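The logic of that cross-site check can be sketched as follows. This is a Python illustration of the idea, not the SQL job itself: the server inventory and the `remote_targets` and `check_remote_servers` names are hypothetical, and `probe` and `notify` are placeholders for "run a trivial query against the remote server" and "send the alert email".

```python
# Hypothetical inventory of SQL servers per data center.
SERVERS = {
    "dc1": ["sql-dc1-a", "sql-dc1-b"],
    "dc2": ["sql-dc2-a"],
}


def remote_targets(local_site, servers):
    """List every server that lives outside the local data center."""
    return [
        name
        for site, names in servers.items()
        if site != local_site
        for name in names
    ]


def check_remote_servers(local_site, servers, probe, notify):
    """Probe each remote server; call notify for every failure.

    probe(name) should raise an exception if the query cannot run.
    notify(name, error_text) should deliver the alert.
    Returns the names of the servers that failed the check.
    """
    failed = []
    for target in remote_targets(local_site, servers):
        try:
            probe(target)
        except Exception as exc:  # any failure means "could not verify"
            failed.append(target)
            notify(target, str(exc))
    return failed
```

In a real deployment `probe` would open a connection to the remote instance and run something as small as `SELECT 1`, and the check would run on a schedule from each data center. A failure here does not distinguish a downed server from a downed network link; it only says the remote instance could not be reached from this site.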