Using Microsoft System Center Operations Manager 2007 R2 (SCOM 2007) to monitor Exchange 2010 is a job unto itself. It takes a lot of time and knowledge of SCOM to monitor your Exchange environment effectively. This is partly due to SCOM’s architecture, but it is compounded by the correlation engine Exchange 2010 uses to generate alerts (which MS did away with in Exchange 2013). Anyway, one of the things SCOM doesn’t do is alert you in any way when a database fails over to another node in a DAG. I can only speculate why, but I assume it is because a failover, assuming it succeeds, is not a service-impacting event, and because you will presumably have other monitoring in place, via SCOM or a different tool, to alert you to the root cause.
That doesn’t change the fact that I want to know when a database has failed over to another node. To create an alert for this in Exchange 2010/2013/2016, you are limited to what Exchange writes to the event log in the first place. Events related to databases and DAGs are in the Crimson logs (Applications and Services Logs) under Microsoft-Exchange-HighAvailability, and the log to monitor for *over events (failovers and switchovers) is Operational. Event ID 306 is logged on the Primary Active Manager (PAM) server when a database is moved to another node. If you look at the event’s description, you see that it contains nothing more than the database name, the source and target servers, and any move comment:
The event description doesn’t distinguish between switchovers and failovers, so you might think there is no way to be alerted only when a failover occurs. However, Exchange logs much more data than the description shows. Click the Details tab, and in the Friendly View you will see what Exchange actually logs about the event:
You can see that Exchange does, in fact, log why the *over is happening. It notes whether the move is a failover (ActionInitiator is Automatic) or a switchover (ActionInitiator is Admin), and why it is happening (ActionReason is SystemShutdown, NodeDown, FailureItem, or Cmdlet). Other properties may be helpful for event log collection and/or alerting, too.
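To make the ActionInitiator distinction concrete, here is a minimal sketch, not anything SCOM or Exchange ships, that classifies a move from the two properties shown in the Details tab. The helper name `classify_move` and the dict input are assumptions for illustration; only the property names and values come from Event ID 306 as described above.

```python
# Hypothetical helper: property names and values come from the Details tab of
# Event ID 306; passing them in as a plain dict is an assumption for illustration.
INITIATOR_TO_KIND = {
    "Automatic": "failover",  # Exchange moved the database on its own
    "Admin": "switchover",    # an administrator requested the move
}


def classify_move(properties: dict[str, str]) -> tuple[str, str]:
    """Return (kind, reason) for a database move.

    kind is "failover", "switchover", or "unknown" when ActionInitiator is
    missing or unexpected; reason is ActionReason as logged, or "unknown".
    """
    kind = INITIATOR_TO_KIND.get(properties.get("ActionInitiator", ""), "unknown")
    reason = properties.get("ActionReason", "unknown")
    return kind, reason
```

For example, `classify_move({"ActionInitiator": "Automatic", "ActionReason": "NodeDown"})` returns `('failover', 'NodeDown')`, while an Admin-initiated move comes back as a switchover.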
To have SCOM generate an alert when an event indicates that a database failover is starting, you need to create an alert rule. In the SCOM console, go to the Authoring navigation pane, right-click Rules (under Management Pack Objects), and select Create a new rule… Under Alert Generating Rules, then Event Based, select NT Event Log (Alert). Assuming you have created a custom management pack for Exchange customizations, select it in the drop-down at the bottom of the window, then click Next.
Enter a rule name and an optional description. Change the Rule Category to Alert. For the Rule target, click the Select button, type Mailbox, select it in the results pane, and click OK. Targeting Mailbox means the rule automatically applies to all mailbox servers and watches the appropriate event log on them, while ignoring other servers where the event will never occur anyway. When you are done, the screen should look similar to this:
Click Next to proceed to the Event Log Type tab. In the Log name field, enter Microsoft-Exchange-HighAvailability/Operational. If you have access from the console to an Exchange server with the Mailbox role installed, you can browse to the log by clicking the ellipsis and changing the computer to that server’s name; otherwise, you can enter the name manually. If you do, it is important to enter it exactly as shown above. Then click Next.
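If you want to confirm the exact log name on a mailbox server before typing it into the wizard, one option is an Event Viewer custom view built from a structured XML query (Create Custom View, XML tab, Edit query manually). The sketch below builds such a query for the Operational log and Event ID 306; `build_event_query` is a hypothetical helper, and the query uses the standard `*[System[(EventID=…)]]` filter form rather than anything SCOM-specific.

```python
from xml.sax.saxutils import quoteattr


def build_event_query(log_name: str, event_id: int) -> str:
    """Build a Windows Event Log structured XML query for one event ID.

    quoteattr() wraps the log name in quotes and escapes it, so the
    result is well-formed XML that Event Viewer can accept as-is.
    """
    path = quoteattr(log_name)
    return (
        "<QueryList>\n"
        f"  <Query Id=\"0\" Path={path}>\n"
        f"    <Select Path={path}>*[System[(EventID={int(event_id)})]]</Select>\n"
        "  </Query>\n"
        "</QueryList>"
    )


print(build_event_query("Microsoft-Exchange-HighAvailability/Operational", 306))
```

If the custom view returns the Event ID 306 entries you expect, the same log name string is what belongs in the Log name field.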
In the Build Event Expression tab, enter 306 as the value of the first field (Event ID). The event source can be confusing, depending on where you look at it. For example, the native Event Viewer displays the source only as HighAvailability. The source, however, is Microsoft-Exchange-HighAvailability, which you can see in the Provider Name under System in the Friendly View of the event’s details. Enter this value for the Event Source parameter.
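The distinction between the displayed source and the provider is easier to see in the event’s raw XML, where the full name sits in the `Name` attribute of `System/Provider`. The sketch below reads it with Python’s standard `xml.etree.ElementTree`; `SAMPLE_EVENT` is a hypothetical, heavily abbreviated event (a real one carries more attributes and elements), and `event_source_and_id` is an illustrative helper.

```python
import xml.etree.ElementTree as ET

# Namespace of the <Event> XML that Windows Event Log produces.
EVENT_NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Hypothetical, abbreviated sample: only the System elements read below.
SAMPLE_EVENT = """\
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Exchange-HighAvailability" />
    <EventID>306</EventID>
    <Channel>Microsoft-Exchange-HighAvailability/Operational</Channel>
  </System>
</Event>"""


def event_source_and_id(event_xml: str) -> tuple[str, int]:
    """Return (Provider Name, Event ID) from the event's System section."""
    root = ET.fromstring(event_xml)
    provider = root.find("e:System/e:Provider", EVENT_NS)
    event_id = root.find("e:System/e:EventID", EVENT_NS)
    if provider is None or event_id is None:
        raise ValueError("event XML lacks System/Provider or System/EventID")
    return provider.get("Name"), int(event_id.text)
```

Here `event_source_and_id(SAMPLE_EVENT)` returns `('Microsoft-Exchange-HighAvailability', 306)`, the full provider name rather than the shortened HighAvailability label.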
Because the alert should fire only for a database failover, not a switchover, you need to filter the event on another value. To add this filter, click the Insert drop-down and select Expression. Click the ellipsis next to the empty parameter name to bring up the Event Property dialog. Nothing in the common event properties drop-down will work, so how do you restrict the alert to a value in one of the fields listed in the Details tab? This is where event log parameters come in.
The description of an event is built from a canned string in the event’s message file, often interspersed with one or more of the variables shown in the Details tab, known as parameters. Each event defines its own parameters, any of which can be used in the description. The fact that the database is failing over is recorded in the parameter named ActionInitiator. Parameters, however, are not referenced by name; instead, you reference a parameter by its index in the UserData/EventXML collection. To determine which number to use for ActionInitiator, open the Friendly View and count down the list of parameter names. Note that parameters are numbered from 1, so start counting with 1 (not 0). ActionInitiator, therefore, is parameter 6.
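Counting down the Friendly View list is the same as taking each child of `UserData/EventXML` in document order and numbering from 1. The sketch below does that counting in code. It is an illustration under assumptions: `SAMPLE_306` is hypothetical, `Field1` through `Field5` are placeholders for the real first five property names (which are not listed here), and only the positions of ActionInitiator (6) and ActionReason (7) follow the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated Event ID 306 XML. Field1..Field5 are placeholders
# for the real first five property names; ActionInitiator is 6th, ActionReason 7th.
SAMPLE_306 = """\
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <UserData>
    <EventXML>
      <Field1>placeholder</Field1>
      <Field2>placeholder</Field2>
      <Field3>placeholder</Field3>
      <Field4>placeholder</Field4>
      <Field5>placeholder</Field5>
      <ActionInitiator>Automatic</ActionInitiator>
      <ActionReason>NodeDown</ActionReason>
    </EventXML>
  </UserData>
</Event>"""


def _local(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def event_parameters(event_xml: str) -> list[tuple[str, str]]:
    """Return (name, value) pairs from UserData/EventXML in document order."""
    root = ET.fromstring(event_xml)
    for elem in root.iter():
        if _local(elem.tag) == "UserData":
            container = next(iter(elem), None)  # the EventXML element
            if container is None:
                return []
            return [(_local(child.tag), child.text or "") for child in container]
    return []


def parameter_number(event_xml: str, name: str) -> int:
    """Count down the property list starting at 1, as the rule wizard expects."""
    names = [n for n, _ in event_parameters(event_xml)]
    return names.index(name) + 1  # raises ValueError if the name is absent


def parameter(event_xml: str, number: int) -> str:
    """Return the value of 1-based parameter `number`."""
    params = event_parameters(event_xml)
    if not 1 <= number <= len(params):
        raise IndexError(f"parameter {number} is outside 1..{len(params)}")
    return params[number - 1][1]
```

With this sample, `parameter_number(SAMPLE_306, "ActionInitiator")` returns `6` and `parameter(SAMPLE_306, 6)` returns `'Automatic'`. The off-by-one conversion lives in one place (`number - 1`), which is where a 0-based habit would otherwise slip in.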
Back in the Event Property dialog, click the radio button to specify a parameter to use and change the default value of 1 to 6, then click OK. The Parameter Name will now say Parameter 6. Set the Operator to Equals, and manually enter a Value of Automatic. The Build Event Expression tab should look like this:
On the Configure Alerts tab, you can customize how the alert manifests: priority, severity, description, and so on. Using the default event description as the alert description is probably fine, but you can opt to include one or more of the event parameters in it. For example, you could have the alert include the ActionReason (parameter 7) so you know why the failover is occurring. Because this alert comes from a rule, as opposed to a monitor, it must be closed manually, which is a good thing in this case: the point is to be notified when a database fails over so you can investigate as needed.
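Taken together, the rule amounts to a predicate plus a description template. The sketch below models both in plain Python, assuming a hypothetical `Event` dataclass and assuming the three expressions sit in the wizard’s default And group. The `$Data/Params/Param[N]$` token is modeled on the syntax SCOM uses to insert event parameters into an alert description, but the substitution here is a simplified stand-in, not SCOM’s implementation.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical, simplified view of an NT event as the rule sees it."""
    event_id: int
    source: str
    params: list[str] = field(default_factory=list)  # stored 0-based here

    def param(self, number: int) -> str:
        """Return parameter `number` using 1-based numbering ('' if absent)."""
        return self.params[number - 1] if 1 <= number <= len(self.params) else ""


def is_failover(event: Event) -> bool:
    """Mirror the rule's expression: all three conditions must hold."""
    return (
        event.event_id == 306
        and event.source == "Microsoft-Exchange-HighAvailability"
        and event.param(6) == "Automatic"
    )


_PARAM_TOKEN = re.compile(r"\$Data/Params/Param\[(\d+)\]\$")


def render_description(template: str, event: Event) -> str:
    """Replace $Data/Params/Param[N]$ tokens with the event's parameter values."""
    return _PARAM_TOKEN.sub(lambda m: event.param(int(m.group(1))), template)


# p1..p5 are placeholders; positions 6 and 7 follow the article's numbering.
evt = Event(306, "Microsoft-Exchange-HighAvailability",
            ["p1", "p2", "p3", "p4", "p5", "Automatic", "NodeDown"])
if is_failover(evt):
    print(render_description("Database failover; reason: $Data/Params/Param[7]$", evt))
```

Running it prints `Database failover; reason: NodeDown`. Changing parameter 6 to Admin makes `is_failover` return `False`, which is exactly the switchover case the rule is meant to ignore.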