We received a feature request for SnapMirrorMetrics:
Since we often have lagtime false positives because complete transfers are still being finalized on the volume, we would like to have a switch for the SnapMirrorMetrics check: ‑‑
We are currently discussing two possible solutions:
The easy solution:
With this switch, the check would return OK (or WARNING) even when the lagtime has been exceeded, provided that the respective transfer has the status finalizing.
At this point we should remind ourselves of the meaning of finalizing. According to the SDK:
'finalizing' – SnapMirror transfers are enabled, currently in the post-transfer phase for vault or extended data protection incremental transfers
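To make the behaviour of the easy solution concrete, here is a minimal Python sketch. It is not the plugin's actual code: the function name evaluate_lag_simple, its parameters and the status string "idle" used in the usage note are assumptions made purely for illustration. It shows how the result would be capped at OK (or WARNING) whenever the relationship is finalizing.

```python
# Hypothetical sketch of the "easy" solution; not the plugin's real code.
OK, WARNING, CRITICAL = 0, 1, 2

HOUR = 3600
DAY = 24 * HOUR


def evaluate_lag_simple(lag, warn, crit, status, finalizing_state=OK):
    """Return the check state for a lagtime given in seconds.

    warn and crit are the thresholds in seconds. finalizing_state is the
    worst state the check may report while the relationship is finalizing
    (an assumed parameter, either OK or WARNING).
    """
    if lag > crit:
        state = CRITICAL
    elif lag > warn:
        state = WARNING
    else:
        state = OK

    # While finalizing, never report anything worse than finalizing_state.
    if status == "finalizing":
        return min(state, finalizing_state)
    return state
```

Under these assumptions, evaluate_lag_simple(30 * DAY, DAY, 2 * DAY, "finalizing") still evaluates to OK, while the same lagtime with the status "idle" evaluates to CRITICAL. A relationship that never leaves the finalizing state would therefore never raise an alarm.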
The clean solution:
In contrast to the solution proposed above, this approach would extend the thresholds by up to 12 hours, depending on the status of the transfers. This way we would prevent a transfer that is stuck in an endless finalizing loop from always returning OK and going unnoticed by the monitoring system.
An example to demonstrate its usage:
SnapMirrorMetrics ... --what=lag_time -w 1d -c 2d --finalizing_plus=12h
For a relationship with the status 'finalizing', this example would result in a WARNING for a lagtime > 36h (1d + 12h) and a CRITICAL alarm after 2.5 days (2d + 12h). For all other relationship states, a WARNING will be triggered after one day and a CRITICAL after two days, as was previously the case.
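The threshold arithmetic of this approach can be sketched in a few lines of Python. Again, this is only an illustration under stated assumptions: the function evaluate_lag and its parameter names are hypothetical, and only the option name finalizing_plus is taken from the example above. The extension is added to both thresholds, but only while the relationship status is finalizing.

```python
# Hypothetical sketch of the "clean" solution; not the plugin's real code.
HOUR = 3600
DAY = 24 * HOUR


def evaluate_lag(lag, warn, crit, status, finalizing_plus=0):
    """Return "OK", "WARNING" or "CRITICAL" for a lagtime in seconds.

    warn and crit are the regular thresholds in seconds. finalizing_plus
    is the extension in seconds that applies only to relationships whose
    status is "finalizing".
    """
    if status == "finalizing":
        # Shift both thresholds instead of suppressing the alarm entirely.
        warn += finalizing_plus
        crit += finalizing_plus

    if lag > crit:
        return "CRITICAL"
    if lag > warn:
        return "WARNING"
    return "OK"
```

With -w 1d -c 2d and an extension of 12h, a finalizing relationship with a lagtime of 30 hours stays OK in this sketch, whereas any other status would already produce a WARNING. Because the thresholds are only shifted, a relationship that remains in the finalizing state indefinitely still ends up as CRITICAL.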
We are leaning towards the second, cleaner approach. Preventing false positives is certainly an important concern, but it should never make false negatives, i.e. undetected errors, possible, no matter how improbable they are. In the end, our goal is to fulfill the needs of our customers. If you are interested, leave us a comment below or send an email to firstname.lastname@example.org.