CISCO-OUTAGE-MONITOR-MIB
This MIB module describes, stores, and reports outage-related information generated by the individual hardware and software components that make up a router. These components include physical interfaces, logical interfaces, physical entities such as chassis, slots, modules, ports, power supplies, and fans, and the software processes running on each card.

The outage-related information includes, among other things, entity or interface up and down events, process start and normal or abnormal termination, and unusually high CPU utilization caused, for example, by a Denial of Service (DoS) attack. Outage-related events can be collected by means of Cisco's fault manager, Cisco's event manager, syslog messages, and similar sources.

An outage manager maintains the following outage data for each component:

1) Accumulated Outage Time (AOT): total outage time of a component from the beginning of the measurement.
2) Number of Accumulated Failures (NAF): total number of failure instances from the beginning of the measurement.

Using the aforementioned outage data, users can calculate and correlate values to derive availability information as follows:

1) Component Availability: the probability that a component will operate when needed.
   = (1 - (Outage Time / Duration))
2) DPM (Defects per Million): measure(s) of the defects of the system that have an immediate impact on the end user.
   = ([AOTi / (Measurement Interval)] x 10**6)
3) MTTR (Mean Time To Recovery): expected average time to restore a failed component.
   = (AOTi / NAFi)
4) MTBF (Mean Time Between Failures): expected average time between failures of a component.
   = (T2 - T1) / NAFi
5) MTTF (Mean Time To Failure): the mean time to failure once the device starts working.
   = (MTBFi - MTTRi)
   = (T2 - T1 - AOTi) / NAFi

                    measurement 1
                 |<----------------->|
                 |                   |
System Up @------|-------------------|----> Time
                 T1                  T2

At the beginning T1:
    tmp_AOT = AOTi (from the MIB polling);
    LastPollingTime = Current Time(T1);

At the end T2:
    Duration = Current Time(T2) - LastPollingTime(T1);
    Outage Time = AOTj - tmp_AOT;
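
The polling procedure and formulas above can be sketched in Python as follows. This is only an illustration under stated assumptions, not part of the MIB: the `OutagePoll` class and the `outage_metrics` function are hypothetical names, all times are assumed to be in the same unit (seconds here), and the failure count is assumed to be differenced between the two polls in the same way the text differences AOT.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OutagePoll:
    """One poll of a component's outage data (hypothetical container)."""
    timestamp: float  # time of the poll, in seconds (T1 or T2)
    aot: float        # Accumulated Outage Time at that poll, in seconds
    naf: int          # Number of Accumulated Failures at that poll


@dataclass
class OutageMetrics:
    availability: float
    dpm: float
    mttr: Optional[float]  # None when no failure occurred in the interval
    mtbf: Optional[float]
    mttf: Optional[float]


def outage_metrics(start: OutagePoll, end: OutagePoll) -> OutageMetrics:
    """Derive availability figures for the interval between two polls."""
    # Duration = Current Time(T2) - LastPollingTime(T1)
    duration = end.timestamp - start.timestamp
    if duration <= 0:
        raise ValueError("the end poll must be later than the start poll")

    # Outage Time = AOT at T2 - AOT saved at T1
    outage_time = end.aot - start.aot
    # Assumption: failures in the interval are obtained the same way.
    failures = end.naf - start.naf

    availability = 1 - outage_time / duration
    dpm = outage_time / duration * 10**6

    if failures > 0:
        mttr = outage_time / failures
        mtbf = duration / failures
        mttf = mtbf - mttr  # equals (duration - outage_time) / failures
    else:
        # The ratios are undefined without at least one failure.
        mttr = mtbf = mttf = None

    return OutageMetrics(availability, dpm, mttr, mtbf, mttf)


# Example: two polls taken one day apart (values are made up).
t1 = OutagePoll(timestamp=0.0, aot=100.0, naf=2)
t2 = OutagePoll(timestamp=86400.0, aot=532.0, naf=4)
metrics = outage_metrics(t1, t2)
```

Returning `None` for MTTR, MTBF, and MTTF when no failure was recorded is a design choice of this sketch; the MIB text does not say how that case should be reported.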