Network Monitoring/Ten Best Practices
From charlesreid1
The short list of ten best practices for network monitoring:
- Establish baseline behavior
- Perform network inventory
- Avoid network alert sawtoothing/flapping
- Don't filter your email alerts
- Monitor deltas
- Provide details
- Escalation policy
- Parent-Child
- Event Correlation
- View traffic from application endpoint
Establish baseline behavior
Establishing a network baseline gives you a sense of how the network performs under normal conditions, so that abnormal behavior stands out. (To this end, Bro can be used for network baselining, even though it is designed as an intrusion detection system rather than as a network monitoring tool.)
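One simple way to make a baseline actionable is to summarize historical samples of a metric (say, latency in milliseconds) by their mean and standard deviation, then flag new samples that fall far outside that range. The sketch below assumes that statistical model; the function names and the k = 3 cutoff are hypothetical choices for illustration, not something Bro or any specific tool provides.

```python
import statistics


def build_baseline(samples):
    """Summarize historical measurements as (mean, sample standard deviation)."""
    return statistics.mean(samples), statistics.stdev(samples)


def is_anomalous(value, baseline, k=3.0):
    """Flag a value lying more than k standard deviations from the baseline mean."""
    mean, stdev = baseline
    return abs(value - mean) > k * stdev


# Hypothetical latency samples (ms) collected during normal operation.
baseline = build_baseline([20, 22, 19, 21, 20, 23, 18, 21])
print(is_anomalous(45, baseline))  # True
print(is_anomalous(22, baseline))  # False
```

Real traffic usually has daily and weekly cycles, so a practical baseline would likely be kept per time of day or day of week rather than as one global mean.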
Perform network inventory
Keep an inventory of devices on the network:
- Network devices
- Ports
- Interfaces being used for network connections
- Network hardware (links, switches, controllers, power supplies)
- Servers
- Virtual machines
- SAN devices
If you don't know what's on your network, you can't monitor it very well!
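Once an inventory exists, it can be checked against what is actually seen on the network. A minimal sketch, assuming the inventory is a dictionary keyed by IP address and that the list of observed addresses comes from elsewhere (an ARP table dump, a scan, flow records); collecting those is outside this example, and all names and addresses are hypothetical.

```python
def diff_inventory(inventory, observed):
    """Compare inventoried devices against addresses actually seen on the network."""
    known = set(inventory)
    seen = set(observed)
    return {
        "unknown": sorted(seen - known),  # on the network but not in the inventory
        "missing": sorted(known - seen),  # in the inventory but not seen
    }


# Hypothetical inventory and observation data.
inventory = {"10.0.0.1": "core router", "10.0.0.10": "web server", "10.0.0.20": "SAN"}
observed = ["10.0.0.1", "10.0.0.10", "10.0.0.99"]
print(diff_inventory(inventory, observed))
```

Both directions matter: an "unknown" device is something you are not monitoring at all, while a "missing" one may be down or may simply be an out-of-date inventory entry.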
Avoid network alert sawtoothing/flapping
Alert sawtoothing (or flapping) occurs when a monitored element's numerical value hovers right around the alert threshold, repeatedly crossing it and causing the alert to be triggered over and over. This is a sign that the threshold, or the way the alert is evaluated, needs to be changed.
Options:
- Once a single alert is triggered, silence that alert for a given window of time
- Add a delay before the alert is triggered
- Add a "state" to each alert, and don't re-trigger alerts until the state of the alert has been returned to normal
- Use two-way communication with a ticket system or alert management system
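The "delay" and "state" options above can be combined in a small state machine. This sketch assumes a metric sampled at regular intervals and adds a second, lower "clear" threshold (hysteresis) so the alert only re-arms once the value has clearly returned to normal; the class name, thresholds, and sample count are hypothetical. It does not implement the time-based silence window or the ticket-system integration.

```python
class StatefulAlert:
    """Fire once per incident instead of on every threshold crossing."""

    def __init__(self, threshold, clear_below, delay_samples=3):
        self.threshold = threshold          # values above this count as "bad"
        self.clear_below = clear_below      # value must drop below this to re-arm
        self.delay_samples = delay_samples  # consecutive bad samples before firing
        self.consecutive = 0
        self.firing = False

    def update(self, value):
        """Feed one sample; return True only when a new alert should be sent."""
        if self.firing:
            if value < self.clear_below:
                self.firing = False         # back to normal: re-arm the alert
                self.consecutive = 0
            return False
        if value > self.threshold:
            self.consecutive += 1
            if self.consecutive >= self.delay_samples:
                self.firing = True
                return True
        else:
            self.consecutive = 0            # a good sample resets the delay counter
        return False


alert = StatefulAlert(threshold=90, clear_below=80, delay_samples=2)
values = [89, 91, 89, 91, 92, 89, 91, 85, 79, 91, 93]
fired = [i for i, v in enumerate(values) if alert.update(v)]
print(fired)  # [4, 10]
```

A naive `value > 90` check would fire on six of these eleven samples. The stateful version fires twice: once when the value first stays above 90, and again only after it has dropped below 80 and climbed back up.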
Don't filter your email alerts
This is sage advice: if you need to set a filter on your alert emails, they are firing too frequently. Alerts should land PLOP in the center of your inbox when they happen.
Monitor deltas
Rather than, or in addition to, implementing threshold alerts, you may also want to monitor deltas. For example, you might alert when disk usage exceeds 90%, but also alert when disk usage changes by more than X% over Y minutes.
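A sketch of a delta alert, assuming samples arrive as (timestamp in seconds, value) pairs and interpreting "changes by more than X% over Y minutes" as: the current value differs from the oldest sample still inside a Y-minute window by more than X percentage points. The class name and this interpretation are assumptions.

```python
from collections import deque


class DeltaMonitor:
    """Alert when a value moves by more than max_delta within window_s seconds."""

    def __init__(self, max_delta, window_s):
        self.max_delta = max_delta
        self.window_s = window_s
        self.samples = deque()  # (timestamp, value) pairs, oldest first

    def update(self, timestamp, value):
        """Record a sample; return True if the change across the window is too large."""
        self.samples.append((timestamp, value))
        # Drop samples that have aged out of the window.
        while timestamp - self.samples[0][0] > self.window_s:
            self.samples.popleft()
        oldest_value = self.samples[0][1]
        return abs(value - oldest_value) > self.max_delta


monitor = DeltaMonitor(max_delta=10, window_s=30 * 60)  # 10 points in 30 minutes
disk = [(0, 50), (600, 52), (1200, 55), (1800, 63), (2400, 70)]  # hypothetical disk %
print([monitor.update(t, v) for t, v in disk])
# [False, False, False, True, True]
```

None of these samples comes near a 90% threshold, but the delta rule fires as soon as usage climbs 13 points in half an hour.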
Provide details
The alert is the entry point for identifying and responding to the problem, so make sure you provide enough detail with the alert to jump-start the troubleshooting process. Include details like:
- The machine the fault was detected on
- The machine that detected the fault (if different)
- Name of alert
- Duration of alert
- Link or reference to where current state of this element can be seen/monitored
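One way to make sure every alert carries these details is to build alerts from a fixed structure rather than free-form strings. A minimal sketch; the field names, message layout, and example URL are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertDetails:
    name: str                   # name of the alert
    faulty_host: str            # machine the fault was detected on
    detected_by: Optional[str]  # machine that detected the fault, if different
    duration_s: int             # how long the condition has persisted
    status_url: str             # where the element's current state can be seen

    def to_message(self):
        """Render the alert as a plain-text message body."""
        lines = [f"ALERT: {self.name}", f"Host: {self.faulty_host}"]
        if self.detected_by and self.detected_by != self.faulty_host:
            lines.append(f"Detected by: {self.detected_by}")
        lines.append(f"Duration: {self.duration_s // 60} min {self.duration_s % 60} s")
        lines.append(f"Current state: {self.status_url}")
        return "\n".join(lines)


alert = AlertDetails(
    name="Disk usage > 90%",
    faulty_host="db01",
    detected_by="monitor01",
    duration_s=420,
    status_url="https://monitor.example.com/hosts/db01",
)
print(alert.to_message())
```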
Escalation policy
Sometimes an alert is triggered, but it doesn't go to the right person, or the person who receives it is not equipped to solve the problem. There should be a policy in place that defines the chain of command: who gets notified of which kinds of alerts, and when.
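An escalation policy can be written down as data: for each kind of alert, an ordered list of who is notified and after how long. This sketch assumes alerts carry a severity and that escalation is driven by how many minutes an alert has gone unacknowledged; the severities, roles, and timings are hypothetical placeholders.

```python
# Hypothetical policy: severity -> [(minutes unacknowledged, who to notify), ...]
ESCALATION_POLICY = {
    "critical": [(0, "on-call engineer"), (15, "team lead"), (45, "ops manager")],
    "warning": [(0, "team channel"), (120, "on-call engineer")],
}


def recipients(severity, minutes_unacknowledged, policy=ESCALATION_POLICY):
    """Return everyone who should have been notified by now for this alert."""
    steps = policy.get(severity, [])
    return [who for after, who in steps if minutes_unacknowledged >= after]


print(recipients("critical", 20))  # ['on-call engineer', 'team lead']
```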
Parent-Child
In the context of network monitoring, a parent-child relationship (set up manually for devices) tells the monitoring software which entities are related to which, and creates a chain of authority for alerts.
For example, suppose that there is a router that connects a handful of servers running virtual appliances. In this case, the router is the "parent" and the servers and virtual appliances are the "children".
If that router goes down, everything behind it (servers, virtual appliances, etc.) will also appear to go down. The alert system should be smart enough to identify that the real problem is with the parent, not with any of the children: alerts related to the children should be suppressed if there is an existing alert about the parent device.
Upstream verification is a process by which the network monitoring tool checks each upstream parent of a given device before the device is marked as down and an alert created.
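The parent-child relationship and upstream verification can be sketched with a simple parent map. This assumes each device has at most one parent and that `is_up` is some reachability check (a ping, for example) supplied by the caller; the topology follows the router example above, but the device names are hypothetical.

```python
# Hypothetical topology: device -> parent (None means top level).
PARENTS = {
    "router1": None,
    "server1": "router1",
    "server2": "router1",
    "vm-a": "server1",
}


def highest_down_ancestor(device, is_up, parents=PARENTS):
    """Walk up the parent chain and return the topmost unreachable ancestor."""
    culprit = None
    parent = parents.get(device)
    while parent is not None:
        if not is_up(parent):
            culprit = parent  # keep walking: a higher ancestor may be the root cause
        parent = parents.get(parent)
    return culprit


def check_device(device, is_up, parents=PARENTS):
    """Upstream verification: only alert on a device whose parents are all reachable."""
    if is_up(device):
        return None
    culprit = highest_down_ancestor(device, is_up, parents)
    if culprit is not None:
        return f"suppressed: {device} unreachable because upstream {culprit} is down"
    return f"ALERT: {device} is down"


down = {"router1", "server1", "vm-a"}  # simulated outage


def is_up(device):
    return device not in down


for device in ["router1", "server1", "server2", "vm-a"]:
    print(device, "->", check_device(device, is_up))
```

Walking all the way up the chain means `vm-a` is attributed to `router1` rather than to `server1`, which is itself only unreachable because of the router; only one real alert is produced.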
Event Correlation
Once you've gathered a bunch of network data, it's important to use it! This leads to the much deeper question of how you actually analyze your network data. The event correlation component of the network monitor should combine multiple network alerts to identify patterns, for example:
- On event X, look for event Y
- On event X, wait Y minutes and look for event Z
- If an event X occurs multiple times, suppress duplicate alerts
- Alert only when an event occurs X times
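Two of these rules are sketched below over a time-ordered list of (timestamp, event name) tuples: "on event X, look for event Z within Y minutes" and "alert only when an event occurs X times", which also folds the duplicate occurrences into a single alert. The event names and data are hypothetical, and a real correlation engine would evaluate rules against a live event stream rather than a list.

```python
from datetime import datetime, timedelta


def followed_within(events, first, then, window):
    """Rule: on event `first`, look for event `then` within `window`.

    `events` is a list of (timestamp, name) tuples sorted by timestamp.
    Returns (first_time, then_time) pairs for every match found.
    """
    matches = []
    for i, (t_first, name) in enumerate(events):
        if name != first:
            continue
        for t_then, other in events[i + 1:]:
            if t_then - t_first > window:
                break  # events are sorted, so nothing later can match
            if other == then:
                matches.append((t_first, t_then))
                break
    return matches


def count_alert(events, name, min_count):
    """Rule: emit one alert once `name` has occurred at least `min_count` times."""
    occurrences = [t for t, n in events if n == name]
    if len(occurrences) >= min_count:
        return f"{name} occurred {len(occurrences)} times (first at {occurrences[0]:%H:%M})"
    return None


t0 = datetime(2024, 1, 1, 12, 0)
events = [
    (t0, "link_down"),
    (t0 + timedelta(minutes=2), "bgp_session_lost"),
    (t0 + timedelta(minutes=10), "link_down"),
    (t0 + timedelta(minutes=30), "bgp_session_lost"),
    (t0 + timedelta(minutes=31), "link_down"),
]
print(followed_within(events, "link_down", "bgp_session_lost", timedelta(minutes=5)))
print(count_alert(events, "link_down", min_count=3))
```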
View traffic from application endpoint
Users don't care about network components; they care about whatever they're using the network to do. Measure network performance and traffic as close to the user endpoint as possible, possibly using techniques like packet inspection.
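Packet inspection aside, a simpler way to see the network from the user's side is a synthetic check run from the client machine (or as close to it as possible), timing an operation the application actually performs. This sketch swaps in that technique and only times opening a TCP connection; the function name is hypothetical, and a fuller check would also time the application's request and response.

```python
import socket
import time


def connect_time_ms(host, port, timeout=5.0):
    """Time opening a TCP connection from this machine to a service, in milliseconds.

    Run from (or near) the user's machine, this reflects what the user's
    application experiences, rather than the health of any single device
    along the path. Raises OSError if the connection fails or times out.
    """
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0
```

For example, calling `connect_time_ms("app.example.com", 443)` (a hypothetical host) every minute from a user's subnet and feeding the results into a baseline or delta alert shows a slowdown the way users feel it, even when every individual switch and router reports healthy.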