Title
With SNMP alone, the ISP would have lost a key customer
SNMP monitoring could not explain an anomaly, but flow-based analysis revealed its root cause and helped the ISP retain a key enterprise customer.
Who
Companies: Any network operator, such as a provider of internet, communication, or web hosting services.
Roles: Network administrator
Use case: Beneš @ DobruskaNet
Situation
A key enterprise customer called the ISP’s technical support to complain about latency issues when using Teams. The network administrator checked the router that connects this customer along with hundreds of others, and analyzed the latency data stored in Prometheus.
Screenshot: router latency at 30-second intervals
The latency graph revealed a periodic anomaly: it lasted 10 minutes and repeated every hour.
Packet drops and CPU usage showed the same pattern.
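A pattern like this can also be confirmed numerically rather than by eye. The following is a minimal sketch, not the administrator's actual method: it assumes the latency samples have already been exported from Prometheus as (Unix timestamp, latency in ms) pairs, and the function names and the threshold factor are hypothetical. It folds all samples onto a single hour and flags the minutes whose median latency stands out from the rest.

```python
from statistics import median


def fold_by_hour(samples, bucket_s=60):
    """Group (unix_ts, latency_ms) samples by their position within the hour.

    Returns {bucket_index: median latency}, where bucket 0 is the first
    `bucket_s` seconds of every hour, bucket 1 the next, and so on.
    """
    buckets = {}
    for ts, value in samples:
        idx = int(ts % 3600) // bucket_s
        buckets.setdefault(idx, []).append(value)
    return {idx: median(vals) for idx, vals in sorted(buckets.items())}


def anomalous_minutes(samples, factor=2.0):
    """Return the minutes of the hour whose median latency exceeds
    `factor` times the median across all minutes (assumed threshold)."""
    folded = fold_by_hour(samples)
    baseline = median(folded.values())
    return [minute for minute, value in folded.items() if value > factor * baseline]
```

If the anomaly really recurs every hour, the flagged minutes form one contiguous block of about ten entries. A one-off incident would be diluted by the other hours and would not be flagged.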
Challenge
Based on SNMP telemetry alone, however, the administrator could not identify the root cause of the issue.
What to do now?
Solution
The network administrator turned to NetFlow data, that is, traffic telemetry (link).
- The ISP had NetFlow export in place on all core routers.
- NetFlow data streams were continuously sent to a central collector running FLOWCUTTER software.
FLOWCUTTER’s fast drill-down analysis of the flow dataset allowed the administrator to find the root cause of the issue.
In addition to collecting NetFlow data, FLOWCUTTER was set up to run a periodic scan of open ports. That scan helped expose the underlying cause of the anomaly.
Results
The target router showed an anomaly: traffic volume went down while the number of talkers went up.
Drill-down analysis revealed that the anomaly was DNS-related.
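The drill-down described above can be sketched in a few lines. This is an illustration only and does not reflect how FLOWCUTTER works internally. It assumes flow records are available as dictionaries with the hypothetical keys `ts`, `src_ip`, `proto`, `dst_port`, and `bytes`, and it uses a simple median comparison to find time buckets where volume drops while the number of distinct talkers rises.

```python
from collections import Counter, defaultdict
from statistics import median


def bucket_stats(flows, bucket_s=300):
    """Return {bucket_start: (total_bytes, distinct_talkers)} per time bucket."""
    bytes_per = defaultdict(int)
    talkers_per = defaultdict(set)
    for flow in flows:
        bucket = flow["ts"] - flow["ts"] % bucket_s
        bytes_per[bucket] += flow["bytes"]
        talkers_per[bucket].add(flow["src_ip"])
    return {b: (bytes_per[b], len(talkers_per[b])) for b in sorted(bytes_per)}


def suspect_buckets(stats):
    """Buckets with below-median traffic volume and above-median talker count."""
    med_bytes = median(v[0] for v in stats.values())
    med_talkers = median(v[1] for v in stats.values())
    return [
        bucket
        for bucket, (nbytes, talkers) in stats.items()
        if nbytes < med_bytes and talkers > med_talkers
    ]


def top_services(flows, bucket, bucket_s=300, n=3):
    """Most frequent (protocol, destination port) pairs within one bucket."""
    counts = Counter(
        (flow["proto"], flow["dst_port"])
        for flow in flows
        if flow["ts"] - flow["ts"] % bucket_s == bucket
    )
    return counts.most_common(n)
```

Calling `top_services` on a bucket returned by `suspect_buckets` shows which service dominates the flow count during the anomaly. In a case like this one, a result led by UDP port 53 would point to DNS.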
The administrator then checked the dashboard with the results of the previous night’s open ports scan. It showed that another customer with a public IP address had exposed the DNS port to the internet. This put additional load on the router and affected other customers in the same region.
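To illustrate what such a check involves, here is a minimal sketch of a probe that tests whether a host answers DNS queries on UDP port 53. It is not FLOWCUTTER's scanner. The function names, the probe name `example.com`, and the fixed query ID are assumptions made for the example, and the query layout follows the standard DNS message format.

```python
import socket
import struct


def build_dns_query(name, query_id=0x1234):
    """Build a minimal DNS query for an A record (recursion desired)."""
    # Header: ID, flags (RD bit set), 1 question, 0 answer/authority/additional
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def is_dns_response(payload, query_id=0x1234):
    """True if the payload is a DNS response (QR bit set) to our query ID."""
    if len(payload) < 12:
        return False
    resp_id, flags = struct.unpack("!HH", payload[:4])
    return resp_id == query_id and bool(flags & 0x8000)


def answers_dns(ip, port=53, name="example.com", timeout=2.0):
    """Send one query over UDP and report whether a DNS response comes back."""
    query = build_dns_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (ip, port))
            payload, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable hosts
            return False
    return is_dns_response(payload)
```

Running `answers_dns` against each public customer address, from a vantage point outside the network, gives a rough list of hosts that expose DNS to the internet. A single UDP probe can be lost, so a real scan would retry before concluding that a port is closed.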
Further examples of what can be revealed about a customer within seconds:
- Upload/download
- Ports and protocols related to specific services: ftp, telnet, ssh
- Blacklisted IP addresses
- Communication with botnets
- Open ports and vulnerabilities visible from outside
Resources
- NetFlow analysis in Grafana
- Open ports scan
- SNMP vs Flow telemetry
Takeaway
- Many root causes cannot be revealed by analyzing SNMP-like telemetry. That’s where NetFlow data comes in handy: it provides deeper insight into the source and destination of each traffic flow.
- In addition to SNMP and NetFlow, it’s useful to correlate with other data sources, in this case an open ports scan.
The ISP resolved the issue with ease.
The ISP contacted the second customer, where the root cause lay, and pointed out the misconfiguration. The port was closed and the anomalies stopped.
For the key enterprise customer, the latency issue was resolved, which helped preserve a good relationship.