How a call center can deal faster with difficult customers

Title

A complaining customer handled with ease, thanks to traffic-telemetry context at the ISP's fingertips

Situation

Most calls to an operator's support line are easy to deal with (a missed payment, etc.). However, a few calls generate the majority of the support team's effort and time, especially technical support calls.

One of those recurrently complaining customers called his ISP's support line. John, as always, complained that the internet wasn't working and that he needed it for work, e.g. for online meetings (via Teams/Meet/Zoom).

Challenge

A technical support agent is not always a highly paid network admin. Ruling out faults on the operator's side when "the internet's not working" is not trivial and consumes time.

How to speed up dealing with such calls?

Solution

The goal is to give technical support personnel the customer's traffic context easily and fast.

  • The ISP had to collect and store netflow – traffic telemetry including NAT IP address translation, so individual customers' behavior can be seen – a perfect job for the FLOWCUTTER collector.
  • With FLOWCUTTER, an administrator can provide a user-friendly dashboard to the support team.
  • When a customer calls, the agent enters the customer's IP into a dashboard box and within seconds can see and understand the caller's basic behavior (a sketch of such a lookup follows this list).
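To illustrate the kind of lookup such a dashboard performs, here is a minimal Python sketch. The flow records and field names are hypothetical, not FLOWCUTTER's actual schema:

    from collections import defaultdict

    # Hypothetical flow records as exported by a netflow collector
    # (field names are illustrative only).
    flows = [
        {"src": "198.51.100.7", "dst": "93.184.216.34", "bytes": 120_000},
        {"src": "93.184.216.34", "dst": "198.51.100.7", "bytes": 4_800_000},
        {"src": "198.51.100.7", "dst": "203.0.113.9", "bytes": 9_000},
    ]

    def customer_summary(flows, customer_ip):
        """Basic behavior of one customer: upload, download, top remote peers."""
        up = down = 0
        peers = defaultdict(int)
        for f in flows:
            if f["src"] == customer_ip:
                up += f["bytes"]
                peers[f["dst"]] += f["bytes"]
            elif f["dst"] == customer_ip:
                down += f["bytes"]
                peers[f["src"]] += f["bytes"]
        top = sorted(peers.items(), key=lambda kv: kv[1], reverse=True)
        return {"upload_bytes": up, "download_bytes": down, "top_peers": top[:5]}

    print(customer_summary(flows, "198.51.100.7"))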

Results

From the dashboard, even a less technical agent can determine that an issue is not on the operator's side but on the customer's. For example, he or she can give answers such as:

  1. Not working? I can see a lot of traffic passing down your line from TikTok (AS13869). Maybe ask your daughter whether she is secretly watching videos in her room instead of doing her homework.
  2. Not working? But I can see a big upload/download with China. Are you sure the camera system you bought on AliExpress is secure?
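Answers like the first one come from aggregating traffic volume per origin AS. A minimal sketch of that aggregation, assuming flow records already enriched with AS number and name (such enrichment normally happens in the collector; the data below is made up):

    from collections import Counter

    # Illustrative flow records, already annotated with the remote AS.
    flows = [
        {"remote_as": 13869, "as_name": "TikTok", "bytes": 900_000_000},
        {"remote_as": 13869, "as_name": "TikTok", "bytes": 300_000_000},
        {"remote_as": 15169, "as_name": "Google", "bytes": 50_000_000},
    ]

    # Sum bytes per remote AS and show the biggest contributors.
    by_as = Counter()
    for f in flows:
        by_as[(f["remote_as"], f["as_name"])] += f["bytes"]

    for (asn, name), total in by_as.most_common(5):
        print(f"AS{asn} ({name}): {total / 1e6:.0f} MB")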

There are more examples of what can be revealed within seconds about the customer:

  • Upload/download
  • Ports and protocols related to specific services: FTP, telnet, SSH
  • Whether the IP is blacklisted (a minimal check is sketched after this list)
  • Communication with a botnet
  • Open ports and vulnerabilities visible from outside
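For instance, the blacklist check can be approximated with a standard DNSBL lookup; a listed address resolves, an unlisted one does not. A minimal sketch using the public Spamhaus zone (requires working DNS; FLOWCUTTER's actual reputation sources may differ):

    import socket

    def dnsbl_listed(ip: str, zone: str = "zen.spamhaus.org") -> bool:
        """Check an IPv4 address against a DNS-based blacklist."""
        # Reverse the octets and query <reversed-ip>.<zone>; NXDOMAIN
        # (raised as socket.gaierror) means the address is not listed.
        query = ".".join(reversed(ip.split("."))) + "." + zone
        try:
            socket.gethostbyname(query)
            return True
        except socket.gaierror:
            return False

    print(dnsbl_listed("127.0.0.2"))  # standard test address, always listed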

Resources

  • Netflow analysis in Grafana – “single host IP” dashboard
  • SNMP vs Flow telemetry
  • IP reputation
  • AS and country of traffic origin
  • Flows w/ NAT IP address translation to see individual customers' behavior

Takeaway

An ISP support line can be overwhelmed by calls regarding technical issues. The first step is to identify mistakes on the customer's side, where the operator cannot influence things.

This is where FLOWCUTTER can help technical support personnel by providing the customer's traffic context.

  1. Provide a user-friendly dashboard to the support team.
  2. When a customer calls, the agent can see and understand the caller's basic behavior within seconds.
How an infected modem could quietly get a /22 prefix blacklisted

Title

Malware in just one customer's device almost ruined the reputation of a whole prefix, potentially causing problems for all of the ISP's customers.

Situation

The operator provides internet to both enterprise and home customers. Home customers can pay extra for a public IP, for example when they have a camera security system at home and want to check on their home from work. One of those home modem/routers got infected by malware. Consequently, the device was included in a botnet.

In the case of this botnet, the goal of the week was to scan devices across the internet for open telnet ports, and then try to infect them with the latest batch of attacks exploiting known vulnerabilities.

Challenge

Such an attacking device quickly ends up on public blacklists. That influences just this one device with one IP address. So far so good.

What can easily happen later is that the whole prefix (in this case a /22) gets blacklisted on IP reputation lists. Peering partners may then start to challenge the operator of the AS (Autonomous System) and demand that the issue be corrected.

At this point, a small anomaly on one modem causes a lot of damage. The amount of work to be done a week later is enormous compared to correcting the issue right at the beginning.

So it's a no-brainer, we have to spot such anomalies, right?

Not so fast. Normally such an anomaly flies under the radar, undetected, if the ISP relies just on SNMP monitoring (e.g. Zabbix, Nagios). Administrators usually can't detect it, and routers aren't aware of it, as it neither taxes the hardware nor results in many bytes and packets travelling around the network.

What to do?

Solution

First of all, an operator should use flow-based traffic analysis, so that such an anomaly can be caught at all.

Fortunately, in this case the ISP had some measures in place:

  • The ISP had netflow export from its perimeter routers.
  • Netflow was stored in the central collector running FLOWCUTTER software.
  • With FLOWCUTTER, any admin can easily do a fast drill-down analysis of netflow and other data sources.

A quick morning look at the overview (Home dashboard) in FLOWCUTTER, with just a few metrics, revealed a trend shift in the number of talkers (distinct source-destination IP pairs).
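The idea behind such a trend-shift check is simple: compare the current talker count against a recent baseline. A minimal sketch with illustrative values (this is not FLOWCUTTER's actual detector):

    import statistics

    # Distinct source-destination pairs ("talkers") per 5-minute bucket;
    # illustrative values, with the anomaly in the last bucket.
    talkers_per_bucket = [980, 1010, 995, 1040, 1005, 990, 8500]

    history, current = talkers_per_bucket[:-1], talkers_per_bucket[-1]
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)

    # Flag the bucket if it deviates strongly from the recent baseline.
    if stdev and (current - mean) / stdev > 3:
        print(f"anomaly: {current} talkers vs baseline {mean:.0f} +/- {stdev:.0f}")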

A fast drill-down analysis revealed that the anomaly was situated on one particular IP (a home customer with a public IP).

It took just a few hours for this IP to be blacklisted on IP reputation lists.

Results

What if the admin doesn't want to look at FLOWCUTTER every single day?

For that purpose, FLOWCUTTER helps in two ways:

  1. Set up "out of the box" detection of various network anomalies – including telnet scanning (a heuristic is sketched after this list).
  2. Enrich netflow data with IP reputation, checking and alerting when any of your IPs is blacklisted.
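As referenced in point 1, a telnet-scanning heuristic can be as simple as flagging a source that contacts unusually many distinct destinations on port 23. A sketch with made-up records and an assumed threshold (not FLOWCUTTER's actual rule):

    from collections import defaultdict

    # Illustrative flow records: one host probing port 23 across a range.
    flows = [
        {"src": "203.0.113.50", "dst": f"198.51.100.{i}", "dst_port": 23}
        for i in range(200)
    ] + [{"src": "203.0.113.8", "dst": "198.51.100.1", "dst_port": 443}]

    SCAN_THRESHOLD = 100  # distinct telnet destinations per window (assumed)

    telnet_targets = defaultdict(set)
    for f in flows:
        if f["dst_port"] == 23:
            telnet_targets[f["src"]].add(f["dst"])

    for src, targets in telnet_targets.items():
        if len(targets) > SCAN_THRESHOLD:
            print(f"{src} looks like a telnet scanner ({len(targets)} targets)")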

Resources

  • Netflow analysis in Grafana
  • SNMP vs Flow telemetry
  • IP reputation
  • Flow-based Anomaly detection

Takeaway

The ISP detected a telnet anomaly early and so was able to prevent a cascade of bad outcomes.

  1. Some misconfigurations and infected endpoints can damage the reputation of the operator's IP prefix or AS.
  2. With flow-based troubleshooting, these anomalies can be spotted and corrected early, before they create havoc within the network.

For the future, the ISP used FLOWCUTTER's ability to monitor and alert on network anomalies, as well as to regularly check the reputation of its IP range – to be alerted even faster next time.

How an undetected DDoS cut a town off from the internet

Title

An outgoing DDoS, which was saturating the uplink of a whole network segment, stopped before causing further havoc

Situation

The network administrator was notified by an alert that one particular radio uplink was becoming saturated in the middle of the day. This network segment served a remote town; consequently, all homes and institutions there experienced severe connection problems.

Challenge

SNMP data showed the saturation of the uplink at regular intervals, but nothing apparent was going on. Behind this link, the ISP didn't have detailed per-customer information, hence the network administrator was unable to find out whether one particular endpoint was responsible for the upload.

What to do?

Solution

The network administrator analyzed the anomaly using netflow data – traffic telemetry.

  • The ISP had netflow export on the perimeter.
  • Netflow was stored in the central collector running FLOWCUTTER software.
  • With FLOWCUTTER, any admin can easily do a fast drill-down analysis of netflow and other data sources.
  • In addition, FLOWCUTTER allows setting up "out of the box" detection of various volumetric DDoS attacks.

That helped identify the nature of the DDoS attack and revealed information used to effectively mitigate it.

Results

The traffic analysis in FLOWCUTTER showed the nature of the anomaly. By plotting the top N source ports by number of flows, the administrator correctly identified that responses to DNS queries were contributing most to the uplink saturation. Incoming DNS packets stayed hidden in the overall volume of traffic, but the outgoing DNS responses were clearly visible.
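The "top N source ports by number of flows" view boils down to a simple aggregation. A minimal sketch with made-up records, where port 53 dominating the flow counts is the signature described above:

    from collections import Counter

    # Illustrative flow records (source port only).
    flows = (
        [{"src_port": 53}] * 1800
        + [{"src_port": 443}] * 120
        + [{"src_port": 80}] * 40
    )

    # Count flows per source port and print the top talkers.
    top_ports = Counter(f["src_port"] for f in flows).most_common(3)
    for port, count in top_ports:
        print(f"src port {port}: {count} flows")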

Focusing on just DNS responses was an easy operation (filtering src.port=53 only) that took only 2 seconds. This is where FLOWCUTTER excels above all competing solutions.

The next step was to compare behavior before and during the anomaly. It showed that there were normally about 1,000 talkers, but the number suddenly rose to 23,000, corresponding to the distributed nature of the attack.

Even though the administrator wasn't able to identify one particular end customer as responsible, because many customers were hidden behind each public IPv4 address, he identified this as a DNS reflection DDoS attack.

Moreover, the analysis revealed information (the combination of src/dst ports = 53/24335) that he could use to mitigate the attack with BGP FlowSpec (RFC 5575). FlowSpec is similar to BGP Remotely Triggered Black Hole (RTBH), but it provides more granular control (ports, packet length, TCP flags, ICMP code, …) and can shape the matched flows.
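Before announcing such a FlowSpec rule, it is worth checking how much traffic the match criteria would actually hit. A minimal sketch of that sanity check over flow records (records are illustrative; the announcement itself happens on the router or in a BGP daemon, which this sketch does not do):

    # The match mirrors the FlowSpec rule derived above:
    # protocol UDP, source port 53, destination port 24335.
    flows = [
        {"proto": "udp", "src_port": 53, "dst_port": 24335, "bytes": 1400},
        {"proto": "udp", "src_port": 53, "dst_port": 24335, "bytes": 1400},
        {"proto": "tcp", "src_port": 443, "dst_port": 51002, "bytes": 900},
    ]

    def matches_rule(f):
        return f["proto"] == "udp" and f["src_port"] == 53 and f["dst_port"] == 24335

    dropped = sum(f["bytes"] for f in flows if matches_rule(f))
    total = sum(f["bytes"] for f in flows)
    print(f"rule would drop {dropped}/{total} bytes ({100 * dropped / total:.0f}%)")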

Resources

  • Netflow analysis in Grafana
  • Open ports scan
  • SNMP vs Flow telemetry

Takeaway

The ISP detected uplink saturation.

  1. Using netflow, the administrator found the root cause: an outgoing DNS reflection DDoS attack.
  2. He was able to mitigate the attack using BGP FlowSpec.

For the future, the ISP used FLOWCUTTER's ability to monitor and alert on traffic anomalies – the admin can set up "out of the box" detection of various volumetric DDoS attacks, to be alerted even faster next time.

How to lower the price of data retention for an ISP

Title

What helped an ISP comply with a national data retention policy at low cost, within a complex network of 130+ sites with NAT translation

Situation

Most countries oblige their internet service providers to collect forensic data on who is communicating with whom. In essence, this usually translates to netflow data. The mandatory data retention period for operators ranges from 6 months to 5 years, depending on the country.

What is the challenge for an operator to comply with the national policy?

Challenge

There are several challenges, and all of them applied to this operator:

  • In order to identify the customer responsible for a communication, an operator has to consider NAT translation – there can be many private IP addresses behind one public IP address.
  • Architecture differs a lot among operators – some operate quite diverse networks with many PoPs in many regions. In this case, there were 130+ individual isolated sites, each with its own NAT router.
  • Routers can be from various vendors and run various firmware versions.
  • The amount of traffic is about 10K flows per second for a small ISP, up to 1M flows/s for large operators.
  • Storing netflow data for such traffic is demanding (a back-of-the-envelope sizing follows this list).
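The storage demand is easy to estimate. A back-of-the-envelope calculation; the bytes-per-flow figure is an assumption for illustration, not a FLOWCUTTER specification:

    # Back-of-the-envelope retention sizing (all inputs are assumptions).
    flows_per_second = 10_000      # small ISP, per the figure above
    bytes_per_flow = 60            # assumed compressed on-disk record size
    retention_days = 180           # 6-month minimum retention

    total_bytes = flows_per_second * bytes_per_flow * 86_400 * retention_days
    print(f"~{total_bytes / 1e12:.1f} TB for {retention_days} days "
          f"at {flows_per_second:,} flows/s")

At 1M flows/s the same arithmetic approaches a petabyte, which is why per-flow record size and compression matter so much.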

Solution

The ISP implemented the following solution to comply with the data retention policy:

  • The operator set up export of (private traffic) netflow from the 130+ routers – this is what enables the NAT matching sketched after this list.
  • In addition, FLOWCUTTER probes were installed at the perimeter to monitor public traffic.
  • Netflow data streams were continually sent to the central collector running FLOWCUTTER software.
  • FLOWCUTTER supports all incoming flow formats (NF5/8/IPFIX).
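To see why the private-side netflow matters, here is a minimal sketch of mapping a public-side flow back to the private customer address through a NAT translation log. Field names and records are illustrative:

    from datetime import datetime

    # Illustrative NAT translation log and one public-side flow record.
    nat_events = [
        {"public_ip": "198.51.100.1", "public_port": 40001,
         "private_ip": "10.0.5.23",
         "start": datetime(2024, 1, 1, 10, 0),
         "end": datetime(2024, 1, 1, 10, 5)},
    ]

    flow = {"src_ip": "198.51.100.1", "src_port": 40001,
            "time": datetime(2024, 1, 1, 10, 2)}

    def customer_behind(flow, nat_events):
        """Map a public-side flow to the private address that sent it."""
        for e in nat_events:
            if (e["public_ip"] == flow["src_ip"]
                    and e["public_port"] == flow["src_port"]
                    and e["start"] <= flow["time"] <= e["end"]):
                return e["private_ip"]
        return None

    print(customer_behind(flow, nat_events))  # -> 10.0.5.23

A production system would index translations by time and port range rather than scanning a list, but the matching key – public address, public port, and time window – stays the same.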

Results

The project was successfully delivered and all expectations were met. Correct sizing of the project ensured the ISP will be able to store the data for the required period for another 3 to 5 years.

In comparison to competing proposals, the FLOWCUTTER-based solution cut the price tag by 75%: thanks to its versatility and broad compatibility, the operator was able to leverage its own resources instead of buying a complete solution from scratch.



Resources

  • Network probes
  • Flow formats & compatibility
  • Hardware appliance
  • Flows compression ratio

Takeaway

The project was successfully delivered and expectations were met.

In addition, the total price tag was lowered by 75%.

With SNMP only, the ISP would have lost a key customer

Title

An anomaly that SNMP monitoring couldn't spot: flow-based analysis revealed the root cause and helped the ISP retain a key enterprise customer.

Who

Companies: Any network operator, such as a provider of internet, communication services, web hosting, etc.

Roles: Network administrator

Use case: Beneš @ DobruskaNet

Situation

A key enterprise customer called the ISP's technical support complaining about latency issues when using Teams. The network administrator checked the router where the customer is connected, together with hundreds of other customers, and analyzed latency data stored in Prometheus.

Screenshot: latency at 30 s intervals on the router

The latency graph revealed a periodic anomaly that lasted 10 minutes and repeated every hour.

Packet drops and CPU usage showed a similar trend.

Challenge

However, based on SNMP telemetry, the administrator wasn't able to find the root cause of the issue.

What to do now?

Solution

The network administrator looked into netflow data – traffic telemetry (link).

  • The ISP had netflow export in place on all core routers.
  • Netflow data streams were continually sent to the central collector running FLOWCUTTER software.

With the help of FLOWCUTTER's ability to easily perform a fast drill-down analysis of the flow dataset, the administrator was able to find the root cause of the issue.

In addition to the netflow data, a periodic scan of open ports was set up in FLOWCUTTER. That helped expose the root cause of the anomaly.
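Such a periodic open-ports check can be approximated with a plain TCP connect test. A minimal sketch (scan only addresses you are authorised to test; a real scanner would also cover UDP services such as DNS):

    import socket

    def tcp_port_open(ip: str, port: int, timeout: float = 1.0) -> bool:
        """Return True if a TCP connection to ip:port succeeds."""
        try:
            with socket.create_connection((ip, port), timeout=timeout):
                return True
        except OSError:
            return False

    # Illustrative check of a customer's public IP for an exposed DNS port.
    print(tcp_port_open("192.0.2.10", 53))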

Results

On the target router, there was an anomaly – traffic went down while the number of talkers went up.

Drill-down analysis revealed that the anomaly was DNS related.

After that, the administrator checked the dashboard with the results of the open ports scan from the previous night. It showed that another customer with a public IP had opened the DNS port to the public. That put additional stress on the router, affecting other customers in the same region.

Resources

  • Netflow analysis in Grafana
  • Open ports scan
  • SNMP vs Flow telemetry

Takeaway

  1. There are many root causes that cannot be revealed by analyzing SNMP-like telemetry. That's where netflow data comes in handy: it provides deeper insight into the source and destination of each traffic flow.
  2. In addition to SNMP and netflow, it's useful to correlate with other data sources – in this case, an open ports scan.

The ISP resolved the issue with ease.

The second customer, where the root cause dwelled, was called and pointed to the misconfiguration. The port was closed, and the anomalies stopped.

For the key enterprise customer, the latency issue was resolved, helping ensure a good relationship.