On Feb. 22, more than 73,000 AT&T customers in the U.S. reported a network outage lasting more than eight hours. AT&T quickly responded, suggesting customers use Wi-Fi calling, as reported by AP. On the same day, AT&T reassured customers that the outage was not the result of a cyberattack but rather a technical error.
The company’s response to this widespread outage offers organizations lessons in how to communicate with internal and external stakeholders during and after a crisis, and in how to prepare for technical hitches that could become major blockers to business.
What caused the AT&T outage?
On Feb. 22, AT&T wrote, “Based on our initial review, we believe that today’s outage was caused by the application and execution of an incorrect process used as we were expanding our network, not a cyber attack.”
AT&T communicated with the Cybersecurity and Infrastructure Security Agency, the Federal Communications Commission, the Department of Homeland Security and the Federal Bureau of Investigation regarding the outage, which fed some rumors of a potential cyberattack. Communications is defined by CISA as a critical function.
SEE: CISA and IBM collaborated on a new cybersecurity certification course. (TechRepublic)
AT&T’s response to the outage shows effective communication
On Feb. 22, AT&T quickly told customers what had happened and why through its social media, mobile app, website and virtual assistant. Once the information was available, AT&T informed all stakeholders that the outage was not caused by a malicious actor. AT&T addressed its individual customers, business customers and employees at the same time in a public letter dated Feb. 25 from CEO John Stankey.
Individual customers and small business customers impacted by the outage are eligible for a $5 credit, likely in the next billing cycle. Business customers are invited to discuss the situation: “We are also working closely with our Mid-Market and Enterprise customers and will address their concerns as those discussions take place,” according to Stankey’s letter.
Stankey explained the reasoning behind the amount of the credit (“For that reason, I believe that crediting those customers for essentially a full day of service is the right thing to do.”) and apologized for the inconvenience. This transparency can help limit the loss of customer trust in the wake of an organization-wide incident.
AT&T’s Communications, Marketing, Product and Operations teams worked closely together to coordinate sharing facts and updates, AT&T told TechRepublic. Those teams also kept customer service and retail teams up to date in case of customer calls and store visits involving the outage.
“In crisis, speed is everything,” said Jim Greer, AT&T spokesperson, in an email to TechRepublic. “We sought to put the customer first and moved quickly to get answers to them, along with employees, investors and regulators on what was a rapidly developing situation.”
What can IT in particular learn from the AT&T outage?
Human error happens to the best of us. There is a reason PEBCAK – “problem exists between chair and keyboard” – is an established acronym. Whatever went wrong during the network expansion, it seems to have happened in the normal course of business.
The AT&T outage emphasizes the importance of testing backups, redundant systems and emergency preparedness plans. For cell carriers, alternate channels like Wi-Fi calling, satellite service or a carrier-agnostic SIM could be good backups in an emergency. These actions help reassure customers as well as put practical solutions in place. In addition, the AT&T outage is a good reminder to report incidents to the correct agencies as appropriate.
SEE: Carrier-agnostic SIM cards are among this year’s highlights from Mobile World Congress. (TechRepublic)
It’s important to keep software up to date and to modernize technology to support the overall resilience and security of organizations. But outages like this one emphasize that IT leaders and CISOs likely also have a role in communicating well with external and internal stakeholders during an unexpected event. IT and cybersecurity leaders should be certain that their software supply chain practices are up to date in case of cascading problems or down-the-chain vulnerabilities, even when there is no malicious intent involved.
Even if IT leaders do not communicate directly with customers, they should have well-established lines of responsibility within their department for responding to, and potentially publicizing, problems that affect many customers.