Last Friday’s Internet outages in Japan and Thailand were reportedly severe enough that Japan’s Ministry of Internal Affairs and Communications asked carriers to explain the cause of the disruption.
It soon emerged that the problem stemmed from Google, which had accidentally published a Border Gateway Protocol (BGP) advertisement that caused almost all traffic from Japanese carriers NTT and KDDI and Thailand’s Jastel to be wrongly routed to Google.
Why is that a problem, and how can this type of mistake have an impact serious enough to prompt government involvement?
Historians of the Internet will be aware that the network was developed by researchers at academic institutions, often under contracts awarded by government bodies such as the US Department of Defense.
Consequently, the Internet was built to rely heavily on old-fashioned “gentlemanly” agreements, which trusted participants to behave well and cooperate.
One such protocol (literally) is BGP and its advertisements. Essentially, a BGP advertiser (let’s say the owner of network AS 1234, Google in this case) announces to other networks that if traffic needs to reach a different network (call it AS 9876), it can safely be sent to AS 1234, which will pass it on correctly to its destination or to the next step towards it.
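To make the mechanism concrete, here is a deliberately simplified sketch of how a router might act on such announcements. It assumes that forwarding prefers the most specific matching prefix and that, among equally specific announcements, the shortest AS path wins; real BGP best-path selection weighs several more attributes (local preference among them) first. The AS numbers reuse the article’s hypothetical AS 1234 and AS 9876, while the prefix, the transit ASes 5555 and 7777, and the names Announcement and best_route are invented for illustration.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Announcement:
    """Simplified BGP advertisement: 'send traffic for this prefix to me'."""
    prefix: ipaddress.IPv4Network
    as_path: Tuple[int, ...]  # first AS is the advertiser, last is the origin


def best_route(dest: str, table: Sequence[Announcement]) -> Optional[Announcement]:
    """Choose where to send traffic for dest (hypothetical, simplified).

    Forwarding prefers the most specific matching prefix; among announcements
    for an equally specific prefix, best-path choice is reduced here to
    'shortest AS path wins' (real BGP checks local preference and more first).
    """
    addr = ipaddress.ip_address(dest)
    matches = [a for a in table if addr in a.prefix]
    if not matches:
        return None
    return max(matches, key=lambda a: (a.prefix.prefixlen, -len(a.as_path)))


# Hypothetical prefix originated by the article's example network, AS 9876.
net = ipaddress.ip_network("203.0.113.0/24")
table = [Announcement(net, (5555, 7777, 9876))]  # honest route via two transit ASes
print(best_route("203.0.113.10", table).as_path)  # (5555, 7777, 9876)

# AS 1234 now claims a shorter path to the same prefix...
table.append(Announcement(net, (1234, 9876)))
# ...and wins, whether or not it will actually forward the traffic.
print(best_route("203.0.113.10", table).as_path)  # (1234, 9876)
```

Nothing in this logic checks whether AS 1234 is entitled to make the claim; the router simply believes the better-looking route, which is the trust problem described below.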
In Friday’s case, Google’s BGP advertisement was effectively untrue: Google is not a supplier of “transit services”. In other words, it does not act as a router and forwarder of other networks’ traffic across the Internet. Traffic routed to Google was therefore either dropped or denied by Google’s own internal access control lists (ACLs).
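A rough sketch of why the misdirected traffic went nowhere, assuming (hypothetically) that a network offering no transit runs an ACL that only admits packets addressed to its own prefixes. The address blocks are documentation ranges, not Google’s real address space, and OWN_PREFIXES and acl_permits are invented names.

```python
import ipaddress

# Hypothetical address blocks the receiving network actually serves.
OWN_PREFIXES = [ipaddress.ip_network(p) for p in ("198.51.100.0/24", "192.0.2.0/24")]


def acl_permits(dst: str) -> bool:
    """Accept a packet only if it is addressed to one of our own prefixes.

    A network that sells no transit has no onward path for anything else,
    so traffic that was merely meant to pass through is denied.
    """
    addr = ipaddress.ip_address(dst)
    return any(addr in net for net in OWN_PREFIXES)


# The second packet was really meant for another network (AS 9876).
for dst in ["198.51.100.7", "203.0.113.10"]:
    print(dst, "accepted" if acl_permits(dst) else "dropped")
```

From the sender’s point of view the route looked valid, yet every packet for the other network ends up on the “dropped” branch.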
BGP, unfortunately, relies on trusting the advertiser. That was not such a problem when most backbone-level systems administrators knew each other by name, but trust is far more difficult to manage at the scale required today.
On the whole, however, the system works. Even so, discussions are under way to refine Internet management so that each network filters its own routing announcements to catch mistakes, among other proposed steps (including anti-DDoS measures in which a network monitors its own outbound packets).
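Both safeguards amount to a network checking what it sends against what it is entitled to send. The sketch below assumes a hypothetical network that knows in advance which prefixes it (and its customers) may originate. It reads the outbound-packet measure as source-address validation, the anti-spoofing step MANRS describes. AUTHORIZED, filter_announcements and source_is_ours are invented names, and the code handles IPv4 only.

```python
import ipaddress
from typing import Iterable, List

# Hypothetical prefixes this network and its customers may originate (IPv4 only).
AUTHORIZED = [ipaddress.ip_network(p) for p in ("198.51.100.0/24", "192.0.2.0/24")]


def is_authorized(prefix: ipaddress.IPv4Network) -> bool:
    # Allowed if the prefix equals, or sits inside, an authorized block.
    return any(prefix.subnet_of(block) for block in AUTHORIZED)


def filter_announcements(outgoing: Iterable[str]) -> List[str]:
    """Outbound route filter: refuse to advertise prefixes we have no right to."""
    kept = []
    for p in outgoing:
        if is_authorized(ipaddress.ip_network(p)):
            kept.append(p)
        else:
            print(f"blocked leak: {p}")
    return kept


def source_is_ours(src: str) -> bool:
    """Anti-spoofing: only let out packets whose source address is ours."""
    addr = ipaddress.ip_address(src)
    return any(addr in block for block in AUTHORIZED)


# Prints a warning for 203.0.113.0/24, then ['198.51.100.0/24'].
print(filter_announcements(["198.51.100.0/24", "203.0.113.0/24"]))
print(source_is_ours("192.0.2.25"), source_is_ours("203.0.113.99"))  # True False
```

The design choice here is that the check sits with the sender, who knows its own address space best, rather than relying on every other network to spot the error.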
Proposals such as the Mutually Agreed Norms for Routing Security (MANRS) will prove key to keeping the Internet safe and available in the future.
Today’s massive reliance on a working, efficient and safe Internet places a great deal of trust in systems that are now outdated. The Internet’s integrity depends on many parties that are, understandably, not happy to bear the cost of securing it and ensuring that traffic routing remains kosher.
In Friday’s case, the problem was fixed after some 23 minutes, but the knock-on effects lasted several hours. If a simple error by a single player such as Google (motto: “Don’t be evil”) can cause massive disruption, more malign actors could wreak far more serious havoc.
Like the founding fathers of the Internet and their reliance on “proper” conduct, we all rely on good manners or rather, in the future, good MANRS.