LAB REPORT SUNYIT NETWORK OUTAGE

By
Shantanu Chaurasia
(Graduate Student)
NCS
INDEX
Introduction
VTP
Lab Topology
Incident
Mistakes
Countermeasures
Introduction
As we all know, the human element is the greatest source of error, and I am no exception.
This is the lab report on SUNYIT's network outage on Friday, 18 April 2014. To understand
exactly what happened, we first have to go through a few basic concepts of Cisco IOS.
VLAN Trunking Protocol (VTP): VTP is a Cisco proprietary protocol used to reduce
administrative work in a Layer 2 switched network. VTP has three main modes of operation:
server, client and transparent. CatOS switches also support a fourth mode, off.
• Server mode: A switch in VTP server mode can create, delete and modify VLANs, and can
specify domain-wide parameters such as VTP pruning and the VTP version. VTP servers
advertise their VLAN configuration to the other switches in the same VTP domain, and the
VLAN configuration is synchronized across all the switches in that domain. Server is the
default VTP mode on Cisco switches.
• Client: A VTP client behaves almost the same as a VTP server. The only difference is that
it cannot create, modify or delete VLANs, neither for the whole domain nor even locally on
the same switch.
• Transparent: A switch in this mode does not take part in VTP. It does not advertise its
own VLAN configuration, and it does not synchronize with other switches based on the VLAN
advertisements it receives. It simply forwards the VTP advertisements received on its trunk
ports without applying them to its own VLAN database. VLANs created, modified or deleted on
a transparent switch affect only that switch.
• Off (CatOS switches): In this mode the switch neither synchronizes with other switches nor
advertises its configuration changes.
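As a compact recap of the mode list above, each mode can be summarized by four yes/no capabilities. This is a minimal Python sketch, not Cisco code: the `ModeBehavior` class and its field names are hypothetical, and the transparent and off rows assume that VLANs can still be edited locally on those switches and that an off-mode switch does not relay advertisements.

```python
# Hypothetical capability table for the VTP modes described above (not Cisco code).
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeBehavior:
    edit_vlans: bool        # VLANs can be created, modified or deleted on this switch
    advertise_own: bool     # originates advertisements of its own VLAN configuration
    apply_received: bool    # updates its VLAN database from received advertisements
    forward_received: bool  # relays received advertisements out of its trunk ports


VTP_MODES = {
    # Server: edits VLANs for the whole domain and stays synchronized with it.
    "server": ModeBehavior(True, True, True, True),
    # Client: like a server, except that it cannot edit VLANs.
    "client": ModeBehavior(False, True, True, True),
    # Transparent: local-only VLAN edits; relays advertisements without applying them.
    "transparent": ModeBehavior(True, False, False, True),
    # Off (CatOS): assumed to behave like transparent but without relaying.
    "off": ModeBehavior(True, False, False, False),
}

print(VTP_MODES["client"].edit_vlans)            # False
print(VTP_MODES["transparent"].apply_received)   # False
```

Reading the table row by row makes the key contrast visible: only server and client switches ever overwrite their own VLAN database from the network.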
Thus, configuring a switch in VTP server mode makes it possible to distribute VLANs to all
the switches in that domain. There is no need to configure the same VLANs on every switch,
which saves a lot of time and manual work.
To make VTP clearer, consider a company whose office has five floors. Each floor has its own
switch for ease of management, and each switch can carry several VLANs. For example, managers
or admins may sit on any floor and still access VLAN 5, and Engineering employees may sit on
any floor and still access VLAN 6. The main advantage of this topology is that no one is tied
to a physical location to access their own VLAN.
Now suppose we want to add another department VLAN. Without VTP, we have to physically access
every switch and add the VLAN there. A large company may have many floors and many more
switches, so accessing and configuring each switch physically is a daunting task. With VTP,
the same change can be made within minutes without physically accessing every switch.
For switches to exchange VLAN information with each other, they must be in the same VTP
domain; only switches belonging to the same domain share VLAN information across that domain.
Whenever a change is made to the VLAN database on a VTP server, it is propagated across the
entire VTP domain through VTP advertisements.
To maintain domain consistency, generally only one switch should be able to create, modify or
delete VLANs. For this, only one switch should be set to server mode, and all changes for the
domain are made from it. This switch serves as the "master" switch for the entire domain.
All other switches should be in client or transparent mode and act as "slaves". Switches in
client mode update their own VLAN database according to the information received from the
server, and they also pass the same information on to other directly connected switches.
Switches in transparent mode pass the information on to other directly connected switches but
do not update their own VLAN database.
Each VTP advertisement has a revision number associated with it. This number is very
important: it is used to determine whether a received advertisement is more recent than the
switch's current configuration. Every time a change is made to the VLANs, the revision number
is increased by one, so the higher the revision number, the more recent the information.
For example, in the scenario above, when the main switch sends its first VTP advertisement,
the revision number may be 1. When a new VLAN is added on this switch, the revision number is
incremented by 1, so the main switch's new revision number becomes 2. The main switch then
sends the updated information to all the client switches. When a client switch receives the
information, it first checks the domain name; if the domain name matches, it then checks the
revision number. The client switches still have revision number 1 from the earlier update, so
they compare it with the advertised revision number. Since 2 > 1, they update their VLAN
database.
The important point here is that when a switch receives an advertisement with a higher
revision number, it deletes its earlier VLAN database and replaces it with the advertised one.
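The check described above (domain name first, then revision number, then overwrite) can be sketched in a few lines. This is a minimal Python model, not Cisco's implementation; the `Switch` class, the domain name "ACME" and the newly added VLAN 7 are hypothetical.

```python
# Minimal sketch of how a VTP server or client handles a received advertisement:
# check the domain name, then the revision number, then replace the database.
# The Switch class, the domain "ACME" and VLAN 7 are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Switch:
    name: str
    domain: str
    revision: int
    vlans: set[int] = field(default_factory=set)

    def receive(self, adv_domain: str, adv_revision: int, adv_vlans: set[int]) -> bool:
        """Return True if the advertisement replaced this switch's VLAN database."""
        if adv_domain != self.domain:      # other domain (case-sensitive): ignore
            return False
        if adv_revision <= self.revision:  # not newer than the local copy: ignore
            return False
        # Newer revision: discard the old database and copy the advertised one.
        self.vlans = set(adv_vlans)
        self.revision = adv_revision
        return True


main = Switch("main", "ACME", 2, {1, 5, 6, 7})    # VLAN 7 just added: revision 1 -> 2
client = Switch("floor-3", "ACME", 1, {1, 5, 6})  # still on revision 1

print(client.receive(main.domain, main.revision, main.vlans))  # True, since 2 > 1
print(client.revision, sorted(client.vlans))                   # 2 [1, 5, 6, 7]
```

Note that the client keeps nothing from its old database: the advertised set simply replaces it, which is exactly the behavior that matters in the incident described later.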
So in the scenario above, we can set the main switch to server mode and all other switches to
client mode. Every time we need to make a VLAN change, we do it in one place and the change
propagates to all the switches, saving a lot of time and manual work.
Detailed VTP information on a Cisco switch can be displayed with the command
#show vtp status
VTP version: Shows the version of VTP currently running on the switch.
Configuration Revision: Shows the current revision number of the switch.
Maximum VLANs supported locally: Shows the total number of VLANs the switch supports locally.
Number of existing VLANs: Shows the number of VLANs currently configured on the switch.
VTP operating mode: Shows the current VTP mode of the switch.
VTP domain name: Shows the name of the switch's VTP domain.
VTP pruning mode: Shows whether VTP pruning is enabled or disabled on the switch.
VTP V2 mode: Shows whether VTP version 2 is enabled or disabled. It is disabled by default.
VTP traps generation: Shows whether VTP traps are sent to the management station. Traps are
alert messages generated by SNMP agents.
MD5 digest: A 16-byte checksum of the VTP configuration.
Configuration last modified by: Shows the IP address of the device that last modified the
VLAN configuration.
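When checking many switches, for example to find who last modified the VLAN configuration, the fields above can be pulled out of the command output programmatically. This is a hedged Python sketch: the `parse_vtp_status` helper is hypothetical, it assumes IOS prints most fields as `Name : value` lines plus a separate "Configuration last modified by ..." line, and the sample text is assembled from values mentioned in this report, not captured from a real switch.

```python
# Hypothetical helper: turn "show vtp status" text into a dict of field -> value.
import re


def parse_vtp_status(text: str) -> dict[str, str]:
    fields: dict[str, str] = {}
    for line in text.splitlines():
        # This line has no "name : value" layout, so match it explicitly.
        m = re.match(r"\s*Configuration last modified by (\S+)", line)
        if m:
            fields["Configuration last modified by"] = m.group(1)
            continue
        if ":" in line:
            key, _, value = line.partition(":")  # split on the first colon only
            fields[key.strip()] = value.strip()
    return fields


# Hypothetical sample built from values in this report (not real switch output).
SAMPLE = """\
Configuration Revision          : 307
VTP Domain Name                 : SUNYIT
Configuration last modified by 172.16.1.99
"""

status = parse_vtp_status(SAMPLE)
print(status["Configuration Revision"])          # 307
print(status["Configuration last modified by"])  # 172.16.1.99
```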
Lab topology (DH1240)
In lab DH1240, each pod generally uses the topology shown in the diagram below.
In lab DH1240, the Cisco 3560 switch provides Internet access, and all other equipment in the
lab has to be connected to this switch to reach the Internet. Behind the Cisco 3560 switch is
the Cisco 1900 router, then the Cisco 2960 switch and then the pod PC.
The pod PC's IP address is 172.16.P.X, where P is the pod number and X is the host number. The
subnet mask is 255.255.255.0 (/24, the default Class C mask), so one pod subnet can hold 254
hosts, although we only need one pod PC. The default gateway is set to the router's inside
Gigabit Ethernet interface. The DNS server is set to 150.156.192.16, which is fang. The
router's outside interface (facing the main switch) is given the IP address 10.103.5.X/16,
where X is the pod number. Each pod consists of a Cisco 1900 router, a Cisco 2960 switch and a
pod PC.
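The addressing arithmetic above can be checked with Python's standard `ipaddress` module. This is a small sketch assuming pod 1 as the example; it confirms the 254 usable hosts of a 255.255.255.0 mask and shows that, with a /16 mask on the router's outside interface, the Atlantis server (10.103.0.25) falls inside the router's directly connected network.

```python
import ipaddress

POD = 1  # example pod number P (assumption for illustration)

# Pod LAN 172.16.P.0 with mask 255.255.255.0: 256 addresses minus network and broadcast.
pod_lan = ipaddress.ip_network(f"172.16.{POD}.0/255.255.255.0")
print(pod_lan.prefixlen, pod_lan.num_addresses - 2)  # 24 254

# Router outside interface 10.103.5.X/16, where X is the pod number.
outside = ipaddress.ip_interface(f"10.103.5.{POD}/16")
print(outside.network)  # 10.103.0.0/16

# Atlantis lies in that same /16, so the router sees it as directly connected.
atlantis = ipaddress.ip_address("10.103.0.25")
print(atlantis in outside.network)  # True
```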
Incident on Friday 18 April 2014
We have now covered all the basic concepts needed to understand the incident. After finishing
each lab, students save their configuration files to the Atlantis server (10.103.0.25). This
makes it easy for the instructor to keep track of completed and uncompleted labs, and for
students to retrieve their old configuration files from the Atlantis server. I was working on
my project in the lab and needed the old configuration file of the switch on my pod. Normally,
to copy a configuration file from the Atlantis server back to the pod switch, we follow the
network diagram above, i.e. first we connect the pod router to the uplink (the lab's main
switch) through the patch panel and configure the router with the correct IP addresses. Then
we connect the pod's switch behind the pod's router, and then the pod PC. We configure all of
these correctly and test connectivity to the Atlantis server by pinging 10.103.0.25. Once
everything is configured correctly, we can copy our old configuration from the Atlantis server
to either the pod's router or the pod's switch. On Friday, I only needed the configuration file
of the switch on my pod, so I decided to connect the pod's switch directly to the uplink port
on the patch panel, which is connected to the lab's main 3560 switch. As soon as the switch was
connected, 900 VLANs suddenly appeared on the pod's switch out of nowhere. Let's see exactly
what happened here and how so many VLANs appeared on the switch.
When I connected my pod's switch to the uplink, I used the switch's Gi0/0 port. This port was
configured as a trunk port because in our previous lab we had set it as a trunk for a
router-on-a-stick setup. Not only was the port set as a trunk, but the VTP mode of the pod's
switch was also set to server. With these settings, a trunk formed between the pod's switch and
the lab's main 3560 switch.
Earlier, the pod switch's revision number was 11 and the main switch's revision number was 306.
As soon as the trunk was formed, the two switches compared their revision numbers. Because the
revision number of the lab's main switch was greater than that of the pod's switch (306 > 11),
the pod switch's VLAN database was overwritten and the VLANs were copied from the main switch to
the pod's switch. The pod's switch was now at revision number 306.
I was surprised to see 900 VLANs on my switch and assumed they had come from the startup
configuration. Without giving it a second thought, I deleted all the VLANs. As soon as I deleted
them, the revision number on the pod's switch was incremented by one, so the pod's switch was
now at revision 307 while the main switch was still at 306. Following the VTP rules, the lab's
main switch therefore updated itself to the new revision 307. Revision 307 contained no VLANs,
so the lab's main switch also deleted all of its VLANs, and every other connected switch in the
network likewise updated itself to the new revision, which said there were no VLANs. VTP
propagates changes in the blink of an eye, so all 900 VLANs vanished across the whole network
and SUNYIT's network went down completely. Switch ports on all the switches across the network
turned amber because the VLANs they belonged to had been erased completely.
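The chain of events above can be replayed with a toy model. This is a minimal Python sketch under stated assumptions: the switch names, the main switch being in server mode, three generic campus client switches and VLAN IDs 2 to 901 standing in for the 900 VLANs are all hypothetical, default VLANs are not modeled, and every switch is assumed to hear every advertisement directly rather than hop by hop across trunks.

```python
# Simplified replay of the 18 April 2014 incident (hypothetical model, not Cisco code).
from dataclasses import dataclass, field


@dataclass
class Switch:
    name: str
    mode: str
    domain: str
    revision: int
    vlans: set[int] = field(default_factory=set)

    def receive(self, sender: "Switch") -> None:
        # Only servers and clients apply advertisements, and only from their own
        # domain with a strictly higher revision number.
        if self.mode not in ("server", "client") or sender.domain != self.domain:
            return
        if sender.revision > self.revision:
            self.revision = sender.revision
            self.vlans = set(sender.vlans)  # the old database is thrown away


def advertise(origin: Switch, network: list[Switch]) -> None:
    """Offer origin's VLAN database to every other switch in the network."""
    for sw in network:
        if sw is not origin:
            sw.receive(origin)


campus_vlans = set(range(2, 902))  # 900 hypothetical VLAN IDs; defaults not modeled
main = Switch("main-3560", "server", "SUNYIT", 306, set(campus_vlans))
campus = [Switch(f"campus-{i}", "client", "SUNYIT", 306, set(campus_vlans)) for i in range(3)]
pod = Switch("pod-2960", "server", "SUNYIT", 11)
network = [main, *campus, pod]

# 1. Trunk comes up: 306 > 11, so the pod switch copies all 900 VLANs.
advertise(main, network)
print(pod.revision, len(pod.vlans))  # 306 900

# 2. All VLANs are deleted on the pod switch; its revision goes up by one.
pod.vlans.clear()
pod.revision += 1

# 3. 307 > 306 everywhere, so every switch adopts the now-empty database.
advertise(pod, network)
for sw in network:
    print(sw.name, sw.revision, len(sw.vlans))
```

Every switch ends at revision 307 with no VLANs. Changing the pod switch's mode to "client" in this model stops step 2 from being a legal action on a real switch, and changing its domain name makes step 1 a no-op.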
This incident was really unfortunate: the whole network was down, employees started taking the
day off, students couldn't complete their labs, and the system admins were trying hard to figure
out what the problem was. After two and a half hours of thorough inspection, they found the IP
address from which the switches had last been updated, since the VTP status on the switches
showed the message "Configuration last modified by 172.16.1.99". This was the address that had
modified all the switches, and in the lab we were using the addressing scheme 172.16.P.X. Here
the pod number is 1 and 99 is a host on that pod, so it was clear by now that a host in Pod 1 had
updated the configuration. They finally restored the VLAN database from their backup and the
network started working as expected again.
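The reasoning that pointed the admins to Pod 1 amounts to reading the third and fourth octets of the address under the 172.16.P.X scheme. A small sketch with Python's standard `ipaddress` module; the `pod_and_host` helper is hypothetical.

```python
import ipaddress


def pod_and_host(addr: str) -> tuple[int, int]:
    """Split a lab address of the form 172.16.P.X into (pod P, host X)."""
    ip = ipaddress.ip_address(addr)
    if ip not in ipaddress.ip_network("172.16.0.0/16"):
        raise ValueError(f"{addr} is not in the lab's 172.16.P.X range")
    _, _, pod, host = (int(octet) for octet in str(ip).split("."))
    return pod, host


print(pod_and_host("172.16.1.99"))  # (1, 99): host 99 on Pod 1
```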
This accident revealed a few mistakes and flaws in the configuration. It would not have
happened if:
• The ports of the main 3560 switch had been in access mode. They were in desirable mode. A port
in desirable mode automatically forms a trunk with the neighboring switch port if that port is
set as a trunk port. That is exactly what happened here: the pod switch's port was set as a
trunk and the main switch's port was in desirable mode, so they formed a trunk. Had the main
switch's ports been in access mode, no trunk would have formed and this incident would never
have happened. We can set a port to access mode by entering interface configuration mode and
typing the command
switchport mode access
• The pod switch had not been in VTP server mode. Pod switches should be in either client or
transparent mode. Had the pod's switch been in client or transparent mode, it would not have
been able to delete all the VLANs and push a new revision number to the rest of the domain. We
can set the VTP mode to client or transparent with the command
vtp mode client
OR
vtp mode transparent
• The VTP domain name on the pod's switch had not matched the VTP domain name of the main
switch. That it matched was a surprising coincidence. The VTP domain name acts as a safeguard
against exactly this situation: if the domain names of two switches differ, a dynamically
negotiated trunk will not form between them. To form such a trunk, the domain name must be
exactly the same on both switches. Domain names are case-sensitive, so upper and lower case must
match too. But that day, by chance, I had changed the domain name to SUNYIT, and the main
switch's domain name was also SUNYIT, so the trunk formed between them. The domain name can be
changed in global configuration mode with the command
vtp domain domain_name
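The trunk-formation conditions discussed in these points can be expressed as a small decision function. This is a simplified sketch, not Cisco's DTP implementation: it assumes DTP negotiation is enabled on both ports (no `switchport nonegotiate`), uses the IOS mode names `access`, `trunk`, `dynamic desirable` and `dynamic auto`, and assumes the VTP domain comparison applies whenever at least one side is negotiating.

```python
# Simplified model of whether two connected switch ports end up trunking.
# Assumes DTP is enabled on both ports; mode names follow IOS "switchport mode".


def forms_trunk(mode_a: str, mode_b: str, domain_a: str, domain_b: str) -> bool:
    modes = {mode_a, mode_b}
    if "access" in modes:
        return False  # an access port never trunks
    if modes == {"trunk"}:
        return True   # both sides hard-coded as trunks, nothing to negotiate
    if modes == {"dynamic auto"}:
        return False  # both sides only wait to be asked
    # At least one side negotiates with DTP, which also compares the VTP
    # domain names (an exact, case-sensitive match is required).
    return domain_a == domain_b


# 18 April 2014: pod port "trunk", main switch port "dynamic desirable", same domain.
print(forms_trunk("trunk", "dynamic desirable", "SUNYIT", "SUNYIT"))  # True
# Countermeasure: the main switch port in access mode.
print(forms_trunk("trunk", "access", "SUNYIT", "SUNYIT"))             # False
# Domain names differing only by case would also have blocked negotiation.
print(forms_trunk("trunk", "dynamic desirable", "SUNYIT", "sunyit"))  # False
```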
Steps taken to avoid this situation in the future:
• All the ports of the main 3560 switch were set to access mode, so they will no longer form a
trunk with any other switch.
• VTP passwords are set so that VTP information is exchanged only between switches configured
with a valid password. The password can be set with the following command in global
configuration mode:
vtp password password
• VTP pruning: VTP pruning stops flooded traffic for a VLAN from being sent across a trunk to
switches that have no active ports in that VLAN. Pruning can be enabled or disabled, and we can
also restrict which VLANs are allowed to pass through the trunk.
• Educate people: Students should be taught to use VTP client or transparent mode whenever
possible, and server mode should be used with caution. Last but not least, before taking any
unusual action such as deleting all VLANs, think twice.
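The password countermeasure works because the MD5 digest carried with VTP advertisements depends on the shared password, so a switch with a different (or no) password produces a digest the others reject. The Python sketch below illustrates only that idea; `toy_digest`, `accepts`, the payload string and the password "lab-secret" are hypothetical, and the digest is NOT Cisco's actual VTP algorithm.

```python
# Illustration of the VTP password idea; NOT Cisco's actual digest algorithm.
import hashlib
import hmac


def toy_digest(config: str, password: str) -> bytes:
    """Hypothetical 16-byte MD5 digest over the shared password plus the config."""
    return hashlib.md5((password + "|" + config).encode("utf-8")).digest()


def accepts(config: str, received_digest: bytes, local_password: str) -> bool:
    """The receiver recomputes the digest with its own password and compares."""
    return hmac.compare_digest(toy_digest(config, local_password), received_digest)


advert = "domain=SUNYIT;revision=307;vlans="    # hypothetical payload
from_pod = toy_digest(advert, "")               # pod switch has no password set
print(accepts(advert, from_pod, "lab-secret"))  # False: advertisement rejected

from_main = toy_digest(advert, "lab-secret")     # same password on both sides
print(accepts(advert, from_main, "lab-secret"))  # True
```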
Cisco's recommendation is to set the VTP mode to transparent, which passes VTP information on to
other switches but does not change the switch's own configuration. However, as the number of
switches in a network grows, this becomes a back-breaking task, because every change involves a
lot of manual work physically accessing each switch. So there is a trade-off between security
and the amount of work required.