Last Update: Sep 04, 2024 | Published: Apr 06, 2012
Let’s talk about the essential part of troubleshooting VLAN and switch problems. In this post, we’ll discuss common general switch issues, VLAN-related issues, and spanning-tree issues. We’ll also cover VLAN/switch troubleshooting techniques. Then, in part two, I’ll look further into a “No connectivity” issue.
Keep in mind that some problems can happen on just about any switch. One example is a physical or connectivity-related issue.
Physical Interface/Connectivity Issues
Symptoms
Solutions
Physical Interface Speed/Duplex Issues
Other problems that frequently occur between two connected interfaces are speed and duplex mismatches. These are particularly likely when one side has a gigabit interface and the other a 10/100 interface.
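To make this concrete, here’s a minimal Python sketch that compares the operational speed and duplex at each end of a link, as you might read them from show interfaces on both devices. The dictionary layout and the function names (check_link, autoneg_against_fixed) are hypothetical. The helper models one common way a mismatch arises on 10/100 ports, assuming one end is hard-coded and the other autonegotiates: the autonegotiating end can sense the speed but not the duplex, so it falls back to half duplex.

```python
def autoneg_against_fixed(fixed_end: dict) -> dict:
    """Assumed outcome for an autonegotiating 10/100 port whose partner is
    hard-coded: it detects the partner's speed but cannot learn its duplex,
    so it falls back to half duplex."""
    return {"speed": fixed_end["speed"], "duplex": "half"}


def check_link(end_a: dict, end_b: dict) -> list:
    """Compare the operational speed and duplex of both ends of one link."""
    problems = []
    if end_a["speed"] != end_b["speed"]:
        problems.append(f"speed mismatch: {end_a['speed']} vs {end_b['speed']}")
    if end_a["duplex"] != end_b["duplex"]:
        problems.append(f"duplex mismatch: {end_a['duplex']} vs {end_b['duplex']}")
    return problems


# Hypothetical link: switch port hard-coded to 100/full, workstation left on auto.
switch_port = {"speed": "100", "duplex": "full"}
workstation = autoneg_against_fixed(switch_port)
print(check_link(switch_port, workstation))  # ['duplex mismatch: full vs half']
```

Both ends agree on speed here, which is why this kind of link often comes up and passes some traffic while performing poorly.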
Symptoms
Solutions
VLAN-specific Issues
Symptom
Solutions
If something’s missing from the port configuration, add what you need. Automatic trunk negotiation and similar mechanisms can create this type of issue if the port isn’t explicitly set to access mode and assigned to the specific VLAN.
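A quick way to spot what’s missing is to compare an interface’s configuration against the commands an access port needs. The Python sketch below does that, assuming you’ve pasted a single interface section from show running-config. The function name and the sample port are hypothetical, and the sketch only checks for the two commands listed in it.

```python
def audit_access_port(interface_config: str, expected_vlan: int) -> list:
    """Return the access-port commands missing from one interface section
    (assumed to be copied from show running-config)."""
    present = {line.strip() for line in interface_config.splitlines()}
    required = [
        "switchport mode access",
        f"switchport access vlan {expected_vlan}",
    ]
    return [cmd for cmd in required if cmd not in present]


# Hypothetical port left to negotiate its mode and never assigned to VLAN 11.
port_config = """
interface FastEthernet0/2
 description Workstation
 switchport mode dynamic auto
"""
print(audit_access_port(port_config, 11))
# ['switchport mode access', 'switchport access vlan 11']
```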
Another reason a VLAN could be down is that no physical port is associated with that particular VLAN. With a Layer 3 switch, this typically isn’t as big an issue, but on Layer 2 switches it can be.
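The idea can be sketched in Python under a deliberately simplified rule: a VLAN is treated as usable only if at least one port that carries it (as an access port or on a trunk) is up. Real switches weigh more conditions, such as the spanning tree state of those ports, so this is an illustration rather than the switch’s actual algorithm. The Port class and the port table are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Port:
    name: str
    status: str                         # "up" or "down"
    access_vlan: Optional[int] = None   # set for access ports
    trunk_vlans: Set[int] = field(default_factory=set)  # set for trunk ports

    def carries(self, vlan: int) -> bool:
        return self.access_vlan == vlan or vlan in self.trunk_vlans


def vlan_has_active_port(vlan: int, ports: list) -> bool:
    """Simplified rule: a VLAN counts as usable only if some up port carries it."""
    return any(port.status == "up" and port.carries(vlan) for port in ports)


# Hypothetical port table for a Layer 2 access switch.
ports = [
    Port("Fa0/2", "up", access_vlan=1),
    Port("Fa0/3", "down", access_vlan=11),
    Port("Gi0/1", "down", trunk_vlans={1, 11}),
]
print(vlan_has_active_port(1, ports))   # True
print(vlan_has_active_port(11, ports))  # False: nothing carrying VLAN 11 is up
```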
Symptom
Solution
VLAN Trunking Issues
Symptom
Solutions
VLAN Trunking Protocol (VTP) Issues
Symptom
Solutions
Inter-VLAN Routing Issues
Symptom
Solutions
802.1D Spanning Tree Issues
Symptom
Solutions
Another spanning tree issue involves EtherChannel.
Symptom
Solutions
Now that we’ve covered some common problems, here are some basic ideas on how to troubleshoot switches and VLANs.
Now it’s time to look at how this works in a simulated environment. We’ll start with the background of a situation that could realistically exist. Three trouble tickets are involved; you’ll retrieve them from the ticketing system and use them for troubleshooting and resolution.
The three Trouble Tickets will be: Internet is Down, No Connectivity, and Network is Slow.
As we walk you through each step of the simulated troubleshooting process, we’ll present it as if you’re the one doing the troubleshooting, and doing it the way an expert would.
Here’s the basic layout. Let’s call it our Site 1 Topology:
It consists of a large campus with 300 employees spread across three separate buildings. Internet connectivity is provided across the WAN; in other words, this campus gets its Internet access from another location.
Two routers, R1-1 and R1-2, connect to the Wide Area Network and provide redundancy for both the WAN and the Internet.
Now here’s the situation.
Building 3, which is served by R1-3, has been experiencing a number of service outages. As the Tier 1 help desk technician on duty, your role is to receive the trouble ticket, diagnose the issue, and ultimately resolve it.
You arrive at work to find a high-priority trouble ticket assigned to you, and it says the Internet is down. The problem has been going on for over an hour without any resolution. After some investigation, you discover that someone on the network team has made an undocumented configuration change.
Your task is to pick up the ticket, assign it to yourself, contact the requestor to say that you’re now actively working on the problem, and then proceed with troubleshooting and resolution.
Here’s what greets you the moment you arrive at work:
While these messages may sound harsh (especially the last one), it’s normal for tensions to run high when something isn’t working and a person’s job depends on it. So even if you don’t like the way this person is talking to you, you have to take all that into account.
Note in the upper-right corner of that last screenshot that the Status is Open and the Priority is High. The first thing you do is send the requestor a message confirming that you’re already working on the issue. Then you start troubleshooting.
To begin troubleshooting, you bring up your console. Because R1-3 is the one experiencing problems, you right-click on it and select Telnet/SSH to device.
First, you check connectivity. The manager’s trouble ticket indicates that although the Internet is down, everything else seems to be working, at least locally, so you assume that the workstations are still able to reach you.
You proceed by issuing the command:
The first item, enclosed in the box marked #1, would normally call for deeper inspection. However, it’s not being used, so you skip it.
The second (marked #2) is a group of LAN interfaces, and they’re up. That means they’re working as they should; in other words, the Physical Layer is working.
Next, you execute the show interfaces command to check whether everything’s working as expected. In the screenshot below, the FastEthernet interface shows up/up. That’s a good sign.
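The reasoning here follows the usual reading of an interface’s status/protocol pair. The Python sketch below applies that reading to rows laid out like show ip interface brief output. The column layout is an assumption, the function names are hypothetical, and the sample rows (including the 192.0.2.1 address) are illustrative rather than taken from the screenshots.

```python
def parse_brief(output: str) -> dict:
    """Parse rows laid out like show ip interface brief.

    Assumed columns: Interface, IP-Address, OK?, Method, Status, Protocol.
    Status can be two words ("administratively down"), so it is rebuilt from
    everything between the Method and Protocol columns.
    """
    rows = {}
    for line in output.strip().splitlines()[1:]:  # skip the header row
        tokens = line.split()
        if len(tokens) >= 6:
            rows[tokens[0]] = (" ".join(tokens[4:-1]), tokens[-1])
    return rows


def diagnose(status: str, protocol: str) -> str:
    """Map a status/protocol pair to the layer worth investigating."""
    if status == "administratively down":
        return "shut down by configuration; look for a shutdown command"
    if status == "up" and protocol == "up":
        return "Layers 1 and 2 look fine"
    if status == "up":
        return "Layer 1 is up; investigate Layer 2 (encapsulation, keepalives)"
    return "investigate Layer 1 (cabling, port, far-end device)"


# Illustrative output, not taken from the article's screenshots.
sample = """
Interface              IP-Address      OK? Method Status                Protocol
FastEthernet0/0        192.0.2.1       YES manual up                    up
FastEthernet0/1        unassigned      YES unset  administratively down down
Serial0/0/0            unassigned      YES unset  up                    down
"""
for name, (status, protocol) in parse_brief(sample).items():
    print(f"{name}: {diagnose(status, protocol)}")
```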
While you’re doing all this, you’re following a plan. Here’s the plan you drew up and filled out for this particular troubleshooting activity:
Next, you do show cdp neighbors.
Switch 1-3 (SW1-3) is the upstream switch, and because it appears in the CDP output, you know the link to it is functional. At this point, you consider ruling out both Layer 1 and Layer 2.
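CDP runs at Layer 2, so a neighbor showing up in its output is good evidence that the physical and data link layers on that link are working. The Python sketch below pulls neighbor names out of show cdp neighbors rows, assuming the usual column order (Device ID, Local Intrfce, Holdtme, and so on) and abbreviated interface names like “Fas 0/0”. The function name, the sample row, and the router-side interface name are hypothetical.

```python
import re

# A row starts with the Device ID, then the local interface in CDP's
# abbreviated form (for example "Fas 0/0"), then the holdtime in seconds.
ROW = re.compile(r"^(?P<device>\S+)\s+(?P<local>[A-Za-z]+ ?\d+(?:/\d+)*)\s+(?P<hold>\d+)\s")


def cdp_neighbors(output: str) -> dict:
    """Map each local interface to the neighbor CDP reports on it."""
    neighbors = {}
    for line in output.splitlines():
        match = ROW.match(line)
        if match:
            neighbors[match["local"]] = match["device"]
    return neighbors


# Illustrative output from R1-3; the interface names are hypothetical.
sample = """
Device ID        Local Intrfce     Holdtme    Capability  Platform  Port ID
SW1-3            Fas 0/0           162            S I     WS-C3560- Fas 0/1
"""
print(cdp_neighbors(sample))  # {'Fas 0/0': 'SW1-3'}
```

Long device IDs that CDP wraps onto a second line would need extra handling; the sketch ignores that case.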
Next, you conduct some ping tests on VLAN1 (the Management VLAN) and VLAN11 (the Production VLAN).
Everything looks fine on the Management VLAN:
However, on the Production VLAN, you experience some problems:
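Reading IOS ping results comes down to the “Success rate” line. This Python sketch extracts the reply count and sorts each result into ok, partial, or no replies. The function names are hypothetical, and the two sample lines are illustrative rather than copied from the screenshots.

```python
import re

SUCCESS = re.compile(r"Success rate is (\d+) percent \((\d+)/(\d+)\)")


def ping_counts(output: str) -> tuple:
    """Return (replies received, echoes sent) from Cisco IOS ping output."""
    match = SUCCESS.search(output)
    if match is None:
        raise ValueError("no 'Success rate' line in ping output")
    return int(match.group(2)), int(match.group(3))


def classify(received: int, sent: int) -> str:
    if received == 0:
        return "no replies"
    if received < sent:
        return "partial"  # e.g. first echo lost while ARP resolves
    return "ok"


# Illustrative results, not copied from the article's screenshots.
results = {
    "VLAN1": "Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms",
    "VLAN11": "Success rate is 0 percent (0/5)",
}
for vlan, output in results.items():
    print(vlan, classify(*ping_counts(output)))
```

A partial result such as 4/5 is common on the first ping to a new destination, because the first echo can be lost while ARP resolves, so it isn’t treated as a total failure here.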
You want to find out whether the upstream switch can be pinged, so you try to obtain the IP addresses by executing the show cdp neighbors detail command.
It’s not listing an IP address here, so you try pinging the switches.
Unlike Switch 1 and Switch 2, which are doing fine, Switch 3 is experiencing connectivity problems.
You try pinging the Internet, and you still can’t get outside on VLAN11. That could be why the Internet is down.
So you have successful connectivity on VLAN1 to Router 1-1 and everything in between, but you can’t get through on VLAN11.
Another thing you consider looking into is routing. To check routing, you execute the command:
Seeing signs of a possible routing problem, you investigate further by executing show ip eigrp interfaces.
It reveals that you have zero peers, even though you can get out on VLAN1, the Management VLAN. The Production VLAN isn’t receiving any routing. You can’t be sure yet, but judging from the way things are behaving, it’s logical to suspect a switch-related problem rather than a problem on this router.
When you do a show cdp neighbors, you see that the next upstream is Switch1-3, so you take a look at that next.
On SW1-3, you again execute show cdp neighbors. The output includes Router 1-3 as well as an EtherChannel to Switch 1-2 across two interfaces, so you know you’re dealing with Layer 2 connectivity.
Next, you execute show interfaces trunk. You notice that the native VLAN settings of both the link back to the router (Fa0/1) and the port channel (Po4) up to the next upstream switch, SW1-2, match. Everything appears to be in order here.
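The check you’re making by eye can be expressed as a short Python sketch: for each trunk, compare the native VLAN configured locally with the one on the neighbor. The record layout and the function name are hypothetical, and the native VLAN values are placeholders, since the article’s output isn’t reproduced here.

```python
def native_vlan_mismatches(trunks: list) -> list:
    """Report trunks whose two ends disagree on the native VLAN."""
    return [
        f"{t['port']}: native VLAN {t['local_native']} here, "
        f"{t['remote_native']} on {t['neighbor']}"
        for t in trunks
        if t["local_native"] != t["remote_native"]
    ]


# Hypothetical values for SW1-3's two trunks.
trunks = [
    {"port": "Fa0/1", "neighbor": "R1-3", "local_native": 1, "remote_native": 1},
    {"port": "Po4", "neighbor": "SW1-2", "local_native": 1, "remote_native": 1},
]
print(native_vlan_mismatches(trunks))  # []
```

Cisco switches running CDP also log a native VLAN mismatch when they detect one, which is another quick clue.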
After that, you issue the show spanning-tree vlan 11 command. There you see your root port (Po4) and your designated port (Fa0/1).
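Why Po4 is the root port comes down to the 802.1D election. The Python sketch below ranks each candidate port by total cost to the root bridge, then by the sender’s bridge ID, then by the sender’s port ID, with lower values winning. All numbers are hypothetical, bridge IDs are simplified to plain integers, and Fa0/24 is an imagined second uplink added only so there is something to compare against.

```python
from typing import NamedTuple


class ReceivedBPDU(NamedTuple):
    """Best BPDU heard on a port, plus that port's own path cost."""
    advertised_cost: int   # root path cost carried in the BPDU
    port_cost: int         # STP cost of the receiving port
    sender_bridge_id: int  # bridge ID of the neighbor that sent it
    sender_port_id: int    # port ID on the sending neighbor


def elect_root_port(candidates: dict) -> str:
    """Lowest total cost to the root wins; ties go to the lowest sender
    bridge ID, then the lowest sender port ID."""
    def rank(port: str) -> tuple:
        b = candidates[port]
        return (b.advertised_cost + b.port_cost, b.sender_bridge_id, b.sender_port_id)

    return min(candidates, key=rank)


# Hypothetical numbers: Po4 plus an imagined second uplink, Fa0/24.
candidates = {
    "Po4": ReceivedBPDU(advertised_cost=4, port_cost=12, sender_bridge_id=32769, sender_port_id=65),
    "Fa0/24": ReceivedBPDU(advertised_cost=19, port_cost=19, sender_bridge_id=32769, sender_port_id=24),
}
print(elect_root_port(candidates))  # Po4
```

Fa0/1 leads to a router, which normally doesn’t send BPDUs, so it isn’t a root port candidate; that fits with it showing up as a designated port.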
So far, everything here appears to be functional, but because you want to make sure that all the necessary configuration has been done, you run show vlan. The results show that both VLAN 1 and VLAN 11 have indeed been configured.
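Confirming that the critical VLANs exist can also be scripted. This Python sketch reads rows laid out like show vlan brief and reports any required VLAN that is missing or not active. It assumes single-word VLAN names, the function names are hypothetical, and the sample names and port lists are made up for illustration.

```python
def vlan_status(output: str) -> dict:
    """Map VLAN ID -> status from rows laid out like show vlan brief.

    Assumes the VLAN name is a single word, so the status is the third column.
    """
    vlans = {}
    for line in output.splitlines():
        tokens = line.split()
        if len(tokens) >= 3 and tokens[0].isdigit():
            vlans[int(tokens[0])] = tokens[2]
    return vlans


def missing_or_inactive(required: set, vlans: dict) -> list:
    return sorted(v for v in required if vlans.get(v) != "active")


# Illustrative output; the VLAN names and port lists are hypothetical.
sample = """
VLAN Name                             Status    Ports
---- -------------------------------- --------- -------------------------------
1    default                          active    Fa0/2, Fa0/3
11   Production                       active    Fa0/4
"""
print(missing_or_inactive({1, 11}, vlan_status(sample)))  # []
```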
You then execute the command: show vtp status
It shows that the configuration has been successfully sent, the domain is correct, it’s operating in client mode, and there are 7 existing VLANs.
At this point, you eliminate Switch 1-3 from your list of possible culprits and proceed to Switch 1-2.
You try executing a show ip interface brief command. Everything looks good there.
Then you try show cdp neighbors. Same story there.
You also try a show spanning-tree vlan 11.
Again, everything is functioning the way it’s supposed to.
To make sure the VLANs are there, you issue the show vlan command.
VLAN1 and VLAN11, which are the ones that are critical, are there.
Next, you do a show vtp status.
Again, the information shown suggests that everything should be working properly, but that’s before you take a much closer look. Closer inspection reveals that some of the letters of the VTP domain name are lowercase.
That may not sound like a big deal, but VTP domain names are case-sensitive, so to this switch it means a different domain. Now you have what looks like a potential issue. Since everything else is working, you want to eliminate every possible cause, however negligible it may seem.
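Here’s a minimal Python sketch of the comparison that matters: it extracts the domain name from show vtp status output and compares it exactly, since VTP treats domain names as case-sensitive. The function names are hypothetical, and so is the lowercase variant “CCNP-tshoot”, because the article doesn’t spell out which letters were lowercase.

```python
import re

DOMAIN = re.compile(r"^VTP Domain Name\s*:\s*(.*)$", re.MULTILINE)


def vtp_domain(show_vtp_status: str) -> str:
    """Pull the domain name out of show vtp status output."""
    match = DOMAIN.search(show_vtp_status)
    return match.group(1).strip() if match else ""


def same_vtp_domain(a: str, b: str) -> bool:
    # An exact comparison, because VTP treats domain names as case-sensitive.
    return a == b


# Hypothetical excerpts; the lowercase variant is invented for illustration.
sw1_3 = "VTP Domain Name                 : CCNP-TSHOOT"
sw1_2 = "VTP Domain Name                 : CCNP-tshoot"

a, b = vtp_domain(sw1_3), vtp_domain(sw1_2)
print(same_vtp_domain(a, b))          # False
print(a.casefold() == b.casefold())   # True: why it is easy to miss by eye
```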
Having found a potential issue, you now investigate further in that particular direction. You remember to make only one change at a time, knowing full well that if you make multiple changes simultaneously, you risk not knowing which one actually worked.
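The one-change-at-a-time discipline can be written down as a loop. In this Python sketch, each candidate change is applied and tested; if it doesn’t fix the problem, it’s reverted before the next one is tried. The callbacks and the change names are hypothetical, and reverting ineffective changes is an added assumption that keeps the credit for the fix unambiguous.

```python
from typing import Callable, Optional


def isolate_fix(
    changes: list,
    apply: Callable[[str], None],
    revert: Callable[[str], None],
    problem_solved: Callable[[], bool],
) -> Optional[str]:
    """Try candidate changes one at a time, retesting after each.

    A change that doesn't help is reverted before the next one is tried,
    so if the problem clears, exactly one change gets the credit.
    """
    for change in changes:
        apply(change)
        if problem_solved():
            return change
        revert(change)
    return None


# Hypothetical candidate changes; the "problem" clears only for the second one.
applied = set()
culprit = isolate_fix(
    ["re-enter trunk allowed VLANs", "correct VTP domain name", "bounce Po4"],
    apply=applied.add,
    revert=applied.discard,
    problem_solved=lambda: "correct VTP domain name" in applied,
)
print(culprit)   # correct VTP domain name
print(applied)   # {'correct VTP domain name'}
```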
The next thing you do is issue the configure terminal command, followed by vtp domain CCNP-TSHOOT.
You then go back to your Router 1-3 and ping 192.168.1.1, which was successful earlier, and 192.168.11.1, which wasn’t. Now, you find them both reachable.
You enter configure terminal here and then execute logging on (in case logging was turned off), followed by show ip route.
Next, you do a show ip eigrp neighbors. Surprisingly, you still don’t see any neighbors even though you already have connectivity back up.
So you follow that with a show running-config to see if something’s out of order.
After scrolling through the output, you notice an interface on which EIGRP authentication has been configured in error.
To take that out, you execute:
no ip authentication mode eigrp 100 md5
After that, things start coming back up.
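Removing the command helps because EIGRP routers form an adjacency only when certain settings agree, including the AS number, the K values, and authentication. The Python sketch below checks a simplified subset of those conditions. The class and field names are hypothetical, and real adjacency checks involve more (for example, a shared primary subnet).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

DEFAULT_K = (1, 0, 1, 0, 0)  # classic EIGRP default K1..K5


@dataclass
class EigrpPeerSettings:
    asn: int
    k_values: Tuple[int, int, int, int, int] = DEFAULT_K
    auth_mode: Optional[str] = None  # "md5", or None when not configured
    key: Optional[str] = None


def adjacency_blockers(a: EigrpPeerSettings, b: EigrpPeerSettings) -> list:
    """Simplified subset of the parameters two EIGRP neighbors must agree on."""
    problems = []
    if a.asn != b.asn:
        problems.append("AS numbers differ")
    if a.k_values != b.k_values:
        problems.append("K values differ")
    if a.auth_mode != b.auth_mode:
        problems.append("authentication mode differs")
    elif a.auth_mode is not None and a.key != b.key:
        problems.append("authentication keys differ")
    return problems


# Hypothetical model of the faulty interface and one of its neighbors.
r1_3 = EigrpPeerSettings(asn=100, auth_mode="md5")
neighbor = EigrpPeerSettings(asn=100)
print(adjacency_blockers(r1_3, neighbor))  # ['authentication mode differs']

r1_3.auth_mode = None  # the effect of: no ip authentication mode eigrp 100 md5
print(adjacency_blockers(r1_3, neighbor))  # []
```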
You try show ip eigrp neighbors one more time. This time, it lists the three neighbors you were expecting.
You try pinging the Internet. It’s now back up as well.
At this point, you do a little analysis and put together the information you’ve been able to gather so far.
Since the problem has been resolved, you go back to the trouble ticket sent by the requestor, change the status to resolved, and put in necessary notes.
When you go back to the Home tab, you now see the number of Requests Overdue is already down to two.
Your day has just started and you still have two more trouble tickets to resolve. I’ll go over those in Part 2 of this post.