
Troubleshooting VLAN Switch Problems: No Connectivity

Having already solved your first trouble ticket for the day, which basically was an “Internet is down” issue (including common switch issues, VLAN-related issues, and spanning-tree issues), you’re now ready to tackle your second trouble ticket. This time, we’re looking into a “No connectivity” issue.

(Instructional video below provides a walkthrough of the steps contained in this article.)
Just like the first one, you pick up the ticket, assign it to yourself, contact the requester to inform him/her that you are already actively working on the problem, and perform troubleshooting and resolution.

Here’s a screenshot of this particular request:

You start by going back to some of the devices you were looking at earlier. After some initial investigation, you learn that users aren’t even getting any DHCP addresses.

As always, you begin troubleshooting at the Physical Layer. You execute a show ip interface brief to get a quick summary, and you see that everything is doing well from there.

You follow that up with a command that will give you more detailed information:

show interfaces

All relevant devices are still showing Up/Up.

Just to make sure you eliminate any potential problem, you issue the show controllers fastEthernet 0/0 command. The results tell you that nothing is really wrong there.

For instance, there are no collisions…

… it is autonegotiated …

… and so on.

At this point, you presume that the problem is not physical.
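If you capture the command output to text, the up/up check above can be automated. The following is a minimal sketch, assuming the usual column layout of show ip interface brief; the sample output, interface names, and addresses are hypothetical and are not taken from the lab in this article.

```python
# Hypothetical sample of `show ip interface brief` output (illustrative only).
SAMPLE = """\
Interface              IP-Address      OK? Method Status                Protocol
FastEthernet0/0        192.168.1.3     YES NVRAM  up                    up
FastEthernet0/1        unassigned      YES NVRAM  administratively down down
"""

def interfaces_not_up(output):
    """Return (name, status, protocol) for every interface that is not up/up."""
    problems = []
    for line in output.splitlines():
        fields = line.split()
        # Skip the header row and any blank or malformed line.
        if len(fields) < 6 or fields[0] == "Interface":
            continue
        name = fields[0]
        protocol = fields[-1]
        # Status can be two words ("administratively down"), so join the middle.
        status = " ".join(fields[4:-1])
        if (status, protocol) != ("up", "up"):
            problems.append((name, status, protocol))
    return problems

print(interfaces_not_up(SAMPLE))
```

An empty list would correspond to the "everything is Up/Up" result described above.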

You then execute the show cdp neighbors command. You see SW1-3, so you conclude that Layer 2 is still working.

You follow that up with a ping to 192.168.1.1.

No connectivity there.

Same with VLAN11.

Since all pings are failing, you proceed to execute:

show ip eigrp interfaces

You’re not seeing anything, even though all your interfaces appear to be configured correctly and routing should be working.

You execute show ip eigrp neighbors.

Everything does seem to be down.

You follow that with show ip route, and you only get its local routes.

You then decide to see whether the problem is a little more widespread. You go to Router 1-1 to see if there’s a problem there.

You start by issuing the show ip interface brief command, which then shows that everything is up.

You again execute show ip route. This time, unlike on Router 1-3, you get all sorts of routes.

You try to see whether the Internet is working from here.

It is.

You then try pinging 192.168.1.2. You have connectivity there. You also ping 192.168.1.3, which you know isn’t working.

You then proceed to ping Switch 1-1 (192.168.1.111), Switch 1-2 (192.168.1.112), and Switch 1-3 (192.168.1.113) because these are all on the path. Among the three, Switch 1-3 is the only one you cannot reach.

Your initial observations tell you that something between Switch 1-2 and Switch 1-3 isn’t working correctly.
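The isolation logic here is simply "walk the path in order and stop at the first hop that does not answer." A minimal sketch of that reasoning follows; the function name failing_segment is hypothetical, and the reachability values are the ping results described above rather than anything measured by the code.

```python
def failing_segment(path, reachable):
    """Return (last reachable hop, first unreachable hop), or None if all answer.

    path      -- hops in order, nearest first, as (name, ip) tuples
    reachable -- dict mapping ip -> True/False taken from the ping tests
    """
    previous = None
    for name, ip in path:
        if not reachable.get(ip, False):
            return (previous, name)
        previous = name
    return None

# Ping results as observed from Router 1-1 in the walkthrough.
path = [
    ("Switch 1-1", "192.168.1.111"),
    ("Switch 1-2", "192.168.1.112"),
    ("Switch 1-3", "192.168.1.113"),
]
results = {
    "192.168.1.111": True,
    "192.168.1.112": True,
    "192.168.1.113": False,
}

print(failing_segment(path, results))  # ('Switch 1-2', 'Switch 1-3')
```

The returned pair names the segment to inspect next, which matches the conclusion that the fault sits between Switch 1-2 and Switch 1-3.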

You go to Switch 1-3 and issue the show cdp neighbors command. You see downstream but not upstream. This confirms your suspicion that the problem is between Switch 1-2 and Switch 1-3.

Next, you do show spanning-tree, and you notice that it’s not showing anything for spanning-tree.

You ping 192.168.1.1, but you can’t reach it from here.

You go to Switch 1-2 and ping 192.168.1.1 again. You also ping 192.168.11.1. They are both reachable from here.

When you do a show cdp neighbors, you notice that Switch 1-3 is missing.

You execute show spanning-tree. Everything is working as expected.

Next, you execute the show interfaces trunk command. You see no problem there either.

Taking into account all your observations up to this point, with Switch 1-2 checking out cleanly, you get the idea that the problem may really be on Switch 1-3.

You go now to Switch 1-3 and issue the command show vtp status. No problem there either.

You finally decide to do a configuration check. To do that, you execute the show running-config command.

When the results come out, two things catch your attention. First, the lines that say channel-group 4 mode on tell you that these ports are part of the upstream etherchannel trunk. Second, the line that says switchport access vlan 111 tells you that the ports have been configured as access ports.

This will not work as a trunk the way that it’s supposed to. The problem is really getting clearer now.
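A configuration check like this one can be sketched as a small script that scans a saved running-config for ports that join a channel group but are not set to trunk. This is an illustration only: the sample config is hypothetical apart from the two highlighted lines, and the rule assumes that every etherchannel in this topology is meant to be a trunk, which is not true of networks in general.

```python
# Hypothetical running-config excerpt; only "channel-group 4 mode on" and
# "switchport access vlan 111" come from the article.
SAMPLE_CONFIG = """\
interface FastEthernet0/23
 switchport access vlan 111
 channel-group 4 mode on
!
interface FastEthernet0/24
 switchport access vlan 111
 channel-group 4 mode on
!
interface FastEthernet0/1
 switchport mode trunk
"""

def suspect_channel_members(config):
    """Return interfaces that join a channel-group but are not set to trunk."""
    blocks = {}
    current = None
    for line in config.splitlines():
        if line.startswith("interface "):
            current = line.split(None, 1)[1]
            blocks[current] = []
        elif line.startswith(" ") and current is not None:
            blocks[current].append(line.strip())
        else:
            current = None  # "!" or a global command ends the interface block
    return [
        name
        for name, lines in blocks.items()
        if any(item.startswith("channel-group ") for item in lines)
        and "switchport mode trunk" not in lines
    ]

print(suspect_channel_members(SAMPLE_CONFIG))
```

With the sample above, the two bundled ports are reported and the correctly trunked port is not.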

You execute configure terminal, and then interface range fa0/23 - 24, which is a very helpful command because it allows you to configure multiple interfaces at once. You then do switchport mode trunk. The results indicate that your port channel is back up.

You go back to Router 1-3 and issue the show cdp neighbors command.

Follow that up with a show ip eigrp interfaces command.

After that, you execute show ip route again. This time, you get all sorts of information.

You ping 192.168.11.1.

Success.

Next, 216.145.1.2.

Success again.

You therefore conclude everything is really back up.

Just like in the first ticket, you summarize everything in a sort of root cause analysis.

The fault was identified on SW1-3.
The fault was isolated to Layer 2, specifically in the VLAN and trunking configuration.
The fault was caused by a misconfigured port mode (a human error).
It was resolved by restoring the original trunking mode configuration on the switch port.

After changing the status of the request to Resolved, you go back to the Home tab. You’re relieved to see only one more situation left to troubleshoot.

This last situation is somewhat different from the other two in that the network is not hard down, but there is a performance issue.

As you look at the request on the ticketing system, you see a message that doesn’t sound as urgent as the other two. Still, it’s your job to find a solution for the problem.

The closest point to the problem is Router 1-3, so you start your troubleshooting work there.

You’ve already established that everything is working from a workstation’s point of view. You know there’s connectivity to the Internet and to the workstations. There’s basically full reachability, so you know that it’s not any kind of routing issue.

The first command you issue is show ip interface brief. You find all pertinent interfaces up and running.

Next, you check out Layer 2 by issuing the show cdp neighbors command. That is working as well.

You follow that with show ip eigrp interfaces and then show ip eigrp neighbors. Both results still show nothing wrong.

Same with show ip route.

But then when you try a ping test, the result is slower than what you expected.

Still the question remains – where is the performance issue?

You try a similar ping test from Router 1-1. You get the same slow results.

So you know that the problem lies somewhere between Router 1-1 and Router 1-3, across the switch fabric. It could be at Router 1-3 itself or in the switching fabric.

You do a show running-config, and start looking for anything that would limit traffic, such as filtering, access lists, or anything of that sort.

But then you see that everything is completely in order in this device, so at this point, you eliminate it as a possible cause of the problem.

You go to Switch 1-3 and issue a ping to 216.145.1.2. It appears to be unreachable.

To find out why, you execute a show run. There, you notice that the VLAN interface is shut down, which explains why you could not ping out.

You issue a no shut. Sure enough, after that you are able to ping out again.

It is still slower than what you were expecting, so you proceed by issuing a show cdp neighbors. You see your etherchannel link upstream to the next switch. Again, everything appears to be in order from here.

You continue with a show spanning-tree. You see that the port channel is up and running, and the other portions of the result aren’t showing anything unusual either.

You move on to Switch 1-2 and begin with a show cdp neighbors. You find your two connections upstream to Switch 1-1, so it really looks like your system should be doing well.

Again you do a show spanning-tree. As soon as you get to VLAN 11, you notice something peculiar. When you review the results of your show cdp neighbors command, you notice that the etherchannel configurations are in groups of two.

Then when you look at the show spanning-tree results, you see the items listed individually.
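The clue here is that spanning tree lists a working bundle as a single Po interface, so physical ports that should be bundled but show up individually are suspect. A minimal sketch of that comparison follows, assuming the usual interface table at the end of show spanning-tree output; the sample rows and the Po4 name are hypothetical.

```python
import re

# Hypothetical interface table from `show spanning-tree` for one VLAN.
SAMPLE_STP = """\
Interface           Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- ----------------
Fa0/1               Root FWD 19        128.1    P2p
Fa0/2               Altn BLK 19        128.2    P2p
Po4                 Desg FWD 12        128.65   P2p
"""

def stp_ports(output):
    """Return (interface, role, state) for each row of the interface table."""
    ports = []
    for line in output.splitlines():
        match = re.match(r"^(\S+)\s+(Root|Desg|Altn|Back)\s+(\w{3})\s", line)
        if match:
            ports.append(match.groups())
    return ports

def unbundled(output, expected_members):
    """Return expected bundle members that spanning tree lists individually."""
    listed = {name for name, _, _ in stp_ports(output)}
    return sorted(listed & set(expected_members))

# Assumption: Fa0/1 and Fa0/2 are the two uplinks that should form one bundle.
print(unbundled(SAMPLE_STP, ["Fa0/1", "Fa0/2"]))  # ['Fa0/1', 'Fa0/2']
```

In the sample, one of the two individually listed uplinks is also in the blocking state, which is what you would expect when parallel links are not bundled.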

To see if something is awry, you execute a show running-config.

The idea is that etherchannel doubles the bandwidth between the switches. If it is in place on every link except this one, then this link becomes the constrained bandwidth area.
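The bandwidth reasoning can be made concrete with some simple arithmetic. The sketch below assumes 100 Mb/s per member link, which the article does not state, and it computes nominal aggregate capacity only; etherchannel balances traffic per flow, so a single flow does not get the full aggregate.

```python
# Assumed per-link speed in Mb/s; the article does not give link speeds.
FAST_ETHERNET = 100

def path_bottleneck(links):
    """Return (segment, capacity) for the lowest-capacity segment on the path.

    links -- list of (segment name, number of forwarding member links)
    """
    capacities = {name: count * FAST_ETHERNET for name, count in links}
    slowest = min(capacities, key=capacities.get)
    return slowest, capacities[slowest]

links = [
    ("SW1-1 <-> SW1-2", 1),  # bundle missing: spanning tree forwards on one link
    ("SW1-2 <-> SW1-3", 2),  # two-link etherchannel
]

print(path_bottleneck(links))  # ('SW1-1 <-> SW1-2', 100)
```

Under these assumptions, the unbundled segment offers half the nominal capacity of its neighbor, so it is the place where end-to-end throughput is capped.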

You see the two interfaces:

When you scroll down, you see that there is no channel group; the channel group configuration is missing. You presume that, for some reason, it has been deleted.

To start solving that problem, you execute the following commands:

interface range fastEthernet 0/1 - 2
channel-group 1 mode on

Sure enough, it brings the port channel up.

You then go back to Switch 1-1 and issue the following commands:

interface gi0/15
channel-group 1 mode on
interface gi0/17
channel-group 1 mode on

After you do a show spanning-tree, you observe that the port channel passes through the “learning” state and then comes back up.

You see port channel 1.

When you go back to Router 1-3 and do your ping again, you see a significant improvement in the round-trip time. This makes you conclude that the problem was that the etherchannel had been deleted between the switches, and that degraded the bandwidth.
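To compare round-trip times before and after a fix without relying on memory, you can pull the average out of the ping summary line. This is a minimal sketch, assuming the summary format Cisco IOS prints after a ping; the two sample lines and their timings are invented for illustration and are not measurements from this lab.

```python
import re

def average_rtt_ms(ping_output):
    """Return the average round-trip time in ms, or None if none is reported."""
    match = re.search(r"round-trip min/avg/max = (\d+)/(\d+)/(\d+) ms", ping_output)
    return int(match.group(2)) if match else None

# Hypothetical summary lines; the numbers are made up for this example.
before = "Success rate is 100 percent (5/5), round-trip min/avg/max = 8/12/20 ms"
after = "Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms"

improvement = average_rtt_ms(before) - average_rtt_ms(after)
print(f"Average RTT improved by {improvement} ms")
```

A failed ping prints no round-trip figures, so the function returns None in that case and the caller should check for it before subtracting.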

In summarizing everything into your root cause analysis, you take note of the following:

The fault was identified on Devices SW1-1 and SW1-2.
The fault was isolated to Layer 2, Etherchannel.
The fault was caused by deleted configuration, although you’re not sure how that happened.
The problem was resolved by restoring Etherchannel functionality with the channel-group 1 mode on command.

Conclusion

In this series, we covered VLAN/switch troubleshooting techniques. This concludes this two-part article.