As most experienced network engineers know, people use a number of different methods to troubleshoot problems on a network (or on systems in general). Determining which one is “better” is very subjective and can end up being a bit like having a political conversation with other engineers. This article takes a look at a number of the common troubleshooting techniques; these techniques are vital to understand for candidates looking to obtain the Cisco Certified Network Professional (CCNP) and other higher-level certifications. The CCNP TSHOOT exam is one of the required exams for the CCNP, and it requires a knowledge base that includes the concepts discussed in this article.
The Top-Down approach takes advantage of the hierarchy of the Open Systems Interconnection (OSI) model. Because most network engineers are drilled in the structure of both the OSI and TCP/IP models, basing a troubleshooting model on them makes sense and tends to feel very “natural” to most trained engineers. The Top-Down model, as the name indicates, looks first at the application layer (OSI model) and then works down through the layers until a problem is found. This model tends to be used when troubleshooting apparent application problems on specific computers. An example of the Top-Down approach would be to look first at the application in use when the trouble happens and determine whether it is causing the reported problem; if it is not, the engineer continues to work down the layers until the physical connection is verified.
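The Top-Down walk described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name top_down and the idea of passing in one yes/no check per layer are hypothetical, and the lambda checks stand in for whatever real tests an engineer would run at each layer.

```python
from typing import Callable, Dict, Optional

# OSI layers ordered from the top (layer 7) to the bottom (layer 1).
OSI_TOP_DOWN = [
    "application", "presentation", "session",
    "transport", "network", "data link", "physical",
]


def top_down(checks: Dict[str, Callable[[], bool]]) -> Optional[str]:
    """Return the first failing layer, starting at the application layer.

    `checks` maps a layer name to a callable that returns True when that
    layer looks healthy. Layers without a check are skipped. Returns None
    when every supplied check passes.
    """
    for layer in OSI_TOP_DOWN:
        check = checks.get(layer)
        if check is not None and not check():
            return layer
    return None


if __name__ == "__main__":
    # Hypothetical results: the application and transport checks pass,
    # but the network layer check fails.
    example = {
        "application": lambda: True,
        "transport": lambda: True,
        "network": lambda: False,
        "physical": lambda: True,
    }
    print(top_down(example))  # prints "network"
```

The walk stops at the first layer whose check fails, which mirrors how the approach is described: the engineer only moves down a layer once the current one has been ruled out.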
The Bottom-Up approach uses the same OSI model as a basis but starts at the physical layer instead. Obviously, really using this model requires some amount of physical access (for example, to see whether a cable is plugged in or connected correctly). Sometimes this is possible, sometimes not. The drawback of starting with this model shows up when the problem really exists at the application layer, because a lot of time will have been spent working through all of the other layers first. A simple example of this approach would be to verify first that the cabling into a device is connected, then move to the data link layer, and so on; if no problem is found, the last step is to look at the specific application in use to determine whether it is the problem.
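The cost of starting at the bottom when the fault is at the top can be made concrete with a small Python sketch. The function name bottom_up and the per-layer check callables are hypothetical, chosen only for illustration; the function also counts how many checks were run before the fault was found.

```python
from typing import Callable, Dict, Optional, Tuple

# OSI layers ordered from the bottom (layer 1) to the top (layer 7).
OSI_BOTTOM_UP = [
    "physical", "data link", "network", "transport",
    "session", "presentation", "application",
]


def bottom_up(checks: Dict[str, Callable[[], bool]]) -> Tuple[Optional[str], int]:
    """Walk from the physical layer upward.

    Returns (failing_layer, checks_run). failing_layer is None when every
    supplied check passes. Layers without a check are skipped and are not
    counted.
    """
    checks_run = 0
    for layer in OSI_BOTTOM_UP:
        check = checks.get(layer)
        if check is None:
            continue
        checks_run += 1
        if not check():
            return layer, checks_run
    return None, checks_run
```

With a check supplied for all seven layers, a fault at the physical layer is found after one check, while a fault at the application layer is found only after all seven have been run.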
Divide and Conquer is a very popular starting technique when troubleshooting network problems. Instead of starting at the top or the bottom of the OSI model, the Divide and Conquer model starts in the middle and works in the direction of the problem. For example, by attempting a ping or traceroute from a device, an engineer can determine which way to go: if the test fails, troubleshoot down from the network layer; if it succeeds, troubleshoot up from the transport layer. The Divide and Conquer method is one of the most commonly taught troubleshooting methods, mainly because it avoids the problem that the Top-Down and Bottom-Up approaches share: starting at one end of the OSI model without knowing which end the problem is closer to. Starting in the middle of the OSI model leaves fewer layers to work through, and the problem is typically found faster than with the Top-Down or Bottom-Up methods.
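As a rough sketch of the decision described above, the Python below starts with a network layer check (standing in for a ping) and then works up or down depending on the result. The names divide_and_conquer and simulated_checks are hypothetical, and simulated_checks assumes a simplified world in which every layer at or above the faulty one fails its check.

```python
from typing import Callable, Dict, Optional

# OSI layers ordered from the bottom (layer 1) to the top (layer 7).
OSI_BOTTOM_UP = [
    "physical", "data link", "network", "transport",
    "session", "presentation", "application",
]


def divide_and_conquer(checks: Dict[str, Callable[[], bool]],
                       start: str = "network") -> Optional[str]:
    """Start in the middle of the stack and work toward the problem.

    If the check at `start` passes, the layers below it are assumed to be
    healthy and the search moves upward. If it fails, the search moves
    downward to find the lowest layer that is still failing.
    """
    layers = [name for name in OSI_BOTTOM_UP if name in checks]
    i = layers.index(start)  # raises ValueError if `start` has no check

    if checks[start]():
        # Middle layer works: look for the first failure above it.
        for layer in layers[i + 1:]:
            if not checks[layer]():
                return layer
        return None

    # Middle layer fails: walk down until a layer passes.
    suspect = start
    for layer in reversed(layers[:i]):
        if checks[layer]():
            break
        suspect = layer
    return suspect


def simulated_checks(faulty: Optional[str]) -> Dict[str, Callable[[], bool]]:
    """Demo helper: a layer passes only if it sits below the faulty layer."""
    if faulty is None:
        return {name: (lambda: True) for name in OSI_BOTTOM_UP}
    bad = OSI_BOTTOM_UP.index(faulty)
    return {
        name: (lambda ok=(idx < bad): ok)
        for idx, name in enumerate(OSI_BOTTOM_UP)
    }
```

For a data link fault, for example, the network check fails, the data link check fails, and the physical check passes, so the search settles on the data link layer after three checks rather than walking the whole stack.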
The Follow the Path technique locates a problem by following the path that the traffic takes through the network. To start, tools like traceroute are used to determine the path being taken. If the traceroute is unable to complete, the problem may exist at the point where it stops. Sometimes the point of the troubleshooting is instead to determine whether the traffic is taking the “correct” path through the network. This is also easily determined with a traceroute: wherever the traffic “steps” off the “correct” path is where to continue troubleshooting.
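Finding where traffic “steps” off the expected path comes down to comparing two lists of hops. The Python below is a minimal sketch that assumes the hop addresses have already been collected from traceroute output; the function name first_divergence is hypothetical and the addresses in the usage note are documentation-range placeholders.

```python
from typing import List, Optional, Tuple


def first_divergence(
    expected: List[str], observed: List[str]
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Compare an observed hop list against the expected path.

    Returns (hop_number, expected_hop, observed_hop) for the first hop that
    differs, using 1-based hop numbers as traceroute does. A missing hop on
    either side is reported as None. Returns None when the paths match.
    """
    for hop in range(max(len(expected), len(observed))):
        exp = expected[hop] if hop < len(expected) else None
        obs = observed[hop] if hop < len(observed) else None
        if exp != obs:
            return hop + 1, exp, obs
    return None
```

For example, with an expected path of 192.0.2.1, 198.51.100.1, 203.0.113.10 and an observed path whose second hop is 198.51.100.254, the function returns hop 2 along with both addresses, which is where troubleshooting would continue. A hop recorded as "*" (no reply) is treated like any other mismatch.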
The Spot the Differences technique is used when there is something to compare against. For example, when troubleshooting access router configurations that are similar, an engineer can compare the configurations to find a missing or extra command. The engineer must be knowledgeable enough to know what is supposed to be different and what is supposed to be the same to use this technique correctly.
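A comparison like this can be approximated with a few lines of Python. The sketch below is a simplification: it treats each configuration as an unordered set of lines, so it ignores ordering and the context of sub-commands under an interface or routing process. The function names and the ignore_prefixes parameter are hypothetical; the parameter is where the engineer lists the lines that are supposed to differ.

```python
from typing import Iterable, List, Set, Tuple


def config_lines(text: str, ignore_prefixes: Iterable[str] = ()) -> Set[str]:
    """Return the set of meaningful lines in a configuration.

    Blank lines and "!" separator or comment lines are dropped, as are
    lines that start with any prefix in `ignore_prefixes`.
    """
    prefixes = tuple(ignore_prefixes)
    lines = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue
        if prefixes and line.startswith(prefixes):
            continue
        lines.add(line)
    return lines


def spot_differences(
    reference: str, suspect: str, ignore_prefixes: Iterable[str] = ()
) -> Tuple[List[str], List[str]]:
    """Return (missing, extra) lines in `suspect` relative to `reference`."""
    prefixes = tuple(ignore_prefixes)
    ref = config_lines(reference, prefixes)
    sus = config_lines(suspect, prefixes)
    return sorted(ref - sus), sorted(sus - ref)


if __name__ == "__main__":
    # Made-up fragments for illustration only.
    working = "hostname branch-a\nip routing\nservice password-encryption\n"
    broken = "hostname branch-b\nip routing\nno ip domain-lookup\n"
    missing, extra = spot_differences(working, broken, ("hostname",))
    print("missing:", missing)
    print("extra:", extra)
```

Ignoring the hostname lines reflects the point made above: some differences are expected, and only the engineer knows which ones.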
The last troubleshooting technique covered in this article is the Move the Problem technique. The basic principle here is that if a component is moved and the problem moves with it, then the problem exists with the component; if the problem does not move, then the component is probably not the problem. As with all of the techniques discussed, this one applies only in specific situations; it is not always possible to move a component to test it, and this is one of the main limiting factors of the technique. A simple example would be two branch offices that use the same model of router and the same (or very similar) configuration: if the devices are swapped and the problem moves from one office to the other, the problem probably lies with the router now in the office the problem moved to.
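The reasoning behind a swap test can be written down as a small decision function. The Python below is only a sketch; the function name interpret_swap, its parameters, and the three result strings are hypothetical labels for the outcomes described above.

```python
from typing import Set


def interpret_swap(original_site: str, new_site: str,
                   faulty_sites_after: Set[str]) -> str:
    """Interpret the result of moving a suspect component.

    original_site: where the suspect component was when the problem appeared.
    new_site: where the suspect component is after the swap.
    faulty_sites_after: sites that show the problem after the swap.

    Returns "component" if the problem followed the component,
    "environment" if the problem stayed behind at the original site,
    and "inconclusive" otherwise (problem at both sites or at neither).
    """
    moved = new_site in faulty_sites_after
    stayed = original_site in faulty_sites_after
    if moved and not stayed:
        return "component"
    if stayed and not moved:
        return "environment"
    return "inconclusive"
```

In the branch office example, if the suspect router moves from office A to office B and only office B shows the problem afterwards, the result is "component". If only office A still shows it, the router is probably not at fault and something else at that site deserves attention.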
Obviously, there are a number of different techniques that can be used when troubleshooting; which one to use depends greatly on the specific situation and what is being troubleshot. Each method has its advantages and disadvantages, so none of them is the perfect technique overall; each is the best fit only in specific situations. What really comes with experience is knowing which one to use in which situation, so as to limit the time spent troubleshooting and resolve the problem quickly. Hopefully you find this advice useful, whether you’re looking to take the CCNP TSHOOT exam or not.