Troubleshooting is a very subjective process, and so is the choice of which techniques to use on a specific problem. This article takes a look at the basic troubleshooting process steps as laid out by Cisco; these procedures are vital to understand for candidates looking to obtain the Cisco Certified Network Professional (CCNP) and other higher-level certifications. The CCNP TSHOOT exam is one of the required exams for the CCNP, and it requires a knowledge base that includes the concepts discussed in this article. It is also important to note that although these steps are given in a specific order, they can be used and reused in a number of different orders depending on the experience of the engineer.
A common issue for troubleshooters is the lack of a clear definition of the problem being reported; a common report is “My **** is not working”. While this gives a basic idea of what to look at, it does not really give an engineer a good idea of where to start; it is sort of like “my car doesn’t work”. During this step in the process an engineer must define the problem being reported; this includes talking to the reporting party and, ideally, observing the exact problem first-hand. The more specific the definition of the problem, the easier it is to narrow down and fix.
Once a proper definition of the problem exists, the specific devices to gather information from can be determined. Knowing exactly what information to gather comes with experience; in general, it is better to have too much information than too little. Examples include gathering event logs and status information, and verifying current operational information, from each affected device and from the devices along the affected path. Once all of the relevant information has been obtained, go to the next step.
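As a rough illustration of the gathering step, the sketch below collects one record per device along the affected path. The device names, the DeviceSnapshot structure, and the fake_collect function are all hypothetical stand-ins; a real collector would pull this data from the devices by whatever method the network supports.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceSnapshot:
    """Information gathered from one device (hypothetical structure)."""
    name: str
    event_logs: list = field(default_factory=list)
    status: dict = field(default_factory=dict)


def gather(path_devices, collect):
    """Collect one snapshot for every device along the affected path."""
    return {name: collect(name) for name in path_devices}


def fake_collect(name):
    # Stand-in for a real collection method; returns canned data only.
    return DeviceSnapshot(
        name=name,
        event_logs=[f"{name}: interface up"],
        status={"reachable": True},
    )


snapshots = gather(["edge-router", "core-switch"], fake_collect)
```

Keeping the collector as a separate function means the same gathering loop can be reused no matter how the information is actually retrieved.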
Gathering and analyzing are really two sides of the same step, but they are presented as separate steps in the process. Once an engineer has gathered all of the information relevant to the problem, it must be analyzed and formatted. No particular format is prescribed; it depends on how the engineer best reviews information. Once the information is formatted to the engineer's liking, it can be reviewed more easily and the problem can be located faster. An example would be taking the information gathered from the event logs, status information and operational information and performing an assessment of the potential problems observed.
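One possible way to format gathered log data for review is sketched below: entries are sorted by time, filtered by severity, and grouped by device. The log entries and the summarize function are invented for illustration; the only outside convention assumed is the syslog severity scale, where 0 is the most severe and 7 is debug.

```python
from collections import defaultdict

# Hypothetical log entries: (timestamp, device, severity, message).
# Severity follows the syslog convention: lower numbers are more severe.
entries = [
    ("10:02:10", "core-switch", 6, "Configuration saved"),
    ("10:01:05", "edge-router", 3, "Interface Serial0/0 changed state to down"),
    ("10:01:07", "core-switch", 4, "Neighbor 10.0.0.1 unreachable"),
]


def summarize(entries, max_severity=4):
    """Group entries at least as severe as max_severity by device, in time order."""
    grouped = defaultdict(list)
    for timestamp, device, severity, message in sorted(entries):
        if severity <= max_severity:
            grouped[device].append((timestamp, message))
    return dict(grouped)


summary = summarize(entries)
```

The informational "Configuration saved" entry is filtered out here, leaving only the warning and error entries for the engineer to assess.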
To find the cause of a problem, it is often necessary to eliminate what is not the cause. This step builds on the information formatted in the previous step and attempts to isolate the potential causes of the problem and eliminate those that are not. An example would be observing that the configuration on a device was not altered when the trouble occurred; this eliminates a configuration change as a potential cause. Once the potential causes of the problem have been isolated, move to the next step in the process.
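The elimination idea can be sketched as a simple filter: each candidate cause is paired with a check against the evidence, and any candidate the evidence contradicts is dropped. The evidence keys and candidate names below are hypothetical and mirror the configuration-change example above.

```python
# Hypothetical evidence produced by the analysis step.
evidence = {"config_changed": False, "circuit_down": True}

# Each candidate cause maps to a check that must hold for it to stay plausible.
candidates = {
    "configuration change": lambda ev: ev["config_changed"],
    "circuit failure": lambda ev: ev["circuit_down"],
}


def eliminate(candidates, evidence):
    """Return only the candidate causes that the evidence does not rule out."""
    return [cause for cause, check in candidates.items() if check(evidence)]


remaining = eliminate(candidates, evidence)
```

Because the configuration was not altered, only the circuit failure remains as a candidate to carry into the next step.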
Once the potential causes of the problem have been isolated, a hypothesis can be derived about how the problem occurred and how to fix it. This step in the process includes formulating the hypothesis and mapping out a procedure for testing it. An example would be observing that a circuit between major offices went down when the trouble occurred, and from this hypothesizing that the circuit trouble caused the problem being reported.
During this step the engineer takes the procedure (or procedures) laid out in the previous step and tests whether the hypothesis was correct and, if not, why. At times it is necessary to take the information from this step and return to the previous step (especially when the hypothesis was incorrect). An example would be to bring down the circuit between offices (during designated testing and change times) and see if the problem recurs.
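The loop between proposing and testing a hypothesis can be sketched as follows. The hypothesis descriptions and the canned test results are hypothetical; in practice each test would be a real procedure carried out during a designated testing and change window.

```python
def evaluate_hypotheses(hypotheses):
    """Try each (description, test) pair in order.

    Returns the first confirmed hypothesis and the list of rejected ones.
    Rejected hypotheses are kept because they feed back into the previous step.
    """
    rejected = []
    for description, test in hypotheses:
        if test():
            return description, rejected
        rejected.append(description)
    return None, rejected


# Hypothetical hypotheses with canned test outcomes.
hypotheses = [
    ("faulty switch port", lambda: False),
    ("circuit between offices went down", lambda: True),
]

confirmed, rejected = evaluate_hypotheses(hypotheses)
```

If nothing is confirmed, the function returns None, which corresponds to going back and forming a new hypothesis from what the failed tests revealed.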
As the most obviously named step, this one takes the information from the testing step and implements the fix (or fixes) proven during testing. An example would be ensuring that the circuit does not go down, and pursuing a method to ensure that the problem does not recur even if the circuit does go down again (e.g., offline file support).
As even Cisco admits, the specific process steps used to fix a problem can depend greatly on the engineer doing the troubleshooting and on the specific problem. This set of steps is laid out as a backbone that can be followed to achieve successful results; the streamlining of the process comes with experience.