Logs save lives! Or at least your environment, and sometimes your sanity. Just recently, a customer came to us with a critical issue: their production ITSM system could no longer connect to its database, bringing work to a grinding halt. The customer knew that they hadn’t made any configuration changes to their ITSM system, and pinging the database from the application servers returned no errors. What’s the next step in troubleshooting?
In a perfect world, this is what RJR Support looks for when a ticket is opened:
If the error is reproducible:
- Step-by-step instructions for reproducing the error. Screenshots of each step aren’t required up front unless you feel the details are necessary.
- A screenshot of the error message
- A plaintext copy of the error message
- Tomcat logs
- IIS logs
- We’ll often advise on which logs to include, since there are many possibilities! However, the ARError and Mid-Tier logs are our go-to logs, and they’re great to receive with a ticket!
- Windows Event Viewer logs
- Results from third-party troubleshooting tools, such as Fiddler or Wireshark. (Again, we wouldn’t expect these up front unless they were part of your initial troubleshooting.)
We’re also aware that everything described above falls into a “perfect world” scenario, and sometimes these items just aren’t available when a ticket is first opened. Still, getting the above is always helpful!
So what did we eventually discover for our customer?
Well, we saw this in the ARError logs:
Sep 8, 2015 1:24:09 PM – WARNING (com.remedy.log.SERVLET) : Caught GoatException
Sep 8, 2015 1:24:11 PM – FINE (com.remedy.log.INTERNAL) : Throw ARException – ERROR (90): Cannot establish a network connection to the AR System server; Connection refused: connect fakeservernamehere:7012 ERROR (90): Cannot establish a network connection to the AR System server; Connection refused: connect fakeservernamehere:7012
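When an ARError log runs to thousands of lines, it can help to pull out just the refused connections. Here is a minimal Python sketch of that idea. It is not an RJR or BMC tool: the function name `refused_endpoints` and the regular expression are assumptions based only on the sample lines above, so other log layouts may need a different pattern.

```python
import re

# Pattern inferred from the sample ARError line above (an assumption, not an
# official log format):
#   "ERROR (90): ... Connection refused: connect <host>:<port>"
REFUSED = re.compile(r"ERROR \(90\):.*?Connection refused: connect (\S+?):(\d+)")


def refused_endpoints(lines):
    """Return the set of (host, port) pairs named in ERROR (90) refusals."""
    found = set()
    for line in lines:
        for host, port in REFUSED.findall(line):
            found.add((host, int(port)))
    return found


if __name__ == "__main__":
    # Sample text taken from the excerpt above; in practice you would pass
    # an open log file instead of a hard-coded list.
    sample = [
        "ERROR (90): Cannot establish a network connection to the AR System "
        "server; Connection refused: connect fakeservernamehere:7012",
    ]
    for host, port in sorted(refused_endpoints(sample)):
        print(f"{host}:{port}")
```

Because the result is a set, an error that the log repeats many times still yields a single host and port to investigate.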
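Looks like there’s a problem with communications over port 7012! A successful ping only shows that the host is reachable, not that a specific port is open, so we tested connections over that specific port, and the test confirmed it.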
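One way to run that kind of port test from an application server is a plain TCP connection attempt. This is a minimal Python sketch under stated assumptions, not necessarily the check used in this case: the function name `port_is_reachable` and the three-second timeout are illustrative choices, and the host and port are the placeholders from the log excerpt.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder host and port copied from the log excerpt above.
    host, port = "fakeservernamehere", 7012
    state = "reachable" if port_is_reachable(host, port) else "NOT reachable"
    print(f"{host}:{port} is {state}")
```

If this returns False while a ping to the same host succeeds, the problem is specific to that port: either nothing is listening on it, or something between the two machines is rejecting the traffic.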
In the end, we discovered that the customer’s security team had blocked the port after noticing “odd” traffic over it. The customer had them unblock the port, and regular communication between the application servers and the database resumed! (We could talk here about the value of a defined Change Management process, but that, of course, is a subject for another time.)
Thank you, logs!