Archive for the ‘Debugging’ Category

Lesson Learned – Orphan process

February 17th, 2012 1 comment

Writing something after almost two months… not good, but I was busy with lots of stuff and trying to learn new things, which I will document later. For today, something on analyzing a dump for hangs due to orphan processes.

Recently, while working on an issue, I found an interesting scenario, and it seems to be a common one. When users log off from their session on a TS\XenApp server, three processes stay stuck there: Csrss, Winlogon and LogonUI. Although the users have logged off, these processes keep the session stuck, eating resources and at some stage causing unexpected behaviour. It is a bit difficult to show the full stack here, but I am documenting the technique I used to find the root cause of the issue. So here’s what I did.

Complete Memory Dump – for hang-related issues, it is good to take a complete memory dump, and at least 2-3 of them to check for consistency (this is what I usually do).

Step 1 – Ensure symbols are loaded. It may be a good idea to run the lm command to see which modules are loaded, and then run .reload /f to force the symbol download.

Step 2 – Find all the processes. It is good to have them sorted by session, so I ran the command !sprocess -4. This showed me all the sessions in order, along with the processes in each session.

Step 3 – Now some manual work: I looked into each session and checked for sessions with just these three processes. I found at least 6-7 such sessions.

PROCESS fffffa800a68c2e0
SessionId: 1 Cid: 038c Peb: 7fffffd9000 ParentCid: 037c
DirBase: 1ad871000 ObjectTable: fffff8a001c729e0 HandleCount: 79.
Image: csrss.exe
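Step 3 is manual work, but if the debugger output is saved to a text file, the same check can be sketched in a few lines of Python. This is only a rough sketch: it assumes every process block has a SessionId line followed later by an Image line, as in the block above, and the function name orphan_sessions is my own, not a debugger feature.

```python
import re
from collections import defaultdict

# The three images that are left behind in a stuck session, per the post.
ORPHAN_SET = {"csrss.exe", "winlogon.exe", "logonui.exe"}

def orphan_sessions(debugger_output):
    """Return the session IDs whose only processes are csrss, winlogon and LogonUI."""
    images_by_session = defaultdict(set)
    current_session = None
    for line in debugger_output.splitlines():
        session = re.search(r"SessionId:\s*(\S+)", line)
        if session:
            # Remember which session the upcoming Image line belongs to.
            current_session = session.group(1)
            continue
        image = re.match(r"\s*Image:\s*(\S+)", line)
        if image and current_session is not None:
            images_by_session[current_session].add(image.group(1).lower())
            current_session = None
    return sorted(s for s, imgs in images_by_session.items() if imgs == ORPHAN_SET)
```

Any session ID it returns is a candidate for the thread-by-thread check in the next steps; it does not prove the session is hung.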

Step 4 – Next I checked all the threads of each Csrss and Winlogon process in each hung session with !process <process-address> ff (for example, !process fffffa800a68c2e0 ff).

Step 5 – This showed me all the information related to the process, including all of its threads (the active ones, I believe).

Step 6 – Then I looked into each thread, checked for an ALPC wait chain message and found one, something like below:

THREAD fffffa800a696700 Cid 038c.0398 Teb: 000007fffffdc000 Win32Thread: fffff900c01bf360 WAIT: (WrLpcReply) UserMode Non-Alertable
fffffa800a696ac0 Semaphore Limit 0x1
Waiting for reply to ALPC Message fffff8a003053d00 : queued at port fffffa800aa539e0 : owned by process fffffa8009d86630
Not impersonating
DeviceMap fffff8a0000088c0
Owning Process fffffa800a68c2e0 Image: csrss.exe
Attached Process N/A Image: N/A
Wait Start TickCount 3409868 Ticks: 2007111 (0:08:42:41.109)
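The Ticks value on the last line is worth a second look, because it tells you how long the thread has been waiting. A small sketch of the arithmetic, assuming the usual 15.625 ms per tick (64 ticks per second); the tick length is my assumption here, not something the dump output states:

```python
from datetime import timedelta

# Assumption: one tick is 15.625 ms, i.e. 64 ticks per second.
TICK_MS = 15.625

def ticks_to_wait(ticks, tick_ms=TICK_MS):
    """Convert a debugger tick count into a wait duration."""
    return timedelta(milliseconds=ticks * tick_ms)

# 2007111 ticks -> 8:42:41.109375, matching the (0:08:42:41.109) shown above.
print(ticks_to_wait(2007111))
```

A thread that has been waiting for a reply for more than eight hours is clearly not going to get one.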

Step 7 – The important line in the above output is: Waiting for reply to ALPC Message fffff8a003053d00 : queued at port fffffa800aa539e0 : owned by process fffffa8009d86630
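With many threads to go through, it can help to pull these lines out of a saved log automatically. A minimal sketch, assuming the line always has the same wording as the one above; the regular expression and the function name find_alpc_waits are mine:

```python
import re

# Matches the "Waiting for reply to ALPC Message" line as it appears in the post.
ALPC_WAIT = re.compile(
    r"Waiting for reply to ALPC Message (?P<message>[0-9a-f]+)\s*:\s*"
    r"queued at port (?P<port>[0-9a-f]+)\s*:\s*"
    r"owned by process (?P<owner>[0-9a-f]+)",
    re.IGNORECASE,
)

def find_alpc_waits(thread_output):
    """Return one dict (message, port, owner) per ALPC wait line found."""
    return [match.groupdict() for match in ALPC_WAIT.finditer(thread_output)]
```

The message address it returns is the one to feed into the next command.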

Step 8 – To check the ALPC message, I ran the command !alpc /m fffff8a003053d00

0: kd> !alpc /m fffff8a003053d00

Message @ fffff8a003053d00
MessageID : 0x0050 (80)
CallbackID : 0x034F (847)
SequenceNumber : 0x00000002 (2)
Type : LPC_REQUEST
DataLength : 0x0128 (296)
TotalLength : 0x0150 (336)
Canceled : No
Release : No
ReplyWaitReply : No
Continuation : Yes
OwnerPort : fffffa800aa5ce60 [ALPC_CLIENT_COMMUNICATION_PORT]
WaitingThread : fffffa800a696700
QueueType : ALPC_MSGQUEUE_PENDING
QueuePort : fffffa800aa539e0 [ALPC_CONNECTION_PORT]
QueuePortOwnerProcess : fffffa8009d86630 (lsm.exe)
ServerThread : fffffa800aa5d060
QuotaCharged : No
CancelQueuePort : 0000000000000000
CancelSequencePort : 0000000000000000
CancelSequenceNumber : 0x00000000 (0)
ClientContext : 0000000000000000
ServerContext : 0000000000000000
PortContext : 000000000031d010
CancelPortContext : 0000000000000000
SecurityData : 0000000000000000
View : 0000000000000000

Step 9 – The fields that matter for the next steps are QueuePortOwnerProcess (lsm.exe) and ServerThread.

Step 10 – So Csrss has an ALPC wait chain that leads to lsm.exe.
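The !alpc /m output is a simple "name : value" list, so picking out the next hop of the wait chain is easy to script. A rough sketch, assuming the field layout shown above; parse_alpc_message and next_hop are names I made up for illustration:

```python
import re

def parse_alpc_message(output):
    """Turn 'Name : value' lines from the !alpc /m output into a dict."""
    fields = {}
    for line in output.splitlines():
        name, separator, value = line.partition(" : ")
        if separator:
            fields[name.strip()] = value.strip()
    return fields

def next_hop(fields):
    """Return the process that owns the queue port and the command to inspect its server thread."""
    owner = re.match(r"([0-9a-fA-F]+)\s*\((.+)\)", fields["QueuePortOwnerProcess"])
    if owner is None:
        raise ValueError("unexpected QueuePortOwnerProcess format")
    return {
        "process": owner.group(1),
        "image": owner.group(2),
        "next_command": "!thread " + fields["ServerThread"],
    }
```

For the message above this points at lsm.exe and suggests looking at thread fffffa800aa5d060 next.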

Step 11 – To go further, we can look at the ServerThread in more detail to see which components are on its stack.

Step 12 – So I ran the command !thread fffffa800aa5d060

Step 13 – In the output of this command I could see the whole stack. Going through each component (remember, bottom-up), I found that a third-party component had made the last call and was just waiting in a loop.

system hang > orphan process > Csrss > ALPC wait > lsm.exe > 3rd-party component… I just asked them to check it, and the issue is pretty much resolved.

This is one technique, but you also need to look into locks (!cs -l) to confirm whether there is any deadlock. I hope this is interesting and that my online note will help me again.

Reading x32 stack – Learned two new commands!!!

December 25th, 2011 No comments

I learned two new commands while working on an issue. I have a Windows 7 x64 OS and was troubleshooting an issue. I took a process dump and tried opening it in WinDbg…

The stack was not shown properly. After some searching on the web I found some useful articles: it looks like the process is 32-bit but the dump was taken on x64, so the debugger could not read the stack as-is. Browsing further through the WinDbg help (.hh), I found the two commands below: -

0:000> .load wow64exts
0:000> .effmach x86

So I ran these commands, and after that I ran kv again to get the stack information.

Looks better now !!! (a little more learning :-) )…
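If you hit this often, the same three commands can be run in one go through cdb, the console debugger that ships with Debugging Tools for Windows. A small sketch that only builds the command line; it assumes cdb.exe is on the PATH, and build_cdb_argv is a helper name I made up:

```python
def build_cdb_argv(dump_path, cdb_path="cdb.exe"):
    """Build a cdb command line that switches to x86 mode and prints the stack."""
    # -z opens a dump file, -c runs the given debugger commands at startup.
    commands = [".load wow64exts", ".effmach x86", "kv", "q"]
    return [cdb_path, "-z", dump_path, "-c", "; ".join(commands)]
```

On a machine with the debugging tools installed, the resulting list can be passed to subprocess.run to capture the corrected stack as text.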

 

Debugging for Starters – III

December 13th, 2011 1 comment

The first two blog posts in this series are:

http://blog.lkctx.com/debugging-for-starters-i

http://blog.lkctx.com/debugging-for-starters-ii/

We have already discussed different terminologies, different types of dumps, tools to create dumps, and how to check whether a dump is good for analysis. In the next couple of articles, I will document the steps required to open a dump in Windbg. I will also try to document the steps required to troubleshoot some common issues related to: -

  1. Application\Server crash
  2. Application\Server hangs
  3. CPU spikes, etc.

I will add more tools as and when required. The main tool that we are going to use is Windbg.

http://www.microsoft.com/whdc/devtools/debugging/debugstart.mspx

The installation of Windbg is pretty simple; anyone who has ever installed software on Windows can do it. However, before opening a dump, you need to configure the symbol server.

Symbols – Put in the simplest way, symbols (.pdb files, generated when an application is compiled) convert 01010101 into ‘human readable’ English. There are more technical definitions on the internet, but this is the simplest I can think of. Symbols are provided by the application vendors, who usually have a public-facing symbol server. For example, Microsoft and Citrix both host public symbol servers, and their addresses appear in the symbol path below.

How to configure the symbol server

So let’s start: open Windbg (Start -> Programs -> Debugging Tools for Windows) and configure the symbol server location in Windbg (File -> Symbol File Path).

SRV*c:\symcache*http://msdl.microsoft.com/download/symbols;SRV*c:\symcache*http://ctxsym.citrix.com/symbols

If you skip this and open a dump file in Windbg, it will complain that symbols could not be loaded.

So now, once you have the symbol server configured properly, we can take the first step and open a dump in Windbg.
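The symbol path above follows a simple pattern: one SRV*<local cache>*<server URL> entry per symbol server, joined with semicolons. A small sketch that builds the same string, in case you want to add another vendor's server; the function name symbol_path is my own:

```python
def symbol_path(cache_dir, servers):
    """Join one SRV*cache*url entry per symbol server with semicolons."""
    return ";".join(f"SRV*{cache_dir}*{url}" for url in servers)

# Rebuilds the exact path used in this post.
print(symbol_path(
    r"c:\symcache",
    ["http://msdl.microsoft.com/download/symbols",
     "http://ctxsym.citrix.com/symbols"],
))
```

The same string can also be set in the _NT_SYMBOL_PATH environment variable, which the debugger reads at startup.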

I find it difficult to follow theory without an example, so I will cover the analysis part with some real-world examples.

Disclaimer – Please note that the issues described in my posts may not be the actual issue you are facing and should not be considered an issue with any specific application or software. I have forced some of these issues to happen, with the help of different tools, for the sake of this tutorial. However, the steps mentioned can be used while dealing with similar issues in any application.

How to open and read the dump

Opening a dump is very simple if you have the right symbols (as mentioned above): File -> Open Crash Dump, then select the dump file. Some of the important things to notice are: -

  1. Types of dump
  2. System\Process uptime
  3. Symbol search path

You can do the basic analysis with just one simple command: !analyze -v

Usually it will highlight the module it considers the culprit. However, don’t always believe it: by default it seems to look for any 3rd-party, non-OS component and point at that (as OS components are pretty stable). The most important part is the stack it points to. Some rules for reading it: -

  1. Read from bottom and go up
  2. Check all the components involved
  3. Check the last components called

In the example stack for this post, the crash was generated using the SystemDump utility; therefore, the SystemDump component is at the top of the stack and is the culprit for this crash. The same technique can be used to analyze most types of dumps.
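The three rules can be tried out on a stack copied from the debugger as text. The sketch below is only an illustration: the sample frames and offsets are made up (they are not the stack from this post), and the list of OS modules is a short hand-picked assumption rather than anything Windbg provides.

```python
# Hypothetical call sites, top frame first, as the debugger prints them.
SAMPLE_STACK = [
    "nt!KeBugCheckEx",
    "SystemDump+0x1234",
    "nt!IopXxxControlFile+0x607",
    "nt!NtDeviceIoControlFile+0x56",
    "nt!KiSystemServiceCopyEnd+0x13",
]

# Assumption: a tiny illustrative set of OS module names.
OS_MODULES = {"nt", "hal", "ntdll", "kernel32", "win32k"}

def modules_bottom_up(call_sites):
    """Rules 1 and 2: walk from the bottom frame up and list each module once."""
    seen = []
    for site in reversed(call_sites):
        module = site.split("!")[0].split("+")[0]
        if module not in seen:
            seen.append(module)
    return seen

def third_party(modules):
    """Rule 3 helper: the modules that are not in the OS list deserve a closer look."""
    return [m for m in modules if m.lower() not in OS_MODULES]
```

For the sample frames this walks nt first and then SystemDump, and flags SystemDump as the only non-OS component.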

I think this is enough for today’s post. From my next post, I will start covering some common scenarios and give some examples to show how easy it is to do the basic analysis.

Please let me know your feedback and any topics that interest you.

Memory Dump Analysis Anthology – Wow!

December 1st, 2011 No comments

One of the best books on dump analysis; the whole series is worth reading, along with Windows Internals. I just got Vol-1 signed by its author, Dmitry Vostokov.

http://www.dumpanalysis.org/Memory+Dump+Analysis+Anthology+Volume+1

http://www.dumpanalysis.org/Forthcoming+Memory+Dump+Analysis+Anthology+Volume+2

http://www.dumpanalysis.org/Memory+Dump+Analysis+Anthology+Volume+3

http://www.dumpanalysis.org/Memory+Dump+Analysis+Anthology+Volume+4

http://www.dumpanalysis.org/Memory+Dump+Analysis+Anthology+Volume+5

Dmitry Vostokov has a very informative blog, http://www.dumpanalysis.org , where he shares his experience and knowledge. Bookmark it if you want to learn and go deeper into debugging. He is also the developer of many useful utilities like DumpCheck, TestWER, etc.

Debugging for Starters – II

December 1st, 2011 2 comments

The first post in this series is -> http://blog.lkctx.com/debugging-for-starters-i/

We already discussed some terms in the above post; now let’s see how we can create a dump (as we are going to concentrate more on dump analysis than on live-debugging techniques).

Creating a Dump – There are different ways to create user dumps, automatically and/or manually.

This will help to capture the dump in case the application crashes. From Windows Vista onwards, you can use Task Manager to create a dump of any process. This is helpful if you are troubleshooting issues related to CPU spikes in a process.

Need to force a dump – In case you need to force the creation of a dump, e.g. when a server or application hangs, you can try the method below: -

Verifying that the default debugger is right – In some cases, although you have configured a utility to capture the dump, no dump is generated. For this, you can use TestWER (formerly known as TestDefaultDebugger) – http://support.citrix.com/article/CTX111901 . This is a very simple and useful utility to ensure that you have a debugger enabled on your server: it crashes itself to generate a dump.

DumpCheck – Another very helpful utility, used to verify a dump. It ensures that the dump you have captured is valid for analysis, which is helpful if you have to send the dump to Citrix Support or Microsoft Support for further analysis. It installs as an Explorer extension. You can download it from -> http://support.citrix.com/article/CTX108825
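As a very rough first check before sending a dump anywhere, you can at least look at the first bytes of the file. This sketch only looks at the signature (MDMP for user-mode minidump files, PAGEDUMP or PAGEDU64 for kernel dumps); it is not how DumpCheck works and it does not replace it, and the function names are mine.

```python
def dump_kind(header):
    """Guess the dump type from the first 8 bytes of the file."""
    if header[:4] == b"MDMP":
        return "user-mode minidump"
    if header[:8] == b"PAGEDU64":
        return "64-bit kernel dump"
    if header[:8] == b"PAGEDUMP":
        return "32-bit kernel dump"
    return "unknown"

def check_dump_file(path):
    """Read just the signature bytes from a dump file on disk."""
    with open(path, "rb") as dump:
        return dump_kind(dump.read(8))
```

A file that comes back as "unknown" (for example one truncated to zero bytes during copy) is not worth uploading; anything else still needs a proper check.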

Windbg – And finally, Windbg to analyze/open the dump. This is part of Debugging Tools for Windows and you can download it from -> http://www.microsoft.com/whdc/devtools/debugging/debugstart.mspx

In the next article, I will document the steps required to open a dump in Windbg. I will also try to document the steps required to troubleshoot some common issues related to crashes, hangs, CPU spikes, etc., and will add more tools as and when required.

Debugging for Starters – I

November 24th, 2011 1 comment

There are many articles on the web on this topic, with some very good technical details and deep dives. However, when I started debugging it was a bit difficult to find a starting point. Most of the articles and books I found cover high-level debugging. Also, Windows Internals is a must to understand the whole picture. But I was more interested in the ‘quick’ and ‘short’ route. Coming from a system administration and consulting background, I was more interested in finding an easy way to move an issue to second level. In this series, I will try to document my experience and learning in this area.
