Lesson Learned – Orphan process

February 17th, 2012
 

Writing something after almost two months… not good, but I was busy with lots of stuff and trying to learn new things, which I will document later. For today, something on analyzing dumps for hangs caused by orphan processes.

Recently, while working on an issue, I came across an interesting scenario that seems to be a common one. When users log off from their session on a TS\XenApp server, three processes remain stuck: Csrss, Winlogon and LogonUI. The users have logged off, but because of these processes their sessions stay behind, eating resources and at some stage causing unexpected behaviour. It is a bit difficult to show the full stack here, but I am documenting the technique I used to find the root cause of the issue. So here's what I did.

Complete Memory Dump – for hang-related issues it is good to take a complete memory dump, and at least 2-3 of them to check for consistency (this is what I usually do).

Step 1 – Ensure symbols are loaded. It may be a good idea to run the lm command to see which modules are loaded, and then run .reload /f to force the symbol download.

Step 2 – Find all the processes. It is good to have them sorted by session, so I ran the command !sprocess -4. This showed me all the sessions in order, along with the processes in each session.

Step 3 – Now some manual work: I looked into each session and checked for sessions containing just these three processes. I found at least 6-7 such sessions.

PROCESS fffffa800a68c2e0
SessionId: 1 Cid: 038c Peb: 7fffffd9000 ParentCid: 037c
DirBase: 1ad871000 ObjectTable: fffff8a001c729e0 HandleCount: 79.
Image: csrss.exe
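The manual check in Step 3 can be roughly automated if the debugger output is saved to a text file first. Below is a minimal Python sketch, assuming the records look like the sample above; the function name find_orphan_sessions and the regular expression are my own illustration, not part of WinDbg, and real output may be laid out differently.

```python
import re
from collections import defaultdict

# The three images left behind in an orphaned session, as described in this post.
ORPHAN_IMAGES = {"csrss.exe", "winlogon.exe", "logonui.exe"}

# One PROCESS record: address, session id and image name.
# Assumption: the layout follows the sample record shown in the post.
PROCESS_RE = re.compile(
    r"PROCESS\s+(?P<addr>[0-9a-fA-F]+)\s+"
    r"SessionId:\s+(?P<session>\S+).*?"
    r"Image:\s+(?P<image>\S+)",
    re.DOTALL,
)


def find_orphan_sessions(dump_text):
    """Return {session_id: [(address, image), ...]} for sessions that
    contain exactly the three orphan images and nothing else."""
    sessions = defaultdict(list)
    for m in PROCESS_RE.finditer(dump_text):
        sessions[m.group("session")].append((m.group("addr"), m.group("image")))
    return {
        sid: procs
        for sid, procs in sessions.items()
        if {image.lower() for _, image in procs} == ORPHAN_IMAGES
    }
```

The addresses returned for each session are the ones to feed into the per-process inspection in the next step.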

Step 4 – Next I checked all threads of each Csrss and Winlogon process in each hung session with !process <process-address> ff (for example, !process fffffa800a68c2e0 ff).

Step 5 – This showed me all the information related to the process, including its threads.

Step 6 – Then I looked into each thread, checking for an ALPC wait chain message, and found one, something like the below:

THREAD fffffa800a696700 Cid 038c.0398 Teb: 000007fffffdc000 Win32Thread: fffff900c01bf360 WAIT: (WrLpcReply) UserMode Non-Alertable
fffffa800a696ac0 Semaphore Limit 0x1
Waiting for reply to ALPC Message fffff8a003053d00 : queued at port fffffa800aa539e0 : owned by process fffffa8009d86630
Not impersonating
DeviceMap fffff8a0000088c0
Owning Process fffffa800a68c2e0 Image: csrss.exe
Attached Process N/A Image: N/A
Wait Start TickCount 3409868 Ticks: 2007111 (0:08:42:41.109)
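The Ticks value on the last line is how long the thread has been waiting. As a small worked example, the Python sketch below converts it to a duration, assuming one tick is 15.625 ms (1/64 s). That interval is an assumption on my part and depends on the machine the dump came from, but with it the result agrees with the 0:08:42:41.109 the debugger prints. The function name ticks_to_duration is made up for this note.

```python
from datetime import timedelta

# Assumption: 64 ticks per second (15.625 ms per tick); the real clock
# interval depends on the system that produced the dump.
TICKS_PER_SECOND = 64


def ticks_to_duration(ticks, ticks_per_second=TICKS_PER_SECOND):
    """Convert a tick count from the thread output into a timedelta."""
    return timedelta(seconds=ticks / ticks_per_second)


print(ticks_to_duration(2007111))  # prints 8:42:41.109375
```

A wait of more than eight hours on an ALPC reply is a good sign that the thread is really stuck and not just busy.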

Step 7 – The important line in the above is: Waiting for reply to ALPC Message fffff8a003053d00 : queued at port fffffa800aa539e0 : owned by process fffffa8009d86630
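When there are many threads to go through, this line can be picked out of saved output with a small script. Here is a Python sketch, assuming the wording of the line is exactly as shown above; parse_alpc_wait and alpc_command are hypothetical helper names, not debugger features.

```python
import re

# Matches the ALPC wait line from the !process thread output.
# Assumption: the wording is the same as in the line quoted in Step 7.
ALPC_WAIT_RE = re.compile(
    r"Waiting for reply to ALPC Message (?P<message>[0-9a-f]+)\s*:\s*"
    r"queued at port (?P<port>[0-9a-f]+)\s*:\s*"
    r"owned by process (?P<owner>[0-9a-f]+)",
    re.IGNORECASE,
)


def parse_alpc_wait(text):
    """Return the message, port and owner addresses, or None if the
    text contains no ALPC wait line."""
    m = ALPC_WAIT_RE.search(text)
    return m.groupdict() if m else None


def alpc_command(info):
    """Build the debugger command that inspects the pending message."""
    return f"!alpc /m {info['message']}"
```

The command string it builds is the same one I typed by hand to inspect the message.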

Step 8 – To check the ALPC message, I ran the command !alpc /m fffff8a003053d00

0: kd> !alpc /m fffff8a003053d00

Message @ fffff8a003053d00
MessageID : 0x0050 (80)
CallbackID : 0x034F (847)
SequenceNumber : 0x00000002 (2)
Type : LPC_REQUEST
DataLength : 0x0128 (296)
TotalLength : 0x0150 (336)
Canceled : No
Release : No
ReplyWaitReply : No
Continuation : Yes
OwnerPort : fffffa800aa5ce60 [ALPC_CLIENT_COMMUNICATION_PORT]
WaitingThread : fffffa800a696700
QueueType : ALPC_MSGQUEUE_PENDING
QueuePort : fffffa800aa539e0 [ALPC_CONNECTION_PORT]
QueuePortOwnerProcess : fffffa8009d86630 (lsm.exe)
ServerThread : fffffa800aa5d060
QuotaCharged : No
CancelQueuePort : 0000000000000000
CancelSequencePort : 0000000000000000
CancelSequenceNumber : 0x00000000 (0)
ClientContext : 0000000000000000
ServerContext : 0000000000000000
PortContext : 000000000031d010
CancelPortContext : 0000000000000000
SecurityData : 0000000000000000
View : 0000000000000000

Step 9 – The important fields in the output above are QueuePortOwnerProcess (lsm.exe) and ServerThread (fffffa800aa5d060), which the next steps use.

Step 10 – So it seems that Csrss is in an ALPC wait chain on lsm.exe.

Step 11 – To go further, we can look at the ServerThread in more detail to see which components are on its stack.

Step 12 – So I ran the command !thread fffffa800aa5d060
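Steps 8 to 12 boil down to reading two fields from the !alpc /m output and turning one of them into the next command. A minimal Python sketch of that, assuming the "Name : value" layout shown in the output above; parse_alpc_message and next_step are names I made up for illustration.

```python
import re

# A "Name : value" line; names start with a letter, so the "0: kd>" prompt
# line and the "Message @ ..." header are skipped.
FIELD_RE = re.compile(r"^\s*([A-Za-z]\w*)\s*:\s*(.*?)\s*$")


def parse_alpc_message(output):
    """Turn saved !alpc /m output into a {field: value} dictionary."""
    fields = {}
    for line in output.splitlines():
        m = FIELD_RE.match(line)
        if m:
            fields[m.group(1)] = m.group(2)
    return fields


def next_step(fields):
    """Return (owner image, command to inspect the server thread)."""
    owner = re.search(r"\((.+)\)", fields["QueuePortOwnerProcess"])
    thread = fields["ServerThread"].split()[0]
    return (owner.group(1) if owner else None, f"!thread {thread}")
```

For the message above this gives lsm.exe as the owner and the !thread command from Step 12.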

Step 13 – In the output of this command I could see the whole stack. Going through each component (remember, bottom-up), I found that in the end a third-party component had made a call and then just waited in a loop.

system hang > orphan process > Csrss > ALPC wait > lsm.exe > 3rd-party component… I just asked them to check it, and the issue is pretty much resolved.

This is one technique, but you also need to look into locks (!cs -l) to confirm whether there is any deadlock. I hope this is interesting and that my online note will help me again…

  1. March 14th, 2012 at 04:21 | #1

    I really like your writing style, superb info , regards for posting : D.

  2. Lesta
    August 16th, 2012 at 08:42 | #2

    What tool are you using for this?
    I have found the same problem. It has not crashed the server, but I have a csrss.exe chewing up 15% and it is waiting on LSM.exe. I want to find out which process is causing the hang.

    I am using W2008R2 and found the problem by using the Resource Monitor.

    • admin
      August 16th, 2012 at 17:17 | #3

      Complete memory dump opened in WinDbg.
