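A crash analysis of a .NET vision program at a smart factory
I: Background
1. Telling the story
It has been a while since my last post. Besides looking after the baby at home, I have been busy grinding my English listening and speaking. Updates paused for a while, but this .NET advanced debugging journey is certainly not going to stop. Thank you all for waiting; let's get started.
A few days ago a friend reached out and said their vision program had crashed, and asked me to take a look. I don't analyze dumps for peers as often as I used to, but when it's time to step in, I step in. I asked him to capture a dump so I could do a reading on it.
II: Crash analysis
1. Why did it crash
There are many approaches to finding the cause of a crash. Besides the usual !analyze -v, you can simply double-click the dump to open it in WinDbg, which jumps straight to the faulting location. The output is as follows: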
...................................................
This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr
(930.1684): CLR exception - code e0434352 (first/second chance not available)
CLR exception type: System.Reflection.TargetInvocationException
"调用的目标发生了异常。"
For analysis of this file, run !analyze -v
KERNELBASE!RaiseException+0x69:
00007ff9`4a83cf19 0f1f440000 nop dword ptr [rax+rax]
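As a side note, the exception code e0434352 in the output above can be read by hand. The following is a minimal sketch, assuming the usual convention that the low three bytes of a software exception code are an ASCII tag; the function name and the lookup table are my own and hypothetical, not part of any debugger API.

```python
def describe_exception_code(code: int) -> dict:
    """Split a 32-bit exception code into its top byte and a 3-byte ASCII tag."""
    raw = code.to_bytes(4, "big")
    tag = raw[1:].decode("ascii", errors="replace")
    return {"top_byte": raw[0], "tag": tag}


# Illustrative lookup table (hypothetical helper, only two well-known tags).
KNOWN_TAGS = {
    "CCR": "CLR (managed) exception, CLR 4 and later",
    "msc": "Microsoft C++ exception",
}

if __name__ == "__main__":
    info = describe_exception_code(0xE0434352)
    meaning = KNOWN_TAGS.get(info["tag"], "unknown")
    print(hex(info["top_byte"]), info["tag"], "->", meaning)
```

The tag of e0434352 comes out as CCR, which is why WinDbg labels the event a CLR exception.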
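The line CLR exception - code e0434352 (first/second chance not available) shows that this is a CLR exception, that is, a managed exception. More specifically it is a TargetInvocationException, the exception that wraps whatever a method invoked through reflection throws. Next, use !t to look at the managed exception information on the threads.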
0:000> !t
ThreadCount: 36
UnstartedThread: 3
BackgroundThread: 31
PendingThread: 0
DeadThread: 0
Hosted Runtime: no
Lock
ID OSID ThreadOBJ State GC Mode GC Alloc Context Domain Count Apt Exception
0 1 1684 0000027727ebfc30 26020 Preemptive 000002772D3ACD50:000002772D3AEC80 0000027727e600b0 0 STA System.Reflection.TargetInvocationException 000002772d33e5d8
5 2 978 0000027727eea240 2b220 Preemptive 000002772D334D10:000002772D336C80 0000027727e600b0 0 MTA (Finalizer)
9 3 acc 00000277420e2900 202b220 Preemptive 000002772D398278:000002772D398C80 0000027727e600b0 0 MTA
10 4 aac 000002774211a3e0 8029220 Preemptive 0000000000000000:0000000000000000 0000027727e600b0 0 MTA (Threadpool Completion Port)
The output confirms that this is a managed exception. Next, use !pe -nested to inspect the exception details.
0:000> !pe -nested
Exception object: 000002772d33e5d8
Exception type: System.Reflection.TargetInvocationException
Message: 调用的目标发生了异常。
InnerException: System.AccessViolationException, Use !PrintException 000002772d33e438 to see more.
StackTrace (generated):
SP IP Function
000000D8F17FE160 0000000000000000 mscorlib_ni!System.RuntimeMethodHandle.InvokeMethod(System.Object, System.Object[], System.Signature, Boolean)+0xffff8006c9419390
000000D8F17FE160 00007FF933CDFAAD mscorlib_ni!System.Reflection.RuntimeMethodInfo.UnsafeInvokeInternal(System.Object, System.Object[], System.Object[])+0x10d
000000D8F17FE1D0 00007FF933CFFA40 mscorlib_ni!System.Delegate.DynamicInvokeImpl(System.Object[])+0xa0
000000D8F17FE220 00007FF91039945D System_Windows_Forms_ni!System.Windows.Forms.Control.InvokeMarshaledCallbackDo(ThreadMethodEntry)+0x9d
000000D8F17FE260 00007FF910399379 System_Windows_Forms_ni!System.Windows.Forms.Control.InvokeMarshaledCallbackHelper(System.Object)+0x69
000000D8F17FE2B0 00007FF933CD00B2 mscorlib_ni!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+0x172
000000D8F17FE380 00007FF933CCFF35 mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+0x15
000000D8F17FE3B0 00007FF933CCFF05 mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)+0x55
000000D8F17FE400 00007FF9103992FC System_Windows_Forms_ni!System.Windows.Forms.Control.InvokeMarshaledCallback(ThreadMethodEntry)+0xbc
000000D8F17FE450 00007FF910399066 System_Windows_Forms_ni!System.Windows.Forms.Control.InvokeMarshaledCallbacks()+0xe6
000000D8F17FE4C0 00007FF910382D69 System_Windows_Forms_ni!System.Windows.Forms.Control.WndProc(System.Windows.Forms.Message ByRef)+0x509
000000D8F17FE580 00007FF91038EEA7 System_Windows_Forms_ni!System.Windows.Forms.Form.WndProc(System.Windows.Forms.Message ByRef)+0x67
000000D8F17FE5E0 00007FF910382092 System_Windows_Forms_ni!System.Windows.Forms.NativeWindow.Callback(IntPtr, Int32, IntPtr, IntPtr)+0xc2
000000D8F17FE990 0000000000000000 System_Windows_Forms_ni!System.Windows.Forms.UnsafeNativeMethods.DispatchMessageW(MSG ByRef)+0x1
000000D8F17FEA50 00007FF910398671 System_Windows_Forms_ni!System.Windows.Forms.Application+ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(IntPtr, Int32, Int32)+0x341
000000D8F17FEB40 00007FF910397FD7 System_Windows_Forms_ni!System.Windows.Forms.Application+ThreadContext.RunMessageLoopInner(Int32, System.Windows.Forms.ApplicationContext)+0x1c7
000000D8F17FEBE0 00007FF910397DD2 System_Windows_Forms_ni!System.Windows.Forms.Application+ThreadContext.RunMessageLoop(Int32, System.Windows.Forms.ApplicationContext)+0x52
000000D8F17FEC40 00007FF8D76B0D60 TWVision!TWVision.Program.Main()+0x260
StackTraceString:
HResult: 80131604
0:000> !PrintException /d 000002772d33e438
Exception object: 000002772d33e438
Exception type: System.AccessViolationException
Message: 尝试读取或写入受保护的内存。这通常指示其他内存已损坏。
InnerException:
StackTrace (generated):
StackTraceString:
HResult: 80004003
From the output, the exception appears to have been thrown while the UI thread was executing a callback, as the InvokeMarshaledCallbackDo frame above suggests.
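The two HResult values in the !pe output above (80131604 and 80004003) can also be taken apart. Here is a small sketch, assuming the standard HRESULT bit layout (severity in bit 31, an 11-bit facility, a 16-bit code). The function name and the short name table are hypothetical and only cover the two values seen in this dump.

```python
def decode_hresult(hr: int) -> dict:
    """Break a 32-bit HRESULT into severity, facility, and code."""
    return {
        "failure": bool((hr >> 31) & 1),   # severity bit: 1 means failure
        "facility": (hr >> 16) & 0x7FF,    # 11-bit facility field
        "code": hr & 0xFFFF,               # low 16 bits
    }


# Hypothetical lookup table limited to the values in this dump.
NAMES = {
    0x80131604: "COR_E_TARGETINVOCATION",
    0x80004003: "E_POINTER",
}

if __name__ == "__main__":
    for hr in (0x80131604, 0x80004003):
        parts = decode_hresult(hr)
        print(f"{hr:08x} {NAMES[hr]}: facility={parts['facility']:#x} code={parts['code']:#x}")
```

Facility 0x13 is the one the .NET runtime uses for its own errors, while the inner exception carries the generic E_POINTER code. This is consistent with a wrapper exception raised by the runtime around an access violation.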
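2. Which function was actually running
Where do we go from here? We need to find out which function the UI thread was executing. This takes a little knowledge of the message loop: locate the ThreadMethodEntry queue item that the UI thread is currently processing, which !dso will reveal.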
0:000> !dso
OS Thread Id: 0x1684 (0)
RSP/REG Object Name
rax 000002772d33e788 System.String 调用的目标发生了异常。
000000D8F17FD828 000002772d33e5d8 System.Reflection.TargetInvocationException
...
000000D8F17FE058 000002772c6c56a8 System.Windows.Forms.Control+ThreadMethodEntry
0:000> !do 000002772c6c56a8
Name: System.Windows.Forms.Control+ThreadMethodEntry
MethodTable: 00007ff9100df170
EEClass: 00007ff910191fd8
Size: 104(0x68) bytes
File: C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.Windows.Forms\v4.0_4.0.0.0__b77a5c561934e089\System.Windows.Forms.dll
Fields:
MT Field Offset Type VT Attr Value Name
00007ff9100d87e8 400385b 8 ...ows.Forms.Control 0 instance 0000027729b28e58 caller
00007ff9100d87e8 400385c 10 ...ows.Forms.Control 0 instance 0000027729b28e58 marshaler
00007ff9337580f0 400385d 18 System.Delegate 0 instance 000002772c6c5610 method
00007ff933755e70 400385e 20 System.Object[] 0 instance 0000000000000000 args
00007ff933755dd8 400385f 28 System.Object 0 instance 0000000000000000 retVal
00007ff933755b70 4003860 30 System.Exception 0 instance 0000000000000000 exception
00007ff93375b698 4003861 58 System.Boolean 1 instance 0 synchronous
00007ff93375b698 4003862 59 System.Boolean 1 instance 0 isCompleted
00007ff9337d4398 4003863 38 ....ManualResetEvent 0 instance 0000000000000000 resetEvent
00007ff933755dd8 4003864 40 System.Object 0 instance 000002772c6c5710 invokeSyncObject
00007ff9337d24b0 4003865 48 ....ExecutionContext 0 instance 000002772c6c5650 executionContext
00007ff9337d70e8 4003866 50 ...ronizationContext 0 instance 00000277298bd630 syncContext
From the data, the method field at 000002772c6c5610 is the delegate we are looking for. Dumping it gives:
0:000> !DumpObj /d 000002772c6c5610
Name: System.Action
MethodTable: 00007ff9337daff0
EEClass: 00007ff9338e8f50
Size: 64(0x40) bytes
File: C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
Fields:
MT Field Offset Type VT Attr Value Name
00007ff933755dd8 40002f3 8 System.Object 0 instance 000002772c6c55f0 _target
00007ff933755dd8 40002f4 10 System.Object 0 instance 0000000000000000 _methodBase
00007ff9337d31f8 40002f5 18 System.IntPtr 1 instance 7ff8d7c3dd90 _methodPtr
00007ff9337d31f8 40002f6 20 System.IntPtr 1 instance 0 _methodPtrAux
00007ff933755dd8 4000300 28 System.Object 0 instance 0000000000000000 _invocationList
00007ff9337d31f8 4000301 30 System.IntPtr 1 instance 0 _invocationCount
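When there are many objects to chase, reading the field rows by eye gets tedious. Below is a rough sketch of pulling field values out of pasted !do / !DumpObj text, assuming each instance field row has the eight whitespace-separated columns shown above (MT, Field, Offset, Type, VT, Attr, Value, Name). The function parse_instance_fields is a hypothetical helper of my own, not an SOS feature, and it ignores static fields.

```python
def parse_instance_fields(dump_text: str) -> dict:
    """Map field name -> {"type": str, "value": int} for instance field rows."""
    fields = {}
    for line in dump_text.splitlines():
        parts = line.split()
        # Instance rows have 8 columns and "instance" in the Attr column;
        # this also skips the header row and any non-field lines.
        if len(parts) != 8 or parts[5] != "instance":
            continue
        _mt, _token, _offset, type_name, _vt, _attr, value, name = parts
        fields[name] = {"type": type_name, "value": int(value, 16)}
    return fields


SAMPLE = """\
MT Field Offset Type VT Attr Value Name
00007ff933755dd8 40002f3 8 System.Object 0 instance 000002772c6c55f0 _target
00007ff9337d31f8 40002f5 18 System.IntPtr 1 instance 7ff8d7c3dd90 _methodPtr
"""

if __name__ == "__main__":
    parsed = parse_instance_fields(SAMPLE)
    print(hex(parsed["_methodPtr"]["value"]))
```

The _methodPtr value extracted this way is the address handed to !U in the next step.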
0:000> !U 7ff8d7c3dd90
Unmanaged code
00007ff8`d7c3dd90 e9fb2fa300 jmp xxx!xxx.FuncLib+<>c__DisplayClass10_0.b__0() (00007ff8`d8670d90)
00007ff8`d7c3dd95 5f pop rdi
00007ff8`d7c3dd96 0100 add dword ptr [rax],eax
00007ff8`d7c3dd98 a075d5d7f87f0000e8 mov al,byte ptr [E800007FF8D7D575h]
00007ff8`d7c3dda1 eb66 jmp xxx.ConfigSystem.get_MESLines()+0x1 (00007ff8`d7c3de09)
00007ff8`d7c3dda3 f65e5e neg byte ptr [rsi+5Eh]
00007ff8`d7c3dda6 0006 add byte ptr [rsi],al
00007ff8`d7c3dda8 e8e366f65e call clr!PrecodeFixupThunk (00007ff9`36ba4490)
00007ff8`d7c3ddad 5e pop rsi
00007ff8`d7c3ddae 0105e8db66f6 add dword ptr [00007ff8`ce2ab99c],eax
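Only the first instruction in this listing matters: _methodPtr points at a small jump stub, and the bytes after it do not appear to be real instructions, which is my reading of why the rest of the disassembly looks like noise. The target of the jmp can be checked by hand. This is a minimal sketch, assuming the x64 near jump encoding (opcode E9 followed by a signed 32-bit displacement relative to the next instruction); decode_jmp_rel32 is a hypothetical helper name.

```python
def decode_jmp_rel32(address: int, opcode_bytes: bytes) -> int:
    """Return the target of an x64 `jmp rel32` located at `address`."""
    if len(opcode_bytes) < 5 or opcode_bytes[0] != 0xE9:
        raise ValueError("not a near jmp rel32 (opcode E9)")
    # The displacement is little-endian, signed, and relative to the
    # address of the instruction that follows the 5-byte jmp.
    rel32 = int.from_bytes(opcode_bytes[1:5], "little", signed=True)
    return (address + 5 + rel32) & 0xFFFFFFFFFFFFFFFF


if __name__ == "__main__":
    # Address and bytes copied from the first line of the !U output.
    target = decode_jmp_rel32(0x00007FF8D7C3DD90, bytes.fromhex("e9fb2fa300"))
    print(hex(target))  # 0x7ff8d8670d90
```

The computed target matches the 00007ff8`d8670d90 that WinDbg resolves to the b__0() method.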
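The xxx!xxx.FuncLib+<>c__DisplayClass10_0.b__0() in the output is the method that was executing. Next, bring out ILSpy to look at the code of this method. It was easy to locate; see the screenshot below: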

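3. Where is the crash point
We have found the crashing method, but not yet the exact crash point, since !pe showed no stack trace for the inner exception. Still, we are getting closer to the truth. If you know the internals well, you will know that the AccessViolationException object has an _ip field, which records the instruction pointer at which the crash occurred; see the screenshot below: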

Then use uf 7ff92854124a to view the assembly around the crash point. The output is as follows:
0:000> uf 7ff92854124a
msftedit!CRchTxtPtr::ReplaceRange+0x4fc:
00007ff9`2854122c 488b03 mov rax,qword ptr [rbx]
00007ff9`2854122f 488d4de0 lea rcx,[rbp-20h]
00007ff9`28541233 48894c2430 mov qword ptr [rsp+30h],rcx
00007ff9`28541238 448bce mov r9d,esi
00007ff9`2854123b 897c2428 mov dword ptr [rsp+28h],edi
00007ff9`2854123f 458bc6 mov r8d,r14d
00007ff9`28541242 418bd4 mov edx,r12d
00007ff9`28541245 44897c2420 mov dword ptr [rsp+20h],r15d
00007ff9`2854124a 488b00 mov rax,qword ptr [rax]
00007ff9`2854124d 488bcb mov rcx,rbx
00007ff9`28541250 ff15ca012500 call qword ptr [msftedit!_guard_dispatch_icall_fptr (00007ff9`28791420)]
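Judging from the listing, the fault is most likely caused by rax being 0 in mov rax,qword ptr [rax]. I have not restored the register context from the stack to confirm this. The important clue is the function msftedit!CRchTxtPtr::ReplaceRange, which tells us the crash happened inside RichTextBox. Going back to ShowMsg() and looking for operations that assign to a RichTextBox, there are indeed quite a few; see the screenshot below: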

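One last question: what content was being written at the time? If you want to know, that is easy too: just dump the 000002772c6c55f0 _target shown above; see the screenshot below: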

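Now for the conclusion. Over my analysis journey I have seen several program crashes caused by RichTextBox, and I don't know whether the control is unstable or has a bug. My advice was to work by elimination: stop using this control and observe the effect. The result was, naturally, satisfactory.
III: Summary
I have seen crashes caused by RichTextBox, and I have also seen surges in unmanaged memory caused by RichTextBox, so watch out when you use this control in the future.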

Original post: https://www.cveoy.top/t/topic/qGAt Copyright belongs to the author. Please do not repost or scrape.