.NET: Debugging managed heap corruption

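I have an interesting problem: I have two dumps from two processes, and both show managed heap corruption. I am running x64 on Windows 7 x64 with clr.dll 4.0.30319.1008 (RTMGDR.030319-1000). From VerifyHeap I know that I have a defect: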

0:016> !VerifyHeap
object 000000000367ec60: bad member 0000000004fba740 at 000000000367ec78
curr_object:      000000000528CF90
Last good object: 000000000367ec40
The object is an array with two elements:

0:016> !DumpObj /d 000000000367ec60
Name:        System.Object[]
MethodTable: 000007feedf6adf8
EEClass:     000007feedaefc68
Size:        48(0x30) bytes
Array:       Rank 1, Number of elements 2, Type CLASS (Print Array)
Element Type:System.Object
Fields:
None

0:016> !DumpArray /d 000000000367ec60
Name:        System.Object[]
MethodTable: 000007feedf6adf8
EEClass:     000007feedaefc68
Size:        48(0x30) bytes
Array:       Rank 1, Number of elements 2, Type CLASS
Element Methodtable: 000007feedf65a48
[0] 0000000004fba740
[1] 000000000367ec90
The first pointer is the corrupted value. It points to a bool with the value 1, which is not a managed object. That is why the GC bails out:

0:016> db 0000000004fba740-10
00000000`04fba730  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
00000000`04fba740  01 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
00000000`04fba750  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
00000000`04fba760  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
00000000`04fba770  00 00 00 00 00 00 00 00-b8 1b f7 ed fe 07 00 00  ................
00000000`04fba780  d0 a7 fb 04 00 00 00 00-00 00 00 00 00 00 00 00  ................
00000000`04fba790  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
00000000`04fba7a0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................

0:016> !lno 04fba740 
Before:  0000000004fba718 System.Collections.Hashtable+bucket[] 
After:   0000000004fba778 System.Collections.Hashtable 
Heap local consistency confirmed.
The surrounding objects do not matter, because they change randomly from dump to dump.

!GCRoot 0000000367ec60
Scan Thread 16 OSTHread 5fd0
r10:Root:  000000000367ec60(System.Object[])
Scan Thread 17 OSTHread 10cc
RSP:1de4cd58:Root:  000000000367ec60(System.Object[])
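The array itself is not rooted, which indicates that it could be collected. Interestingly, the second object in the array is ThreadLocal data from a thread that has already exited. It looks as if the CLR stores ThreadLocal objects in a per-thread object array that can be collected once the thread exits. Thread 17 performs the actual collection, which throws the ExecutionEngineException. Thread 16, however, seems to have saved the thread-local data into an array that should be pinned (it is not), and it should not be accessing that array at all.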

Thread 16 seems to hold the TLS data of a thread that has already exited, and it may be writing to it:

OS Thread Id: 0x5fd0 (16)
Child SP         IP               Call Site
000000001dffdfe8 0000000076eb135a [NDirectMethodFrameStandalone: 000000001dffdfe8] MS.Win32.UnsafeNativeMethods.MsgWaitForMultipleObjects(Int32, IntPtr[], Boolean, Int32, Int32)
000000001dffdfa0 000007fecfa7e1bd DomainBoundILStubClass.IL_STUB_PInvoke(Int32, IntPtr[], Boolean, Int32, Int32)*** WARNING: Unable to verify checksum for UIAutomationClientsideProviders.ni.dll

000000001dffe090 000007fecfa7b28d MS.Internal.AutomationProxies.Misc.MsgWaitForMultipleObjects(Microsoft.Win32.SafeHandles.SafeWaitHandle, Boolean, Int32, Int32)
000000001dffe110 000007fecfab5cdd MS.Internal.AutomationProxies.QueueProcessor.WaitForWork()
000000001dffe1b0 000007feede22f78 System.Threading.ExecutionContext.runTryCode(System.Object)*** WARNING: Unable to verify checksum for mscorlib.ni.dll

000000001dffe8d8 000007fef08044c4 [HelperMethodFrame_PROTECTOBJ: 000000001dffe8d8] System.Runtime.CompilerServices.RuntimeHelpers.ExecuteCodeWithGuaranteedCleanup(TryCode, CleanupCode, System.Object)
000000001dffea00 000007feede11661 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
000000001dffea60 000007feede115ab System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
000000001dffeab0 000007feedea6d8d System.Threading.ThreadHelper.ThreadStart()
000000001dffef08 000007fef08044c4 [GCFrame: 000000001dffef08] 
000000001dfff2f0 000007fef08044c4 [DebuggerU2MCatchHandlerFrame: 000000001dfff2f0] 
Here is the stack of the GC collect:

0:017> !DumpStack
OS Thread Id: 0x10cc (17)
Current frame: clr!WKS::gc_heap::mark_object_simple+0x75
Child-SP         RetAddr          Caller, Callee
000000001de4cce0 000007fef0877fb2 clr!WKS::gc_heap::mark_through_cards_for_segments+0x36b
000000001de4ce50 000007fef0873980 clr!WKS::gc_heap::mark_phase+0x160, calling clr!WKS::gc_heap::mark_through_cards_for_segments
000000001de4ce80 000007fef086fce7 clr!EEJitManager::CleanupCodeHeaps+0x57, calling clr!CrstBase::Leave
000000001de4cea0 000007fef07e3dc1 clr!CrstBase::Leave+0x31, calling clr!GetThread
000000001de4ced0 000007fef0873f3d clr!WKS::gc_heap::gc1+0xae, calling clr!WKS::gc_heap::mark_phase
000000001de4cef0 000007fef0874786 clr!WKS::gc_heap::update_collection_counts+0x16, calling 000000000065006e
000000001de4cf20 000007fef0a1fa56 clr!WKS::gc_heap::garbage_collect+0x42e, calling clr!WKS::gc_heap::gc1
000000001de4cf60 000007feede2d774 (MethodDesc 000007feedaa93b8 +0x124 System.TimeZoneInfo.GetDateTimeNowUtcOffsetFromUtc(System.DateTime, Boolean ByRef)), calling (MethodDesc 000007feedaa8708 +0 System.TimeSpan.Add(System.TimeSpan))
000000001de4cfa0 000007fef07fd4ff clr!SystemNative::__GetSystemTimeAsFileTime+0xf, calling kernel32!GetSystemTimeAsFileTimeStub
000000001de4cff0 000007fef087452e clr!WKS::GCHeap::GarbageCollectGeneration+0x14e, calling clr!WKS::gc_heap::garbage_collect
000000001de4d040 000007fef08734ce clr!WKS::gc_heap::try_allocate_more_space+0x25f, calling clr!WKS::GCHeap::GarbageCollectGeneration
000000001de4d080 000007fef0872f43 clr!WKS::gc_heap::allocate_small+0x158, calling clr!WKS::gc_heap::a_fit_segment_end_p
000000001de4d110 000007fef08731fe clr!FastAllocateObject+0x73e, calling clr!WKS::gc_heap::try_allocate_more_space
000000001de4d1f0 000007fef07fc8b8 clr!JIT_NewFast+0xb8, calling clr!FastAllocateObject
000000001de4d2c8 000007feede3fa80 (MethodDesc 000007feedaaa8e8 +0x40 System.Text.StringBuilder.ExpandByABlock(Int32)), calling clr!JIT_TrialAllocSFastMP_InlineGetThread

0:016> !Threads
ThreadCount:      17
UnstartedThread:  0
BackgroundThread: 13
PendingThread:    0
DeadThread:       1
Hosted Runtime:   no
                                           PreEmptive                                                   Lock
       ID  OSID        ThreadOBJ     State GC       GC Alloc Context                  Domain           Count APT Exception
   0    1  58e4 0000000000498ba0   2006020 Enabled  0000000000000000:0000000000000000 0000000000481df0     0 STA
   2    2  4190 000000000049ee80      b220 Enabled  0000000000000000:0000000000000000 0000000000481df0     0 MTA (Finalizer)
   6    3  48d4 000000001ac8bb60   1000220 Enabled  0000000000000000:0000000000000000 0000000000481df0     0 Ukn (Threadpool Worker)
   8    5  5fbc 000000001aca1970   a009220 Enabled  0000000000000000:0000000000000000 0000000000481df0     0 MTA (Threadpool Completion Port)
   9    6  615c 000000001c4b2880      b020 Enabled  0000000000000000:0000000000000000 0000000000481df0     0 MTA
  10    7  5818 000000001c4e7bd0   200b220 Enabled  0000000000000000:0000000000000000 0000000000481df0     0 MTA
  11    8  6e14 000000001c4f0850      7020 Enabled  0000000000000000:0000000000000000 0000000000481df0     2 STA
  12    a  683c 000000001c512610      7220 Enabled  0000000000000000:0000000000000000 0000000000481df0     0 STA
  14    b  6f40 000000001c521120      7220 Enabled  0000000000000000:0000000000000000 0000000000481df0     0 STA
  15    c  5070 000000001c564760   100a220 Enabled  0000000000000000:0000000000000000 0000000000481df0     0 MTA (Threadpool Worker)
  16    d  5fd0 000000000049bc10      b220 Enabled  0000000000000000:0000000000000000 0000000000481df0     0 MTA
  17    e  10cc 000000001c62e370      b220 Enabled  0000000000000000:0000000000000000 0000000000481df0     2 MTA (GC) System.ExecutionEngineException (0000000002441228)
XXXX    f       000000001e102c80     15820 Enabled  0000000000000000:0000000000000000 0000000000481df0     0 Ukn
  22   10  158c 000000001e103aa0   1009220 Enabled  0000000000000000:0000000000000000 0000000000481df0     0 MTA (Threadpool Worker)
  23   12  47e8 000000001e1048c0   8019220 Enabled  0000000000000000:0000000000000000 0000000000481df0     0 Ukn (Threadpool Completion Port)
  24    4  58a8 000000001e103390   8019220 Enabled  0000000000000000:0000000000000000 0000000000481df0     0 Ukn (Threadpool Completion Port)
  25    9  2874 000000001e102570   8009220 Enabled  0000000000000000:0000000000000000 0000000000481df0     0 MTA (Threadpool Completion Port)
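This is all nice and interesting, but I do not know how to proceed from here. The error occurs on automated test machines where the test controller process dies about 1-2 times a day, so I cannot simply attach a debugger to the process and set breakpoints that guard writes to a specific memory location. Any additional hints on how to make sense of this are very welcome. I will collect more dumps so that I can at least do a differential analysis to check which tests might be causing it.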

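It looks to me as if the CLR array that holds the thread-static information is not pinned, and someone wrote an unpinned bool value into its first element. The array does not contain the value itself. It contains an address where a managed object would normally be located, but at that address there is only the bool value (one) instead of a normal CLR object with an object header.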

Could a wrong P/Invoke signature cause this behavior? I have seen things like

    [DllImport( "kernel32.dll" )]
    public static extern bool Beep( int frequeny_in, int time_in );

which returns a one-byte bool, whereas the Beep function actually returns a 4-byte BOOL. Could wrong P/Invoke return types (bool instead of int) cause problems like this?
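For comparison, here is a minimal sketch of a declaration that matches the Win32 prototype `BOOL Beep(DWORD dwFreq, DWORD dwDuration)`. As far as I know, a C# `bool` in a P/Invoke signature is marshaled as a 4-byte Win32 BOOL by default, even though the managed `bool` itself is one byte. If that holds, the declaration above would not produce a one-byte return on its own; a size mismatch would be more likely when `UnmanagedType.I1` or `U1` is applied, or when the native function returns a C++ `bool`. The class names `NativeMethods` and `BoolSizes` are hypothetical and exist only for illustration.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical holder class for the corrected declaration.
static class NativeMethods
{
    // Win32 Beep returns BOOL (4 bytes) and takes two DWORD arguments.
    // UnmanagedType.Bool is already the P/Invoke default for bool;
    // spelling it out documents the intended 4-byte marshaling.
    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool Beep(uint dwFreq, uint dwDuration);
}

// Hypothetical helper that compares the two sizes of bool.
static class BoolSizes
{
    // Size of bool inside managed code.
    public static int ManagedSize()
    {
        return sizeof(bool);
    }

    // Size the interop marshaler uses for bool by default.
    public static int MarshaledSize()
    {
        return Marshal.SizeOf(typeof(bool));
    }
}
```

Comparing `BoolSizes.ManagedSize()` with `BoolSizes.MarshaledSize()` is a quick way to check which size the marshaler assumes on a given runtime before suspecting a particular signature.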

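Frankly, I did not read through all of it, but a similar problem appears to have been dealt with elsewhere, with the suggestion that a wrong P/Invoke return type can lead to managed heap corruption (there is also more background available on `bool`).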