Object Stack Allocation in .NET 10

03/30/2026 | 7 minutes to read
Object Stack Allocation is a performance optimization in .NET 10 that allows certain objects to be allocated on the stack instead of the heap. The JIT compiler identifies object allocations that do not escape from a method and may decide to place these non-escaping objects on the stack. Such stack-allocated objects are freed automatically when the stack frame is released as the method returns, reducing the load on the Garbage Collector (GC).
Recent improvements in this area have been detailed in the Performance Improvements in .NET 10 blog post.
In this post, I will investigate the current state and limitations of this feature. Please note that the findings reveal some of the current internal limitations of the runtime, which may change or be adjusted in future releases.
Using Stopwatch
[MethodImpl(MethodImplOptions.AggressiveOptimization)]
public long DoWork()
{
    var sw = Stopwatch.StartNew();
    int i = 65;
    Nop();
    sw.Stop();
    return i + sw.ElapsedTicks;
}

[MethodImpl(MethodImplOptions.NoInlining)]
private static void Nop() { }
To observe the generated code, run the application with the DOTNET_JitDisasm environment variable set (in PowerShell: $env:DOTNET_JitDisasm="Test:DoWork()"). It prints the following assembly code to the console on a Windows 11 x64 platform:
G_M000_IG01: ;; offset=0x0000
push rsi
push rbx
sub rsp, 56
vzeroupper
xor eax, eax
mov qword ptr [rsp+0x30], rax
mov qword ptr [rsp+0x28], rax
G_M000_IG02: ;; offset=0x0015
lea rcx, [rsp+0x30]
mov rax, 0x7FF8F15A3FB0
G_M000_IG03: ;; offset=0x0024
call rax ; Interop+Kernel32:QueryPerformanceCounter(ptr):int
mov rbx, qword ptr [rsp+0x30]
call [Test:Nop()]
lea rcx, [rsp+0x28]
mov rax, 0x7FF8F15A3FB0
G_M000_IG04: ;; offset=0x0040
call rax ; Interop+Kernel32:QueryPerformanceCounter(ptr):int
mov rsi, qword ptr [rsp+0x28]
sub rsi, rbx
cmp dword ptr [(reloc 0x7ff85a53e808)], 0
jne SHORT G_M000_IG07
G_M000_IG05: ;; offset=0x0053
lea rax, [rsi+0x41]
G_M000_IG06: ;; offset=0x0057
add rsp, 56
pop rbx
pop rsi
ret
G_M000_IG07: ;; offset=0x005E
call CORINFO_HELP_POLL_GC
jmp SHORT G_M000_IG05
; Total bytes of code 101
This optimization does not kick in with Tier 0 compilation of the code; hence I added the [MethodImpl(MethodImplOptions.AggressiveOptimization)] attribute to the DoWork() method.
The blocks labeled G_M000_IG03 and G_M000_IG04 contain the inlined code for the Start and Stop method calls on the Stopwatch object. The block at G_M000_IG05 adds 65 to the result value.
An important observation: the more code is inlined, the more visibility escape analysis has, and the better it can perform.
When I run the above code and measure the allocated memory in bytes, it prints 0.
public void Measurement()
{
    long beginning = GC.GetTotalAllocatedBytes(true);
    for (int i = 0; i < 100; i++)
        sum += DoWork();
    var allocated = GC.GetTotalAllocatedBytes(true) - beginning;
    Console.WriteLine($"Allocated: {allocated}, Sum: {sum}");
}
Console output: Allocated: 0, Sum: 6889
Custom Objects
Let's replace the Stopwatch type with a custom type (similarly returning a timestamp).
public long DoWork()
{
    var obj = new MyClass();
    Nop();
    return 65 + obj.Get();
}

//...

public class MyClass
{
    public long Get() => Stopwatch.GetTimestamp();
}
Without the AggressiveOptimization attribute, it prints Allocated: 2400, Sum: 249270223149128 on the console.
With the AggressiveOptimization attribute, it prints Allocated: 0, Sum: 7393 on the console.
Size of the Custom Objects
Does the size of the allocated object limit object stack allocation? To test this, I added an InlineArray to the MyClass type:
public class MyClass
{
    private MyInlineType<byte> _arr;
    public long Get() => Stopwatch.GetTimestamp() + _arr[0];
}

[InlineArray(520)]
public struct MyInlineType<T>
{
    public T _value;
}
The maximum size of the inline array that still got stack-allocated on my test machine is 520 bytes. However, using up to 8 such objects in a single method still got them all stack-allocated, while a 9th fell back to heap allocation.
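The multi-object experiment can be sketched as below, reusing the MyClass (with the 520-byte inline array) and Nop helpers from above; the variable names and the elided middle are illustrative, not the exact test code.

```csharp
// Sketch: with up to 8 of these 520-byte objects, Measurement()
// reported 0 allocated bytes on my machine; adding a 9th object
// made DoWork() allocate on the heap again.
[MethodImpl(MethodImplOptions.AggressiveOptimization)]
public long DoWork()
{
    var o1 = new MyClass();
    var o2 = new MyClass();
    // ... o3 through o8 declared the same way ...
    var o9 = new MyClass(); // 👈 the 9th object fell back to heap allocation
    Nop();
    return 65 + o1.Get() + o2.Get() + o9.Get();
}
```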
While this seems to point to a limit, it cannot be generalized. A class with as few as 18 long fields will sometimes not be stack-allocated. For example, a class whose 18 fields are all initialized as follows is heap-allocated:

private long _a01 = (Stopwatch.GetTimestamp() * 1); // does not get stack-allocated
However, changing the multiplication to a division makes the type stack-allocated:
private long _a01 = (Stopwatch.GetTimestamp() / 1); // does get stack-allocated
The case above is discussed in detail here.
For arrays, I observed stack allocation with a similar size limit. The maximum byte[] size observed is 512 elements, and the maximum int[] size observed is 128 elements (512 bytes of element data in both cases).
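The array experiment follows the same shape as the earlier tests and can be sketched as below, reusing the Nop helper from above; the exact size limits are what I observed on my machine and may differ on other runtimes or platforms.

```csharp
// Sketch: a 512-element byte[] was still stack-allocated in my tests;
// one element more made the allocation go to the heap.
[MethodImpl(MethodImplOptions.AggressiveOptimization)]
public long DoWork()
{
    var arr = new byte[512]; // 👈 stack-allocated; byte[513] was not
    Nop();
    return 65 + arr[0] + Stopwatch.GetTimestamp();
}
```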
Escaping Into Subframes
Is a stack-allocated object allowed to 'escape' to sub frames?
public long DoWork()
{
    var obj = new MyClass();
    var value = Nop(obj);
    return 65 + value + obj.Get();
}

[MethodImpl(MethodImplOptions.NoInlining)]
private static long Nop(MyClass obj) => 0;
Without NoInlining (when the Nop method is likely inlined), the object can be stack-allocated. This even works if Nop invokes a Nop2 method, passing along the same object.
With NoInlining, the object is heap-allocated, even though the obj parameter is not used.
Try-Finally Blocks
Try-finally blocks do not prevent the stack allocation optimization.
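A sketch of the try-finally case, reusing the MyClass and Nop helpers from above; in my tests the MyClass instance remained stack-allocated despite the finally block.

```csharp
// Sketch: the finally block did not force obj to escape in my tests,
// so the allocation stayed on the stack.
[MethodImpl(MethodImplOptions.AggressiveOptimization)]
public long DoWork()
{
    var obj = new MyClass(); // 👈 still stack-allocated
    try
    {
        Nop();
        return 65 + obj.Get();
    }
    finally
    {
        Nop();
    }
}
```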
Try-Catch Blocks
With a simple try-catch block like the one below, the MyClass instance is stack-allocated.
public long DoWork()
{
    try
    {
        var obj = new MyClass();
        Nop();
        return 65 + obj.Get();
    }
    catch (Exception)
    {
        throw;
    }
}
However, when obj is used in the catch block, it is heap allocated:
public long DoWork()
{
    var obj = new MyClass();
    try
    {
        Nop();
        return 65 + obj.Get();
    }
    catch (Exception)
    {
        // 👇 Heap allocated when used in the catch block.
        return obj.Get();
    }
}
Lock Statement
Locking an object prevents it from being stack-allocated.
public long DoWork()
{
    var obj = new MyClass();
    // 👇 Heap allocated when locked.
    lock (obj)
    {
        Nop();
        return 65 + obj.Get();
    }
}
However, locking a different object (e.g. a Lock instance) had no impact on whether the MyClass instance was stack-allocated.
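This variant can be sketched as below, reusing the MyClass and Nop helpers from above; the _lock field name is an assumption of this sketch, and the System.Threading.Lock type it uses was introduced in .NET 9.

```csharp
// Sketch: locking a dedicated Lock object instead of the MyClass
// instance; in my tests obj remained stack-allocated.
private readonly Lock _lock = new();

[MethodImpl(MethodImplOptions.AggressiveOptimization)]
public long DoWork()
{
    var obj = new MyClass(); // 👈 still stack-allocated
    lock (_lock)
    {
        Nop();
        return 65 + obj.Get();
    }
}
```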
Constructor
The constructor of stack-allocated objects still executes. The following type, with a sleep in its constructor, gets stack-allocated, but even in the stack-allocated case the constructor still runs and waits for 15 ms. This behavior is expected, but it is worth noting that object stack allocation does not skip constructor execution.
public class MyClass
{
    private long _a00;

    public MyClass()
    {
        Thread.Sleep(15);
        _a00 = Stopwatch.GetTimestamp();
    }

    public long Get() => Stopwatch.GetTimestamp() + _a00;
}
Finalizer
A finalizer will prevent the object from being stack-allocated as it makes the object escape onto the finalizer queue.
public class MyClass
{
    private long _a00;

    public long Get() => Stopwatch.GetTimestamp() + _a00;

    // 👇 Heap allocated with finalizer.
    ~MyClass() => _a00 = 0;
}
Disposable
Disposable objects can be stack-allocated, whether or not they are actually disposed, including when they are used with a using statement.
public class MyClass : IDisposable
{
    private long _a00;

    public void Dispose() => _a00 = 0;

    public long Get() => Stopwatch.GetTimestamp() + _a00;
}
With Measurement Optimized
One interesting behavior change happens when the loop count in the Measurement() method is increased from 100 to 100_000 or higher. This triggers Tier 1 optimization for the Measurement() method itself. During this optimization, the DoWork() method may get inlined into Measurement(), while the stack allocation optimization is no longer applied there. This results in an interesting situation where a previously 'non-allocating' method becomes allocating again due to optimizations.
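The change that triggers this behavior is a one-line edit to the Measurement() method shown earlier; everything else stays the same.

```csharp
// Sketch: raising the loop count so Measurement() itself reaches
// Tier 1, at which point DoWork() may be inlined into it without
// the stack allocation optimization being applied.
public void Measurement()
{
    long beginning = GC.GetTotalAllocatedBytes(true);
    for (int i = 0; i < 100_000; i++) // 👈 was 100
        sum += DoWork();
    var allocated = GC.GetTotalAllocatedBytes(true) - beginning;
    // On my machine this now reports a non-zero allocated byte count.
    Console.WriteLine($"Allocated: {allocated}, Sum: {sum}");
}
```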
Conclusion
Object Stack Allocation in .NET 10 is a powerful optimization feature that can reduce GC pressure by allocating certain objects on the stack rather than the heap. Through the experiments, I've identified several key characteristics and limitations of this feature.
While this feature shows great promise for performance optimization, it's important to note that these findings represent the current state in .NET 10 and may evolve in future releases. Developers should avoid making strong assumptions about exactly what will be stack-allocated, as the JIT's decisions may vary based on multiple factors including the runtime version, platform, and broader optimization context.