You might have had the anti-boxing-and-unboxing rhetoric pounded into your head.
In the .NET world they’re often mentioned as an unfortunate side-effect of backwards compatibility. A reminder of life before generics…
…before LINQ. *shudder*
Let’s take a look at 7 of the reasons why programmers beat up on boxing.
1. Boxed values take up more memory.
A boxed value resides in the heap. That means that we need a pointer (32b or 64b) from the stack to our reference type in the heap, as well as a sync block index (32b).
This means a boxed int32 now takes up either 96b or 128b, and that’s before counting the object’s type pointer.
3-4 times the space!
Ouch!
2. Boxed values require an additional read
Values on the stack are right there. Stick and move, stick and move!
To fetch a boxed value you must first get the pointer, then look up the object.
This means boxed values are slower, in addition to being larger.
3. Short-lived values clog the heap
When an item is popped off the stack, it’s gone. Gone Daddy Gone.
In contrast, unused boxed values pile up in the heap until the garbage collector decides to do something about it.
4. Boxing and unboxing operations take time/CPU
Boxing requires allocating space in the heap and copying the value from the stack.
Unboxing is cheaper since you just need to get the address of the fields inside the boxed instance and you can skip the allocation, but you usually end up copying the value data from the heap back to the stack if you want to use it.
According to MSDN: “[Boxing] can take up to 20 times longer than a simple reference assignment. When unboxing, the casting process can take four times as long as an assignment.”
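To make the copy-in, copy-out behaviour concrete, here is a minimal sketch. The class and method names are made up for illustration. It boxes an int, changes the original variable, and shows that the box still holds the old value, because boxing copied it to the heap.

```csharp
using System;

// Hypothetical demo class; the names are illustrative only.
public static class BoxingCopyDemo
{
    // Boxes a value, mutates the original, and returns what the box still holds.
    public static int BoxedValueAfterMutation()
    {
        int original = 42;
        object boxed = original;   // box: allocate on the heap and copy 42 into the new object
        original = 43;             // changes only the local copy, not the boxed one
        return (int)boxed;         // unbox: type check, then copy the value back out
    }

    public static void Main()
    {
        Console.WriteLine(BoxedValueAfterMutation()); // prints 42
    }
}
```

The allocation happens on the `object boxed = original;` line, and the copy back out happens on the cast.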
5. Casting
Casting isn’t free, but it’s generally considered to be in the “Don’t worry about it” category of performance hits. Use a profiler, people!
The real problem with casting is that you get no compile-time type safety checks. Check ahead or be smote by InvalidCastExceptions.
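Here is a small sketch of how that smiting happens, and one way to check ahead. The names are hypothetical, and the `is` pattern syntax assumes C# 7 or later. Unboxing only succeeds to the exact value type that was boxed, so a boxed int cannot be cast straight to long even though a plain int converts to long implicitly.

```csharp
using System;

// Hypothetical demo class; the names are illustrative only.
public static class UnboxCastDemo
{
    // Returns true if unboxing a boxed int directly to long throws.
    public static bool UnboxToWrongTypeThrows()
    {
        object boxed = 42;           // a boxed System.Int32
        try
        {
            _ = (long)boxed;         // compiles fine, fails at run time
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    // "Check ahead": test the runtime type before unboxing.
    public static long ReadAsLong(object boxed)
    {
        if (boxed is int i) return i;    // unbox to int, then widen to long
        if (boxed is long l) return l;
        throw new ArgumentException("Expected a boxed int or long.", nameof(boxed));
    }

    public static void Main()
    {
        Console.WriteLine(UnboxToWrongTypeThrows()); // prints True
        Console.WriteLine(ReadAsLong(42));           // prints 42
    }
}
```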
6. Implicit Boxing
Okay, so boxing/unboxing is big, slow, and ugly…but it’s also sneaky!
Consider the following code:
    var collection = new ArrayList();
    for (var i = 99; i > 0; i--)
    {
        collection.Add(i);
    }
It looks innocuous enough, but that “Add” function ends up performing 99 box operations.
Here’s the relevant IL:
    IL_0000: newobj instance void [mscorlib]System.Collections.ArrayList::.ctor()
    IL_0005: stloc.0
    IL_0006: ldc.i4.s 99
    IL_0008: stloc.1
    IL_0009: br.s IL_001c
    IL_000b: ldloc.0
    IL_000c: ldloc.1
    IL_000d: box [mscorlib]System.Int32
    IL_0012: callvirt instance int32 [mscorlib]System.Collections.ArrayList::Add(object)
    IL_0017: pop
    IL_0018: ldloc.1
    IL_0019: ldc.i4.1
    IL_001a: sub
    IL_001b: stloc.1
    IL_001c: ldloc.1
    IL_001d: ldc.i4.0
    IL_001e: bgt.s IL_000b
This is one of the reasons why using an ArrayList will get your wrist slapped in a code review.
7. They’re (almost) unnecessary!
Most discussions on boxing/unboxing in .NET focus on old skool data structures like ArrayList and Hashtable.
These objects were the de facto (and du jour!) collections before .NET 2 came along and saved us all with generic collections like List<T> and Dictionary<TKey, TValue>.
And it was good!
Straight from the horse’s MSDN: “Generics allow you to define type-safe classes without compromising type safety, performance, or productivity.”
We get all the benefits of the ArrayList and Hashtable collections without having to box or unbox.
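As a rough sketch, here is the same countdown loop from sin #6 rewritten against List<int>. The class and method names are made up for illustration. Because Add takes an int rather than an object, the compiler has no reason to emit a box instruction, and reading the values back needs no cast.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class; the names are illustrative only.
public static class GenericListDemo
{
    // Fills a List<int> with 99..1 and sums it, with no boxing on the way in or out.
    public static int SumCountdown()
    {
        var collection = new List<int>();
        for (var i = 99; i > 0; i--)
        {
            collection.Add(i);      // Add(int): the value is stored directly, not boxed
        }

        int sum = 0;
        foreach (int value in collection)
        {
            sum += value;           // no cast and no unbox needed
        }
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(SumCountdown()); // prints 4950
    }
}
```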
If you’re living in a post-1.1 world, then there just isn’t much reason to put up with boxing and unboxing anymore.
But about that almost…
In Defense of Boxing and Unboxing
7 deadly sins aside, there are some good reasons that boxing/unboxing are still around.
This is what we came up with, with a little hint-tweet from @jonskeet.
1. Legacy Code
Pre-.NET 2.0, you’re stuck with ArrayList and Hashtable, unless you want to roll/download something custom. Box away.
2. 3rd Party Library
If their function takes an object, you end up passing a reference type. No use sulking about it.
3. .NET internals
.NET notably makes use of boxing and unboxing with the dynamic keyword, and reflection would be pretty tough without boxing. 🙂
That said, these things are typically used for productivity over performance.
4. Mixed value/reference type Collections
The Console.WriteLine overload that takes a format string and a params object[] is a great example. The params let you pass an arbitrary number of arguments that are used to populate values in your string.
These params could be of value or reference type, so you’re stuck with their common ancestor, System.Object, which ends up boxing the value types.
However, I have yet to see an application whose biggest performance bottleneck is writing to stdout.
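The conclusion mentions a ToString() trick without spelling it out. One common reading, which is an assumption here, is calling ToString() on the value type yourself before handing it to a method that takes object. The sketch below uses string.Format rather than Console.WriteLine so the result can be compared. The names are hypothetical.

```csharp
using System;

// Hypothetical demo class; the names are illustrative only.
public static class FormatBoxingDemo
{
    public static string FormatWithBoxing(int count)
    {
        // count is passed as object, so the compiler emits a box instruction here.
        return string.Format("{0} bottles of beer", count);
    }

    public static string FormatWithoutBoxing(int count)
    {
        // ToString() is called directly on the int; what gets passed is a string,
        // which is already a reference type, so nothing is boxed.
        return string.Format("{0} bottles of beer", count.ToString());
    }

    public static void Main()
    {
        Console.WriteLine(FormatWithBoxing(99));
        Console.WriteLine(FormatWithoutBoxing(99));
    }
}
```

Both methods produce the same text. The second one trades a box allocation for a string allocation, so whether it is worth doing is a question for the profiler.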
Conclusion
M$FT did a great job designing C#, and they’ve done an even better job maintaining it. Between the generic collections added in .NET 2 and the ToString() trick, I have a hard time getting behind this big, slow, and ugly contender, but it still has its place.