@voo: A correct implementation of double-checked locking needs memory fences, so that the processor actually flushes its write queue and the caches are kept properly coherent. Done properly, the result is about the same code as you would execute if you had just used a critical section.
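To make that concrete, here is a minimal sketch of what a fenced double-checked lock can look like in C++11, using acquire/release atomics rather than raw fence instructions. `Widget`, `g_instance`, `g_init_mutex` and `get_instance` are hypothetical names made up for illustration, not anything from the linked articles, and the object is deliberately never freed.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical payload type, for illustration only.
struct Widget {
    int value = 42;
};

std::atomic<Widget*> g_instance{nullptr};  // pointer published to all threads
std::mutex g_init_mutex;                   // serializes first-time construction

Widget* get_instance() {
    // First check, no lock. The acquire load pairs with the release store
    // below, so a thread that sees a non-null pointer also sees the fully
    // constructed Widget behind it.
    Widget* p = g_instance.load(std::memory_order_acquire);
    if (p == nullptr) {
        std::lock_guard<std::mutex> lock(g_init_mutex);
        // Second check, under the lock: another thread may have won the race.
        p = g_instance.load(std::memory_order_relaxed);
        if (p == nullptr) {
            p = new Widget();
            // Release store: the Widget's fields are written before the
            // pointer becomes visible to other threads.
            g_instance.store(p, std::memory_order_release);
        }
    }
    assert(p != nullptr);
    return p;
}
```

The slow path takes the mutex, and the fast path still pays for an acquire load, which is why the saving over simply taking the lock tends to be smaller than people expect.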
See "C++ and the Perils of Double-Checked Locking" by Scott Meyers and Andrei Alexandrescu: www.aristeia.com/.../DDJ_Jul_Aug_2004_revised.pdf
See also Herb Sutter's 2007 presentation to the Northwest C++ Users' Group, "Machine Architecture: Things Your Programming Language Never Told You": www.youtube.com/watch
You might get away with it on x86, which has a strong memory model, but ARM has a *weak* memory model and you may well run into trouble. See blogs.msdn.com/.../51445.aspx for what I mean by memory model.
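As a rough illustration of the difference, the sketch below passes a value between two threads using explicit fences. On x86 these fences typically cost no extra instructions, because the hardware already keeps stores in order with other stores and loads in order with other loads (the fences still stop the compiler from reordering). On ARM they typically turn into real barrier instructions, and without them the consumer could see the flag set before the payload. All names here (`g_payload`, `g_ready`, `producer`, `consumer`, `run_once`) are hypothetical.

```cpp
#include <atomic>
#include <thread>

int g_payload = 0;                 // plain, non-atomic data
std::atomic<bool> g_ready{false};  // flag announcing that the data is written

void producer() {
    g_payload = 42;
    // Release fence: the write to g_payload may not move below the flag store.
    std::atomic_thread_fence(std::memory_order_release);
    g_ready.store(true, std::memory_order_relaxed);
}

int consumer() {
    // Spin until the flag is observed.
    while (!g_ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    // Acquire fence: the read of g_payload may not move above the flag load.
    std::atomic_thread_fence(std::memory_order_acquire);
    return g_payload;
}

// Runs one producer and one consumer and returns what the consumer saw.
int run_once() {
    int seen = 0;
    std::thread reader([&seen] { seen = consumer(); });
    std::thread writer(producer);
    reader.join();
    writer.join();
    return seen;
}
```

With both fences in place the C++ memory model guarantees the consumer reads the payload the producer wrote, whatever the hardware underneath.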
.NET's memory model does, I believe, guarantee that double-checked locking implemented naively by the programmer runs correctly on the hardware. See stackoverflow.com/.../double-checked-locking-in-net for details.