@Gabe:
This is incorrect.
The uniprocessor kernel (which doesn't exist anymore, by the way) has the same implementation of Enter/LeaveCriticalSection. Its implementation of spinlocks, however, only raises the IRQL and doesn't bother setting or clearing the spinlock variable. In the big picture, this "optimization" is not worth it, because on a single-processor Pentium or later an interlocked operation costs almost the same as a non-locked one, so skipping it saves only a trivial amount.
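
To make the difference concrete, here is a rough user-mode sketch in C11. It is not the actual kernel code: `sim_spinlock`, `current_irql`, `raise_irql`, the `*_SIM` levels and the `up_*`/`mp_*` functions are all hypothetical names, and the IRQL is just a plain variable standing in for the real thing. It only illustrates that the uniprocessor flavor stops after raising the IRQL, while the multiprocessor flavor also does an interlocked test-and-set on the lock word.

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for illustration; not the real NT types or routines. */
typedef int irql_t;
enum { PASSIVE_LEVEL_SIM = 0, DISPATCH_LEVEL_SIM = 2 };

/* Simulated IRQL of the (single) current processor. */
static irql_t current_irql = PASSIVE_LEVEL_SIM;

typedef struct { atomic_flag locked; } sim_spinlock;

/* Raise the simulated IRQL and return the previous level. */
static irql_t raise_irql(irql_t new_level) {
    irql_t old = current_irql;
    current_irql = new_level;
    return old;
}

/* Uniprocessor flavor: raising the IRQL is the whole lock.
   The spinlock variable is never read or written. */
irql_t up_acquire(sim_spinlock *lock) {
    (void)lock;
    return raise_irql(DISPATCH_LEVEL_SIM);
}

void up_release(sim_spinlock *lock, irql_t old) {
    (void)lock;
    current_irql = old;
}

/* Multiprocessor flavor: raise the IRQL, then spin on an
   interlocked test-and-set until the lock word is ours. */
irql_t mp_acquire(sim_spinlock *lock) {
    irql_t old = raise_irql(DISPATCH_LEVEL_SIM);
    while (atomic_flag_test_and_set_explicit(&lock->locked, memory_order_acquire)) {
        /* busy-wait; another processor holds the lock */
    }
    return old;
}

void mp_release(sim_spinlock *lock, irql_t old) {
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
    current_irql = old;
}
```

The only thing the uniprocessor version saves is the single test-and-set in `mp_acquire` and the clear in `mp_release`, which is the trivial cost mentioned above.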