Well, you're running with more threads than I was able to test (the most I got to was 8 on my Core i7-950), so I'm not entirely sure why this is. It could be that all threads are trying to access a single variable that determines how long the sieve is woven for. If so, the threads could end up blocking each other and spend a significant portion of their time idle. In that case it would be far more effective to have a sieve weaving time variable for each individual Boost thread, but I'm not sure how to do this, as I am entirely unfamiliar with the Boost library (I'm not a C++ programmer).
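To make the per-thread idea concrete, here is a minimal sketch in which each worker keeps its own timing variable instead of sharing one. It uses standard `std::thread` rather than Boost, and it is not taken from the miner's source: `run_worker`, `run_all`, `WorkerResult`, the starting budget of 1000 microseconds, and the smoothing rule are all hypothetical names and values chosen for illustration. Whether the real code actually shares such a variable is unconfirmed.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical summary of what one worker did.
struct WorkerResult {
    std::uint64_t rounds = 0;          // sieve rounds completed
    std::int64_t final_budget_us = 0;  // this worker's own time budget at exit
};

// Each worker tunes a time budget held in a local variable, so no other
// thread ever reads or writes it and no lock or atomic is needed.
WorkerResult run_worker(int rounds, std::int64_t initial_budget_us) {
    std::int64_t budget_us = initial_budget_us;  // private to this thread
    WorkerResult result;
    for (int i = 0; i < rounds; ++i) {
        const auto start = std::chrono::steady_clock::now();

        // Placeholder standing in for the real sieving work.
        volatile std::uint64_t sink = 0;
        for (std::uint64_t k = 0; k < 1000; ++k) {
            sink = sink + k;
        }

        const auto elapsed_us =
            std::chrono::duration_cast<std::chrono::microseconds>(
                std::chrono::steady_clock::now() - start).count();

        // Assumed tuning rule: smooth the budget toward the measured time.
        budget_us = (budget_us * 7 + elapsed_us) / 8;
        ++result.rounds;
    }
    result.final_budget_us = budget_us;
    return result;
}

// Starts n_threads workers; each writes only its own slot in the results.
std::vector<WorkerResult> run_all(unsigned n_threads, int rounds) {
    std::vector<WorkerResult> results(n_threads);
    std::vector<std::thread> threads;
    threads.reserve(n_threads);
    for (unsigned t = 0; t < n_threads; ++t) {
        threads.emplace_back([&results, t, rounds] {
            results[t] = run_worker(rounds, 1000);
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return results;
}
```

Calling `run_all(8, 100)` would start eight workers that never touch each other's budget. If the slowdown at high thread counts comes from contention on a shared variable, a layout like this should remove it; if it does not help, the cause lies elsewhere.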
Actually, with my FX 8350 I get just a bit less speed with your code than with the official 0.11, and with my Sempron 145 I get around 20% less. Maybe it doesn't work well on AMD architectures?