I believe it's in the OP.
And there ain't a whole lot left to do to it, either. You may be able to get clever with Pluck itself, but most of it is SHA256, and Reorder did a nice job on the SHA256 implementation djm34 used.
I'm guessing that Salsa could use major work, but it's already a pretty quick operation, and it's done only twice per main loop iteration... unless tuning it frees up loads of registers or something, I don't see it helping much.
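For anyone wondering why Salsa is already cheap, here's a rough sketch of the standard Salsa20 quarter-round in plain C. This is not the kernel code from the miner; I'm assuming the Salsa in Pluck is built from the usual Salsa20 quarter-round, and the names rotl32 and salsa_quarterround are just mine for illustration. It's nothing but adds, rotates and XORs on four 32-bit words, so the real cost is keeping the 16 state words in registers rather than the arithmetic.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (valid for 0 < n < 32). */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/*
 * One standard Salsa20 quarter-round, updating four state words in place.
 * Hypothetical helper for illustration; assumes the usual Salsa20
 * rotation constants (7, 9, 13, 18), not taken from the miner's source.
 */
static inline void salsa_quarterround(uint32_t *a, uint32_t *b,
                                      uint32_t *c, uint32_t *d)
{
    *b ^= rotl32(*a + *d, 7);
    *c ^= rotl32(*b + *a, 9);
    *d ^= rotl32(*c + *b, 13);
    *a ^= rotl32(*d + *c, 18);
}
```

A full Salsa20 double round is eight of these (four down the columns, four across the rows) over a 16-word state, so there isn't much left to shave unless the register footprint changes.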
As for loop unrolling tricks, you've got three looping structures in the whole damned thing. The first contains the other two, is quite complex, and iterates 4,093 times - unrolling it by any factor at all would be a disaster - leaving the other two.