Yes, you can, as long as each thread is working on a different hash.
Example with 4 threads and 4 hashes:
HASH1: x1->x2->x3->
HASH2: x4->x5->x6->
HASH3: x7->x8->x9->
HASH4: x10->x11
Swap the 4 hashes
HASH4: x1->x2->x3->
HASH1: x4->x5->x6->
HASH2: x7->x8->x9->
HASH3: x10->x11
Swap the 4 hashes
HASH3: x1->x2->x3->
HASH4: x4->x5->x6->
HASH1: x7->x8->x9->
HASH2: x10->x11
Swap the 4 hashes
HASH2: x1->x2->x3->
HASH3: x4->x5->x6->
HASH4: x7->x8->x9->
HASH1: x10->x11
Complete
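The rotation above can be sketched in a few lines of Python. This is only an illustration of the schedule, not miner code: the stage grouping is copied from the example, and the function names (rotation_schedule, algos_per_hash) are made up for this sketch. It also assumes the pipeline is already running, since only HASH1 sees x1 to x11 in order within these four rounds.

```python
# Sketch of the rotation shown above. The stage grouping is taken from the
# example; the function names are hypothetical.
STAGES = [
    ["x1", "x2", "x3"],   # thread 1
    ["x4", "x5", "x6"],   # thread 2
    ["x7", "x8", "x9"],   # thread 3
    ["x10", "x11"],       # thread 4
]


def rotation_schedule(num_threads=4):
    """schedule[r][t] = hash slot (1-based) that thread t works on in round r."""
    return [
        [(t - r) % num_threads + 1 for t in range(num_threads)]
        for r in range(num_threads)
    ]


def algos_per_hash(schedule, stages=STAGES):
    """Collect, per hash slot, the algorithms applied to it over all rounds."""
    seen = {slot: [] for slot in range(1, len(stages) + 1)}
    for round_slots in schedule:
        for thread, slot in enumerate(round_slots):
            seen[slot].extend(stages[thread])
    return seen


if __name__ == "__main__":
    schedule = rotation_schedule()
    for r, slots in enumerate(schedule):
        # Print each round in the same layout as the example.
        print(f"round {r + 1}:", ", ".join(
            f"HASH{slot}: {'->'.join(STAGES[t])}" for t, slot in enumerate(slots)))
```

In every round each thread holds a different hash slot, so no two threads touch the same hash, and after 4 rounds every slot has been through all 11 algorithms once.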
What if you have 4 GPUs in your rig and each thread is executed on a separate GPU? x11 is then reduced to x2+ per GPU, since each card only runs 2 or 3 of the 11 algorithms.
Advantages:
-Smaller kernels, better register usage, less memory needed, more cache hits, more parallel threads
-Hybrid mining is possible (run the AES algos on the AMD card, and the rest on NVIDIA)
Disadvantages:
-Intermediate results must be passed from GPU to GPU over PCIe, through host memory and back.
-You need 4 GPUs (but the scheme can be scaled to support x GPUs)
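To illustrate the scaling point, here is a small sketch of how the 11 chained algorithms could be divided over x GPUs. The helper name split_chain is hypothetical, and an even split by algorithm count is only an assumption: a real miner would probably balance by how long each algorithm takes rather than by how many there are.

```python
def split_chain(num_algos, num_gpus):
    """Split algorithms x1..xN into num_gpus consecutive groups of near-equal size."""
    if not 1 <= num_gpus <= num_algos:
        raise ValueError("need between 1 and num_algos GPUs")
    base, extra = divmod(num_algos, num_gpus)
    groups = []
    start = 1
    for gpu in range(num_gpus):
        # The first `extra` GPUs take one algorithm more than the rest.
        size = base + (1 if gpu < extra else 0)
        groups.append([f"x{i}" for i in range(start, start + size)])
        start += size
    return groups


if __name__ == "__main__":
    for gpus in (2, 3, 4):
        print(gpus, "GPUs:", split_chain(11, gpus))
```

With 4 GPUs this gives the same 3/3/3/2 grouping as the example above.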