ProgPow looks like a good algorithm for applying GCN DPP instructions. Thanks for the info. I will work on it.
I took a quick look at those two. x21s has one lyra2v2 step, which is fairly well suited to GCN optimization. However, I would rather not keep working on lyra2v2-related algorithms, because there is a bin file, originally from lyclminer, that is already fully optimized. Some closed-source miners simply copy that file and use it as their own. That seems like IP theft, though I am not sure how illegal it is. As for x16rt, it does not seem to have any lyra2-related steps, and I am not sure whether its other steps can be optimized with GCN cross-lane instructions.