- Optimize CPU/GPU exchange
- Add missing ECC optimizations (some symmetries and endomorphism)
- Add support for the GPU funnel shift, which should speed up SHA (but I need to find a board with compute capability 3.5 or higher; mine is 3.0).
Have you already implemented all of steps 1, 2, and 3, or is there still room for further improvement?
- Support for funnel shift not yet done.
- p-iG/p+iG done.
- k.(x,y)/-k.(x,-y) done.
- Endomorphism is in progress.
- CPU/GPU exchange done, but it still needs improvement (it is difficult to find a good compromise with multi-prefix search).
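
For readers who want to see why the two symmetries above save work, here is a rough sketch in plain Python. It is not the actual CPU or GPU code. The helper names (`add_sub`, `neg`, `endo`, `mul`) are made up for illustration, and the arithmetic is the simple affine textbook form with no handling of the point at infinity. The idea it tries to show: `p+iG` and `p-iG` can share one modular inversion because `iG` and `-iG` have the same x, and the key `-k` comes for free from `(x,-y)`. The endomorphism part only shows that multiplying x by a cube root of unity gives another point on the curve; which scalar multiple it corresponds to is left out.

```python
# secp256k1 public domain parameters (curve y^2 = x^3 + 7 over GF(P)).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def inv(a):
    """Modular inverse in GF(P); this is the expensive operation."""
    return pow(a % P, P - 2, P)


def on_curve(pt):
    x, y = pt
    return (y * y - x * x * x - 7) % P == 0


def neg(pt):
    """-k.(x,y) is (x,-y): the negated key costs no point arithmetic."""
    x, y = pt
    return x, (-y) % P


def add(p1, p2):
    """Affine add or double. Assumes p1 != -p2 (no point at infinity)."""
    (x1, y1), (x2, y2) = p1, p2
    if p1 == p2:
        s = 3 * x1 * x1 * inv(2 * y1) % P
    else:
        s = (y2 - y1) * inv(x2 - x1) % P
    x3 = (s * s - x1 - x2) % P
    return x3, (s * (x1 - x3) - y1) % P


def mul(k, pt):
    """Plain double-and-add, for 1 <= k < N. Only used for checking."""
    result = None
    while k:
        if k & 1:
            result = pt if result is None else add(result, pt)
        pt = add(pt, pt)
        k >>= 1
    return result


def add_sub(p, q):
    """Return (p + q, p - q) with ONE inversion (assumes p.x != q.x)."""
    (x1, y1), (x2, y2) = p, q
    dx_inv = inv(x2 - x1)          # shared: q and -q have the same x
    out = []
    for y in (y2, -y2):            # +q first, then -q
        s = (y - y1) * dx_inv % P
        x3 = (s * s - x1 - x2) % P
        out.append((x3, (s * (x1 - x3) - y1) % P))
    return out[0], out[1]


def cube_root_of_unity():
    """Find a non-trivial beta with beta^3 = 1 mod P (P = 1 mod 3)."""
    g = 2
    while pow(g, (P - 1) // 3, P) == 1:
        g += 1
    return pow(g, (P - 1) // 3, P)


BETA = cube_root_of_unity()


def endo(pt):
    """(beta*x, y) is again on the curve, since (beta*x)^3 = x^3."""
    x, y = pt
    return BETA * x % P, y
```

With a starting point `p = mul(1000, G)` and `iG = mul(7, G)`, `add_sub(p, iG)` gives the points for keys 1007 and 993 from a single inversion, and `neg()` of either one gives the point for the key `N - k`.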