Author

Topic: I need help understanding secp256k1_scalar_check_overflow (from secp256k1 lib.) (Read 177 times)

staff
Activity: 4284
Merit: 8808
Sounds like you have it figured out.  I thought I would point out:  It's implemented this way instead of a more obvious way so that the comparison gets compiled to constant time code.  It's also faster than the most simple non-shortcutting comparison.

If constant time wasn't needed the more obvious approach would be faster.  (though that isn't always true-- sometimes constant time code is just faster.  In this case, though only one in 2^64 inputs would need to compare more than the first word in vartime code.)
legendary
Activity: 1456
Merit: 1175
Always remember the cause!
Note that the last 4 words of the group order (SECP256K1 constant) are equal to 0xFFFFFFFF.
RED
Foolishly set yes to 0, again!
Actually only the last 3 are max, N4 is 0xFFFFFFFE.
Ah, my bad. You are right, and it is why the red part is not foolish at all and is absolutely necessary.
legendary
Activity: 1042
Merit: 2805
Bitcoin and C♯ Enthusiast
Note that the last 4 words of the group order (SECP256K1 constant) are equal to 0xFFFFFFFF.
RED
Foolishly set yes to 0, again!
Actually only the last 3 are max, N4 is 0xFFFFFFFE.
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
RED
Foolishly set yes to 0, again!

It's not entirely foolish, for the case of non-overflow (no=1) this tests whether the 4th leg (assume we count from 0 so that d[0] is the 0th leg) is bigger than it's limit.

Inside the processor, there is a pipeline that executes each stage of an instruction, say CMP, logical AND, or logical OR. The pipeline is broken down to a certain number of stages so this gets parallelized to some extent by each of these instructions in the function preoccupying one of the stages. It's sort of like this:

Code:
              AND

---------> Step 1 ------->        ---------> Step 3 ------->

               OR

---------> Step 1 ------->        ---------> Step 3 ------->

              CMP

---------> Step 1 -------> Step 2 ---------> Step 3 -------> Step 4

              Assignment (=)

/* Big latency if variable is in a memory location, could be 1 step if it's on a register and a couple more if it's in the cache */
 
* These are just approximations, the actual number of stages could be completely different.


The same circuitry is used for each step number, that means in the whole processor thread, you can only have one instruction occupying step 1 circuits at a time, same for steps 2 3 and 4, and some of these instructions are so fast they skip some of the pipeline steps and its circuits, which means that if you arrange the instructions in such a way that you put the faster ones before the slower ones, they don't have to wait for the slow instruction to finish going through the pipeline and that makes the whole function faster.

E.g.

Code:
int f(int a, int b) {
  // some fast instruction using a and b
  // some more fast instructions using a and b
  // some assignment to memory, cache or a function call
}

Arranging the functions in this way prevents it from stalling on assignments and calls. Inserting lines full of logical ops and an assignment makes the logical statements run faster.
legendary
Activity: 1456
Merit: 1175
Always remember the cause!
I understand what it does and why, I just don't get how it does it.

SECP256K1_INLINE static int secp256k1_scalar_check_overflow(const secp256k1_scalar *a) {
    int yes = 0;
    int no = 0;

   
    no |= (a->d[7] < SECP256K1_N_7); /* No need for a > check. */
    no |= (a->d[6] < SECP256K1_N_6); /* No need for a > check. */
    no |= (a->d[5] < SECP256K1_N_5); /* No need for a > check. */
    no |= (a->d[4] < SECP256K1_N_4);
   

 
    yes |= (a->d[4] > SECP256K1_N_4) & ~no;


   
    no |= (a->d[3] < SECP256K1_N_3) & ~yes;
    yes |= (a->d[3] > SECP256K1_N_3) & ~no;
 
   no |= (a->d[2] < SECP256K1_N_2) & ~yes;
    yes |= (a->d[2] > SECP256K1_N_2) & ~no;

    no |= (a->d[1] < SECP256K1_N_1) & ~yes;
    yes |= (a->d[1] > SECP256K1_N_1) & ~no;



    yes |= (a->d[0] >= SECP256K1_N_0) & ~no;


    return yes;
}

Link

BLUE
Is any of the last 4 words (each being 32 bits unsigned int) of the input less than 0xFFFFFFFF? If true, no variable is set. Note that the last 4 words of the group order (SECP256K1 constant) are equal to 0xFFFFFFFF.
Edit:
Actually the fourth word is 0xFFFFFFFE

RED
Foolishly set yes to 0, again!
EDIT: Initialize yes to show if there is a chance for overflow.


BROWN
For the first half of the input (again 4 words each being a 32 bits int) except for the first word, apply the following check, starting with the last word:
Check how this word is compared to its respective word in the group constant and set the variables no and yes properly such that no implies that there is a chance for the input not being greater than the group order and yes implies that there is a chance for the input to be an overflow.

GREEN
For the first and least significant word, check whether it overflows compare to the least significant word of the group order and either yes is set or no is not set already.

Logic: For input to overflow, at least one word needs to exceed a corresponding word of SECP256K1 constant group order while the more significant words leave a chance for overflow, e.g. no one of such words are less than their counterpart.



legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
It appears that it represents a 256-bit integer as 8 limbs of 32-bit numbers.

The first three comparisons will toggle no overflow if the largest three limbs are not 0xFFFFFFFF, since it's the maximum representable value.

Then each "yes" line signals overflow IF a limb is greater than it's respective maximum secp256k1 value. The & ~no part on the "yes" lines basically translates to "if no overflow was signaled AND whatever's on the left evaluates to true" and stops overflow from being triggered if one of the middle or lower legs happen to exceed their limit but the upper legs are within range.

Similar logic is used for & ~yes on the "no" lines. It prevents a real overflow from being cleared if one of the middle or lower legs are within range of the higher legs are over their limit.

So at each stage it's either going to be yes=1 no=0 or yes=0 no=1, it's just an optimization that takes advantage of how x86 executes instructions. And this state can only flip on the yes = lines because no=1 (such as initial legs being 0xFFFFFFFF) passed through & ~no makes 0 and only allows for yes=0. When yes=0 is plugged in to this we get a 1 on the right which allows the left condition to have an effect.

Since the |= on each line means yes and no values can only go from 0 to 1, and not back, that's why we return the "yes" value instead of the "no" value since it may not be accurate.

The >= at the last line has the side effect of detecting the secp256k1 group order as an overflow (so that numbers must be less than it).
legendary
Activity: 1042
Merit: 2805
Bitcoin and C♯ Enthusiast
I understand what it does and why, I just don't get how it does it.
Code:
SECP256K1_INLINE static int secp256k1_scalar_check_overflow(const secp256k1_scalar *a) {
    int yes = 0;
    int no = 0;
    no |= (a->d[7] < SECP256K1_N_7); /* No need for a > check. */
    no |= (a->d[6] < SECP256K1_N_6); /* No need for a > check. */
    no |= (a->d[5] < SECP256K1_N_5); /* No need for a > check. */
    no |= (a->d[4] < SECP256K1_N_4);
    yes |= (a->d[4] > SECP256K1_N_4) & ~no;
    no |= (a->d[3] < SECP256K1_N_3) & ~yes;
    yes |= (a->d[3] > SECP256K1_N_3) & ~no;
    no |= (a->d[2] < SECP256K1_N_2) & ~yes;
    yes |= (a->d[2] > SECP256K1_N_2) & ~no;
    no |= (a->d[1] < SECP256K1_N_1) & ~yes;
    yes |= (a->d[1] > SECP256K1_N_1) & ~no;
    yes |= (a->d[0] >= SECP256K1_N_0) & ~no;
    return yes;
}
Link
Jump to: