I think that your square roots have too few bits.
In your current scheme x1 has 126 bits. Then x2^2 < 2*x1, hence x2 has only about 64 bits. Since you publish x2*G, the baby-step giant-step algorithm would need only ~2^32 steps. It is even worse: since the highest 32 bits of fuzzvalue are probably 0, x2 is almost trivially breakable. Even x1 can be broken with ~2^50 complexity if one has a good guess what the value is.
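For intuition, here is a minimal baby-step giant-step sketch over a toy multiplicative group mod a 64-bit prime (not the actual curve; the function and parameters are illustrative). It recovers a 32-bit exponent with a table of only ~2^16 entries, which is the 2^(n/2) time/memory cost being discussed:

```python
from math import isqrt

def bsgs(g, h, p, n):
    """Find x < n with pow(g, x, p) == h in ~sqrt(n) time AND ~sqrt(n) memory."""
    m = isqrt(n) + 1
    baby = {}
    e = 1
    for j in range(m):              # baby steps: store g^j -> j
        baby.setdefault(e, j)
        e = e * g % p
    factor = pow(g, -m, p)          # g^(-m) mod p (Python 3.8+)
    gamma = h % p
    for i in range(m):              # giant steps: check h * g^(-i*m)
        if gamma in baby:
            return i * m + baby[gamma]
        gamma = gamma * factor % p
    return None

p = 2**64 - 59                      # a 64-bit prime
g = 3
secret = 0x1234ABCD                 # 32-bit exponent
found = bsgs(g, pow(g, secret, p), p, 1 << 32)
assert found == secret              # ~2^16 table entries, ~2^16 giant steps
```

The same square-root tradeoff applies on a curve, which is why a 64-bit x2 gives only ~2^32 security.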
So I think you should make the square roots at least about 250 bits, so that using baby-step giant-step is hard enough. This means that you need more than 512 bit ECC.
Updated paper to 768 bit curve for now, which brings square roots up to x1@220 and x2@190.
Baby-step giant-step algorithm is memory intensive. Eve would be spending 2^(190/2)=2^95 on storage and also a lot of CPU for 768-bit ECC, just to see a single root (not even the full single output).
Also don't use the scheme where x2 only has half the bits of x1.
x2 generally doesn't get that low; it retains about 87% as many bits as x1. Note that I am not rooting x1 to get x2, I am rooting |x1^2 - fuzzvalue|.
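The ~87% figure is reproducible if I assume (my guess, for illustration) that x1 agrees with sqrt(fuzzvalue) only in its top bits, with its low k bits randomized: then |x1^2 - fuzzvalue| is around 2^(220+k), so x2 has roughly (220+k)/2 bits. With the hypothetical parameters below that lands near the x1@220 / x2@190 figures:

```python
import secrets
from math import isqrt

FUZZBITS = 440       # fuzzvalue size, so sqrt(fuzzvalue) has ~220 bits (assumed)
RAND_LOW = 160       # assumed number of randomized low bits of x1

fuzzvalue = secrets.randbits(FUZZBITS) | (1 << (FUZZBITS - 1))
s = isqrt(fuzzvalue)                               # ~220-bit "true" root
x1 = (s >> RAND_LOW << RAND_LOW) | secrets.randbits(RAND_LOW)
x2 = isqrt(abs(x1 * x1 - fuzzvalue))               # root of the remainder
print(x1.bit_length(), x2.bit_length())            # ~220 and ~190 bits
```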
If you have multiple outputs you could distribute the squares over the outputs. You could use a single square root x_i for each output, such that the first 64 bits of x_i^2 equal the satoshi amount. Choose the lower bits of x_i at random except for the last one. Then publish the square roots and the difference Δ = sum inputs - sum x_i^2.
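A sketch of that per-output construction (the 440-bit field width and the helper are my assumptions, and I ignore the special handling of the last x_i): pick a square whose top 64 bits are the amount and whose lower bits are random, then publish the roots and Δ.

```python
import secrets
from math import isqrt

SQ_BITS = 440      # assumed total bit length of the x_i^2 field

def root_for_amount(amount):
    """Return x such that the top 64 bits of x*x (in a SQ_BITS field) equal amount."""
    low = SQ_BITS - 64
    while True:
        target = (amount << low) | secrets.randbits(low)
        x = isqrt(target)              # x*x = target - r with r < 2*x + 1
        if (x * x) >> low == amount:   # retry on the rare borrow past bit 64
            return x

amounts = [50_000, 1_234_567, 21_000_000]
roots = [root_for_amount(a) for a in amounts]
total_inputs = sum(x * x for x in roots) + secrets.randbits(64)  # toy input sum
delta = total_inputs - sum(x * x for x in roots)                 # published Δ
```

Each published x_i commits to its output's amount in the high bits, and a single Δ closes the balance for the whole transaction.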
In your approach you have basically removed sum of squares and settled on a single square per output and Δ per txn?
I was also thinking of a less dramatic optimisation with a dominant x1 that is reused for each output (based on the log mean), requiring only an additional x2 and Δ per output.
BTW, how much information would leak if Δ = fuzzvalue - floor(sqrt(fuzzvalue))^2 is known (if there is only one output and the simple scheme is used)? To avoid leaking the magnitude of x, one could make sure that Δ is not the smallest remainder but uniformly distributed with (fuzzbits+64)/2 bits.
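To see the leak concretely: with the smallest remainder, Δ < 2*sqrt(fuzzvalue) + 1, so Δ's bit length gives away roughly half the bit length of fuzzvalue (toy sketch, sizes arbitrary):

```python
import secrets
from math import isqrt

for bits in (100, 200, 300):
    v = secrets.randbits(bits) | (1 << (bits - 1))
    delta = v - isqrt(v) ** 2          # smallest remainder: delta < 2*sqrt(v) + 1
    print(bits, delta.bit_length())    # typically ~bits/2, revealing v's magnitude
```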
The bit lengths of everything will be revealed. I don't see a way to have Δ both uniformly distributed and still complete the exact output sum with a square x_i. If you mean to publish only one Δ for the whole transaction, then each output would be constrained to only ever spend square amounts. I guess you mean using multiple outputs to send the amount, to avoid revealing a Δ per output?