-snip-
You need to increase the -w parameter and set -htsz depending on -w.
The main task is to fill the GPU memory as much as possible with the help of the -w parameter (and -htsz).
Then fill the remaining free GPU memory with the -p parameter, -b (not more than 2x the SM count) and -t (not more than 512).
The presets say that a good config for your RTX 3090 is:
-t 512 -b 328 -p 530 -w 31 -htsz 29
With it you fill 20436.750 MB of the 20450.000 MB free, but you need around 58 GB of host RAM to generate all the arrays.
With saved arrays you need much less memory to launch the app.
In this case your performance will be around 2^62 per card.
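
If you want to check how tightly a config fills the card, here is a rough sketch of the arithmetic in Python. The function name gpu_fill is just something I made up for illustration, it is not part of the app; the two numbers are the ones quoted above.

```python
def gpu_fill(used_mb: float, free_mb: float) -> tuple[float, float]:
    """Return (headroom in MB, fill percentage) for a given allocation."""
    if free_mb <= 0:
        raise ValueError("free_mb must be positive")
    headroom = free_mb - used_mb          # memory left unused on the card
    percent = 100.0 * used_mb / free_mb   # share of free memory that is filled
    return headroom, percent


# Figures from the RTX 3090 config above (hypothetical helper, real numbers).
headroom_mb, fill_percent = gpu_fill(20436.750, 20450.000)
print(f"headroom: {headroom_mb:.3f} MB, fill: {fill_percent:.3f}%")
```

So that config leaves only about 13 MB unused, which is the goal when tuning -w, -htsz and -p.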