In state-of-the-art technologies, metal lines are currently the biggest contributor to delay.
The term "delay" here on earth has two components, R and C. We consider the metal line to contribute some capacitance C along with the more significant input C of the next gate, and we consider R to be 1/gm of the device driving said capacitance. I'll ignore metal resistance: your gm is going to be low, so 1/gm will far outweigh Rmetal. Now replacing an active PMOS device (that has transconductance) with a dummy load (which does not, but does have capacitance) is going to increase your RC time constant if I use earth math. gm down means R up, and a bigger R times a bigger C gives a much bigger time constant.
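To put rough numbers on that, here is a minimal sketch of the same first-order estimate (tau = R*C with R taken as 1/gm, metal resistance ignored). The gm and C values are made up purely to show the trend, they are not taken from any real process or from your design.

```python
def rc_time_constant(gm_siemens: float, c_farads: float) -> float:
    """First-order delay estimate: tau = R * C, with R taken as 1/gm."""
    r_ohms = 1.0 / gm_siemens
    return r_ohms * c_farads


# Hypothetical numbers, chosen only to illustrate the trend.
tau_active = rc_time_constant(gm_siemens=1.0e-3, c_farads=2.0e-15)  # active pull-up
tau_dummy = rc_time_constant(gm_siemens=0.5e-3, c_farads=3.0e-15)   # less drive, more load C
slowdown = tau_dummy / tau_active

print(f"tau_active = {tau_active:.2e} s")
print(f"tau_dummy  = {tau_dummy:.2e} s")
print(f"slowdown   = {slowdown:.1f}x")
```

Halving gm while adding 50% more capacitance multiplies the two penalties together, which is the point: the R and C hits compound.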
Trying to toggle around Vth is just reducing swing, so it seems your plan may be to use slow devices but swing in a narrower range hoping for a speed boost? As NotFuzzyWarm said, it will be a disaster over process and temperature corners - what you are doing is reducing noise margin, which may work in a simulation of a single flop but will be a catastrophic failure when you get some noise in the system.
Also, WTF? How are you going to keep the low end of the swing from going far below Vth with only NMOS? The only answer is V=I*R, meaning that you draw some current through a low-gm device so the total voltage when the pulldown is on is just slightly less than Vth. That completely sucks! You need high gm for speed, but that would mean you burn a shitload of DC current - just like 1971's depletion-loaded common-source logic, as I mentioned before. You cannot get a small swing on the low side without either burning significant current or having gm so low that the speed is crap. This blows both your J/GH claim and your speed claim at the same time. Your engineer should know better, maybe it's the marketing guys that are being taken for a ride here....? Sorry to call you out bro, I'm guilty of the same from time to time lol.
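Here is a back-of-envelope sketch of that V=I*R tradeoff. It treats the pulldown as a plain resistor of 1/gm (the same crude approximation as above), so the static current needed to hold a given low level is I = V_low * gm. The supply, low level, gm values, gate count, and the "half the gates sit low" duty assumption are all hypothetical placeholders.

```python
def static_power_per_gate(v_low: float, gm_siemens: float, vdd: float) -> float:
    """DC power drawn from vdd while the pulldown holds the output at v_low.

    Approximates the pulldown as a resistor of 1/gm, so I = v_low * gm.
    """
    i_static = v_low * gm_siemens
    return vdd * i_static


# Hypothetical operating point, for illustration only.
VDD = 0.8          # volts
V_LOW = 0.3        # volts, low level held by the I*R drop
N_GATES = 100_000
FRACTION_LOW = 0.5  # assume half the gates sit in the low state at any instant

for gm in (1.0e-4, 1.0e-3):
    p_gate = static_power_per_gate(V_LOW, gm, VDD)
    p_total = p_gate * N_GATES * FRACTION_LOW
    print(f"gm = {gm:.0e} S: {p_gate:.2e} W per gate, {p_total:.2f} W total")
```

In this model the static power scales linearly with gm, so every bit of drive strength you add for speed comes straight out of the power budget.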
OK, brass tacks time. Try this. Put 100k of your flops in a ripple counter configuration. Tie each one's power supply and ground pins to the next with 20 milliOhm to simulate resistive drop in your metal lines. I'm generously estimating your power rail metal as a big fat 2um run in a 20-50 nm process. I'm OK with you tapping in higher metal every few thousand gates, but you can't cheat on via resistance. 50 Ohm/via at least.
Now take your 100k ripple counter and clock it at 1 GHz. Take its output and build yourself a simple compare - XOR is fine, but you've got to remember the supply resistances in this stage as well. Compare the 1 GHz counter with the output of a second counter running at a non-evenly divisible frequency, say 77 MHz. Then clock this compare result into one final register at 1 GHz as well, and for extra points you'll want to buffer the clocks between the two counters - your real chip would have a big-ass clock tree, so no ideal wires in the clock lines.
Tell me how many times you get a false match running at ~300mV noise margin. The 2^N transitions on your fast ripple counter are going to demolish your rail, easily dropping 150-200mV and completely corrupting any noise margin you think you have, even at this tiny scale. Then re-sim at 125C.
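If you want a sanity check before firing up the simulator, here is a rough static IR-drop sketch for one stretch of daisy-chained rail between taps. It uses the 20 milliOhm per-segment figure from above and assumes the rail is fed from one end only; the tap spacing and the average current per flop are my own placeholder guesses. It leaves out via resistance, ground-rail drop, and switching transients, all of which make the real picture worse.

```python
def far_end_drop(n_flops: int, r_segment_ohms: float, i_per_flop_amps: float) -> float:
    """Static voltage drop at the last flop of a rail fed from one end.

    Segment k (counting from the tap) carries the current of every flop
    at or beyond position k, so the per-segment drops accumulate.
    """
    drop = 0.0
    for k in range(n_flops):
        flops_downstream = n_flops - k
        drop += r_segment_ohms * i_per_flop_amps * flops_downstream
    return drop


# Hypothetical values except R_SEGMENT, which is the 20 milliOhm figure above.
TAP_SPACING = 2000   # flops between taps into higher metal (assumed)
R_SEGMENT = 0.020    # ohms per flop-to-flop rail segment
I_PER_FLOP = 5.0e-6  # amps, assumed average draw per flop

drop = far_end_drop(TAP_SPACING, R_SEGMENT, I_PER_FLOP)
print(f"far-end supply drop: {drop * 1e3:.1f} mV")
```

The drop grows roughly with the square of the tap spacing, so stretching the distance between taps gets expensive fast.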
I'd love to see your sim traces if you don't mind posting them here, or show me a spectre run on youtube.
And post the patents. Please realize I've backed off the pitchfork a bit here to help you show yourself why your pitch is not feasible. Call the Sand Hill gang if you need VC money to waste, but don't try to take $100 each from a bunch of people who can't properly dispute your BS. We're used to scammers sniffing around looking for a money grab, and unfortunately I see no evidence that you are not the same in a suit.