Each 9P has a number of 32.75 Gb/s GTIO transceivers, up to 120 in the A2577 9P package. That said, the 5P in the B2104 package would probably be better suited to this application, connecting one 5P to one HMC x4...
Each HMC v1 x4 link allows up to 15 Gb/s per pin with 64 pins in use (480 Gb/s, or 60 GB/s, full duplex). The HMC v1 spec also supports x8 links, though I'm not sure any memory was ever made for them.
Each HMC v2 x4 link allows up to 30 Gb/s per pin with 64 pins in use (960 Gb/s, or 120 GB/s, full duplex).
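The figures above can be reproduced with a little arithmetic. This is a sketch that assumes the 64 pins are split evenly between the two directions (32 transmit, 32 receive), which is what makes the quoted 480 Gb/s and 960 Gb/s work out as per-direction numbers. The helper name `link_bandwidth` is hypothetical, just for illustration.

```python
def link_bandwidth(gbps_per_pin: float, pins: int = 64) -> tuple[float, float]:
    """Return (Gb/s, GB/s) per direction for one HMC x4 link group.

    Assumption: half of the pins carry traffic in each direction,
    so per-direction bandwidth is (pins / 2) * per-pin rate.
    """
    per_direction_gbps = (pins // 2) * gbps_per_pin
    return per_direction_gbps, per_direction_gbps / 8  # 8 bits per byte


for name, rate in [("HMC v1", 15.0), ("HMC v2", 30.0)]:
    gbps, gbytes = link_bandwidth(rate)
    print(f"{name}: {gbps:.0f} Gb/s ({gbytes:.0f} GB/s) per direction")
```

Under that assumption, v1 comes out at 480 Gb/s (60 GB/s) and v2 at 960 Gb/s (120 GB/s), matching the numbers quoted.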
The other nice thing about HMC is that its latency is closer to DDR4 than to GDDR5: when the HMC is used optimally, latency can be as low as 1 ns, with 20 ns in the worst case. Using a 128-byte bus width (cn7), you can achieve 90% bus efficiency, and the logic layer makes some interesting things possible. I doubt you'll get anywhere near 90% with HBM2's 1024-bit bus.
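The ~90% figure for 128-byte transfers lines up with HMC's packet format, where data moves in 128-bit (16-byte) FLITs and every packet carries a 64-bit header plus a 64-bit tail, i.e. one FLIT of overhead. A rough sketch, assuming that fixed 16-byte overhead per data-bearing packet and ignoring flow-control packets, retries, and the small read-request packets going the other way:

```python
FLIT_BYTES = 16      # one HMC FLIT is 128 bits
OVERHEAD_BYTES = 16  # assumed: 8-byte header + 8-byte tail per packet


def packet_efficiency(payload_bytes: int) -> float:
    """Fraction of link bytes that carry payload for one data-bearing packet."""
    if payload_bytes % FLIT_BYTES:
        raise ValueError("payload must be a whole number of FLITs")
    return payload_bytes / (payload_bytes + OVERHEAD_BYTES)


for size in (16, 32, 64, 128):
    print(f"{size:>3}-byte payload: {packet_efficiency(size):.1%}")
```

At 128 bytes this gives 8/9, about 88.9%, which is plausibly where the "90%" comes from; smaller transfers pay proportionally more overhead (16 bytes of payload is only 50% efficient).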
Either way, not bad for a $500 prototype sample part (but not amazing either).
Edit:
Btw, I did some reading last night on algebraic logic minimization, along with a couple of other techniques. It's already done automatically during synthesis (though it can be turned off). Having seen the process, yes, it's something that could be added to simplify logic circuits. HOWEVER, Vivado already does it! I'm starting to question the OP, and whether this BittWare account is really BittWare at all. I might have to put my foot in my mouth in 18 days, but the more I look at it, the more I think it's not possible. Elaborate scam?
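For anyone unfamiliar with the term, the simplest flavor of algebraic minimization is factoring out a common literal, e.g. rewriting `a&b | a&c` as `a&(b|c)`, which saves an AND gate. Here is a toy Python check that the two forms are equivalent over every input combination. It only illustrates the idea; it is not how Vivado implements its optimizations, and the function names are made up for the example.

```python
from itertools import product


def original(a: bool, b: bool, c: bool) -> bool:
    # Two AND gates feeding one OR gate
    return (a and b) or (a and c)


def factored(a: bool, b: bool, c: bool) -> bool:
    # Common literal `a` factored out: one OR gate feeding one AND gate
    return a and (b or c)


def equivalent(f, g, n_inputs: int) -> bool:
    """Exhaustively compare two Boolean functions over all 2**n input vectors."""
    return all(f(*bits) == g(*bits) for bits in product((False, True), repeat=n_inputs))


print(equivalent(original, factored, 3))
```

Exhaustive checking like this only scales to a handful of inputs; real synthesis tools rely on smarter representations, which is part of why this kind of simplification is already built in.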