The fun part is that you may be able to get the hardware from old mining rigs. With OpenCL and CUDA, the sky is the limit.
Unfortunately, it depends on your particular application, so you would have to test the cards yourself. But don't disregard gaming GPUs, especially the newer ones that seem aimed more at general scientific computing than at 3D graphics. And those older mining rigs that people sometimes get rid of cheaply could give you a surprise...
I guess the fun part is splitting your workload into smaller data chunks so that things actually work...
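To make the chunking idea a bit more concrete, here is a minimal sketch in plain Python. Everything in it is hypothetical: `chunk_size_for_budget`, `iter_chunks`, and `process_in_chunks` are made-up helper names, and `kernel` stands in for whatever OpenCL or CUDA call you would actually run on each chunk. It only shows the bookkeeping, assuming each item takes a known, fixed number of bytes on the card.

```python
from typing import Callable, Iterator, List, Sequence


def chunk_size_for_budget(bytes_per_item: int, memory_budget_bytes: int) -> int:
    """Largest number of items that fits in the given memory budget."""
    if bytes_per_item <= 0:
        raise ValueError("bytes_per_item must be positive")
    size = memory_budget_bytes // bytes_per_item
    if size < 1:
        raise ValueError("a single item does not fit in the budget")
    return size


def iter_chunks(items: Sequence, chunk_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def process_in_chunks(
    items: Sequence,
    chunk_size: int,
    kernel: Callable[[Sequence], List],
) -> List:
    """Run kernel on each chunk in turn and concatenate the results.

    kernel is a placeholder for the real per-chunk GPU work.
    """
    results: List = []
    for chunk in iter_chunks(items, chunk_size):
        results.extend(kernel(chunk))
    return results


if __name__ == "__main__":
    data = list(range(10))
    # Pretend each item needs 4 bytes and the card has 16 bytes free.
    size = chunk_size_for_budget(bytes_per_item=4, memory_budget_bytes=16)
    doubled = process_in_chunks(data, size, lambda chunk: [x * 2 for x in chunk])
    print(size, doubled)
```

On a real card you would leave some headroom below the free memory, since the driver and intermediate buffers take space too.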
I've considered it, but the problem is that training models requires cards with at least 16GB of memory. Jukebox, for example, only works with a 16GB card or larger, at least for the 5B model. And older cards like the K80 are much slower than modern ones.
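As a rough sanity check on the memory point, here is a back-of-the-envelope sketch. It counts only the weights of a model with 5 billion parameters, and the bytes-per-parameter values (2 for fp16, 4 for fp32) are assumptions about precision, not something Jukebox is confirmed to use. `weights_gib` is a made-up helper name.

```python
def weights_gib(n_params: int, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in GiB (1 GiB = 1024**3 bytes).

    Ignores activations, gradients, optimizer state, and framework overhead,
    all of which add to the real footprint.
    """
    return n_params * bytes_per_param / 1024**3


if __name__ == "__main__":
    n_params = 5_000_000_000  # assumed: "5B" means 5 billion parameters
    for label, nbytes in (("fp16", 2), ("fp32", 4)):
        print(f"{label}: {weights_gib(n_params, nbytes):.2f} GiB")
```

That works out to roughly 9.3 GiB in fp16 and 18.6 GiB in fp32 for the weights alone, so a smaller card gets tight quickly once anything else has to sit in memory alongside them.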
Is there another setup that is comparable to 4x V100 but at a fraction of the cost? I'd prefer not to pay tens of thousands of dollars if possible; something like $50,000 is too much.