With (fully custom) ASICs, however, you can just match your exact routing needs with wires, which should take care of the routing problems.
I'm certainly not an expert in that area, but I'd expect the overhead of intermediate result storage (in a rolled design) to outweigh the routing overhead (in an unrolled design).
As I already stated above, a rolled design might still be useful to increase yield by confining defects to smaller functional units.
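For anyone unfamiliar with the rolled/unrolled distinction, here is a minimal software sketch of a "rolled" SHA-256 core, written in Python purely for illustration (it is not anyone's actual HDL). One round unit is reused 64 times, and the intermediate state that has to be stored between iterations is the eight working words plus a 16-word message-schedule window. An unrolled design would instead chain 64 copies of the round logic as pipeline stages. The function names (`round_unit`, `schedule_next`, `compress_rolled`) are hypothetical; the constants are derived from primes as FIPS 180-4 defines them rather than typed in by hand.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF

def first_primes(n):
    """Return the first n primes by trial division (fine for n = 64)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes

def icbrt(n):
    """Exact integer cube root: largest x with x**3 <= n."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

PRIMES = first_primes(64)
# Initial hash value: first 32 fractional bits of sqrt of the first 8 primes.
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]
# Round constants: first 32 fractional bits of cbrt of the first 64 primes.
# In hardware these 64 words have to live somewhere (ROM, BRAM, or logic).
K = [icbrt(p << 96) & MASK for p in PRIMES]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def round_unit(state, k, w):
    """One SHA-256 round: the single piece of logic a rolled core reuses."""
    a, b, c, d, e, f, g, h = state
    s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
    ch = (e & f) ^ (~e & g)
    t1 = (h + s1 + ch + k + w) & MASK
    s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
    maj = (a & b) ^ (a & c) ^ (b & c)
    t2 = (s0 + maj) & MASK
    return ((t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g)

def schedule_next(window):
    """Next message word W[t] from the 16-word window W[t-16..t-1]."""
    x, y = window[1], window[14]  # W[t-15], W[t-2]
    s0 = rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)
    s1 = rotr(y, 17) ^ rotr(y, 19) ^ (y >> 10)
    return (window[0] + s0 + window[9] + s1) & MASK

def compress_rolled(h, block):
    """Rolled compression: 64 iterations of one round unit.

    Carried between iterations: 8 working words and a 16-word schedule
    window (24 x 32 = 768 bits) plus a round counter; the input chaining
    value h is also held for the final addition.
    """
    window = list(struct.unpack(">16I", block))
    state = tuple(h)
    for t in range(64):
        if t < 16:
            w = window[t]
        else:
            w = schedule_next(window)
            window = window[1:] + [w]  # behaves like a shift register
        state = round_unit(state, K[t], w)
    return [(x + y) & MASK for x, y in zip(h, state)]

def sha256(msg: bytes) -> bytes:
    bit_len = len(msg) * 8
    padded = msg + b"\x80" + b"\x00" * ((55 - len(msg)) % 64) + struct.pack(">Q", bit_len)
    h = list(H0)
    for i in range(0, len(padded), 64):
        h = compress_rolled(h, padded[i:i + 64])
    return struct.pack(">8I", *h)

# Sanity check against the standard library.
assert sha256(b"abc") == hashlib.sha256(b"abc").digest()
```

The rolled core needs 64 clocks per block but only one copy of the round logic; an unrolled pipeline accepts a new block every clock at the price of 64 copies of `round_unit` and `schedule_next`, each with its own pipeline registers, which is where the storage-versus-routing trade-off above comes from.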
That's only if you get a real ASIC. A structured ASIC (SASIC) still screws you the same way, since it's just a hardwired version of the FPGA.
I'm definitely not an ASIC developer, so correct me if I'm wrong here. From the small number of simple designs I've done and laid out in Cadence, routing in an ASIC is definitely not free, especially if you're really trying to push the boundaries of all your well, gate, pad, etc. keepouts to maximize density on the silicon. If you're not careful with your planning and design, your number of metal layers can jump way up, which definitely adds to the cost of the design, even apart from the possible performance penalties of haphazard routing. I've only ever done work at 90nm and above, so I don't know how difficult the routing would be at 45nm or whatever node a BTC ASIC would end up being designed at, but small rolled cores might be more effective in an ASIC as well as in an FPGA. Someone would actually have to look into it to know. Maybe Vladimir could shed some light on the subject.
An ASIC gives you maximum flexibility in the design. The biggest problem with FPGAs is that FPGA designs must use the DSP blocks and the BRAM for storage of the constants. Routing is still a problem on ASICs, but _you_ design the routing. You no longer have to route around the FPGA's fixed blocks, and you no longer have to pay for hardware you'll never use (for example, the high-speed serial I/O fabric isn't cheap, and neither is the onboard Ethernet controller).
An ASIC has a huge upfront design cost, but if we could sell 250k ASICs (roughly more chips than all the FPGAs currently used for mining put together), it would be cheaper per MH/s over the next 10 years by an order of magnitude.
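The amortization argument can be sketched as a back-of-the-envelope model. Every number below is a hypothetical placeholder, not a quote: the upfront cost, per-chip cost, per-chip hash rate, and the FPGA's dollars per MH/s are assumptions to be replaced with real figures. The class name `AsicPlan` is likewise made up for illustration. The model counts hardware cost only and ignores power, which over a 10-year horizon would also matter.

```python
import math
from dataclasses import dataclass

@dataclass
class AsicPlan:
    upfront_cost: float    # one-time design/mask cost, dollars (hypothetical)
    unit_cost: float       # marginal cost per chip, dollars (hypothetical)
    mhash_per_chip: float  # MH/s per chip (hypothetical)

    def cost_per_mhash(self, units: int) -> float:
        """Hardware dollars per MH/s with the upfront cost spread over `units` chips."""
        total = self.upfront_cost + self.unit_cost * units
        return total / (self.mhash_per_chip * units)

    def break_even_units(self, fpga_cost_per_mhash: float) -> int:
        """Smallest volume at which the ASIC beats an FPGA on dollars per MH/s.

        From U/(m*N) + c/m < F  =>  N > U / (m*F - c), which requires m*F > c.
        """
        margin = self.mhash_per_chip * fpga_cost_per_mhash - self.unit_cost
        if margin <= 0:
            raise ValueError("ASIC never catches up: marginal cost per MH/s too high")
        return math.floor(self.upfront_cost / margin) + 1

# Placeholder figures only; substitute real numbers before drawing conclusions.
plan = AsicPlan(upfront_cost=1_000_000, unit_cost=10, mhash_per_chip=500)
fpga_cost_per_mhash = 1.0
print(plan.break_even_units(fpga_cost_per_mhash))
print(plan.cost_per_mhash(250_000))
```

The point of the formula is that the upfront term U/(m*N) shrinks toward zero as volume grows, so at large volumes the comparison reduces to marginal cost per MH/s (c/m for the ASIC versus F for the FPGA).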