Here's where I begin to question. Modularity sounds great and all, but it's not really the "game-changer" that you make it out to be. As far as I can tell, the BFL Single (the only picture of an ASIC device we currently have) is essentially a box around a heatsink and a PCB with chips. In essence, it is almost as bare-minimum as you can get. Aside from being able to buy some sort of massive copper block with slots to insert boards into, you're not really going to get much more efficiency from a modular design (and even that's not very efficient, really).
http://en.wikipedia.org/wiki/PCI_Express#PCI_Express_Mini_Card
Consider, for example, the BFL Single. Like you said, it is a square box with a PCB and a heatsink. Imagine if you only had to produce one motherboard with vertical mounts (like the old BTCFPGA design). That would make it a bit cheaper to produce add-on boards in quantity.
With fabs, there are discounts for quantity. There are also fewer steps to assemble an add-on board than, say, a BFL Single, which should have a higher overhead because it brings along lots of extra components like the case and power supply. The metal box is one piece, and it has to be made once for every unit you want.
With add-on daughter card(s) it should be simpler, cheaper and easier to populate a rig with identical daughter cards (as needed).
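To put rough shape on the overhead argument, here is a minimal sketch comparing the two approaches. Every number and name in it (`standalone_cost`, `modular_cost`, the prices, the 8 slots per chassis) is a made-up placeholder for illustration, not a real BFL or vendor figure.

```python
# Toy cost model. All prices and the slot count are hypothetical placeholders.

def standalone_cost(n_boards, board=100.0, case=30.0, psu=25.0):
    """Every hashing board ships in its own box with its own power supply."""
    return n_boards * (board + case + psu)


def modular_cost(n_boards, board=100.0, motherboard=60.0, case=30.0,
                 psu=40.0, slots=8):
    """One motherboard, case and PSU are shared by up to `slots` daughter cards."""
    chassis = -(-n_boards // slots)  # ceiling division: chassis needed
    return chassis * (motherboard + case + psu) + n_boards * board


if __name__ == "__main__":
    for n in (1, 8, 9, 32):
        print(n, standalone_cost(n), modular_cost(n))
```

With these placeholder numbers a single board is cheaper standalone, since the shared chassis is not yet amortized, and the modular rig only pulls ahead as the slots fill up. Where the crossover sits depends entirely on the real component prices.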
If I am not mistaken, the BFL mini-rigs are modular, but the modules are separate units joined by cabling. They have to be professionally assembled, as opposed to an end user just popping the case open and adding more processing power into one of many easy-to-install slots.
As to the cryptocurrencies, there are already merged mining pools, so vendors do not need to cover every Bitcoin fork separately. Coins that do not use double SHA-256 cannot be mined on a Bitcoin ASIC anyway, which needs no explanation.
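For anyone who does want the explanation: Bitcoin's proof of work hashes the block header with SHA-256 twice, and that one function is what the ASIC implements in silicon. A minimal sketch (the helper name `double_sha256` is mine, and real mining also involves header layout and a target comparison that are left out here):

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as used for Bitcoin's proof of work."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


if __name__ == "__main__":
    # Hash an arbitrary byte string; a miner would hash an 80-byte block header.
    print(double_sha256(b"example header bytes").hex())
```

A merged-mined fork reuses this same function, so the same chip covers it. A coin built on a different function, such as scrypt in Litecoin, gets nothing from that hardware.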
This has been discussed before. Using PCI instead of USB would make the boards more complicated and harder to design.