The problem is you won't find a design house willing to work that way. They have their tool sets and their workflows, and they aren't going to diverge from them. So you will need to buy your own set of design tools and find a team of borderline Asperger's cases to do the transistor design.
This is where I disagree: people are just looking for the wrong kind of design house. They need to look for designers experienced in and interested in mixed-signal and power-electronics designs. This is significantly different from the predominant industry practice in digital logic design.
Technically there are 3 main points where a Bitcoin mining chip differs from the typical modern digital IC:
1) SHA-256D is practically a fully self-testing circuit
2) SHA-256D has a very high signal toggle rate (0.5, only 3 dB below the theoretical maximum of 1.0 for a ring oscillator; see the quick numeric illustration after this list)
3) there are practically no external design requirements (like timing closure); the chip is 100% limited either by thermal/power (when over-clocking) or by self-switching noise (when under-volting)
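To make point 2 concrete, here is a minimal sketch, assuming Python 3 and the standard hashlib module, that counts how many output bits flip between SHA-256D hashes of adjacent nonces. The flip fraction comes out at roughly 0.5, and 10·log10(1.0/0.5) ≈ 3 dB is where the "3 dB below a ring oscillator" figure comes from. The dummy header layout and trial count are made up for illustration; the per-node toggle rates inside a real miner depend on its pipeline structure, but the underlying statistics are the same.

```python
# Rough illustration only: measures how many OUTPUT bits of SHA-256D flip
# between adjacent nonces. About half of the 256 bits flip on average,
# which is the same statistical effect behind the ~0.5 toggle rate claim.
import hashlib

def sha256d(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def bit_flips(a: bytes, b: bytes) -> int:
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

header = b"\x00" * 76          # dummy 76-byte header prefix (hypothetical)
total, trials = 0, 1000
prev = sha256d(header + (0).to_bytes(4, "little"))
for nonce in range(1, trials + 1):
    cur = sha256d(header + nonce.to_bytes(4, "little"))
    total += bit_flips(prev, cur)
    prev = cur

print("average flip fraction:", total / (trials * 256))   # ~0.5
```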
The standard tool chains used in digital logic design fail to produce efficient designs under the above conditions:
1) heuristic layout optimization algorithms fail to converge on a design where each bit of the output depends on each bit of the input, so the designers force round unrolling to achieve convergence
2) the methodology is mostly built around timing closure or test-driven design, neither of which is a problem here
3) the approximations made by the toolchains are very inaccurate in the interesting part of the problem space (very high toggle rate and no timing demands whatsoever)
The end result is that the standard tools produce designs that are way too conservative about the individual reliability of gates and flip-flops: they are over-engineered at the local-logic level and then trade this off for noise tolerance on the very long, high-fan-out interconnections.
Somebody should really fund a Professor to do the design work under an open hardware license. Once the transistor design for SHA-256 is done, you just have to bring that into the fab's design tools and optimize for placement. Conductor losses are becoming dominant at these process nodes, so that is where the biggest optimizations will be found.
Well, I haven't spoken with a Professor recently, but I did in the past. From that experience I can surmise that work on a Bitcoin miner could be a career-limiting move for a scientist in the current prevailing climate at the engineering schools.
If you are going to ask around, here are two good questions to ask:
1) why does nobody consider plain old serial adders/subtractors for the most common operation in SHA-256D? One can add two 32-bit numbers with a few XOR gates and a D flip-flop with an absolute minimum of power spent. It will just take 32 clocks, but who cares? Why such an obsession with parallel adders and complex carry-look-ahead logic when there's no real timing constraint? (A bit-serial adder sketch follows this list.)
2) why does nobody consider the old trick of using differential logic (like ECL vs. TTL) when the signal toggle rate is at 50%? Complementary (CMOS-style) logic gives great power savings provided that the toggle rate is much lower, the closer to zero the better; that advantage is of no benefit here whatsoever.
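To make question 1 concrete, here is a minimal behavioral sketch of a bit-serial adder, written in Python rather than an HDL and with hypothetical names, purely to show the idea: one full-adder stage (two XORs plus the carry logic) consumes one bit of each operand per clock, LSB first, and a single D flip-flop holds the carry between clocks, so the 32-bit addition completes after 32 clocks.

```python
# Behavioral model of a bit-serial adder: per clock, one full-adder stage
# (two XOR gates plus carry logic) consumes one bit of each operand, LSB
# first, and a single D flip-flop holds the carry between clocks.
def serial_add32(a: int, b: int) -> int:
    carry = 0                       # the D flip-flop
    result = 0
    for i in range(32):             # 32 clock cycles
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        s = ai ^ bi ^ carry                         # sum bit: two XOR gates
        carry = (ai & bi) | (carry & (ai ^ bi))     # carry (majority) logic
        result |= s << i
    return result                   # wraps modulo 2**32, as in SHA-256

assert serial_add32(0xFFFFFFFF, 1) == 0
assert serial_add32(0x6A09E667, 0xBB67AE85) == (0x6A09E667 + 0xBB67AE85) & 0xFFFFFFFF
```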
So if you guys are looking for either commercial design houses or semiconductor design professors, avoid the mainstream. In addition to the two categories above, I could also suggest asking about past experience with GaAs or other exotic processes, which exercised less-explored corners of the design mind-scape.
Remember, BitFury's original design may have been done on a kitchen table or in a garage, but it did not unroll the rounds and it still decisively beat all the experienced CAD-monkeys despite using a much older fabrication process.