What makes you think there are no issues with the design?
I'm not saying there couldn't be issues with KNC's 20nm design. The whole argument has been about why it made sense for KNC to jump to 20nm rather than make a new fully custom 28nm design, when there were no initial cost or real performance/$ gains from doing so.
Why could KNC only achieve 1W/GH on 28nm whereas Bitfury could achieve 0.8W/GH on 55nm?
It has been speculated (confirmed?) that the Jupiter design is a hardcopy and not a full custom design (hence the fast design/delivery time and the worse performance/efficiency). Also, the Jupiter chips are more than capable of getting below 1W/GH (you can get down to about 0.7-0.8W/GH at 400-450GH), but why would you sacrifice performance for efficiency when running them at higher speed but worse efficiency has been more profitable?
To properly compare two different chips you have to look at the performance delivered by a given amount of chip area. Take 100mm2 of total chip area for each of the two designs, pick a specific performance point, and compare the best efficiency each one achieves at that performance point. That's the only way you will get a real comparison of how two different designs stack up against each other in terms of efficiency. Otherwise you can easily push the metrics in whatever direction you want.
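To make that comparison concrete, here is a minimal Python sketch of it. Everything in it is an assumption for illustration: the `OperatingPoint` type, the `best_efficiency_at` helper, and above all the tuning curves and die areas, which are made-up numbers and not measurements of any real chip.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    hashrate_gh: float   # total hashrate of the design at this tuning
    watts_per_gh: float  # efficiency at this tuning


def best_efficiency_at(points, die_area_mm2, target_gh_per_100mm2):
    """Lowest W/GH among tunings that reach the target GH per 100 mm^2 of die.

    Returns None if no tuning reaches the target.
    """
    scale = 100.0 / die_area_mm2  # normalize hashrate to 100 mm^2 of silicon
    eligible = [
        p.watts_per_gh
        for p in points
        if p.hashrate_gh * scale >= target_gh_per_100mm2
    ]
    return min(eligible) if eligible else None


# Hypothetical tuning curves, NOT real chip data.
design_a = [OperatingPoint(40, 0.7), OperatingPoint(60, 0.9), OperatingPoint(80, 1.3)]
design_b = [OperatingPoint(50, 0.8), OperatingPoint(90, 1.0), OperatingPoint(120, 1.6)]

# Design A is assumed to use 100 mm^2 of die, design B 200 mm^2.
a = best_efficiency_at(design_a, die_area_mm2=100, target_gh_per_100mm2=60)
b = best_efficiency_at(design_b, die_area_mm2=200, target_gh_per_100mm2=60)
```

On raw numbers design B looks competitive (90GH at 1.0W/GH), but once both are held to the same hashrate per 100mm2, B has to be pushed to its least efficient tuning to keep up. That is the kind of difference the raw W/GH figure hides.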
So let's take a look at Bitmaintech and their S1s. They sold them at 180GH@2W/GH, a much worse chip in your world and clearly inferior to Bitfury by your logic. I could, however, take an Antminer S1 and probably get it below 0.8W/GH, but you would only get around 70-80GH out of it. By your logic I have just taken this inferior, horribly inefficient chip and turned it into something that is better than or equal to Bitfury. Since Bitmain is on 55nm as well, it would be really easy to compare who has the lowest manufacturing cost for those results: who out of Bitfury and Bitmain uses the least die area (and hence has the lowest manufacturing cost) to produce, say, 100GH@0.8W/GH? Not the faintest idea, something for you to figure out I guess!
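For reference, the arithmetic behind the S1 example works out as below. This is just a quick sketch using the figures quoted above; the `power_w` helper is my own name, and the 0.8W/GH underclocked figure is the rough estimate from this post, not a measurement.

```python
def power_w(hashrate_gh, watts_per_gh):
    """Wall power in watts for a given hashrate and efficiency."""
    return hashrate_gh * watts_per_gh


stock_w = power_w(180, 2.0)      # S1 as sold: about 360 W
tuned_low_w = power_w(70, 0.8)   # underclocked estimate: about 56 W
tuned_high_w = power_w(80, 0.8)  # underclocked estimate: about 64 W

# Share of the stock hashrate that survives the underclock.
kept_low = 70 / 180    # roughly 39%
kept_high = 80 / 180   # roughly 44%
```

So the same hardware goes from about 360W to roughly 56-64W, but keeps well under half of its stock hashrate.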
My whole point is that the same chip can usually be used for quite a large range of performance targets depending on how you tune it.
The rule, however, is that as long as the designs are in the same ballpark for optimization, it will take substantially more die area for a chip on a larger node to match the performance/watt of a chip on a smaller node. That is why saying that the Bitfury 55nm is clearly better than the KNC Jupiter is not as clear-cut as you might think.
Unless you are selling into a specific market where performance/watt is EVERYTHING, selling chips at their "best" efficiency specs doesn't make sense, since the performance tradeoff to get there is too substantial. People pay for performance, and the way the chips scale means that as a manufacturer you are looking to sell them at a performance point that makes sense and generates the most profit.
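A rough sketch of why the most efficient tuning is usually not the most profitable one. The profit model here is a deliberate simplification (mining revenue minus electricity, nothing else), and every number is hypothetical: the tuning points, the revenue per GH per day, and the electricity price are all made up for illustration.

```python
def daily_profit(hashrate_gh, watts_per_gh, revenue_per_gh_day, price_per_kwh):
    """Daily revenue minus daily electricity cost for one tuning."""
    power_kw = hashrate_gh * watts_per_gh / 1000.0
    return hashrate_gh * revenue_per_gh_day - power_kw * 24 * price_per_kwh


# Hypothetical tunings of one chip as (hashrate in GH, W/GH). NOT real data.
tunings = [(400, 0.75), (550, 1.0), (650, 1.3)]

most_efficient = min(tunings, key=lambda t: t[1])

# Assumed 0.50 per GH per day of revenue and 0.10 per kWh of electricity.
most_profitable = max(
    tunings, key=lambda t: daily_profit(t[0], t[1], 0.50, 0.10)
)

# Same chip when revenue per GH has collapsed close to the power cost.
most_profitable_low_revenue = max(
    tunings, key=lambda t: daily_profit(t[0], t[1], 0.003, 0.10)
)
```

With these assumed numbers the fastest, least efficient tuning earns the most, because electricity is a small fraction of revenue. In this simple model the most efficient tuning only wins once revenue per GH has dropped close to the cost of the power.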