Hrm, looks like I picked the wrong week for a heads-down sprint to make a (self-imposed) deadline. I'll keep my response short since at the moment I can't afford to get drawn into a long fascinating/distracting conversation...
About "there's no long lines" - I've already commented, but will try to draw it, where epic fail for parallel expander is exactly.... And in Spartan-6 there's difficult to pass more than 256-bit cross-section in 8 slices height long-way (there's 32 QUAD routes per each switch - so 256-bits would use QUAD routes in horizontal case for 8 slices height).
Then I guess it's a good thing my stages are three times that height! They're tall-and-skinny (4x24 = 96 - 8 totally empty = 82 slices per single round) for a reason. Also, as you point out:
interconnect works in one direction only, so if rounds placed in smart way, you'll get more efficiency in routing resources usage ( i.e. A,B <---> C,D while A --> C and B <--- D are interconnected and placed into same regions).
Indeed. This is why I chose a ring-shaped design. The innermost 8-slice-tall tracks are moving in opposite directions, so they don't compete for QUAD lines -- which doubles my QUAD budget. I suppose you can now guess which part of each 4x24 region is the message expander.
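For anyone following along, here's the back-of-envelope arithmetic behind the numbers above as a quick sketch. It uses only the figures quoted in this thread (32 QUAD routes per switch, an 8-slice-tall track, a 24-slice-tall stage). The function name and the assumption that the budget scales linearly with height and with the number of non-competing directions are mine, purely for illustration; this is not a model of the real routing fabric.

```python
# Rough QUAD-route budget, using only the figures quoted in this thread.
# Assumption (illustrative only): one switch per slice of height, and the
# budget scales linearly with height and with non-competing directions.
QUAD_ROUTES_PER_SWITCH = 32


def quad_cross_section_bits(height_in_slices, directions=1):
    """Bits that can cross a vertical cut on QUAD routes under the linear model."""
    return QUAD_ROUTES_PER_SWITCH * height_in_slices * directions


print(quad_cross_section_bits(8))                 # the 8-slice case from the quote
print(quad_cross_section_bits(24))                # a stage three times that height
print(quad_cross_section_bits(8, directions=2))   # two tracks running opposite ways
```

Under that simple model the 8-slice case gives the quoted 256 bits, and either tripling the height or running the two tracks in opposite directions raises the ceiling accordingly.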
So I really respect author's work of fitting 1.5 parallel rounds into Spartan 6 - it is tough and very nice work.
Thanks. Your results are very impressive too!
I have to say, at times I find myself wishing for the reduced headaches of the sea-of-tiny-hashers approach. But ultimately I went with an unrolled design for anti-theft reasons (I'll explain in a week) and also to let me hardwire the k-values into the adder LUTs (a three-input-plus-constant adder is a lot smaller than a four-input adder). Also, the sea-of-tiny-hashers approach yields more benefit if you're willing to do not only algorithmic placement (which both of us do) but also algorithmic routing (which I don't do). I decided to stick to heuristic routing (except for a very few cases) to preserve portability -- I have a massive pile of Virtex-II Pros that I got almost-for-free and I might be able to pick up a bunch of deeply-discounted Virtex-5's as well. Although the slice design has changed a bit during the 10-year span from v2pro to s6, if you don't hardwire the routing it's possible to have an AxB region of Virtex-II Pro "emulate" an XxY region of Spartan-6 very efficiently (although, of course, A>X and B>Y).
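To make the k-value point concrete, here's a tiny software sketch of the idea. It assumes the k-values are the SHA-256 round constants (only the first one is used here) and that the adders are 32 bits wide; the function names are made up for illustration. It only shows that a per-round adder with the constant baked in computes the same thing as a general four-input adder; it says nothing about actual LUT counts.

```python
# Software analogy for hardwiring a round constant into an adder.
# Assumptions (illustrative): 32-bit modular adds, SHA-256 round constants.
MASK32 = 0xFFFFFFFF
K0 = 0x428A2F98  # first SHA-256 round constant


def four_input_adder(a, b, c, k):
    """General adder: k arrives as a fourth run-time input."""
    return (a + b + c + k) & MASK32


def make_round_adder(k):
    """Build a three-input adder with k fixed, as an unrolled round could do."""
    def adder(a, b, c):
        return (a + b + c + k) & MASK32
    return adder


round0_add = make_round_adder(K0)
print(hex(round0_add(1, 2, 3)))
print(round0_add(1, 2, 3) == four_input_adder(1, 2, 3, K0))
```

In a rolled-up design the same adder has to serve every round, so k has to stay a real input; unrolling is what makes it a per-round constant.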
By the way, when I first saw your announcement, I took a look at your timing report -- 441 lines of generic boilerplate, and all but 9 lines of the actual report redacted (".... Dropped other traces report ....") and there were no carry chains on the lone path you decided to leave in the report! At that point I was pretty suspicious. On the other hand, after reading your postings, you clearly know what you're talking about -- the obnoxious "missing slices in columns 66+67" problem is something most people aren't aware of. So now I'm leaning back towards believing it. Anyways, I know you posted the redacted timing report in order to bolster your credibility, but because of the way you've edited it, it actually might have the opposite effect.