Correct - double-digit percentages aren't massive, and I don't remember saying they were. I said that a couple of days' worth of work meant a good 40% or more increase. While that is not massive, or near massive, it doesn't need to be - I'm mentioning it simply to prove the point.
For the purposes of this discussion, let "stock" refer to the kernels shipping with SGMiner 5 as of today. To be completely clear so there's no nitpicking, I mean commit e481d67e59ad60edc69c026617219f8fae9d6c6e, published on the 19th. Now, you say massive is 300%+. Cool, let's go with that. Increases of 50% or below over stock are child's play. Assuming the dev is at least somewhat competent in GPU development and has more than a passing familiarity with the algorithms in use, the amount of time required for that shit is measured in hours, and probably a low number of them. My estimate of the work required for a 200% to 250% increase, which I have begun and made some progress on, is maybe 1.5 to 2 weeks of almost nonstop work for me - assuming I run into a lot of bugs and shit, which has been and will no doubt continue to be the case, lol. (In case you are wondering, no, I am not doing it nonstop - I've other shit to do.)

Why so long? SGMiner 5 is based on CGMiner, and while the SGMiner people aren't bad at all, the code they inherited was extremely messy and disorganized, repetitive and overcomplicated, and sadly, much of it remains as it was in CGMiner. Really, the best thing you could say about the code that remains from CG is that it works. Because of this, editing SGMiner 5 - which is required for many of the most effective yet most time-consuming optimizations - is not easy, and is about as enjoyable as pulling out your own molars with a pair of household pliers.

In addition, for those optimizations, not only do you have to dump the current code, as pretty much all of it will be unusable, you also have to rewrite it quite carefully. For most of them, optimal or close-to-optimal implementations require changing the very structure of the implementation, which is why the current code needs to be replaced and not edited: the code used for every single hashing algorithm in X11 is copypasted code that was meant to be run on CPUs. GPUs are VERY different - while you can do some shit like that and get it to run (as evidenced by the current code), it won't be anywhere near optimal. Not only will you leave parts of the GPU idle, you'll be bottlenecking yourself in other places. It's just a horrid way to do it; it's not proper porting, and it's not even coding.

Anyway, for further optimizations above 250%, I'm not comfortable estimating the time required. While I strongly believe it's possible, to get an estimate that's more than a number mostly pulled out of my ass, I'd need to finish the current few rewrites I have in progress and then, before moving on (mainly because the details would still be fresh in my mind), fine-tune those new implementations. Then I'd re-benchmark, see where the performance landed in percentage terms, and only then could I say with any confidence some shit like, "Okay, it'd take this long to get 400% to 450%+."
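To make the "copypasted CPU code" point concrete, here's a minimal OpenCL sketch of one kind of restructuring a straight CPU-to-GPU port never does: staging an algorithm's lookup table in fast on-chip local memory (LDS) once per work-group, instead of paying for a slower fetch on every single table access. Everything here - the kernel name, T0_global, the buffers - is a made-up placeholder for illustration, not the actual SGMiner X11 code.

```c
/* Hypothetical sketch only - placeholder names, not SGMiner's kernels. */
__constant ulong T0_global[256] = { 0 /* real algorithm constants here */ };

__kernel void x11_round_sketch(__global ulong *state, const uint start_nonce)
{
    __local ulong T0[256];

    /* Cooperative copy: each work-item loads a strided slice of the
     * table into LDS, then the whole work-group synchronizes. */
    for (uint i = get_local_id(0); i < 256u; i += get_local_size(0))
        T0[i] = T0_global[i];
    barrier(CLK_LOCAL_MEM_FENCE);

    uint nonce = start_nonce + get_global_id(0);

    /* Placeholder "round": a real kernel would run the full hash here,
     * with every table lookup now hitting fast on-chip memory. */
    state[get_global_id(0)] ^= T0[nonce & 255u];
}
```

The broader point stands regardless of the specific trick: reference C code written for a CPU has no concept of work-groups, LDS, or wavefront occupancy, so getting it to merely compile and run on a GPU (which is roughly what the current kernels do) leaves most of the hardware's throughput on the table.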
Thing is, I just decided around... two weeks ago or something, "Hey, I feel like fucking around with X11 on AMD." Seeing as I didn't know what GPU code looked like four months ago, there are obviously loads of people out there who could hand me my ass when it comes to GPU optimizations, on AMD or otherwise, and it stands to reason that someone has done this shit already.
Sounds to me like you're looking for a new gig since the 970/980 weren't all they're cracked up to be...
Want me to organise a crowdfunded project for you where you pimp out X11 for SGMiner and get paid for your services?