Some people argue they can save a few Watts by finding the lowest possible PowerLimit. (PowerLimit is an offset applied to the TDP, so you can ignore or disable it when you have direct control of the TDP itself.) I think that's most likely a false perception caused by the finite precision and accuracy of their measurements and tuning iterations: TDP ultimately decides whether the core clock gets throttled and affects little if anything else, especially under heavy graphics or compute loads, when the GPU can't shut down compute blocks. Even if the effect is real, it's not worth worrying about.
If you want the optimal settings I suggest you disable / work around the arbitrary limiters (like TDP) and:
- find the best memory clock + timing pair (not easy but the potential profit is high)
- find the lowest GPU core clock which is sufficient for that memory setting (not too hard but meaningful and thus profitable)
- try to under-volt the GPU core (I haven't managed to do that with the 480 yet; it's not the easiest part in practice)
(Keep in mind that AMD has often used hardware-level dynamic voltage scaling since around Hawaii (R9 290X), so the voltage you think you are setting is usually an offset from a target.)
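To make the "lowest core clock which is sufficient" step a bit more concrete, here is a rough Python sketch. Everything in it is my assumption for illustration: `measure_rate` stands in for whatever benchmark you run at a fixed memory setting, the 1% tolerance is arbitrary, and `fake_rate` is made-up data, not a measurement from any real card.

```python
from typing import Callable, Iterable


def lowest_sufficient_core_clock(
    core_clocks_mhz: Iterable[int],
    measure_rate: Callable[[int], float],
    tolerance: float = 0.01,
) -> int:
    """Return the lowest core clock whose rate is within `tolerance` of the best rate."""
    # Benchmark every candidate clock once (memory clock and timings stay fixed).
    rates = {clock: measure_rate(clock) for clock in core_clocks_mhz}
    best = max(rates.values())
    # Keep only the clocks that lose at most `tolerance` of the best throughput.
    sufficient = [c for c, r in rates.items() if r >= best * (1.0 - tolerance)]
    return min(sufficient)


# Hypothetical stand-in for a real benchmark: the rate grows with the core
# clock until the memory becomes the bottleneck at 1100 MHz (invented numbers).
def fake_rate(core_mhz: int) -> float:
    return min(core_mhz, 1100) * 0.025


chosen = lowest_sufficient_core_clock(range(900, 1301, 50), fake_rate)
print("lowest sufficient core clock:", chosen, "MHz")
```

With real hardware you would replace `fake_rate` with a function that applies the clock, runs your workload for a while and returns the averaged throughput. Repeat the measurements a few times, because run-to-run noise can easily exceed a 1% tolerance.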
Next, you probably want to create a second configuration that allows for a lower voltage (looser timings, lower core clock, lower voltage), then compare the perf/Watt of the two configs and pick one (or devise a third iteration).
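The perf/Watt comparison itself is simple arithmetic. Here is a minimal Python sketch of it; the config names, rates and power figures are invented placeholders, so plug in your own measurements (and measure the power the same way for every config).

```python
from dataclasses import dataclass


@dataclass
class Config:
    name: str
    rate: float          # measured throughput, in whatever unit your workload reports
    power_watts: float   # measured power draw, taken the same way for every config

    @property
    def perf_per_watt(self) -> float:
        return self.rate / self.power_watts


# Placeholder numbers for illustration only, not real measurements.
configs = [
    Config("tight timings, higher voltage", rate=28.0, power_watts=140.0),
    Config("loose timings, lower voltage", rate=26.0, power_watts=115.0),
]

best = max(configs, key=lambda c: c.perf_per_watt)
for c in configs:
    print(f"{c.name}: {c.perf_per_watt:.3f} per Watt")
print("better perf/Watt:", best.name)
```

Note that the config with the better perf/Watt can still have the lower absolute rate, so which one you keep depends on whether you care more about efficiency or raw throughput.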