The AMD Nano/Fury cards all have a MUCH higher core count, faster clocks, faster memory, and a MUCH wider memory bus, yet they mine SLOWER than my two-generation-old R9 290s, which routinely see 30 MH/s, and I've not pushed them as far as they CAN go (the best figure I've seen for the Nano was 27 MH/s).
There appears to be more than just pure "memory bandwidth" involved in how fast Ethereum can be mined by a given model of GPU.
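To put rough numbers on that, here is a back-of-envelope sketch, not a benchmark. It assumes Ethash reads 64 pages of 128 bytes from the DAG per hash, and it uses spec-sheet peak bandwidth figures (320 GB/s for the R9 290, 512 GB/s for the Nano), which are my assumptions rather than anything measured in this thread. The function name `bandwidth_ceiling_mhs` is just made up for illustration.

```python
# Ethash touches the DAG 64 times per hash, 128 bytes each time.
ETHASH_ACCESSES = 64
ETHASH_PAGE_BYTES = 128
BYTES_PER_HASH = ETHASH_ACCESSES * ETHASH_PAGE_BYTES  # 8192 bytes


def bandwidth_ceiling_mhs(bandwidth_gb_s):
    """Upper bound on hashrate (MH/s) if memory bandwidth were the only limit."""
    hashes_per_second = bandwidth_gb_s * 1e9 / BYTES_PER_HASH
    return hashes_per_second / 1e6


# Spec-sheet peak bandwidth in GB/s (assumed reference figures, not measured).
peak_bandwidth = {"R9 290": 320.0, "R9 Nano": 512.0}
# Hashrates quoted in this thread, in MH/s.
observed = {"R9 290": 30.0, "R9 Nano": 27.0}

for card, bandwidth in peak_bandwidth.items():
    ceiling = bandwidth_ceiling_mhs(bandwidth)
    share = observed[card] / ceiling
    print(f"{card}: ceiling {ceiling:.1f} MH/s, "
          f"observed {observed[card]:.0f} MH/s ({share:.0%} of ceiling)")
```

Under those assumptions the 290 is getting a much larger share of its theoretical ceiling than the Nano is, which fits the idea that something other than raw bandwidth is holding the Nano back.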
Stock clocks, or overclocked?
Efficiency on the Fury/Nano is very good from everything I've seen, but it's still not impressive that a card with a ton more memory bandwidth and almost twice the Stream units is so poor on hashrate compared to a two-generation-old card.
Is that 32 MH/s at stock clocks, or overclocked?
So for the Fury/Nano cards, is it better to dual mine so that the core processors can be utilised fully?