Do I need to initialize GDS before actually using it?
The following instructions aren't documented anywhere:
DS_CONSUME
DS_APPEND
DS_ORDERED_COUNT
nerdralph, do you have any ideas?
I suspect the driver initializes M0 when gds_segment_byte_size is set in the kernel configuration. If you look in the GCN ISA docs, they say that for GDS access M0 holds a 16-bit offset and a 16-bit size. M0 is also used for LDS, so if your code uses both LDS and GDS you'll need to save M0 to another register and restore it when you switch between them.
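To make the M0 layout concrete, here's a rough host-side C sketch of packing and unpacking the two 16-bit fields. The function names are made up for illustration, and the bit positions (offset/base in the upper half, size in the lower half) are my assumption from memory of the ISA docs, so check them against s. 3.7 before relying on this.

```c
#include <stdint.h>

/*
 * ASSUMPTION: for GDS access, M0[31:16] holds the base/offset and
 * M0[15:0] holds the size. Verify against s. 3.7 of the GCN ISA docs.
 * These helper names are hypothetical, not part of any driver API.
 */
static inline uint32_t gds_m0_pack(uint16_t base, uint16_t size)
{
    /* Place base in the upper 16 bits and size in the lower 16 bits. */
    return ((uint32_t)base << 16) | (uint32_t)size;
}

/* Extract the (assumed) base/offset field from a packed M0 value. */
static inline uint16_t gds_m0_base(uint32_t m0)
{
    return (uint16_t)(m0 >> 16);
}

/* Extract the (assumed) size field from a packed M0 value. */
static inline uint16_t gds_m0_size(uint32_t m0)
{
    return (uint16_t)(m0 & 0xFFFFu);
}
```

In a kernel you'd do the equivalent with scalar instructions: copy M0 to a spare SGPR before an LDS section and move it back before the next GDS access.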
I hadn't looked at the DS_ instructions you refer to, and a quick look at the ISA docs confirms that they're undocumented. The LLVM source would at least have the instruction encodings.
I'm not sure why you want to use those instructions though. For the global row counters I'd use ds_add_u32 with the GDS bit set.
P.S. The M0 description is in section 3.7 of the GCN ISA docs.