I was leaning towards recommending HMAC-SHA512, since it's already required for BIP32.
I'd generally recommend against non-deterministic signatures. If the signatures are non-deterministic it is impossible for someone to verify that the implementation is not using the R value as a side channel to leak the private keys.
In open source, pure software implementations it is easy to be relatively confident that an implementation isn't cryptographically encoding the private key in the choice of R value (via, e.g., incrementing K until it finds an R that leaks a non-deterministic part of the master private key), but in a hardware wallet implementation this is impossible, and it is trivial to construct a malicious implementation that leaks the private key via the R value in just a few signatures.
I actually have two implementations of example malicious signers. One produces non-deterministic signatures and leaks a 256-bit private key, to the holder of a specific public key and no one else, in ~33 signatures with very high probability (failure rate of 1 in 1000 for 33 signatures, around 1 in a million for 34). The other produces seemingly RFC 6979-like deterministic signatures and with a single signature leaks the current private key, and with 16 signatures leaks an additional 256-bit secret (e.g. a master private key, with a failure rate of around 1:1000 for 16 signatures, ~1:1e6 for 17 signatures).
Both work by performing an extra point multiply to gain an ECDH shared secret between the attacker and the user's key.
In the first case it then searches for a K value where H(secret||R)'s least significant bits match the data being leaked. The leaked data is selected by using the data being signed to drive a fountain code over the private data.
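To make the search over K concrete, here is a toy Python sketch of just the grinding step. It is not either of the implementations described above: the channel width (`LEAK_BITS = 8`), the use of SHA-256 for H, the encoding of R as its 32-byte x coordinate, and all function names are assumptions made for illustration, and `shared_secret` is a fixed stand-in for the ECDH result. The fountain code is left out entirely.

```python
import hashlib
import secrets

# secp256k1 parameters; textbook affine arithmetic, not constant time (Python 3.8+).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFE_BAAEDCE6_AF48A03B_BFD25E8C_D0364141
G = (0x79BE667E_F9DCBBAC_55A06295_CE870B07_029BFCDB_2DCE28D9_59F2815B_16F81798,
     0x483ADA77_26A3C465_5DA4FBFC_0E1108A8_FD17B448_A6855419_9C47D08F_FB10D4B8)

def point_add(a, b):
    """Add two affine points; None is the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    if a[0] == b[0] and (a[1] + b[1]) % P == 0:
        return None
    if a == b:
        lam = 3 * a[0] * a[0] * pow(2 * a[1], -1, P) % P
    else:
        lam = (b[1] - a[1]) * pow((b[0] - a[0]) % P, -1, P) % P
    x = (lam * lam - a[0] - b[0]) % P
    return (x, (lam * (a[0] - x) - a[1]) % P)

def point_mul(k, pt):
    """Double-and-add scalar multiplication."""
    acc = None
    while k:
        if k & 1:
            acc = point_add(acc, pt)
        pt = point_add(pt, pt)
        k >>= 1
    return acc

LEAK_BITS = 8  # hypothetical number of bits carried per signature

def channel_bits(shared_secret, R):
    """Low LEAK_BITS bits of H(secret || R), with R encoded as its x coordinate."""
    h = hashlib.sha256(shared_secret + R[0].to_bytes(32, "big")).digest()
    return int.from_bytes(h, "big") & ((1 << LEAK_BITS) - 1)

def grind_nonce(shared_secret, payload):
    """Increment K (one point addition per step) until R encodes the payload."""
    k = secrets.randbelow(N - 2**32) + 1
    R = point_mul(k, G)
    tries = 1
    while channel_bits(shared_secret, R) != payload:
        k += 1
        R = point_add(R, G)
        tries += 1
    return k, R, tries

shared_secret = hashlib.sha256(b"stand-in for the ECDH shared secret").digest()
payload = 0xA7  # the bits to be carried by this one signature
k, R, tries = grind_nonce(shared_secret, payload)

# Anyone holding shared_secret reads the payload back from R alone.
assert channel_bits(shared_secret, R) == payload
```

With an 8-bit channel the loop needs about 256 candidates on average, and R still looks like a random point to anyone who does not hold the shared secret, which is why inspecting signatures alone cannot rule this out.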
In the second, the ECDH shared secret replaces the secret key in the RFC 6979 K value selection (this is especially diabolical because the implementation with OpenSSL looks fairly benign, as it's just point multiplying the secret by a constant), and 16 bits of (again) message-digest-selected secret data are appended (which just looks like more 'salt'). This time the selection is just an index into the 65535 16-bit words of a 16-bit Reed-Solomon (RS) code expansion of the private key. The attacker computes the shared secret and then searches for the 16-bit value that gives him the same R. He then knows K, can recover the current key, and has learned 16 bits of secret data. The RS code can be precomputed and passed off as just storage redundancy for the master key.
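The "knows K, can recover the current key" step is plain ECDSA algebra: since s = K^-1 (z + r*d) mod n, anyone who learns K for a single signature can solve for d = (s*K - z) * r^-1 mod n. A minimal Python sketch of only that step, assuming secp256k1 and textbook arithmetic; the function names are illustrative, and the nonce here is simply random rather than derived as described above.

```python
import hashlib
import secrets

# secp256k1 parameters; textbook affine arithmetic, not constant time (Python 3.8+).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFE_BAAEDCE6_AF48A03B_BFD25E8C_D0364141
G = (0x79BE667E_F9DCBBAC_55A06295_CE870B07_029BFCDB_2DCE28D9_59F2815B_16F81798,
     0x483ADA77_26A3C465_5DA4FBFC_0E1108A8_FD17B448_A6855419_9C47D08F_FB10D4B8)

def point_add(a, b):
    """Add two affine points; None is the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    if a[0] == b[0] and (a[1] + b[1]) % P == 0:
        return None
    if a == b:
        lam = 3 * a[0] * a[0] * pow(2 * a[1], -1, P) % P
    else:
        lam = (b[1] - a[1]) * pow((b[0] - a[0]) % P, -1, P) % P
    x = (lam * lam - a[0] - b[0]) % P
    return (x, (lam * (a[0] - x) - a[1]) % P)

def point_mul(k, pt):
    """Double-and-add scalar multiplication."""
    acc = None
    while k:
        if k & 1:
            acc = point_add(acc, pt)
        pt = point_add(pt, pt)
        k >>= 1
    return acc

def sign(d, z, k):
    """Textbook ECDSA: r = x(k*G) mod n, s = k^-1 (z + r*d) mod n."""
    r = point_mul(k, G)[0] % N
    s = pow(k, -1, N) * (z + r * d) % N
    return r, s

def recover_key(r, s, z, k):
    """Solve s = k^-1 (z + r*d) for d, given the nonce k."""
    return (s * k - z) * pow(r, -1, N) % N

d = secrets.randbelow(N - 1) + 1   # the victim's private key
k = secrets.randbelow(N - 1) + 1   # the nonce the attacker has reconstructed
z = int.from_bytes(hashlib.sha256(b"example message").digest(), "big") % N

r, s = sign(d, z, k)
assert recover_key(r, s, z, k) == d
```

Nothing in (r, s) reveals that K was predictable to a third party, so a single valid-looking signature is enough.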
Because tractability in hardware devices is already weak, it would sure be better if the device could be put in a mode which would make its behavior completely reproducible externally. If the security assumptions underlying the SHA2-based derandomized DSA do not hold, then it is almost certain that ECDSA using SHA2 will also not hold. Whatever version you implement, I hope there will be a way for someone with the device to verify that it's doing what it's supposed to be doing.
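As a rough sketch of what such an external check could look like: load a throwaway test key, have the device sign some digests, and compare each signature against an independent reference implementation. Everything below is an assumption for illustration. `det_nonce` is a simplified HMAC-SHA512 stand-in, not RFC 6979 and not any device's actual scheme, and `device_sign` is a hypothetical callable standing in for the hardware.

```python
import hashlib
import hmac

# secp256k1 parameters; textbook affine arithmetic, not constant time (Python 3.8+).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFE_BAAEDCE6_AF48A03B_BFD25E8C_D0364141
G = (0x79BE667E_F9DCBBAC_55A06295_CE870B07_029BFCDB_2DCE28D9_59F2815B_16F81798,
     0x483ADA77_26A3C465_5DA4FBFC_0E1108A8_FD17B448_A6855419_9C47D08F_FB10D4B8)

def point_add(a, b):
    """Add two affine points; None is the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    if a[0] == b[0] and (a[1] + b[1]) % P == 0:
        return None
    if a == b:
        lam = 3 * a[0] * a[0] * pow(2 * a[1], -1, P) % P
    else:
        lam = (b[1] - a[1]) * pow((b[0] - a[0]) % P, -1, P) % P
    x = (lam * lam - a[0] - b[0]) % P
    return (x, (lam * (a[0] - x) - a[1]) % P)

def point_mul(k, pt):
    """Double-and-add scalar multiplication."""
    acc = None
    while k:
        if k & 1:
            acc = point_add(acc, pt)
        pt = point_add(pt, pt)
        k >>= 1
    return acc

def det_nonce(d, z):
    """Simplified deterministic nonce in [1, n-1]; a stand-in, NOT RFC 6979."""
    mac = hmac.new(d.to_bytes(32, "big"), z.to_bytes(32, "big"), hashlib.sha512)
    return int.from_bytes(mac.digest(), "big") % (N - 1) + 1

def sign_with_nonce(d, z, k):
    """Textbook ECDSA with an explicit nonce."""
    r = point_mul(k, G)[0] % N
    s = pow(k, -1, N) * (z + r * d) % N
    return r, s

def reference_sign(d, z):
    """What a conforming signer should output for test key d and digest z."""
    return sign_with_nonce(d, z, det_nonce(d, z))

def tampered_sign(d, z):
    """A signer that produces valid signatures but deviates in its choice of K."""
    return sign_with_nonce(d, z, det_nonce(d, z) % (N - 1) + 1)

def audit(device_sign, d, digests):
    """Return every digest on which the device deviates from the reference."""
    return [z for z in digests if device_sign(d, z) != reference_sign(d, z)]

test_key = int.from_bytes(hashlib.sha256(b"throwaway audit key").digest(), "big") % (N - 1) + 1
digests = [int.from_bytes(hashlib.sha256(bytes([i])).digest(), "big") for i in range(3)]

assert audit(reference_sign, test_key, digests) == []       # conforming signer passes
assert audit(tampered_sign, test_key, digests) == digests   # any deviation in K shows up
```

Note that checking that the device returns the same signature twice is not enough, since the second malicious signer above is also repeatable. The comparison has to be against an independent implementation of the specified nonce derivation.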