Image-display software isn't supposed to execute arbitrary code based on the content of a JPEG file, but it still happens sometimes.
That you aren't even acknowledging the existence of an entire category of vulnerabilities does not inspire confidence.
Do we really know sound is safe? Has anyone ever tried to crash the Linux sound drivers via malicious sounds sent to the line-in port? Maybe the only reason we don't think a vulnerability exists is that until now nobody has had a reason to look for one. Even if the sound drivers and ALSA libs are safe, there's still the matter of hardening the decoding software.
If even a task as old and well-understood as transforming a JPEG image into a bitmap can result in arbitrary code execution, you can't just assume that sound is safe without at least some kind of testing.
I'm not saying the attack surface is exactly 0.00, simply that I'm not aware of any transfer method with less linkage between the content of the data stream and what code will be executed (or which OS subsystems start up automatically when the link is detected).
If you want to discuss this further, please respond to the thread I linked above. This would be a good discussion to have there.