For example for SHA256 this is capped at ~2.1 TB.
How did you get that number? For SHA-256 the message size is stored as a 64-bit value, counted in bits. Every 2^10 changes the unit prefix: 2^10 (kilo), 2^20 (mega), 2^30 (giga), 2^40 (tera), 2^50 (peta), 2^60 (exa). One byte is eight bits, so divide by 2^3. That means 2^64 bits is something like 2 EiB. Maybe with a 32-bit size field you could have messages that are too big, but with a 64-bit field I have never heard "this file is too big to be hashed".
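For reference, the arithmetic is easy to check in a few lines (just a sanity check of the length field, nothing protocol-specific):

```python
# SHA-256 stores the message length in its padding as a 64-bit count of bits,
# so the largest message it can hash is 2**64 - 1 bits, roughly 2 EiB.
max_bits = 2**64 - 1
max_bytes = max_bits // 8            # one byte is 2**3 bits
print(max_bytes / 2**60, "EiB")      # ~2.0 EiB
print(max_bytes / 2**40, "TiB")      # ~2,097,152 TiB, nowhere near ~2 TB
```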
Another issue with this idea is that it defeats one of the main principles of PoW, which is a computation that is very hard to perform but super easy to validate.
I wonder how to handle difficulty for such a chain. In Bitcoin you can sometimes see the difficulty drop; in that case, because there is a new hash for every block, you can simply use an easier target. But with a single hash chain you always have to push the chain forward, so you always add some work to it. That means even if the difficulty were two times easier, you would still have to find a better hash than anyone else in the world has found during this whole time.
Basically, in this model the difficulty grows exponentially and there is no way to make it lower. You can easily check that just by computing hashes locally: start from regtest difficulty and then always compute a better hash. If you broadcast the best hash immediately, the difficulty grows exponentially. The only way to raise it slowly is to precompute a lot of hashes, store them locally, sort them, and then share them in the right order. But then one lucky miner can instantly raise the difficulty to insane levels, and after that there is no way to lower it.
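A toy simulation of this (just hashing random data locally, not any real chain) shows the one-way behaviour: each new link has to beat the best hash found so far, so the implied difficulty only ever moves up, and every extra leading zero bit roughly doubles the expected work needed to extend the chain.

```python
import hashlib
import os

# Each candidate must beat the best hash seen so far, so the "difficulty"
# (leading zero bits of the best hash) can only rise, never drop.
best = (1 << 256) - 1                                   # worst possible value
for attempt in range(1, 200_001):
    digest = hashlib.sha256(os.urandom(32)).digest()
    candidate = int.from_bytes(digest, "big")
    if candidate < best:                                # better hash found
        best = candidate
        zero_bits = 256 - best.bit_length()
        print(f"attempt {attempt}: best hash now has {zero_bits} leading zero bits")
```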
If you want to decentralize things, you should do the opposite: there should be N hashes per block. Of course, such things should be done off-chain, because otherwise the blockchain would quickly be filled with rewards for miners. Ideally, hashes could be joined, but so far I have no better idea than storing N hashes per block.
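As a rough illustration of "N hashes per block" (the names and layout below are only an assumption for the sketch, not any existing format), a block could simply carry a list of share hashes that each meet an easier per-share target, with rewarding and joining handled off-chain:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SharedBlock:
    prev_hash: bytes
    share_hashes: List[bytes] = field(default_factory=list)   # N miner shares

    def add_share(self, share_hash: bytes, share_target: int) -> bool:
        # Accept a share only if it meets the (easier) per-share target;
        # how these shares are joined or rewarded would live off-chain.
        if int.from_bytes(share_hash, "big") <= share_target:
            self.share_hashes.append(share_hash)
            return True
        return False
```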
Data is broken into small blocks and the hash state is updated for each block.
Exactly, so even in the case of a single hash per chain, it could be turned into what we have today. You could use the SHA-256 IV as your starting point; then the first 256 bits of each chunk would be a single SHA-256 of the block, and the second 256 bits would be (one_separator + zero_padding + message_size). So, to get any block hash, you would use the previous block hash as your IV, append the current block's 256-bit hash (after running a single SHA-256 on it), and then attach the one bit, the padding, and the message size (from that field you can even get the block number).
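Sketching the chunk layout makes this concrete. Python's hashlib does not expose the compression function or a custom IV, so the snippet below only builds the 512-bit chunk described above; the helper name and the exact way the size field encodes the height (256 * height) are assumptions for illustration.

```python
import hashlib

def chain_chunk(block_data: bytes, block_height: int) -> bytes:
    """Build one 512-bit chunk: 256 bits of block hash, then padding + size."""
    block_hash = hashlib.sha256(block_data).digest()     # first 256 bits
    separator = b"\x80"                                  # a single 1 bit, then zero bits
    zero_padding = b"\x00" * 23
    message_size_bits = 256 * block_height               # size field doubles as height
    chunk = block_hash + separator + zero_padding + message_size_bits.to_bytes(8, "big")
    assert len(chunk) == 64                              # exactly 512 bits
    return chunk

chunk = chain_chunk(b"example block contents", block_height=5)
print(chunk.hex())
print("height recovered from size field:", int.from_bytes(chunk[-8:], "big") // 256)
```

Running such a chunk through the SHA-256 compression function with the previous chain hash as the IV would then produce the next chain hash.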
Most likely, all the blocks would just be kept inside a vector instead of a linked list.
Yes, because internally SHA-256 processes its input as a vector of 512-bit chunks, so miners will take advantage of that.
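That chunk-by-chunk processing is also why streaming updates and one-shot hashing agree, which is easy to verify with hashlib:

```python
import hashlib

chunks = [b"block-0", b"block-1", b"block-2"]

streaming = hashlib.sha256()
for chunk in chunks:
    streaming.update(chunk)        # state advances one piece at a time

one_shot = hashlib.sha256(b"".join(chunks))
assert streaming.digest() == one_shot.digest()
```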