What would happen if Merkle trees weren't included in the blockchain?
Then, instead of a single 256-bit number (the Merkle root) in the header, the block message would carry only the transaction counter, followed by all the transactions, and the block hash would have to commit to those transactions directly.
The SPV mechanism would be more difficult and less efficient, since a client would need to download every whole block that contains a relevant transaction.
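To put rough numbers on that, here is a small sketch comparing what an SPV client downloads per relevant transaction with and without a Merkle tree. The transaction count and average transaction size are made-up illustrative values, and `merkle_branch_bytes` is a hypothetical helper name, not part of any Bitcoin library.

```python
def merkle_branch_bytes(tx_count):
    """Size of a Merkle branch: one 32-byte hash per tree level."""
    depth = (tx_count - 1).bit_length()  # ceil(log2(tx_count)) for tx_count >= 1
    return 32 * depth

tx_count = 2000      # assumed number of transactions in the block
avg_tx_size = 400    # assumed average transaction size, in bytes

with_merkle = merkle_branch_bytes(tx_count)   # proof data, plus the one transaction itself
without_merkle = tx_count * avg_tx_size       # every transaction in the block

print(with_merkle, without_merkle)  # 352 800000
```

Under these assumptions the Merkle branch is a few hundred bytes, while the full set of transactions is close to a megabyte.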
If you use a hash function like SHA-256, which is based on the Merkle–Damgård construction, things can still be optimized and done in an SPV-like way. For example, you can give someone the intermediate hash state (the chaining value that acts as the initialization vector for the last SHA-256 internal block) together with the last 512-bit chunk. That is sufficient to recompute the hash of the whole block and check that it was mined correctly.
The same goes for proving transaction inclusion: even if you hash all the data in a single SHA-256 call, you can still prove that a chunk "X" is included by giving the intermediate state just before that chunk, the chunks between it and the end of the data, and the final SHA-256 result.
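Here is a minimal sketch of that midstate idea. Python's `hashlib` does not expose the internal SHA-256 state, so the compression function is written out by hand (constants are derived from the prime roots as in FIPS 180-4). It uses a single SHA-256 pass over stand-in data rather than Bitcoin's double SHA-256, and the helper names (`compress`, `midstate_before`, `verify_from_midstate`) are my own, not from any library.

```python
import hashlib
import struct
from math import isqrt

MASK = 0xFFFFFFFF

def _primes(n):
    """First n primes by trial division."""
    ps, c = [], 2
    while len(ps) < n:
        if all(c % p for p in ps):
            ps.append(c)
        c += 1
    return ps

def _icbrt(x):
    """Integer floor cube root by binary search."""
    lo, hi = 0, 1 << ((x.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo

# Round constants and initial state: fractional bits of cube/square roots of primes.
K = [_icbrt(p << 96) & MASK for p in _primes(64)]
IV = [isqrt(p << 64) & MASK for p in _primes(8)]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def compress(state, chunk):
    """One SHA-256 compression step: 8-word state + 64-byte chunk -> new state."""
    w = list(struct.unpack(">16I", chunk))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[i] + w[i]) & MASK
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]

def pad(message):
    """SHA-256 padding: 0x80, zero bytes, then the 64-bit big-endian bit length."""
    zeros = (55 - len(message)) % 64
    return message + b"\x80" + b"\x00" * zeros + struct.pack(">Q", len(message) * 8)

def midstate_before(chunks, k):
    """Prover side: hash chunks[0:k] and return the intermediate state."""
    state = list(IV)
    for chunk in chunks[:k]:
        state = compress(state, chunk)
    return state

def verify_from_midstate(midstate, tail_chunks, expected_digest):
    """Verifier side: resume from the midstate over the supplied chunks only."""
    state = list(midstate)
    for chunk in tail_chunks:
        state = compress(state, chunk)
    return struct.pack(">8I", *state) == expected_digest

# Stand-in "block": 768 bytes of data, which pads to 13 chunks of 64 bytes.
block = bytes(range(256)) * 3
digest = hashlib.sha256(block).digest()
padded = pad(block)
chunks = [padded[i:i + 64] for i in range(0, len(padded), 64)]

# Case 1: only the midstate and the final chunk are needed to check the hash.
print(verify_from_midstate(midstate_before(chunks, 12), chunks[12:], digest))
# Case 2: prove chunk 5 is included: midstate before it, then chunks 5..12.
print(verify_from_midstate(midstate_before(chunks, 5), chunks[5:], digest))
```

Both checks print `True`. Note that the verifier still needs every chunk from "X" to the end of the data, so the proof grows linearly with the distance to the end, whereas a Merkle branch grows only logarithmically with the number of transactions.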
So SPV would of course be harder, but still possible, because SHA-256 by definition splits its input into 512-bit chunks.