A larger number of transactions takes more time. Quite logical.
Hmmm ... that's not clear to me.
If you feed 1 byte, 50 bytes, or 5,000 bytes of data to a hash function, does it take more clock cycles to calculate the hash?
The Merkle tree is a single hash value of all transactions in a block - Correct?
No.
With a Merkle tree, each transaction is individually hashed...
Then pairs of hashes are hashed to get to the next level of the tree...
Then pairs of those hashes are hashed to get to the next level of the tree...
Then pairs of those hashes are hashed to get to the next level of the tree...
This process continues until there is only 1 pair of hashes. That pair is hashed together to get the root.
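To make the levels concrete, here is a minimal Python sketch of that process. It assumes Bitcoin's conventions as I understand them: every hash is a double SHA-256, and when a level has an odd number of hashes the last one is paired with a copy of itself. The function names (dsha256, merkle_root) and the byte strings standing in for transactions are mine, purely for illustration.

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    """Double SHA-256: SHA-256 applied to the SHA-256 digest of the data."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(transactions: list[bytes]) -> bytes:
    """Return the Merkle root for a non-empty list of serialized transactions."""
    if not transactions:
        raise ValueError("need at least one transaction")

    # Bottom level: each transaction is hashed individually.
    level = [dsha256(tx) for tx in transactions]

    # Hash pairs to build the next level until a single hash remains.
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption: an odd level pairs its last hash with a copy of itself.
            level.append(level[-1])
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]

    return level[0]


if __name__ == "__main__":
    # Placeholder byte strings, not real transactions.
    txs = [b"tx-a", b"tx-b", b"tx-c"]
    print(merkle_root(txs).hex())
```

Note that with a single transaction the root is just that transaction's hash; no pairing happens at all.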
Does the time to calculate the Merkle tree increase with larger argument data (more transactions)?
Yes, but hashing is VERY fast. It is true that additional transactions will increase the amount of hashing that must be done, but compared to the proof-of-work it is a very small amount of time.
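A rough way to see the scale is to count the hash operations. The sketch below counts one hash per transaction plus one per pair at each level, rounding an odd level up (assuming the last hash is paired with a copy of itself). The function name merkle_hash_count is mine. For comparison, the proof-of-work at difficulty 1 takes roughly 2**32 header hashes on average, and real difficulty is far above 1.

```python
def merkle_hash_count(num_transactions: int) -> int:
    """Count the hash operations needed to build a Merkle tree."""
    if num_transactions < 1:
        raise ValueError("need at least one transaction")

    count = num_transactions   # one hash per transaction at the bottom level
    width = num_transactions
    while width > 1:
        width = (width + 1) // 2   # pairs at this level, odd levels round up
        count += width
    return count


# Approximate average number of header hashes to solve a block at difficulty 1.
EXPECTED_POW_HASHES_AT_DIFFICULTY_1 = 2**32

if __name__ == "__main__":
    for n in (5, 20, 2000):
        tree = merkle_hash_count(n)
        ratio = EXPECTED_POW_HASHES_AT_DIFFICULTY_1 / tree
        print(f"{n} transactions: {tree} tree hashes, proof-of-work is ~{ratio:,.0f}x more")
```

The tree cost grows roughly as twice the number of transactions, so it stays tiny next to the proof-of-work even for a full block.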
It might take time to gather more transactions together to feed as arguments into the hash function,
Correct.
but that process has to run regardless, so how much difference can it make to gather 5 transactions versus 20 transactions?
That depends on how many transactions you have waiting around in your mempool.
Additionally, you need to spend time verifying the block that you just received from the network and removing all those transactions from your mempool so that you don't accidentally include a confirmed (or invalid) transaction in your block. As such, it can be efficient to build an immediate block with no transactions, and get your ASIC started on that. Then while your ASIC is busy you can do all the verifying and transaction manipulation to prepare another block header for hashing.
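As a toy illustration of that ordering (not how any real mining software is structured), the generator below hands out a coinbase-only template immediately and only does the mempool clean-up when the second template is requested. Every name here (COINBASE_PLACEHOLDER, prune_mempool, block_templates) and the dict-based mempool are hypothetical, and verification of the received block is reduced to removing its transaction IDs.

```python
from typing import Iterator

# Hypothetical stand-in for the miner's own coinbase transaction.
COINBASE_PLACEHOLDER = b"coinbase"


def prune_mempool(mempool: dict[str, bytes], confirmed_txids: set[str]) -> dict[str, bytes]:
    """Drop transactions that were confirmed in the block just received."""
    return {txid: tx for txid, tx in mempool.items() if txid not in confirmed_txids}


def block_templates(mempool: dict[str, bytes], confirmed_txids: set[str]) -> Iterator[list[bytes]]:
    """Yield a template with no mempool transactions first, then a fuller one."""
    # Step 1: nothing to verify, so this can go to the ASIC right away.
    yield [COINBASE_PLACEHOLDER]

    # Step 2: the slower clean-up happens while the first template is being hashed.
    remaining = prune_mempool(mempool, confirmed_txids)
    yield [COINBASE_PLACEHOLDER, *remaining.values()]


if __name__ == "__main__":
    mempool = {"t1": b"tx-1", "t2": b"tx-2", "t3": b"tx-3"}
    templates = block_templates(mempool, confirmed_txids={"t2"})
    print(next(templates))   # coinbase-only template, available immediately
    print(next(templates))   # template built after pruning the mempool
```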
I'm not sure, but I think we're talking somewhere between a number of clock cycles and milliseconds, whereas determining the nonce (hence solving the block) is a question of minutes.
If you are just talking about choosing from an already validated transaction set and building the block header, then yes, building the block header is generally many orders of magnitude faster than the proof-of-work. However, it is possible to get lucky and complete your proof-of-work on your very first hash, and you don't know whether or not that will be the case while you are building the header.
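For a feel of why the proof-of-work dominates, here is a toy nonce search. It is only a sketch: header_prefix is an arbitrary byte string rather than a real 80-byte block header, the target is made absurdly easy so the loop finishes quickly, and the names (dsha256, mine) are mine. The comparison treats the double SHA-256 digest as a little-endian integer that must fall below the target, which is my understanding of how Bitcoin does it.

```python
import hashlib
from typing import Optional


def dsha256(data: bytes) -> bytes:
    """Double SHA-256: SHA-256 applied to the SHA-256 digest of the data."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def mine(header_prefix: bytes, target: int, max_nonce: int = 2**32) -> Optional[tuple[int, bytes]]:
    """Try nonces in order; return (nonce, digest) for the first digest below target."""
    for nonce in range(max_nonce):
        candidate = header_prefix + nonce.to_bytes(4, "little")
        digest = dsha256(candidate)
        if int.from_bytes(digest, "little") < target:
            return nonce, digest
    return None   # nonce space exhausted without a solution


if __name__ == "__main__":
    # Toy target: about 1 in 65,536 hashes succeeds. Real targets are vastly harder.
    toy_target = 2**240
    result = mine(b"toy header", toy_target)
    if result is not None:
        nonce, digest = result
        print(f"solved after {nonce + 1} hashes: {digest[::-1].hex()}")
```

Each attempt is an independent lottery ticket, which is why the very first nonce can win, even though on average it takes about 2**256 / target attempts.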
If the number of transactions was a significant delay (causing miners to lose the race for the block), I think we would predominantly see single transaction blocks. If it's not significant, I would expect to see the maximum that will fit.
We do see single transaction blocks. As I said, some miners/pools will get started on a single transaction block first, then while the ASIC is busy they will build a block header with transactions. If that first block gets lucky and is solved before the bigger block, then it gets broadcast.