In my view, it was done to limit wastage: as the block time increases, the share of work that gets wasted drops. There could also be a number of other factors behind the choice of a 10-minute block time.
One such factor: if many nodes were generating the same block at the same time, it could lead to more frequent forks. This in turn would weaken the network and push confirmation times up.
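To put a rough number on the fork argument, here is a minimal sketch. It assumes block discovery is a Poisson process with the block interval as its mean, and it uses a made-up 10-second propagation delay; the function name and both parameters are hypothetical, not anything from the Bitcoin software. It also ignores that the first miner's own hash rate does not compete against itself.

```python
import math


def stale_probability(propagation_delay_s: float, block_interval_s: float) -> float:
    """Chance that a competing block is found while the first one is still
    propagating, under the assumed Poisson model of block discovery."""
    if propagation_delay_s < 0 or block_interval_s <= 0:
        raise ValueError("delay must be >= 0 and block interval must be > 0")
    # P(at least one arrival in a window of length d) = 1 - exp(-d / T)
    return 1.0 - math.exp(-propagation_delay_s / block_interval_s)


if __name__ == "__main__":
    delay = 10.0  # assumed propagation delay in seconds, for illustration only
    for interval in (60.0, 600.0):
        print(f"block interval {interval:>5.0f}s -> fork chance {stale_probability(delay, interval):.4f}")
```

With the same delay, the shorter interval always gives the higher fork chance, which is the point being made above.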
For example, say everyone starts mining block 999,999 at time t0, and miner A solves the block first and pushes it to the network at t0 + t, where t is the time miner A took to mine and submit block 999,999. All the other miners are still working on the same block at that point.
Let's say that, because of network latency, miner B only sees miner A's block at (t0 + t) + 1. Miner B, and every other miner i who finishes mining the same block at a time (t0 + t) + (delta_t)i with (t0 + t) < (t0 + t) + (delta_t)i < (t0 + t) + 1, has solved a block that was already taken.
All of them are successful in mining, but they have wasted their energy in doing so, because their blocks end up as orphans. The total wastage of the network from these orphan blocks would be:
sum of { (t + (delta_t)i) * (hr)i } for every i, where (hr)i is the hash rate of miner i.
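The sum above can be sketched in a few lines of Python. This is only an illustration of the formula: the function name and the sample numbers are made up, and I am assuming time is in seconds and the hash rate in hashes per second, so the result comes out as a number of hashes.

```python
def wasted_work(t, delta_ts, hash_rates):
    """Total wasted work: sum of (t + delta_t_i) * hr_i over every miner i
    who solved the block after miner A but before hearing about A's block.

    t          -- time miner A took to mine and submit the block
    delta_ts   -- delta_t_i for each late miner i
    hash_rates -- hash rate hr_i for each late miner i
    """
    if len(delta_ts) != len(hash_rates):
        raise ValueError("need one hash rate per delta_t")
    return sum((t + dt) * hr for dt, hr in zip(delta_ts, hash_rates))


if __name__ == "__main__":
    # Made-up numbers: A takes 600 s, two other miners finish 0.2 s and 0.5 s later.
    print(wasted_work(600.0, [0.2, 0.5], [100.0, 50.0]))
```

Since t is in every term, the absolute figure grows with the block time; the share that is wasted is what should fall, because fewer miners land inside the latency window.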
I'm sure there are other reasons behind this that I have missed, but it is an interesting topic to look over, even just as a thought experiment.