My questions:
1. How does a miner choose which of the 100 available transactions to include? Is this done by bitcoind, and is it configured (minimum fee?) in bitcoin.conf and/or command line options?
With solo mining, the miner would typically allow bitcoind to select the transactions and build the block for them using whatever default rules the developers of bitcoind have chosen. For mining pools, the pool operator chooses the transactions. They may do this with bitcoind, or they may create their own custom software for the purpose.
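As a rough illustration of what "selecting transactions" can mean, here is a minimal sketch of a greedy selection by fee rate. It is not bitcoind's actual algorithm, and a pool's custom software may do something quite different. The names `Tx`, `select_transactions`, `max_block_size` and `min_fee_rate` are all hypothetical, chosen for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    txid: str
    fee: int   # fee in satoshis
    size: int  # size in bytes


def select_transactions(mempool, max_block_size, min_fee_rate=0.0):
    """Greedily pick the highest fee-per-byte transactions that still fit.

    Simplified on purpose: it ignores dependencies between transactions
    and every other rule a real block builder has to respect.
    """
    chosen = []
    used = 0
    for tx in sorted(mempool, key=lambda t: t.fee / t.size, reverse=True):
        if tx.fee / tx.size < min_fee_rate:
            break  # everything after this pays even less per byte
        if used + tx.size <= max_block_size:
            chosen.append(tx)
            used += tx.size
    return chosen


mempool = [Tx("a", 1000, 250), Tx("b", 500, 500), Tx("c", 3000, 300)]
block_txs = select_transactions(mempool, max_block_size=600)
print([tx.txid for tx in block_txs])
```

A miner who wants a different policy (for example, a higher minimum fee) would change the selection rule, not the mining itself.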
2. Transactions are put in a block. Let's assume M1 is mining B1 and M2 is mining B2. Is there a mechanism that prevents T1 from being in both B1 and B2, or is it allowed for T1 to be in both B1 and B2?
While they are attempting to mine the block, each miner can be mining a block that has the same transactions as the blocks that other miners are attempting to mine. Once one miner successfully solves the block and broadcasts the solution, all other miners will have to abandon the block they were attempting to mine as soon as they hear about the solved block. They will create a new block to mine that does not include any of the transactions that are in the solved block they just received.
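The step of abandoning the old block and building a new one could be sketched like this. The dictionary layout and the name `new_candidate` are assumptions made for the example, not anything bitcoind exposes.

```python
def new_candidate(pending_txids, solved_block):
    """Build the next block to mine after hearing about a solved block.

    The new candidate builds on top of the solved block and leaves out
    every transaction that the solved block already confirmed.
    """
    confirmed = set(solved_block["txids"])
    return {
        "prev_hash": solved_block["hash"],
        "txids": [t for t in pending_txids if t not in confirmed],
    }


# Hypothetical data: B1 confirmed T1 and T3 while M2 was still working.
b1 = {"hash": "hash-of-B1", "txids": ["T1", "T3"]}
candidate = new_candidate(["T1", "T2", "T3", "T4"], b1)
print(candidate)
```

Before B1 arrives, nothing stops T1 from sitting in both miners' candidate blocks. Only after B1 is accepted does T1 have to be dropped from whatever M2 mines next.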
3. What happens to M2/B2 if M1/B1 finds the golden nonce? Does M2 have to create a new block which excludes all the transactions validated in B1, or is it OK to have T1 in both B1 (which has just been validated) and B2, which is still to be validated?
Unless they were attempting to attack the network, M2 would abandon B2 and create a new block which excludes all the transactions validated in B1.
4. If 3 is OK, where does B2 fit in the blockchain once it's validated, given that B2 references the previous block in the chain and B1 has now sneaked in?
If M2 continued to work on B2 after hearing about B1, and M2 managed to solve B2 soon enough after hearing about B1 that they were able to send the block to some miners that hadn't heard about B1 yet, then the blockchain would temporarily fork. Some miners would be working on a block that builds on top of B2 while others would be building on top of B1. Whichever portion of the network solved their block next would broadcast the solved block and that chain would become the "longest". Every node that was working on the other fork and hears about the new block would abandon their fork and the losing block would be orphaned.
If the entire network had already heard about B1 by the time M2 solved B2, then all nodes would simply refuse to relay or build on B2. M2 would have to work on the next block in the chain all by themselves and would have to find the solution to the next block before anyone else in the network solves a block that builds on B1.
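The temporary fork described above can be illustrated with a toy model. This is a sketch under simplifying assumptions: chains are plain lists of block names, and a node switches only when it sees a strictly longer chain. Real nodes compare total accumulated proof-of-work and not block count, and the `Node` class here is hypothetical.

```python
class Node:
    """Toy node that keeps the longest chain it has seen so far."""

    def __init__(self, chain):
        self.chain = list(chain)

    def receive(self, candidate_chain):
        """Adopt candidate_chain only if it is strictly longer.

        A chain of equal length is ignored, so a node stays on the
        block it heard about first until the tie is broken.
        """
        if len(candidate_chain) > len(self.chain):
            self.chain = list(candidate_chain)
            return True
        return False


# Part of the network heard about B1 first, part heard about B2 first.
on_b1 = Node(["genesis", "B1"])
on_b2 = Node(["genesis", "B2"])

# Hearing about the competing block of equal height changes nothing.
on_b1.receive(["genesis", "B2"])

# Someone solves the next block on top of B1; the B2 side reorganizes.
on_b2.receive(["genesis", "B1", "B3"])
print(on_b1.chain, on_b2.chain)
```

Once the tie is broken, B2 is no longer part of the chain that `on_b2` follows, which is the situation described as the losing block being orphaned.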
5. If 3 is not OK, is all the work that M2 has done on B2 a waste, because it has to create a new block and start finding the golden nonce on the new block?
Every time a miner computes a hash, they either succeed or they have to modify the block header (change the nonce) and try again. Failed attempts do not bring the miner any closer to a solution, so there is no accumulated progress to lose. Modifying the entire block (due to a solved block being broadcast) isn't significantly worse than modifying the nonce. Either way the miner has a new/different block header that they can hash to see if they find a solution.
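To make that concrete, here is a toy mining loop. It is only a sketch: the header is an arbitrary byte string and not Bitcoin's real 80-byte header, the target is made up to keep the search short, and `try_nonces` is a hypothetical helper. The double SHA-256 and the comparison of the hash against a target do mirror how Bitcoin's proof-of-work check is done.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def try_nonces(header_prefix: bytes, target: int, start: int, count: int):
    """Hash header_prefix with each nonce in [start, start + count).

    Returns the first nonce whose hash is below target, or None.
    Every attempt is independent of all the attempts before it.
    """
    for nonce in range(start, start + count):
        digest = double_sha256(header_prefix + nonce.to_bytes(4, "little"))
        if int.from_bytes(digest, "little") < target:
            return nonce
    return None


easy_target = 1 << 248  # about 1 hash in 256 succeeds (toy difficulty)

# M2 works on B2, then hears about B1 and switches to a new header.
old_header = b"prev=B0|txs=T1,T2|"
new_header = b"prev=B1|txs=T2|"

try_nonces(old_header, easy_target, start=0, count=10)  # abandoned
nonce = try_nonces(new_header, easy_target, start=0, count=100_000)
print("nonce found for the new block:", nonce)
```

Switching from `old_header` to `new_header` costs nothing beyond building the new header: the next hash has the same chance of success as it would have had on the old block.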