4 seconds? Is that needed to find a block each time?
That's how long it takes to search for staking opportunities every 16 seconds. Most times you check, you find that there isn't any such opportunity. There's only one block per minute globally, so even with 100% of the staking weight you'd have around a 1 in 4 chance of any particular search being successful. When we do find a block, it will be at a random point through that 4-second search, so on average a successful search takes 2 seconds ((0 + 4) / 2).
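To make that arithmetic concrete, here's a tiny back-of-the-envelope calculation in Python, using only the numbers from the paragraph above:

```python
# Back-of-the-envelope numbers from the paragraph above.
BLOCK_INTERVAL = 60      # target: roughly one block per minute, network-wide
SEARCH_INTERVAL = 16     # a search can only happen every 16 seconds
SEARCH_DURATION = 4      # each search pegs one core for about 4 seconds

# With 100% of the staking weight, the chance that any given search wins:
searches_per_block = BLOCK_INTERVAL / SEARCH_INTERVAL   # 3.75
p_success = 1 / searches_per_block                       # ~0.27, i.e. about 1 in 4

# A winning hash turns up at a uniformly random point in the 4-second scan,
# so the average time spent on a *successful* search is the midpoint:
avg_success_time = (0 + SEARCH_DURATION) / 2             # 2 seconds

print(f"~1 in {searches_per_block:.2f} searches succeed; "
      f"a successful search takes about {avg_success_time} s on average")
```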
Given a difficulty leading to 16 seconds, those 4 seconds would be huge. I mean the difference between orphaning other blocks or getting your own block orphaned. Or isn't it needed to calculate a block?
I don't think you're understanding still. "Difficulty leading to 16 seconds" isn't what's happening. The difficulty adjusts how hard it is to stake a block. It adjusts such that we find around one block per minute. But blocks can only be found when the time (in seconds since some date in 1970) is a multiple of 16. That only happens every 16 seconds. That's fixed by the protocol (until the developers change the protocol again, of course), and isn't related to the difficulty.
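For what it's worth, here's a minimal sketch of that timing rule in Python. The constant name and helper functions are just my own illustration of the 16-second boundary described above, not the actual CLAM source:

```python
import time

# Illustrative only: the real client has its own constants and consensus code.
# The rule described above: stake timestamps must land on a 16-second boundary.
STAKE_TIME_GRANULARITY = 16

def is_stakeable_time(timestamp: int) -> bool:
    """Only timestamps that are a multiple of 16 seconds can be used in a stake."""
    return timestamp % STAKE_TIME_GRANULARITY == 0

def next_stake_time(now: int) -> int:
    """Round a Unix timestamp up to the next 16-second boundary."""
    return ((now // STAKE_TIME_GRANULARITY) + 1) * STAKE_TIME_GRANULARITY

now = int(time.time())
print(is_stakeable_time(now), next_stake_time(now))
```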
So the 4 seconds are real, and you meant it goes through each of those outputs (shouldn't they be inputs, as long as they aren't sent out?) and checks if it finds a hash? Does the output amount matter here? I mean, you described rounding the amount of clams down to an integer. Does this apply to the address these outputs are on, i.e. a big amount of clams, or only to the single output? If the latter, then one could get an advantage by sending the clams in amounts of 1 to a new address. Would that maximize the chance to find a block?
It's a real 4 seconds. For 4 seconds out of every 16, the CPU on one core of JD's staking wallet server is pegged at 100%. They're outputs of the transactions that created them. They're not the inputs of any transactions yet, or they wouldn't be unspent. They're potential inputs, if you like, but actual outputs. When they stake they become inputs of the staking transaction.
The rounding down to an integer was related to how the age of an output affected its staking power in an older version of CLAM. I think it used to multiply the value by the age and round down to an integer. I don't think it does that rounding any more, or the multiplication by the age. These days the staking power (called the "weight") is just the same as the value in CLAMs. Each output is considered separately. It doesn't matter if you have lots of 1 CLAM outputs on a single address, or in lots of different addresses. They each get their own individual chance of staking, with a probability proportional to their own individual value in CLAMs.
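To illustrate those two points (each output is tried on its own, and its odds scale with its value), here's a deliberately simplified sketch of a proof-of-stake kernel check in Python. The real kernel hash includes more fields (stake modifier and so on) and the real target comes from the difficulty; all the names and numbers here are illustrative:

```python
import hashlib
import struct

HASH_SPACE = 2 ** 256

def kernel_hash(output_id: bytes, timestamp: int) -> int:
    # Simplified stand-in for the real kernel hash, which commits to more data.
    data = output_id + struct.pack("<I", timestamp)
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def output_stakes(output_id: bytes, value_clams: float,
                  timestamp: int, base_target: int) -> bool:
    # Bigger value => bigger allowed hash range => proportionally better odds.
    return kernel_hash(output_id, timestamp) < int(base_target * value_clams)

# Each output gets its own independent check at the same timestamp:
outputs = [(b"txid1:0", 1.0), (b"txid1:1", 1.0), (b"txid2:0", 50.0)]
base_target = HASH_SPACE // 10_000_000   # made-up difficulty, for illustration
for oid, value in outputs:
    print(oid, output_stakes(oid, value, 1_456_000_000, base_target))
```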
There is a benefit to splitting your outputs up into several smaller outputs. Suppose you have 1000 CLAMs. It will stake very quickly, and become 1001 CLAMs. But then it will take 8 hours to mature before it can stake again. The best you could hope for is that it will stake 3 times per day (since that's how many 8 hour maturation periods you can fit into a day).
If instead you split it into 1000 outputs of size 1, each one tries to stake independently. Each one has a 1000 times lower chance of staking than the 1000 CLAM output did, but there are 1000 of them, so it takes roughly the same time for one of them to stake, and turn from 1 CLAM to 2 CLAMs. Then, however, only the 2 CLAM output is frozen for 8 hours while it matures. The other 999 CLAMs continue trying to stake. So you have saved yourself an 8 hour wait for 99.9% of your value.
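Here's a rough comparison of the two layouts, using only the numbers from the example above:

```python
# Rough comparison of the two layouts described above (numbers from the text).
TOTAL = 1000          # CLAMs
MATURATION_HOURS = 8  # a staked output is frozen for 8 hours

# One big 1000 CLAM output: after it stakes, ALL of your value waits 8 hours,
# so at best it stakes 24 / 8 = 3 times per day.
stakes_per_day_single = 24 // MATURATION_HOURS           # 3
frozen_fraction_single = 1.0                             # 100% of value frozen

# 1000 outputs of 1 CLAM each: when one stakes, only that output is frozen;
# the other 999 CLAMs keep trying to stake the whole time.
frozen_fraction_split = 1 / TOTAL                        # ~0.1% of value frozen

print(stakes_per_day_single, frozen_fraction_single, frozen_fraction_split)
```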
If you split your value up into *too* many outputs, you'll have so much hashing to do every 16 seconds that you won't be able to get through it all. And if you ever want to spend your outputs, having them split up into millions of tiny pieces makes the transaction which spends them very big (and so very expensive in tx fees).
So there's a tradeoff - split enough, but not too much.
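One toy way to think about the tradeoff: both costs grow with the number of outputs, so the sweet spot is somewhere in between. The per-hash time and per-input size below are hypothetical placeholders, not measured CLAM figures:

```python
# Toy model of the tradeoff. Both "costs" grow with the number of outputs.
# seconds_per_hash and bytes_per_input are hypothetical placeholders only.

def hashing_time_per_window(n_outputs: int, seconds_per_hash: float = 1e-5) -> float:
    """Rough time needed to try every output once per 16-second window."""
    return n_outputs * seconds_per_hash

def spend_tx_size(n_outputs: int, bytes_per_input: int = 150) -> int:
    """Rough size of a transaction that spends all the outputs at once."""
    return n_outputs * bytes_per_input

for n in (100, 1_000, 10_000, 100_000):
    print(n,
          f"{hashing_time_per_window(n):.2f} s of hashing per window,",
          f"~{spend_tx_size(n) / 1000:.0f} kB to spend them all")
```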