I'm rephrasing the stuff Charles-Tim and hosseinimr93 have already said (they are both correct), but i tried to make it a bit "simpler"...
My simplification is a bit "over the top", some aspects are oversimplified, but i'm trying to give an ELI13 explanation here
The bitcoin network re-adjusts the difficulty every 2016 blocks to make sure that, on average, one block gets found every 10 minutes. That's about ~6 blocks per hour. It doesn't matter if the hashrate climbs or plummets, the network re-adjusts every ~2 weeks to make sure that on average ~6 blocks are found per hour.
A block can only contain 1 Mb of transaction data, NOT including the witness data of a segwit transaction... A block can contain up to 4 Mb of data, but the first Mb is the "actual" transaction data, and the other 3 Mb is only witness data of segwit transaction. This means that only ~6 Mb of "actual" transaction data (not including witness data) can be added to the blockchain each hour. This is also the reason that, when the mempool is overflowing as it is now, a miner has a choice to optimize the block he/she is trying to solve... He/she can make any combination of unconfirmed transaction he/she wishes.
When people use legacy addresses, the complete transaction has to fit into the first Mb of the block. When people use segwit addresses, only a part of the transaction has to fit into the first Mb, and a big chunk of the transaction data can fit in the "extra" 3Mb. The first Mb is the actual bottleneck, a miner cares less about the witness data since he has 3Mb of blockspace to store said data.
Any transaction has a number of inputs (with a value) and a number of outputs (with a value). A miner solving a block can send the block reward to himself, but he can also add the sum of all the values of all inputs of all transactions in his block minus the sum of all the values of all the outputs of all transactions in his block to said block reward... SOOOOO for a miner, if there are plenty of transactions in the mempool to pick from, it's financially beneficial to add transactions with the highest fee per byte of transaction data (since the total size of the block is a limiting factor). Since a segwit transaction NOT including the witness data is much smaller than a legacy transaction, it needs less fee in order to become interesting for a miner to be added to the block he/she is trying to solve.
As soon as you get this, it's time to look at taproot