I'm going to read it carefully this time. Thanks. But from the first glance, I see: "Playing more than 5,000 hands each time", "... for 10,000 hands"... Doesn't that mean "studying the opponent plays for hours"? I mean, to me, it does.
No. As I said, it is not.
Let's have citations directly from the paper.
The core of Pluribus’s strategy was computed through self-play, in which the AI plays against copies of itself, without any data of human or prior AI play used as input.
Pluribus’s self-play produces a strategy for the entire game offline, which we refer to as the blueprint strategy. Then during actual play against opponents, Pluribus improves upon the blueprint strategy by searching for a better strategy in real time for the situations in which it finds itself during the game.
Each of the two humans separately played 5000 hands of poker against five copies of Pluribus. Pluribus does not adapt its strategy to its opponents and does not know the identity of its opponents, so the copies of Pluribus could not intentionally collude against the human player.
In this experiment, 10,000 hands of poker were played over 12 days. Each day, five volunteers from the pool of professionals were selected to participate on the basis of availability.
You can skim the actual paper; you don't necessarily have to read it carefully. The 5,000 and 10,000 hands are only there to measure the effectiveness of the AI. They are not the number of hands the AI uses to learn its opponents' play; Pluribus already has its blueprint strategy before it is rigorously tested.
Pluribus’s self-play produces a strategy for the entire game offline, which we refer to as the blueprint strategy. Then during actual play against opponents, Pluribus improves upon the blueprint strategy by searching for a better strategy in real time for the situations in which it finds itself during the game.
Only then, during actual live gameplay, does it adapt and adjust via real-time search. So it cannot be said that Pluribus studies its opponents' play for hours.
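To make the ordering concrete, here is a minimal schematic in Python. This is my own illustration, not Pluribus's actual algorithm (the paper uses Monte Carlo CFR self-play and depth-limited search); all function names and the toy "strategy" values are stand-ins. The point it demonstrates is just the timeline: the blueprint is produced entirely offline by self-play, and the evaluation hands refine its answers in real time without ever feeding back into it.

```python
import random

def train_blueprint_by_self_play(iterations, seed=0):
    """Offline phase: the AI plays against copies of itself.
    No human data is involved; this finishes before any opponent is met.
    (Stand-in for the paper's self-play computation, not the real thing.)"""
    rng = random.Random(seed)
    strategy = {}
    for _ in range(iterations):
        situation = rng.randrange(100)  # stand-in for a game situation
        strategy[situation] = strategy.get(situation, 0.0) + 1.0
    return strategy

def play_evaluation_hand(blueprint, situation):
    """Online phase: real-time search improves on the blueprint's answer
    for this one situation, but the blueprint itself is never updated --
    per the paper, Pluribus does not adapt its strategy to its opponents."""
    base = blueprint.get(situation, 0.0)
    return base + 0.5  # stand-in for the search-time improvement

blueprint = train_blueprint_by_self_play(10_000)  # happens first, offline
snapshot = dict(blueprint)
for hand in range(5_000):  # the evaluation hands from the experiments
    play_evaluation_hand(blueprint, situation=hand % 100)
assert blueprint == snapshot  # evaluation never modified the blueprint
```

The final assertion is the whole argument in one line: playing 5,000 or 10,000 hands measures the fixed blueprint's strength, it does not train it.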