For example, one goal could be to link a transaction to an IP address. The spy node connects to as many nodes as it can at the same time. If it first sees tx1 coming from IPx and then, some time later, sees tx2 (which spends outputs of tx1) coming from the same IPx, and so on, it can eventually conclude that addresses a, b, c, d belong to IPx. From there it may be possible to link IPx to a person's identity, thereby deanonymizing the transactions.
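To make the heuristic above concrete, here is a minimal sketch of how such a linkage could be computed from what a spy node logs. It is only an illustration of the idea, not code taken from any real spy node: the function name `link_addresses_to_ips` and the three input structures are hypothetical, and a real attacker would have to deal with relay noise, Tor, and Core's announcement delays.

```python
from collections import defaultdict

def link_addresses_to_ips(observations, spends, tx_addresses):
    """Guess which addresses belong to which IP (hypothetical helper).

    observations: list of (timestamp, ip, txid) announcements the spy saw.
    spends:       dict txid -> set of parent txids whose outputs it spends.
    tx_addresses: dict txid -> set of addresses involved in that tx.
    """
    # Remember only the first IP that announced each transaction.
    first_seen = {}
    for _ts, ip, txid in sorted(observations):
        first_seen.setdefault(txid, ip)

    linked = defaultdict(set)
    for txid, ip in first_seen.items():
        for parent in spends.get(txid, ()):
            # A child first seen from the same IP as its parent is treated
            # as evidence that both transactions originate from that IP.
            if first_seen.get(parent) == ip:
                linked[ip] |= tx_addresses.get(parent, set())
                linked[ip] |= tx_addresses.get(txid, set())
    return dict(linked)

# Assumed sample data: tx2 spends tx1, both first seen from "IPx".
observations = [(1, "IPx", "tx1"), (2, "IPy", "tx1"),
                (5, "IPx", "tx2"), (6, "IPz", "tx3")]
spends = {"tx2": {"tx1"}, "tx3": {"tx2"}}
tx_addresses = {"tx1": {"a", "b"}, "tx2": {"c", "d"}, "tx3": {"e"}}
result = link_addresses_to_ips(observations, spends, tx_addresses)
```

With this sample data only IPx gets addresses attributed to it, because tx3 was first seen from a different IP than its parent.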
The only real risk is a privacy risk (although I should add that the Core team has done a lot of good work to make such attempts as hard as possible), plus, of course, wasting your resources.
- One of them is literally called "snoopy" (the client name in its version message)
- They can't reply to getdata, getblocks, getheaders, etc., since they don't have any blockchain data
- The version message some of them use during the handshake is buggy: if you send them a false block height, they start advertising it!
- Some of them keep coming and going (they don't remain connected)
- They also don't ask for the same things a normal node would, such as checking their headers against yours first in order to sync
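
If you wanted to flag such peers automatically, the observations in this list could be turned into a rough checklist. The sketch below is only an assumption of how that might look: `PeerStats`, its fields, `spy_indicators`, and the reconnect threshold are all made up for illustration and are not part of Bitcoin Core or any library. You would have to fill the fields in from your own node's logs.

```python
from dataclasses import dataclass

@dataclass
class PeerStats:
    """Hypothetical per-peer summary collected by your own node."""
    user_agent: str             # client name from the peer's version message
    answered_data_requests: bool  # did it ever answer getdata/getheaders?
    sent_getheaders: bool       # did it try to sync headers with us?
    reconnects_per_hour: float  # how often it drops and reconnects
    advertised_height: int      # height the peer advertises
    height_we_sent: int         # (possibly false) height we told the peer
    real_height: int            # actual chain height

def spy_indicators(p, reconnect_threshold=6.0):
    """Return the list of spy-like traits a peer shows (heuristic only)."""
    flags = []
    if "snoopy" in p.user_agent.lower():
        flags.append("suspicious client name")
    if not p.answered_data_requests:
        flags.append("serves no blockchain data")
    # The peer repeats a height we invented instead of the real one.
    if p.height_we_sent != p.real_height and p.advertised_height == p.height_we_sent:
        flags.append("echoes our false block height")
    if p.reconnects_per_hour > reconnect_threshold:
        flags.append("keeps coming and going")
    if not p.sent_getheaders:
        flags.append("never tries to sync headers")
    return flags

suspect = PeerStats("/snoopy:0.1/", False, False, 12.0, 999_999, 999_999, 800_000)
normal = PeerStats("/Satoshi:25.0.0/", True, True, 0.1, 800_000, 800_000, 800_000)
```

None of these traits is proof on its own (a pruned or freshly started node can also fail to serve data), so treat the result as a hint, not a verdict.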