Hey all,
I've been planning to write a few scripts relating to BitcoinTalk. It's been on my "developer bucket list" to write something to detect users who have multiple accounts. To accomplish this and end up with a reliable list, I'd have to settle on some logic to base the detection on.
Content within HR tags will be updated as the thread goes along.
I have a few things in mind (and I'll be updating this as the thread goes along - adding new ideas & such):
If and when topics are created for the data (either by me or by others), I'll post the links here under the respective categories.
For account quality detection:
- Count the # of words, paragraphs, sentences, etc. in each post and average them per user to come up with an account quality number. This number can be used in tandem with other scripts to decide whether an account should be reported. Obviously this isn't enough to report on by itself, but usernames w/ low quality could be sent into a spreadsheet of some sort for manual lookup.
- [your/others ideas here]
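Here's a rough sketch of what the per-user averaging could look like. It assumes the post text has already been scraped into plain strings; the function names (`post_stats`, `user_averages`) and the naive sentence/paragraph splitting rules are just my placeholders, not a finished quality metric.

```python
import re
from statistics import mean


def post_stats(text):
    """Count words, sentences, and paragraphs in a single post (naive rules)."""
    # Paragraphs: chunks separated by at least one blank line.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Sentences: chunks separated by runs of ., ! or ? (assumption: good enough for averages).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9']+", text)
    return {
        "words": len(words),
        "sentences": len(sentences),
        "paragraphs": len(paragraphs),
    }


def user_averages(posts):
    """Average the per-post counts over all of one user's posts."""
    keys = ("words", "sentences", "paragraphs")
    if not posts:
        return {key: 0 for key in keys}
    stats = [post_stats(p) for p in posts]
    return {key: mean(s[key] for s in stats) for key in keys}
```

How the three averages get combined into a single quality number is still open; a weighted sum with a manually tuned cutoff would probably be the first thing to try.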
For multiple account detection:
- Look for same address usage between posts (BTC, ETH, etc)
- Look for same account usage between posts (telegram, skype, emails, etc)
- [your/others ideas here]
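A minimal sketch of the shared-address idea, assuming posts are available as `(username, text)` pairs. The regexes are rough approximations of legacy BTC, bech32 BTC, and ETH address shapes (no checksum validation), and `shared_addresses` is a hypothetical name. The same approach should carry over to Telegram/Skype handles and emails with different patterns.

```python
import re
from collections import defaultdict

# Approximate address shapes only; these do not validate checksums.
ADDRESS_PATTERNS = [
    re.compile(r"\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b"),  # legacy BTC (base58)
    re.compile(r"\bbc1[a-z0-9]{11,71}\b"),               # bech32 BTC
    re.compile(r"\b0x[a-fA-F0-9]{40}\b"),                # ETH
]


def shared_addresses(posts):
    """Map each address to the users who posted it, keeping only
    addresses that show up under two or more usernames."""
    owners = defaultdict(set)
    for username, text in posts:
        for pattern in ADDRESS_PATTERNS:
            for address in pattern.findall(text):
                owners[address].add(username)
    return {addr: users for addr, users in owners.items() if len(users) > 1}
```

A shared address is only a hint (exchange deposit addresses, donation addresses, and quoted posts will all produce false positives), so the output should go to manual review rather than straight into a report.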
For copy-pasta detection:
- Write a script to detect copy-pasta by matching the text of an account's posts against similar text on other sites, returning a probability percentage that the user is copy/pasting (including the src for manual analysis). Users w/ a percentage above a certain threshold will be put into a list & potentially reported to threads/mods (i.e., external plagiarism detection).
- Write a script to detect copy-pasta by matching post content against other users' post content. High similarities will raise red flags (i.e., internal plagiarism detection). *note: suchmoon mentioned working on something similar, so other scripts may take precedence*
- The original script may want to ignore quote tags. If so, depending on how it's built (matching on full text, or word by word), another side-script would have to be built to prevent users from just wrapping their messages in quote tags.
- [your/others ideas here]
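For the internal comparison, one possible approach is Jaccard similarity over word shingles after stripping quoted text. This is just a sketch under my own assumptions: posts are raw BBCode strings, quotes look like `[quote ...]...[/quote]`, and the names `strip_quotes`, `shingles`, and `similarity` are placeholders.

```python
import re

# Matches an innermost [quote ...]...[/quote] block (one with no nested [quote inside it).
QUOTE_RE = re.compile(
    r"\[quote[^\]]*\](?:(?!\[quote).)*?\[/quote\]",
    re.IGNORECASE | re.DOTALL,
)


def strip_quotes(text):
    """Remove quote blocks, repeating until nested quotes are gone too."""
    previous = None
    while previous != text:
        previous = text
        text = QUOTE_RE.sub(" ", text)
    return text


def shingles(text, size=3):
    """Return the set of overlapping word n-grams in the text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a, b, size=3):
    """Jaccard similarity (0.0 to 1.0) of two posts, ignoring quoted text."""
    sa = shingles(strip_quotes(a), size)
    sb = shingles(strip_quotes(b), size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

Since this ignores everything inside quote tags, it has exactly the loophole mentioned above: a separate check (for example, flagging posts that are almost entirely quoted text) would still be needed.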
For trust abuse/merit abuse:
- Detect trust abuse (users who send out a large number of negative trusts using the same text). This would obviously skip trusted members (as some good campaign managers send out trusts w/ the same text). It's mostly targeted towards members w/ no trust or negative trust (i.e., newer members, no trade history, etc). Results would be posted in a thread in a list format using tildes "~" so people can copy/paste the list of abusers into their trust lists. Users could request to be removed from this list by public poll within the thread (this should probably be handled manually).
- [your/others ideas here]
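A rough sketch of the repeated-text check, assuming the trust feedback has already been collected as `(sender, rating, comment)` tuples with negative ratings below zero. The tuple layout, the `trusted` set, and the `min_repeats` cutoff of 5 are all assumptions on my part.

```python
from collections import Counter, defaultdict


def flag_trust_abusers(feedback, trusted=frozenset(), min_repeats=5):
    """Return senders (sorted) who left the same negative comment
    at least `min_repeats` times, skipping trusted members."""
    counts = defaultdict(Counter)
    for sender, rating, comment in feedback:
        if rating < 0 and sender not in trusted:
            # Normalize case and whitespace so trivial edits still match.
            normalized = " ".join(comment.lower().split())
            counts[sender][normalized] += 1
    return sorted(
        sender for sender, texts in counts.items()
        if max(texts.values()) >= min_repeats
    )


def as_trust_list(usernames):
    """Format usernames with a leading "~" for pasting into a trust list."""
    return "\n".join("~" + name for name in usernames)
```

Anything this flags would still need a manual look before being posted, since identical text on its own doesn't prove abuse.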
General ideas for all scripts:
- Automatic posting to anti-spam threads w/ results (in such a way as to not create more spam though)
- A platform where users who have been reported by scripts can be documented, with automatic ban detection. That way scripts aren't looking into users who have already been reported/banned.
- [your/others ideas here]
Results would be posted here for mods to look at (if need be), or just to keep a record of such a connection. I'd also probably link to results in this topic and maybe load it up on a website of mine.
I wanted to post this thread in advance to see if anyone else has other logic / ideas in mind for these scripts/bots. I'll only be building this when I have the time (which won't be for a couple of weeks), so I thought I'd post well ahead of that. I'll update the above list with approved suggestions that I plan to work on.
Thanks!
P.S: If any mods/admins aren't ok with me scraping the site, by all means let me know. I'd obviously write the bot/script in such a way that it doesn't slam the server & only sends a certain number of requests per second/minute (more or less like a Google bot). I know other users have written similar bots/scraping tools, so I thought it'd be ok. But if not, just let me know.
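To give an idea of the throttling, here's a minimal sketch of a limiter that enforces a minimum gap between requests. The `Throttle` class and its parameters are hypothetical, and the right interval would depend on whatever the admins consider acceptable.

```python
import time


class Throttle:
    """Block so that consecutive calls to wait() are at least
    `min_interval` seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the class can be tested without real waiting
        self._sleep = sleep
        self._last = None      # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

The scraper would call `throttle.wait()` right before each page fetch, e.g. `Throttle(min_interval=2.0)` for at most one request every two seconds.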
Change log:
Edit (September 19th, 2018): I'll be updating this thread (see under the bold headings) with new ideas as this thread progresses. Also, if anyone else wishes to contribute to my scripts (or even build their own one-offs targeting the ideas above), just let me know that you're working on it, and I'll mark it in the thread. While I agree that different scripts/algorithms would be harder to avoid/abuse, obviously I'd want all of the scripts to be developed in a timely manner, so duplicating work probably isn't a good idea at this moment.
Edit (September 20th, 2018): Adding trust/merit abuse section - automatic detection of users abusing the trust/merit system.