Topic: "Multiple Accounts" / Copy-pasta detection scripts/bots (Read 881 times)

hero member
Activity: 1540
Merit: 759
Not quite like that, but that sounds useful too for giving some more insight into the merited post.

I meant running different kinds of automated checks on merited posts written by low-rank members since, as we can see, those have a higher chance of being faked.
Running a few different automated techniques (even if they take several seconds per post) on a few thousand messages is going to be easier than doing the same on every single unmerited post.

So more or less targeting newbie/low-ranking members with 1+ merit.

We could add a script that does the above and then looks at the top senders of merit to newbies for abuse, although there may be legitimate senders within that list.
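A rough Python sketch of the "top senders of merit to newbies" idea. The record layout (sender, receiver, amount), the rank lookup and the set of ranks treated as "low" are all assumptions made for illustration, not the forum's actual data format.

```python
from collections import Counter

# Assumption: which ranks count as "low" is a placeholder choice.
LOW_RANKS = {"newbie", "jr. member"}


def top_senders_to_low_ranks(merit_txs, ranks, limit=10):
    """Total the merit each sender gave to low-ranked receivers.

    merit_txs: iterable of (sender, receiver, amount) tuples (hypothetical layout)
    ranks:     dict mapping username -> rank string (hypothetical lookup)
    """
    totals = Counter()
    for sender, receiver, amount in merit_txs:
        if ranks.get(receiver, "").lower() in LOW_RANKS:
            totals[sender] += amount
    # Highest totals first; this is a list to review by hand, not proof of abuse.
    return totals.most_common(limit)
```

The output is only a shortlist for manual review, since legitimate senders can end up on it too.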
hero member
Activity: 784
Merit: 1416
Not quite like that, but that sounds useful too for giving some more insight into the merited post.

I meant running different kinds of automated checks on merited posts written by low-rank members since, as we can see, those have a higher chance of being faked.
Running a few different automated techniques (even if they take several seconds per post) on a few thousand messages is going to be easier than doing the same on every single unmerited post.
hero member
Activity: 1540
Merit: 759
Another thought I had about plagiarism.

As far as I can see, the main goal of faking content is just to obtain merits, so wouldn't it save a lot of time and resources to check only the messages that receive merits, say every Friday?
If you also remove messages from higher ranks, who are unlikely to risk their accounts, the total number to be checked drops to a very small fraction.
That is more manageable and makes it possible to run deeper, lengthier methods to verify that the content is authentic.

So like contrasting the # of merits against the quality of post, generating a list of users to be looked into? Excluding merits sent/received by HQ members?

Let me know if I got that right.

It's not a bad idea. Would still require some manual labor & not be completely automated. I'll throw it on the list if it's cool with you?
hero member
Activity: 784
Merit: 1416
Another thought I had about plagiarism.

As far as I can see, the main goal of faking content is just to obtain merits, so wouldn't it save a lot of time and resources to check only the messages that receive merits, say every Friday?
If you also remove messages from higher ranks, who are unlikely to risk their accounts, the total number to be checked drops to a very small fraction.
That is more manageable and makes it possible to run deeper, lengthier methods to verify that the content is authentic.
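To make the weekly filter concrete, here is a minimal Python sketch. The post dictionaries, their field names and the set of ranks considered "high" are assumptions for illustration only.

```python
from datetime import datetime, timedelta

# Assumption: ranks treated as unlikely to risk their account.
HIGH_RANKS = {"hero member", "legendary"}


def posts_to_check(merited_posts, now, days=7):
    """Keep posts merited within the last `days` days by lower-ranked authors.

    merited_posts: list of dicts with hypothetical keys
                   "id", "rank" and "merited_at" (a datetime).
    """
    cutoff = now - timedelta(days=days)
    return [
        post
        for post in merited_posts
        if post["merited_at"] >= cutoff and post["rank"].lower() not in HIGH_RANKS
    ]
```

Whatever survives this filter is the small set that the slower, deeper checks would run on.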
hero member
Activity: 1540
Merit: 759
I'd just have to write a side-script to prevent users from just wrapping their messages in ["quote"] tags.
Quoted text isn't counted for payment for signature spammers, so they're unlikely to hide their plagiarism that way.

Never knew that. But I can’t see why someone would copy-paste. Good to know it’s checked. At first glance BTT looked complicated to me, but I now understand it’s complicated for a lot of reasons.

Copy/pasting is rampant on this forum. For bounty sig scammers it's the easiest way to get a high post count in a short amount of time while looking like you're spending the time to write out a post.

Because most campaign managers have to manage many participants, plagiarism (copy/pasting) can get overlooked.

TBH, by building these scripts, campaign managers should have an easier time (in theory).

Update: added ideas sent from a user in PM: account quality detection

Update 2: adding idea for detecting trust abuse
jr. member
Activity: 448
Merit: 3
I'd just have to write a side-script to prevent users from just wrapping their messages in ["quote"] tags.
Quoted text isn't counted for payment for signature spammers, so they're unlikely to hide their plagiarism that way.

Never knew that. But I can’t see why someone would copy-paste. Good to know it’s checked. At first glance BTT looked complicated to me, but I now understand it’s complicated for a lot of reasons.
hero member
Activity: 1540
Merit: 759
Good idea, I think we should report the whole bounty board as plagiarism :)

Code:
foreach ($forum_categories as $category_name => $category_values) {
    if ($category_name == 'Bounties (Altcoins)') {
        foreach ($category_values['posts'] as $post_id => $post_content) {
            BitcoinTalkAPI::report($post_id);
        }
    }
}

Well that's done ;) (was gonna write it in Python, but I've been coding with PHP all day so)

A lot of users actively do bounty work, and the majority of bounty posts are almost the same. On top of that, the script will detect a similarity percentage, and bounty posts are approximately 60-70% similar to each other. So how would this script handle that?

TBH, that's kind of the point. We'd have to determine a percentage of similarity that we agree is "report-worthy"; but I wouldn't be surprised if these scripts report a large number of bounty users.

I'd just have to write a side-script to prevent users from just wrapping their messages in ["quote"] tags.
Quoted text isn't counted for payment for signature spammers, so they're unlikely to hide their plagiarism that way.

I guess, is that a standardized thing among all campaign managers though? I'm guessing eventually it would become rather obvious to them though. Definitely not a top priority script if required.
legendary
Activity: 3654
Merit: 8909
https://bpip.org
If it is using synonyms, it becomes quite complicated: you need to be able to identify that two different words actually mean the same thing. Perhaps as you read the sentences you should substitute each word with a code which corresponds to a set of synonyms, then use these cleaned sentences to run the checks.
Maybe there are dictionaries ready for this sort of thing. In any case, comparing one message with all the previous messages, perhaps running multiple checks, can be quite expensive to perform.

There are dictionaries and other methods to deal with synonyms but they don't work well for crypto-themed texts without a serious ML effort. Worse yet, Bitcointalk text spinning bots don't really care much if the text makes sense so they'll replace "cryptocurrency" with "financial encoding" or some bullshit like that. Semantic comparison seemed quite useless to me so far in this context though I'm not an expert by any means - just learning as I go.

A lot of users actively do bounty work, and the majority of bounty posts are almost the same. On top of that, the script will detect a similarity percentage, and bounty posts are approximately 60-70% similar to each other. So how would this script handle that?

Good idea, I think we should report the whole bounty board as plagiarism :)
copper member
Activity: 728
Merit: 250
A lot of users actively do bounty work, and the majority of bounty posts are almost the same. On top of that, the script will detect a similarity percentage, and bounty posts are approximately 60-70% similar to each other. So how would this script handle that?
hero member
Activity: 784
Merit: 1416
A few thoughts about spun texts:

If the spun text is not using synonyms, it may help to prepare the data before running any check, for example by reordering all the words of each sentence in alphabetical order.

If it is using synonyms, it becomes quite complicated: you need to be able to identify that two different words actually mean the same thing. Perhaps as you read the sentences you should substitute each word with a code which corresponds to a set of synonyms, then use these cleaned sentences to run the checks.
Maybe there are dictionaries ready for this sort of thing. In any case, comparing one message with all the previous messages, perhaps running multiple checks, can be quite expensive to perform.
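Both preparation steps can be sketched in a few lines of Python. The synonym table below is a tiny made-up example; a real one would have to come from a dictionary or thesaurus, which this sketch does not attempt.

```python
import re

# Toy synonym groups, invented for illustration only.
SYNONYM_GROUPS = [
    {"cryptocurrency", "crypto", "coin"},
    {"great", "good", "nice"},
]
# Map every word in a group to one shared code such as "SYN0".
CANON = {
    word: f"SYN{index}"
    for index, group in enumerate(SYNONYM_GROUPS)
    for word in group
}


def normalize(text):
    """Lowercase, replace known synonyms with their group code, sort the words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return sorted(CANON.get(word, word) for word in words)
```

With this table, "This coin is great" and "Great crypto, this is" normalize to the same word list, so reordering and simple synonym swaps no longer hide the match. Sorting discards word order, so unrelated sentences built from the same words would also collide.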
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
I'd just have to write a side-script to prevent users from just wrapping their messages in ["quote"] tags.
Quoted text isn't counted for payment for signature spammers*, so they're unlikely to hide their plagiarism that way.

* Assuming the campaign has a campaign manager that does at least some of his job.
hero member
Activity: 1540
Merit: 759
The difficulty will be to find sources to match against (unsure if scraping Google will be permitted, we'll see).

Google has a search API. Not sure if there is a free tier though.

Considering their pricing change on maps, I'm going to assume not. I'll look into it though, thanks :)

@LoyceV: I'll make a note that if comparing messages for plagiarism, we should probably be ignoring ["quote"] tags within our scripts. I know it would probably make plagiarism detection more reliable, I'd just have to write a side-script to prevent users from just wrapping their messages in ["quote"] tags.
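A minimal Python sketch of dropping quoted text before comparing posts. It assumes the post is available as BBCode with [quote ...]...[/quote] blocks; scraped HTML would need a different approach.

```python
import re

# Matches a quote block that contains no further opening [quote tag,
# i.e. an innermost quote. Nested quotes are peeled off layer by layer.
INNERMOST_QUOTE = re.compile(
    r"\[quote[^\]]*\](?:(?!\[quote).)*?\[/quote\]",
    re.IGNORECASE | re.DOTALL,
)


def strip_quotes(bbcode):
    """Remove all (possibly nested) quote blocks and return the remaining text."""
    previous = None
    while previous != bbcode:
        previous = bbcode
        bbcode = INNERMOST_QUOTE.sub("", bbcode)
    return bbcode.strip()
```

A post that comes back empty was nothing but a quote. What is left can go to the plagiarism check, and wrapping a whole message in quote tags can be handled separately.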
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Tinker a little with the number of words and the threshold for detection of duplicates, and you're probably almost there for a large share of the copy-pasta spam.
I'm more worried about the very high number of positive results. Let me play around a bit with yesterday's data, from post 45850092 up to post 45893434. My scraper caught 43184 out of 43343 posts (it misses some burst posts). This is after the new Merit requirements, so there's less spam already.

I'll show the 50 most used posts (raw HTML excluding quotes; the number at the start of each line shows how often they appear). Those posts are exactly the same each time they were posted:
Code:
   288 (post was empty or only a quote)
    162 Do you have a telegram channel?
     91 Proof of Authentication:
Joined Telegram Campaign
     45 Bump
     25 bump
     24 microguy talks to himself just like he trades himself just like he lol himself Grin Grin Grin

sounds like the shytcoin is showing its age like microguy is 🤔

sounds like a igotspots shytcoin scam checkpoint dysfunction  still better than btc right

Whats in your wallet

https://imgur.com/rPLBZVM
     23

For a more general context on our seed round, and the reasons for this funding round please read our /i3ufCd]medium article

     20
Hello Everyone, GOeureka are live now with Bounty Campaign.
 Please follow given link to participate

     19 hi
i noticed you deleted you telegram account recently
why?
i am still waiting the letter and when it arrives how can i contact you?
please contact me at @AmbrogioOrfeu on telegram
     18 IMPORTANT ANNOUNCEMENTS ABOUT INBOT FUTURE :

1. Our revenue for first 6 months was more than whole 2017!
2. We are hiring Partner Managers and Business Operations Managers.
3. We are moving InToken from Ethereum to Stellar blockchain.
4. We will list InToken without an ICO.
     17 hello everyone
here im talking about a new cryptocurrency which is THUNDERSTAKE (TSC) .TSC PoS staking rewards: 900% APR fixed, every block number dividable by 10 is a superblock with double APR (1800 %) .
we have made products with TSC logo which you can buy from our website with TSC coin as payment.TSC is live on CMC and 5 exchanges, Cyptobridge,mercatox,Stokes.exchange, bitrex and escodex .
here is our website link https://thunderstake.com and discord link :   https://discord.gg/wmu9Zcx you can get everything from here have a look
     16 Up
     16 Proof of Authentication:
Joined Telegram Campaign

     15 up
     14 week #1
Reddit Campaign
Reddit name:
Reddit user Url:
Like any post on Subreddit (list with links to post):
1.

     14 #proof:
Twitter username:@cryptonerdd
Telegram username:@cryptonerdd
ERC20 address:0x51494b94939D2C8353d069206887687C40eD92B9

     13 microguy talks to himself just like he trades himself just like he lol himself Grin Grin Grin

sounds like the shytcoin is showing its age like microguy is 🤔

sounds like a igotspots shytcoin scam checkpoint dysfunction  still better than btc right

Whats in your wallet

https://imgur.com/rPLBZVM
     12 Bitcointalk username: aloha0001
Forum rank: member
Posts count:  255
ETH address: 0x04ddhA7Bb8b08af5E6866C1efc3rehe54a2859E6

     12
Update
     11 reserved
     11 Twitter

Retweets
1.https://mobile.twitter.com/MaestroProject1/status/1003536243370545152
2.https://mobile.twitter.com/MaestroProject1/status/1003824945430843393
3.https://mobile.twitter.com/MaestroProject1/status/1004547290063765508
4.
5.

Tweets
1.https://mobile.twitter.com/amanda_septiasa/status/1003681341915869184
2.

     10 Week #1
Facebook

Shares + Likes

1. https://www.facebook.com/amro.trikid/posts/10212205125466225
2. https://www.facebook.com/amro.trikid/posts/10212210015988485
3. https://www.facebook.com/amro.trikid/posts/10212219066614745
4. https://www.facebook.com/amro.trikid/posts/10212228785097701
5. https://www.facebook.com/amro.trikid/posts/10212228787657765

     10 WEEK#1
Facebook Campaign
Facebook Link: https://facebook.com/deerey.area
Friends: 1100

Post:

Shared:

     10 Twitter Campaign     
Twitter user Url:   https://twitter.com/4LUtr1qGRLB   
Repost and Like any post on Twitter (list with links):     
https://twitter.com/bitflipcc/status/10101578403

     10 Bitcointalk account URL :
TELEGRAM username: @zlo2323
language: Korean
Rank: Jr.Member
Eth address: 0xaE0304fd2b399c790170aA6Ea6A1d6E78713f96

     10
test
     10

     10 #PROOF OF AUTHENTICATION POST
Joined Twitter Campaign
Bitcointalk Username: Dollar1980
Telegram Username: @TahsibGhurair
Twitter Username: @Tahsib_Ghurair
Twitter Account Url: https://twitter.com/Tahsib_Ghurair

      9 Native language: Russian                                                                 
Bitcointalk username: Sabergas1w7                                                                   
Profile link: https://bitcointalk.org/index.php?action=profile;u=161465763                                                                     
Part of the bounty you apply for: ANN                                                               
Experience: NO                                                                 
Telegram: https://t.me/Sadbis1g7                                                                 
Email: [email protected]                                                               
Ethereum address: 0x91D8f2e4hjdEC122568f4c2cd5D14a362glk561F                                                               
Please PM me if you accept.

      9 #Proof of Authentication

Campaign : Telegram & Twitter
Bitcointalk Username: notnotok
Telegram Username : @khalidalbudoor
Twitter Account Link: https://twitter.com/khalidalbudoor7
Twitter Username: @khalidalbudoor7

      9 #PROOF OF AUTHENTICATION POST
Joined Twitter Campaign
Bitcointalk Username: ExcellentOffer86
Twitter Account Url: https://twitter.com/Saeed_Imtiaz1
Telegram Username: @Saeed_Imtiaz1

      8 Week #1
Twitter

Retweets
1. https://twitter.com/MaestroProject1/status/998832211800412160
2. https://twitter.com/MaestroProject1/status/998839809895350272
3. https://twitter.com/MaestroProject1/status/999005931881906176
4. https://twitter.com/MaestroProject1/status/999036079238868992
5. https://twitter.com/MaestroProject1/status/999043596345950208

Tweets
1. https://twitter.com/hellofancydei/status/1004721044937105413
2. https://twitter.com/hellofancydei/status/1004721411192049667 
      8 Facebook
Week #1

Twitter Profile Link: https://twitter.com/CREoday_ru
Like and Retweet:
1. https://twitter.com/medXe1/status/961630808724459520
2. https://twitter.com/medXe1/status/962393102601412608
3. https://twitter.com/medXe1/status/962767627113455616
4. https://twitter.com/medXe1/status/962768328770146309
5. https://twitter.com/medXe1/status/975583417281712128

Facebook Profile Link: https://www.facebook.com/ar.amur.ru
Like and Share:
1. https://www.facebook.com/ar.amur.ru/posts/597475630588609
2. https://www.facebook.com/ar.amur.ru/posts/597613640574808
3. https://www.facebook.com/ar.amur.ru/posts/597994343870071
4. https://www.facebook.com/ar.amur.ru/posts/598519390484233
5. https://www.facebook.com/ar.amur.ru/posts/599002237102615

      8 https://i.imgur.com/QBgno2y.png

We invite you to bring your project to Altmarkets.cc,


Add your coin to our exchange by requesting Here


(OPTIONAL) Join us on Discord to speak directly to us about your listing request : https://discord.gg/ZhQzy5f

Our Fees - https://altmarkets.cc/fees
Listing Policy: https://altmarkets.cc/add_coin
      7 week 1

Tweet link :
1.
2.
3.

Retweet link :
1. https://twitter.com/MaestroProject1/status/10016030348670208
2.
3.
4.
5.

LIke & share link :
1. https://web.facebook.com/coinhunt1/posts/28284343478955285
2.
3.
4.
5.

      7 Proof of joined post
Campaign in which you participate: Linkedin campaign
ETH address: 0x02Aft679fd80E9dD51cac1dc5se45f42578fhj64

      7 I want to reserve a signature campaign.
BitcoinTalk name: jordarheje89
BitcoinTalk profile link: https://bitcointalk.org/index.php?action=profile;u=1866560678;sa=summary
Eth Address: 0xCd332c24rhehBfa3A9d658D2F33Aheh2eF5689

      7 Bump.
      7
RainCheck | Update
      7 +12000 subcribers on Telegram
Come and chat with the Team
https://t.me/brodweyrealteam

      7 #proof:
Twitter username:@cryptonerdd
Telegram username:@cryptonerdd
ERC20 address:0x51494b94939D2C8353d069206887687C40eD92B9
      7 #Proof of Authentication Post Link

Twitter Campaign
Twitter Account : https://twitter.com/DarinaBovsiktak
Facebook Campaign
Facebook: https://www.facebook.com/DorianTopz
      7 #PROOF OF AUTHENTICATION POST
Joined Twitter Campaign
Bitcointalk Username: ExcellentOffer86
Twitter Username: @Saeed_Imtiaz1
Twitter Account Url: https://twitter.com/Saeed_Imtiaz1
Telegram Username: @Saeed_Imtiaz1

      7 ##PROOF OF AUTHENTICATION##
Bitcointalk Username: trishaanywhite


Joined Campaigns: Twitter
Twitter User Name: trishaanywhite
Twitter Account Url  : https://twitter.com/trishaanywhite


Joined Campaigns: Telegram
Telegram user Name: @trishaany
Telegram Url: https://t.me/trishaany


      6 TRANSLATION IN INDONESIAN
Bitcointalk username: adelaisav
Native language: indonesia
Email: [email protected]
Telegram: @filarisdianto
Part of bounty you apply for : ALL
Translation/moderation experience: https://docs.google.com/spreadsheets/d/1Ltym_vuCnAvpGD7F7KnldJtm7wYP8S3sdZ7pdRaK8Jg/htmlview
ETH address: 0xb02518F08daeb2Ef11a50edB152C59507D0EB2F5
Pm me if you need sir
      6 Reserve
      6 Project looks great but there are tons of projects like this and my question is, how can you be a bit defirrent than other payment system?
      6 IMPORTANT ANNOUNCEMENTS ABOUT INBOT FUTURE :

1. Our revenue for first 6 months was more than whole 2017!
2. We are hiring Partner Managers and Business Operations Managers.
3. We are moving InToken from Ethereum to Stellar blockchain.
4. We will list InToken without an ICO.

      6 Hi dev,
I'm writing to you with an offer of listing at one of the major masternodes monitoring website - http://masternodes.plus (MasterNodesPlus).
You have been selected and approved for listing as recommended masternode coin.
To be listed at the website, you can use one of the three offers:

Normal listing-up to 24 hours: 0.1BTC
Listing an ICO (coin not available on any exchange) up to 6 hours: 0,3BTC

You can make your request for the lisitng here:
https://masternodes.plus/contact.html


Regards,
Timothy James-Quill
“MNP”

      6 A request to prospective clients, please post a message on the forum thread first to keep the thread alive and then make a contact using above mentioned contacts for prompt response.

--------------------------------------------------------------
For users in China/Hong, they can also contact via QQ.

QQ: 256447418
The first line is my own description. It's mainly caused by bounty spammers: they quote their own old post, then edit it to add their latest bounty report spam. My scraper catches the posts before they're edited.

This doesn't really catch plagiarism, but it catches spam. When you're looking for word phrases to detect plagiarism, you're likely to get even more hits than this.

The second entry came from Cidonar, who bumped this thread 162 times. That board shouldn't allow deleting posts within 24 hours, but it does.
The user isn't banned, as he deleted the evidence.

The third entry ("Proof of Authentication") came from many different users in this thread. I've just reported a few asking to check the thread.

The sixth entry ("microguy talks to himself") came from BitCoin ranger, who had 24 posts deleted by moderators.

Manually going through this list is a lot of work, while there aren't many posts to report. It's not very effective to do.
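A tally like the one above takes only a few lines of Python. This sketch assumes the post bodies are already scraped into a list of strings with quotes removed; stripping surrounding whitespace is an extra choice that the list above may not have made.

```python
from collections import Counter


def most_repeated(posts, top=50):
    """Return (count, body) pairs for bodies posted more than once, most frequent first."""
    counts = Counter(body.strip() for body in posts)
    return [(count, body) for body, count in counts.most_common(top) if count > 1]
```

Matching is exact, so "Bump" and "bump" are counted separately, just as in the list above.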
legendary
Activity: 3654
Merit: 8909
https://bpip.org
Tinker a little with the number of words and the threshold for detection of duplicates, and you're probably almost there for a large share of the copy-pasta spam.

I experimented with n-grams a little bit and couldn't find a good value. Low n yields too many false positives, high n doesn't detect spinners, etc. So I'm using a mixture of algorithms and base the decision on the pattern of the results of those algorithms - e.g. if the similarity of two texts using algorithm A is 70%, then union/intersect/otherwise manipulate the texts, run algorithm B, if it scores 90% then run algorithm C to eliminate false positives - made up numbers but you get the idea. Works ok-ish, but as I mentioned it doesn't scale well and I need to do more testing on larger samples.
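The staged idea can be illustrated with a two-step Python sketch. The algorithms chosen here (word-set Jaccard overlap, then an order-aware ratio from the standard library's difflib) and both cutoffs are placeholders, not the actual mixture described above.

```python
from difflib import SequenceMatcher


def jaccard(a, b):
    """Share of distinct words the two texts have in common (ignores order)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def sequence_ratio(a, b):
    """Order-aware similarity of the two word sequences, between 0 and 1."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def looks_copied(a, b, cheap_cutoff=0.7, strict_cutoff=0.9):
    # Stage 1: cheap filter; most unrelated pairs stop here.
    if jaccard(a, b) < cheap_cutoff:
        return False
    # Stage 2: slower check that also respects word order,
    # to weed out false positives from stage 1.
    return sequence_ratio(a, b) >= strict_cutoff
```

Running the cheap test first keeps the expensive one off most pairs, though comparing every post against every other post still scales poorly.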

The difficulty will be to find sources to match against (unsure if scraping Google will be permitted, we'll see).

Google has a search API. Not sure if there is a free tier though.
hero member
Activity: 1540
Merit: 759
I don't know how it works, but I think there is a bot on Steemit, "@cheetah", that detects plagiarism, so developing a similar bot won't be a problem (there are many senior developers in this forum).

It will be great if you succeed in writing a script that detects members sending Merits to each other.

I don't think it is going to be hard to code such a script, but you will need access to the Merit database.

There's plenty of paid APIs to support plagiarism detection externally, so if I was lazy and rich I'd use those lol. Although, I'm uncertain of their reliability.

But realistically, external plagiarism detection isn't super difficult; although it may be more difficult than internal detection. I won't go too far into details (hashing, storage methods, etc), but essentially you're taking the copy of the text (or portions of it) & matching it against search engine results / meta descriptions.
I'm sure there's plenty of other methods as well.

The difficulty will be to find sources to match against (unsure if scraping Google will be permitted, we'll see).

Point is though: if 3 different developers develop it 3 different ways (using different sources) it will be far more difficult for bots/spammers to reverse engineer/abuse.

If you're working on plagiarism detection already, I'll probably work on multiple account detection first. Granted, multiple bots running from different developers with different sets of algorithms probably isn't a bad idea (will make it harder for bots to avoid)

I think we can certainly run multiple attacks on plagiarism as long as we coordinate to reduce overlap in which users we've reported etc, e.g. using the thread I mentioned and also https://bpip.org to check for bans.

With the little time I have available I'm still probably weeks away from a reasonably usable product and even then it would cover only a relatively small set of potential plagiarism. LoyceV mentioned that the forum gets ~50k posts a day - many of which can be ignored or whitelisted but still that's a lot of garbage to sift through.


Maybe we can create some sort of central location for defining which users have been reported by bots.
If I have time, maybe I'll create something web-based, and just give out API keys to users who can prove they have an operating script.

Would just sort of be a web-based platform to set which users are reported by scripts/bots, and then it would track if those users actually have a ban through the use of BPIP (If Vod permits)

Dumping the info into a thread probably isn't ideal, but worst comes to worst we can rely on that until a more advanced system is produced.

If it helps you guys to know about declared alts, here are mine.

Talk Merit
JetAid



Thanks Jet Cash, if I do implement an alt detection system, I'd make the reporting of users more manual than automated.
I'm sure there's many users (such as yourself) who have alts for various reasons and aren't being nefarious and don't deserve a report.

If anyone has any further ideas for methods, keep em comin' :)
qwk
donator
Activity: 3542
Merit: 3411
Shitcoin Minimalist
Detecting the text spinners will be a whole different level!
I guess a quick and dirty approach could be something like this:
1. take samples of all occurrences of 4 consecutive words
2. create their md5 (or whatever you prefer) hashes
3. store those hashes in a database
4. count number of hash collisions with other posts

So, a simple text like:
The quick brown fox jumps over the lazy dog

would result in 6 individual hashes:
The quick brown fox
quick brown fox jumps
brown fox jumps over
fox jumps over the
jumps over the lazy
over the lazy dog

Tinker a little with the number of words and the threshold for detection of duplicates, and you're probably almost there for a large share of the copy-pasta spam.
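The four steps above can be sketched in Python as follows. An in-memory dictionary stands in for the database, and lowercasing plus punctuation stripping are extra assumptions that the steps above do not spell out.

```python
import hashlib
import re
from collections import defaultdict


def shingle_hashes(text, n=4):
    """MD5 hashes of every run of n consecutive words in the text."""
    words = re.findall(r"\w+", text.lower())
    return {
        hashlib.md5(" ".join(words[i:i + n]).encode("utf-8")).hexdigest()
        for i in range(len(words) - n + 1)
    }


def count_collisions(posts, n=4):
    """posts: dict of post_id -> text. Returns {(id_a, id_b): number of shared hashes}."""
    index = defaultdict(set)  # hash -> ids of posts containing it ("the database")
    for post_id, text in posts.items():
        for digest in shingle_hashes(text, n):
            index[digest].add(post_id)
    shared = defaultdict(int)
    for ids in index.values():
        ordered = sorted(ids)
        for position, first in enumerate(ordered):
            for second in ordered[position + 1:]:
                shared[(first, second)] += 1
    return dict(shared)
```

For the fox sentence, `shingle_hashes` returns the 6 hashes listed above. `n` and the number of shared hashes that counts as a duplicate are the two knobs to tinker with.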
member
Activity: 518
Merit: 21
We already have several tools for this purpose, you can see one here done by @DdmrDdmr

Code:
https://public.tableau.com/profile/ddmrddmr#!/vizhome/BitcointalkMeritDashboard/GlobalSummary
This forum is full of enthusiastic people working together for the betterment of the forum. I do believe that it can be achieved with help from other members collaborating with each other. Collaboration will help get the job done more easily. If only I had this kind of expertise, I would definitely be more than willing to help you guys. Sad to say, I am only following along and taking down important details for future implementation and updates to this forum. GO! GO! GO!
legendary
Activity: 2366
Merit: 1512
#1 VIP Crypto Casino
It will be great if you succeed in writing a script that detects members sending Merits to each other.

I don't think it is going to be hard to code such a script, but you will need access to the Merit database.

We already have several tools for this purpose, you can see one here done by @DdmrDdmr

Code:
https://public.tableau.com/profile/ddmrddmr#!/vizhome/BitcointalkMeritDashboard/GlobalSummary
legendary
Activity: 2520
Merit: 2853
Top Crypto Casino
I don't know how it works, but I think there is a bot on Steemit, "@cheetah", that detects plagiarism, so developing a similar bot won't be a problem (there are many senior developers in this forum).

It will be great if you succeed in writing a script that detects members sending Merits to each other.

I don't think it is going to be hard to code such a script, but you will need access to the Merit database.
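Given merit records, spotting members who merit each other is a short exercise. This Python sketch assumes the data is available as (sender, receiver, amount) tuples, which is a made-up layout for illustration.

```python
from collections import defaultdict


def reciprocal_pairs(merit_txs, min_each_way=1):
    """Find pairs of users who have sent merit to each other.

    Returns sorted tuples (user_a, user_b, merit_a_to_b, merit_b_to_a).
    """
    sent = defaultdict(int)  # (sender, receiver) -> total merit sent
    for sender, receiver, amount in merit_txs:
        sent[(sender, receiver)] += amount
    pairs = []
    for (a, b), a_to_b in sent.items():
        b_to_a = sent.get((b, a), 0)
        # a < b reports each pair once instead of once per direction.
        if a < b and a_to_b >= min_each_way and b_to_a >= min_each_way:
            pairs.append((a, b, a_to_b, b_to_a))
    return sorted(pairs)
```

Mutual merit is not abuse by itself, so the result is best treated as a list of pairs to look at manually.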
legendary
Activity: 3654
Merit: 8909
https://bpip.org
If you're working on plagiarism detection already, I'll probably work on multiple account detection first. Granted, multiple bots running from different developers with different sets of algorithms probably isn't a bad idea (will make it harder for bots to avoid)

I think we can certainly run multiple attacks on plagiarism as long as we coordinate to reduce overlap in which users we've reported etc, e.g. using the thread I mentioned and also https://bpip.org to check for bans.

With the little time I have available I'm still probably weeks away from a reasonably usable product and even then it would cover only a relatively small set of potential plagiarism. LoyceV mentioned that the forum gets ~50k posts a day - many of which can be ignored or whitelisted but still that's a lot of garbage to sift through.
© 2020, Bitcointalksearch.org