
Topic: [Active] Finding spam and scams by keyword - page 2. (Read 3544 times)

legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
August 13, 2020, 10:42:47 AM
#1
The data
See loyce.club/badposts
All categories / spam / scam / other / advertising / email
Most recent posts are shown first.

Whitelisted users
Code:
LoyceV
LoyceMobile
TheBeardedBaby
Lafu
Rizzrack
Timelord2067
marlboroza
morvillz7z
NotATether
actmyname
nutildah
hosseinimr93
Mitchell
Bthd
light_warrior

Post keywords
Whitelisted users can post keywords. Please don't use very common words (such as "scam") that trigger too many false positives. And please keep all keywords in one code tag in one post: edit it to add/remove keywords.
Other users can also post, with 2 possible outcomes:
  • I whitelist you and process your keywords
  • I remove your post
Please don't quote code-tags.

Tips
Leave out "https" from scam links.

Format
Within code-tags, post either "scam:thiswebsiteisascam" or "spam:Ikeeppostingthesamecrapeverywhere". One phrase per line. See this example.
Remove the line to exclude the keyword in the next update.
Only use a space after "scam:" if you want to include a space in your search.
Keyword "other:" is meant for text that isn't necessarily spam or a scam, but needs highlighting nonetheless.
Keyword "advertising:" is meant for sites whose articles are often copied into posts to create a backlink.
Keyword categories:
  • spam:
  • scam:
  • other:
  • advertising:
  • email:
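To make the format rules concrete, here is a minimal sketch of how one keyword line could be split into category and phrase. The function name `parse_keyword_line` is hypothetical and this is not the actual badposts code. It splits on the first colon only (so URLs with their own colons survive) and keeps a space typed right after the colon, as described above.

```python
from typing import Optional, Tuple

# The five categories listed in the OP. The parser itself is a hypothetical sketch.
CATEGORIES = {"spam", "scam", "other", "advertising", "email"}


def parse_keyword_line(line: str) -> Optional[Tuple[str, str]]:
    """Split one 'category:phrase' line into (category, phrase).

    Returns None if the line has no known category prefix or an empty phrase.
    """
    category, sep, phrase = line.partition(":")  # split on the first colon only
    category = category.strip().lower()
    if not sep or category not in CATEGORIES:
        return None
    # Only the line ending is removed: a space typed right after the colon
    # stays, because it is part of the search phrase.
    phrase = phrase.rstrip("\r\n")
    if not phrase.strip():
        return None
    return category, phrase


print(parse_keyword_line("scam:thiswebsiteisascam"))  # ('scam', 'thiswebsiteisascam')
print(parse_keyword_line("other: moron"))             # ('other', ' moron')
print(parse_keyword_line("just a comment"))           # None
```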

Features
I search the last ~200,000 posts for new keywords. This covers approximately 1 month.
I update all keywords every 20 minutes.
I show all matches for all usernames and ranks. No exceptions. Not all of them are bad.
Whitelisted users are shown in green.

Limitations
There's a maximum of 50 keywords per post (any more than that are ignored). I can increase this if needed.
There's a maximum of 4000 matches per category. If there are more, the oldest are removed.
The minimum keyword length is 4 (for other) or 5 (for spam/scam) characters.
I also search quotes.
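The limits above could be enforced roughly like this. It's a sketch, not the real code: the numbers 50, 4000, 4 and 5 come from this post, but the minimum length for "advertising" and "email" is not stated, so `DEFAULT_MIN_LENGTH = 5` is an assumption, as is ignoring surrounding spaces when counting length. A `deque` with `maxlen` drops the oldest match automatically once a category is full.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

MAX_KEYWORDS_PER_POST = 50       # from the OP
MAX_MATCHES_PER_CATEGORY = 4000  # from the OP
MIN_LENGTH = {"other": 4, "spam": 5, "scam": 5}  # from the OP
DEFAULT_MIN_LENGTH = 5  # assumption: advertising/email are not specified in the OP


def accept_keywords(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep the first 50 (category, phrase) pairs of a post and drop phrases
    shorter than their category's minimum length."""
    accepted = []
    for category, phrase in pairs[:MAX_KEYWORDS_PER_POST]:  # extras are ignored
        # Assumption: surrounding spaces do not count towards the length.
        if len(phrase.strip()) >= MIN_LENGTH.get(category, DEFAULT_MIN_LENGTH):
            accepted.append((category, phrase))
    return accepted


# One bounded queue per category: appending to a full deque silently
# discards the oldest entry, which mirrors "the oldest are removed".
matches: Dict[str, Deque[int]] = {
    c: deque(maxlen=MAX_MATCHES_PER_CATEGORY)
    for c in ("spam", "scam", "other", "advertising", "email")
}


def record_match(category: str, post_id: int) -> None:
    matches[category].append(post_id)


print(accept_keywords([("scam", "HYIP"), ("scam", "MLM"), ("other", "moran")]))
# [('other', 'moran')]
```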

Report posts!
This list is only useful if someone actually reports the bad posts :)

Post removal
I want to keep this topic compact (so I can quickly scrape it many times). That means I'll delete almost all posts that don't contain a list from a whitelisted user. I would say "I hope I'm not offending anyone", but really, I'm okay with that :P



Q&A
What are you trying to accomplish with this thread?
See around here :)

Will this look into titles?
Some types of scams involve posting very little content in the OP body but then go on to include important keywords in the title.
For now: nope :(
I don't keep track of titles with this data.

This could be cross-checked with your list of banned accounts.
Thanks, I've added it.

Can I use it for searching for alts? Like links to twitter, facebook, telegram usernames, etc?
Can I use it for catching plagiarism, like searching for a whole sentence, or will this clog the server? Or maybe only phrases and not-so-common words, like we did in the SpamBuster club with suchmoon?
My plan was to only look back about 100k posts (currently just over 2 weeks), so it won't really help you here. But it would (near) instantly add new posts, and that's what I'm aiming for here.
Searching all my data without a database takes too long to do on a regular basis. You should try TryNinja's database though!

Would it catch spoofed URLs? www.bitcointalk.org
Without "www", the url turns into 9gag.com. I think theymos should give "www.bitcointalk.org" the same treatment.
I've added category "other" for things like "www.bitcointalk.org".

It shows every word that contains the keyword "moron" or "moran":
umoran
Is it supposed to work like this?
I search for the exact phrase (case-insensitive), so it also matches when the phrase is part of a longer word. You can add a space in front of it (as you did already), but that might miss some matches too. It's more or less intended: if I change this, it might overlook other matches.
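A minimal sketch of plain case-insensitive substring matching shows both effects from this answer. It assumes "exact phrase (case insensitive)" means a simple substring test; `matches_keyword` is a hypothetical name. The bare keyword finds "umoran", and a leading space filters that out but also misses the phrase at the very start of a post.

```python
def matches_keyword(post_text: str, keyword: str) -> bool:
    """Exact-phrase, case-insensitive search: the phrase may sit inside a longer word."""
    return keyword.lower() in post_text.lower()


print(matches_keyword("Look at that Umoran", "moran"))   # True: found inside "umoran"
print(matches_keyword("Look at that Umoran", " moran"))  # False: the leading space filters it out
print(matches_keyword("Moran, again?", " moran"))        # False: no space before it at the start
```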

Can we also include the reasoning behind why something is a scam? Perhaps a link to a thread that explains it?
It can be explained in the post in this topic. I don't want to add repetitive explanations to my badposts page.

"the eth pill stuff" has malware https://bitcointalksearch.org/topic/m.54876299
It would be great if you could add this to your earlier post :)




Please don't quote code-tags.
You should write that on the first row of the OP.
It's a tad higher now. Don't worry about overlooking it: I only added it today. I don't think it's much of a problem though: only the first code tag in each post is processed, and I now remove duplicate keywords to reduce search time.
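The two rules in this answer (only the first code tag per post counts, duplicates are dropped) could look like this. This is a hypothetical sketch: it assumes the scraper sees each post's raw BBCode with `[code]...[/code]` tags, which the thread doesn't confirm, and `collect_keywords` is an illustrative name.

```python
import re
from typing import Iterable, List

# Hypothetical: assumes the scraper sees each post's raw BBCode.
CODE_TAG = re.compile(r"\[code\](.*?)\[/code\]", re.IGNORECASE | re.DOTALL)


def collect_keywords(posts: Iterable[str]) -> List[str]:
    """Take only the first [code] block of each post and drop duplicate lines."""
    seen = {}  # dict keys keep insertion order and ignore repeats
    for post in posts:
        block = CODE_TAG.search(post)  # first code tag only; later ones are ignored
        if block is None:
            continue
        for line in block.group(1).splitlines():
            if line.strip():
                seen.setdefault(line, None)
    return list(seen)
```

Deduplicating across all posts (not just within one) is what saves search time: a keyword posted by two users is only searched once.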

Each 15 minute update takes about 1 second to process.
Each new keyword takes a few minutes to process, because reading all 200,000 posts is slow. Processing several new keywords at once is more efficient, so feel free to add them :)
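Why batching helps can be shown with a sketch that reads the post window once and tests every new keyword against each post, instead of one full pass per keyword. It assumes the cost is dominated by reading and lowercasing the ~200,000 posts; `scan_new_keywords` is a hypothetical name and post indexes stand in for real post IDs.

```python
from typing import Dict, List, Sequence


def scan_new_keywords(posts: Sequence[str], new_keywords: Sequence[str]) -> Dict[str, List[int]]:
    """Read the post window once and test every new keyword against each post.

    Returns, for each keyword, the indexes of the posts it matched."""
    keywords = list(dict.fromkeys(new_keywords))  # drop duplicate keywords
    lowered = [kw.lower() for kw in keywords]
    hits: Dict[str, List[int]] = {kw: [] for kw in keywords}
    for index, text in enumerate(posts):
        text = text.lower()  # lowercase each post once, not once per keyword
        for kw, low in zip(keywords, lowered):
            if low in text:
                hits[kw].append(index)
    return hits
```

Existing keywords only need to be checked against posts that arrived since the previous update, which is why the regular update is so much faster than adding a new keyword.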

spam:minepi.com/
scam:github.com/pillforeth/
I'm trying to improve searching for whole words only. I now remove the trailing slash ("/") from the keyword before searching. I don't think it matters for your keywords, but it can improve other strings.
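One way to combine "strip the trailing slash" with whole-word matching is a regular expression with lookarounds. This is a sketch of a swapped-in technique, not the method used here; `compile_keyword` is a hypothetical name. Lookarounds are used instead of `\b` because keywords can start or end with punctuation such as "." or "/".

```python
import re


def compile_keyword(keyword: str) -> "re.Pattern[str]":
    """Build a case-insensitive, whole-word pattern for one keyword."""
    kw = keyword.rstrip("/")  # "minepi.com/" becomes "minepi.com"
    # The keyword must not be glued to a letter, digit or underscore on
    # either side, even if it starts or ends with punctuation.
    return re.compile(r"(?<!\w)" + re.escape(kw) + r"(?!\w)", re.IGNORECASE)


print(bool(compile_keyword("moran").search("that umoran again")))        # False
print(bool(compile_keyword("minepi.com/").search("Join via minepi.com.")))  # True
```

Note the trade-off: with strict whole-word matching, "github.com/pillforeth" would no longer match "github.com/pillforethereum/..." links, so substring matching still has its uses.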

other: moran
other: moron
other: moron
Try without the spaces now :)

I've tried with a space in front and at the end, and it didn't find anything. I've also tried with only a space in front, and that didn't find anything either.
You were trying to adjust for my old search, right while I was adjusting it to improve matching complete words only.

scam:https://github.com/pillforethereum/ETHpillAN/
There are many of those altcoin pills nowadays.
You should probably omit the "https://"-part, a scammer can do the same.
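A short sketch of why dropping the scheme helps: with substring matching, a keyword that includes "https://" misses the same link posted as "http://" or without any scheme. `strip_scheme` is a hypothetical helper, and the three sample posts are made up for illustration.

```python
SCHEMES = ("https://", "http://")


def strip_scheme(keyword: str) -> str:
    """Remove a leading http:// or https:// so the keyword matches every variant."""
    for scheme in SCHEMES:
        if keyword.lower().startswith(scheme):
            return keyword[len(scheme):]
    return keyword


posts = [
    "download https://github.com/pillforethereum/ETHpillAN/",
    "download http://github.com/pillforethereum/ETHpillAN/",
    "download github.com/pillforethereum/ETHpillAN/",
]
full = "https://github.com/pillforethereum/ETHpillAN/"
print(sum(full in p for p in posts))                # 1
print(sum(strip_scheme(full) in p for p in posts))  # 3
```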

How far back does your search go?
See:
I search the last ~200,000 posts for new keywords. This covers approximately 1 month.
This search takes about 2 minutes for new keywords. It's mainly meant to catch new posts; older posts can be found through other means.

Any thoughts on why this post and user weren't caught today?
The post was edited, see the unedited post.
Unfortunately, I can't know which posts have been edited, so this is a loophole to escape my badposts list.

I think you should just make another link/html file that contains all cointelegraph spammers.
I did already, see loyce.club/badposts/advertising.html.

@Loyce, can you please remove one of these keywords:
github_com/ProjectEthereumPill
github_com/ProjectEthereumPill/EthereumPill/
Thanks, done:
The following overlapping keywords have been removed:
github.com/ProjectEthereumPill/EthereumPill
github.com/pillforethereum/ETHpillAN (because of github.com/pillforeth)
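Finding overlapping keywords like these can be automated. A minimal sketch, assuming case-insensitive substring matching: if a keyword contains another keyword, every post matching the longer one already matches the shorter one, so the longer one can be dropped. `remove_overlapping` is a hypothetical name; the pairwise check is quadratic, which is fine for a few hundred keywords.

```python
from typing import List


def remove_overlapping(keywords: List[str]) -> List[str]:
    """Drop keywords that contain another keyword (case-insensitively).

    Under substring matching, any post that matches the longer keyword
    already matches the shorter one, so the longer keyword adds nothing."""
    unique = list(dict.fromkeys(kw.lower() for kw in keywords))
    return [kw for kw in unique
            if not any(other != kw and other in kw for other in unique)]


print(remove_overlapping([
    "github.com/pillforeth",
    "github.com/pillforethereum/ETHpillAN",
    "github.com/ProjectEthereumPill",
    "github.com/ProjectEthereumPill/EthereumPill",
]))
# ['github.com/pillforeth', 'github.com/projectethereumpill']
```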

I also noticed the "Banned" notification is only useful for new keywords, because my banned list is only updated once a day. I ran a one-time update from scratch, searching the last ~800,000 posts, to update the banned status on older posts.

Hope I did it right.
HYIP and MLM are too short: the minimum keyword length is 5 characters for scams.

@LoyceV, did you delete the old info? The scam page only displays 22 archived posts.
I do have logs, but it's too many lines to search now. I think someone must have entered a keyword with many hits, then removed the keyword again. My "badposts" only shows the latest 4000 posts each time I update it, but when a keyword is removed, all those entries are removed too.
I've manually reset it to re-check all keywords in the last ~200k posts. This restored a longer list again.

Is ETC officially a scam? I see it among the blacklisted words as a scam.
That is debatable...
However, the "scam : ETChash" keyword just helps catch posts like this one that have malware download links.

I think the keyword "PhoenixMiner" is too general and gives a lot of false positives...



I made a new toy:
[Newbie scrutiny instead of jail] Every new user's first post: loyce.club/patrol:
See loyce.club/patrol/

Please Report (or Merit) the posts when needed ;)

It's updated once a minute.
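The thread doesn't say how loyce.club/patrol decides who is new. One hypothetical approach is to keep a set of usernames already seen and, on each run, report the first post from any user not in that set. `first_posts` and `known_users` are illustrative names, and the posts are assumed to arrive in chronological order.

```python
from typing import Iterable, Iterator, Set, Tuple


def first_posts(posts: Iterable[Tuple[int, str]], known_users: Set[str]) -> Iterator[Tuple[int, str]]:
    """Yield (post_id, username) for the first post of every user not seen before.

    posts must be in chronological order; known_users is updated in place,
    so the same set can be reused on the next run."""
    for post_id, user in posts:
        if user not in known_users:
            known_users.add(user)
            yield post_id, user


known = {"b"}
print(list(first_posts([(1, "a"), (2, "b"), (3, "a"), (4, "c")], known)))
# [(1, 'a'), (4, 'c')]
```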
