
Topic: Users that wiped all their posts. Or another project for LoyceV (Read 478 times)

legendary
Activity: 1582
Merit: 1059
nutildah-III / NFT2021-04-01
those guys' profiles look like they've never spammed in their life. Roll Eyes
BPIP has a wall of shame.

I had no idea. This is damned brilliant, mate Grin I think I've noticed some names I've been reporting over the last few months, which is quite funny. Tongue
legendary
Activity: 1638
Merit: 1329
Stultorum infinitus est numerus
I mean it would probably be easier to implement something similar to seclog, where it just tracks every edit. If a person edits one of their posts, it just gets posted there. If not, there can be a similar thing that just tracks "post" calls, and someone who's scraping can just monitor it instead.
Theymos is the only person who can implement this. I don't think LoyceV or any other user can provide such data unless all posts are tracked one by one.

That is what I meant. It is probably easier for theymos to code and create, and for all the patrollers and scripters to have access to it and use it to locate edits faster and more efficiently. You can scrape posts, but that's not going to scale well.
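A seclog-style edit feed, as suggested above, could be as simple as an append-only log that the forum writes to on every edit and that scrapers poll by sequence number. This is a minimal sketch under that assumption; the `EditEvent` fields and the `since()` polling method are hypothetical and are not an existing Bitcointalk feature or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class EditEvent:
    """One entry in a hypothetical public edit log."""
    seq: int        # monotonically increasing sequence number, starting at 1
    post_id: int
    user_id: int
    timestamp: int  # Unix time of the edit


@dataclass
class EditLog:
    """Append-only log: the forum appends, scrapers only read."""
    events: List[EditEvent] = field(default_factory=list)

    def record(self, post_id: int, user_id: int, timestamp: int) -> EditEvent:
        # The event with seq N is stored at index N - 1.
        event = EditEvent(len(self.events) + 1, post_id, user_id, timestamp)
        self.events.append(event)
        return event

    def since(self, last_seen_seq: int) -> List[EditEvent]:
        # A scraper remembers the last seq it processed and fetches only newer
        # events, so it rescrapes just the edited posts instead of every thread.
        return self.events[last_seen_seq:]


log = EditLog()
log.record(post_id=101, user_id=7, timestamp=1600000000)
log.record(post_id=202, user_id=8, timestamp=1600000060)
for event in log.since(1):
    print(event.post_id)  # prints 202
```

The point of the sequence number is that a scraper never has to re-read old entries; it only needs to store one integer between runs.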
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
those guys' profiles look like they've never spammed in their life. Roll Eyes
BPIP has a wall of shame.
legendary
Activity: 2380
Merit: 5213
I mean it would probably be easier to implement something similar to seclog, where it just tracks every edit. If a person edits one of their posts, it just gets posted there. If not, there can be a similar thing that just tracks "post" calls, and someone who's scraping can just monitor it instead.
Theymos is the only person who can implement this. I don't think LoyceV or any other user can provide such data unless all posts are tracked one by one.

I mentioned before that the sad thing about reporting and "cleaning up" someone's spammy mess is the fact that after everything has been reported and deleted, those guys' profiles look like they've never spammed in their life. Roll Eyes
That's why I had already suggested in another thread showing the number of recent posts deleted by moderators on profiles.
legendary
Activity: 1582
Merit: 1059
nutildah-III / NFT2021-04-01
I guess it's better when they delete the posts themselves. It means that we don't have to waste our time and energy reporting them, and the result is the same.

I mentioned before that the sad thing about reporting and "cleaning up" someone's spammy mess is the fact that after everything has been reported and deleted, those guys' profiles look like they've never spammed in their life. Roll Eyes
legendary
Activity: 1638
Merit: 1329
Stultorum infinitus est numerus
viable option

Well, it's better than constantly rescraping 50 million posts but you would still need to rescrape nearly 3 million user profiles.

Another shortcut would be to rescrape only users who post something new (going by the "recent" page) but that would miss a few if someone edits something and never posts again. Probably rare though.


I mean it would probably be easier to implement something similar to seclog, where it just tracks every edit. If a person edits one of their posts, it just gets posted there. If not, there can be a similar thing that just tracks "post" calls, and someone who's scraping can just monitor it instead.
legendary
Activity: 3654
Merit: 8909
https://bpip.org
viable option

Well, it's better than constantly rescraping 50 million posts but you would still need to rescrape nearly 3 million user profiles.

Another shortcut would be to rescrape only users who post something new (going by the "recent" page) but that would miss a few if someone edits something and never posts again. Probably rare though.
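The "recent" shortcut amounts to collecting the authors of every post seen on the recent-posts page since the previous pass and rescraping only their profiles. A minimal sketch, assuming the scraper has already reduced that page to hypothetical `(post_id, user_id)` pairs; as noted, a user who edits and never posts again is never selected.

```python
from typing import Iterable, Set, Tuple


def users_to_rescrape(recent_posts: Iterable[Tuple[int, int]], last_post_id: int) -> Set[int]:
    """Return IDs of users who posted after last_post_id.

    recent_posts: (post_id, user_id) pairs taken from the "recent" page
                  (hypothetical format, whatever the scraper extracts).
    last_post_id: highest post ID handled in the previous pass.
    """
    return {user_id for post_id, user_id in recent_posts if post_id > last_post_id}
```

Using the highest processed post ID as the checkpoint avoids relying on timestamps, since post IDs only grow.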
legendary
Activity: 1638
Merit: 1329
Stultorum infinitus est numerus
Unless there is a public database that logs every edit for every post, I don't think it's really possible (the forum does check whether a post has been edited, hence the last edit timestamp). Unless, of course, you literally archive every single post ever made on Bitcointalk. If anyone has several terabytes of storage lying around, that might actually be cool. If anyone codes a system like this, I might be able to provide some storage, though.

Average post length is ~1000 bytes. At 50 million posts it would be around 50GB, maybe up to 100GB if you want to go crazy with fancy text indexing. Not terabytes.

However, it is technically impossible to capture edits without continuously scraping the whole post history of every active user. AFAIK there is no public indication of an edit anywhere except the timestamp on a post inside a thread, so using that timestamp would mean rescraping every thread. Assuming that users have to log in to edit their posts, it would probably be easier to go by their profile "last active" timestamp and scrape only the post histories of active users. This could miss some moderator edits, though.

Determining which posts have been deleted is also impossible without massive rescraping. Post counts can't be relied upon due to some boards that don't count posts (SD/IT).

Although most of the time that's not really an issue. Usually a question about deleted (or edited) posts arises when there's a suspicion about a specific user and then that user can be checked e.g. against LoyceV's archive.

Aren't moderator edits rare anyway? I honestly think "Assuming that users have to log in to edit their posts, it would probably be easier to go by their profile "last active" timestamp and scrape only post histories of active users" is a very viable option. Assuming that every post is recorded, if a user edits a post (which changes their last activity), the system can check that user's latest posts (let's say the last 25) and compare them to the versions stored in the database beforehand. If there are any differences, that's it. However, this would probably need some sort of "edited post" call, since otherwise simply refreshing a page updates last activity too.
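The comparison step described above could look roughly like this: once a user is flagged, fetch their latest posts (25, as suggested) and compare each one against the archived copy. This is a sketch under stated assumptions: storing a SHA-256 hash of each post body instead of the full text is my choice to keep the stored state small, and `fetch_latest_posts` is a hypothetical stand-in for the actual scraper.

```python
import hashlib
from typing import Callable, Dict, List


def body_hash(body: str) -> str:
    """Fingerprint of a post body, so the archive can store hashes instead of text."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def find_edited_posts(
    stored: Dict[int, str],  # post_id -> body_hash of the archived version
    fetch_latest_posts: Callable[[int, int], Dict[int, str]],  # hypothetical: (user_id, n) -> {post_id: body}
    user_id: int,
    n: int = 25,
) -> List[int]:
    """Return IDs of the user's latest n posts whose body differs from the archive."""
    current = fetch_latest_posts(user_id, n)
    edited = []
    for post_id, body in current.items():
        old = stored.get(post_id)
        # Posts missing from the archive are new rather than edited, so skip them.
        if old is not None and old != body_hash(body):
            edited.append(post_id)
    return sorted(edited)
```

Only the latest n posts are checked, so an edit to an older post would still slip through unless n is raised for that user.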
legendary
Activity: 3654
Merit: 8909
https://bpip.org
Unless there is a public database that logs every edit for every post, I don't think it's really possible (the forum does check whether a post has been edited, hence the last edit timestamp). Unless, of course, you literally archive every single post ever made on Bitcointalk. If anyone has several terabytes of storage lying around, that might actually be cool. If anyone codes a system like this, I might be able to provide some storage, though.

Average post length is ~1000 bytes. At 50 million posts it would be around 50GB, maybe up to 100GB if you want to go crazy with fancy text indexing. Not terabytes.
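The storage estimate is plain multiplication; a quick check using the figures from the post (the 2x factor for text indexing corresponds to the "maybe up to 100GB" upper bound given there).

```python
AVG_POST_BYTES = 1_000     # ~1000 bytes per post (figure from the post)
POST_COUNT = 50_000_000    # ~50 million posts

raw_bytes = AVG_POST_BYTES * POST_COUNT
raw_gb = raw_bytes / 1e9       # decimal gigabytes
with_index_gb = raw_gb * 2     # upper bound with fancy text indexing

print(f"raw: {raw_gb:.0f} GB, with indexing: up to {with_index_gb:.0f} GB")
# prints "raw: 50 GB, with indexing: up to 100 GB"
```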

However, it is technically impossible to capture edits without continuously scraping the whole post history of every active user. AFAIK there is no public indication of an edit anywhere except the timestamp on a post inside a thread, so using that timestamp would mean rescraping every thread. Assuming that users have to log in to edit their posts, it would probably be easier to go by their profile "last active" timestamp and scrape only the post histories of active users. This could miss some moderator edits, though.
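Going by the profile "last active" timestamp, the users whose post histories need rescraping are those whose timestamp changed between two profile passes. A minimal sketch using hypothetical `{user_id: last_active}` snapshots; moderator edits don't change the author's last-active time, so they would still be missed.

```python
from typing import Dict, Set


def active_since_last_pass(previous: Dict[int, int], current: Dict[int, int]) -> Set[int]:
    """Users whose "last active" Unix timestamp changed between two profile snapshots.

    Users missing from `previous` (for example, newly registered ones) are included too.
    """
    return {uid for uid, ts in current.items() if previous.get(uid) != ts}
```

This still requires scraping every profile on each pass, but a profile page is much cheaper to fetch than a full post history.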

Determining which posts have been deleted is also impossible without massive rescraping. Post counts can't be relied upon due to some boards that don't count posts (SD/IT).

Although most of the time that's not really an issue. Usually a question about deleted (or edited) posts arises when there's a suspicion about a specific user and then that user can be checked e.g. against LoyceV's archive.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
As shown in that link, I have 3 versions: all posts (updated every minute), per topic (no automated updates yet due to lack of time for testing) and per user (also no automated updates yet). I'm currently running an update for both.
This is horribly inefficient. You should be able to scrape once, and run various queries for each of your three copies of what you are saving.
I only scrape the data once, but have separate processes for each "category".
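Deriving the per-topic and per-user views from a single scrape, as suggested in the quote, could look like this. It is only a sketch: the `Post` fields are assumptions, and this is not how LoyceV's own processes are written.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Post:
    post_id: int
    topic_id: int
    user_id: int
    body: str


def build_views(
    posts: Iterable[Post],
) -> Tuple[List[Post], Dict[int, List[Post]], Dict[int, List[Post]]]:
    """Split one scraped stream into three views: all posts, per topic, per user."""
    all_posts: List[Post] = []
    per_topic: Dict[int, List[Post]] = defaultdict(list)
    per_user: Dict[int, List[Post]] = defaultdict(list)
    for post in posts:
        # Each post is read once and referenced from all three views.
        all_posts.append(post)
        per_topic[post.topic_id].append(post)
        per_user[post.user_id].append(post)
    return all_posts, dict(per_topic), dict(per_user)
```

Because the views hold references to the same `Post` objects, the per-topic and per-user copies cost little extra memory in a single process; separate processes writing separate files, as described above, trade that for simpler independent updates.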

Quote
I believe I remember reading that you are not a programmer and don't know any programming languages, so I am curious who is helping you with your project.
Just me Smiley
copper member
Activity: 1666
Merit: 1901
Amazon Prime Member #7
Unless there is a public database that logs every edit for every post, I don't think it's really possible (the forum does check whether a post has been edited, hence the last edit timestamp). Unless, of course, you literally archive every single post ever made on Bitcointalk. If anyone has several terabytes of storage lying around, that might actually be cool. If anyone codes a system like this, I might be able to provide some storage, though.
You could log every post up to time x and scrape every account's profile, then move x forward and re-scrape the profiles whose scraped post count doesn't match their profile post count, until the number of posts on each profile matches the number of posts that have been scraped. This calibrates the number of posts each person should have against their actual posts.

After that, you can continuously scrape new posts, along with the posters' profiles, to confirm that each user's post count has increased by one for each new post they have made. If not, their post history can be scraped and checked against their existing posts in your DB. This subsequent scrape doesn't need to be saved; you only need to update your DB with which of their posts was deleted when you find it.
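The check described above boils down to two small steps: compare the profile's post count with what the archive expects, and on a mismatch, diff post IDs from a fresh post-history scrape against the archive. A minimal sketch with hypothetical inputs; as noted elsewhere in the thread, boards that don't count posts (SD/IT) make counts unreliable, so a mismatch should only trigger a rescrape, not count as proof of deletion.

```python
from typing import Iterable, Set


def needs_rescrape(previous_count: int, new_posts_seen: int, current_count: int) -> bool:
    """True if the profile post count didn't rise by exactly the number of new posts scraped."""
    return current_count != previous_count + new_posts_seen


def deleted_post_ids(archived_ids: Iterable[int], current_history_ids: Iterable[int]) -> Set[int]:
    """Post IDs present in the archive but missing from a fresh post-history scrape."""
    return set(archived_ids) - set(current_history_ids)
```

The same set difference answers "what was scraped vs. what is there" for a single user: whatever is in the archive but not in the live history has been removed.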


As shown in that link, I have 3 versions: all posts (updated every minute), per topic (no automated updates yet due to lack of time for testing) and per user (also no automated updates yet). I'm currently running an update for both.
This is horribly inefficient. You should be able to scrape once, and run various queries for each of your three copies of what you are saving.

I believe I remember reading that you are not a programmer and don't know any programming languages, so I am curious who is helping you with your project.
legendary
Activity: 2730
Merit: 7065
His profile shows that he has 35 posts, but when you check his latest posts, only 6 are still there. Those are the threads he once created, and he can't delete those. BPIP, on the other hand, shows he has 0 posts. Why the difference on BPIP?

Most of the content he posted in his OPs is still visible because it was quoted elsewhere in the threads.
I remember his name; he was very active in Development & Technical Discussion and Bitcoin Technical Support.

legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
As shown in that link, I have 3 versions: all posts (updated every minute), per topic (no automated updates yet due to lack of time for testing) and per user (also no automated updates yet). I'm currently running an update for both.

You'll have to check the post numbers to be sure, but I think my data is pretty complete for the past couple of months.

If you check this link, you should be able to access all the posts of that specific account. Loyce mentioned that his webhost ran out of space, which might be the reason there are gaps in that account's post history. Although you can't see whether a post has been edited or what it was changed to, you can see the original post.
In the "early days" of my scraper, I may have missed some posts, but I think the user just didn't post during those gaps (for instance from Oct 9 to Oct 16). It'll take a while to update; if he made any posts after Oct 25, they'll show up in a while. I think it takes around 12 hours to update.

Unfortunately it's not possible to detect deleted posts, so I can't highlight them without scraping everything again.
legendary
Activity: 3528
Merit: 7005
Top Crypto Casino
Don't know if it's possible (though I'm pretty sure LoyceV will prevail), but I'd be interested to see the results. There would probably be a lot of Newbie accounts that tried to scam and then tried to cover their tracks, but I'd like to see if there are any older members whose names I haven't seen in a long time. I think there were a couple of them a while back who were deleting most, if not all, of their posts.

Too bad there's no option on the forum to delete your account.  I'm not sure why that is.
legendary
Activity: 1638
Merit: 1329
Stultorum infinitus est numerus
Unless there is a public database that logs every edit for every post, I don't think it's really possible (the forum does check whether a post has been edited, hence the last edit timestamp). Unless, of course, you literally archive every single post ever made on Bitcointalk. If anyone has several terabytes of storage lying around, that might actually be cool. If anyone codes a system like this, I might be able to provide some storage, though.

Loyce is trying to:

https://bitcointalksearch.org/topic/viewing-unedited-posts-and-deleted-posts-view-per-post-per-user-or-per-topic-5167469

Which was why I was asking him about the scrapes.

-Dave

Oh. I did not know this. Looking into it, I see that Loyce already has a somewhat working system in place. If you check this link, you should be able to access all the posts of that specific account. Loyce mentioned that his webhost ran out of space, which might be the reason there are gaps in that account's post history. Although you can't see whether a post has been edited or what it was changed to, you can see the original post.
legendary
Activity: 3500
Merit: 6320
Crypto Swap Exchange
Unless there is a public database that logs every edit for every post, I don't think it's really possible (the forum does check whether a post has been edited, hence the last edit timestamp). Unless, of course, you literally archive every single post ever made on Bitcointalk. If anyone has several terabytes of storage lying around, that might actually be cool. If anyone codes a system like this, I might be able to provide some storage, though.

Loyce is trying to:

https://bitcointalksearch.org/topic/viewing-unedited-posts-and-deleted-posts-view-per-post-per-user-or-per-topic-5167469

Which was why I was asking him about the scrapes.

-Dave
legendary
Activity: 1638
Merit: 1329
Stultorum infinitus est numerus
Unless there is a public database that logs every edit for every post, I don't think it's really possible (the forum does check whether a post has been edited, hence the last edit timestamp). Unless, of course, you literally archive every single post ever made on Bitcointalk. If anyone has several terabytes of storage lying around, that might actually be cool. If anyone codes a system like this, I might be able to provide some storage, though.
sr. member
Activity: 2030
Merit: 356
So I noticed yesterday or the day before that some posts were missing in one of the threads I was in.
Did not think much about it, could have been the user, could have been a mod.

Today I noticed that another thread was shorter and it looks like the user deleted a bunch of their posts.

https://bitcointalksearch.org/user/hardwalletattacker1-2668562

So, is there a way to generate a list of what was scraped vs. what is there?

Not important, just curious.

-Dave

This user, HardwalletAttacker1, recently changed the password and deleted all of his previous posts. I think he is someone who bought this account and is trying to clear its previous post history.
However, the account has already been tagged for giving false technical explanations:
Quote
Do not trust this user's explanation of technical details. He prefers to spew nonsense rather than learn how things actually work.
legendary
Activity: 3500
Merit: 6320
Crypto Swap Exchange
So I noticed yesterday or the day before that some posts were missing in one of the threads I was in.
Did not think much about it, could have been the user, could have been a mod.

Today I noticed that another thread was shorter and it looks like the user deleted a bunch of their posts.

https://bitcointalksearch.org/user/hardwalletattacker1-2668562

So, is there a way to generate a list of what was scraped vs. what is there?

Not important, just curious.

-Dave