
Topic: LoyceV's Topic Details: highlight deleted and edited posts (forum wide) - page 3. (Read 2226 times)

legendary
Activity: 1722
Merit: 5937
Quote
Bump: I need testers :)

55 second early bump, that's how much I need testers
Hm, I don't know what I am doing wrong, but this thing doesn't work for me; all I get is a 404 error message.

I tried inserting topicID 5256136 ([ANN] DSF - The SoFi Blockchain - Redefine Social Network with Blockchain), but nothing. I tried waiting for a few minutes, as you said to wait a bit, but that didn't help either.

This is the link
http://loyce.club/archive/details/topic_5256136.html

edit: it works now :)
So I guess I had to wait for it to get processed/updated, as you said. I like this new feature a lot, it makes it so much easier to find deleted/edited posts.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Archiving a thread is now as easy as posting the right link anywhere on the forum (and waiting a bit)!

Short version:
Get a topicID you want to see, for instance 5145594.
Insert the topicID into the following link and post it on any public board on Bitcointalk: https://loyce.club/archive/details/topic_5145594.html
Wait a bit, then click the link!
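
The link in the short version follows a fixed pattern, so it can be built from the topicID. A minimal sketch: the function name details_url is hypothetical, only the URL pattern comes from this post.

```python
def details_url(topic_id: int) -> str:
    """Build the Topic Details link for a Bitcointalk topicID."""
    if topic_id <= 0:
        raise ValueError("topicID must be a positive integer")
    # URL pattern as given in the post; only the number changes.
    return f"https://loyce.club/archive/details/topic_{topic_id}.html"


print(details_url(5145594))
# https://loyce.club/archive/details/topic_5145594.html
```

Remember that the link only produces a result after it has been posted on a public board and picked up by the scraper.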

Full version
Almost a year ago, I opened the topic "35M posts! View unedited/deleted posts (search per post, per user or per topic)". It holds a lot of data (currently around 60 GB), but it's painstaking to manually find exactly which posts in a topic are edited or deleted.
I started archiving posts in July 2019. Especially at the beginning I missed some posts due to downtime, and even now I occasionally miss some posts due to connection problems.
Since February 2020, I'm scraping and archiving all older posts too (this will take a couple more months to complete). Update: this was completed in August 2020.

What it does
I've created an on-demand service to get details from any topic for:
  • All posts that I didn't archive yet*
  • All posts that have been edited*
  • All posts that have been deleted
  • All posts that received Merit (not implemented yet)
*I create a new archive of every edited or unarchived post.

The Topic Details
If a new update for the same topic is requested, I'll include a list of all previous Topic Details.
You'll have to make a new post to be detected by my scraper. Editing an existing post won't get detected.
Please don't quote the archive link, it'll trigger another update.
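
How the scraper actually spots a request is not described here, so the following is only a guess at the idea: look for the archive link in the text of a new post and take the first topicID found, since only one request per post is processed. The regex and the function name first_requested_topic are assumptions.

```python
import re
from typing import Optional

# Matches both http:// and https:// variants of the archive link.
REQUEST_RE = re.compile(r"https?://loyce\.club/archive/details/topic_(\d+)\.html")


def first_requested_topic(post_text: str) -> Optional[int]:
    """Return the topicID of the first archive link in a post, or None."""
    match = REQUEST_RE.search(post_text)
    return int(match.group(1)) if match else None


post = (
    "Please check http://loyce.club/archive/details/topic_5256136.html "
    "and https://loyce.club/archive/details/topic_5145594.html"
)
print(first_requested_topic(post))  # only the first link counts
```

This also shows why quoting an archive link triggers another update: the quoted text still contains a matching link in a new post.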

Sample output explained
[Image: sample output]
Post 50988796 links to the post on Bitcointalk (even if the post itself has been deleted).
Post 235 is older than my archive. I don't have an unedited backup, so I created a new backup.
Post 236 is Deleted! I have an unedited backup, no need for a new backup.
Post 242 was Edited! I have an unedited backup, and created a new backup of the current post.
Post 246 is Unedited! No need for a new backup.
Post 251 doesn't have an unedited backup (which means my scraper was offline at the moment), so I created a new backup.
The (link) at the end of each line points at that specific row in my list.

If I have no unedited backup, I check if I made a later backup. This backup can't tell if the post was edited in the first months (or even years), so I don't mark the post as Edited! or Unedited!. However, if the post was edited after I created the backup, I make a new backup.
If a post was removed before I tried to archive it, I (obviously) can't list it.
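
The rules from the sample output can be summed up as a small decision function. This is a sketch of the logic as described above, not the real implementation; the names Status and classify and the plain string comparison are assumptions.

```python
from enum import Enum
from typing import Optional, Tuple


class Status(Enum):
    DELETED = "Deleted!"
    EDITED = "Edited!"
    UNEDITED = "Unedited!"
    UNMARKED = "(not marked)"  # no unedited backup to compare against


def classify(
    current: Optional[str],
    unedited_backup: Optional[str] = None,
    later_backup: Optional[str] = None,
) -> Tuple[Status, bool]:
    """Return (status, needs_new_backup) for one post.

    current is the post text on the forum now, or None if it is gone.
    """
    if current is None:
        # Nothing left on the forum to back up. A post without any
        # backup can't be listed at all, so callers only reach this
        # branch for posts that were archived earlier.
        return Status.DELETED, False
    if unedited_backup is not None:
        if unedited_backup == current:
            return Status.UNEDITED, False
        return Status.EDITED, True
    # No unedited backup: a later backup can't prove the post was never
    # edited, so the post stays unmarked. A new backup is made if there
    # is no backup yet or the post changed after the later backup.
    needs_backup = later_backup is None or later_backup != current
    return Status.UNMARKED, needs_backup


print(classify("hello", unedited_backup="hello"))
print(classify("hello!", unedited_backup="hello"))
print(classify(None, unedited_backup="hello"))
print(classify("hello"))
```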

Limitations
  • Only one request per post. If you post more than one request, only the first one is processed.
  • I allow 5 tasks at once! If my scraper is busy (see status.txt), you'll have to wait a bit and post a new request (in a new post). It's okay to delete or edit the post afterwards.
  • Topics in Investigation are ignored.
  • Creating an overview takes about 10 seconds per page (to limit load on the forum). However, if several tasks are running simultaneously, it slows down other tasks. It might also take a bit longer for topics with many deleted posts.
  • Every quote has "Today" in my archived post, while the actual post now shows the date. I ignore this when comparing the current post and my archive. For a few seconds around the end of each day, this can lead to a post accidentally being marked as edited.
  • My initial plan was to make this for a user's post history too, but it was much more work than anticipated, so I skipped that.
  • This service is currently limited to scraping the first 250 pages of a topic. If that's not enough, or you want for instance pages 400-500, feel free to post your request.
    If there are more than 250 pages, all archived posts will be marked as "Deleted" (example). I know this isn't ideal, but I'll let it be for now.
  • Scrape time is Amsterdam time, but the time mentioned in scraped quotes is forum time.
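
Two of the limitations above lend themselves to a short sketch: the rough time estimate (about 10 seconds per page, capped at 250 pages) and ignoring the "Today" date in quotes when comparing posts. The date formats in the two regexes are assumptions about what a quote header looks like, and all names are hypothetical.

```python
import re

SECONDS_PER_PAGE = 10  # "about 10 seconds per page"
MAX_PAGES = 250        # current limit of the service

# Assumed formats: "Today at 10:15:30 AM" and "May 29, 2021, 10:15:30 AM".
TODAY_RE = re.compile(r"Today at \d{1,2}:\d{2}:\d{2} [AP]M")
DATED_RE = re.compile(r"[A-Z][a-z]+ \d{1,2}, \d{4}, \d{1,2}:\d{2}:\d{2} [AP]M")


def estimate_seconds(pages: int) -> int:
    """Rough lower bound; busy scrapers and deleted posts add time."""
    return min(pages, MAX_PAGES) * SECONDS_PER_PAGE


def normalize_dates(text: str) -> str:
    """Replace both date styles with one placeholder before comparing.

    Note: this also masks dates outside quotes, which a real
    implementation would probably want to avoid.
    """
    return DATED_RE.sub("<DATE>", TODAY_RE.sub("<DATE>", text))


def looks_edited(archived: str, current: str) -> bool:
    return normalize_dates(archived) != normalize_dates(current)


archived = "Quote from: alice on Today at 10:15:30 AM\nHello"
current = "Quote from: alice on May 29, 2021, 10:15:30 AM\nHello"
print(estimate_seconds(40))
print(looks_edited(archived, current))
```

With this normalization the two versions of the quote compare as equal, so the post is not flagged just because "Today" turned into a date.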

Test it!
Please try it, and let me know if it works as expected.

Bugs
Please post! This is far from finished.

Intended use
I'm hoping this can be useful to expose certain scammers. Please don't turn this into a(nother) witch hunt.

Be nice
Don't try to abuse this. I don't want to make a blacklist, but I will if I have to.

Todo
  • Fix "Today" for today's posts (Done!)
  • Image tags seem to change within the HTML code over time, so unedited posts with images might be marked as edited. I'm not sure yet how to tackle this.
  • Show Merit per post.
  • Fix the missing username (Done!)
  • Highlight banned users
  • Add "older posts" to my topic lists (once I'm done), so I can catch it if those have been deleted (after I scraped them, of course). See this post. Such deleted posts (for instance in staked addresses) are currently overlooked.
  • Also list deleted "older" posts. I only found out now (May 29, 2021) that this doesn't work.