Archiving a thread is now as easy as posting the right link anywhere on the forum (and waiting a bit)!
Short version:
Get a topicID you want to see, for instance 5145594.
Insert the topicID into the following link and post it on any public board on Bitcointalk:
https://loyce.club/archive/details/topic_5145594.html
Wait a bit, then click the link!
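For anyone who wants to generate the link for many topics, here is a minimal sketch in Python. The function name `details_url` is made up for illustration; only the URL pattern comes from the example above.

```python
def details_url(topic_id: int) -> str:
    """Build the Topic Details link for a Bitcointalk topicID.

    The URL pattern is taken from the example link in this post;
    the function itself is a hypothetical helper.
    """
    if topic_id <= 0:
        raise ValueError("topicID must be a positive integer")
    return f"https://loyce.club/archive/details/topic_{topic_id}.html"


if __name__ == "__main__":
    print(details_url(5145594))
```

Posting the resulting link on a public board is still what triggers the request; building the link alone does nothing.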
Full version
Almost a year ago, I opened "35M posts! View unedited/deleted posts (search per post, per user or per topic)". This has a lot of data (currently around 60 GB), but it's painstaking to manually find exactly which posts in a topic are edited or deleted.
I started archiving posts in July 2019. Especially at the beginning I missed some posts due to downtime, and even now I occasionally miss some posts due to connection problems.
Since February 2020, I've been scraping and archiving all older posts too (this will take a couple more months to complete). Update: this was completed in August 2020.
What it does
I've created an on-demand service to get details from any topic for:
- All posts that I didn't archive yet*
- All posts that have been edited*
- All posts that have been deleted
- All posts that received Merit (not implemented yet)
*I create a new archive of every edited or unarchived post.
The Topic Details
If a new update for the same topic is requested, I'll include a list of all previous Topic Details.
You'll have to make a new post to be detected by my scraper. Editing an existing post won't get detected.
Please don't quote the archive link, it'll trigger another update.
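To illustrate how a request link can be picked out of a new post, here is a rough sketch. It is not the actual scraper code: the regular expression and the name `first_requested_topic` are assumptions based on the link format shown in this post. It returns only the first link, in line with the "only the first one is processed" rule under Limitations.

```python
import re
from typing import Optional

# Assumed pattern, derived from the example link in this post.
REQUEST_LINK = re.compile(
    r"https://loyce\.club/archive/details/topic_(\d+)\.html"
)


def first_requested_topic(post_text: str) -> Optional[int]:
    """Return the topicID of the first request link in a post, or None."""
    match = REQUEST_LINK.search(post_text)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    text = "Please check https://loyce.club/archive/details/topic_5145594.html"
    print(first_requested_topic(text))
```

A matcher like this would also fire on a quoted link, which is consistent with quoting the archive link triggering another update.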
Sample output explained
Post 50988796 links to the post on Bitcointalk (even if the post itself has been deleted).
Post 235 is older than my archive. I don't have an unedited backup, so I created a new backup.
Post 236 is Deleted! I have an unedited backup, no need for a new backup.
Post 242 was Edited! I have an unedited backup, and created a new backup of the current post.
Post 246 is Unedited! No need for a new backup.
Post 251 doesn't have an unedited backup (which means my scraper was offline at the moment), so I created a new backup.
The (link) at the end of each line points at that specific row in my list.
If I have no unedited backup, I check if I made a later backup. This backup can't tell if the post was edited in the first months (or even years), so I don't mark the post as Edited! or Unedited!. However, if the post was edited after I created the backup, I make a new backup.
If a post was removed before I tried to archive it, I (obviously) can't list it.
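The rules above can be summed up in a small decision function. This is only a sketch of the logic as described, not the real implementation: the names `Verdict` and `classify` are hypothetical, posts are compared as plain strings, and marking a deleted post as "Deleted!" when only a later backup exists is an assumption (the text above only covers the case with an unedited backup).

```python
from typing import NamedTuple, Optional


class Verdict(NamedTuple):
    label: str        # "Deleted!", "Edited!", "Unedited!" or "" (no mark)
    new_backup: bool  # whether a new backup of the current post is made


def classify(current: Optional[str],
             unedited: Optional[str],
             later: Optional[str]) -> Optional[Verdict]:
    """Classify one post.

    current:  the post as it is on the forum now, None if deleted
    unedited: backup made right after posting, None if missing
    later:    backup made at some later time, None if missing
    """
    if current is None:
        if unedited is None and later is None:
            return None  # removed before it was archived: can't be listed
        # Assumption: any existing backup is enough to report a deletion.
        return Verdict("Deleted!", False)
    if unedited is not None:
        if current == unedited:
            return Verdict("Unedited!", False)
        return Verdict("Edited!", True)
    if later is not None:
        # A later backup can't prove the post was never edited: no mark,
        # but a changed post still gets a new backup.
        return Verdict("", current != later)
    return Verdict("", True)  # no backup at all: create one now


if __name__ == "__main__":
    print(classify("hello", "hello", None))
    print(classify("hello!", "hello", None))
    print(classify(None, "hello", None))
```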
Limitations
- Only one request per post. If you post more than one request, only the first one is processed.
- I allow 5 tasks at once! If my scraper is busy (see status.txt), you'll have to wait a bit and post a new request (in a new post). It's okay to delete or edit the post afterwards.
- Topics in Investigation are ignored.
- Creating an overview takes about 10 seconds per page (to limit load on the forum). However, if several tasks are running simultaneously, it slows down other tasks. It might also take a bit longer for topics with many deleted posts.
- Every quote has "Today" in my archived post, while the actual post now shows the date. I ignore this when comparing the current post and my archive. For a few seconds around the end of each day, this can lead to a post accidentally being marked as edited.
- My initial plan was to make this for a user's post history too, but it was much more work than anticipated, so I skipped that.
- This service is currently limited to scraping the first 250 pages of a topic. If that's not enough, or you want for instance pages 400-500, feel free to post your request.
If there are more than 250 pages, all archived posts will be marked as "Deleted" (example). I know this isn't ideal, but I'll let it be for now.
- Scrape time is Amsterdam time, but the time mentioned in scraped quotes is forum time.
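The "Today" limitation above comes down to normalizing quote dates before comparing two versions of a post. Here is one possible way to do that, as a sketch only. The quote header formats ("Quote from: name on Today at 10:15:32 AM" versus "Quote from: name on March 01, 2020, 10:15:32 AM") are assumptions about how the forum renders quotes, and `normalize_quotes` and `same_post` are hypothetical names.

```python
import re

# Assumed quote header formats; the date part is replaced by a placeholder
# so that "Today at" and a full date compare as equal.
QUOTE_DATE = re.compile(
    r"(Quote from: .+? on )"
    r"(?:Today at|[A-Z][a-z]+ \d{2}, \d{4},) "
    r"(\d{2}:\d{2}:\d{2} [AP]M)"
)


def normalize_quotes(text: str) -> str:
    """Replace the date in every quote header, keeping the time of day."""
    return QUOTE_DATE.sub(r"\1<date> \2", text)


def same_post(archived: str, current: str) -> bool:
    """Compare two versions of a post while ignoring quote dates."""
    return normalize_quotes(archived) == normalize_quotes(current)


if __name__ == "__main__":
    archived = "Quote from: alice on Today at 10:15:32 AM\nI agree."
    current = "Quote from: alice on March 01, 2020, 10:15:32 AM\nI agree."
    print(same_post(archived, current))
```

This sketch keeps the time of day in the comparison and drops only the date. It doesn't model the few seconds around the end of each day mentioned above.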
Test it!
Please try it, and let me know if it works as expected.
Bugs
Please post! This is far from finished.
Intended use
I'm hoping this can be useful to expose certain scammers. Please don't turn this into a(nother) witch hunt.
Be nice
Don't try to abuse this. I don't want to make a blacklist, but I will if I have to.
Todo
- Fix "Today" for today's posts. Done!
- Image tags seem to change within the HTML code over time, so unedited posts with images might be marked as edited. I'm not sure yet how to tackle this.
- Show Merit per post.
- Fix the missing username. Done!
- Highlight banned users.
- Add "older posts" to my topic-lists (once I'm done), so I can catch it if those have been deleted (after I scraped them, of course). See this post. Such deleted posts (for instance in staked addresses) are currently overlooked.
- Also list deleted "older" posts. I only found out now (May 29, 2021) that this doesn't work.