
Topic: Viewing unedited posts and deleted posts, view per post, per user or per topic - page 6. (Read 8800 times)

staff
Activity: 3500
Merit: 6152
I found a bug (which I'm posting here as a reminder to myself): Posts on the עברי (Hebrew) board don't show up. Example: this post is missing, while it exists.
I'll see if I can add them later. I think it has something to do with the right-to-left script; even selecting text on that board doesn't work as expected.
Update: عربية (Arabic) has the same problem.

I could be wrong, but I think that only applies to posts that mix the Latin and Arabic/Hebrew alphabets.

A suggestion: have you thought about making a browser extension? I think it would be nice if we could access the archived posts directly from the forum's posts instead of having to check every post manually.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Sneak preview: http://loyce.club/archive/oldposts/
How to use:
  • Find the msgID you need. Let's use 28228
  • Remove the last 5 digits from the msgID to get the directory name (if there are fewer than 5 digits, use 0): 0
  • Replace the last 2 digits of the msgID with xx, and add .html (if there are fewer than 5 digits, use 0xx): 282xx.html
  • Add "#msg" and the msgID: #msg28228
  • Put everything together and go to http://loyce.club/archive/oldposts/0/282xx.html#msg28228
I found a bug (which I'm posting here as a reminder to myself): Posts on the עברי (Hebrew) board don't show up. Example: this post is missing, while it exists.
I'll see if I can add them later. I think it has something to do with the right-to-left script; even selecting text on that board doesn't work as expected.
Update: عربية (Arabic) has the same problem.

The problem doesn't occur with my real-time post scraper: this post, for instance, is archived just fine.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
There are two types of deleted posts.
1. Deleted by moderators
2. Deleted by users.

Do you have any plan to differentiate these two types?
There's a third: deleted by the topic starter of a self-moderated topic. I could only get data on the first category by keeping track of the modlog, but I'm not going to add this feature as it would lead to incomplete data.
full member
Activity: 333
Merit: 105
www.cd3d.app
  • All posts that have been deleted (within or after 10 minutes)

There are two types of deleted posts.
1. Deleted by moderators
2. Deleted by users.

Do you have any plan to differentiate these two types?
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
I'm thinking about an upgrade for searching topic histories: it's currently quite difficult to see which posts are deleted. I want to make a new thread (but need some time to do so) where you can request a topic analysis; the result will be a list of all posts in that topic.
In this, I'll highlight:
  • All posts that I didn't scrape
  • All posts that have been edited (within or after 10 minutes)
  • All posts that have been deleted
  • All posts that received Merit
I think this can be useful, but I need some time :)
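A minimal sketch of the classification in the list above, assuming the topic's msgIDs, the archived text, the text the forum shows now, and the set of merited msgIDs are already available. The function name `analyze_topic` and all input shapes are hypothetical, not how loyce.club stores anything. It only compares text, so it can't tell whether an edit happened within or after the 10-minute window.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PostStatus:
    msg_id: int
    flags: list[str]


def analyze_topic(
    topic_msg_ids: set[int],
    scraped: dict[int, str],
    current: dict[int, str],
    merited: set[int],
) -> list[PostStatus]:
    """Flag every post in a topic (hypothetical inputs):

    topic_msg_ids: all msgIDs known to belong to the topic
    scraped:       msgID -> text as archived
    current:       msgID -> text as shown on the forum now (deleted posts absent)
    merited:       msgIDs that received Merit
    """
    report = []
    for msg_id in sorted(topic_msg_ids):
        flags = []
        if msg_id not in scraped:
            flags.append("not scraped")
        if msg_id not in current:
            flags.append("deleted")
        elif msg_id in scraped and scraped[msg_id] != current[msg_id]:
            flags.append("edited")  # any text difference counts as an edit
        if msg_id in merited:
            flags.append("merited")
        report.append(PostStatus(msg_id, flags))
    return report


for post in analyze_topic({1, 2, 3}, {1: "hi", 2: "old"}, {1: "hi", 2: "new"}, {1}):
    print(post.msg_id, post.flags)
```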
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
May I ask why you don't archive post titles?
It started out as laziness, and I kept it that way for compatibility.

Quote
They would be useful while detecting deleted topics made by suspected scammers, I've seen some which only put their price tag in the title and not in their post body.
I still have topic titles, see http://loyce.club/archive/topics/516/5167469.html for example. You just don't find it with each post.
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
May I ask why you don't archive post titles? They would be useful while detecting deleted topics made by suspected scammers, I've seen some which only put their price tag in the title and not in their post body.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Another thing I've been thinking about is what to do if someone posts something my webhost doesn't allow. If that happens, I'll have to edit it too.
Well, that's a first: I censored data.
The credit card info sale is not good. pointed me at this post: http://loyce.club/archive/posts/5417/54178004.html
I didn't check if my webhost allows it, but censored it anyway.

Update (April 23, 2020): I received a PM (I'll keep the sender private, so sorry, no credits) about http://loyce.club/archive/posts/5428/54281329.html, which is now censored too.
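The post and topic URLs in this thread seem to share one layout: the directory is the ID with its last four digits removed (54178004 → 5417, 54281329 → 5428, 5167469 → 516). The helper below is a sketch built only on that inference; `archive_url` is a hypothetical name, and IDs below 10,000 have not been checked against the site.

```python
BASE = "http://loyce.club/archive"


def archive_url(kind: str, item_id: int) -> str:
    """Build a posts/ or topics/ archive URL.

    Assumption (inferred from example URLs, not documented):
    directory = ID without its last four digits.
    """
    if kind not in ("posts", "topics"):
        raise ValueError(f"unknown kind: {kind!r}")
    if item_id < 0:
        raise ValueError("ID must be non-negative")
    return f"{BASE}/{kind}/{item_id // 10_000}/{item_id}.html"


print(archive_url("posts", 54281329))
# http://loyce.club/archive/posts/5428/54281329.html
```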
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
1. Shiversnow
BTC Address: 12ujAKqXCwxFipZ6a8xdpXAo7EoitSGwMd



2. Ingoats
BTC Address: 12iAFdKUFf2BincSosJt3Ns2x1xSS3okFi



I only need these two posts to confirm that they applied to that campaign.
When I used Google search I found them, but I couldn't find them on the forum.
I searched starting at post 50385301 (March 30, 2019) until post 50844598 (May 1, 2019). Out of those 459298 posts, I have saved 431578. I must have experienced some downtime.
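The numbers above can be checked with a little arithmetic, assuming every msgID in the range belongs to one post (that is how the 459298 figure works out: both endpoints are counted). `coverage` is a hypothetical helper, not part of any scraper.

```python
from __future__ import annotations


def coverage(first_id: int, last_id: int, saved: int) -> tuple[int, int, float]:
    """Return (posts in range, posts missing, percent saved) for an inclusive msgID range."""
    if last_id < first_id:
        raise ValueError("last_id must not be smaller than first_id")
    total = last_id - first_id + 1  # both endpoints are counted
    missing = total - saved
    return total, missing, 100 * saved / total


total, missing, pct = coverage(50385301, 50844598, 431578)
print(f"{total} {missing} {pct:.2f}%")  # 459298 27720 93.96%
```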

I found "12ujAKqXCwxFipZ6a8xdpXAo7EoitSGwMd" in only this (unedited) post:
Code:
Shiversnow
972438
50393313
Economy / Services / Re: [OPEN] [SIGNATURE CAMPAIGN] BLOCKCHAIN ZERO TO ONE

#Proof of Authentication
Bitcointalk Username: Shiversnow
Rank: Full Member
Bitcoin Wallet Address: 12ujAKqXCwxFipZ6a8xdpXAo7EoitSGwMd

I found "12iAFdKUFf2BincSosJt3Ns2x1xSS3okFi" in only this (unedited) post:
Code:
Ingoats
1083582
50393708
Economy / Services / Re: [OPEN] [SIGNATURE CAMPAIGN] BLOCKCHAIN ZERO TO ONE

#Proof of Authentication
Bitcointalk Username: Ingoats
Rank: Full Member
Bitcoin Wallet Address: 12iAFdKUFf2BincSosJt3Ns2x1xSS3okFi
member
Activity: 213
Merit: 53
1. Shiversnow
BTC Address: 12ujAKqXCwxFipZ6a8xdpXAo7EoitSGwMd



2. Ingoats
BTC Address: 12iAFdKUFf2BincSosJt3Ns2x1xSS3okFi



I only need these two posts to confirm that they applied to that campaign.
When I used Google search I found them, but I couldn't find them on the forum.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Probably, it's part of this:
I also have older posts: I've saved (most) unedited posts (6.2 million posts) from September 12, 2018, until the start of this topic. This data has not been added to this topic, and I can't really add it because I tried to remove quotes and that has some bugs. You can request to dig up unedited data when needed.
But that's currently stored in large compressed files. Can you tell me what exactly you're looking for? I can probably dig up all posts made in that topic (without quotes and the above-mentioned bugs), but it might be easier if you tell me what you're looking for.
member
Activity: 213
Merit: 53
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Update: I now have the first 11 million posts scraped!
Some of today's weird usernames will most likely end up in my files forever! I scrape topics and get the username from the post, not from the profile. Oh well...
Update: usernames aren't affected when you're not logged in. I'm quite happy with that, as most of my scraping doesn't use an account.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
See http://loyce.club/archive/topics/ for posts made in a certain topic (Working!)
Updated every 5 minutes.
Notification: On many of those pages, the topic title was missing (due to an error). I've temporarily renamed the current version to http://loyce.club/archive/topics.old____fixing_errors_in_some_of_the_titles/. If you're looking for this data, use this, but please don't post links to that URL. Once the update is done, I'll remove this link.
The normal location, http://loyce.club/archive/topics/, has incomplete data at the moment.

Update: done! The normal link works again :)
legendary
Activity: 3010
Merit: 8114
I'm looking for a post from July 2018.
I started scraping 2 months later (I haven't published those posts online). My older post scraping project scrapes one thread at a time, so if the post was made in an old topic and only deleted recently, I might have it. But that's not very likely.

That's alright, I managed to find a copy of it elsewhere. Thanks for the info.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
I'm looking for a post from July 2018.
I started scraping 2 months later (I haven't published those posts online). My older post scraping project scrapes one thread at a time, so if the post was made in an old topic and only deleted recently, I might have it. But that's not very likely.
legendary
Activity: 3010
Merit: 8114
Update: I now have the first 11 million posts scraped!
It took longer than expected (real life has been very busy lately), but it's done: I now have posts up to April 6, 2015 archived.

Out of these 11 million first posts, 1,520,880 (13.8%) are Deleted or Off-limits (most likely deleted).

Hi Loyce, by any chance did you get around to archiving posts from after April 2015 up to where you started in July 2019? I'm looking for a post from July 2018. Thanks.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Update: I now have the first 11 million posts scraped!
It took longer than expected (real life has been very busy lately), but it's done: I now have posts up to April 6, 2015 archived.

Out of these 11 million first posts, 1,520,880 (13.8%) are Deleted or Off-limits (most likely deleted).
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Update: I now have the first 11 million posts scraped! At the moment, the first 6.1 million are available on loyce.club; processing all data takes approximately 3 days. At the current rate, I'm on schedule to complete archiving all posts around August.
I've been thinking about expanding my archived posts to all posts that haven't been deleted yet.
An update: I have started this project! Measured in scraping time, it's the biggest project I ever started. In the past 9 days, I've scraped about 4% of all data, so I expect to complete this around August.
There's also a chance I'll run out of disk space because of the millions of large posts made by bounty spammers, but I'll deal with that when it happens.
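The completion estimate above is a linear extrapolation: if about 4% took 9 days, the whole job takes roughly 9 × 100 / 4 = 225 days. A sketch, assuming a constant scraping rate (downtime or runs of very large posts would push the date out); `projected_total_days` is a hypothetical name.

```python
def projected_total_days(days_elapsed: float, percent_done: float) -> float:
    """Linear extrapolation: total time = elapsed time / fraction completed."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    return days_elapsed * 100 / percent_done


total = projected_total_days(9, 4)
print(total, total - 9)  # 225.0 216.0
```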

Sneak preview: http://loyce.club/archive/oldposts/
How to use:
  • Find the msgID you need. Let's use 28228
  • Remove the last 5 digits from the msgID to get the directory name (if there are fewer than 5 digits, use 0): 0
  • Replace the last 2 digits of the msgID with xx, and add .html (if there are fewer than 5 digits, use 0xx): 282xx.html
  • Add "#msg" and the msgID: #msg28228
  • Put everything together and go to http://loyce.club/archive/oldposts/0/282xx.html#msg28228
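The steps above can be sketched as integer arithmetic: dropping the last 5 digits is `msg_id // 100_000`, and dropping the last 2 digits is `msg_id // 100`. `oldposts_url` is a hypothetical helper. It assumes the "0xx" case only applies when nothing is left after dropping the last two digits (msgIDs below 100); the steps can also be read as using 0xx for every msgID shorter than five digits, so check a known 3- or 4-digit msgID against the site before relying on it.

```python
def oldposts_url(msg_id: int) -> str:
    """Build an oldposts URL from a msgID, following the "How to use" steps.

    Assumption: msgIDs below 100 map to 0xx.html.
    """
    if msg_id < 0:
        raise ValueError("msgID must be non-negative")
    directory = msg_id // 100_000     # drop the last 5 digits (0 if none remain)
    page = f"{msg_id // 100}xx.html"  # replace the last 2 digits with "xx"
    return f"http://loyce.club/archive/oldposts/{directory}/{page}#msg{msg_id}"


print(oldposts_url(28228))
# http://loyce.club/archive/oldposts/0/282xx.html#msg28228
```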

Limitations
  • Currently, the first 2.1 million posts are available.
  • I'll scrape the first 5.21 million topics and all posts in there.
  • That means I'll archive 53.36 million posts, this partially overlaps with my scraper for new posts.
  • This is a one-time thing, I won't update it with newer posts (I scrape unedited versions for those).
  • The time "scraped on" is Amsterdam time.

If no username is mentioned, it's either "Anonymous" or "random". I forgot those exist when I started scraping, and it's not important enough to start over.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
https://bitcointalksearch.org/topic/bounty-qravity-ico-4337249

Quote the post and then click on preview. You will see that the post is shown correctly. But it doesn't work when it is posted. It says "INVALID BBCODE: close of unopened tag in table (1)"
Maybe it hits the 64 kB limit: in HTML, Russian characters take a lot more space. I'm not sure that's the limit though; I've made posts that take 80 kB when scraped.
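To see why Cyrillic text could hit a 64 kB limit much sooner, here is a sketch assuming each Cyrillic letter ends up as a decimal HTML character reference such as `&#1055;`. Cyrillic code points have four decimal digits, so each letter costs 7 bytes instead of 1 for ASCII. Whether the forum actually stores posts this way, and that the limit is exactly 65,536 bytes, are assumptions; `entity_encoded_size` is a hypothetical name built on Python's standard `xmlcharrefreplace` error handler.

```python
def entity_encoded_size(text: str) -> int:
    """Bytes needed when every non-ASCII character is written as &#NNNN;
    (ASCII characters stay 1 byte each)."""
    return len(text.encode("ascii", errors="xmlcharrefreplace"))


sample = "Привет"  # 6 Cyrillic letters
print(len(sample.encode("utf-8")), entity_encoded_size(sample))  # 12 42
print(65_536 // 7)  # 9362 Cyrillic letters fit in 64 kB this way
```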

Interesting how you can quote the post to see contents. I know this is a scenario that's too specific but I'll post my two cents anyway.
There are more bugs in SMF that cause the preview to show differently from the real post.

Quote
If posts with broken BBCode are still unedited and quotable, their contents could be salvaged. Could that be worth pursuing?
I only want to archive what the forum shows as public information.