Topic: Viewing unedited posts and deleted posts, view per post, per user or per topic - page 6. (Read 8635 times)

legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
There are two types of deleted post.
1. Deleted by moderators
2. Deleted by users.

Do you have any plan to differentiate these two types?
There's a third: Deleted by the topic starter of a self-moderated topic. I could only get data on the first category by keeping track of modlog, but I'm not going to add this feature as it will lead to incomplete data.
full member
Activity: 333
Merit: 105
www.cd3d.app
  • All posts that have been deleted (within or after 10 minutes)

There are two types of deleted post.
1. Deleted by moderators
2. Deleted by users.

Do you have any plan to differentiate these two types?
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
I'm thinking about an upgrade for searching topic histories: it's currently quite difficult to see which posts are deleted. I want to make a new thread (but need some time to do so) to request a topic analysis, after which the result will be a list of all posts in that topic.
In this, I'll highlight:
  • All posts that I didn't scrape
  • All posts that have been edited (within or after 10 minutes)
  • All posts that have been deleted
  • All posts that received Merit
I think this can be useful, but I need some time :)
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
May I ask why you don't archive post titles?
It started out as laziness, and I kept it that way for compatibility.

Quote
They would be useful while detecting deleted topics made by suspected scammers, I've seen some which only put their price tag in the title and not in their post body.
I still have topic titles, see http://loyce.club/archive/topics/516/5167469.html for example. You just don't find it with each post.
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
May I ask why you don't archive post titles? They would be useful while detecting deleted topics made by suspected scammers, I've seen some which only put their price tag in the title and not in their post body.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Another thing I've been thinking about is if someone posts something that's not allowed by my webhost. If that would happen, I'll have to edit it too.
Well, that's a first: I censored data
The credit card info sale is not good. pointed me at this post: http://loyce.club/archive/posts/5417/54178004.html
I didn't check if my webhost allows it, but censored it anyway.

Update (April 23, 2020): I received a PM (I'll keep the sender private, so sorry, no credits) about http://loyce.club/archive/posts/5428/54281329.html, which is now censored too.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
1. Shiversnow
BTC Address: 12ujAKqXCwxFipZ6a8xdpXAo7EoitSGwMd



2. Ingoats
BTC Address: 12iAFdKUFf2BincSosJt3Ns2x1xSS3okFi



I only need these two posts to confirm that they applied to that campaign.
I found them with a Google search, but I couldn't find them on the forum.
I searched starting at post 50385301 (March 30, 2019) until post 50844598 (May 1, 2019). Out of those 459298 posts, I have saved 431578. I must have experienced some down time.
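The numbers in that search range can be checked with a few lines of arithmetic. This is only a sketch: the variable names are made up for illustration, and the only inputs are the three figures quoted above.

```python
# Figures quoted in the post above; the variable names are illustrative only.
first_msg_id = 50385301   # March 30, 2019
last_msg_id = 50844598    # May 1, 2019
saved_posts = 431578

# Both endpoints are included, hence the +1.
posts_in_range = last_msg_id - first_msg_id + 1
missing_posts = posts_in_range - saved_posts
coverage = saved_posts / posts_in_range

print(f"posts in range: {posts_in_range}")
print(f"not saved:      {missing_posts}")
print(f"coverage:       {coverage:.1%}")
```

So roughly 6% of the posts in that range were never saved, which fits the down time mentioned above.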

I found "12ujAKqXCwxFipZ6a8xdpXAo7EoitSGwMd" in only this (unedited) post:
Code:
Shiversnow
972438
50393313
Economy / Services / Re: [OPEN] [SIGNATURE CAMPAIGN] BLOCKCHAIN ZERO TO ONE

#Proof of Authentication
Bitcointalk Username: Shiversnow
Rank: Full Member
Bitcoin Wallet Address: 12ujAKqXCwxFipZ6a8xdpXAo7EoitSGwMd

I found "12iAFdKUFf2BincSosJt3Ns2x1xSS3okFi" in only this (unedited) post:
Code:
Ingoats
1083582
50393708
Economy / Services / Re: [OPEN] [SIGNATURE CAMPAIGN] BLOCKCHAIN ZERO TO ONE

#Proof of Authentication
Bitcointalk Username: Ingoats
Rank: Full Member
Bitcoin Wallet Address: 12iAFdKUFf2BincSosJt3Ns2x1xSS3okFi
member
Activity: 213
Merit: 53
1. Shiversnow
BTC Address: 12ujAKqXCwxFipZ6a8xdpXAo7EoitSGwMd



2. Ingoats
BTC Address: 12iAFdKUFf2BincSosJt3Ns2x1xSS3okFi



I only need these two posts to confirm that they applied to that campaign.
I found them with a Google search, but I couldn't find them on the forum.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Probably, it's part of this:
I also have older posts: I've saved (most) unedited posts (6.2 million posts) since September 12, 2018, until the start of this topic. This data has not been added to this topic, and I can't really add it because I tried to remove quotes and that has some bugs. You can request to dig up unedited data when needed.
But that's currently stored in large compressed files. Can you tell me what exactly you're looking for? I can probably dig up all posts made in that topic (without quotes and the above mentioned bugs), but it might be easier if you tell me what you're looking for.
member
Activity: 213
Merit: 53
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Update: I now have the first 11 million posts scraped!
Some of today's weird usernames will most likely end up in my files forever! I scrape topics, and get the username from the post, not from the profile. Oh well.......
Update: Usernames aren't affected without logging in. I'm quite happy with that, as most of my scraping doesn't use an account.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
See http://loyce.club/archive/topics/ for posts made in a certain topic (Working!)
Updated every 5 minutes.
Notification: On many of those pages, the topic title was missing (due to an error). I've temporarily renamed the current version to http://loyce.club/archive/topics.old____fixing_errors_in_some_of_the_titles/. If you're looking for this data, use this, but please don't post links to that URL. Once the update is done, I'll remove this link.
The normal location, http://loyce.club/archive/topics/, has incomplete data at the moment.

Update: done! The normal link works again :)
legendary
Activity: 2940
Merit: 7892
I'm looking for a post from July 2018.
I started scraping 2 months later (I haven't published those posts online). My older post scraping project scrapes one thread at a time, so if the post was made in an old topic and only deleted recently, I might have it. But that's not very likely.

That's alright, I managed to find a copy of it elsewhere. Thanks for the info.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
I'm looking for a post from July 2018.
I started scraping 2 months later (I haven't published those posts online). My older post scraping project scrapes one thread at a time, so if the post was made in an old topic and only deleted recently, I might have it. But that's not very likely.
legendary
Activity: 2940
Merit: 7892
Update: I now have the first 11 million posts scraped!
It took longer than expected (real life has been very busy lately), but it's done: I now have posts up to April 6, 2015 archived.

Out of these 11 million first posts, 1,520,880 (13.8%) are Deleted or Off-limits (most likely deleted).

Hi Loyce, by any chance did you get around to archiving posts after April 2015 and up to where you started at July 2019? I'm looking for a post from July 2018. Thanks.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Update: I now have the first 11 million posts scraped!
It took longer than expected (real life has been very busy lately), but it's done: I now have posts up to April 6, 2015 archived.

Out of these 11 million first posts, 1,520,880 (13.8%) are Deleted or Off-limits (most likely deleted).
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Update: I now have the first 11 million posts scraped! At the moment, the first 6.1 million are available on loyce.club; processing all data takes approximately 3 days. At the current rate, I'm on schedule to complete archiving all posts around August.
I've been thinking about expanding my archived posts to all posts that haven't been deleted yet.
An update: I have started this project! Measured in scraping time, it's the biggest project I ever started. In the past 9 days, I've scraped about 4% of all data, so I expect to complete this around August.
There's also a chance I'll run out of disk space because of the millions of large posts made by bounty spammers, but I'll deal with that when it happens.

Sneak preview: http://loyce.club/archive/oldposts/
How to use:
  • Find the msgID you need. Let's use 28228
  • Remove the last 5 digits from the msgID to get the directory name (if there are less than 5 digits, use 0): 0
  • Replace the last 2 digits of the msgID by xx, and add .html (if there are less than 5 digits, use 0xx): 282xx.html
  • Add "#msg" and the msgID: #msg28228
  • Put everything together and go to http://loyce.club/archive/oldposts/0/282xx.html#msg28228
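The steps above can be sketched as a small function. This is my reading of the rules, not the archive's own code: the function name is hypothetical, and using integer division for msgIDs with only a few digits is an assumption that happens to reproduce the 28228 example.

```python
BASE_URL = "http://loyce.club/archive/oldposts"


def oldposts_url(msg_id: int) -> str:
    """Build the archive URL for a msgID, following the steps listed above."""
    if msg_id < 0:
        raise ValueError("msgID must not be negative")
    # Dropping the last 5 digits is integer division by 100000;
    # short msgIDs end up in directory 0.
    directory = msg_id // 100_000
    # Replacing the last 2 digits by "xx": keep everything except those digits.
    # Assumption: msgIDs below 100 give "0xx.html".
    page = f"{msg_id // 100}xx.html"
    return f"{BASE_URL}/{directory}/{page}#msg{msg_id}"


print(oldposts_url(28228))
```

For msgID 28228 this prints the same URL as the last step of the list.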

Limitations
  • Currently, the first 2.1 million posts are available.
  • I'll scrape the first 5.21 million topics and all posts in there.
  • That means I'll archive 53.36 million posts; this partially overlaps with my scraper for new posts.
  • This is a one-time thing; I won't update it with newer posts (I scrape unedited versions for those).
  • The time "scraped on" is Amsterdam time.

If no username is mentioned, it's either "Anonymous" or "random". I forgot those exist when I started scraping, and it's not important enough to start over.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
https://bitcointalksearch.org/topic/bounty-qravity-ico-4337249

Quote the post and then click on preview. You will see that the post is shown correctly. But it doesn't work when it is posted. It says "INVALID BBCODE: close of unopened tag in table (1)"
Maybe it hits the 64 kB limit in HTML; the Russian characters take a lot more space that way. I'm not sure if that's the limit though, I've made posts that take 80 kB when scraped.
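To illustrate the size difference, here is a rough sketch comparing the same text as UTF-8 bytes and as numeric HTML entities. Whether the forum actually stores or measures posts as entities is an assumption; this only shows why Cyrillic text could grow a lot in that form. The helper names are made up.

```python
def utf8_size(text: str) -> int:
    """Number of bytes when the text is encoded as UTF-8."""
    return len(text.encode("utf-8"))


def entity_size(text: str) -> int:
    """Number of bytes when every non-ASCII character becomes a numeric
    HTML entity such as &#1055; (assumed storage form, not confirmed)."""
    return len(text.encode("ascii", errors="xmlcharrefreplace"))


sample = "Привет"  # 6 Cyrillic characters
print(utf8_size(sample), entity_size(sample))
```

Each Cyrillic letter takes 2 bytes in UTF-8 but 7 bytes as an entity, so a post that looks modest on screen could be several times larger in that form.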

Interesting how you can quote the post to see contents. I know this is a scenario that's too specific but I'll post my two cents anyway.
There are more bugs in SMF that cause the preview to show differently than the real post.

Quote
if posts with broken bbcode are still unedited and quotable, then their contents could be salvaged. Could that be worth pursuing?
I only want to archive what the forum shows as public information.
legendary
Activity: 2394
Merit: 1412
Leading Crypto Sports Betting & Casino Platform
I don't do BBCode, I only read HTML. This error came from the forum, not from me. I just save the post as it was made before editing.
I should have guessed this. It must be the easiest way to scrape anyway...

Quote the post and then click on preview. You will see that the post is shown correctly. But it doesn't work when it is posted. It says "INVALID BBCODE: close of unopened tag in table (1)"
Interesting how you can quote the post to see contents. I know this is a scenario that's too specific but I'll post my two cents anyway.

Contents of posts that are otherwise invisible due to including a table with broken tags are accessible to any forum member able to quote the post, but invisible in the eyes of robots. I don't see any utility for any poster to do this to their posts intentionally. If they can edit their threads, the contents could be replaced with something like a dot and be done with it.

But it could be that a few thousand such posts exist. Google gives 3100 results when you google ("INVALID BBCODE: close of unopened tag in table" site:bitcointalk.org), some duplicates and some coming from signatures of course.
2550 results if you remove two users that came up with broken signatures ("INVALID BBCODE: close of unopened tag in table" site:bitcointalk.org -Gamesbuy -trinaldao)

Now, I'm stepping into territory of a sub-case in a sub-case, but if posts with broken bbcode are still unedited and quotable, then their contents could be salvaged. Could that be worth pursuing? Probably not. But strictly speaking it should be done if you'd want to grab everything that's available.
legendary
Activity: 2380
Merit: 5213
I see here that a thread was captured with just the error message for wrong BBcode:
http://loyce.club/archive/posts/5395/53951381.html

Any way to include the content in spite of wrong BBcode?
I don't do BBCode, I only read HTML. This error came from the forum, not from me. I just save the post as it was made before editing.

Yes, that's an error (maybe a bug) from the forum.

That archived post was like the following post.

https://bitcointalksearch.org/topic/bounty-qravity-ico-4337249

Quote the post and then click on preview. You will see that the post is shown correctly. But it doesn't work when it is posted. It says "INVALID BBCODE: close of unopened tag in table (1)"