Topic: How to get all posts through "recent"? (Read 393 times)

administrator
Activity: 5222
Merit: 13032
October 03, 2019, 04:14:05 PM
#14
That page looks different in Firefox and in Chrome, but I can't really figure out what I'm looking at.

It's an XML sitemap file. Search engines use that file to keep up-to-date on forum posts. It's designed for computers to process, not humans; different browsers display it differently.
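For anyone who wants to read that file with a script rather than a browser, here is a minimal sketch using Python's standard library. It assumes the file follows the sitemaps.org format; the sample entries, the `lastmod` value, and the `parse_sitemap` name are made up for illustration, and the real file's layout may differ.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample; the real file's entries and layout may differ.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://bitcointalk.org/index.php?topic=5189156.0</loc>
    <lastmod>2019-10-03T16:14:05+00:00</lastmod>
  </url>
  <url>
    <loc>https://bitcointalk.org/index.php?topic=5174107.0</loc>
  </url>
</urlset>"""


def parse_sitemap(xml_text):
    """Return (loc, lastmod) pairs; lastmod is None when the tag is absent."""
    root = ET.fromstring(xml_text)
    entries = []
    # <url> entries belong to a urlset, <sitemap> entries to a sitemap index.
    for node in root.findall("sm:url", NS) + root.findall("sm:sitemap", NS):
        loc = node.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = node.findtext("sm:lastmod", default=None, namespaces=NS)
        entries.append((loc, lastmod))
    return entries


for loc, lastmod in parse_sitemap(SAMPLE):
    print(loc, lastmod)
```

Comparing each `lastmod` against the time of your previous run would tell you which topics to fetch again.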
legendary
Activity: 2380
Merit: 5213
October 03, 2019, 01:31:28 PM
#13
Most of the missed posts are in the Wall Observer thread
25 out of those 40,000 posts have been missed.
14 out of those 25 posts are in hidden threads, so we can say that 11 out of 40,000 posts have been missed. 9 of those 11 missed posts are in the Wall Observer thread. That's 82% of the missed posts.
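The same arithmetic as a quick check, using only the numbers from this post (the variable names are mine):

```python
missed_total = 25    # missed out of the 40,000 scraped posts
missed_hidden = 14   # of those, posted in hidden threads
missed_in_wo = 9     # remaining misses that are in the Wall Observer thread

missed_visible = missed_total - missed_hidden
wo_share = missed_in_wo / missed_visible

print(missed_visible)       # 11
print(f"{wo_share:.0%}")    # 82%
```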
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
October 03, 2019, 10:43:47 AM
#12
Are you sure?
No, apparently I wasn't sure. I tested it again, and all 5 test posts in this thread ended up showing in "recent".

I checked 40,000 of my scraped posts, and I have:
9990/10000
9996/10000
9996/10000
9993/10000

That means it must have been a coincidence that I missed 2 posts in the Wall Observer thread in a short time span, right at the moment I was testing my scraper there.

I can live with missing less than 0.1% of all posts. Some of the missing posts are on hidden boards (I only know of the VIP and Staff boards), and Investigations is excluded.
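A quick sketch of how those four counts add up to the "less than 0.1%" figure. The counts are the ones listed above; the variable names are just for illustration.

```python
found_per_batch = [9990, 9996, 9996, 9993]  # posts found per batch of 10,000 IDs
batch_size = 10000

total = batch_size * len(found_per_batch)
missed = total - sum(found_per_batch)
miss_rate = missed / total

print(missed)               # 25
print(f"{miss_rate:.4%}")   # 0.0625%
```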

May I know how you could find this post knowing only the msgID?
Quote a post - any post, doesn't matter. Then in the URL that looks like this:

Code:
https://bitcointalk.org/index.php?action=post;quote=52617659;topic=5189156.0;num_replies=8;sesc=...

Replace the number after "quote=" with the ID of the post you're looking for.
That's a neat trick! But it's difficult to automate, so I can't really use it to check for missing posts.

All posts you can see should be listed there, though due to database concurrency limitations, ones made in the last few seconds might not show up, even if others before/after them do.
Thanks. If it's a known limitation, I'll just let it be :)

Quote
Note that if you don't need to get posts ASAP, it may be easier and more efficient for you to use https://bitcointalk.org/sitemap.php. All of the last-modification times are accurate to within a couple of hours.
That page looks different in Firefox and in Chrome, but I can't really figure out what I'm looking at.
administrator
Activity: 5222
Merit: 13032
October 01, 2019, 05:47:49 PM
#11
All posts you can see should be listed there, though due to database concurrency limitations, ones made in the last few seconds might not show up, even if others before/after them do.

Note that if you don't need to get posts ASAP, it may be easier and more efficient for you to use https://bitcointalk.org/sitemap.php. All of the last-modification times are accurate to within a couple of hours.
legendary
Activity: 3654
Merit: 8909
https://bpip.org
October 01, 2019, 12:46:59 PM
#10
May I know how you could find this post knowing only the msgID?

Quote a post - any post, doesn't matter. Then in the URL that looks like this:

Code:
https://bitcointalk.org/index.php?action=post;quote=52617659;topic=5189156.0;num_replies=8;sesc=...

Replace the number after "quote=" with the ID of the post you're looking for. You'll get the post quoted in the text box and then you can use the user's post history and the contents of the post to find it. Note that the link in the quote is not valid, e.g. if you click Preview and click the link it won't go to the correct thread.
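If you want to build that URL from a script, the replacement step could be sketched like this. It only rewrites the string; you would still need to start from a quote URL from your own logged-in session, since the `sesc=...` part is left exactly as it is (truncated here as above). The `quote_url_for` name is made up for illustration.

```python
import re


def quote_url_for(template_url, msg_id):
    """Swap the number after 'quote=' for msg_id, leaving the rest untouched."""
    new_url, count = re.subn(r"(?<=quote=)\d+", str(msg_id), template_url, count=1)
    if count == 0:
        raise ValueError("template_url has no numeric 'quote=' parameter")
    return new_url


template = (
    "https://bitcointalk.org/index.php?action=post;quote=52617659;"
    "topic=5189156.0;num_replies=8;sesc=..."
)
print(quote_url_for(template, 52602024))
```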
staff
Activity: 2408
Merit: 2021
I find your lack of faith in Bitcoin disturbing.
October 01, 2019, 12:07:33 PM
#9
Some of those might be missing legitimately - e.g. quickly deleted, or posted on an invisible board, for example:
[...]
52602024
52602357

I didn't check the other ids, but these two are the last 2 posts in the Staff forum.

Invisible boards?
Which boards are invisible? Are there some boards that are only visible to moderators?

Yes, there is a special board for the Staff, and another one for the VIPs. There may be other boards, but I don't have access to those :)
And a special one for the April Fool's Day ideas; Theymos takes this very seriously :)

OK, I'm lying about the last one.
legendary
Activity: 2380
Merit: 5213
October 01, 2019, 11:17:11 AM
#8
Some of those might be missing legitimately - e.g. quickly deleted, or posted on an invisible board, for example:
Invisible boards?
Which boards are invisible? Are there some boards that are only visible to moderators?

All others seem to exist in the WO thread, except this one in a different thread:
Do you mean this thread?
So, there is a bug. Am I right?
All of the posts in this thread should be shown in "Recent" too.

May I know how you could find this post knowing only the msgID?
Post links contain the topic number too.
legendary
Activity: 3654
Merit: 8909
https://bpip.org
October 01, 2019, 10:49:38 AM
#7
These numbers are the IDs of missed posts in http://loyce.club/archive/posts/5259/ and http://loyce.club/archive/posts/5260/

Some of those might be missing legitimately - e.g. quickly deleted, or posted on an invisible board, for example:

52591179
52591721
52592748
52598319
52598892
52602024
52602357

All others seem to exist in the WO thread, except this one in a different thread:

https://bitcointalksearch.org/topic/--5026942
legendary
Activity: 2380
Merit: 5213
October 01, 2019, 10:42:48 AM
#6
As far as I know all posts should be shown in "recent".
These numbers are the IDs of missed posts in http://loyce.club/archive/posts/5259/ and http://loyce.club/archive/posts/5260/
52590233
52591100
52591174
52591179
52591311
52591721
52592748
52597731
52598319
52598892
52602024
52602357
52604597
52607589
It seems that there is a bug. It could be in Loyce.club or in Bitcointalk.
legendary
Activity: 3654
Merit: 8909
https://bpip.org
October 01, 2019, 10:26:55 AM
#5
test

Edit: it looks like I was wrong, sorry LoyceV for confusing you. I got the same result as o_e_l_e_o. No idea then why you missed those posts.
legendary
Activity: 2268
Merit: 18748
October 01, 2019, 10:25:15 AM
#4
Are you sure? See posts 26 and 30 in the screenshot below - both show up for me in recent, both in the same thread (Here: https://bitcointalksearch.org/topic/bountyieo-cryptoknowmics-worlds-first-decentralize-media-platform-live-5174107).

Edit: Confirmed I can see this post and suchmoon's test post below both on recent at the same time, albeit on different pages.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
October 01, 2019, 09:34:58 AM
#3
Doesn't "recent" show only the most recent post in each thread?
You're right! Mind blown :O

I never knew that. I'll edit the title to my new question: how do I get all posts? This messes up my data projects.
legendary
Activity: 3654
Merit: 8909
https://bpip.org
October 01, 2019, 09:32:37 AM
#2
Doesn't "recent" show only the most recent post in each thread? Not sure where I got that from but I thought that's how it worked.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
October 01, 2019, 09:05:33 AM
#1
While scraping recent, I noticed I missed some posts. My logs show this:
Quote
Downloading recent.html
1. userID: 819696 - username: Hypnosis00 - msgID: 52615289
2. userID: 2286354 - username: FrequencyRules058 - msgID: 52615288
3. userID: 662400 - username: kzv - msgID: 52615287
4. userID: 1226689 - username: phoen - msgID: 52615285
5. userID: 93751 - username: ltcdice - msgID: 52615284
6. userID: 947291 - username: Polar91 - msgID: 52615283
7. userID: 2480302 - username: Bullrunking - msgID: 52615282
8. userID: 543165 - username: citronick - msgID: 52615281
9. userID: 2294946 - username: reena024 - msgID: 52615280
10. userID: 1000199 - username: krogothmanhattan - msgID: 52615279
The post ending in 86 (msgID 52615286) is missing. I missed another post from the same thread too. I don't have the board on ignore, and some other posts in the same thread show up as expected.

It's missing from halfway down the recent page, and the same post is also missing from the copies I downloaded a few seconds earlier and later. That means the post was really missing from the page, which makes me think it's a bug in "recent".
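For reference, this is roughly how such a gap can be spotted in the log above. It's a minimal sketch that assumes msgIDs are handed out as consecutive integers; a gap only means the ID wasn't seen, not that the post is visible to everyone. The `missing_ids` name is just for illustration.

```python
def missing_ids(seen_ids):
    """Return the IDs between the lowest and highest seen ID that never showed up."""
    seen = set(seen_ids)
    if not seen:
        return []
    return [i for i in range(min(seen), max(seen) + 1) if i not in seen]


# msgIDs from the log above, in the order they appeared on the page.
logged = [
    52615289, 52615288, 52615287, 52615285, 52615284,
    52615283, 52615282, 52615281, 52615280, 52615279,
]
print(missing_ids(logged))  # [52615286]
```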