Topic: Viewing unedited posts and deleted posts, view per post, per user or per topic - page 4. (Read 8800 times)

legendary
Activity: 1512
Merit: 7340
Farewell, Leo
The search tool looks great, good job @bitmover!

I would just like to report a bug: when I search for satoshi, it returns a 404 error. For other members it works fine.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
It seems like a decent amount of work and effort is needed, but now that I think of it, do the benefits justify it?
No :P I have enough to do already, but worse: I'd have to check every post one by one, instead of downloading several posts at once from recent.
sr. member
Activity: 840
Merit: 375
I scrape posts as a guest, without logging in. That means I only see HTML, there is no BBCode.
The only way to see BBCode of a post would be by clicking "edit" (for my own posts) or "quote" for posts from other users. And that's only possible if the topic isn't locked.
I guess the forum database stores posts as BBCode, but that's above my pay grade.

Well, one solution is to first make a dummy bitcointalk account. To quote a post, this is the URL template:
Code:
https://bitcointalk.org/index.php?action=post;quote=POST_NUMBER;topic=TOPIC_NUMBER;sesc=SESSION_TOKEN
After replacing the three fields, we get a link to a page whose textarea HTML tag contains the BBCode of the post.
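The two steps described above (fill in the template, then read the textarea) could be sketched roughly like this in Python. This is only an illustration: the helper names are made up, the session token has to come from a logged-in session, and it assumes the first textarea on the page is the one holding the quoted BBCode.

```python
from html.parser import HTMLParser

# URL template copied from the post above; the three values are placeholders.
QUOTE_URL = ("https://bitcointalk.org/index.php?action=post;"
             "quote={post};topic={topic};sesc={sesc}")


def build_quote_url(post, topic, sesc):
    """Fill the three fields of the quote URL template (hypothetical helper)."""
    return QUOTE_URL.format(post=post, topic=topic, sesc=sesc)


class TextareaExtractor(HTMLParser):
    """Collect the text inside the first <textarea> element of a page."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._inside = False
        self._done = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Assumption: the first textarea holds the quoted post.
        if tag == "textarea" and not self._done:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "textarea" and self._inside:
            self._inside = False
            self._done = True

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_bbcode(page_html):
    """Return the raw text of the first textarea in the given HTML."""
    parser = TextareaExtractor()
    parser.feed(page_html)
    parser.close()
    return "".join(parser.chunks)
```

Downloading the page itself is left out on purpose, since it needs the cookies of the logged-in dummy account.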

Of course, since it's a dummy account it won't have posts, so clicking "edit" (for my own posts) isn't needed.

It seems like a decent amount of work and effort is needed, but now that I think of it, do the benefits justify it? I don't know.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Have you ever considered scraping the bbcode of posts instead of the html? Or at least scrape both of them?
I scrape posts as a guest, without logging in. That means I only see HTML, there is no BBCode.
The only way to see BBCode of a post would be by clicking "edit" (for my own posts) or "quote" for posts from other users. And that's only possible if the topic isn't locked.
I guess the forum database stores posts as BBCode, but that's above my pay grade.
sr. member
Activity: 840
Merit: 375
Hey LoyceV

Have you ever considered scraping the BBCode of posts instead of the HTML? Or at least scraping both of them? This would solve some problems; one of them is accurately determining whether an image changed in a post or not.

This idea might be naive, but let me know your thoughts either way.
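To illustrate why BBCode would make the image check easier, here is a rough Python sketch that pulls the [img] URLs out of two versions of a post and compares them. The function names are hypothetical, and the regular expression only assumes the usual [img]URL[/img] form with optional attributes such as width; it is not taken from any existing scraper.

```python
import re

# Matches [img]URL[/img] and [img width=.. height=..]URL[/img], case-insensitive.
IMG_TAG = re.compile(r"\[img[^\]]*\](.*?)\[/img\]", re.IGNORECASE | re.DOTALL)


def image_urls(bbcode):
    """Return the image URLs of a post in order of appearance."""
    return [url.strip() for url in IMG_TAG.findall(bbcode)]


def images_changed(old_bbcode, new_bbcode):
    """True if the list of image URLs differs between two versions of a post."""
    return image_urls(old_bbcode) != image_urls(new_bbcode)
```

With this approach, changes to the surrounding text or to the width/height attributes don't count as an image change; only a different URL, a different order, or an added or removed image does.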
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Is there another page that shows this
See LoyceV's Topic Details: highlight deleted and edited posts (forum wide):
I think it would be super useful if we could see which posts/replies of a user profile have been deleted by having it highlighted on the user page also...
I agree, but:
My initial plan was to make this for a user's post history too, but it was much more work than anticipated, so I skipped that.
I could still add it when I have time, but my "topic details" are barely used, and I expect the same for "user details".



TryNinja's site can help searching this data:
The new ninjastic.space website is out! Remade from scratch.

- Unedited/archived post: https://ninjastic.space/post/55139442
- Unedited/archived posts on specific topic: https://ninjastic.space/topic/5273824
- Unedited/archived posts by address: https://ninjastic.space/address/1NinjabXd5znM5zgTcmxDVzH4w3nbaY16L
legendary
Activity: 3696
Merit: 2219
💲🏎️💨🚓
Hopefully I'm in the right thread. This page: https://loyce.club/archive/members/13/131361.html only shows the initial post, not any deletions, edits, etc. Is there another page that shows this, or has this page not been updated?
sr. member
Activity: 840
Merit: 375
You may want to read my OP ;) That post can be found here. It was scraped a lot later though, so it's not "unedited".
I see. I'll just make it display "Original post not available" if I get a 404 error from your website.
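The fallback described above could look roughly like this in Python (the extension itself would do the same thing in the browser). It is a sketch under assumptions: the directory layout "first four digits of the post ID" is only inferred from the example links in this thread, and the function names are made up.

```python
import urllib.error
import urllib.request

ARCHIVE_BASE = "https://loyce.club/archive/posts"
FALLBACK = "Original post not available"


def archive_url(post_id):
    """Assumed layout: /<first four digits of the ID>/<ID>.html."""
    pid = str(post_id)
    return f"{ARCHIVE_BASE}/{pid[:4]}/{pid}.html"


def http_get(url):
    """Download a page and return it as text."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def original_post(post_id, fetch=http_get):
    """Return the archived HTML, or the fallback text on a 404."""
    try:
        return fetch(archive_url(post_id))
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return FALLBACK
        raise  # other errors (500, 403, ...) are not hidden
```

Only a 404 is turned into the fallback message; any other error is passed on, so a server problem isn't mistaken for a post that was never archived.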



Thank you for taking the time to respond to my questions!
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
sometimes I get a 404 error when requesting, for example, this post. My best guess is that posts prior to a certain date haven't been archived, is that it?
You may want to read my OP ;) That post can be found here. It was scraped a lot later though, so it's not "unedited".
sr. member
Activity: 840
Merit: 375
Let's give it a try :)
Done, test it please.

Works! Thanks :D

Here's a quick sneak peek:





Another question (sorry :D): sometimes I get a 404 error when requesting, for example, this post. My best guess is that posts prior to a certain date haven't been archived, is that it?
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
sr. member
Activity: 840
Merit: 375

I currently have this in apache2.conf:
Code:
# Code from suchmoon, added July 8, 2020. See https://bitcointalk.org/index.php?topic=5102296.msg54755930#msg54755930

  Header set Access-Control-Allow-Origin "https://bitcointalk.org"


From my understanding, this only allows requests to the latestversion.txt file. Since I'm making requests to /archive/posts/*/*, it doesn't work for me.

Will this work (I haven't changed it yet)?
Code:

  Header set Access-Control-Allow-Origin "https://bitcointalk.org"       

This *should* work because /var/www normally contains all hosted files, so it should allow requests to all of them if they come from bitcointalk.org. Let's give it a try :)
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Then from my understanding LoyceV will have to add the entire forum host "https://bitcointalk.org/" to the allowed CORS websites for my extension to work. Hm, let's see what he says about that.
This is what I said last time:
I have no idea what any of this means
I currently have this in apache2.conf:
Code:
# Code from suchmoon, added July 8, 2020. See https://bitcointalk.org/index.php?topic=5102296.msg54755930#msg54755930

  Header set Access-Control-Allow-Origin "https://bitcointalk.org"
Just tell me what to change :)
Will this work (I haven't changed it yet)?
Code:

  Header set Access-Control-Allow-Origin "https://bitcointalk.org"        

Quote
For what it's worth: I hadn't figured out the right CSS yet for posts at that time. It gets better later :)
sr. member
Activity: 840
Merit: 375
It seems like you are blocking requests, is there a specific reason why? I'm able to bypass this by using a CORS proxy, but since it's your website I would like to ask you first.
It's just the default behavior. I asked for the same thing before and he included the forum in the allowed CORS websites (but just for the page I needed).

Then, from my understanding, LoyceV will have to add the entire forum host "https://bitcointalk.org/" to the allowed CORS websites for my extension to work. Hm, let's see what he says about that. It's totally possible to bypass it by using a CORS proxy like https://cors-anywhere.herokuapp.com/: you make a request to "https://cors-anywhere.herokuapp.com/https://loyce.club/archive/posts/5190/51902990.html" instead of requesting "https://loyce.club/archive/posts/5190/51902990.html" directly, and it works like a charm. Edit: I saw in your old post that it's the solution you are now using. What did you use to set up your proxy server?
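The proxy trick is just a matter of prefixing the target URL with the proxy address. A tiny Python sketch of that (the helper name is made up, and any other proxy that accepts the target URL as its path would work the same way):

```python
# Proxy address taken from the post above.
PROXY = "https://cors-anywhere.herokuapp.com/"


def via_cors_proxy(target_url, proxy=PROXY):
    """Build the proxied URL by prefixing the target with the proxy address."""
    return proxy.rstrip("/") + "/" + target_url
```

Keep in mind that every request then passes through a third-party server, which is one reason to prefer having the header set on the archive itself.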

It's loaded on his website with "../theymos.css", so it's resolved relative to the current website (bitcointalk). Just include it in the extension or replace the string with the full path I posted above.

Yeah, that was the problem, thanks. I included theymos.css directly and it works now.
legendary
Activity: 2758
Merit: 6830
It seems like you are blocking requests, is there a specific reason why? I'm able to bypass this by using a CORS proxy, but since it's your website I would like to ask you first.
It's just the default behavior. I asked for the same thing before and he included the forum in the allowed CORS websites (but just for the page I needed).

Another question: even after bypassing this, it's unable to get the theymos.css file; for some reason it's trying to get it at "https://bitcointalk.org/theymos.css". Can you give me this CSS file so I can include it manually in the extension?
https://loyce.club/archive/posts/theymos.css

It's loaded on his website with "../theymos.css", so it's resolved relative to the current website (bitcointalk). Just include it in the extension or replace the string with the full path I posted above.
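The relative-path behaviour can be reproduced with Python's urljoin, which follows the same resolution rules a browser uses. The forum topic URL below is just an example page; the archive URL is one from this thread.

```python
from urllib.parse import urljoin

STYLESHEET = "../theymos.css"  # the relative path used in the archived pages


def resolve(page_url, href=STYLESHEET):
    """Resolve a relative link against the page it is loaded from."""
    return urljoin(page_url, href)


# Loaded from the archive, the path points at the real stylesheet.
archive_page = "https://loyce.club/archive/posts/5190/51902990.html"
# Injected into a forum page, the same path resolves against bitcointalk.org.
forum_page = "https://bitcointalk.org/index.php?topic=5102296.msg54755930"
```

Here resolve(archive_page) gives https://loyce.club/archive/posts/theymos.css, while resolve(forum_page) gives https://bitcointalk.org/theymos.css, which is exactly the wrong address the extension was requesting.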
sr. member
Activity: 840
Merit: 375
Hey @LoyceV

I'm currently making the Chrome extension and reached the part where I need to make an XMLHttpRequest to get the HTML of the original post from your website. I get an error:
Quote
Access to XMLHttpRequest at 'https://loyce.club/archive/posts/5221/52217187.html' from origin 'https://bitcointalk.org' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource.


It seems like you are blocking requests, is there a specific reason why? I'm able to bypass this by using a CORS proxy, but since it's your website I would like to ask you first.
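For context, the browser only blocks the response because the header is missing; nothing is actively refused by the server. A simplified Python sketch of the check the browser performs (it ignores credentials and preflight requests, and the function name is made up):

```python
def cors_allows(response_headers, origin):
    """Simplified CORS check: is this origin allowed to read the response?"""
    allowed = None
    for name, value in response_headers.items():
        # Header names are case-insensitive.
        if name.lower() == "access-control-allow-origin":
            allowed = value.strip()
    if allowed is None:
        return False  # no header at all: the situation in the error above
    # The value must be "*" or match the requesting origin exactly.
    return allowed == "*" or allowed == origin
```

Note that the comparison is exact: a header value of "https://bitcointalk.org/" with a trailing slash would not match the origin "https://bitcointalk.org".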


Another question: even after bypassing this, it's unable to get the theymos.css file; for some reason it's trying to get it at "https://bitcointalk.org/theymos.css". Can you give me this CSS file so I can include it manually in the extension?

Edit: I didn't see suchmoon's post on the first page; I was able to include the theymos.css file manually thanks to him.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Update:
I finished scraping all oldposts. The last post archived as "oldpost" is Post 53360099, which was created December 16, 2019 and archived July 9, 2020.
I won't update this archive anymore. I still want to add those posts to my "per topic" and "per member" lists, but I need to find some time for that.
This "oldposts" archive takes 72 GB on the server. My real time posts take 14 GB.
I don't really want to break existing hyperlinks to individual archived posts, but some day I may have to convert them into 100 posts per page too (that reduces disk usage a bit). I currently have over 3 million individual posts stored.

I've fixed this bug:
I found a bug (which I'm posting here as a reminder to myself): Posts on the עברי (Hebrew) board don't show up. Example: this post is missing, while it exists.
I'll see if I can add them later. I think it has something to do with the right-to-left writing, even selecting text on that board doesn't work as expected.
Update: عربية (Arabic) has the same problem.
I'll re-scrape these boards after finishing scraping all posts.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Well, that's a first: I censored data
I censored 2 posts because of accidental doxing (and quoting this):
https://loyce.club/archive/posts/5494/54944593.html
https://loyce.club/archive/posts/5494/54944843.html

I'm just posting it here for full disclosure.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
I didn't realize they were saved for 7 days. Thanks for the tip.
A warning though: if you post/preview a lot, you'll reach 100 drafts within a day (and the older drafts are lost).