Topic: Viewing unedited posts and deleted posts, view per post, per user or per topic - page 9. (Read 8803 times)

legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Did the service work correctly on November 28-29? I see there are edited posts, but they did not get into the archive.
The question is quite serious since a scammer with Hero rank can avoid punishment.
I have 9995, 9998 and 9995 posts (out of 10000) around these days. That means I missed only 0.04% of all posts, and some of them were probably in topics that aren't publicly accessible.
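The 0.04% figure can be checked from the three daily counts quoted above. A short sketch, with variable names chosen here only for illustration:

```python
# Posts archived out of every 10,000 on the three days mentioned above.
archived_per_day = [9995, 9998, 9995]
expected_per_day = 10_000

# Count the missing posts and express them as a share of all posts.
missed = sum(expected_per_day - n for n in archived_per_day)
total = expected_per_day * len(archived_per_day)
missed_pct = 100 * missed / total

print(f"missed {missed} of {total} posts ({missed_pct:.2f}%)")
```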

I'm currently running an update on members and topics, that should be enough to find back most posts.

Can you share links to the posts that you can't find in my archive?
legendary
Activity: 2478
Merit: 1951
Leading Crypto Sports Betting & Casino Platform
LoyceV

Did the service work correctly on November 28-29? I see there are edited posts, but they did not get into the archive.
The question is quite serious since a scammer with Hero rank can avoid punishment.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
I just noticed my webhost's DNS server is gone, and looking further I noticed I haven't uploaded any posts for the past 2.5 days.

Amazingly, my hosting itself works fine. So I've switched DNS servers to Namecheap, and I'm currently uploading 23,000 posts. That means the file timestamps will be incorrect, but the data inside the files is still good.
legendary
Activity: 3696
Merit: 2219
💲🏎️💨🚓
I would appreciate it if you PM me that post / thread?
I think you misunderstood what I was trying to say: I don't have any posts older than September 2018, and I've only been uploading scraped files since July this year (with some incomplete data mainly at the start).
So I don't have the thread you're looking for, and I can't find it on Archive.org either. Sorry.

Yeh, my bad. I haven't read many of the scraped posts, so am a little rusty.

Have wrapped up what I was working on and got it out there for peer review.

Regards,
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
I would appreciate it if you PM me that post / thread?
I think you misunderstood what I was trying to say: I don't have any posts older than September 2018, and I've only been uploading scraped files since July this year (with some incomplete data mainly at the start).
So I don't have the thread you're looking for, and I can't find it on Archive.org either. Sorry.
legendary
Activity: 3696
Merit: 2219
💲🏎️💨🚓
Is this post too far back in time? https://bitcointalksearch.org/topic/--3387794
Yes. The oldest post I have is 45589103. Back then, I didn't upload the files and now just keep them locally in case someone asks for them.

I would appreciate it if you PM me that post / thread?

Regards,
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Is this post too far back in time? https://bitcointalksearch.org/topic/--3387794
Yes. The oldest post I have is 45589103. Back then, I didn't upload the files and now just keep them locally in case someone asks for them.
legendary
Activity: 3696
Merit: 2219
💲🏎️💨🚓
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
My webhost is out of disk space!
I've opened a ticket and keep local backups of all newly scraped posts so no data will be lost. Once they resolve this, my uploads should start catching up.

Update: this was fixed 2 days ago.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
I just wanted to know whether this tool archives even replies that have been made in a deleted topic or this applies only to directly deleted topics.
I'm not sure what you're asking me... I archive all posts (okay, maybe 99.9% of all posts) within a few seconds. If the topic gets deleted, I keep the archive.
I'm still working on improving the per-topic and per-user view.
legendary
Activity: 2338
Merit: 1261
Heisenberg
Hey LoyceV, I appreciate the work you have done for this community.
I just wanted to know whether this tool archives even replies that have been made in a deleted topic or this applies only to directly deleted topics.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Sneak preview: I'm working on a per-topic-view. See http://loyce.club/archive/topics/
And I'm working on a per-user-view, see http://loyce.club/archive/members/
legendary
Activity: 2296
Merit: 2262
BTC or BUST
It was a joke.. It doesn't mean I don't still like you LV..
Maybe it serves as a reminder that you all are being monitored..

I may have been an outlaw all my life but i have always strived to be an honest outlaw..
legendary
Activity: 3318
Merit: 2008
First Exclusion Ever
From a technical perspective, there is nothing to prevent a government from collecting post information on bitcointalk. However, if bitcointalk policies explicitly disallow government law enforcement from collecting information en masse via automation, in general, law enforcement will have trouble using information gained via these means as the basis for a warrant, or as admissible evidence in court.
This makes sense, you should send the suggestion to theymos (or create a topic in Meta).

Unfortunately, without explicit terms of service, this kind of policy is legally meaningless, and Theymos has made it pretty clear in the past he doesn't want to use one.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
From a technical perspective, there is nothing to prevent a government from collecting post information on bitcointalk. However, if bitcointalk policies explicitly disallow government law enforcement from collecting information en masse via automation, in general, law enforcement will have trouble using information gained via these means as the basis for a warrant, or as admissible evidence in court.
This makes sense, you should send the suggestion to theymos (or create a topic in Meta).
copper member
Activity: 1652
Merit: 1901
Amazon Prime Member #7
The root cause of the problem is that the administration allows too much access to posts. A straightforward solution is to limit how many page views an individual IP address/range can make per hour and per day, set somewhere above what a *person* would see in the normal course of reading, but well below the number of page views required to view all posts. The scraping of posts for non-academic use should also be explicitly prohibited by the administration.
I'm glad those restrictions aren't in place. That wouldn't stop any government agency from scraping all posts, as they can just use many different servers, but it would make it very difficult to make user-contributions (such as Vod's BPIP.org or my Trust/Merit data).

The forum currently allows on average 1 page download per second, and that already means I have to set a delay of several seconds, to prevent different scrapers from conflicting with each other's data scraping.
The 1 page download per second is a standard generalized limit when no commercial relationship exists. Although this limit has been communicated by the administration, it is also what should be the assumed limit when scraping information from a website.

I believe merit data is actually published by the administration, along with trust data. This information is less intrusive than information contained in posts. If my above proposal were to be changed to 'thread page view' I understand BPIP would be entirely unaffected.

I would encourage you to review the Twitter developer terms, and the Instagram API TOS. These policy documents prohibit many of the things that are done with bitcointalk information. I would presume the 'average' bitcointalk user to care more about privacy than the 'average' Instagram or Twitter user.

From a technical perspective, there is nothing to prevent a government from collecting post information on bitcointalk. However, if bitcointalk policies explicitly disallow government law enforcement from collecting information en masse via automation, in general, law enforcement will have trouble using information gained via these means as the basis for a warrant, or as admissible evidence in court.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
The root cause of the problem is that the administration allows too much access to posts. A straightforward solution is to limit how many page views an individual IP address/range can make per hour and per day, set somewhere above what a *person* would see in the normal course of reading, but well below the number of page views required to view all posts. The scraping of posts for non-academic use should also be explicitly prohibited by the administration.
I'm glad those restrictions aren't in place. That wouldn't stop any government agency from scraping all posts, as they can just use many different servers, but it would make it very difficult to make user-contributions (such as Vod's BPIP.org or my Trust/Merit data).

The forum currently allows on average 1 page download per second, and that already means I have to set a delay of several seconds, to prevent different scrapers from conflicting with each other's data scraping.
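
A minimal sketch of the kind of per-scraper delay described above, assuming each scraper runs as its own process. The `Throttle` class and the 3-second interval are hypothetical and only illustrate the idea; they are not taken from the actual scraper.

```python
import time


class Throttle:
    """Space out calls so they are at least `min_interval` seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the class can be tested
        self._sleep = sleep
        self._last = None        # time of the previous call, None before the first

    def wait(self):
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


# Hypothetical usage: several scrapers each wait 3 seconds between pages,
# so together they stay near the forum's 1 page per second average.
throttle = Throttle(min_interval=3.0)
```

Calling `throttle.wait()` before every page download keeps one scraper's requests at least 3 seconds apart, whatever the download itself takes.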
copper member
Activity: 1652
Merit: 1901
Amazon Prime Member #7

In our country, our government is banning crypto, so if I posted my information here and if tomorrow some government agency wanted to track me down - then your tool is going to enable them to do that. And possibly land a person in jail for 10 years - just for being associated with crypto.



Now, I'd like to hear your thoughts about these 2 situations. What do you intend to do to avoid / solve such issues?

If someone posts something on Facebook or Twitter, with privacy settings allowing anyone to see the post with the hashtag #crypto, anyone specifically looking at your profile can see your post, but it is not trivial for someone to obtain all posts containing a hashtag.

There are limits on how many tweets can be distributed to an entity or user per day and per month. The daily number of tweets that can be sent to a third party is large (50,000), but is a small percentage of the total tweets posted every day (500 million). If someone like DPR had been posting on Twitter instead of bitcointalk, they probably would still have gotten caught. A HK protestor would probably be safe on Twitter, but might be investigated by the HK government (a sockpuppet of the Chinese government), similar to how legendster describes, if they were posting on bitcointalk.

LoyceV is not the one who invented scraping forum posts, nor is he the only one to be actively scraping posts. There are many ways to download forum posts via automated means, and it is not difficult to get posts into a DataFrame that can later be analyzed.  

The root cause of the problem is that the administration allows too much access to posts. A straightforward solution is to limit how many page views an individual IP address/range can make per hour and per day, set somewhere above what a *person* would see in the normal course of reading, but well below the number of page views required to view all posts. The scraping of posts for non-academic use should also be explicitly prohibited by the administration.
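
A rough sketch of what such an hourly and daily page-view limit could look like, assuming views are counted per IP address in a sliding window. The `PageViewLimiter` class and the limits used in the example are hypothetical; the forum's actual software is not shown here, and grouping addresses into ranges is left out.

```python
from collections import defaultdict, deque

HOUR = 3600
DAY = 24 * HOUR


class PageViewLimiter:
    """Allow a page view only while the IP is under both limits."""

    def __init__(self, hourly_limit, daily_limit):
        self.hourly_limit = hourly_limit
        self.daily_limit = daily_limit
        self._views = defaultdict(deque)  # ip -> timestamps of allowed views

    def allow(self, ip, now):
        """Return True and record the view if `ip` may load a page at `now`."""
        views = self._views[ip]
        # Forget views that are a day old or older.
        while views and now - views[0] >= DAY:
            views.popleft()
        if len(views) >= self.daily_limit:
            return False
        # Count the views that fall inside the last hour.
        last_hour = sum(1 for t in views if now - t < HOUR)
        if last_hour >= self.hourly_limit:
            return False
        views.append(now)
        return True


# Hypothetical limits: generous for a human reader, far too low to read every post.
limiter = PageViewLimiter(hourly_limit=600, daily_limit=5000)
```

Because the count is kept per address, this only slows down a scraper that uses a single IP, which is the weakness raised in the reply about using many different servers.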
legendary
Activity: 3654
Merit: 8909
https://bpip.org
So the Switzerland thing is just a CIA cover? Makes sense now.