The root cause of the problem is that the administration allows too much access to posts. A straightforward solution is to limit how many page views an individual IP address/range can make on an hourly and daily basis, with the caps set somewhat above what a *person* would see in the normal course of reading but well below the number of page views required to view all posts. The scraping of posts for non-academic use should also be explicitly prohibited by the administration.
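To make the proposal concrete, here is a minimal sketch of a per-IP limit with both an hourly and a daily cap. Everything in it is an assumption for illustration: the `PageViewLimiter` name, the cap values, and the in-memory storage are hypothetical, and nothing here reflects how the forum software actually works.

```python
import time
from collections import defaultdict, deque


class PageViewLimiter:
    """Hypothetical per-IP limiter with an hourly and a daily page-view cap."""

    HOUR = 3600
    DAY = 86400

    def __init__(self, hourly_cap, daily_cap, clock=time.monotonic):
        self.hourly_cap = hourly_cap
        self.daily_cap = daily_cap
        self.clock = clock
        # ip -> timestamps of page views allowed within the last day
        self.views = defaultdict(deque)

    def allow(self, ip):
        """Return True and record the view if the IP is under both caps."""
        now = self.clock()
        seen = self.views[ip]
        # Forget views that are a day old or older.
        while seen and now - seen[0] >= self.DAY:
            seen.popleft()
        in_last_hour = sum(1 for t in seen if now - t < self.HOUR)
        if len(seen) >= self.daily_cap or in_last_hour >= self.hourly_cap:
            return False
        seen.append(now)
        return True


if __name__ == "__main__":
    # Cap values are placeholders, not a recommendation.
    limiter = PageViewLimiter(hourly_cap=300, daily_cap=2000)
    print(limiter.allow("203.0.113.7"))
```

A real deployment would also have to decide how to group IP ranges and where to keep the counters, which this sketch leaves out.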
I'm glad those restrictions aren't in place. They wouldn't stop any government agency from scraping all posts, as it could just use many different servers, but they would make it very difficult to make user contributions (such as Vod's BPIP.org or my Trust/Merit data).
The forum currently allows on average 1 page download per second, and that already means I have to set a delay of several seconds to prevent different scrapers from interfering with each other.
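For anyone wondering what that kind of delay looks like in practice, this is a rough sketch of a client-side throttle that spaces requests at least a fixed interval apart. The `Throttle` class and the 3-second interval are made up for illustration; it is not the code used for any of the scrapers mentioned here.

```python
import time


class Throttle:
    """Hypothetical helper that keeps calls at least min_interval seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._next_allowed = clock()

    def wait(self):
        """Block until the next request may be sent; return the time slept."""
        now = self.clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self.sleep(delay)
        # Schedule the following request one interval after this one.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay


if __name__ == "__main__":
    throttle = Throttle(min_interval=3.0)  # assumed value, well under 1 page/second
    for page in range(3):
        throttle.wait()
        print("would fetch page", page)  # the actual download would go here
```

Note that one `Throttle` object only coordinates requests within a single process; separate scraper processes would each need a longer delay or a shared lock to stay under the limit together.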
The 1 page download per second is a standard generalized limit when no commercial relationship exists. Although this limit has been communicated by the administration, it is also what should be the assumed limit when scraping information from a website.
I believe merit data is actually published by the administration, along with trust data. This information is less intrusive than information contained in posts. If my above proposal were to be changed to 'thread page view' I understand BPIP would be entirely unaffected.
I would encourage you to review the Twitter developer terms, and the Instagram API TOS. These policy documents prohibit many of the things that are done with bitcointalk information. I would presume the 'average' bitcointalk user to care more about privacy than the 'average' Instagram or Twitter user.
From a technical perspective, there is nothing to prevent a government from collecting post information on bitcointalk. However, if bitcointalk policies explicitly disallow law enforcement from collecting information en masse via automation, then in general law enforcement will have trouble using information gained by these means as the basis for a warrant, or as admissible evidence in court.