
Topic: Raw data of bitcointalk forum - how to avoid or correctly scrap information (Read 215 times)

sr. member
Activity: 812
Merit: 270
I recently discovered some posts about the Merit system that are based on a raw data file provided by theymos:

Here you go: https://bitcointalk.org/merit.txt.xz

Similar to trust.txt.xz, it'll be updated weekly. It will show only the last 120 days of data; someone else should archive the old ones if you want them.
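
Since only the last 120 days are kept, anyone who wants the full history has to merge each weekly dump into a local archive. Here is a minimal sketch of that idea in Python. It assumes the decompressed file is tab-separated text with a header row, which I have not verified, so check the real file first. The function names `parse_dump` and `merge_into_archive` are made up for illustration.

```python
import csv
import io
import lzma


def parse_dump(xz_bytes: bytes, delimiter: str = "\t") -> list[dict[str, str]]:
    """Decompress an .xz dump and parse it as delimited text.

    Assumption: the first line is a header naming the columns.
    """
    text = lzma.decompress(xz_bytes).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def merge_into_archive(archive: dict[tuple, dict], rows: list[dict]) -> int:
    """Add rows that are not in the archive yet; return how many were new.

    Rows are keyed on their full content, so two genuinely identical
    rows would be collapsed into one. That is a limitation of this sketch.
    """
    added = 0
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in archive:
            archive[key] = row
            added += 1
    return added
```

Running `merge_into_archive` on each new weekly download would keep the older rows around after they drop out of the 120-day window.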

Then there is another one from LoyceV that points to a username/ID mapping (on pastebin), but I don't know who provided it, how it was produced, or when it was last updated.

For some time I have been thinking about retrieving information from the forum to build some stats, but that would involve a lot of page scraping.

So my first question is: is there a list of the raw data available for use? I have heard about the trust data (https://bitcointalk.org/trust.txt.xz), but what I am really looking for relates to the forum's structure: thread parent/children, each message's parent thread, message author, user ID/name mapping, etc.
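
To make the request more concrete, the relations I have in mind could be sketched roughly like this. Every class and field name here is hypothetical. It is not the forum's actual schema, only the shape of the data I would like to end up with.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    user_id: int
    name: str


@dataclass(frozen=True)
class Thread:
    thread_id: int
    title: str
    parent_id: Optional[int] = None  # hypothetical: id of the parent, if any


@dataclass(frozen=True)
class Message:
    message_id: int
    thread_id: int  # the thread this message belongs to
    author_id: int  # refers to User.user_id


def index_messages_by_thread(messages):
    """Group messages by the thread they belong to."""
    by_thread = defaultdict(list)
    for message in messages:
        by_thread[message.thread_id].append(message)
    return dict(by_thread)


def user_name_map(users):
    """Build the user id -> name mapping."""
    return {user.user_id: user.name for user in users}
```

With those three record types, stats such as posts per user or posts per thread reduce to simple lookups and counts.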

Second question: if this kind of data is not available, what are the policies on scraping the forum?

EDIT: I just found theymos's thread about the new data dumps here: https://bitcointalksearch.org/topic/raw-data-of-bitcointalk-forum-how-to-avoid-or-correctly-scrap-information-3151741
So I locked the thread.