Topic: [ANN] All bitcointalk users table project (Read 359 times)

copper member
Activity: 1554
Merit: 489
Stop the war!
February 01, 2020, 11:47:31 AM
#13
Ready to use! All profiles are parsed. Added filter by banned users and some stats



copper member
Activity: 1554
Merit: 489
Stop the war!
January 17, 2020, 08:07:40 AM
#12
I will probably try parsing from multiple IP addresses.
legendary
Activity: 2338
Merit: 10802
There are lies, damned lies and statistics. MTwain
January 17, 2020, 07:59:44 AM
#11
<…>
Not sure how he did it either (I think you can achieve it with multiple VMs, but I have not done it myself).
I assume you’ve set your scraper script to intervals no shorter than 1 second between queries.
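The one-second spacing mentioned above could be enforced with something like the following Python sketch. This is only an illustration, not the script used for this project: `throttled_fetch` and the `fetch` callable are hypothetical names, and the clock and sleep functions are parameters only so the behaviour can be checked without real waiting.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def throttled_fetch(
    ids: Iterable[int],
    fetch: Callable[[int], str],
    min_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[int, str]]:
    """Call fetch(uid) for each id, keeping at least min_interval
    seconds between the starts of two consecutive requests."""
    last_start = None
    for uid in ids:
        if last_start is not None:
            # Sleep only for the part of the interval not yet elapsed.
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        yield uid, fetch(uid)
```

Here `fetch` stands for whatever function downloads a single profile page. Note that at one request per second, 3,000,000 profiles would need at least about 833 hours.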
copper member
Activity: 1554
Merit: 489
Stop the war!
January 17, 2020, 07:32:36 AM
#10
When I make a search with the reg. date set to the range 1 January 2014 - 17 January 2020, I get just an “error”. I was trying to find myself. Is this because you haven’t parsed users from that date and forward?

Yes. The parser is still running right now, and it has only parsed users up to April 2013.
legendary
Activity: 2758
Merit: 6830
January 17, 2020, 07:28:44 AM
#9
When I make a search with the reg. date set to the range 1 January 2014 - 17 January 2020, I get just an “error”. I was trying to find myself. Is this because you haven’t parsed users from that date and forward?
copper member
Activity: 1554
Merit: 489
Stop the war!
January 17, 2020, 07:22:40 AM
#8
If I recall correctly, the last full profile DB published by a forum member was @piggy’s Open scraped data of all the users - SQL Lite DB - 2.481.270 users.

That was published over a year ago now, and at the time it took him just over 5 days to obtain a full DB dump (different IPs running parallel processes). It seemed to take up quite a lot of personal time, and was thus done only a couple of times. The good thing about getting the data in those five days is that the dataset will be more internally consistent in its time-dependent values than one collected over a longer period (such as a month, which is what it would probably take me too).

I don’t understand how he managed to circumvent the CloudFlare defense.

I am parsing with one process and one IP address, with no parallel requests. But even at such a low speed, my bot was banned twice yesterday by CloudFlare. It took a long time to solve the problem, and I am not sure I have solved it completely.
legendary
Activity: 2338
Merit: 10802
There are lies, damned lies and statistics. MTwain
January 17, 2020, 07:05:33 AM
#7
If I recall correctly, the last full profile DB published by a forum member was @piggy’s Open scraped data of all the users - SQL Lite DB - 2.481.270 users.

That was published over a year ago now, and at the time it took him just over 5 days to obtain a full DB dump (different IPs running parallel processes). It seemed to take up quite a lot of personal time, and was thus done only a couple of times. The good thing about getting the data in those five days is that the dataset will be more internally consistent in its time-dependent values than one collected over a longer period (such as a month, which is what it would probably take me too).
copper member
Activity: 1554
Merit: 489
Stop the war!
January 16, 2020, 07:43:15 AM
#6
Thank you. It seems that the forum does not like links to free .tk domains. forbtt[dot]tk
Yeah, got that. Great job on the site, and I hope you will start adding more features and stuff like trust, flags, etc. Post, Activity, Last Active, Merit, Local time, Website and BTC address are dynamic columns. How are you going to keep them updated? How often are you checking each user?


I started it only 16 hours ago and it has parsed 55,000 users so far. So I think parsing all 3,000,000 users will take about 870 hours (roughly one month).
After that I will parse again from the beginning, skipping deleted users. It should be faster.
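The estimate above is a straight extrapolation from the rate seen so far. A small Python sketch of the arithmetic, assuming the parsing speed stays constant (`estimate_total_hours` is just an illustrative name):

```python
def estimate_total_hours(parsed: int, elapsed_hours: float, total: int) -> float:
    """Extrapolate the total run time from the rate observed so far."""
    rate_per_hour = parsed / elapsed_hours  # 55,000 / 16 = 3437.5 users per hour
    return total / rate_per_hour


hours = estimate_total_hours(55_000, 16, 3_000_000)
print(f"{hours:.0f} hours, about {hours / 24:.0f} days")
# prints "873 hours, about 36 days"
```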
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
January 16, 2020, 07:42:54 AM
#5
It seems that the forum does not like links to free .tk domains. forbtt[dot]tk
That's spam protection for Newbies. Dot.tk links are fine:

You really need to PM theymos and request to change the username to something else. From your trust page I can see it has already got attention, and I think all those users are correct in their inputs.
This is a troll account (some say it's owned by banned user korner), created right after theymos DefaultTrust.
legendary
Activity: 2464
Merit: 3878
Hire Bitcointalk Camp. Manager @ r7promotions.com
January 16, 2020, 07:36:28 AM
#4
Thank you. It seems that the forum does not like links to free .tk domains. forbtt[dot]tk
Yeah, got that. Great job on the site, and I hope you will start adding more features and stuff like trust, flags, etc. Post, Activity, Last Active, Merit, Local time, Website and BTC address are dynamic columns. How are you going to keep them updated? How often are you checking each user?

By the way, nice username you have there :-P

Edit:
You really need to PM theymos and request to change the username to something else. From your trust page I can see it has already got attention, and I think all those users are correct in their inputs. Your creation seems exciting (on the site) and I really hope you will respect the forum's values too.
PS: I left you 5 merits as some encouragement for your work.
copper member
Activity: 1554
Merit: 489
Stop the war!
January 16, 2020, 07:26:55 AM
#3
Whatever the link was in your post, it has been removed. As for the project, we have LoyceV, DdmrDdmr and some other users who scrape forum data, and they also have the data you have scraped so far. I hope you are aware of the http://loyce.club site.

Anyway, let's see what you bring up with the data you are collecting. Good luck.

Thank you. It seems that the forum does not like links to free .tk domains. forbtt[dot]tk
legendary
Activity: 2464
Merit: 3878
Hire Bitcointalk Camp. Manager @ r7promotions.com
January 16, 2020, 07:23:01 AM
#2
Whatever the link was in your post, it has been removed. As for the project, we have LoyceV, DdmrDdmr and some other users who scrape forum data, and they also have the data you have scraped so far. I hope you are aware of the http://loyce.club site.

Anyway, let's see what you bring up with the data you are collecting. Good luck.

Edit: Just found the IP. First impression is good. I like that I can filter by number of posts and merits, and also the sorting option for the table data.

I would be grateful for the feedback and suggestions.
Give users a way to choose how many rows they see on one page. Right now everything is on one page, and once you have a huge amount of data I am sure the page will take ages to load.
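The paging suggested above could look roughly like this in Python. It is a sketch only: the site's actual backend is not shown anywhere in this thread, and `paginate`, `page_count` and the default of 100 rows per page are assumptions made for illustration.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def paginate(rows: Sequence[T], page: int, per_page: int = 100) -> List[T]:
    """Return the rows for a 1-based page number."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be at least 1")
    start = (page - 1) * per_page
    # Slicing past the end simply yields a shorter (or empty) last page.
    return list(rows[start:start + per_page])


def page_count(total_rows: int, per_page: int = 100) -> int:
    """Number of pages needed to show total_rows (ceiling division)."""
    return -(-total_rows // per_page)
```

With a database behind the table, the same idea is usually expressed as a LIMIT/OFFSET query so that only one page of rows is ever loaded.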
copper member
Activity: 1554
Merit: 489
Stop the war!
January 16, 2020, 07:19:24 AM
#1
I am working on a script that will collect all bitcointalk user profiles. It is a simple parser with a simple frontend. The project is temporarily hosted on this domain: http://forbtt.tk

For now the data is fully parsed and ready to use. You can test it by sorting users by id, name, posts, etc. In addition, you can use filters like Minimum posts and Minimum merits. For now, a request is limited to 3000 users per page.

I would be grateful for the feedback and suggestions.



