Open scraped data of all the users - SQL Lite DB - 2.481.270 users

Piggy

hero member

Activity: 784

Merit: 1416

Quote from: TheBeardedBaby on May 14, 2019, 01:19:17 AM

~~@Piggy, when I try to download any of the files I get an error.~~

Quote

Access to doc-08-64-docs.googleusercontent.com was denied You don't have authorization to view this page.

~~Can you share it with me please? I'll give you an email address in PM.~~

I fixed it: you had already given a download link and I hadn't seen it ...

Yes, I was just checking the links and everything is in order; they are still active and shared.

TheBeardedBaby

legendary

Activity: 2240

Merit: 3150

~~@Piggy, when I try to download any of the files I get an error.~~

Quote

Access to doc-08-64-docs.googleusercontent.com was denied You don't have authorization to view this page.

~~Can you share it with me please? I'll give you an email address in PM.~~

I fixed it: you had already given a download link and I hadn't seen it ...

Piggy

hero member

Activity: 784

Merit: 1416

Quote from: cryptovigi on April 25, 2019, 05:15:31 PM

April is coming to an end, so let me ask again: is there a chance you will release the current user database before the holidays?

As I mentioned earlier, I would love to dig into it a bit ;)

I'll try to organize it as soon as I have some time to set up the whole thing. Unfortunately it is not something I can just start: it needs some manual setup and periodic checking while it runs to make sure everything works properly.

cryptovigi

hero member

Activity: 714

Merit: 611

Quote from: Piggy on March 12, 2019, 02:08:54 AM

Quote from: cryptovigi on March 11, 2019, 08:29:25 PM

hey @Piggy, are you planning to scrape profiles data once again?
it would be great to look at fresh data and watch what has changed over the last few months...

I was thinking of running it again, probably at the beginning of April; there may indeed have been a lot of changes.

April is coming to an end, so let me ask again: is there a chance you will release the current user database before the holidays?

As I mentioned earlier, I would love to dig into it a bit ;)

Piggy

hero member

Activity: 784

Merit: 1416

Quote from: cryptovigi on March 11, 2019, 08:29:25 PM

hey @Piggy, are you planning to scrape profiles data once again?
it would be great to look at fresh data and watch what has changed over the last few months...

I was thinking of running it again, probably at the beginning of April; there may indeed have been a lot of changes.

cryptovigi

hero member

Activity: 714

Merit: 611

hey @Piggy, are you planning to scrape profiles data once again?
it would be great to look at fresh data and watch what has changed over the last few months...

Coin-1

legendary

Activity: 2674

Merit: 2334

Quote from: Piggy on February 12, 2019, 05:14:46 AM

Quote from: Coin-1 on February 11, 2019, 11:21:06 PM

Thanks for sharing the standalone database you scraped.

I used your file "btctalk_full.db" to create the full list of red tagged members.

I simply executed the following SQL query:

Code:

select UserId from UserData where SUBSTR(Trust, -1, 1) = '!'

sqlite3 has listed about 15000 users. It was easy.

Keep in mind that the list may be slightly different now because of the recent DT changes. That is a snapshot made in December.

Yes, I understand. The trust ratings of some users really have changed. I created two more lists of red tagged members, called "_included" and "_excluded", which I manage manually.

Will you scrape the data again in the future?

Quote from: DdmrDdmr on February 13, 2019, 05:01:02 AM

If I’m correct, the scraped profiles show Trust with a DT depth 2 view (which will be different from the view of those that have a Custom Trust list).

I guess that Piggy uses his wonderful @mention notification bot to scrape data. This auxiliary account probably only has DefaultTrust in its list of trusted users.

DdmrDdmr

legendary

Activity: 2338

Merit: 10802

There are lies, damned lies and statistics. MTwain

Quote from: Coin-1 on February 11, 2019, 11:21:06 PM

<...>

I took a look at that with @Piggy’s prior full DB profile scrape (November 2018) and found 14.969 negatively trusted profiles at the time. I also broke it down by rank and a couple of other things (see Analysis - DT Depth 2 - Profile Distribution).

If I’m correct, the scraped profiles show Trust with a DT depth 2 view (which will be different from the view of those that have a Custom Trust list).

Piggy

hero member

Activity: 784

Merit: 1416

Quote from: Coin-1 on February 11, 2019, 11:21:06 PM

Thanks for sharing the standalone database you scraped.

I used your file "btctalk_full.db" to create the full list of red tagged members.

I simply executed the following SQL query:

Code:

select UserId from UserData where SUBSTR(Trust, -1, 1) = '!'

sqlite3 has listed about 15000 users. It was easy.

Keep in mind that the list may be slightly different now because of the recent DT changes. That is a snapshot made in December.

Coin-1

legendary

Activity: 2674

Merit: 2334

Thanks for sharing the standalone database you scraped.

I used your file "btctalk_full.db" to create the full list of red tagged members.

I simply executed the following SQL query:

Code:

select UserId from UserData where SUBSTR(Trust, -1, 1) = '!'

sqlite3 has listed about 15000 users. It was easy.
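For anyone who prefers to script this rather than type it into the sqlite3 shell, here is a minimal sketch in Python. It runs the query from the post with the string literal written in single quotes (SQLite can read a double-quoted "!" as an identifier). The demo rows and the exact wording of the Trust text are made up for illustration; only the UserData table and the UserId and Trust columns come from the post.

```python
import sqlite3

# The query from the post, with the string literal in single quotes.
# ORDER BY is added only to make the output order predictable.
RED_TAG_QUERY = (
    "SELECT UserId FROM UserData "
    "WHERE substr(Trust, -1, 1) = '!' "
    "ORDER BY UserId"
)


def red_tagged_user_ids(conn):
    """Return the UserId of every row whose Trust text ends with '!'."""
    return [row[0] for row in conn.execute(RED_TAG_QUERY)]


def build_demo_db():
    """Create a tiny in-memory stand-in for btctalk_full.db (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE UserData (UserId INTEGER PRIMARY KEY, Trust TEXT)")
    conn.executemany(
        "INSERT INTO UserData (UserId, Trust) VALUES (?, ?)",
        [
            (1, "0: -0 / +0"),
            (2, "-2: -1 / +0 Warning: Trade with extreme caution!"),
            (3, None),  # a profile with no Trust text is never matched
        ],
    )
    return conn


demo_ids = red_tagged_user_ids(build_demo_db())
```

Against the real file, the same function should work with `sqlite3.connect("btctalk_full.db")` in place of the demo database.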

DdmrDdmr

legendary

Activity: 2338

Merit: 10802

There are lies, damned lies and statistics. MTwain

Quote from: Piggy on December 07, 2018, 04:10:06 AM

<…>

Ok, thanks @Piggy. I managed to download the raw files, but I’m having some trouble importing them into my environment. The issue is on my side (import field displacement; since I’m doing it manually, I may have changed something compared to last time), so I just have to keep at it until I resolve it. Anyway, I’ll let you know when I do (I need to find a block of uninterrupted time, it’s the weekend now, and I’m also gathering data for the Dashboard).

Edit: Ok, got the import working and ended up with the same 2.481.270 records as you indicated in the updated OP.

Note: Really fast data retrieval on your part! (under three days).

Edit 2: I think I found 651 profiles that were in the first extract but are not part of the second extract. I've checked a few cases against the raw files and did not find them there (e.g. 553457 biteditor; 406094 0099ff; 715282 Pleasersvxuq; 812886 Sumprnma). The issue is barely noticeable from a statistical point of view, since most are Brand New forum members anyway.
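A check like this can be sketched as an anti-join between the two extracts. The snippet below is only an illustration, not the process actually used: the table names FirstExtract and SecondExtract, the Name column and the demo rows are all hypothetical, and it assumes each extract has been loaded into its own table keyed by UserId.

```python
import sqlite3

# Rows of the first extract that have no matching UserId in the second one.
# Table and column names are hypothetical placeholders.
MISSING_QUERY = """
SELECT old.UserId, old.Name
FROM FirstExtract AS old
LEFT JOIN SecondExtract AS new ON new.UserId = old.UserId
WHERE new.UserId IS NULL
ORDER BY old.UserId
"""


def profiles_missing_from_second(conn):
    """Return (UserId, Name) pairs present in FirstExtract but not in SecondExtract."""
    return conn.execute(MISSING_QUERY).fetchall()


def build_demo_db():
    """Two tiny made-up extracts in one in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE FirstExtract (UserId INTEGER PRIMARY KEY, Name TEXT)")
    conn.execute("CREATE TABLE SecondExtract (UserId INTEGER PRIMARY KEY, Name TEXT)")
    conn.executemany(
        "INSERT INTO FirstExtract VALUES (?, ?)",
        [(1, "alice"), (2, "bob"), (3, "carol")],
    )
    conn.executemany(
        "INSERT INTO SecondExtract VALUES (?, ?)",
        [(1, "alice"), (3, "carol"), (4, "dave")],
    )
    return conn


missing = profiles_missing_from_second(build_demo_db())
```

Swapping the two table names in the query gives the opposite view, i.e. the profiles that are new in the second extract.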

Piggy

hero member

Activity: 784

Merit: 1416

Quote from: DdmrDdmr on December 05, 2018, 10:28:49 AM

Quote from: Piggy on December 05, 2018, 08:53:34 AM

<...>

Thank you @Piggy. It will be interesting to compare it with the previous dataset. If you can upload the raw files upon completion (like last time), all the better. I’m not sure how quickly I can get to it, but I definitely want to see what insights can be derived.

Scraping has finished and the data can be found here: https://drive.google.com/open?id=1mGEk6V3c_D-IhSYbuJvPrGGVEWLb0V8L

There is both the raw data and the SQLite DB. There are now 2.481.270 users in it, about 44.206 of them new since the last run.

DdmrDdmr

legendary

Activity: 2338

Merit: 10802

There are lies, damned lies and statistics. MTwain

Quote from: Piggy on December 05, 2018, 08:53:34 AM

<...>

Thank you @Piggy. It will be interesting to compare it with the previous dataset. If you can upload the raw files upon completion (like last time), all the better. I’m not sure how quickly I can get to it, but I definitely want to see what insights can be derived.

Piggy

hero member

Activity: 784

Merit: 1416

I have started a new scraping run. If there are no problems, the new data should be available within a few days.

LoyceV

legendary

Activity: 3290

Merit: 16489

Thick-Skinned Gang Leader and Golden Feather 2021

Quote from: cryptovigi on November 15, 2018, 09:13:49 AM

One important thing: I understand that this dataset was prepared and shared for statistical and research purposes. In that case you should consider deleting personal data that is not necessary for these purposes, such as e-mail addresses, messengers or wallets. Although they are publicly available, they can also be used in an undesirable manner, so it may be better not to give away full sets of them...

I've thought about this too, but it's the user's choice to make their data public. Some even use YOPmail, but only a few of them have ever posted. Those accounts are just waiting to be compromised.

mazdafunsun

full member

Activity: 490

Merit: 123

Nice job, something I was thinking of doing but never got around to doing.

Quote from: DdmrDdmr on November 09, 2018, 06:42:18 AM

P.S. Great scraping speed there (I figure at least 5 processes running 24/7)

Last time I used 10+ processes, and that run also collected the time of the last post, which slows the process down.

cryptovigi

hero member

Activity: 714

Merit: 611

Great dataset!!!

It would be great to have a similar one from the past, for example from the 24th of January 2018 - many interesting comparisons could be made then...
but even without it, it's a huge amount of material for research... thanks for sharing

One important thing: I understand that this dataset was prepared and shared for statistical and research purposes. In that case you should consider deleting personal data that is not necessary for these purposes, such as e-mail addresses, messengers or wallets. Although they are publicly available, they can also be used in an undesirable manner, so it may be better not to give away full sets of them...
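If someone wants a trimmed copy for research, one possible approach is to copy the user table without the contact columns. This is only a sketch: the column names Email, Skype and BitcoinAddress and the table name UserDataPublic are assumptions, since the actual schema is not listed in this thread.

```python
import sqlite3

# Assumed column names for contact details; the real schema may differ.
PERSONAL_COLUMNS = {"Email", "Skype", "BitcoinAddress"}


def copy_without_columns(conn, source, target, excluded):
    """Copy table `source` into a new table `target`, leaving out `excluded` columns."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{source}")')]
    kept = [name for name in columns if name not in excluded]
    column_list = ", ".join(f'"{name}"' for name in kept)
    conn.execute(f'CREATE TABLE "{target}" AS SELECT {column_list} FROM "{source}"')
    return kept


# Demo on a made-up one-row table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UserData (UserId INTEGER, Name TEXT, Email TEXT)")
conn.execute("INSERT INTO UserData VALUES (1, 'alice', 'alice@example.com')")
kept_columns = copy_without_columns(conn, "UserData", "UserDataPublic", PERSONAL_COLUMNS)
public_rows = conn.execute("SELECT * FROM UserDataPublic").fetchall()
```

The original table is left untouched, so the full data stays available locally while only the trimmed table gets shared.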

DdmrDdmr

legendary

Activity: 2338

Merit: 10802

There are lies, damned lies and statistics. MTwain

Quote from: Piggy on November 12, 2018, 11:15:06 AM

<...>Yes this seems a good idea to get something sensible out of it. I'll make another run in the begin of December.

Ok, thanks, looking forward to it. With the current dataset, I’ve seen a couple of things that may be worth summing up and posting in the coming days. I just have to be able to get enough free time for it.

Note: Got the raw files into a table with the same cardinality as your original full user table, so the cleansing process I’ve applied is the same. I’ve also contrasted it to @mazdafunsun’s topics, and, as expected, the distribution by rank is very much aligned.

Piggy

hero member

Activity: 784

Merit: 1416

Quote from: DdmrDdmr on November 12, 2018, 07:43:46 AM

@Piggy, it could be interesting to run another data extraction, let’s say a month or so after your current dataset, so that both datasets can be used to see whether any meaningful insights can be derived.

With one dataset, the core derivable data will be similar to part of what @mazdafunsun posted in his OPs, but two datasets allow the time dimension to play a role. I can see a couple of additional things I can derive with just one dataset, but two, with a one month interval or so in between, would be smashing if you have the time.

Yes, this seems like a good idea to get something sensible out of it. I'll make another run at the beginning of December.

DdmrDdmr

legendary

Activity: 2338

Merit: 10802

There are lies, damned lies and statistics. MTwain

@Piggy, it could be interesting to run another data extraction, let’s say a month or so after your current dataset, so that both datasets can be used to see whether any meaningful insights can be derived.

With one dataset, the core derivable data will be similar to part of what @mazdafunsun posted in his OPs, but two datasets allow the time dimension to play a role. I can see a couple of additional things I can derive with just one dataset, but two, with a one month interval or so in between, would be smashing if you have the time.
