
Topic: Readable merit dataset for your own evaluations (Read 481 times)

member
Activity: 143
Merit: 90
February 02, 2020, 06:20:15 AM
#12
Data sets were updated. New file added with January 2020 merit data.

Full history file now contains merit records from 24th January 2018 to 31st January 2020.

Find all files at https://github.com/ptrk01/bitcointalkorg_meritdata
member
Activity: 143
Merit: 90
Data sets were updated. New file added with December 2019 merit data.

Full history file now contains merit records from 24th January 2018 to 31st December 2019.

Find all files at https://github.com/ptrk01/bitcointalkorg_meritdata
member
Activity: 143
Merit: 90
Data sets were updated.

Full history file now contains merit records from 24th January 2018 to 30th November 2019.

Also, there is a file which contains merit records from November 2019 only.

Find all files at https://github.com/ptrk01/bitcointalkorg_meritdata
member
Activity: 143
Merit: 90
edit: as an example, I believe the following line in your CSV file contains an incorrect name:
Quote
2018-09-17;07:02:12;1;;nkampala;BITSSA

Thanks for letting me know about the issue.

I took a closer look. This is the corresponding raw data set.
Quote
1537167732   1   5025631.msg45813886   2093373   1053767

The double ; appears when the thread cannot be found, most likely because the thread or post was deleted. In this case the missing thread is https://bitcointalk.org/index.php?topic=5025631.msg45813886.0.

The second issue in the data record is the receiver's username. It is recorded as BITSSA, but the actual username is BITSSA : BITCOIN EXCHANGE. I will adjust my script so that if there is a colon in the username (which is a very rare case), the complete username is recorded.
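
For anyone who wants to check a record against the raw data themselves, here is a minimal sketch of decoding the raw line quoted above. It assumes the raw columns are, in order: Unix timestamp, merit amount, topic and message reference, sender UID, receiver UID, and that the readable date and time are in UTC. The function name and the field names are made up for illustration; this is not the script used to build the dataset.

```python
from datetime import datetime, timezone


def decode_raw_record(line):
    """Split one whitespace-separated raw merit record into named fields."""
    # Assumed column order: timestamp, amount, post reference, sender, receiver.
    timestamp, amount, post_ref, uid_from, uid_to = line.split()
    # "5025631.msg45813886" -> topic ID "5025631", message ID "45813886"
    topic_id, _, msg_id = post_ref.partition(".msg")
    when = datetime.fromtimestamp(int(timestamp), tz=timezone.utc)
    return {
        "date": when.strftime("%Y-%m-%d"),
        "time": when.strftime("%H:%M:%S"),
        "amount": int(amount),
        "topic_id": topic_id,
        "msg_id": msg_id,
        "uid_from": uid_from,
        "uid_to": uid_to,
    }


record = decode_raw_record("1537167732   1   5025631.msg45813886   2093373   1053767")
print(record["date"], record["time"])  # 2018-09-17 07:02:12
```

The decoded date and time match the first two fields of the CSV line quoted above.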
copper member
Activity: 1652
Merit: 1901
Amazon Prime Member #7
Full History (24th January 2018 to 31st October 2019)
250887 merit records
Github
I noticed that some of the data is comma-delimited, while other data is semicolon-delimited. This makes it more difficult to analyze. It is also best practice to use ID numbers instead of user-generated names in CSV files, because thread names or usernames may contain the delimiter.

You can map names into your dataset after you analyze it for display purposes. This also helps remove any biases you may have with regards to what you are trying to prove.  
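
To illustrate the suggestion above, here is a rough sketch in Python: the analysis runs on UIDs only, and usernames are mapped in at the end for display. The UID-to-name pairs for nkampala and BITSSA : BITCOIN EXCHANGE are assumed from the record discussed in this thread, and the third user and the second transaction are made up to show a name that contains the delimiter. Python's standard csv module quotes such a field when writing, so it can be read back intact.

```python
import csv
import io

# Assumed UID -> username lookup, kept separate from the transaction data.
# The third name is made up to show a username that contains the delimiter.
usernames = {
    "2093373": "nkampala",
    "1053767": "BITSSA : BITCOIN EXCHANGE",
    "999": "semi;colon",
}

# Transactions keyed by UID only: (sender UID, receiver UID, merit amount).
# The second transaction is invented for the example.
transactions = [("2093373", "1053767", 1), ("2093373", "999", 2)]

# Analyse on IDs: total merit received per UID.
received = {}
for _, uid_to, amount in transactions:
    received[uid_to] = received.get(uid_to, 0) + amount

# Map names in only for display. csv.writer quotes any field that contains
# the delimiter, so the made-up name survives a round trip.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=";")
writer.writerow(["uid", "username", "merit_received"])
for uid, total in sorted(received.items()):
    writer.writerow([uid, usernames[uid], total])

buffer.seek(0)
rows = list(csv.reader(buffer, delimiter=";"))
for row in rows:
    print(row)
```

Every row read back has exactly three fields, including the one whose username contains a semicolon.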

I put the entire merit dataset into a comma delimited CSV file with a header row and uploaded it here.

edit: as an example, I believe the following line in your CSV file contains an incorrect name:
Quote
2018-09-17;07:02:12;1;;nkampala;BITSSA

I haven't looked, but I suspect there are also issues with the transactions involving the following UIDs:
['1053767', '1187433', '2307758', '2471646', '2471831']
member
Activity: 143
Merit: 90
Data sets were updated.

Full history file now contains merit records from 24th January 2018 to 31st October 2019.

Also, there is a file which contains merit records from October 2019 only.
member
Activity: 143
Merit: 90
I sent you my last merit. Could you potentially do this for the entire history of the merit system?

I uploaded the full history (from 25th January 2018 to 27th September 2019).
legendary
Activity: 2338
Merit: 10802
There are lies, damned lies and statistics. MTwain
<...>
Currently, I publish all the sMerit TXs here: https://fusiontables.google.com/DataSource?docid=1wM2Op6_ol8_0iP0sDEemIGr9weKvIeLPvKsKMpFy#rows:id=1. The data is downloadable (File -> Download) as a CSV, and is updated every Friday. The only issue is that I may not continue feeding that tool from December onwards, since it is going to be discontinued.

Prior to publishing the TXs there, I upload them to internal Google Sheets such as these:
https://docs.google.com/spreadsheets/d/1GTngeRJlWSEg1bFY-z0S6nqZxwSGUsUTeaYREnOQieI/edit?usp=sharing (Part I)
https://docs.google.com/spreadsheets/d/1V7kW7q-dHIK-dJj7byUbBE1PLyVjmG5lQb_wJHxgUIU/edit?usp=sharing (Part II)

The above are Google Sheets, and can be exported to Excel and CSV, amongst others. The reason for splitting the file is to make it easier to load into the Fusion Tables structure; otherwise I would just feed a single Google spreadsheet.

Data is derived from the cumulative merit.txt files that the forum published every Friday. It is not a simple merge, since consecutive files have TXs in common: approximately 113 of the 120 days covered by each merit.txt overlap. All the data is kept in a single database, applying each week's cumulative file beforehand.

That is not all the information, however, since the forum structure tied to each message is obtained separately.
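
As an illustration of why the weekly files cannot simply be concatenated, here is a minimal sketch of merging overlapping files while keeping each record once. It is not the database process described above. It assumes that two records with identical fields in different files are the same transaction, which would also collapse genuinely repeated identical transactions. The function name is hypothetical, and all sample records except the first are made up.

```python
def merge_weekly_files(weekly_files):
    """Merge lists of raw record lines, keeping each distinct record once."""
    seen = set()
    merged = []
    for lines in weekly_files:
        for line in lines:
            # Compare on the split fields so spacing differences do not matter.
            key = tuple(line.split())
            if key and key not in seen:
                seen.add(key)
                merged.append(line)
    return merged


# Two overlapping "weekly" files; the middle record appears in both.
week_1 = [
    "1537167732\t1\t5025631.msg45813886\t2093373\t1053767",
    "1537170000\t2\t100.msg200\t11\t22",  # made-up record
]
week_2 = [
    "1537170000\t2\t100.msg200\t11\t22",  # same made-up record again
    "1537800000\t1\t300.msg400\t33\t44",  # made-up record
]

merged = merge_weekly_files([week_1, week_2])
print(len(merged))  # 3
```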
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Sure, where do I get the entire history?
http://loyce.club/Merit/merit.all.txt (updated weekly, usually at the end of Friday).
member
Activity: 143
Merit: 90
I sent you my last merit. Could you potentially do this for the entire history of the merit system?

Thank you!
Sure, where do I get the entire history? I only found the data set of the last four months.
legendary
Activity: 3010
Merit: 8114
I sent you my last merit. Could you potentially do this for the entire history of the merit system? I'd like to compile my own database in a single Excel file. I know DdmrDdmr and LoyceV have done similar things but I do appreciate being able to import the data directly into Excel.
member
Activity: 143
Merit: 90
I have read that some would like to perform merit evaluations themselves, but the data provided by theymos is not very readable. That's why I wrote a script that provides the same data with readable date, time, category path, thread name and usernames (from & to).

I make the data freely available for everyone on Github. Have fun with the data analysis.

The data was created automatically, so there is no guarantee the data is consistent. It is based on raw data provided by theymos and LoyceV.



Full History (24th January 2018 to 31st January 2020)
299935 merit records
Github

Subset History (23rd May 2019 to 20th September 2019)
41948 merit records
Github

October 2019 History
11912 merit records
Github

November 2019 History
18228 merit records
Github

December 2019 History
14734 merit records
Github

January 2020 History
16086 merit records
Github