Topic: Additional data dumps? - page 2. (Read 947 times)

hero member
Activity: 536
Merit: 513
March 17, 2018, 09:37:35 AM
#8
It seems to me that some local boards do not receive enough sMerit, and it would be good to verify that directly from a data dump, which would help in designing an appropriate distribution of merit sources.  It would be useful to have

post ID, topic ID, board ID, merit

and check how active each local board is and whether enough sMerit is being distributed there.  Of course spam and low-quality posts will be counted too, but I assume they are roughly proportional to the total number of posts.
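
To illustrate the kind of check I mean, here is a minimal sketch. It assumes a dump with exactly the four columns listed above, already parsed into tuples; the function name and the sample rows are made up.

```python
from collections import defaultdict

def merit_per_board(rows):
    """Sum post count and merit per board.

    rows: iterable of (post_id, topic_id, board_id, merit) tuples,
    i.e. the hypothetical dump layout suggested in this post.
    """
    totals = defaultdict(lambda: {"posts": 0, "merit": 0})
    for post_id, topic_id, board_id, merit in rows:
        totals[board_id]["posts"] += 1
        totals[board_id]["merit"] += merit
    # Merit per post is only a rough indicator of how well a board is covered.
    return {
        board: {**t, "merit_per_post": t["merit"] / t["posts"]}
        for board, t in totals.items()
    }

# Made-up sample: board 10 has two posts with 2 merit in total, board 20 has none.
sample = [
    (101, 1, 10, 0),
    (102, 1, 10, 2),
    (103, 2, 20, 0),
    (104, 2, 20, 0),
]
report = merit_per_board(sample)
```

Boards with many posts but a merit_per_post close to zero would be the first candidates for an extra merit source.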
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
March 17, 2018, 04:01:20 AM
#7
UID -> name, merit, potential activity, posts
I can think of a few:
1. Add "Activity" (not just "potential")
2. Add a banned status to this list (ignore temporary bans)
3. Add either "merit earned" or "merit received for free at introduction"

Side note: there are more than 200 usernames containing a comma, which will make processing a CSV difficult. Can you make this a file with just UID and name?
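
For what it's worth, the comma problem only bites when the file is split naively. A minimal sketch, assuming the dump were written with standard CSV quoting (the usernames and numbers below are made up):

```python
import csv
import io

rows = [(1, "alice", 10), (2, "bob, the builder", 3)]  # made-up sample data

# Writing: csv.writer quotes any field that contains the delimiter.
buf = io.StringIO()
csv.writer(buf).writerows(rows)

# Naive splitting breaks the second record on the embedded comma...
naive = buf.getvalue().splitlines()[1].split(",")

# ...while csv.reader honours the quoting and recovers three fields.
buf.seek(0)
parsed = list(csv.reader(buf))
```

If the dump is not quoted, a tab-separated file or a separate UID/name file as suggested above would avoid the issue altogether.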
jr. member
Activity: 40
Merit: 5
PM me to buy my sig space.
March 17, 2018, 12:17:39 AM
#6
Just FYI,

You can already gauge how much sMerit someone has, simply because the system is transparent. So that's effectively a hidden field in the data dump.

You can calculate how much they've received versus how much they've sent... and from there you'll know how much sMerit they have left :/
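
A rough sketch of that calculation. Everything here is an assumption for illustration: the (from_uid, to_uid, amount) tuple layout, the function name, the rate of 0.5 sMerit generated per merit received, and the optional table of initially granted sMerit. It also ignores any extra allowance that merit sources get, so treat the result as an estimate.

```python
from collections import Counter

def smerit_left(transfers, initial=None, rate=0.5):
    """Estimate remaining sMerit per user.

    transfers: iterable of (from_uid, to_uid, amount) tuples (assumed layout).
    initial:   optional {uid: sMerit granted at introduction} (assumed input).
    rate:      sMerit generated per merit received (assumed to be 0.5).
    """
    received, sent = Counter(), Counter()
    for from_uid, to_uid, amount in transfers:
        sent[from_uid] += amount
        received[to_uid] += amount
    initial = initial or {}
    users = set(received) | set(sent) | set(initial)
    # Counter returns 0 for users who never sent or never received anything.
    return {u: initial.get(u, 0) + received[u] * rate - sent[u] for u in users}

# Made-up sample: user 1 starts with 5 sMerit and sends 4 to user 2,
# who then sends 1 to user 3.
left = smerit_left([(1, 2, 4), (2, 3, 1)], initial={1: 5})
```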
copper member
Activity: 2996
Merit: 2371
March 16, 2018, 11:41:21 PM
#5
I would suggest dumping the post history of individual users/accounts. This could be restricted by rank and otherwise rate limited. I think it would be difficult to recreate any meaningful mirror site with this information.

As others have mentioned, the security log would be beneficial. The mod log, not so much because of its limited information.

It would be helpful if users' outboxes (and other folders) could be downloaded, since they cannot be easily searched. Obviously, downloading this information would be restricted to users who are logged into their own account.
legendary
Activity: 1582
Merit: 1064
March 16, 2018, 12:33:12 AM
#4
Currently there are two big data dumps available which auto-update weekly, trust.txt.xz and merit.txt.xz. These auto-updating dumps are pretty easy to set up, so I was thinking that it might be a good idea to produce several more of these, perhaps in the end forming a "ghetto API". What dumps would be most useful? Some that I was thinking of were:

 UID -> name, merit, potential activity, posts
 post ID -> topic ID, time, UID
 topic ID -> board ID, first post ID
 board ID -> board name

I'm not going to dump post contents in any form, since that would both be a massive file and it'd make things very easy for those annoying phishing mirror sites.

Modlog definitely.
hero member
Activity: 908
Merit: 657
March 16, 2018, 12:11:17 AM
#3
It might be helpful to have a continuous version of the seclog without having to rely on archived pages.
legendary
Activity: 2968
Merit: 3406
Crypto Swap Exchange
March 16, 2018, 12:00:58 AM
#2
What dumps would be most useful? Some that I was thinking of were:

 UID -> name, merit, potential activity, posts
This (the rest aren't that important). I hope accounts with 0 posts/activity are excluded (to avoid a massive file full of information that's not needed).

Can we get another weekly dump that tracks positive/negative ratings (e.g. who sent each rating and who received it) and also shows ratings that have been removed? (Credit goes to Vod, based on this thread.)
administrator
Activity: 5222
Merit: 13032
March 15, 2018, 11:13:52 PM
#1
Currently there are two big data dumps available which auto-update weekly, trust.txt.xz and merit.txt.xz. These auto-updating dumps are pretty easy to set up, so I was thinking that it might be a good idea to produce several more of these, perhaps in the end forming a "ghetto API". What dumps would be most useful? Some that I was thinking of were:

 UID -> name, merit, potential activity, posts
 post ID -> topic ID, time, UID
 topic ID -> board ID, first post ID
 board ID -> board name

I'm not going to dump post contents in any form, since that would both be a massive file and it'd make things very easy for those annoying phishing mirror sites.
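
For anyone who wants to consume such dumps, here is a minimal sketch of streaming an .xz-compressed text file line by line without unpacking it to disk first. The tab separator, the read_dump helper and the demo file are assumptions for illustration; the actual column layout of each dump is not specified here.

```python
import lzma
import os
import tempfile

def read_dump(path, sep="\t"):
    """Yield each line of an .xz-compressed text dump as a list of fields."""
    with lzma.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            yield line.rstrip("\n").split(sep)

# Self-contained demo: write a tiny made-up dump, then read it back.
tmpdir = tempfile.mkdtemp()
demo_path = os.path.join(tmpdir, "demo.txt.xz")
with lzma.open(demo_path, "wt", encoding="utf-8") as fh:
    fh.write("1\talice\n2\tbob\n")

records = list(read_dump(demo_path))
```

Because read_dump is a generator, even a large weekly dump can be processed one record at a time.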