My username coincides with the eighth's position best second performing letter, and that’s about it …
I haven’t got my DB handy today, so I can neither confirm nor deny the algorithm, but since you are on my Trust list, I’m ok with trusting the criteria (and while at it, suggest that people with usernames with 'D's and 'M's and 'R's try to perform better … barely on the map!).
As I mentioned to LoyceV, you're earning merits the hard way
Continuing with the supporting data.
The median length of a merit-earning post is 549 characters
meritdb=# select percentile_disc(0.5) within group (order by length(p."ContentWithoutQuotes"))
from (select sum(m."MeritAmount"), p."PostId", p."ContentWithoutQuotes" from "Posts" p inner join "MeritTxs" m on m."PostId" = p."PostId"
where p."Content" is not null group by p."PostId", p."ContentWithoutQuotes" having sum(m."MeritAmount") >= 0) p;
percentile_disc
-----------------
549
(1 row)
For comparison, the
average length of a meritorious post is higher, probably skewed by some massive essays. Obviously we're not interested in that.
meritdb=# select avg(length(p."ContentWithoutQuotes"))
from (select sum(m."MeritAmount"), p."PostId", p."ContentWithoutQuotes" from "Posts" p inner join "MeritTxs" m on m."PostId" = p."PostId"
where p."Content" is not null group by p."PostId", p."ContentWithoutQuotes" having sum(m."MeritAmount") >= 0) p;
avg
-----------------------
1445.3326025288816902
(1 row)
top merit-earning words (after removing stop words such as pronouns and conjunctions)
This one is a bit tricky. I use some gnarly code to pre-process posts when I fetch them from Bitcointalk, which includes:
- removing quotes ("ContentWithoutQuotes" in the example above)
- removing punctuation and HTML tags (e.g. smileys)
-
removing stop words ("ContentReduced" in the example below).
- some other witchcraft that is not relevant here.
So this is roughly what's happening before the data gets to the database:
p.ContentReduced = string.Join(" ", p.ContentWithoutPunctuationAndTags.Split(' ').Where(w => !StopWords.Contains(w)));
And then:
meritdb=# select word, sum("MeritAmount") from (
select m."MeritAmount", w.word
from "Posts" p inner join "MeritTxs" m on m."PostId" = p."PostId", unnest(string_to_array(p."ContentReduced", ' ')) w(word)
where trim(p."ContentReduced") > ''
) g
group by word order by 2 desc limit 100;
word | sum
-------------+--------
bitcoin | 322354
merit | 307474
color | 289022
member | 177705
post | 160538
This counts all merits of a post towards each word and does it as many times as the word appears in the post. There may be other ways to determine the most "meritorious" words but intuitively this result
looks about right.