
Topic: Forum Merited Messages- Does size count? (Read 450 times)

legendary
Activity: 3346
Merit: 3130
May 24, 2018, 12:24:15 PM
#17
What has come out of this analysis is that most of the senior members are correct. The problem is not that there aren't enough merits, but that they aren't being awarded. I suspect that most responsible members who still have some Smerit will make awards if they see a post that they think is worth it. That means that there are two obvious problems. There aren't enough decent posts worthy of awards, and those that are worthy aren't being seen. This has been my feeling for some time.


There are decent posts, but they are hidden among a lot of crappy ones. It is insane to see how the worst posts are the bigger ones. If you, for instance, write an insightful post in the Politics & Society board, it will probably not be answered.
I've spent the last week searching for good posts in there, and I have also written one trying to start a debate about privacy and social media, but there have been no answers to it; it seems to be difficult to answer, I'm afraid.
In conclusion, I can say that many good posts are hidden and get lost fast, because most users here prefer to reply to a shitty megathread with a couple of words instead of taking the time to think and engage in a mindful debate.
Thinking takes effort, I'm afraid, and most people here are not willing to do their best, only to gain activity in order to join some bounty.
This is sad, but there are still some good newbies needing rescue, maybe, haha.

legendary
Activity: 2338
Merit: 10802
There are lies, damned lies and statistics. MTwain
<...>  My attention span is extremely short as a result of repeated smartphone usage.  I tend to do more skimming than close reading, and I generally look at the sentence structure and content more so than length.
<...>

Actually, you've made a very good point there. The kind of device is, in my case, very influential both in terms of reading and writing on the forum. The phone screen is just a pain to move around on. It gets the job done for skim-reading, but it’s annoying for writing anything over a paragraph or two. The iPad is fine, but even trimming quotes is a bit cumbersome. The laptop is where I’m most at ease. I often find myself switching devices depending on what I want to post.
legendary
Activity: 3556
Merit: 7011
Top Crypto Casino
I tend to write rather long posts, but the nature of what I present normally requires it.
Yes, and I'd add that you've earned a lot of merit--deservedly--and it isn't necessarily because of the length of your posts, but the content.  I'm not smooching your gluteus here, but a lot of your posts are extremely well-written and obviously have a lot of thought put into them.  That, on bitcointalk, is a rare thing and stands out from 99% of everything else that gets written. 

I'm a sucker for a good meme that makes me chuckle, and I've merited a few of those even if the post doesn't have anything else.  I've also merited posts that are short but witty and those that state my thoughts exactly, even if they aren't long posts.  My attention span is extremely short as a result of repeated smartphone usage.  I tend to do more skimming than close reading, and I generally look at the sentence structure and content more so than length.  That's just me, and everyone has their own criteria for merit-giving.

Good thread, OP.
legendary
Activity: 2338
Merit: 10802
There are lies, damned lies and statistics. MTwain
Length of posts should not matter.  A lot of the shitposters I've seen are the ones that ramble on for paragraphs.  Best thing is to keep your posts simple and relevant.

What the numbers don’t/won’t/can’t show is the correlation between the length of a post and the usefulness of its content or the interest it generates. I tend to write rather long posts, but the nature of what I present normally requires it. This analysis really focuses on post metadata, not content (that would be great, but it is out of my scope).
legendary
Activity: 2338
Merit: 10802
There are lies, damned lies and statistics. MTwain
<...>

Yes, I concur on the conclusion.

I saw your initiative and I love it. I've seen that Seoincorporation has enrolled, and started to spread the word on the Spanish forum. Well done. Let's see how it evolves.
full member
Activity: 630
Merit: 172
Length of posts should not matter.  A lot of the shitposters I've seen are the ones that ramble on for paragraphs.  Best thing is to keep your posts simple and relevant.
legendary
Activity: 2828
Merit: 2472
https://JetCash.com
What has come out of this analysis is that most of the senior members are correct. The problem is not that there aren't enough merits, but that they aren't being awarded. I suspect that most responsible members who still have some Smerit will make awards if they see a post that they think is worth it. That means that there are two obvious problems. There aren't enough decent posts worthy of awards, and those that are worthy aren't being seen. This has been my feeling for some time.

Did you see my latest newbie initiative? I thought it could provide an information thread, and also a small collection of posts by new members that were meritable. This could be especially useful, as it is aimed at foreign language newbies who may not have merit sources in their local communities.
legendary
Activity: 2338
Merit: 10802
There are lies, damned lies and statistics. MTwain
I blame it on Satoshi - he's got a lot of the halved sMerits. Smiley

Yes, it actually reminds me of bitcoin, but on a worse scale. Estimates are that 4M of the (not yet) 21M bitcoins are "lost" to wallets that are non-retrievable. That’s 19%. I think we may be at a much higher ratio here..
legendary
Activity: 2338
Merit: 10802
There are lies, damned lies and statistics. MTwain
I've made a scraper once that excluded quotes, the trick was to count how deep the quote was nested.
Annoying to do, but it worked fine.

Impressive data analysis!

Thanks Loyce(Mobile). I may have complicated mine a bit by going bottom up, reversing the text strings and reversing them again at the end of the process. Working on a reversed string was easier for me after multiple approaches.

The scraper is a real bummer, since the merited message is not a page but a sort of detail section of the page. Initially, my scraper retrieved data but randomly scraped the wrong message from the page, no matter what I did (delay timers, scrolls, etc.).

I had to change tactics: scrape the whole page, learn and use XPath, keep only merited messages, and then delete duplicates (pages with multiple merited messages would be scraped once per merited message ID). The scraper I use is free but limited to 10K records per batch, so the overhead of the above issue turned it into a fusion of plenty of batches (since duplicate records also count towards the 10K per batch limit). A pain, but I finally managed to get there.

Perhaps mastering the tool beforehand really well would have been good, but I seldom use them...
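
The keep-merited-then-deduplicate step described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the row layout and the field names `msg_id` and `merited` are hypothetical, and it does not reproduce the scraping tool actually used.

```python
def dedupe_by_message_id(rows):
    """Keep the first row seen for each message ID, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        msg_id = row["msg_id"]  # hypothetical field name
        if msg_id in seen:
            continue
        seen.add(msg_id)
        unique.append(row)
    return unique


# Hypothetical scrape output: a page holding two merited messages was
# scraped twice, so message 101 shows up in two batches.
rows = [
    {"msg_id": 101, "merited": True},
    {"msg_id": 102, "merited": False},
    {"msg_id": 101, "merited": True},
    {"msg_id": 103, "merited": True},
]

# Drop duplicates first, then keep only the merited messages.
merited = [row for row in dedupe_by_message_id(rows) if row["merited"]]
print([row["msg_id"] for row in merited])
```

Deduplicating before each batch is exported, where the tool allows it, would also stop duplicates from eating into the per-batch record limit.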
legendary
Activity: 2828
Merit: 2472
https://JetCash.com
Some of the halved sMerits will have been spent, and some of the airdrop may have gone to dormant accounts. There will also be a load of SMerits unspent in deleted accounts. It looks as if a broad guess might be that there are around half a million sMerits available - that sounds incredibly high, especially as everybody seems to think there is a merit famine.

I blame it on Satoshi - he's got a lot of the halved sMerits. Smiley
legendary
Activity: 2338
Merit: 10802
There are lies, damned lies and statistics. MTwain
<...>
If you skipped the first month of the merit system, and just did the analysis for subsequent periods, would it be reasonable to assume that most of the sMerit had been used? It would also show how the system was working after members had become used to it. ( well some of them anyway Smiley ).

The available data is as follows:


We can see on the table of data that a total of 154.986 sMerits have been awarded since initial kick-off.

In a post from theymos (Re: Merit & new rank requirements), he stated that the initial distributed total sMerits were around 600k.
On the other hand, sMerit Sources generate a max of 18500 sMerit per 30 days. Of course, these don’t empty their monthly allowance and probably did not have this amount since the beginning.

With the above, a safe scenario would be:
+ 600.000 Initial Airdrop (wow!)
+ 48.000 Merit Sources (12k/month as a hypothesis)
- 154.986 Awarded sMerits
------------------------------------------------------------
493.014 Available
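
As a quick check of the arithmetic above, here is a minimal Python sketch. The variable names are illustrative, and the four-month period is an assumption implied by 48.000 at 12k per month.

```python
initial_airdrop = 600_000   # approximate figure attributed to theymos
source_per_month = 12_000   # hypothesis used in the scenario above
months = 4                  # assumption: 48.000 / 12.000 per month
awarded = 154_986           # sMerits awarded so far, per the data table

generated = initial_airdrop + source_per_month * months
available = generated - awarded
awarded_share = awarded / generated

print(available)                # 493014
print(f"{awarded_share:.1%}")   # 23.9%
```

Under these assumptions, less than a quarter of the sMerit generated so far has been awarded.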

And that’s not even considering the halved sMerits the system generates from those awarded sMerits (we don’t know how many of those have been generated and still remain unspent).
If the initial airdrop is what is stated, the amount available is amazing. So either there is a lot of sMerit, in aggregate, in dormant accounts, or I’m too tired today and can’t think straight. I’d say the former, and it’s sMerit that may or may not come into use one day. So to answer your question, I’d say that there is a lot around, but dormant.
Am I right there?

That’s why the weekly analysis of what’s going on is good. Looking at the table above (forget for now week Nº 17 because it’s a partial week), the weekly sMerit being awarded seems to stabilize in the 4,6k-5k area.
Is that a lot? I would say that it isn’t, especially in relation to available sMerit calculated above.
Newly awarded users (nToNew -> never previously awarded before) is in the 315 to 380 area. That is the amount of new people that get their first sMerit per week. That too seems small.

Now all the above can be seen as either good or bad in relation to what the effective potential circulation of sMerit should be. That is, the amount that keeps the system healthy and doesn’t choke the system. That quantity I do not know, and is correlated to tactical objectives (there must be some right?).
hero member
Activity: 1659
Merit: 687
LoyceV on the road. Or couch.
I've made a scraper once that excluded quotes, the trick was to count how deep the quote was nested.
Annoying to do, but it worked fine.

Impressive data analysis!
legendary
Activity: 2828
Merit: 2472
https://JetCash.com
(also, it’s like going to your (ex) 7 circles of hell to move around the html data ….).


Ah! Satan's Slave - I forgot about that. I seem to start these projects, and then go off at a tangent. Fit to Talk probably works because other members are supporting me, and I'm grateful for that. I keep thinking I should put together a description of the merit system. It seems that very few members understand it, and they think a merit is a merit, and of course it isn't . Smiley

If you skipped the first month of the merit system, and just did the analysis for subsequent periods, would it be reasonable to assume that most of the sMerit had been used? It would also show how the system was working after members had become used to it. ( well some of them anyway Smiley ).
legendary
Activity: 2338
Merit: 10802
There are lies, damned lies and statistics. MTwain

A better way to look at it would be to compare merited posts vs unmerited. That would probably require scraping every single post though unless theymos can provide some aggregated data.
<...>

Correct. We really should compare to unmerited posts, but that is not feasible by scraping. Well, not all would need to be scraped as we could work with large enough random samples and work on from there.

Nevertheless, just getting this data has been a nightmare and much more complicated than all my previous stat posts. Then the second part is getting to analyse the post itself. That’s when nightmare phase II begins (at least using the tools I use).

But yes, this analysis depicts the general idea of merited posts, but does not (cannot without a great effort) contrast it with a control group made up of non-merited posts.

<...>
Breaking this down by section/subsection should filter everything naturally due to the context of the way things are done on a forum/subforum level. This could be done further ahead.

This analysis only covers sMerited posts, so the quote breakdown may not be very enlightening, since the dataset is already biased towards merited posts (also, it’s like going to your (ex) 7 circles of hell to move around the html data ….).

legendary
Activity: 3654
Merit: 8909
https://bpip.org
A better way to look at it would be to compare merited posts vs unmerited. That would probably require scraping every single post though unless theymos can provide some aggregated data.

E.g. "65,07% of the sMerited posts have less than 100 words" doesn't mean that shorter posts are favored by merit senders. It likely means that most posts on the forum are shorter than 100 words.

I'd be more interested in something like "1% of under-100-words get merit and 2% of over-1000-words get merit".
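
To illustrate the difference between the share of merited posts and the merit rate per length bucket, here is a minimal Python sketch. All counts are hypothetical and chosen only to match the "1% vs 2%" example above; they are not forum data.

```python
def merit_rate(merited, total):
    """Fraction of the posts in a bucket that received merit."""
    return merited / total if total else 0.0


# Hypothetical counts, for illustration only (not real forum data).
buckets = {
    "under_100_words": {"total": 100_000, "merited": 1_000},
    "over_1000_words": {"total": 500, "merited": 10},
}

all_merited = sum(b["merited"] for b in buckets.values())
for name, b in buckets.items():
    share = b["merited"] / all_merited           # share among merited posts
    rate = merit_rate(b["merited"], b["total"])  # chance a post gets merit
    print(f"{name}: {share:.1%} of merited posts, {rate:.1%} merit rate")
```

With these made-up numbers, short posts make up about 99% of the merited posts, yet a long post is twice as likely to receive merit. Telling the two apart requires counts of unmerited posts as well.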
legendary
Activity: 2828
Merit: 2472
https://JetCash.com
Some images are essential to a thread - stats analysis graphs etc. So if we exclude those, as many of them receive merits, it would appear that including redundant images stops the poster from getting merits. This certainly applies in my case, because I put the posters on ignore, so their other posts don't get merits from me either. I think that the quote analysis needs to be split into full quotes and snipped quotes. A long full quote can get the poster ignored, especially when it is not essential to his post.

Long formatted posts also get skipped or ignored, except in the ANN section. So maybe their posts should be excluded from the general analysis.

Thanks for doing the analysis though, it does provide some interesting info.
legendary
Activity: 2338
Merit: 10802
There are lies, damned lies and statistics. MTwain
1. Introduction

I’ve often seen “size doesn’t matter” set against “size does matter” when talking about merited posts. I wondered which was true, and to what degree (in this context). While at it, I came across other overall merited post features that answered an initial set of questions I wanted to resolve (limited to sMerited posts):

Q1) How often are user images included in posts?
Q2) What about forum images (i.e. smilies, etc.)?
Q3) How about quotes?
Q4) Is the thread’s OP the main merited post in general?
Q5) How many posts prior to Merit System kick-off have been merited, and how far back?
Q6) How close does sMerit awarding occur in relation to the date of the post?
Q7) The size one: What size do merited posts have? The longer, the more sMerited?


The information presented here takes the global forum view of merited posts. I’m sure that if we took this analysis down a level (forum Section/Subsection), the profiles would not all concur, but a general view is a good starting point, at least for now.

Dataset baseline:
Data as of 18/05/2018 (and around).
Total Merited Post Base: 45.947 (non-deleted messages).

I’ve decided to show the graphs and drop the data tables on this occasion for a smoother read. I’ve also included extreme cases. These are, as they say, extreme cases, and should not take the focus away from the core information shown on the graphs (they are kinky though...).

Disclaimer: Getting hold of this information is unfortunately a pain, and breaking it down further is even more so, due to the HTML tags and especially the quotes (be they nested or standalone). After some time, I believe I’ve managed to extract the text from the awarded messages pretty well, excluding the quoted text, which I consider part of the context of a message but which does not add “real length” to the message body (therefore I exclude it from the message length).
The algorithm still has some flaws when the post has many objects of different nature (quotes, tables, code, images, etc.). I’m not going to build a 100% robust parser now, since I consider that even a 95% correct cleanse of the post for word count is good enough at this stage.
Some local languages are more troublesome. For example, Chinese text often has few spaces, and therefore the word count can be erratic there. I have not excluded these posts as there are not too many of them and they represent only a small amount of noise in the overall picture.
Also, each URL reference is counted as one word.
So, all in all, this is a rather good approximation, but not a 100% exact one.
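
For readers who want to try something similar, a word count that skips quoted text can be sketched with Python's standard `html.parser` by tracking how deeply the parser sits inside quote blocks. This is not the algorithm used for this analysis. It assumes quotes are wrapped in `div` elements with class `quote` or `quoteheader` (an assumption about the forum markup), and it counts whitespace-separated tokens, so a URL counts as one word.

```python
from html.parser import HTMLParser

# Assumed class names for quote markup; adjust to the real page source.
QUOTE_CLASSES = {"quote", "quoteheader"}


class QuoteStripper(HTMLParser):
    """Collects text that is not inside a (possibly nested) quote div."""

    def __init__(self):
        super().__init__()
        self.quote_depth = 0  # open divs at or below a quote div
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = set((dict(attrs).get("class") or "").split())
        # Once inside a quote, every nested div deepens the quote region.
        if self.quote_depth > 0 or classes & QUOTE_CLASSES:
            self.quote_depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.quote_depth > 0:
            self.quote_depth -= 1

    def handle_data(self, data):
        if self.quote_depth == 0:
            self.chunks.append(data)


def count_own_words(post_html):
    """Count whitespace-separated words outside quoted blocks."""
    parser = QuoteStripper()
    parser.feed(post_html)
    parser.close()
    return len(" ".join(parser.chunks).split())


sample = (
    '<div class="quoteheader">Quote from: someone</div>'
    '<div class="quote">outer <div class="quote">inner quoted text</div> more</div>'
    'My own reply, see https://example.com for details.'
)
print(count_own_words(sample))  # 7
```

Whitespace splitting keeps the sketch short, but it shares the weakness noted above for languages written without spaces.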



2. User Images

It turns out that the majority, 88,58% of the awarded posts, do not have user images. 6,61% have only one, 1,43% have 2 images, and 3,38% have 3 or more images.

Note: I’m counting images here and cannot (nor wish to) retrieve information as to the actual image size. Some are large images with graphs and text, whilst others are mere icons.



Examples of the extreme cases are (most of the time the images do not all load even after refreshing the page):
a)   170 images - A post from August 2015 (services) awarded with 5 sMerits: Case 1
b)   131 images - A post from September 2017 (services) awarded with 27 sMerits: Case 2
c)   118 images - A post from March 2018 (mining - hardware) awarded with 6 sMerits: Case 3
d)   108 images - A post from October 2017 (services) awarded with 9 sMerits: Case 4
e)   106 images - A post from September 2015 (mining support – small dog icons count) awarded with 1 sMerit: Case 5


2. Forum Images

Forum images are those icons such as emoticons that reference an address on the forum, and not an external link.
79,17% of awarded posts do not use any forum images, 13,53% use one, 7,31% use two or above. Emoticons are therefore not abused and seem to be kept at bay.




Examples of the extreme cases are:
a)   83 forum images - A post from April 2018 (Economics Speculation) awarded with 5 sMerits: Case 6 (I’ll skip a few now, since the same author has the top 8 cases in the same forum area).
b)   30 forum images - A post from February 2018 (Altcoin Discussion) awarded with 1 sMerit: Case 7 (It looks like there are more than 30, but some are ASCII characters).
c)   27 forum images - A post from January 2018 (Economics Speculation) awarded with 1 sMerit: Case 8
d)   24 forum images - A post from February 2018 (Italian Trading) awarded with 1 sMerit: Case 9
e)   21 forum images - A post from April 2018 (Spanish) awarded with 2 sMerits: Case 10


3. Quotes

Quotes are, on the other hand, a heavily used feature: 54,59% of the awarded posts do not use quotes, but the remaining 45,41% do. 3,1% use 5 or more quotes.



Examples of the extreme cases are:
a)   290 quotes! - A post from April 2014 (Altcoin Discussion) awarded with 7 sMerits: Case 11
b)   84 quotes - A post from April 2018 (Meta) awarded with 9 sMerits: Case 12
c)   55 quotes - A post from February 2018 (Marketplace Gambling) awarded with 2 sMerits (heavily nested quotes): Case 13
d)   52 quotes - A post from May 2018 (Meta) awarded with 2 sMerits (by me!): Case 14
e)   50 quotes - A post from March 2018 (Indonesian) awarded with 2 sMerits: Case 15


4. Post Number

This one really startled me: 32,58% of merited posts are on mega threads (which I tend to ignore altogether), 40,31% if we count post position 201 onwards. Wow! This happens especially in Ann sections and Economy (The Wall Observer is the extreme case).
15% of awarded posts are OPs, but if we add up to post number 20 (which is the first page of a thread), we get 40,21% of awarded posts. This graph actually does look like a crypto wall:




Examples of the extreme cases are:
a)   Post Nº 409230 - A post from May 2018 (Economics Speculation) awarded with 2 sMerits: Case 16 (There are a trillion in the same section/subsection).
b)   Post Nº 10049 - A post from May 2018 (Ann Altcoin) awarded with 40 sMerits: Case 17
c)   Post Nº 10018 - A post from January 2018 (Russian) awarded with 47 sMerits: Case 18
d)   Post Nº 9324 - A post from April 2018 (Ann Altcoin) awarded with 50 sMerits: Case 19
e)   Post Nº 8912 - A post from January 2018 (Ann Altcoin) awarded with 50 sMerits: Case 20


5. Post Date

I thought there would be many more posts awarded sMerit from the days prior to the Merit System kick-off (since I had seen many cases when performing previous analytical tasks), but there really are not that many. If we consider that the system started in late February 2018, getting sMerit on posts back to January 2018 is pretty normal.
All in all, 93,89% of awarded posts are 2018 posts, 4,52% are 2017 posts and only 1,59% are posts from 2016 or earlier. In terms of proportion, old awarded posts are outliers in the overall scheme of things.




Examples of the extreme cases are:
a)   A post from November 2009 (Bitcoin Discussion – Satoshi’s welcome post) awarded with 751 sMerits: Case 21 (The oldest 5 awarded posts are all Satoshi’s)
b)   A post from January 2010 (Economy - Marketplace) awarded with 1 sMerit: Case 22
c)   A post from January 2010 (Economy - Marketplace) awarded with 1 sMerit: Case 23
d)   A post from January 2010 (Economy - Marketplace) awarded with 1 sMerit: Case 24
e)   A post from May 2010 (Economy – Marketplace -> Pizza case) awarded with 132 sMerits: Case 25



6. Time between Publishing and Meriting

I really wanted to see this one. It seems that 13,73% are merited within the first hour after posting, and another 10,05% within the second hour.
On a day scale, 56,50% of sMerit awarding occurs within the first 24 hours after posting, and an additional 20,47% gets awarded before the post reaches the tender age of one week. Even so, the remaining 23,03% get awarded a week or more after the post was published.



Note: time should be interpreted as “within the” (within the 1 (first) hour, within the second hour and so on). Also data represents number of Merit Txs, not number of posts.
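
The "within the Nth hour" grouping in the note above can be sketched in Python as follows. The function name is illustrative, and the choice to put a merit sent exactly on the hour into the earlier bucket is an assumption, not something stated in the analysis.

```python
import math
from datetime import datetime


def hour_bucket(posted, merited):
    """Return N such that the merit was sent within the Nth hour after posting."""
    elapsed_hours = (merited - posted).total_seconds() / 3600
    # Assumption: exactly 1.0 h still counts as the first hour, and a merit
    # sent in the same second as the post falls in bucket 1.
    return max(1, math.ceil(elapsed_hours))


posted = datetime(2018, 5, 1, 10, 0)
print(hour_bucket(posted, datetime(2018, 5, 1, 10, 45)))  # 1
print(hour_bucket(posted, datetime(2018, 5, 1, 11, 30)))  # 2
```

Since the data counts merit transactions rather than posts, one such bucket would be computed per transaction.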

Examples of the extreme cases are:

The first are all Satoshi’s posts as seen above, so I’ll give them a skip now in the examples.
a)   71222 hours: A post from March 2010 (Economy Marketplace -> must see: 10k bitcoins for 50$ and no one bought them), awarded with 2 sMerits: Case 26
b)   70708 hours: A post from February 2010 (Economy Marketplace), awarded with 1 sMerit: Case 27
c)   69941 hours: A post from February 2010 (Economics), awarded with 2 sMerits: Case 28
d)   59227 hours: A post from July 2011 (Bitcoin Development and Technical Discussion), awarded with 19 sMerits: Case 29
e)   2256 hours: A post from February 2018 (Altcoin Discussion), awarded with 105 sMerits: Case 30


7. Post length

I’m measuring post length in words, and clustering them into groups of 100. As I’ve stated before, this part is not perfect since, for example, URLs get counted as words, a missing space after a full stop may cause an incorrect count, some HTML tags are a bother, etc. Quoted text has been removed.
On the whole, grouping posts in groups of 100, the data is pretty accurate and way better than no data at all.

It turns out that 65,07% of the sMerited posts have fewer than 100 words, another 18,41% have between 100 and 200 words, 6,45% between 200 and 300 words, 3,24% have between 300 and 400 words, and only 6,82% are above the 400 word barrier (somewhere near a word-processor page in size).

I was also interested to see if longer posts get merited more, and it seems so. Looking at the graphs, there’s hardly any difference between posts with up to 100 words (avg. 2,79 sMerits) and posts with up to 200 words (avg. 2,76 sMerits), but it does build up from there. The larger the word group, the fewer posts there are of the kind, so the less conclusive the related awarded sMerits become.

Nevertheless, the conclusion is not “go and create larger posts”, since the content is what makes the difference in these cases and not the post size per se (and content analysis is another world).



Note: ‘Words’ should be interpreted as within the group of x hundred words (so on the graph, ‘0’ represents between 0 and 99 words, a ‘1’ between 100 and 199 words, and so on).
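
The grouping described in the note above amounts to an integer division by 100. Here is a minimal Python sketch of the grouping and of the average sMerits per group, using made-up (word count, sMerits) pairs rather than the real dataset.

```python
from collections import defaultdict


def word_group(word_count):
    """Group 0 covers 0-99 words, group 1 covers 100-199, and so on."""
    return word_count // 100


def average_merit_by_group(posts):
    """posts: iterable of (word_count, smerits) pairs."""
    totals = defaultdict(lambda: [0, 0])  # group -> [sMerit sum, post count]
    for words, merits in posts:
        group = word_group(words)
        totals[group][0] += merits
        totals[group][1] += 1
    return {group: s / n for group, (s, n) in sorted(totals.items())}


# Made-up sample, for illustration only.
posts = [(42, 1), (99, 3), (100, 2), (199, 4), (450, 10)]
print(average_merit_by_group(posts))  # {0: 2.0, 1: 3.0, 4: 10.0}
```

Groups with very few posts, like group 4 here, give averages that say little, which is the caveat already made about the larger word groups.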

Examples of the extreme cases are (MS Word and my algorithm don’t always agree on word count due to the elements pointed out before):
a)   10.159 words: A post from December 2014 (Altcoin Discussion), awarded with 20 sMerits: Case 31
b)   8.204 words: A post from March 2018 (Bitcoin Discussion), awarded with 6 sMerits: Case 32
c)   1 word: A post from February 2018 (Economics Speculation), awarded with 25 sMerits (for a full stop -> probably deleted text): Case 33
d)   1 word: A post from April 2018 (French), awarded with 50 sMerits (a crypto address): Case 34
e)   0 words: A post from March 2018 (Russian), awarded with 1 sMerit (quotes): Case 35
f)   0 words: A post from February 2018 (Economics, Speculation), awarded with 14 sMerits (image): Case 36