So here is flam's data dump:
http://pastebin.com/FDmGKHmD
All I did was scrape the pages, then sort alphabetically and take out anything devtome puts on the page. I tried to remove your copy & paste titles (and did remove the majority of them), but I got fed up and moved on.
When I run wc on the alphabetized dump file it returns 32,580 words. When I hit it with uniq it goes down to 30,359. (NOTE: This means you have 2,000+ words that, when broken into their own lines, are identical.) That was a big catch but it still felt low, especially with all the copy & paste I manually removed.
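For anyone who wants to redo that count without wc and uniq, here is a rough Python sketch of the same idea. It assumes the dump is split on whitespace into one word per entry; the function names and the sample string are made up for illustration and are not taken from the pastebin data. On the numbers above, 32,580 minus 30,359 is 2,221 repeated entries.

```python
def split_into_words(text):
    """Split text on whitespace, giving one word per entry."""
    return text.split()


def count_total_and_distinct(words):
    """Return (total entries, distinct entries).

    Running uniq on a sorted list keeps one copy of each distinct
    entry, so the distinct count here matches that step.
    """
    ordered = sorted(words)
    distinct = len(set(ordered))
    return len(ordered), distinct


# Hypothetical sample text, only to show the calculation.
sample = "gains 10 armor per level gains 10 health per level"
total, distinct = count_total_and_distinct(split_into_words(sample))
print(total, distinct, total - distinct)  # 10 6 4
```

The last number is how many entries are extra copies of something already seen, which is the "2,000+" figure in the post.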
So I got creative. I took the alphabetized list and I looked at just the first 10 characters of each line. My belief was that your "template" is actually the majority of your word score.
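I think I was right. Line count is 2,972; uniq line count is 2,256. A difference of 700+. Each line uniq dropped matches the first 10 characters of a line it kept, so somewhere between 700 and 1,400 of the lines I broke out of the data dump start the exact same way as another line, saying the exact same thing.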
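The first-10-characters check can be sketched in Python as well. This is only an approximation of the approach described above, not the exact commands that were run; `prefix_duplicate_stats`, its `prefix_len` parameter, and the sample lines are hypothetical names and data chosen for the example.

```python
from collections import Counter


def prefix_duplicate_stats(lines, prefix_len=10):
    """Compare lines by their first prefix_len characters.

    Returns (total lines, distinct prefixes, lines sharing a prefix).
    """
    prefixes = Counter(line[:prefix_len] for line in lines)
    total = len(lines)
    distinct = len(prefixes)
    # Count every line whose prefix shows up more than once.
    sharing = sum(n for n in prefixes.values() if n > 1)
    return total, distinct, sharing


# Hypothetical lines, only to show the calculation.
sample_lines = [
    "Passive: gains armor",
    "Passive: gains health",
    "Passive: gains mana",
    "Q ability deals damage",
    "Unique lore text here",
]
print(prefix_duplicate_stats(sample_lines))  # (5, 3, 3)
```

Note that total minus distinct (2 here) counts only the extra copies, while the third value counts every line that shares its opening with at least one other line. The two numbers answer slightly different questions, so it helps to report both.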
Either you have OCD or you are just copying and pasting, changing enough wordage to make it appear unique at first.
If anyone has a problem with this admittedly rudimentary analysis, I invite you to take a look at the data set I included at the beginning of the post and check for yourself.
(As a LoL player since beta I'm going to go out on a limb and say the majority of abilities/gains per level and descriptions you include are copy & paste too.)
I had a quick look at this and I think some of the duplication you are measuring is due to the type of content, i.e. there is a lot of identical data, shared between the characters, being described. This is a failure of the game, not the author. I do not play the game and so I could be wrong here.
Similarly the titles of sections will tend to be the same since the documents are about the same thing (I guess).
These issues may be more obvious because the other content is not easily accessible to a native English speaker. I have had a lot of experience reading English work by non-native English speakers and it can be quite painful to unravel. Typically the same bad sentence structure is repeated over and over because the author only knows a few ways of expressing themselves. Given the content I would expect this to be painfully obvious to someone who knows what they are looking at.
So yes there is evidence of templates but they are a product of the language skills of the author as much as anything else. Combine this with the subject matter which seems to have large amounts of identical data in it and I can understand why you would complain.
If this were a native English speaker then I would suggest they work more on the differences between characters in order to give each document some originality. Since it is not a native English speaker I would suggest they also pick up some story books for an age range they are comfortable reading and look at how the author changes the sentence structure to make the story flow in a more interesting way. In this way they can increase the number of ways they have of expressing themselves and reduce the template feel of the documents. I would also suggest they read their work aloud, as if they were talking to a friend about the subject, and change it to improve the flow and bring out their enthusiasm for the subject matter.
So the question is, should we penalize people because they are not native English speakers or should we help them improve their English?
This fellow seems willing to improve.