Note: for each language there are 12 documents with no words. These are indicated by a single word, NULL, with a count of 0. I am still checking this, but so far every one of them also contains no words in the original file. This is partly because the title is NOT included in the text; there is no particular reason for this, it simply was not added. In all cases these documents are either Categories or Wikipedia documentation, and are probably uninteresting anyway. I believe they contain only a particular kind of link, which is not counted in the wch. Also note that not all words are in English/French.

The format is: "number:title"\t"word1:count1"\t"word2:count2" etc. The first number is an id and is meaningless to all of us.
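A minimal sketch of reading one line in the format described above, in Python. The function name `parse_line` is hypothetical, and several details are assumptions rather than facts about the files: that the double quotes in the format description may or may not be literal (they are stripped if present), that a title may itself contain a colon (as in a Category page) so the id is split off at the first colon, and that a word may contain a colon so the count is split off at the last one. A document whose only entry is NULL with a count of 0 is returned with an empty word dictionary.

```python
def _unquote(field):
    # Assumption: fields may or may not be wrapped in literal double quotes.
    if len(field) >= 2 and field[0] == '"' and field[-1] == '"':
        return field[1:-1]
    return field


def parse_line(line):
    """Parse 'number:title<TAB>word1:count1<TAB>word2:count2...' into a tuple."""
    fields = line.rstrip("\n").split("\t")

    # Split the id from the title at the FIRST colon; the title may contain more.
    doc_id, _, title = _unquote(fields[0]).partition(":")

    counts = {}
    for field in fields[1:]:
        field = _unquote(field)
        if not field:
            continue
        # Split the count from the word at the LAST colon.
        word, _, count = field.rpartition(":")
        counts[word] = int(count)

    # Empty documents are marked by a single NULL entry with count 0.
    if counts == {"NULL": 0}:
        counts = {}
    return doc_id, title, counts
```

Filtering on an empty dictionary is then enough to skip the 12 empty documents per language, for example `if not counts: continue` inside a loop over the file.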