The Zipf Mystery

  • Published: 16 September 2015
  • The of and to. A in is I. That it, for you, was with on. As have ... but be they.

One hundred percent of Vsauce fans like Michael's videos, not eighty percent of them.

These videos are always good to watch. I just wish they still uploaded Vsauce videos. We need new Vsauce vids.

"Quizzaciously" was used a lot more than once after this video was uploaded.

RIP Vsauce

Speaking of "the", here's a song that doesn't use that word at all:

Wasn’t the video’s name “Zipf’s Law”? Is Michael changing video titles?

**Hapax Legomenon**

Had to check out your chart of the languages to see which ones they were, as I highly doubt Zipf's law doesn't apply to Arabic (spoken by over a billion people) or Farsi. The word "the" is used far more profusely in Arabic than in English, while "a" and the verb "to be" (or "is") basically don't exist. Here's how that works, if I remember correctly: to say "the green car" in Arabic, you say "the car the green"; if you omit the "the" before the adjective, you're saying "the car (is) green"; and if you omit the "the" before the noun, you're saying "(a) green car", literally "car green". "A Thousand and One Nights" in Arabic is "Alf Leyla wa Leyla", literally "(a) thousand night(s) and (a) night". In Farsi, however, I have not seen the word "the" at all; it basically doesn't exist, except by implication. The word "a" doesn't exist either, except as the number one, in the sense that "one thousand" is the same as "a thousand".

1:39 Chilean? Is our language considered separate from Spanish? Did I write that correctly?

Damn! It fucks with your head real bad!

Heck! I want Michael dubbed in PT-BR and I want it NOW!!!

The fallacy here (well, one of many) is that there are not 26 one-letter words. There are 26 letters, and not all of them are words.
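
For what it's worth, under the standard random-typing ("monkey at a typewriter") model, a "word" is simply any run of characters between two spaces, so every letter does count as a one-letter word; that is presumably the sense in which 26 one-letter words are counted. The minimal Python sketch below assumes uniform, independent keystrokes over 26 letters plus a space bar; the function name, seed, and keystroke count are illustrative choices, not anything taken from the video.

```python
import random
import string
from collections import Counter


def random_typing_counts(n_keystrokes: int, seed: int = 0) -> Counter:
    """Count the 'words' produced by uniform random typing.

    Hypothetical model: each keystroke is drawn independently and uniformly
    from the 26 lowercase letters plus a space bar (27 keys in total).
    """
    rng = random.Random(seed)
    keys = string.ascii_lowercase + " "
    text = "".join(rng.choices(keys, k=n_keystrokes))
    # str.split() with no argument ignores runs of spaces, so no empty "words".
    return Counter(text.split())


counts = random_typing_counts(1_000_000)
one_letter = {w: c for w, c in counts.items() if len(w) == 1}
two_letter = {w: c for w, c in counts.items() if len(w) == 2}

print("distinct one-letter words:", len(one_letter))
print("mean count per one-letter word:", sum(one_letter.values()) / len(one_letter))
print("mean count per two-letter word:", sum(two_letter.values()) / len(two_letter))
```

Under these assumptions each specific one-letter word turns up far more often than each specific two-letter word, because a short word needs fewer keystrokes to land between two spaces; that length effect is the usual random-typing explanation for a Zipf-like curve.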

I think that confirms it: reality is an illusion and we are simulations. The Zipf mystery is the algorithm everything is programmed by. There's no other explanation.

Couldn't we also represent it as a Gaussian?
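
A Gaussian is a poor match for this kind of data. Zipf's law says frequency falls off as a power of rank, f(r) ∝ 1/r^s with s near 1 for word counts, which is a straight line of slope −s on log-log axes; a Gaussian-shaped curve instead bends ever more steeply downward. The minimal Python sketch below fits that slope separately on the top and bottom halves of the ranks. The helper name and both test curves are illustrative assumptions, not real word counts.

```python
import math


def loglog_slope(ranks, freqs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(r) for r in ranks]
    ys = [math.log(f) for f in freqs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


ranks = list(range(1, 201))
curves = {
    # Ideal Zipf curve with exponent s = 1 (illustrative only).
    "zipf": [1000 / r for r in ranks],
    # Gaussian-shaped curve over the same ranks (width 60 chosen arbitrarily).
    "gaussian": [1000 * math.exp(-((r / 60) ** 2)) for r in ranks],
}

exponents = {}
for name, freqs in curves.items():
    # Fit each half with its true ranks, so s is comparable between halves.
    top = -loglog_slope(ranks[:100], freqs[:100])
    bottom = -loglog_slope(ranks[100:], freqs[100:])
    exponents[name] = (top, bottom)
    print(f"{name}: s on ranks 1-100 = {top:.2f}, s on ranks 101-200 = {bottom:.2f}")
```

On the ideal Zipf curve both halves give s = 1; on the Gaussian-shaped curve the bottom-half estimate comes out many times larger than the top-half one, which is exactly the bending a straight-line test on log-log axes exposes.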

Angels: How should we nerf humans & life in the universe? God: Give a 20/80 ratio buff across all stats & boom, roll out the update.

This is proof that we are a simulation, not the real world. Elon Musk is right.

Wait. If it's the rate at which we forget information, it could be linked to that.

20% of Vsauce's videos have 80% of his views.
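
Whether a Zipf-like distribution produces roughly an 80/20 split depends on how many items there are and on the exponent s. The minimal Python sketch below assumes a pure Zipf distribution in which the item of rank r has weight 1/r^s, with hypothetical item counts; it says nothing about Vsauce's actual view counts.

```python
def top_share(n_items: int, top_fraction: float = 0.2, s: float = 1.0) -> float:
    """Share of the total held by the top `top_fraction` of n_items items,
    assuming (hypothetically) that the item of rank r has weight 1 / r**s."""
    weights = [1 / r**s for r in range(1, n_items + 1)]
    k = max(1, round(n_items * top_fraction))
    return sum(weights[:k]) / sum(weights)


for n in (100, 1_000, 10_000):
    print(f"{n:>6} items: top 20% hold {top_share(n):.1%} of the total")
```

With s = 1 the top fifth's share grows with the number of items and is a little under 80% at 1,000 items.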

It is not that hard, really. All thoughts create a frequency that leaks from the head. It gathers in places where like frequencies gather, like bubbles of memory. Basically, we are in a sea of human thoughts. Frequencies attract their pairs to themselves. Particles orbit their own sounds.

A lot of the words at the top of the list are prepositions, articles, and pronouns. Words of that type correspond to the ways we process and connect bits of information, the stuff that's usually represented in language by descriptors, which are situationally specific. I would say that the most commonly used words are the ones that correspond to the nuts and bolts of human thought processes, so it's not surprising to see them shared across different individuals and different languages. What is the word "the"? What does it represent? How you answer that question, and how you relate it to the fundamentals of cognition itself, may offer an answer to why it's the most commonly used word.

lol, bussssssssssssssssssted! The "ranking" shown at 1:11 does not match the list at 2:36 or the list at 16:15. This is the problem with what is called data mining: there seem to be patterns because of how the data is shifted around, yet the conclusions are unsupported.

1:11: the, of, and, to, a, in, is, I, that, it, for, you, was, with, on, as, have, but, be, they
2:36: the, of, and, to, in, a, was, is, that, he, for, as, it
16:15: the, be, to, of, and, a, in, that, have, I, it, for, not, on, with

Look at how "of" is shifted to make the pattern work in the third list; or how "be" is near the end in the first, missing in the second, and second in the third; or how "a" is fifth in all three. Just kidding: it is fifth in the first and sixth in the other two. I did this to demonstrate how easily shifted data can lead readers to a conclusion, in this case that my own conclusion was wrong.

lol, bussssssssssssssssssted, again! The concept of random typing following Zipf's claim and the application of Zipf's claim to word-usage distribution do not match. According to Zipf's claim, "the" would occur far less often in random typing than Zipf's claim says it does in word usage. Zipf's claim is actually a psychological effect of human thought patterns "fitting" information into a biased pattern. The video even alludes to this when it talks about repeating a word again near itself.

A fun game to play with this: get someone to say "tin" ten times, then quickly ask them what an aluminum can is made of. There is a high chance they will say "tin."

I saw a video in which people were given a series of numbers and asked whether they could come up with a pattern. Once they had their pattern, they were given a next number in the series that did not follow it. Most were stumped. Some attempted to explain the next number, or slightly altered their pattern to fit it. Yet the solution was a pattern nothing like theirs. This is an issue when trying to solve problems and predict a favorable outcome: when the pattern is not correct, it leads to real difficulties down the road. And this is why biases are problematic. Biases lead to resources being allocated toward a desired outcome based on a biased pattern, sometimes one produced by data mining, and then to an undesired consequence.
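
The rank comparison this comment does by hand can be checked mechanically. The minimal Python sketch below uses only the three lists as the comment transcribes them; the timestamps are the commenter's labels, the lists were not re-checked against the video, and `rank_table` is a hypothetical helper.

```python
from __future__ import annotations

# The three lists as transcribed in the comment above, keyed by its timestamps.
LISTS = {
    "1:11": "the of and to a in is i that it for you was with on as have but be they",
    "2:36": "the of and to in a was is that he for as it",
    "16:15": "the be to of and a in that have i it for not on with",
}


def rank_table(lists: dict[str, str]) -> dict[str, dict[str, int | None]]:
    """Map each word to its 1-based rank in every list (None if it is absent)."""
    ranks = {
        label: {word: i for i, word in enumerate(text.split(), start=1)}
        for label, text in lists.items()
    }
    # Keep words in first-seen order so the output follows the lists.
    words = list(dict.fromkeys(w for text in lists.values() for w in text.split()))
    return {w: {label: r.get(w) for label, r in ranks.items()} for w in words}


table = rank_table(LISTS)
for word in ("the", "of", "a", "be"):
    print(word, table[word])
```

Printed out, the table shows the same shifts the comment describes: "a" sits at 5, 6 and 6, and "be" at 19, absent, and 2.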

