This article has been imported from chorus.fm for discussion. All of the forum rules still apply.

A data scientist decided to look through 22,000 metal albums to find out which words are the most “metal.” It turns out “burn” is the most metal word, followed by “cries,” “veins,” “eternity,” and “breathe.”

In the face of this complexity, it is not surprising that understanding natural language with computers, in the same way humans do, is still an unsolved problem. That said, an increasing number of techniques have been developed to provide some insight into natural language. They tend to start by making simplifying assumptions about the data, and then use these assumptions to convert the raw text into a more quantitative structure, like vectors or graphs. Once the text is in this form, statistical or machine learning approaches can be leveraged to solve a whole range of problems.

I haven’t had much experience playing with natural language, so I decided to try out a few techniques on a dataset I scraped from the internet: a set of heavy metal lyrics (and associated genres).
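One common simplifying assumption of the kind described above is the bag-of-words model: ignore word order entirely and represent each text by how often each word appears, which turns every document into a vector of counts. The Python below is a minimal sketch of that idea, not the author's code. The tokenizer regex and the function names `tokenize` and `bag_of_words` are illustrative assumptions, and the two input lines are made-up toy text rather than lyrics from the scraped dataset.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplifying assumption: lowercase everything and keep only runs of
    # letters and apostrophes, so punctuation and word order are discarded.
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    # Count the words in each document.
    counts = [Counter(tokenize(d)) for d in docs]
    # Shared, sorted vocabulary across all documents.
    vocab = sorted(set().union(*counts))
    # One count vector per document, aligned with the vocabulary.
    vectors = [[c[w] for w in vocab] for c in counts]
    return vocab, vectors


docs = ["Burn the veins, burn", "Eternity cries"]  # toy lines, not from the dataset
vocab, vectors = bag_of_words(docs)
print(vocab)    # ['burn', 'cries', 'eternity', 'the', 'veins']
print(vectors)  # [[2, 0, 0, 1, 1], [0, 1, 1, 0, 0]]
```

Once lyrics are vectors like these, word frequencies can be compared across genres or fed into standard statistical and machine learning tools.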
My roommate and I once had a bet on which word appears more often in Korn's discography: "hurt" or "hate"/"hating." As of Paradigm Shift, "hurt" wins, and so did I.