Project idea

The project I am working on is a text classification project. I am using Python and the Natural Language Toolkit (NLTK) to build some basic text classifiers. My goal is to replicate Franco Moretti’s genre classification from Graphs, Maps, Trees (p. 19). Moretti classified British novels into genres (such as Murder Mystery, Romantic, etc.) by hand. However interesting the result, manual classification leaves his analysis open to errors and hidden biases; if an automated, repeatable method reproduces his results, it would reinforce his claims. In addition, I am experimenting with several text classification methods, and I hope to improve some of them for classifying genre, which is not only a broad category but often an overlapping one (a book can be both a mystery and a women’s novel, for example).
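Because genres overlap, one common way to handle them is binary relevance: train a separate yes/no classifier for each genre and assign a book every genre whose classifier says yes. The sketch below assumes that approach and uses a small hand-written multinomial naive Bayes model so it runs with the standard library alone; NLTK's `NaiveBayesClassifier` could play the same per-genre role. The genre names, the toy "novels", and the helpers (`BinaryNaiveBayes`, `train_multilabel`, `predict_genres`) are hypothetical illustrations, not Moretti's data or this project's code.

```python
import math
from collections import Counter


def tokenize(text):
    # Simple lowercase whitespace split; a real pipeline would use a proper tokenizer.
    return text.lower().split()


class BinaryNaiveBayes:
    """Multinomial naive Bayes deciding 'in this genre' (True) vs 'not' (False)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing for word counts

    def fit(self, docs, labels):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter(labels)
        self.n_docs = len(labels)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def log_score(self, doc, label):
        counts = self.word_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        # Smoothed class prior, so a class with no training documents never hits log(0).
        score = math.log((self.doc_counts[label] + 1) / (self.n_docs + 2))
        for word in tokenize(doc):
            if word in self.vocab:  # words never seen in training are ignored
                score += math.log((counts[word] + self.alpha) / denom)
        return score

    def predict(self, doc):
        return self.log_score(doc, True) > self.log_score(doc, False)


def train_multilabel(docs, genre_sets, genres):
    # Binary relevance: one independent yes/no model per genre (hypothetical helper).
    return {
        genre: BinaryNaiveBayes().fit(docs, [genre in gs for gs in genre_sets])
        for genre in genres
    }


def predict_genres(models, doc):
    # A document gets every genre whose model says yes: possibly several, possibly none.
    return {genre for genre, model in models.items() if model.predict(doc)}


# Hypothetical toy training data, invented for illustration only.
docs = [
    "detective murder clue inspector",
    "murder poison detective suspect",
    "love marriage courtship heart",
    "love heart ballroom suitor",
    "detective love murder marriage",
    "sea voyage ship storm",
]
genre_sets = [{"mystery"}, {"mystery"}, {"romance"}, {"romance"},
              {"mystery", "romance"}, set()]

models = train_multilabel(docs, genre_sets, ["mystery", "romance"])
print(predict_genres(models, "murder detective marriage love"))  # both genres
```

Binary relevance treats each genre independently, so it ignores the fact that some genres tend to co-occur; approaches such as classifier chains try to model those correlations at the cost of extra complexity.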