Python and NLTK tools

I’d like to use this space not only to record my experiences with the Natural Language Toolkit, but also with Python itself. Although I have extensive programming experience, I have never used Python, and it is unlike any other language I know.

Python is a high-level, object-oriented language that has become very popular in recent years. It is generally considered easy to learn, it has extensive built-in libraries for common tasks, and existing programs are generally easy to maintain and modify. It is a “get it done” language, and is especially popular with people who are not full-time programmers. As such, it may be useful to other DH practitioners who are not interested in becoming computer programmers, but do need to write their own programs on occasion.
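As a small illustration of that “get it done” quality, here is a hypothetical few-line word count written with nothing but the standard library (the sample sentence is my own placeholder):

    from collections import Counter

    # Count how often each word appears, using only built-in tools
    text = "the quick brown fox jumps over the lazy dog and the quick cat"
    counts = Counter(text.lower().split())
    print(counts.most_common(3))  # e.g. [('the', 3), ('quick', 2), ('fox', 1)]

An equivalent program in a language like C or Java would need noticeably more scaffolding, which is much of Python’s appeal for occasional programmers.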


The Natural Language Toolkit is a collection of Python libraries for processing and working with natural language data (a library is a collection of prewritten functions that are not an official part of the language itself). The NLTK offers a large number of tools in the following areas: accessing corpora, string processing, collocation discovery, part-of-speech tagging, classification, chunking, parsing, semantic interpretation, evaluation metrics, probability and estimation, and linguistic fieldwork. I am working mostly with the “accessing corpora”, “string processing”, and “classification” tools.
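
To give a sense of how the corpus-access and string-processing tools fit together, here is a minimal sketch; it assumes the relevant NLTK data packages are downloaded first, and the Austen text named below is simply one of the sample texts that ships with the toolkit:

    import nltk
    from nltk.corpus import gutenberg

    # NLTK's bundled corpora and tokenizer models are downloaded once, on demand
    nltk.download('gutenberg')
    nltk.download('punkt')

    # Accessing corpora: load a sample text as a list of words
    emma = gutenberg.words('austen-emma.txt')
    print(len(emma))  # total number of word tokens in the text

    # String processing: split a raw sentence into tokens
    tokens = nltk.word_tokenize("The NLTK makes tokenizing text straightforward.")
    print(tokens)

    # A frequency distribution counts how often each (lowercased) token occurs
    fdist = nltk.FreqDist(w.lower() for w in emma)
    print(fdist.most_common(10))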