Undergraduate Program Status
Elective | 4th year | Economics |
This course teaches students how to extract meaningful data and information from text. It introduces the basics of text manipulation and text mining, and covers the most commonly used tools for information retrieval and for using text as data. We begin with a general introduction to Python and to how Python handles text. We then cover regular expressions, text cleaning, and the preparation of text for analysis. Next, we apply basic natural language processing methods and machine learning tools to demonstrate how text classification is performed. We also explore methods for sentiment analysis, topic detection, and topic modelling.
By the end of the course, students should have a firm grasp of how to perform analysis on data generated from text. Students will also know how to perform basic string processing; identify names, dates, and locations; tokenize text; convert text and words to vectors; and identify the grammatical parts of a sentence, among other tasks.
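As a rough illustration of the kind of exercise these outcomes describe, the sketch below uses only the Python standard library to clean a string, pull out dates with a regular expression, tokenize the text, and turn it into a simple word-count vector. The sample sentence, the date format, and all variable names are assumptions made for illustration; they are not taken from the course materials, and the course may use different tools.

```python
import re
from collections import Counter

# Hypothetical sample sentence, chosen only for illustration.
text = "The committee met on 2021-03-15 and again on 2021-04-02 to discuss inflation."

# Basic string processing: normalize case and trim surrounding whitespace.
cleaned = text.lower().strip()

# Regular expression: find dates written as YYYY-MM-DD (an assumed format).
dates = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)

# Tokenization: keep runs of letters, dropping digits and punctuation.
tokens = re.findall(r"[a-z]+", cleaned)

# Text to vector: count each word, then order the counts by a sorted vocabulary.
counts = Counter(tokens)
vocabulary = sorted(counts)
vector = [counts[word] for word in vocabulary]

print(dates)
print(tokens)
print(list(zip(vocabulary, vector)))
```

A word-count vector like this is only one simple way to represent text numerically; it ignores word order and grammar, which is why tasks such as identifying names or parts of speech generally need dedicated natural language processing tools.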