
Natural Language Processing

Module information

Academic Direction
Goldsmiths, University of London
Also part of
MSc Data Science
Modes of Study

This module provides you with a grounding in both rule-based and statistical approaches to Natural Language Processing (NLP) and combines theoretical study with hands-on work using widely used software packages.

Machine processing of natural language is a key target for the application of Data Science techniques. NLP is a large and growing research field in which a range of specialised techniques continues to be developed. This module focuses on text processing and does not deal with speech or multi-modal communication.

Topics covered

  • History of NLP and its applications
  • Language processing and Python
  • Curated corpora and raw data sources
  • Corpus readers, stemmers and taggers
  • Classification tasks: e.g. gender identification, sentiment analysis, joint/sequence classification
  • Classification methods: decision trees, Naïve Bayes, MaxEnt
  • Information extraction: chunking and NER (Named Entity Recognition)
  • Formal grammars and parsing
  • Grammars and parsing: probabilistic parsing, feature-based grammars
  • Ethical and social issues around NLP
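
As an illustration of the kind of hands-on work these topics involve, the sketch below implements a minimal Naïve Bayes sentiment classifier in plain Python. It is not taken from the module materials: the function names (tokenize, train, classify), the whitespace tokenizer, the add-one smoothing and the four training sentences are all assumptions made for this example, and coursework would more likely rely on an established NLP library.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(examples):
    """Count label frequencies and per-label word frequencies."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def classify(text, model):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        # Log prior: fraction of training documents carrying this label.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Smoothed log likelihood of the word given the label.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy data, for illustration only.
TRAINING = [
    ("a wonderful and moving film", "pos"),
    ("great acting and a great script", "pos"),
    ("a dull and boring film", "neg"),
    ("terrible script and awful acting", "neg"),
]

model = train(TRAINING)
prediction = classify("a great film", model)
print(prediction)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and add-one smoothing stops a single unseen word from forcing a label's probability to zero.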


15 (150 hours)


  • Coursework (30%)
  • Written examination (70%)