MALLET - Machine Learning for Language Toolkit

Overview

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text. It’s particularly popular for topic modeling with Latent Dirichlet Allocation (LDA).

  • Author: Andrew McCallum, maintained by David Mimno
  • Institution: Originally the University of Massachusetts Amherst, now Cornell University
  • License: Apache 2.0
  • Status: Active
  • Platform: Cross-platform (Java)
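
To make the LDA topic modeling mentioned in the overview more concrete, here is a minimal sketch of a collapsed Gibbs sampler, the family of inference method commonly used for LDA. It is a self-contained toy for illustration only: it is not MALLET's implementation and does not use MALLET's API. The class name TinyLda, the method fit, and all parameter names and values are assumptions made up for this example.

```java
import java.util.Arrays;
import java.util.Random;

// Toy collapsed Gibbs sampler for LDA. Illustrative only: this is NOT MALLET's
// implementation, and the class and method names here are hypothetical.
class TinyLda {
    /**
     * docs[d][i] is the word id of token i in document d (ids in [0, vocabSize)).
     * Returns topicWord[k][w]: how many tokens of word w end up assigned to topic k.
     */
    static int[][] fit(int[][] docs, int vocabSize, int numTopics,
                       int iterations, double alpha, double beta, long seed) {
        Random rng = new Random(seed);
        int[][] docTopic = new int[docs.length][numTopics];
        int[][] topicWord = new int[numTopics][vocabSize];
        int[] topicTotal = new int[numTopics];
        int[][] z = new int[docs.length][];

        // Random initial topic assignment for every token.
        for (int d = 0; d < docs.length; d++) {
            z[d] = new int[docs[d].length];
            for (int i = 0; i < docs[d].length; i++) {
                int k = rng.nextInt(numTopics);
                z[d][i] = k;
                docTopic[d][k]++;
                topicWord[k][docs[d][i]]++;
                topicTotal[k]++;
            }
        }

        double[] weights = new double[numTopics];
        for (int it = 0; it < iterations; it++) {
            for (int d = 0; d < docs.length; d++) {
                for (int i = 0; i < docs[d].length; i++) {
                    int w = docs[d][i];
                    int old = z[d][i];
                    // Remove the token's current assignment from the counts.
                    docTopic[d][old]--;
                    topicWord[old][w]--;
                    topicTotal[old]--;

                    // Unnormalized conditional p(z = k | all other assignments).
                    double sum = 0.0;
                    for (int k = 0; k < numTopics; k++) {
                        weights[k] = (docTopic[d][k] + alpha)
                                * (topicWord[k][w] + beta)
                                / (topicTotal[k] + vocabSize * beta);
                        sum += weights[k];
                    }

                    // Draw a new topic in proportion to the weights.
                    double u = rng.nextDouble() * sum;
                    int k = 0;
                    while (k < numTopics - 1 && u >= weights[k]) {
                        u -= weights[k];
                        k++;
                    }
                    z[d][i] = k;
                    docTopic[d][k]++;
                    topicWord[k][w]++;
                    topicTotal[k]++;
                }
            }
        }
        return topicWord;
    }

    public static void main(String[] args) {
        // Two tiny documents over a four-word vocabulary (word ids 0..3).
        int[][] docs = {{0, 1, 0, 1, 0}, {2, 3, 2, 3, 2}};
        int[][] topicWord = fit(docs, 4, 2, 200, 0.1, 0.01, 42L);
        for (int k = 0; k < topicWord.length; k++) {
            System.out.println("topic " + k + ": " + Arrays.toString(topicWord[k]));
        }
    }
}
```

Each row of the returned matrix is one topic's word counts. A real toolkit adds tokenization, stopword removal, hyperparameter optimization, and much faster sampling on top of this basic loop.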

Website

Repository

Guide