Leipzig Corpus Miner (LCM) - Large-Scale Text Mining Platform

Overview

Leipzig Corpus Miner (LCM) is an integrated research environment for text mining and content analysis. It combines quantitative and qualitative approaches in a Software as a Service (SaaS) architecture and is designed to make advanced NLP techniques accessible to humanities and social science researchers.

  • Institution: University of Leipzig, Germany
  • Type: Text mining and corpus analysis platform
  • Access: Web-based SaaS platform

Website

https://ilcm.informatik.uni-leipzig.de/

Key Features

Text Processing Pipeline

  • Sentence segmentation
  • Part-of-speech tagging
  • Named entity recognition
  • Lemmatization and stemming
  • Language detection
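To make the first pipeline steps concrete, here is a minimal sketch of sentence segmentation followed by tokenization. It is an illustration only, not LCM's implementation: the function names (split_sentences, tokenize, preprocess) are hypothetical, and the regex rule is a deliberately naive stand-in for a trained segmenter (it will mis-split abbreviations such as "z. B.").

```python
import re
from typing import List

# Assumption: a naive rule-based splitter, chosen for illustration only.
# Split on whitespace that directly follows ".", "!" or "?".
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")
# A token is any run of word characters (letters, digits, underscore).
_TOKEN = re.compile(r"\w+")


def split_sentences(text: str) -> List[str]:
    """Return the non-empty sentences found in text."""
    return [s.strip() for s in _SENTENCE_BOUNDARY.split(text) if s.strip()]


def tokenize(sentence: str) -> List[str]:
    """Return lowercased word tokens, dropping punctuation."""
    return [token.lower() for token in _TOKEN.findall(sentence)]


def preprocess(text: str) -> List[List[str]]:
    """Segment text into sentences, then tokenize each sentence."""
    return [tokenize(sentence) for sentence in split_sentences(text)]


if __name__ == "__main__":
    print(preprocess("Die Zeitung berichtet. Der Leser liest!"))
```

Later steps such as part-of-speech tagging, named entity recognition, and lemmatization would consume the token lists produced here; they require trained models and are left out of this sketch.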

Corpus Management

  • Multiple corpora per user
  • Corpus sharing and collaboration
  • Incremental document addition
  • Version control for reproducibility

Analysis Methods

  • Frequency analysis
  • Co-occurrence analysis
  • Automatic key term extraction
  • Topic modeling (probabilistic models)
  • Supervised classification with active learning
  • Mixed methods integration
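The first two methods in this list can be sketched in a few lines. The example below counts term frequencies and sentence-level co-occurrences over already tokenized sentences. It is a hedged illustration under stated assumptions: the function names are hypothetical, the sentence is assumed as the co-occurrence window, and raw counts are used instead of a statistical significance measure; none of this is taken from LCM itself.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List


def term_frequencies(sentences: Iterable[List[str]]) -> Counter:
    """Count how often each token occurs across all sentences."""
    counts: Counter = Counter()
    for tokens in sentences:
        counts.update(tokens)
    return counts


def sentence_cooccurrences(sentences: Iterable[List[str]]) -> Counter:
    """Count, for each unordered term pair, the sentences containing both.

    Assumption: the co-occurrence window is one sentence. Pairs are stored
    as alphabetically sorted tuples so that (a, b) and (b, a) are one key.
    """
    pairs: Counter = Counter()
    for tokens in sentences:
        for a, b in combinations(sorted(set(tokens)), 2):
            pairs[(a, b)] += 1
    return pairs


if __name__ == "__main__":
    # Made-up toy input, for illustration only.
    toy = [["markt", "staat", "reform"], ["markt", "staat"], ["staat", "krise"]]
    print(term_frequencies(toy).most_common(2))
    print(sentence_cooccurrences(toy)[("markt", "staat")])
```

Raw pair counts favor frequent words, so a practical analysis would usually rank pairs with an association measure (for example log-likelihood or pointwise mutual information) rather than by count alone.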

Data Capabilities

  • Handles millions of documents
  • Import formats: XML, CSV, HTML, DOC, DOCX, RTF, PDF, plain text
  • Metadata management
  • Structured and unstructured data support
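As an illustration of how a structured import with metadata might look, the sketch below reads a CSV file into document records, treating every column other than the identifier and the text as metadata. The Document class, the read_csv_corpus function, and the column names "id" and "text" are assumptions made for this example; they do not describe LCM's actual import format.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Dict, List, TextIO


@dataclass
class Document:
    """One corpus document: an identifier, its text, and free-form metadata."""
    doc_id: str
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def read_csv_corpus(handle: TextIO, id_column: str = "id",
                    text_column: str = "text") -> List[Document]:
    """Read documents from a CSV with a header row.

    Assumption: the id and text column names are hypothetical defaults.
    All remaining columns are kept as metadata.
    """
    documents = []
    for row in csv.DictReader(handle):
        doc_id = row.pop(id_column)
        text = row.pop(text_column)
        documents.append(Document(doc_id, text, dict(row)))
    return documents


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = "id,date,source,text\n1,1949-05-23,Zeitung A,Erster Artikel\n"
    for doc in read_csv_corpus(io.StringIO(sample)):
        print(doc.doc_id, doc.metadata, doc.text)
```

Keeping metadata as a plain dictionary lets fields such as date or source be used later to filter or group documents without fixing a schema in advance.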

Research Applications

Political Science

LCM was used to analyze 3.5 million news articles spanning 60 years of German newspaper history in studies of post-democracy and neoliberalism.

Digital Humanities

  • Distant reading of large corpora
  • Close reading integration
  • Literary analysis at scale
  • Historical text analysis

Social Sciences

  • Media discourse analysis
  • Market research
  • Content analysis
  • Longitudinal studies

Technical Architecture

  • Platform: Software as a Service (SaaS)
  • Interface: Web-based GUI for non-technical users
  • Scalability: Designed for very large collections
  • NLP Backend: Automated linguistic preprocessing (see Text Processing Pipeline above)
  • Collaboration: Multi-user environment

Mixed Methods Approach

LCM combines:

  • “Close reading” for individual document analysis
  • “Distant reading” for large-scale patterns
  • Quantitative text mining methods
  • Qualitative content analysis
  • Interactive exploration workflows

Versions

iLCM (Interactive Leipzig Corpus Miner)

The latest version of the platform, with a focus on:

  • Enhanced flexibility
  • Improved scalability
  • Extensive mixed-method support
  • User-friendly interface

Research Projects

LCM was developed through interdisciplinary collaborations, including:

  • “Postdemokratie und Neoliberalismus” (ePol)
  • Various DH and social science initiatives

Advantages

  • No programming required
  • Handles very large datasets
  • Reproducible research designs
  • Collaborative features
  • Academic focus
  • Mixed methods support

Related Tools