Overview
Leipzig Corpus Miner (LCM) is an integrated research environment for text mining and content analysis that combines quantitative and qualitative approaches. It is delivered as Software as a Service (SaaS) and is designed to make advanced NLP techniques accessible to humanities and social science researchers.
- Institution: Leipzig University, Germany
- Type: Text mining and corpus analysis platform
- Access: Web-based SaaS platform
Website
https://ilcm.informatik.uni-leipzig.de/
Key Features
Text Processing Pipeline
- Sentence segmentation
- Part-of-speech tagging
- Named entity recognition
- Lemmatization and stemming
- Language detection
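To make the first pipeline steps concrete, here is a minimal sketch of rule-based sentence segmentation and tokenization using only the Python standard library. It is not LCM's implementation: the names `split_sentences` and `tokenize` and the regular expressions are assumptions made for illustration, and a production pipeline would normally rely on a trained NLP library for these steps.

```python
import re

# Hypothetical helpers for illustration; not part of LCM.
# Split after ., ! or ? when followed by whitespace and an uppercase letter.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-ZÄÖÜ])")
TOKEN = re.compile(r"\w+")


def split_sentences(text: str) -> list[str]:
    """Naive sentence segmentation based on punctuation and capitalization."""
    return [s.strip() for s in SENTENCE_BOUNDARY.split(text) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    """Lowercased word tokens; punctuation is dropped."""
    return [t.lower() for t in TOKEN.findall(sentence)]


text = "The corpus was imported. Each article is segmented! Is it tagged afterwards?"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]
```

A rule like this fails on abbreviations such as "Dr. Smith", which is one reason real pipelines use statistical or dictionary-backed segmenters.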
Corpus Management
- Multiple corpora per user
- Corpus sharing and collaboration
- Incremental document addition
- Version control for reproducibility
Analysis Methods
- Frequency analysis
- Co-occurrence analysis
- Automatic key term extraction
- Topic modeling (probabilistic models)
- Supervised classification with active learning
- Mixed methods integration
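Frequency and co-occurrence analysis from the list above can be sketched in a few lines. The example counts in how many sentences each term appears and how often two terms share a sentence, then scores pairs with the Dice coefficient. The function names, the sentence-level window, and the choice of Dice as the association measure are assumptions for illustration; the overview does not say which measures LCM uses.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(sentences):
    """Return (term_freq, pair_freq) counted once per sentence."""
    term_freq = Counter()
    pair_freq = Counter()
    for tokens in sentences:
        types = sorted(set(tokens))  # each term counts once per sentence
        term_freq.update(types)
        pair_freq.update(combinations(types, 2))  # pairs in alphabetical order
    return term_freq, pair_freq


def dice(a, b, term_freq, pair_freq):
    """Dice coefficient: 2 * joint frequency / sum of individual frequencies."""
    pair = tuple(sorted((a, b)))
    denom = term_freq[a] + term_freq[b]
    return 2 * pair_freq[pair] / denom if denom else 0.0


# Toy input: already tokenized sentences (invented data).
sentences = [
    ["market", "reform", "policy"],
    ["market", "policy"],
    ["democracy", "reform"],
]
term_freq, pair_freq = cooccurrence(sentences)
score = dice("market", "policy", term_freq, pair_freq)
```

In this toy input "market" and "policy" always appear together, so their Dice score is 1.0, while "market" and "reform" share only one of their sentences and score 0.5.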
Data Capabilities
- Handles millions of documents
- Import formats: XML, CSV, HTML, DOC, DOCX, RTF, PDF, plain text
- Metadata management
- Structured and unstructured data support
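As an illustration of how a CSV import with metadata might look, the sketch below reads rows into simple document records and treats every column other than the ID and text as metadata. The `Document` class, the `load_csv` function, and the column names `id`, `text`, `date`, and `source` are hypothetical; LCM's actual import schema is not described here.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical record: an ID, the full text, and free-form metadata."""
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)


def load_csv(handle, id_column="id", text_column="text"):
    """Read documents from a CSV file object; remaining columns become metadata."""
    docs = []
    for row in csv.DictReader(handle):
        doc_id = row.pop(id_column)
        text = row.pop(text_column)
        docs.append(Document(doc_id, text, metadata=row))
    return docs


# Invented sample data standing in for an uploaded file.
sample = io.StringIO(
    "id,text,date,source\n"
    "1,First article body.,1990-05-01,Paper A\n"
    "2,Second article body.,2001-11-12,Paper B\n"
)
docs = load_csv(sample)
```

Keeping metadata as an open dictionary lets the same record type hold documents from sources with different fields, at the cost of no validation on those fields.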
Research Applications
Political Science
LCM was used to analyze 3.5 million news articles spanning 60 years of German newspaper history in studies of post-democracy and neoliberalism.
Digital Humanities
- Distant reading of large corpora
- Close reading integration
- Literary analysis at scale
- Historical text analysis
Social Sciences
- Media discourse analysis
- Market research
- Content analysis
- Longitudinal studies
Technical Architecture
- Platform: Software as a Service (SaaS)
- Interface: Web-based GUI for non-technical users
- Scalability: Designed for very large collections
- NLP Backend: Server-side processing pipeline (segmentation, tagging, entity recognition)
- Collaboration: Multi-user environment
Mixed Methods Approach
LCM combines:
- “Close reading” for individual document analysis
- “Distant reading” for large-scale patterns
- Quantitative text mining methods
- Qualitative content analysis
- Interactive exploration workflows
Versions
iLCM (Interactive Leipzig Corpus Miner)
Latest development focusing on:
- Enhanced flexibility
- Improved scalability
- Extensive mixed-method support
- User-friendly interface
Research Projects
Developed through interdisciplinary collaborations including:
- “Postdemokratie und Neoliberalismus” (ePol)
- Various DH and social science initiatives
Advantages
- No programming required
- Handles very large datasets
- Reproducible research designs
- Collaborative features
- Academic focus
- Mixed methods support
Related Tools
- Voyant Tools - Web-based text analysis
- CATMA - Collaborative text annotation
- CorpusExplorer - Corpus analysis