Overview
Lexos is a web-based, integrated workflow tool for computational text analysis. It supports the study of literary and historical texts by combining preprocessing, analysis, and visualization in a single environment.
- Institution: Wheaton College, Massachusetts
- Type: Web-based text analysis platform
- License: Open source
- Access: Free web application, downloadable for local use
Website
GitHub Repository
Key Features
Text Processing
- Unicode support for diverse languages
- Import HTML, XML, and plain text
- Advanced “scrubbing” tools
- Punctuation and whitespace handling
- Stop word removal
- Lemmatization
- Markup tag processing
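As a rough illustration of what a scrubbing pass involves, the sketch below strips markup tags, lowercases the text, removes Unicode punctuation, collapses whitespace, and drops stop words. It is not Lexos's own code: the function name `scrub`, the tag regex, and the tiny stop word list are assumptions made for this example, it uses only the Python standard library, and it does not attempt lemmatization.

```python
import re
import unicodedata

# Hypothetical stop word list for illustration only; not the list Lexos ships.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in"}

# Naive tag pattern: fine for simple markup, not a full HTML/XML parser.
TAG_RE = re.compile(r"<[^>]+>")


def scrub(text, stop_words=STOP_WORDS):
    """Return a lowercased copy of text without tags, punctuation, or stop words."""
    # Normalize to NFC so composed and decomposed characters compare equal.
    text = unicodedata.normalize("NFC", text)
    # Replace each markup tag with a space so adjacent words stay separated.
    text = TAG_RE.sub(" ", text)
    text = text.lower()
    # Drop every character in a Unicode punctuation category (P*).
    # Note that this joins hyphenated words and contractions ("don't" -> "dont").
    text = "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )
    # split() with no arguments also collapses runs of whitespace.
    tokens = text.split()
    return " ".join(t for t in tokens if t not in stop_words)


print(scrub("<p>The Wanderer, and the Sea!</p>"))
```

Because punctuation is detected by Unicode category rather than an ASCII list, the same pass works for texts with non-ASCII letters such as Old English "Hwæt!".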
Analysis Methods
- Document Term Matrix generation
- N-gram tokenization
- Hierarchical clustering
- K-means clustering
- Cosine similarity analysis
- Z-score analysis
- Principal Component Analysis (PCA)
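Several of the methods above start from the same structure, a document-term matrix of token counts. The sketch below shows one minimal way to build such a matrix from n-grams and compare two documents by cosine similarity. It deliberately uses only the standard library rather than scikit-learn, and the function names (`ngrams`, `document_term_matrix`, `cosine_similarity`) are illustrative assumptions, not Lexos's API.

```python
import math
from collections import Counter


def ngrams(tokens, n=1):
    """Return the n-grams of a token list, each joined by a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def document_term_matrix(docs, n=1):
    """Return (vocab, rows), where rows[i][j] counts term vocab[j] in docs[i]."""
    # Whitespace tokenization is an assumption; real texts need scrubbing first.
    counts = [Counter(ngrams(doc.split(), n)) for doc in docs]
    vocab = sorted(set().union(*counts))
    rows = [[c[term] for term in vocab] for c in counts]
    return vocab, rows


def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


docs = ["the cat sat", "the cat ran", "dogs bark"]
vocab, rows = document_term_matrix(docs)
print(vocab)
print(cosine_similarity(rows[0], rows[1]))  # two of three terms shared
print(cosine_similarity(rows[0], rows[2]))  # no shared terms
```

The same rows could be passed to a clustering or PCA routine; raw counts are used here for brevity, whereas comparisons between documents of different lengths usually work from relative frequencies.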
Visualization Tools
- Word clouds
- Comparative multiclouds
- Dendrograms
- Rolling window analysis
- Topic modeling (via MALLET)
- Statistical graphs
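Rolling window analysis is the least self-explanatory item in this list, so a small sketch may help. The idea is to slide a fixed-size window across a text one token at a time and record how frequent a chosen term is inside each window; plotting the resulting series shows where in the text the term clusters. The function name `rolling_average` and its parameters are assumptions for illustration and do not reflect how Lexos implements the feature.

```python
def rolling_average(tokens, term, window):
    """Relative frequency of term in each run of `window` consecutive tokens."""
    if window <= 0 or window > len(tokens):
        raise ValueError("window must be between 1 and len(tokens)")
    # 1 where the token matches the term, 0 elsewhere.
    hits = [1 if t == term else 0 for t in tokens]
    # Count matches in the first window, then update incrementally:
    # add the token entering the window, subtract the one leaving it.
    count = sum(hits[:window])
    averages = [count / window]
    for i in range(window, len(tokens)):
        count += hits[i] - hits[i - window]
        averages.append(count / window)
    return averages


tokens = "a b a a b b".split()
print(rolling_average(tokens, "a", window=3))
```

The result has one value per window position, that is `len(tokens) - window + 1` values. Larger windows give a smoother curve at the cost of blurring short local changes.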
Workflow Integration
- Single environment for complete analysis
- Preprocessing to visualization pipeline
- Export capabilities for further analysis
- Session management
The Lexomics Approach
Lexos implements “lexomics”, the application of genomic analysis techniques to texts. The approach:
- Detects subtle linguistic patterns
- Complements traditional close reading
- Supports philological analysis
- Reveals authorship patterns
- Identifies textual relationships
Research Applications
Literary Studies
- Authorship attribution
- Style analysis
- Genre classification
- Textual comparison
- Medieval manuscript analysis
Historical Research
- Document clustering
- Chronological analysis
- Source comparison
- Language evolution studies
Specialized Use Cases
- Early texts and manuscripts
- Non-Western languages
- Complex linguistic patterns
- Pedagogical applications
Technical Architecture
- Backend: Python 3.7 with Flask
- Frontend: jQuery, D3.js, Plotly
- Processing: scikit-learn, NLTK
- Server and templating: Werkzeug (WSGI), Jinja2 (templates)
- Deployment: Web or local installation
Development History
- Developed over 12+ summers
- Collaboration between students and faculty
- Previously funded by the National Endowment for the Humanities (NEH)
- Continuous community development
- Regular updates and improvements
Educational Features
- Designed for non-technical users
- Extensive documentation
- Tutorial materials
- Sample datasets
- Classroom-friendly interface
Advantages
- No programming required
- All-in-one workflow
- Strong Unicode support
- Active development
- Educational focus
- Free and open source
Related Tools
- Voyant Tools: Web text analysis
- AntConc: Corpus toolkit
- MALLET: Topic modeling