Lexos - Integrated Text Analysis Platform

Overview

Lexos is a web-based workflow tool for computational text analysis. It supports the study of literary and historical texts by combining preprocessing, analysis, and visualization in a single environment.

  • Institution: Wheaton College, Massachusetts
  • Type: Web-based text analysis platform
  • License: Open source
  • Access: Free web application, downloadable for local use

Website

GitHub Repository

Key Features

Text Processing

  • Unicode support for diverse languages
  • Import HTML, XML, and plain text
  • Advanced “scrubbing” tools
  • Punctuation and whitespace handling
  • Stop word removal
  • Lemmatization
  • Markup tag processing
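
The scrubbing steps listed above can be sketched in a few lines of Python. This is a minimal illustration and not Lexos's own code: the function name `scrub`, the stop word list, and the lemma map are all assumptions made for the example. It removes markup tags, lowercases, strips Unicode punctuation, collapses whitespace, drops stop words, and maps word forms to lemmas.

```python
import re
import unicodedata

# Hypothetical stop word list and lemma map, for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in"}
LEMMAS = {"kings": "king", "ruled": "rule"}

TAG_RE = re.compile(r"<[^>]+>")


def strip_punctuation(text):
    """Drop every character whose Unicode category is punctuation (P*)."""
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )


def scrub(text, stop_words=STOP_WORDS, lemmas=LEMMAS):
    """Return a scrubbed, space-separated token string."""
    text = TAG_RE.sub(" ", text)       # remove tags, keep their content
    text = text.lower()
    text = strip_punctuation(text)
    tokens = text.split()              # also collapses runs of whitespace
    tokens = [lemmas.get(t, t) for t in tokens if t not in stop_words]
    return " ".join(tokens)


print(scrub("<p>The Kings  ruled, and the land prospered.</p>"))
# prints: king rule land prospered
```

Using Unicode categories rather than an ASCII punctuation table keeps the sketch usable for texts with non-ASCII quotation marks and similar characters.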

Analysis Methods

  • Document Term Matrix generation
  • N-gram tokenization
  • Hierarchical clustering
  • K-means clustering
  • Cosine similarity analysis
  • Z-score analysis
  • Principal Component Analysis (PCA)
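
To make the first items in this list concrete, the sketch below builds a small document-term matrix from n-gram counts and compares two documents by cosine similarity. It uses only the Python standard library and is an assumption-laden illustration: the function names and the sample documents are invented here, and Lexos's own implementation (which the Technical Architecture section says relies on scikit-learn) may differ.

```python
import math
from collections import Counter


def ngrams(tokens, n=1):
    """Return the word n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def document_term_matrix(docs, n=1):
    """Build a DTM from a dict of name -> text.

    Returns (vocab, matrix): vocab is a sorted list of terms and
    matrix maps each document name to its list of raw term counts.
    """
    counts = {
        name: Counter(ngrams(text.split(), n)) for name, text in docs.items()
    }
    vocab = sorted(set().union(*counts.values()))
    matrix = {name: [c[term] for term in vocab] for name, c in counts.items()}
    return vocab, matrix


def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical sample documents.
docs = {"a": "the cat sat", "b": "the cat ran", "c": "dogs bark"}
vocab, matrix = document_term_matrix(docs)
print(vocab)  # ['bark', 'cat', 'dogs', 'ran', 'sat', 'the']
print(round(cosine_similarity(matrix["a"], matrix["b"]), 3))  # 0.667
```

Documents "a" and "b" share two of their three words, so their similarity is 2/3; "a" and "c" share none, so theirs is 0.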

Visualization Tools

  • Word clouds
  • Comparative multiclouds
  • Dendrograms
  • Rolling window analysis
  • Visualization of topic models generated with MALLET
  • Statistical graphs
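
Rolling window analysis, listed above, can be illustrated with a short sketch: slide a fixed-size window across a tokenized document and record how much of each window a chosen term occupies. The function name `rolling_average` and the token-based window are assumptions for this example, not a description of how Lexos computes its plots.

```python
def rolling_average(tokens, term, window=4):
    """Fraction of tokens equal to `term` in each window of `window` tokens.

    Returns one value per window position; an empty list if the window
    is not positive or is longer than the document.
    """
    if window <= 0 or window > len(tokens):
        return []
    hits = [1 if t == term else 0 for t in tokens]
    current = sum(hits[:window])
    averages = [current / window]
    for i in range(window, len(tokens)):
        # Slide by one token: add the entering token, drop the leaving one.
        current += hits[i] - hits[i - window]
        averages.append(current / window)
    return averages


tokens = "a b a a b b b a".split()
print(rolling_average(tokens, "a", window=4))
# prints: [0.75, 0.5, 0.5, 0.25, 0.25]
```

Plotting these values against the window position shows where in a text a term becomes more or less frequent.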

Workflow Integration

  • Single environment for complete analysis
  • Preprocessing to visualization pipeline
  • Export capabilities for further analysis
  • Session management

The Lexomics Approach

Lexos implements “lexomics”, the application of techniques from genomic analysis to texts. The approach:

  • Detects subtle linguistic patterns
  • Complements traditional close reading
  • Supports philological analysis
  • Reveals authorship patterns
  • Identifies textual relationships
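
One way to picture this kind of analysis is the sketch below. It cuts texts into chunks, represents each chunk by relative word frequencies, and groups the chunks by hierarchical clustering. The helper names, the choice of Euclidean distance with average linkage, and the sample vectors are assumptions made for illustration. The sketch is not Lexos's implementation, and it returns a nested tuple where Lexos would draw a dendrogram.

```python
import math
from collections import Counter


def cut(tokens, size):
    """Cut a token list into consecutive chunks of `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def relative_frequencies(chunk, vocab):
    """Relative frequency of each vocabulary word within one chunk."""
    counts = Counter(chunk)
    total = len(chunk) or 1
    return [counts[w] / total for w in vocab]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cluster(vectors):
    """Average-linkage agglomerative clustering.

    vectors: dict of label -> frequency vector (must be non-empty).
    Returns the merge tree as nested tuples of labels.
    """
    if not vectors:
        raise ValueError("need at least one vector")
    # Each cluster is (tree, member_labels).
    clusters = [(label, [label]) for label in vectors]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                mi, mj = clusters[i][1], clusters[j][1]
                dist = sum(
                    euclidean(vectors[a], vectors[b]) for a in mi for b in mj
                ) / (len(mi) * len(mj))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        merged = (
            (clusters[i][0], clusters[j][0]),
            clusters[i][1] + clusters[j][1],
        )
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [merged]
    return clusters[0][0]


# Hypothetical frequency vectors for three chunks.
vectors = {
    "A1": [0.5, 0.5, 0.0],
    "A2": [0.4, 0.6, 0.0],
    "B1": [0.0, 0.1, 0.9],
}
print(cluster(vectors))  # ('B1', ('A1', 'A2'))
```

The two chunks with similar word usage merge first, and the outlier joins last. That grouping is the information a dendrogram conveys visually.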

Research Applications

Literary Studies

  • Authorship attribution
  • Style analysis
  • Genre classification
  • Textual comparison
  • Medieval manuscript analysis

Historical Research

  • Document clustering
  • Chronological analysis
  • Source comparison
  • Language evolution studies

Specialized Use Cases

  • Early texts and manuscripts
  • Non-Western languages
  • Complex linguistic patterns
  • Pedagogical applications

Technical Architecture

  • Backend: Python 3.7 with Flask
  • Frontend: jQuery, D3.js, Plotly
  • Processing: scikit-learn, NLTK
  • Server: Werkzeug (WSGI), with Jinja2 for templating
  • Deployment: Web or local installation

Development History

  • Developed over more than 12 summers
  • Collaboration between students and faculty
  • Previously funded by the National Endowment for the Humanities (NEH)
  • Continuous community development
  • Regular updates and improvements

Educational Features

  • Designed for non-technical users
  • Extensive documentation
  • Tutorial materials
  • Sample datasets
  • Classroom-friendly interface

Advantages

  • No programming required
  • All-in-one workflow
  • Strong Unicode support
  • Active development
  • Educational focus
  • Free and open source

Related Tools