Trafilatura is a Python library and command-line tool for gathering text and metadata from the Web through crawling, scraping, and extraction. It outputs the cleaned main text in several formats, such as plain text, Markdown, JSON, CSV, and XML, which makes it well suited to corpus building and web archiving.
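As a rough illustration of the extraction step, the sketch below passes an HTML string to `trafilatura.extract` and prints the result. The sample page and the `extract_main_text` wrapper are hypothetical names introduced here, not part of the library, and the exact text returned depends on the installed version and its settings.

```python
from typing import Optional

# Hypothetical sample page, invented for this illustration.
SAMPLE_HTML = """
<html>
  <head><title>Example article</title></head>
  <body>
    <nav><a href="/">Home</a> <a href="/about">About</a></nav>
    <article>
      <h1>Example article</h1>
      <p>Web pages usually mix the main text with navigation, adverts,
      and other boilerplate that is of little use in a text corpus.</p>
      <p>An extraction tool tries to keep the paragraphs that carry the
      actual content and to discard the surrounding page furniture.</p>
    </article>
    <footer>Copyright notice and footer links</footer>
  </body>
</html>
"""


def extract_main_text(html: str, output_format: str = "txt") -> Optional[str]:
    """Return the main content of an HTML document, or None if nothing is found.

    Hypothetical convenience wrapper around trafilatura.extract.
    """
    # Imported inside the function so this module still loads on systems
    # where trafilatura is not installed (pip install trafilatura).
    from trafilatura import extract

    return extract(html, output_format=output_format)


if __name__ == "__main__":
    text = extract_main_text(SAMPLE_HTML)
    # extract() may return None, for example when too little content is found.
    print(text if text is not None else "No main content could be extracted.")
```

To work from a live page instead of a string, `trafilatura.fetch_url(url)` downloads the document and returns its HTML (or `None` on failure), which can then be passed to the same wrapper. Passing `output_format="json"` requests JSON output with the metadata fields alongside the text.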