Description
Trafilatura, by Adrien Barbaresi, is a Python library and command-line tool for extracting the cleaned main text and structured metadata (title, date, author, language) from arbitrary HTML pages. It supports many languages and can write its results in several output formats, including JSON and XML-TEI.