Open science package for LLM-powered semantic synthesis and precise extraction of information from unstructured texts.
What is alembica?
alembica is an open-source tool that leverages Large Language Models (LLMs) to extract structured datasets from unstructured text corpora. It provides an automated and scalable way to process, synthesize, and transform textual data into structured formats suitable for analysis.
Key Features
- Validation of Input: Ensures input queries follow structured formats for accurate model interaction.
- Cost Assessment: Computes costs associated with token usage, considering different LLM pricing models.
- Data Extraction: Converts unstructured text into structured datasets, enabling easier analysis and integration.
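As a rough illustration of the input-validation idea, the sketch below checks a query before it would be sent to a model. It does not use alembica's API: the field names (`model`, `instructions`, `text`) and the `validate_query` helper are hypothetical, chosen only to show the pattern, and the package's actual input schema may differ.

```python
from typing import Any

# Hypothetical required fields and their types; alembica's real schema may differ.
REQUIRED_FIELDS = {"model": str, "instructions": str, "text": str}


def validate_query(query: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the query passed."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in query:
            problems.append(f"missing field: {name}")
        elif not isinstance(query[name], expected_type):
            problems.append(f"field {name} should be {expected_type.__name__}")
        elif not query[name].strip():
            problems.append(f"field {name} is empty")
    return problems


if __name__ == "__main__":
    # A query with a missing field is reported before any model call is made.
    print(validate_query({"model": "example-model", "text": "Some raw text."}))
```

Returning a list of problems instead of raising on the first one lets a caller report every issue in a batch of queries at once.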
Why Use alembica?
- Automated Semantic Extraction: Streamlines the conversion of free text into structured formats.
- Replicability & Open Science: Ensures consistent, unbiased analysis of textual data.
- Cost Efficiency: Provides cost estimations based on token usage, helping users optimize processing expenses.
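The arithmetic behind a token-based cost estimate can be sketched as follows. This is a minimal example under stated assumptions, not alembica's own implementation: the `estimate_cost` function is hypothetical, and the prices are placeholders quoted per million tokens, so real provider rates must be substituted.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate the cost of one request from token counts and per-million prices."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Input and output tokens are usually priced separately.
    total = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Placeholder prices: 2.00 per million input tokens, 6.00 per million output tokens.
    print(estimate_cost(1000, 500, 2.0, 6.0))
```

Summing such estimates over a corpus before running it gives an upper-level budget check, provided the token counts come from the same tokenizer the chosen model uses.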
Get Started
License
alembica is released under the GNU Affero General Public License v3.
Citation
Boero, R. (2025). alembica - Open Science Software for Semantic Synthesis and Extraction of Information from Unstructured Sources. Zenodo. https://doi.org/10.5281/zenodo.14899666