Logically Consistent Language Models via Neuro-Symbolic Integration
Authors: Diego Calanzone, Stefano Teso, Antonio Vergari
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show how, given incomplete factual knowledge (e.g., by providing only a limited number of known facts), the LLM can learn truth beliefs for new facts while keeping logical consistency w.r.t. prior knowledge. Moreover, our method allows LLMs to extrapolate to unseen but semantically similar factual knowledge, represented in unseen datasets, more systematically. Code available at https://github.com/ddidacus/loco-llm. In our experiments, with a single offline training session, LLMs trained with our objective outperform models relying on external solvers, and are more factual and logically consistent in low-data regimes when compared to standard supervised fine-tuning over KBs of facts. |
| Researcher Affiliation | Academia | Diego Calanzone, DISI, University of Trento, EMAIL; Stefano Teso, CIMeC & DISI, University of Trento, EMAIL; Antonio Vergari, School of Informatics, University of Edinburgh, EMAIL |
| Pseudocode | No | The paper includes a diagram in Figure 1 titled "Pipeline of our Logically Consistent (LoCo) LLMs" which illustrates the process, but it does not contain a structured pseudocode block or an algorithm formally labeled as such. |
| Open Source Code | Yes | Code available at https://github.com/ddidacus/loco-llm. |
| Open Datasets | Yes | We train LOCO-LMS on BeliefBank (Kassner et al., 2021). We use the three splits as in Mitchell et al. (Mitchell et al., 2022): a calibration set of 1,072 annotated facts about 7 entities of the form (subject, property, true/false) used for training, a silver set of 12,636 facts about 85 entities used for evaluation, and a set of 2,224 valid abstract logical implications. [...] For this purpose, the ConceptNet dataset (Speer et al., 2018b) is a rich source of knowledge about entity properties and relationships. [...] We evaluate LOCO-LMS on the EntailmentBank (Dalvi et al., 2022) test split, as proposed by Kassner et al. (2023) to reason on entailment trees. |
| Dataset Splits | Yes | We use the three splits as in Mitchell et al. (Mitchell et al., 2022): a calibration set of 1,072 annotated facts about 7 entities of the form (subject, property, true/false) used for training, a silver set of 12,636 facts about 85 entities used for evaluation, and a set of 2,224 valid abstract logical implications. [...] We use 90% and 10% of T1 facts for training and validation, respectively; T2 facts for testing. |
| Hardware Specification | Yes | We fine-tune our models for 3 epochs with a learning rate fixed to γ = 3·10⁻⁴, batch size 4 with gradient accumulation (64/16 steps), on one NVIDIA A30 24GB GPU. [...] we fine-tune our models for 5 epochs keeping the learning rate fixed to γ = 3·10⁻⁴, batch size 64, on 1 NVIDIA A100-40GB GPU. |
| Software Dependencies | No | The paper mentions several tools and models like "Macaw-Large (Tafjord & Clark, 2021)", "LLaMA-2 (Touvron et al., 2023)", "AdamW (Loshchilov & Hutter, 2016)", "LoRA (Hu et al., 2021)", and "PySDD (pys, 2017)", but it does not specify concrete version numbers for any of these software components or libraries, which is required for a reproducible description. |
| Experiment Setup | Yes | We fine-tune our models for 3 epochs with a learning rate fixed to γ = 3·10⁻⁴, batch size 4 with gradient accumulation (64/16 steps), on one NVIDIA A30 24GB GPU. We use AdamW (Loshchilov & Hutter, 2016) as optimizer with a default weight decay λ = 10⁻². [...] We limit the generation to 4 tokens following the input. We adopt a similar set of hyperparameters to LoRA: we fine-tune our models for 5 epochs keeping the learning rate fixed to γ = 3·10⁻⁴, batch size 64, on 1 NVIDIA A100-40GB GPU. We use AdamW (Loshchilov & Hutter, 2016) as optimizer with a default weight decay λ = 10⁻². [...] greedy sampling strategy, temperature t = 1.0 and dropout disabled. |
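The hyperparameters quoted in the table can be collected into a single configuration object. The sketch below is not the authors' code; it only restates the reported values (3 or 5 epochs, learning rate 3·10⁻⁴, AdamW weight decay 10⁻²) and checks one arithmetic reading of "batch size 4 with gradient accumulation (64/16 steps)": an effective batch of 64 reached via 16 accumulation steps. The class and field names are illustrative assumptions.

```python
# Hedged sketch (not the paper's code): the fine-tuning hyperparameters
# reported in the reproducibility table, gathered into one config, with a
# check that micro-batch size x accumulation steps gives the effective
# batch size. The "(64/16 steps)" interpretation is an assumption.
from dataclasses import dataclass


@dataclass
class FinetuneConfig:
    epochs: int
    learning_rate: float
    micro_batch_size: int
    grad_accum_steps: int
    weight_decay: float = 1e-2  # AdamW default reported in the paper

    @property
    def effective_batch_size(self) -> int:
        # Gradients are accumulated over several micro-batches before
        # each optimizer step, so the effective batch is their product.
        return self.micro_batch_size * self.grad_accum_steps


# BeliefBank run: 3 epochs, lr 3e-4, micro-batch 4, 16 accumulation steps.
belief_bank = FinetuneConfig(epochs=3, learning_rate=3e-4,
                             micro_batch_size=4, grad_accum_steps=16)

# EntailmentBank run: 5 epochs, lr 3e-4, batch 64, no accumulation.
entailment_bank = FinetuneConfig(epochs=5, learning_rate=3e-4,
                                 micro_batch_size=64, grad_accum_steps=1)

print(belief_bank.effective_batch_size)  # 64
```

Under this reading, both runs use the same effective batch size of 64; only the per-step memory footprint differs between the A30 and A100 setups.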