From Tokens to Lattices: Emergent Lattice Structures in Language Models
Authors: Bo Xiong, Steffen Staab
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We create three datasets for evaluation, and the empirical results verify our hypothesis. In this section, we evaluate whether the formal contexts constructed from MLMs align with established gold standards and assess their capability to reconstruct concept lattices. We substantiate our findings with additional ablation studies and case analyses in Sec. 4.4. |
| Researcher Affiliation | Academia | Bo Xiong: Stanford University, United States; University of Stuttgart, Germany. Steffen Staab: University of Stuttgart, Germany; University of Southampton, United Kingdom. |
| Pseudocode | Yes | Algorithm 1: A Simple Formal Context Learning Algorithm. Input: a dataset D, a set of objects G, and a set of attributes M. Output: an estimated formal context incidence matrix I ∈ [0, 1]^(\|G\|×\|M\|). Initialize I = 0^(\|G\|×\|M\|); for each w ∈ D: for each wi ∈ w: for each wj ∈ w: if wi ∈ G and wj ∈ M, then I(G_wi, M_wj) = I(G_wi, M_wj) + 1. Normalization: I = normalize(I); return I. |
| Open Source Code | Yes | Our code and datasets are included as supplemental materials and will be made available upon acceptance. |
| Open Datasets | Yes | We construct three new datasets of formal contexts in different domains, serving as the gold standards for evaluation. Two of them are derived from commonsense knowledge, and the third one is from the biomedical domain. 1) Region-language details the official languages used in different administrative regions around the world. This dataset is extracted from Wiki44k (Ho et al., 2018)... 3) Disease-symptom describes the symptoms associated with various diseases. We extracted diseases represented by a single token and their symptoms from a dataset available on Kaggle. Our code and datasets are included as supplemental materials and will be made available upon acceptance. |
| Dataset Splits | No | The paper describes creating three new datasets (Region-language, Animal-behavior, Disease-symptom) for evaluation and discusses performance metrics like MRR and Hit@k, as well as F1 score and mAP for concept classification. However, it does not explicitly mention any specific training, validation, or test splits for these datasets within the provided text. |
| Hardware Specification | Yes | Computational resources: All experiments were conducted on machines equipped with 4 Nvidia A100 GPUs. |
| Software Dependencies | No | We use the PyTorch Transformer library for all model implementations. The paper mentions the PyTorch Transformer library but does not specify a version number. |
| Experiment Setup | Yes | We instantiate our approach (i.e. Def. 8) with BERT and dub it as Bert Lattice. We consider three variants of BERT models: BERT-distill, BERT-base, and BERT-large. For each model, we use the uncased versions and compare the Average pooling and Max pooling variants, denoted as Bert Lattice (avg.) and Bert Lattice (max.), respectively. ... we apply a min-max normalization approach where, given a threshold α, the binarization is performed as follows: I_(g,m) = 1 if (log Î_(g,m) − min log Î) / (max log Î − min log Î) > α, else 0. ... Fig. 2d illustrates the resulting lattice with α = 0.5. |
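The pseudocode in the table (Algorithm 1) together with the min-max log-normalization from the experiment setup can be sketched as a single function. This is a minimal illustration, not the authors' implementation: sentence tokenization by whitespace splitting, the function name, and the choice of min-max normalization over log counts are assumptions filled in here.

```python
import math

def learn_formal_context(sentences, objects, attributes, alpha=0.5):
    """Estimate a formal context incidence matrix I (Algorithm 1 sketch):
    count object/attribute token co-occurrences within each sentence,
    then binarize via min-max-normalized log counts at threshold alpha."""
    # stable index maps for objects G and attributes M
    G = {g: i for i, g in enumerate(sorted(objects))}
    M = {m: j for j, m in enumerate(sorted(attributes))}
    counts = [[0.0] * len(M) for _ in G]

    # Algorithm 1 core: for each sentence w, increment I(G_wi, M_wj)
    # whenever token wi is an object and token wj is an attribute
    for sent in sentences:
        tokens = sent.split()  # assumption: whitespace tokenization
        for wi in tokens:
            for wj in tokens:
                if wi in G and wj in M:
                    counts[G[wi]][M[wj]] += 1

    # binarization: min-max normalize log counts, keep entries above alpha
    logs = [math.log(c) for row in counts for c in row if c > 0]
    lo, hi = min(logs), max(logs)
    span = (hi - lo) or 1.0  # guard against a constant count matrix
    return [
        [1 if c > 0 and (math.log(c) - lo) / span > alpha else 0
         for c in row]
        for row in counts
    ]
```

With a toy corpus of five sentences over objects {cat, dog} and attributes {meows, barks}, the rarer co-occurrence (cat, barks) falls below the α = 0.5 threshold and is zeroed out, while the two dominant pairs survive in the binary incidence matrix.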