Improving Cooperation in Language Games with Bayesian Inference and the Cognitive Hierarchy
Authors: Joseph Bills, Christopher Archibald, Diego Blaylock
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test this approach by constructing Bayesian agents for the game of Codenames, and show that they perform better in experiments where semantics is uncertain. Experimental evaluation of the agents will be given in Section 6. Table 2 shows the win-rate performance of the two Bayesian spymasters against different groups of guessers. |
| Researcher Affiliation | Academia | Joseph Bills, Christopher Archibald, Diego Blaylock. Computer Science Department, Brigham Young University, Provo, UT, USA |
| Pseudocode | Yes | Figure 1: Bayesian Guesser evaluation of board card c |
| Open Source Code | No | The paper does not provide a direct link to a source-code repository, an explicit statement of code release, or mention of code in supplementary materials for the methodology described. |
| Open Datasets | Yes | Word2Vec (w2v) trained using word context windows (Mikolov et al. 2013b). Dict2Vec (d2v) similar to w2v but trained on cleaned dictionary entries, with an improvement on semantic similarity tasks (Tissier, Gravier, and Habrard 2017). FastText (ftxt) uses bags of character n-grams with weighting by position (Mikolov et al. 2018). GloVe (g1, g3) trained on pre-computed statistical co-occurrence probabilities for words in a corpus (Pennington, Socher, and Manning 2014). ConceptNet Numberbatch (cnnb) uses retrofitting to incorporate the ConceptNet knowledge graph into an embedding (Speer, Chin, and Havasi 2017). ELMo (elmo) a 1024-dimensional de-contextualized embedding derived from 3 layers of a trained contextual model (Peters et al. 2018). |
| Dataset Splits | No | The paper describes playing '500 games' for each pairing, which implies simulation runs rather than a traditional dataset with specific training/test/validation splits in the machine learning sense. No explicit dataset splits are mentioned. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions several word embedding models (e.g., Word2Vec, GloVe) but does not provide specific version numbers for any software libraries, programming languages, or solvers used in its implementation. |
| Experiment Setup | Yes | The Bayesian spymasters used 10 samples, and the Bayesian guessers used 1000 or 10,000 samples. Each pairing played 500 games. To more efficiently calculate the set of possible clues, the 300 nearest neighbors of each word were precomputed. The probability that a perturbed vector would fall in the Voronoi region for any clue was precomputed using 1000 samples at each noise level. |
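The precomputation described in the last row — estimating, via sampling, the probability that a noise-perturbed word vector falls in a given clue's Voronoi region — can be sketched as a simple Monte Carlo routine. This is an illustrative reconstruction, not the authors' code: the function name, the isotropic Gaussian noise model, and the Euclidean nearest-clue rule are assumptions; the paper restricts clue candidates to each word's 300 precomputed nearest neighbors.

```python
import numpy as np

def voronoi_probability(word_vec, clue_vecs, clue_index,
                        noise_std, n_samples=1000, seed=None):
    """Monte Carlo estimate of the probability that a perturbed copy of
    `word_vec` lands in the Voronoi region of clue `clue_index`, i.e. is
    closer to that clue vector than to any other candidate clue.

    Assumes isotropic Gaussian perturbation noise (a hypothetical choice;
    the paper only specifies sampling at several noise levels)."""
    rng = np.random.default_rng(seed)
    # Draw n_samples perturbed copies of the word vector at this noise level.
    samples = word_vec + rng.normal(scale=noise_std,
                                    size=(n_samples, word_vec.shape[0]))
    # Distance from every sample to every candidate clue vector.
    dists = np.linalg.norm(samples[:, None, :] - clue_vecs[None, :, :], axis=2)
    # A sample is "in" a clue's Voronoi region when that clue is its nearest.
    nearest = dists.argmin(axis=1)
    return float((nearest == clue_index).mean())
```

With 1000 samples per word, clue, and noise level (as the setup row reports), these estimates can be cached in a lookup table so the Bayesian guesser never re-samples during play.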