Solving the Tree Containment Problem Using Graph Neural Networks
Authors: Arkadiy Dushatskiy, Esther Julien, Leen Stougie, Leo van Iersel
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose to solve it approximately using Graph Neural Networks. Our algorithm demonstrates an accuracy of over 95% in solving the tree containment problem on instances with up to 100 leaves. Experiments: Our datasets consist of network-tree pairs (N, T) along with corresponding labels indicating tree containment... We compare the performance of Combine-GNN with the aforementioned baselines. Furthermore, we study the scalability of Combine-GNN in terms of the training dataset size and increasing size of the test instances (in the inductive learning setup). Finally, we specifically study how the different design choices of the GNN used in our algorithm affect the performance. The results of statistical significance testing are provided in Appendix, Table 3. |
| Researcher Affiliation | Academia | Arkadiy Dushatskiy EMAIL, Centrum Wiskunde & Informatica, Amsterdam, the Netherlands; Esther Julien EMAIL, Delft University of Technology, Delft, the Netherlands; Leen Stougie EMAIL, Centrum Wiskunde & Informatica and Vrije Universiteit Amsterdam, Amsterdam, the Netherlands; Leo van Iersel EMAIL, Delft University of Technology, Delft, the Netherlands |
| Pseudocode | No | The paper describes the architecture of the GNN using mathematical notation for message passing, concatenation, and readout operations, but it does not provide a clearly labeled 'Pseudocode' or 'Algorithm' block with structured, step-by-step instructions. For example, it provides: 'm_i^{k,←} = AGGREGATE^{k,←}({(x_j^{k-1}, x_i^{k-1}) : (j, i) ∈ E}); m_i^{k,→} = AGGREGATE^{k,→}({(x_j^{k-1}, x_i^{k-1}) : (i, j) ∈ E}); x_i^k = COMBINE^k(x_i^{k-1}, m_i^{k,←}, m_i^{k,→})' and 'x_i^{final} = CONCATENATE([x_i^0, x_i^1, ..., x_i^L])'. |
| Open Source Code | Yes | The code and used tree containment instances are available at https://github.com/ArkadiyD/PhyloGNN. |
| Open Datasets | Yes | The code and used tree containment instances are available at https://github.com/ArkadiyD/PhyloGNN. For the real-world experiments, we use the dataset of gene trees from bacterial and archaeal genomes (proposed in Beiko (2011), binarized in van Iersel et al. (2022)). |
| Dataset Splits | Yes | In each setup (defined by the number of leaves in the training and validation/test instances), we have 10000 training samples (network-tree pairs), 1000 samples used as validation dataset, and 1000 for the test dataset. |
| Hardware Specification | Yes | We use a system with an Intel(R) Xeon(R) Silver 4110 CPU and Nvidia GeForce RTX 2080 Ti GPUs (for this experiment, only one GPU is utilized). |
| Software Dependencies | No | The paper mentions several algorithms and optimizers used, such as the 'AdamW optimizer (Loshchilov & Hutter, 2017)', 'XGBoost model (Chen & Guestrin, 2016)', 'GCN (Kipf & Welling, 2017)', 'GAT (Veličković et al., 2018)', and 'GIN (Xu et al., 2019)'. However, it does not specify the version numbers of any software libraries or frameworks used for their implementation (e.g., PyTorch, TensorFlow, scikit-learn versions). |
| Experiment Setup | Yes | We train GNNs using the AdamW optimizer (Loshchilov & Hutter, 2017) with the cosine annealing learning rate schedule; the number of training steps is 5000; batch size is 200 (similar parameters were used in Rossi et al. (2023)). We performed a grid search to find the best performing architecture and the corresponding training hyperparameters for Combine-GNN. The tuned hyperparameters and their ranges are the following (the best values are underlined): initial learning rate: {10^-4, 10^-3, 10^-2}; weight decay: {0, 10^-4, 10^-3}; dropout: {0.0, 0.2}; number of GNN layers: {3, 5}; size of node embeddings: {32, 64, 128}; type of GNN operation: {GCN, GIN, GAT}; readout operation: {mean, max, sum}. |
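The message-passing scheme quoted in the Pseudocode row can be sketched in plain numpy. This is a minimal illustrative sketch, not the authors' implementation: sum-aggregation and a random linear map with ReLU stand in for the paper's learned AGGREGATE and COMBINE operators, and the layer width is assumed fixed.

```python
import numpy as np

def gnn_layer(x, edges, rng):
    """One bidirectional message-passing step: each node aggregates
    messages over incoming and outgoing edges separately, then combines
    them with its previous embedding. Sum-aggregation and a random
    linear COMBINE are illustrative assumptions."""
    n, d = x.shape
    m_in = np.zeros((n, d))   # m^{k,<-}: messages along edges (j, i) in E
    m_out = np.zeros((n, d))  # m^{k,->}: messages along edges (i, j) in E
    for (j, i) in edges:
        m_in[i] += x[j]       # AGGREGATE over incoming edges of i
        m_out[j] += x[i]      # AGGREGATE over outgoing edges of j
    W = rng.standard_normal((3 * d, d)) / np.sqrt(3 * d)
    # COMBINE: concatenate [x, m_in, m_out], apply a linear map + ReLU
    return np.maximum(np.concatenate([x, m_in, m_out], axis=1) @ W, 0.0)

def forward(x0, edges, num_layers=3, seed=0):
    """Stack layers and concatenate all intermediate embeddings,
    mirroring x_i^final = CONCATENATE([x_i^0, ..., x_i^L])."""
    rng = np.random.default_rng(seed)
    xs = [x0]
    for _ in range(num_layers):
        xs.append(gnn_layer(xs[-1], edges, rng))
    return np.concatenate(xs, axis=1)  # per-node final embedding
```

Keeping the incoming and outgoing aggregations separate is what lets the model exploit edge directions in the rooted network and tree, which an undirected scheme would discard.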
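The hyperparameter grid quoted in the Experiment Setup row can be enumerated exhaustively with the standard library. The grid values below are taken from the quoted passage; `train_and_validate` is a hypothetical stand-in for the actual training-and-validation pipeline, which the paper does not specify at this level of detail.

```python
from itertools import product

# Tuned hyperparameters and ranges as quoted from the paper.
GRID = {
    "lr": [1e-4, 1e-3, 1e-2],
    "weight_decay": [0.0, 1e-4, 1e-3],
    "dropout": [0.0, 0.2],
    "num_layers": [3, 5],
    "embedding_size": [32, 64, 128],
    "gnn_op": ["GCN", "GIN", "GAT"],
    "readout": ["mean", "max", "sum"],
}

def grid_search(train_and_validate):
    """Evaluate every configuration in the grid and return the one with
    the highest validation score (ties broken by iteration order).
    `train_and_validate` maps a config dict to a scalar score."""
    best_cfg, best_score = None, float("-inf")
    for values in product(*GRID.values()):
        cfg = dict(zip(GRID.keys(), values))
        score = train_and_validate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

With these ranges the grid contains 3 x 3 x 2 x 2 x 3 x 3 x 3 = 972 configurations, so a full sweep is feasible but each configuration must be cheap to train.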