Nonlinear Sequence Embedding by Monotone Variational Inequality
Authors: Jonathan Y. Zhou, Yao Xie
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show the competitive performance of our method on real-world time-series data with baselines and demonstrate its effectiveness for symbolic text modeling and RNA sequence clustering. ... In our real-data experiments, the iteration cost is typically dominated by the cost of computing the gradient, which can be mitigated by stochastic approximation. ... Section 4 EXPERIMENTS We first illustrate parameter recovery using synthetic univariate time-series in Section 4.1. ... Section 4.2 describes benchmarks using real-world time-series data from the UCR Time Series Classification Archive (Dau et al., 2018). We report classification and runtime performance against a number of baselines. Section 4.3 provides two illustrations on embedding of real-world sequence data. |
| Researcher Affiliation | Academia | Jonathan Y. Zhou, Yao Xie. School of Industrial & Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332. |
| Pseudocode | Yes | We detail an extragradient scheme with backtracking for nuclear norm constrained VI in Algorithm 1 of Appendix A, which addresses the following general problem ... Algorithm 1 Extragradient Method with Backtracking for Nuclear Norm constrained VI |
| Open Source Code | Yes | The implementation is available at https://github.com/XSpace2013/LowRankTimeSeriesRecovery. |
| Open Datasets | Yes | Section 4.2 describes benchmarks using real-world time-series data from the UCR Time Series Classification Archive (Dau et al., 2018). ... a series of excerpts taken either from the works of Lewis Carroll or abstracts scraped from arXiv (Carroll, 1865; 1871; Kaggle Team, 2020). ... apply our method to the clustering of gene sequences for strains of Influenza A and Dengue viruses (Sayers et al., 2022). ... We retrieved the raw text of Alice's Adventures in Wonderland and Through the Looking Glass from Project Gutenberg. For the paper abstracts, we used the training portion of the ML-ArXiv-Papers dataset. ... The Influenza A virus genome data (n = 949) is acquired from the NCBI Influenza Virus Resource (Bao et al., 2008). ... We consider n = 1562 full Dengue virus genomes downloaded from the NCBI Virus Variation Resource (Hatcher et al., 2017). |
| Dataset Splits | Yes | Each dataset using its default train/test split. ... We split our data into testing and training splits according to those given by the UCR repository. |
| Hardware Specification | Yes | We evaluated all experiments and illustrations using a cluster with 24-core Intel Xeon Gold 6226 CPU (2.7 GHz) processors, NVIDIA Tesla V100 graphics coprocessors (16 GB VRAM), and 384 GB of RAM. |
| Software Dependencies | No | We implement Algorithm 1, and associated subroutines (evaluation of the monotone field Ψ, as defined in Equation (8), incremental simplex/nuclear ball projection), using the Julia programming language. The implementation is available at https://github.com/XSpace2013/LowRankTimeSeriesRecovery. |
| Experiment Setup | Yes | We embed the data without supervision by solving (8) using the extragradient scheme given in Algorithm 1 of Appendix A with a look-back length of d = 20, running the algorithm for 256 steps using a linear link function. The value of λ is selected via a two-step process: first, bisection identifies when the solution becomes rank-one, and then a grid search refines the choice for rank-constrained parameters. ... we perform cross-validated grid search (based on k = 5 folds) across KNNs with k ∈ {2^i : i ∈ [0, 4]} neighbors or SVMs with RBF kernels with penalty values c ∈ {2^i : i ∈ [−10, 15]}. ... To find the embedding, we run Algorithm 1 for 256 iterations. |
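
The table refers to an extragradient scheme with simplex/nuclear ball projection. The sketch below is an illustration of that general idea, not the authors' Algorithm 1 (which is written in Julia and uses backtracking). It makes several simplifying assumptions: a fixed step size of 0.5 chosen arbitrarily, an ℓ1 ball on a plain vector instead of a nuclear norm ball on a matrix, and a hypothetical two-dimensional monotone field `toy_field`. All function names are made up for this example. The link to the nuclear norm case is that projecting a matrix onto a nuclear norm ball amounts to applying the same ℓ1-ball projection to its singular values.

```python
import math


def project_simplex(v, radius=1.0):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = radius} (sort-based)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - radius) / j
        # The condition holds for a prefix of the sorted entries;
        # the last candidate that satisfies it is the shift we need.
        if u_j - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto {w : sum(|w_i|) <= radius}."""
    if sum(abs(x) for x in v) <= radius:
        return list(v)
    w = project_simplex([abs(x) for x in v], radius)
    # Restore the signs of the original entries.
    return [math.copysign(w_i, x) for w_i, x in zip(w, v)]


def extragradient(field, z0, radius=1.0, step=0.5, iters=256):
    """Fixed-step extragradient for a monotone field over an l1 ball (no backtracking)."""
    z = project_l1_ball(z0, radius)
    for _ in range(iters):
        # Extrapolation step: look ahead along the field at the current point.
        g = field(z)
        z_half = project_l1_ball([a - step * b for a, b in zip(z, g)], radius)
        # Update step: move from z using the field evaluated at the look-ahead point.
        g_half = field(z_half)
        z = project_l1_ball([a - step * b for a, b in zip(z, g_half)], radius)
    return z


def toy_field(z):
    """Hypothetical monotone field F(x, y) = (y, -x); its VI solution is the origin."""
    x, y = z
    return [y, -x]


z_final = extragradient(toy_field, [1.0, 1.0])
print(z_final)
```

The rotation field is a standard test case because a plain projected-gradient step does not contract on it, while the look-ahead evaluation in the extragradient update does. The 256 iterations mirror the iteration count quoted in the table; the step size would normally be set by backtracking rather than fixed.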