Nonlinear Sequence Embedding by Monotone Variational Inequality

Authors: Jonathan Y. Zhou, Yao Xie

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show the competitive performance of our method on real-world time-series data with baselines and demonstrate its effectiveness for symbolic text modeling and RNA sequence clustering. ... In our real-data experiments, the iteration cost is typically dominated by the cost of computing the gradient, which can be mitigated by stochastic approximation. ... Section 4 (Experiments): We first illustrate parameter recovery using synthetic univariate time-series in Section 4.1. ... Section 4.2 describes benchmarks using real-world time-series data from the UCR Time Series Classification Archive (Dau et al., 2018). We report classification and runtime performance against a number of baselines. Section 4.3 provides two illustrations on embedding of real-world sequence data.
Researcher Affiliation | Academia | Jonathan Y. Zhou, Yao Xie; School of Industrial & Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332
Pseudocode | Yes | We detail an extragradient scheme with backtracking for nuclear norm constrained VI in Algorithm 1 of Appendix A, which addresses the following general problem ... Algorithm 1: Extragradient Method with Backtracking for Nuclear Norm constrained VI
Open Source Code | Yes | The implementation is available at https://github.com/XSpace2013/Low Rank Time Series Recovery.
Open Datasets | Yes | Section 4.2 describes benchmarks using real-world time-series data from the UCR Time Series Classification Archive (Dau et al., 2018). ... a series of excerpts taken either from the works of Lewis Carroll or abstracts scraped from arXiv (Carroll, 1865; 1871; Kaggle Team, 2020). ... apply our method to the clustering of gene sequences for strains of Influenza A and Dengue viruses (Sayers et al., 2022). ... We retrieved the raw text of Alice's Adventures in Wonderland and Through the Looking Glass from Project Gutenberg. For the paper abstracts, we used the training portion of the ML-ArXiv-Papers dataset. ... The Influenza A virus genome data (n = 949) is acquired from the NCBI Influenza Virus Resource (Bao et al., 2008). ... We consider n = 1562 full Dengue virus genomes downloaded from the NCBI Virus Variation Resource (Hatcher et al., 2017).
Dataset Splits | Yes | Each dataset using its default train/test split. ... We split our data into testing and training splits according to those given by the UCR repository.
Hardware Specification | Yes | We evaluated all experiments and illustrations using a cluster with 24-core Intel Xeon Gold 6226 CPU (2.7 GHz) processors, NVIDIA Tesla V100 graphics coprocessors (16 GB VRAM), and 384 GB of RAM.
Software Dependencies | No | We implement Algorithm 1, and associated subroutines (evaluation of the monotone field Ψ, as defined in Equation (8), incremental simplex/nuclear ball projection), using the Julia programming language. The implementation is available at https://github.com/XSpace2013/Low Rank Time Series Recovery.
Experiment Setup | Yes | We embed the data without supervision by solving (8) using the extragradient scheme given in Algorithm 1 of Appendix A with a look-back length of d = 20, running the algorithm for 256 steps using a linear link function. The value of λ is selected via a two-step process: first, bisection identifies when the solution becomes rank-one, and then a grid search refines the choice for rank-constrained parameters. ... we perform cross-validated grid search (based on k = 5 folds) across KNNs with k ∈ {2^i : i ∈ [0, 4]} neighbors or SVMs with RBF kernels with penalty values c ∈ {2^i : i ∈ [−10, 15]}. ... To find the embedding, we run Algorithm 1 for 256 iterations.
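
The Pseudocode and Experiment Setup entries above refer to an extragradient scheme with backtracking and to simplex/nuclear ball projections. The following is a minimal Python sketch of that general technique, not the authors' Algorithm 1 or their Julia code. It makes several assumptions: it works on plain vectors and an l1 ball instead of matrices and a nuclear-norm ball (projecting a matrix onto a nuclear-norm ball amounts to applying the same l1-ball projection to its singular values), the monotone field F is a toy affine map instead of the paper's Ψ, and the function names, the backtracking test, and the constants `shrink` and `nu` are hypothetical choices made for illustration.

```python
import math

def project_simplex(v, radius=1.0):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = radius} (sort-based)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - radius) / j
        if uj - t > 0:          # holds for a prefix of the sorted values
            theta = t           # keep the threshold of the last index that passes
    return [max(x - theta, 0.0) for x in v]

def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto {w : sum(|w|) <= radius}."""
    if sum(abs(x) for x in v) <= radius:
        return list(v)
    w = project_simplex([abs(x) for x in v], radius)
    return [math.copysign(wi, xi) for wi, xi in zip(w, v)]

def norm(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def extragradient(F, project, x0, step=1.0, shrink=0.5, nu=0.9, iters=256):
    """Projected extragradient iteration with a simple backtracking rule.

    The step is shrunk until step * ||F(y) - F(x)|| <= nu * ||y - x||,
    a standard acceptance test; the paper's exact rule may differ.
    """
    x = list(x0)
    for _ in range(iters):
        Fx = F(x)
        while True:
            # Extrapolation (look-ahead) point.
            y = project([xi - step * gi for xi, gi in zip(x, Fx)])
            Fy = F(y)
            if step * norm(Fy, Fx) <= nu * norm(y, x):
                break
            step *= shrink
        # Update from x using the field evaluated at the look-ahead point.
        x = project([xi - step * gi for xi, gi in zip(x, Fy)])
    return x

# Toy monotone field F(x) = x - b (assumption, stands in for the paper's Ψ).
b = [3.0, -1.0, 0.5]
radius = 2.0
solution = extragradient(
    F=lambda x: [xi - bi for xi, bi in zip(x, b)],
    project=lambda v: project_l1_ball(v, radius),
    x0=[0.0, 0.0, 0.0],
)
```

For this toy field the variational inequality is solved by the projection of `b` onto the l1 ball, so `solution` can be checked against `project_l1_ball(b, radius)`. The 256 iterations mirror the iteration count quoted in the Experiment Setup entry; everything else is illustrative.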