Divergence of Neural Tangent Kernel in Classification Problems

Authors: Zixiong Yu, Songtao Tian, Guhan Chen

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "NUMERICAL EXPERIMENTS. We conducted experiments on a synthetic dataset, using the previously mentioned fully connected network and residual network architectures. [...] We conducted an experiment on the MNIST dataset, using parity (odd or even) as the criterion for binary classification." (Section 6)
Researcher Affiliation | Collaboration | Zixiong Yu, Noah's Ark Lab, Huawei Technologies Ltd., Shenzhen, Guangdong, China (EMAIL); Songtao Tian, Department of Mathematical Sciences, Tsinghua University, Haidian District, Beijing, China (EMAIL); Guhan Chen, Department of Statistics and Data Science, Tsinghua University, Haidian District, Beijing, China (EMAIL)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. It primarily uses mathematical notation and descriptive text to explain methodologies.
Open Source Code | No | The paper does not contain any explicit statements or links indicating that source code for the methodology is openly available.
Open Datasets | Yes | "We conducted an experiment on the MNIST dataset, using parity (odd or even) as the criterion for binary classification."
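The parity-based relabeling quoted above can be sketched in a few lines. This is an illustrative reconstruction (the paper provides no code), and the label array below is a stand-in for the real MNIST digit labels:

```python
import numpy as np

# Stand-in for MNIST digit labels (0-9); in practice these would be
# loaded from the actual dataset.
digit_labels = np.array([3, 7, 4, 0, 9, 2])

# Binary classification by parity: odd digits -> 1, even digits -> 0.
binary_labels = digit_labels % 2
print(binary_labels.tolist())  # [1, 1, 0, 0, 1, 0]
```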
Dataset Splits | No | For the synthetic dataset, the paper states: "For the training set, we generate 6 input vectors uniformly distributed on the unit sphere S^d, i.e., the points {x_i}_{i∈[6]} are (cos θ_i, sin θ_i), where θ_i = iπ/3. The labels of the points are (0, 1, 0, 1, 0, 1), respectively." This describes the dataset generation, not specific training/test/validation splits. For the MNIST dataset, it mentions using parity for binary classification but provides no specific split information.
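The quoted construction is easy to reproduce; a minimal sketch (the paper provides no code), taking the index set [6] to mean i = 1, ..., 6:

```python
import numpy as np

# Six training inputs at angles θ_i = iπ/3 on the unit circle,
# with alternating labels, following the quoted construction.
thetas = np.arange(1, 7) * np.pi / 3
X = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)  # shape (6, 2)
y = np.array([0, 1, 0, 1, 0, 1])

# Every input lies on the unit circle, i.e. has unit Euclidean norm.
on_sphere = np.allclose(np.linalg.norm(X, axis=1), 1.0)
print(X.shape, on_sphere)  # (6, 2) True
```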
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper does not specify any software dependencies or their version numbers used in the experiments.
Experiment Setup | Yes | "The learning rate is set to 0.1, and the network is trained for 10,000 epochs. [...] The network has a width of m = 500, with a learning rate of lr = 0.5, and was trained for epoch = 100,000."
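The quoted setup (width m = 500, lr = 0.5) can be sketched as a full-batch gradient-descent loop on the synthetic data. This is an illustrative reconstruction, not the paper's implementation: the squared loss and the reduced epoch count are assumptions made here for brevity (the paper does not quote its loss function and trains for 100,000 epochs), and the two-layer ReLU network uses a standard NTK parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)

# Six points on the unit circle with alternating labels, as in the
# paper's synthetic experiment.
thetas = np.arange(1, 7) * np.pi / 3
X = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
y = np.array([0., 1., 0., 1., 0., 1.])

# Two-layer ReLU network of width m = 500 under NTK parameterization,
# trained by full-batch gradient descent with lr = 0.5 as quoted.
# Epoch count reduced from the quoted 100,000 for illustration.
m, lr, epochs = 500, 0.5, 2000
W = rng.normal(size=(2, m))
a = rng.normal(size=m)

def forward(X):
    # f(x) = (1/sqrt(m)) * a^T ReLU(W^T x)
    return np.maximum(X @ W, 0.0) @ a / np.sqrt(m)

N = len(y)
init_loss = 0.5 * np.mean((forward(X) - y) ** 2)
for _ in range(epochs):
    h = np.maximum(X @ W, 0.0)           # hidden activations
    err = h @ a / np.sqrt(m) - y         # squared-loss residual
    grad_a = h.T @ err / (N * np.sqrt(m))
    grad_W = X.T @ (np.outer(err, a) * (h > 0.0)) / (N * np.sqrt(m))
    a -= lr * grad_a
    W -= lr * grad_W

final_loss = 0.5 * np.mean((forward(X) - y) ** 2)
print(final_loss < init_loss)  # training reduces the loss
```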