Sum-of-norms clustering does not separate nearby balls
Authors: Alexander Dunlap, Jean-Christophe Mourrat
JMLR 2024 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we supplement our theoretical results with some numerical experiments; see also Figures 1.1 and 1.2. The code is available at https://github.com/ajdunlap/son-clustering-experiments. Our experiments were performed using the algorithm of Jiang and Vavasis (Preprint, 2020). This algorithm provides a certificate that the ouput clustering is correct. When λ is very close to a value at which the number of clusters changes, limitations on computer time and numerical accuracy make it difficult to perform the calculations to sufficient accuracy to obtain the certificate. In particular, for situations such as that described by Theorem 1.1, the SON clustering algorithm becomes numerically very challenging to resolve for λ close to λc, while the clustering structures that are produced for other values of λ are not the expected partition into two parts. This further clarifies how the SON algorithm fails to resolve this clustering problem successfully in practice. Further work would be required to numerically probe the behavior of the algorithm very close to these critical values of λ. The results are shown in Figure 3.2. |
| Researcher Affiliation | Academia | Alexander Dunlap EMAIL Courant Institute of Mathematical Sciences New York University New York, NY 10012, USA Current address: Duke University, Durham, NC 27708, USA. Jean-Christophe Mourrat EMAIL Ecole Normale Sup erieure de Lyon and CNRS Lyon, France, and Courant Institute of Mathematical Sciences New York University New York, NY 10012, USA. |
| Pseudocode | No | The paper describes methods and algorithms in prose and through mathematical formulations, but it does not include any explicitly labeled pseudocode blocks or algorithm listings in a structured format. |
| Open Source Code | Yes | All figures in this paper were generated using an implementation (by the present authors) of the algorithm described in Jiang and Vavasis (Preprint, 2020). The code is available at https://github.com/ajdunlap/son-clustering-experiments. |
| Open Datasets | No | The paper uses synthetic data generated based on theoretical models, such as points sampled from uniform measures on balls, spheres, or vertices of polygons, and points on a rectangular lattice. It does not refer to or provide access information for any pre-existing publicly available datasets. For example: "sum-of-norms clustering of the stochastic ball model with N = 200 datapoints drawn from B( -1.05e1, 1) B(1.05e1, 1)", "µ be a uniform measure on the unit ball B1(0)", and "µ be a uniform measure on the vertices of the regular n-gon". |
| Dataset Splits | No | The paper generates synthetic data for its experiments or uses specific geometric configurations (e.g., points on n-gons, points sampled from a union of balls). It does not involve splitting an existing dataset into training, validation, or test sets; therefore, dataset split information is not applicable or provided. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the numerical experiments or simulations. |
| Software Dependencies | No | The paper mentions that figures were generated using an "implementation (by the present authors) of the algorithm described in Jiang and Vavasis (Preprint, 2020)", but it does not specify any particular software dependencies (e.g., programming languages, libraries, frameworks) with version numbers. |
| Experiment Setup | Yes | In Section 3, "Numerical experiments", the paper specifies parameters for its simulations. For polygons: "We test this numerically with n = 8 and r = 1.7. In this case, 2n tan π/(2n) 3.18. We perform simulations with λ = 3.1, 3.3, 3.5 (noting that 3.1 < 2n tan π/(2n) < 3.3 < 2r < 3.5)". For the d-ball: "We approximate the interior of the ball by the set of all points on a rectangular lattice with spacing δ lying inside the ball, i.e. {x δZ2 | |x| 1}, and compute the number of clusters for varying choices of λ. The results are shown in Figure 3.2." The figure captions further specify δ values (e.g., "δ = 0.1150"). |