Joint Continuous and Discrete Model Selection via Submodularity
Authors: Jonathan Bunton, Paulo Tabuada
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we numerically validate our theoretical results with several proof-of-concept examples with synthetic and real-world data, comparing against state-of-the-art algorithms. |
| Researcher Affiliation | Academia | Jonathan Bunton EMAIL Paulo Tabuada EMAIL Department of Electrical & Computer Engineering University of California, Los Angeles 420 Westwood Plaza, Box 951594 Los Angeles, CA 90095-1554, USA |
| Pseudocode | No | The paper describes algorithms conceptually and refers to existing algorithms (e.g., 'projected subgradient descent', 'minimum-norm point algorithm') but does not present any pseudocode or structured algorithm blocks explicitly within its text. |
| Open Source Code | No | The paper discusses various algorithms used for comparison and their implementations, such as 'IBM's CPLEX 12.8 constrained quadratic program solver in MATLAB' and 'the minimum-norm point algorithm from Fujishige and Isotani (2011) as implemented in MATLAB by Krause (2010)'. However, it does not explicitly state that the authors' own code for the methodology described in the paper is available or provide a link to a repository. |
| Open Datasets | Yes | We create this scenario with real retail sales data collected from a UK-based online retail store available in the UCI Machine Learning Repository (Dua and Graff, 2017; Chen et al., 2012). |
| Dataset Splits | No | The paper mentions using 'synthetic and real-world data' and refers to 'random problem instances' and 'noisy measurements'. For the retail sales data, it states 'we can use historical data to build a local linear approximation'. However, it does not provide specific details on how these datasets were split into training, validation, or test sets (e.g., percentages, sample counts, or explicit splitting methodology). |
| Hardware Specification | Yes | The experiments were all run on a laptop with an AMD Ryzen 9 4900HS CPU and 16GB of RAM. |
| Software Dependencies | Yes | In our implementation, we use the Pairwise Frank-Wolfe algorithm to solve this convex problem, with all relevant results plotted in blue and labeled Cont Submodular. The projected subgradient method is known to provide approximation guarantees even in the non-submodular case (El Halabi and Jegelka, 2020), but as shown in Section 4.2, amounts to a specific choice of algorithms in our theory. The algorithm operates by solving an equivalent convex optimization problem, in particular, minimizing the Lovász extension of g + H over [0, 1]^n using projected subgradient descent. To implement this approach, we use IBM's CPLEX 12.8 constrained quadratic program solver in MATLAB to evaluate the function H (as expressed in (16)) and use Polyak's rule for updating the step size. The relevant results are plotted in red, and labeled PGD + CPLEX in figures. Our approach is agnostic to the choice of convex optimization and submodular set function minimization routines, so we also use CPLEX to evaluate H. To highlight the utility of an algorithm-agnostic approach, we also implement an active-set method for fast non-negative quadratic programming to evaluate H (Bro and De Jong, 1997). For the submodular set function minimization algorithm, we use the minimum-norm point algorithm from Fujishige and Isotani (2011) as implemented in MATLAB by Krause (2010), coupled with the semi-gradient lattice pruning strategy proposed by Iyer et al. (2013), which has quadratic complexity and drastically reduces the problem size. |
| Experiment Setup | Yes | In our examples, we consider the domain [0, 1]^n ⊂ R^n_{≥0} and set the discretization level to k = 51 unless otherwise specified. ... we set the regularization strength to λ = 0.05 so that both the functions f and g play nontrivial roles in the combined objective function. ... We then let µ = 0.8 in (56) and λ = 0.05 in (57) so that the overall problem's cost function has nontrivial contributions from both the smoothness-promoting function and the sparsity-inducing regularizer. ... The various methods are given identical cost functions to minimize, and are run until either convergence to suboptimality below 10^-4 or a maximum of 100 iterations. |
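The quoted baseline (minimizing the Lovász extension over [0, 1]^n with projected subgradient descent and Polyak's step-size rule) can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' implementation: the function names, the starting point, and the toy cut function below are assumptions for demonstration, and the Polyak rule here assumes the optimal value `f_star` is known (0 for the toy example).

```python
import numpy as np

def lovasz_extension_grad(F, x):
    """Value and a subgradient of the Lovász extension of set function F at x in [0,1]^n,
    computed via the standard greedy ordering of coordinates."""
    n = len(x)
    order = np.argsort(-x)              # sort coordinates in decreasing order of x
    g = np.zeros(n)
    S = set()
    prev = F(frozenset(S))
    for i in order:
        S.add(int(i))
        cur = F(frozenset(S))
        g[i] = cur - prev               # marginal gain along the greedy chain
        prev = cur
    return float(np.dot(g, x)), g

def minimize_lovasz(F, n, iters=100, f_star=0.0):
    """Projected subgradient descent over [0,1]^n with Polyak's step-size rule
    (assumes the optimal value f_star is known; hypothetical helper, not from the paper)."""
    x = np.linspace(0.8, 0.2, n)        # arbitrary asymmetric starting point
    best_val, best_x = np.inf, x.copy()
    for _ in range(iters):
        val, g = lovasz_extension_grad(F, x)
        if val < best_val:
            best_val, best_x = val, x.copy()
        norm2 = float(np.dot(g, g))
        if norm2 == 0.0:
            break
        step = (val - f_star) / norm2   # Polyak's rule
        x = np.clip(x - step * g, 0.0, 1.0)   # projection onto the box [0,1]^n
    return best_x, best_val

# Toy example (illustrative only): the cut function of a 3-node path graph,
# a classic submodular set function whose minimum value is 0.
edges = [(0, 1), (1, 2)]
def cut(S):
    return sum((u in S) != (v in S) for u, v in edges)

best_x, best_val = minimize_lovasz(cut, 3)
```

Rounding `best_x` at any threshold in (0, 1) recovers a minimizing set; the paper's pipeline additionally couples such routines with CPLEX (to evaluate H) and minimum-norm-point submodular minimization, which this sketch omits.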