Classifier Cascades and Trees for Minimizing Feature Evaluation Cost
Authors: Zhixiang (Eddie) Xu, Matt J. Kusner, Kilian Q. Weinberger, Minmin Chen, Olivier Chapelle
JMLR 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate CSTC on a synthetic cost-sensitive learning task and compare it with competing algorithms on two large-scale, real-world benchmark problems. Additionally, we discuss the differences between our models for several learning settings. We provide further insight by analyzing the features extracted on these data sets and looking at how the CSTC tree partitions the input space. We judge the effect of the cost-sensitive regularization by looking at how removing terms and varying parameters affects CSTC on real-world data sets. We also present detailed results of CSTC on a cost-sensitive version of the MNIST data set, demonstrating that it extracts intelligent per-instance features. |
| Researcher Affiliation | Collaboration | Zhixiang (Eddie) Xu, Matt J. Kusner, and Kilian Q. Weinberger: Department of Computer Science, Washington University, 1 Brookings Drive, St. Louis, MO 63130, USA. Minmin Chen and Olivier Chapelle: Criteo, 411 High Street, Palo Alto, CA 94301, USA. |
| Pseudocode | Yes | Algorithm 1 CSTC global optimization |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code, nor does it provide a link to a code repository or supplementary materials containing code. |
| Open Datasets | Yes | To evaluate the performance of CSTC on real-world tasks, we test it on the Yahoo! Learning to Rank Challenge (LTR) data set. The set contains 19,944 queries and 473,134 documents. Each query-document pair xi consists of 519 features. We created a cost-sensitive binary MNIST data set by first extracting all images of digits 3 and 8. We resized them to four different resolutions: 4 x 4, 8 x 8, 16 x 16, and 28 x 28 (the original size), and concatenated all features, resulting in d = 1120 features. |
| Dataset Splits | Yes | We compute the validation error of the initialized CSTC tree at each node. Starting with the leaf nodes, we then prune away nodes that, upon removal, do not decrease the validation performance (in the case of ranking data, we can even use validation NDCG (Järvelin and Kekäläinen, 2002) as our pruning criterion). For CSTC we evaluate eight settings, λ = {1/2, 1, 2, 3, 4, 5, 6}. In the case of stage-wise regression, which is not cost-sensitive, the curve is simply a function of boosting iterations. We include CSTC with and without fine-tuning. To select hyperparameters C (SVM cost) and γ (kernel width) we used 100 rounds of Bayesian optimization on a validation set (we found C = 753.1768, γ = 0.0198). An RBF-SVM trained with these hyperparameters achieves a test error of 0.005 for cost c = 1120. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper mentions techniques like "gradient boosted regression trees" and "support vector machine (SVM) with a radial basis function (RBF) kernel" but does not specify the software libraries or their version numbers used for implementation. |
| Experiment Setup | Yes | To introduce non-linearity, we transform the input features into a non-linear feature space x -> φ(x) using the boosting trick (see Section 5) with T = 3000 iterations of gradient boosting and CART trees of maximum depth 4. Unless otherwise stated, we determine the CSTC depth by validation performance (with a maximum depth of 10). For CSTC we evaluate eight settings, λ = {1/2, 1, 2, 3, 4, 5, 6}. We set the maximum number of Cronus nodes to 10, and set all other parameters (e.g., keep ratio, discount, early-stopping) based on a validation set. To select hyperparameters C (SVM cost) and γ (kernel width) we used 100 rounds of Bayesian optimization on a validation set (we found C = 753.1768, γ = 0.0198). |
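The cost-sensitive MNIST construction quoted above (resizing each image of a 3 or 8 to four resolutions and concatenating the pixel vectors) can be sanity-checked with a one-line sketch; the resolution list is taken directly from the paper's description:

```python
# Feature count for the cost-sensitive MNIST variant: each 3/8 image
# is resized to 4x4, 8x8, 16x16, and 28x28 (the original size), and
# all pixel vectors are concatenated into one feature vector.
resolutions = [4, 8, 16, 28]
d = sum(r * r for r in resolutions)  # 16 + 64 + 256 + 784
print(d)  # 1120, matching the d = 1120 reported in the paper
```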
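The "boosting trick" in the experiment setup maps each input into a non-linear feature space whose coordinates are the outputs of gradient-boosted regression trees, so that a linear model over φ(x) is a weighted sum of trees. The paper uses T = 3000 CART trees of maximum depth 4; the following is only a minimal illustrative sketch with depth-1 stumps on 1-D data, and all function names are our own:

```python
import numpy as np

def fit_stump(x, target):
    """Least-squares regression stump (depth-1 tree) on a 1-D feature."""
    best = None
    for t in x:
        left, right = target[x <= t], target[x > t]
        if len(right) == 0:
            continue  # degenerate split: all points on one side
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, t, left.mean(), right.mean())
    _, thresh, lval, rval = best
    return lambda z: np.where(z <= thresh, lval, rval)

def boosting_trick(x, y, T=5, lr=0.5):
    """Fit T stumps on squared-loss residuals; return the feature map
    phi(x) = (h_1(x), ..., h_T(x)) collecting all tree outputs."""
    stumps, resid = [], y.astype(float).copy()
    for _ in range(T):
        h = fit_stump(x, resid)
        stumps.append(h)
        resid = resid - lr * h(x)
    return lambda z: np.column_stack([h(z) for h in stumps])

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = np.sign(x)  # a target that is non-linear in x
phi = boosting_trick(x, y)
print(phi(x).shape)  # (200, 5): one column per boosted stump
```

A linear classifier trained on these T columns then reproduces the boosted ensemble's predictions, which is why tree outputs can serve as (costed) features in CSTC.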
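The pruning criterion for ranking data uses validation NDCG (Järvelin and Kekäläinen, 2002). As a reference point, here is a minimal sketch of the widely used exponential-gain form of NDCG; the paper does not specify which gain function or truncation level it uses:

```python
import math

def dcg(relevances):
    # Discounted cumulative gain with exponential gain 2^rel - 1
    # and a log2(rank + 1) position discount (ranks start at 1).
    return sum((2 ** r - 1) / math.log2(i + 2)
               for i, r in enumerate(relevances))

def ndcg(relevances):
    # Normalize by the DCG of the ideal (descending) ordering.
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

print(ndcg([3, 2, 1]))  # 1.0: the list is already ideally ordered
```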