Measuring Dependence Powerfully and Equitably
Authors: Yakir A. Reshef, David N. Reshef, Hilary K. Finucane, Pardis C. Sabeti, Michael Mitzenmacher
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We then introduce an efficiently computable consistent estimator of our population measure of dependence, and we empirically establish its equitability on a large class of noisy functional relationships. This new statistic has better bias/variance properties and better runtime complexity than a previous heuristic approach. After studying the bias/variance properties of MICe, we then demonstrate via simulation that it outperforms currently available methods in terms of equitability with respect to $R^2$ on a broad set of noisy functional relationships. |
| Researcher Affiliation | Academia | Yakir A. Reshef EMAIL School of Engineering and Applied Sciences Harvard University Cambridge, MA 02138, USA; David N. Reshef EMAIL Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology Cambridge, MA 02139, USA; Hilary K. Finucane EMAIL Department of Mathematics Massachusetts Institute of Technology Cambridge, MA 02139, USA; Pardis C. Sabeti EMAIL Department of Organismic and Evolutionary Biology Harvard University Cambridge, MA 02138, USA; Michael Mitzenmacher EMAIL School of Engineering and Applied Sciences Harvard University Cambridge, MA 02138, USA. |
| Pseudocode | No | The paper describes algorithms such as "OptimizeXAxis" (Section 3.5) and "EquicharClump" (Appendix H) in prose, explaining their functionality and complexity. However, it does not present them in a structured, pseudocode-like format with explicit steps or dedicated algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about the release of source code, links to code repositories, or mentions of code being available in supplementary materials for the methodology described. |
| Open Datasets | No | The paper describes the generation of data for its experiments based on sets of functional relationships (e.g., "$Q = \{(x + \varepsilon_\sigma,\ f(x) + \varepsilon'_\sigma) : x \sim X_f,\ \varepsilon_\sigma, \varepsilon'_\sigma \sim \mathcal{N}(0, \sigma^2),\ f \in F,\ \sigma \in \mathbb{R}_{\geq 0}\}$" from Section 4.4, or relationships from Simon and Tibshirani (2012)). It references the source of these functions or relationships but does not provide concrete access information (such as a direct link, DOI, or repository) for a pre-existing, publicly available dataset used for evaluation. |
| Dataset Splits | No | The paper describes a simulation-based evaluation strategy: "For each relationship $Z \in Q$ that we examined... We then simulated 500 independent samples from $Z$, each of size $n = 500$..." (Section 4.4). This refers to the generation of samples for repeated statistical testing, not the splitting of a pre-existing dataset into distinct training, validation, or test sets. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware used for running its experiments, such as particular CPU or GPU models, memory specifications, or cloud computing environments. |
| Software Dependencies | No | The paper does not explicitly mention any specific software libraries, frameworks, or programming language versions that were used to implement or run the experiments. |
| Experiment Setup | Yes | For each relationship $Z \in Q$ that we examined, we used the algorithm from Theorem 18 with very conservative values of $k_0$ and $\ell_0$ to compute MIC∗. We then simulated 500 independent samples from $Z$, each of size $n = 500$, and computed both Approx-MIC and MICe on each one to obtain estimates of the sampling distributions of the two statistics. The results... demonstrate that for a typical usage parameter of $B(n) = n^{0.6}$, MICe performs substantially better than Approx-MIC overall. Second, the results show that different values of the exponent in $B(n) = n^{\alpha}$ give good performance in different signal-to-noise regimes due to a bias-variance trade-off represented by this parameter. |
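
The simulation protocol quoted in the Experiment Setup row can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, `x` is drawn uniformly from [0, 1] (the paper's distribution $X_f$ may differ), and squared Pearson correlation stands in for MICe and Approx-MIC, which are not implemented here.

```python
import random
import statistics


def grid_budget(n, alpha=0.6):
    """Grid-size budget B(n) = n**alpha; alpha = 0.6 is the typical value quoted above."""
    return n ** alpha


def sample_noisy_relationship(f, n, sigma, rng):
    """Draw n points (x + e, f(x) + e') with e, e' ~ N(0, sigma^2).

    Assumption: x is uniform on [0, 1]; the paper's X_f may differ.
    """
    points = []
    for _ in range(n):
        x = rng.random()
        points.append((x + rng.gauss(0, sigma), f(x) + rng.gauss(0, sigma)))
    return points


def squared_pearson(points):
    """Stand-in statistic (NOT MICe): squared Pearson correlation of the sample."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def sampling_distribution(statistic, f, sigma, n=500, trials=500, seed=0):
    """Evaluate `statistic` on `trials` independent samples of size n."""
    rng = random.Random(seed)
    return [
        statistic(sample_noisy_relationship(f, n, sigma, rng))
        for _ in range(trials)
    ]


if __name__ == "__main__":
    # Small run for illustration; the quoted setup uses trials=500 and n=500.
    values = sampling_distribution(squared_pearson, lambda x: x, sigma=0.1, trials=20)
    print("B(500) =", grid_budget(500))
    print("mean statistic =", statistics.fmean(values))
    print("std of statistic =", statistics.stdev(values))
```

Swapping `squared_pearson` for an actual MICe or Approx-MIC implementation, and varying `alpha` in `grid_budget`, would be the natural way to explore the bias-variance trade-off mentioned in the table.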