Learning Supervised PageRank with Gradient-Based and Gradient-Free Optimization Methods
Authors: Lev Bogolubsky, Pavel Dvurechenskii, Alexander Gasnikov, Gleb Gusev, Yurii Nesterov, Andrei M. Raigorodskii, Aleksey Tikhonov, Maksim Zhukovskii
NeurIPS 2016 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the experiments, we apply our algorithms to learning Supervised Page Rank on a real ranking task. Summing up, both two-level methods, unlike the state-of-the-art [21], have theoretical guarantees on convergence rate, and outperform it in the ranking quality in experiments. In this section, we compare our gradient-free and gradient-based methods with the state-of-the-art gradient-based method [21] on the web page ranking problem. In the next section, we describe the dataset. In Section 6.2, we report the results of the experiments. |
| Researcher Affiliation | Collaboration | Lev Bogolubsky, Gleb Gusev, Andrei M. Raigorodskii, Aleksey Tikhonov, Maksim Zhukovskii: Yandex, Moscow State University, Buryat State University. Pavel Dvurechensky: Weierstrass Institute, Institute for Information Transmission Problems RAS. Alexander Gasnikov: Institute for Information Transmission Problems RAS, Moscow Institute of Physics and Technology. Yurii Nesterov: Center for Operations Research and Econometrics, Higher School of Economics. |
| Pseudocode | Yes | Algorithm 1 (Gradient-type method). Input: point x₀ ∈ X, stepsize h > 0, number of steps M. Set k = 0. Repeat: generate ξₖ and calculate the corresponding g_τ(xₖ, δ); calculate x_{k+1} = Π_X(xₖ − h·g_τ(xₖ, δ)), where Π_X(·) is the Euclidean projection onto the set X; set k = k + 1; until k > M. Output: the point y_M = argmin_x { f(x) : x ∈ {x₀, …, x_M} }. Algorithm 2 (Adaptive projected gradient algorithm). Input: point x₀ ∈ X, number L₀ > 0. Set k = 0, z = +∞. Repeat: set Mₖ = Lₖ, flag = 0; repeat: set δ = ε/(16 Mₖ); calculate f(xₖ, δ) and g(xₖ, δ); find wₖ = argmin_{x ∈ Q} { ⟨g(xₖ, δ), x⟩ + Mₖ V(x, xₖ) + h(x) } and calculate f(wₖ, δ); if the inequality f(wₖ, δ) ≤ f(xₖ, δ) + ⟨g(xₖ, δ), wₖ − xₖ⟩ + (Mₖ/2)‖wₖ − xₖ‖² + ε/(8 Mₖ) holds, set flag = 1; otherwise set Mₖ = 2 Mₖ; until flag = 1. Set x_{k+1} = wₖ, L_{k+1} = Mₖ/2. If ‖Mₖ(xₖ − x_{k+1})‖ < z, set z = ‖Mₖ(xₖ − x_{k+1})‖, K = k. Set k = k + 1; until z ≤ ε. Output: the point x_{K+1}. |
| Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for the work described in this paper, nor does it provide a direct link to a source-code repository. |
| Open Datasets | No | All experiments are performed with data of a popular commercial search engine, Yandex. We chose a random set of 600 queries Q and collected user sessions started with them. |
| Dataset Splits | No | We divide our data into two parts. On the first part Q1 (50% of the set of queries Q) we train the parameters, and on the second part Q2 we test the algorithms. (No explicit mention of a separate validation set; only training and testing splits are described.) |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers. |
| Experiment Setup | Yes | The values of the hyperparameters are the following: the Lipschitz constant L = 10⁻⁴ in GFN (and L₀ = 10⁻⁴ in GBN), the accuracy ε = 10⁻⁶ (in both GBN and GFN), and the radius R = 0.99 (in both GBN and GFN). We choose L in GFN to be equal to L₀ (we show how the choice of L influences the output of the gradient-free algorithm; see supplementary materials, Figure 2). Moreover, we evaluate both our gradient-based and gradient-free algorithms for different values of the accuracies. |
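The gradient-type method quoted in the Pseudocode row can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the objective `f`, the inexact-gradient oracle `grad_oracle` (standing in for g_τ(xₖ, δ)), and the ball-projection feasible set X are assumptions chosen for the demo; the paper's actual oracle is built from the PageRank derivative computations.

```python
import numpy as np

def project_ball(x, radius):
    """Euclidean projection Pi_X onto the ball {x : ||x||_2 <= radius}."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def gradient_type_method(f, grad_oracle, x0, h, M, radius):
    """Sketch of Algorithm 1: projected steps with an (inexact) gradient
    oracle, returning the best iterate y_M among x_0, ..., x_M."""
    x = x0.copy()
    best_x, best_f = x0.copy(), f(x0)
    for _ in range(M):
        g = grad_oracle(x)                   # stands in for g_tau(x_k, delta)
        x = project_ball(x - h * g, radius)  # x_{k+1} = Pi_X(x_k - h * g)
        fx = f(x)
        if fx < best_f:                      # track argmin over the trajectory
            best_x, best_f = x.copy(), fx
    return best_x

# Toy usage: minimize ||x - c||^2 over the ball of radius R = 0.99.
c = np.array([0.3, -0.2])
y_M = gradient_type_method(
    f=lambda x: float(np.sum((x - c) ** 2)),
    grad_oracle=lambda x: 2.0 * (x - c),
    x0=np.zeros(2), h=0.1, M=200, radius=0.99,
)
```

With this smooth toy objective the iterates contract toward `c`, so `y_M` lands within about 1e-3 of it; the point of the sketch is only the update/projection/best-iterate structure, which matches the quoted pseudocode.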