Explicit Document Modeling through Weighted Multiple-Instance Learning
Authors: Nikolaos Pappas, Andrei Popescu-Belis
JAIR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our model achieves state-of-the-art performance on multi-aspect sentiment analysis, improving over several baselines. Moreover, the predicted saliency weights are close to human estimates obtained by crowdsourcing, and increase the performance of lexical and topical features for review segmentation and summarization. ... Figure 3 displays the performance of the proposed model for aspect rating prediction. ... Table 2 displays the mean squared error (MSE) on a test set with 1,000 reviews from Beer Advocate, using 260k reviews for training. ... Table 3 shows the performance of the models with various MIR assumptions for aspect rating prediction (columns 1 to 5). ... Figure 5: Accuracy on review segmentation (top) and on summarization (bottom) of the CRF models with BOW+MIR features, compared to several baselines. |
| Researcher Affiliation | Academia | Nikolaos Pappas EMAIL Andrei Popescu-Belis EMAIL Idiap Research Institute, Rue Marconi 19 CH-1920 Martigny, Switzerland |
| Pseudocode | Yes | Algorithm 1: SGDWeights: jointly learning the parameters of the objective in Eq. 6. |
| Open Source Code | Yes | Our code is available at https://github.com/idiap/wmil-sgd. |
| Open Datasets | Yes | We use eight public datasets (Table 1). ... The Beer Advocate, Ratebeer (ES), Ratebeer (FR), Audiobooks and Toys & Games datasets include aspect ratings assigned by the authors of the reviews, with 3 to 5 aspect dimensions. ... on the TED talks that we gathered and released earlier (Pappas & Popescu-Belis, 2013),2 we aim to predict the 12-dimensional talk-level emotion ratings assigned by viewers through voting... 2. Available at https://www.idiap.ch/dataset/ted/. ... we designed a new dataset called HATDOC (Pappas & Popescu-Belis, 2016).4 ... 4. We make this dataset available at https://www.idiap.ch/paper/hatdoc/. |
| Dataset Splits | Yes | We use the same protocol as McAuley et al., i.e. a uniform split of the data into 50% for training and 50% for testing. ... All the models are optimized (when applicable) on a development set, i.e. a 25% subset of the training data... We experiment with 5-fold cross-validation on equal-size samples of 1,200 instances per dataset. ... For segmentation and summarization, we report the average scores of each method over five runs (Section 9.2). We compare our model with the methods used by McAuley et al. (2012). As Lei et al. (2016) used a modified version of McAuley's segmentation task to evaluate their word-based selection method, this is not directly comparable with McAuley's or our method. ... we evaluate them in Section 9.3 over five random splits, 80% for training and 20% for testing. |
| Hardware Specification | No | No specific hardware details such as GPU/CPU models, processor types, or memory amounts are mentioned. The paper only discusses computational complexity and potential for parallelization without specifying the actual hardware used for experiments. |
| Software Dependencies | No | For the regression models and evaluation, we use the scikit-learn library (Pedregosa et al., 2012). ... we computed sentence features based on 300-dimensional word embeddings trained on Wikipedia with word2vec (Mikolov et al., 2013). |
| Experiment Setup | Yes | The hyper-parameters to optimize for the various MIR assumptions are the regularization terms λ2 and λ1 of their regression model f. ... The hyper-parameters to optimize for APWeights are the three regularization terms ϵ1, ϵ2, ϵ3 of the ℓ2-norm for the f1, f2 and f3 regression models. ... for the Clustering MIR assumption (Wagstaff et al., 2008), we use the f2 regression model, which relies on ϵ2 and the number of clusters k, optimized over {5, ..., 50} with step 5, for its clustering algorithm, which is here the k-Means one. All the regularization terms are optimized over the same range of possible values, noted a·10^b with a ∈ {1, ..., 9} and b ∈ {−4, ..., +4}, hence 81 values per term. The hyper-parameters for SGDWeights are the same ones as for APWeights, plus the learning rate or step size ϵ, the minibatch size m, and the gradient step strategy (learning rate decay, ADAGRAD, or ADAM). ... minibatch size (set to 50 here) ... Based on tests over a development subset, our model is trained with SGDWeights and ADAGRAD (see Section 4.2 above), with a step size of 0.001. |
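The split protocol reported above (a uniform 50/50 train/test split, with 25% of the training portion held out as a development set) can be sketched as follows. This is a minimal illustration with a hypothetical `split_reviews` helper, not the authors' actual code:

```python
import random

def split_reviews(reviews, seed=0):
    """Sketch of the reported protocol: shuffle, split 50/50 into
    train/test, then hold out 25% of the training half as a dev set."""
    rng = random.Random(seed)
    shuffled = reviews[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    train, test = shuffled[:half], shuffled[half:]
    dev_size = len(train) // 4          # 25% of training data
    dev, train = train[:dev_size], train[dev_size:]
    return train, dev, test

train, dev, test = split_reviews(list(range(1000)))
print(len(train), len(dev), len(test))  # 375 125 500
```

For the 5-fold cross-validation and the five random 80/20 splits also mentioned, the same shuffling logic would simply be repeated with different fold boundaries or seeds.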
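The regularization search range described in the setup row, values of the form a·10^b with a ∈ {1, ..., 9} and b ∈ {−4, ..., +4}, enumerates to exactly 81 candidates per term. A one-line sketch of that grid (an illustration of the stated range, not code from the paper):

```python
# Enumerate the regularization grid: a * 10**b for a in 1..9, b in -4..4,
# giving 9 * 9 = 81 candidate values per regularization term.
grid = sorted(a * 10.0 ** b for b in range(-4, 5) for a in range(1, 10))
print(len(grid), grid[0], grid[-1])  # 81 0.0001 90000.0
```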