Run 20260225
- Model: gemini-2.5-flash
- Temperature: -1.0
- Top P: -1.0
Prompt
You will be provided with a research paper, and your task is to answer questions about the contents of the research paper.
Now, please analyze the following research paper:
Questions
## Questions for Paper Analysis
### Output Format Requirements
Return your analysis as a valid JSON object with the following structure. Each field must contain two sub-elements:
- **result**: An integer whose specific value is defined for each question (varies by question - see below)
- **paper_text**: A string containing the relevant text excerpt from the paper that supports your answer, or a brief explanation if not found
The questions and JSON keys are as follows:
For the academic paper that has been converted from PDF to text, give me a list of the authors:
- first name
- last name
- email address
- institution name or company name
- department
- city
- country (ISO 3166-1 alpha-3 code)
- first author (True or False)
Consider that there might be errors from the PDF-to-text conversion, and commas and spaces
might not have been translated properly. Spaces and commas might be separating "Last_name,
First_name" instead of separating the individual names. Verify that the order makes sense; it
could be "First Last" or it could be "Last, First".
**Do not include the department in the institution or company name.**
For example, if the author is from "Computer Science Department, Stanford University",
the institution is "Stanford University" and the department is "Computer Science Department".
**Do not use the email address to assign the department.**
**Only use the ISO 3166-1 alpha-3 code for the country.** **DO NOT use ISO 3166-1 alpha-2
IOC, or any other codes.** For example, use "USA" instead of "US" for United States, use "CHE"
instead of "SUI" for Switzerland, and use "DEU" instead of "GER" for Germany.
Only use the email address to help identify the institution or company, or country.
If it is a **Known** **University** and you are positive of the city and country, add the
city and country. Do not try to guess the city or country if it is **NOT** a university.
For example, if the authors only list "Brown University" as the institution, with no city
or country listed, include "Providence" as the city and "USA" as the country; but if
the authors only list "Google Research" as the institution, do not include a city or country
unless it was included by the authors.
If the authors only include a **common and unique** acronym for a **University** as the
institution, change it to the full name. For example, if the authors state UIUC is the
institution, then use University of Illinois Urbana-Champaign as the institution. **Do
not guess if you are unsure or if it is not a common and unique acronym for a University.**
**Do not change the institution name unless you are sure it is a University.**
**ALWAYS use title case for the first name, last name, institution, and department.**
**ALWAYS use lowercase for the email address.**
**ALWAYS use uppercase for the country.**
If an author has multiple institutional affiliations, **only use the first one.**
If any of an author's details are missing, return an empty string for that field.
Return the results as JSON with a single root key authors_list, whose value is an array of
author objects containing the following keys:
- first_name
- last_name
- email
- institution
- department
- city
- country
- first_author (True or False)
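For illustration, a single hypothetical author entry in this structure could look like the following (all values are invented, and first_author is rendered as a JSON boolean):

```json
{
  "authors_list": [
    {
      "first_name": "Jane",
      "last_name": "Doe",
      "email": "jdoe@cs.brown.edu",
      "institution": "Brown University",
      "department": "Computer Science Department",
      "city": "Providence",
      "country": "USA",
      "first_author": true
    }
  ]
}
```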
Determine whether the paper is based on **experimental research** or **theoretical research**.
Return **Experimental** (0) if:
- It conducts empirical studies, including running experiments, analyzing data, reporting metrics, or validating hypotheses
- Indicators include evaluation on datasets, comparisons to baselines, test/train/validation splits, ablation studies, and performance tables or graphs
Return **Theoretical** (1) if:
- It focuses only on conceptual or mathematical contributions without empirical validation
- Examples include algorithm design, proofs, lemmas, or symbolic derivations
- These papers may include pseudocode but do not analyze results from actual experiments
If the paper includes both theoretical and empirical components, classify it as **Experimental**.
Quote the text from the paper that supports your decision.
**Does the paper conduct EMPIRICAL STUDIES WITH DATA ANALYSIS (experiments, dataset evaluation, performance metrics, or hypothesis validation) rather than purely theoretical work?**
Return 0 for experimental research or 1 for theoretical framework and use research_type as the JSON key.
Determine whether the paper's authors have **industry**, **academic**, or **collaborative** affiliations.
The authors' affiliations and email addresses are usually listed in the first few paragraphs of the paper. Use this information to classify the affiliation type:
Return **2** (classify as Industry) if:
- All authors are affiliated with corporations or private-sector labs
- Email addresses end in domains like `.com`, `.ai`, `.tech`, or have company names like "IBM," "Google," "DeepMind," etc.
Return **0** (classify as Academia) if:
- All authors are from universities or public research institutions
- Email domains include `.edu`, `.ac.uk`, `.edu.cn`, or subdomains like `cs.cmu.edu`, `eecs.mit.edu`, etc.
Return **1** (classify as Collaboration) if:
- There is a mix of academic and industry affiliations based on the institutional names or email domains
Quote the text from the paper that supports your decision. If the affiliation type cannot be determined, explain briefly why the information is insufficient.
**Does the paper provide CLEAR INSTITUTIONAL AFFILIATIONS (university names, company names, or email domains) that allow classification of author affiliation types?**
Return 0 for academia only, 1 for collaboration of both academia and industry, or 2 for industry only and use affiliation as the JSON key.
Determine whether the paper contains **pseudocode** or a clearly labeled algorithm block.
Return **Yes** if:
- The paper includes a figure, block, or section labeled "Pseudocode", "Algorithm", or "Algorithm X"
- The paper presents structured steps for a method or procedure formatted like code or an algorithm (even if not explicitly called "pseudocode")
Return **No** if:
- The paper only describes steps in regular paragraph text without structured formatting
Quote the text from the paper that supports your decision. If the answer is No, explain briefly why the information is insufficient.
**Does the paper contain STRUCTURED PSEUDOCODE OR ALGORITHM BLOCKS (clearly labeled algorithm sections or code-like formatted procedures)?**
Return 1 for yes or 0 for no and use pseudocode as the JSON key.
Determine whether the paper provides **open-source code for the methodology it describes**.
Return **Yes** if:
- The paper includes an unambiguous sentence where the authors state they are releasing the code **for the work described in this paper** (e.g., "We release our code...", "The source code for our method is available at...")
- A direct link to a source-code repository (e.g., GitHub, GitLab, Bitbucket) that contains the code for the paper's methodology
- A clear statement that code is provided in **supplementary material**, **appendices**, or via an **anonymous review link**
Return **No** if:
- The code is promised for the future using phrases like "we plan to release...", "code will be made available...", or "our tool will be publicly available..."
- The code is only available "upon request"
- The text discusses the source code of a **third-party tool or platform that the authors used**, but does not provide their own implementation code
- The link points to a resource that is explicitly a `dataset`, `benchmark`, `corpus`, `taxonomy`, or `data`, and does not also clearly host the source code
- The URL provided is for a general domain, a personal homepage, **a project demonstration page, or a high-level project overview page** instead of a specific code repository
- **The text is ambiguous or lacks a clear, affirmative statement of release**
Quote the text from the paper that supports your decision. If the answer is No, explain briefly why the information is insufficient.
**Does the paper provide CONCRETE ACCESS TO SOURCE CODE (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described in this paper?**
Return 1 for yes or 0 for no and use open_source_code as the JSON key.
Determine whether the paper explicitly states that the **dataset** used in the experiments is **publicly available** or an **open** dataset.
Return **Yes** if:
- The paper uses a well-known public dataset (e.g., "CIFAR-10", "MNIST", "ImageNet", "COCO", "Penn Treebank")
- Provides a direct URL, DOI, or specific repository name (e.g., GitHub, Zenodo, Figshare) where the dataset can be accessed
- Cites a published paper or resource that contains the dataset with proper bibliographic information **including author names and year in brackets or parentheses**
- States the dataset is in supplementary material with specific file names or section references
- **References standard academic datasets with citations** or **mentions datasets from well-known repositories or benchmarks with proper attribution**
Return **No** if:
- The paper mentions the dataset but gives no indication of availability
- The dataset is proprietary, private, or internal
- The authors created their own dataset but do not provide public access (no link, DOI, repository, or citation)
- The paper describes or mentions a dataset but does not provide any source, link, citation, or repository information for accessing it
- The dataset is described as "available upon request" or "available from authors" without permanent public access
- The paper only describes the dataset characteristics, collection process, or statistics without providing access information
- The paper mentions using "publicly available data" but does not specify the exact source or provide access details
- **The dataset name is mentioned but no citation, link, repository, or author attribution is provided**
Quote the text from the paper that supports your decision. If the answer is No, explain briefly why the information is insufficient.
**Does the paper provide CONCRETE ACCESS INFORMATION (specific link, DOI, repository name, formal citation with authors/year, or reference to established benchmark datasets) for a publicly available or open dataset?**
Return 1 for yes or 0 for no and use open_datasets as the JSON key.
Determine whether the paper explicitly provides **training/test/validation dataset splits** needed to reproduce the experiment.
Return **Yes** if:
- The paper specifies exact split percentages (e.g., "80/10/10 split", "70% training, 15% validation, 15% test")
- Provides absolute sample counts for each split (e.g., "40,000 training samples, 5,000 validation, 5,000 test")
- References predefined splits with citations (e.g., "we use the standard train/test split from [Author et al., 2020]")
- Mentions specific file names or URLs for custom splits (e.g., "train.txt, val.txt, test.txt available at...")
- Describes stratified or group-based splitting methodology (e.g., "stratified by class", "split by subject ID", "temporal split")
- Provides random seed with splitting strategy (e.g., "random split with seed 42")
- Specifies cross-validation setup (e.g., "5-fold cross-validation", "leave-one-out cross-validation")
- Uses standard benchmark splits that are well-defined (e.g., "CIFAR-10 standard split", "ImageNet validation set")
Return **No** if:
- References datasets without mentioning splits (e.g., "we use the XYZ dataset")
- Splits are mentioned vaguely (e.g., "we split the data appropriately")
- Only mentions total dataset size without split information
- Defers split details to supplementary materials or other papers without providing access
- Uses phrases like "standard split" or "typical split" without specification
Quote the text from the paper that supports your decision. If the answer is No, explain briefly why the information is insufficient.
**Does the paper provide SPECIFIC DATASET SPLIT INFORMATION (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning?**
Return 1 for yes or 0 for no and use dataset_splits as the JSON key.
Determine whether the paper **explicitly describes the hardware used** to run its experiments.
Return **Yes** if:
- The paper mentions any specific hardware setup, including:
- **Specific GPU models** (e.g., "NVIDIA A100", "RTX 2080 Ti", "Tesla V100")
- **Specific CPU models or processors** (e.g., "Intel Core i7-4770K", "Intel Xeon E5-2630", "AMD Ryzen 9 5950X", "Intel Core i5-3210M", "Intel i7-2600")
- **TPU or other accelerator references** (e.g., "TPU v2", "Google TPU", "Intel Neural Compute Stick")
- **Cloud or cluster resources with specs** (e.g., "AWS p3.8xlarge with V100 GPUs", "Google Cloud TPU v3")
- **Computer specifications with processor details** (e.g., "laptop with Intel Core i5", "workstation with Intel Xeon", "desktop PC with Intel Core i7")
- Direct statements about where experiments were run, such as:
- "We trained our models using..."
- "Experiments were performed on..."
- "Hardware specifications include..."
- "All experiments ran on..."
- "Our machine has..."
- "Results are obtained on..."
Return **No** if:
- No hardware is mentioned
- The paper only discusses software, datasets, or **vague terms like "on a GPU", "using a High Performance Computing Resource", or "on a server" without any specific model numbers, processor types, or memory details**
- Any mention of hardware is disconnected from the experimental process
- The hardware mentioned (like mobile phones, tablets, or IoT devices) is not used for training or inference in the experiments
- **Only general computing environments are mentioned without any specific hardware details (e.g., "on a cluster", "using cloud computing" with no specifications)**
Quote the text from the paper that supports your decision. If the answer is No, explain briefly why the information is insufficient.
**Does the paper provide SPECIFIC HARDWARE DETAILS (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments?**
Return 1 for yes or 0 for no and use hardware_specification as the JSON key.
Determine whether the paper provides a reproducible description of the ancillary software. A reproducible description **must include specific version numbers** for key software components.
Return **Yes** if the paper meets one of the following criteria:
- It lists multiple key software components with their versions (e.g., "Python 3.8, PyTorch 1.9, and CUDA 11.1")
- It names a self-contained solver, simulation environment, or specialized package with a specific version number (e.g., "CPLEX 12.4", "Gecode 4.2.0", "Choco 2.1.5")
Return **No** if the paper only mentions:
- Software names without version numbers (e.g., "using Caffe", "the scikit-learn package")
- A programming language, even with a version, without listing any versioned libraries or solvers (e.g., "implemented in Java 7" by itself is not enough)
Quote the text from the paper that supports your decision. If the answer is No, explain briefly why the information is insufficient.
**Does the paper provide SPECIFIC ANCILLARY SOFTWARE DETAILS (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment?**
Return 1 for yes or 0 for no and use software_dependencies as the JSON key.
Determine whether the paper explicitly provides details about the **experimental setup**, especially hyperparameters or system-level training settings.
Return **Yes** if:
- The paper contains specific hyperparameter values (e.g., learning rate, batch size, number of epochs, optimizer settings)
- Details on model initialization, dropout rate, or training schedules
- A clearly labeled table or paragraph describing training settings
- A section titled "Experimental Setup" or similar with configuration information
Return **No** if:
- The paper only mentions that "we trained a model" or refers to "standard settings" without elaboration
- The only details about training are deferred to supplemental materials, code, or prior work
- There is no mention of hyperparameters, optimizer settings, or explicit configuration steps
Quote the text from the paper that supports your decision. If the answer is No, explain briefly why the information is insufficient.
**Does the paper contain SPECIFIC EXPERIMENTAL SETUP DETAILS (concrete hyperparameter values, training configurations, or system-level settings) in the main text?**
Return 1 for yes or 0 for no and use experiment_setup as the JSON key.
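Taken together, a complete response following the keys above could look like this sketch (all result values and paper_text excerpts are hypothetical placeholders, not drawn from any real paper):

```json
{
  "authors_list": [
    {
      "first_name": "Jane",
      "last_name": "Doe",
      "email": "jdoe@example.edu",
      "institution": "Example University",
      "department": "",
      "city": "",
      "country": "USA",
      "first_author": true
    }
  ],
  "research_type": { "result": 0, "paper_text": "We evaluate our approach on three benchmark datasets..." },
  "affiliation": { "result": 0, "paper_text": "Department of Computer Science, Example University" },
  "pseudocode": { "result": 1, "paper_text": "Algorithm 1: Training procedure" },
  "open_source_code": { "result": 0, "paper_text": "No code release statement or repository link was found." },
  "open_datasets": { "result": 1, "paper_text": "We use the CIFAR-10 dataset (Krizhevsky, 2009)." },
  "dataset_splits": { "result": 0, "paper_text": "The dataset is mentioned but no split information is given." },
  "hardware_specification": { "result": 1, "paper_text": "All experiments ran on a single NVIDIA A100 GPU." },
  "software_dependencies": { "result": 0, "paper_text": "Software is named without version numbers." },
  "experiment_setup": { "result": 1, "paper_text": "We train for 100 epochs with a learning rate of 1e-3." }
}
```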