JsonGrinder.jl: automated differentiable neural architecture for embedding arbitrary JSON data

Authors: Šimon Mandlík, Matěj Račinský, Viliam Lisý, Tomáš Pevný

JMLR 2022

Reproducibility Variable Result LLM Response
Research Type Experimental Table 1 shows that the default setting of our framework, where the JSON embedding is followed by a simple feed-forward classification network, reaches a very good performance off-the-shelf (Default), while further tuning (Tuned) allows reaching the performance of competing approaches (Comp.).
Researcher Affiliation Collaboration Šimon Mandlík (AIC, FEE, Czech Technical University in Prague; Avast Software s.r.o.), Matěj Račinský (Avast Software s.r.o.), Viliam Lisý (AIC, FEE, Czech Technical University in Prague; Avast Software s.r.o.), Tomáš Pevný (AIC, FEE, Czech Technical University in Prague; Avast Software s.r.o.)
Pseudocode No The paper describes steps for creating a model in Section 3 and Figure 3, but these are descriptive textual steps and a flowchart, not structured pseudocode or algorithm blocks.
Open Source Code Yes Experimental details can be found at https://github.com/CTUAvastLab/JsonGrinderExamples. The complete example is available at https://github.com/CTUAvastLab/JsonGrinder.jl/blob/master/examples/mutagenesis.jl. JsonGrinder.jl is registered and can be added by typing the Pkg.add("JsonGrinder") command.
Open Datasets Yes In the Device ID challenge (CSP, 2019) hosted by kaggle.com, the samples originate from a network scanning tool. In EMBER (Anderson and Roth, 2018), the samples were produced by a binary file analyzer. Mutagenesis (Debnath et al., 1991) describes molecules trialed for mutagenicity on Salmonella typhimurium.
Dataset Splits No The paper mentions dataset sizes in Table 1 (e.g., 'Device ID 0.1k-0.3M'), but does not provide specific details on how these datasets were split into training, validation, or test sets.
Hardware Specification No The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies No The framework is written in the Julia language (Bezanson et al., 2017), and it is fully integrated with the Julia ecosystem. It uses Flux.jl for the implementation of neural networks and allows the use of any automatic differentiation engine interfacing with ChainRulesCore.jl. However, specific version numbers for these software components are not provided.
Experiment Setup No The paper mentions 'tuned hyperparameters' in Table 1, but does not provide concrete values for any hyperparameters, training configurations, or system-level settings in the main text.
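The installation command quoted in the Open Source Code row can be run directly in the Julia REPL. The sketch below is illustrative only: the `schema` and `suggestextractor` calls follow the workflow the paper describes in Section 3, but exact signatures may differ between package versions, and `dataset.jsonl` is a hypothetical input file, not one shipped with the paper.

```julia
using Pkg
Pkg.add("JsonGrinder")   # registered package, as quoted in the paper
Pkg.add("JSON")

using JsonGrinder, JSON

# Illustrative workflow (assumed file name; one JSON document per line):
samples = JSON.parse.(readlines("dataset.jsonl"))

sch = schema(samples)               # infer a schema over all JSON samples
extractor = suggestextractor(sch)   # suggest an extractor mapping JSONs to model inputs
```

From here, the paper's framework derives a differentiable embedding model from the inferred schema; the tuned hyperparameters referenced in Table 1 are not reproducible from the main text alone, consistent with the "Experiment Setup: No" assessment above.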