JsonGrinder.jl: automated differentiable neural architecture for embedding arbitrary JSON data

Authors: Šimon Mandlík, Matěj Račinský, Viliam Lisý, Tomáš Pevný

JMLR 2022

Reproducibility Variable Result LLM Response
Research Type Experimental Table 1 shows that the default setting of our framework, where the JSON embedding is followed by a simple feed-forward classification network, reaches a very good performance off-the-shelf (Default), while further tuning (Tuned) allows reaching the performance of competing approaches (Comp.).
Researcher Affiliation Collaboration Šimon Mandlík (AIC, FEE, Czech Technical University in Prague; Avast Software s.r.o.), Matěj Račinský (Avast Software s.r.o.), Viliam Lisý (AIC, FEE, Czech Technical University in Prague; Avast Software s.r.o.), Tomáš Pevný (AIC, FEE, Czech Technical University in Prague; Avast Software s.r.o.)
Pseudocode No The paper describes steps for creating a model in Section 3 and Figure 3, but these are descriptive textual steps and a flowchart, not structured pseudocode or algorithm blocks.
Open Source Code Yes Experimental details can be found at https://github.com/CTUAvastLab/JsonGrinderExamples. The complete example is available at https://github.com/CTUAvastLab/JsonGrinder.jl/blob/master/examples/mutagenesis.jl. JsonGrinder.jl is registered and can be added by typing the Pkg.add("JsonGrinder") command.
Open Datasets Yes In the Device ID challenge (CSP, 2019) hosted by kaggle.com, the samples originate from a network scanning tool. In EMBER (Anderson and Roth, 2018), the samples were produced by a binary file analyzer. Mutagenesis (Debnath et al., 1991) describes molecules trialed for mutagenicity on Salmonella typhimurium.
Dataset Splits No The paper mentions dataset sizes in Table 1 (e.g., 'Device ID 0.1k-0.3M'), but does not provide specific details on how these datasets were split into training, validation, or test sets.
Hardware Specification No The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies No The framework is written in the Julia language (Bezanson et al., 2017), and it is fully integrated with the Julia ecosystem. It uses Flux.jl for the implementation of neural networks and allows the use of any automatic differentiation engine interfacing with ChainRulesCore.jl. However, specific version numbers for these software components are not provided.
Experiment Setup No The paper mentions 'tuned hyperparameters' in Table 1, but does not provide concrete values for any hyperparameters, training configurations, or system-level settings in the main text.
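The installation command quoted in the Open Source Code row can be run directly in the Julia REPL. The sketch below is illustrative only: the `schema` and `suggestextractor` calls follow the workflow the paper describes in Section 3, but exact signatures may differ between package versions, and `dataset.jsonl` is a hypothetical input file, not one shipped with the paper.

```julia
using Pkg
Pkg.add("JsonGrinder")   # registered package, as quoted in the paper
Pkg.add("JSON")

using JsonGrinder, JSON

# Illustrative workflow (assumed file name; one JSON document per line):
samples = JSON.parse.(readlines("dataset.jsonl"))

sch = schema(samples)               # infer a schema over all JSON samples
extractor = suggestextractor(sch)   # suggest an extractor mapping JSONs to model inputs
```

From here, the paper's framework derives a differentiable embedding model from the inferred schema; the tuned hyperparameters referenced in Table 1 are not reproducible from the main text alone, consistent with the "Experiment Setup: No" assessment above.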