Banner Banner

Cross-Language Differential Testing of JSON Parsers

Jonas Möller
Felix Weißberg
Lukas Pirch
Thorsten Eisenhofer
Konrad Rieck

July 01, 2024

JSON is a widely used format for representing data on the Internet. Unfortunately, the format is imprecisely specified, which poses the risk of confusion and ambiguity when processing sensitive data. While previous work has focused on manual analysis of parsers, an automatic analysis of the interplay of multiple parsers resulting from this imprecision has received little attention so far. In this paper, we address this problem and propose a framework for differential testing of JSON parsers tailored towards discovering semantic discrepancies. To spot these differences automatically, we overcome two challenges: First, we introduce a consensus-based normalization of JSON that enables us to analyze data semantics in absence of a precise specification. Second, we propose a novel mechanism for tracking test coverage across runtime environments, so that confusions between parsers written in C, C++, Rust, Java, and Python can be detected simultaneously. In a comparative analysis of 22 JSON parsers, we uncover various semantic discrepancies, ranging from minor inconsistencies in the representation of numbers and strings to severe confusions in the handling of object keys and values. We illustrate the security impact of these discrepancies in different case studies, echoing recent efforts to enforce a stricter specification for JSON in security applications.