BIFOLD researchers present four papers at ASIACCS 2024

The BIFOLD research group Machine Learning and Security, led by Prof. Konrad Rieck, will present four papers at the upcoming ACM Asia Conference on Computer and Communications Security (ASIACCS) 2024. The group focuses on fundamental research at the intersection of computer security and machine learning.

The ACM Asia Conference on Computer and Communications Security (ASIACCS) is an annual event by the ACM Special Interest Group on Security, Audit, and Control (SIGSAC). Its attendees include information security researchers, practitioners, developers, and users from around the world. The conference provides a platform to explore innovative ideas and findings and to facilitate intellectual discussions, and it has established itself as a prominent research conference.

Below are the four BIFOLD research papers, along with their respective abstracts.

SoK: Where to Fuzz? Assessing Target Selection Methods in Directed Fuzzing 
Abstract: A common paradigm for improving fuzzing performance is to focus on selected regions of a program rather than its entirety. While previous work has largely explored how these locations can be reached, their selection, that is, the where, has received little attention so far. In this paper, we fill this gap and present the first comprehensive analysis of target selection methods for fuzzing. To this end, we examine papers from leading security and software engineering conferences, identifying prevalent methods for choosing targets. By modeling these methods as general scoring functions, we are able to compare and measure their efficacy on a corpus of more than 1,600 crashes from the OSS-Fuzz project. Our analysis provides new insights for target selection in practice: First, we find that simple software metrics significantly outperform other methods, including common heuristics used in directed fuzzing, such as recently modified code or locations with sanitizer instrumentation. Next to this, we identify language models as a promising choice for target selection. In summary, our work offers a new perspective on directed fuzzing, emphasizing the role of target selection as an orthogonal dimension to improve performance.
Authors: Felix Weißberg, Jonas Möller, Tom Ganz, Erik Imgrund, Lukas Pirch, Lukas Seidel, Moritz Schloegel, Thorsten Eisenhofer, Konrad Rieck

On the Role of Pre-trained Embeddings in Binary Code Analysis
Abstract: Deep learning has enabled remarkable progress in binary code analysis. In particular, pre-trained embeddings of assembly code have become a gold standard for solving analysis tasks, such as measuring code similarity or recognizing functions. These embeddings are capable of learning a vector representation from unlabeled code. In contrast to natural language processing, however, label information is not scarce for many tasks in binary code analysis. For example, labeled training data for function boundaries, optimization levels, and argument types can be easily derived from debug information provided by a compiler. Consequently, the main motivation of embeddings does not transfer directly to binary code analysis.
In this paper, we explore the role of pre-trained embeddings from a critical perspective. To this end, we systematically evaluate recent embeddings for assembly code on five downstream tasks using a corpus of 1.2 million functions from the Debian distribution. We observe that several embeddings perform similarly when sufficient labeled data is available, and that differences reported in prior work are hardly noticeable. Surprisingly, we find that end-to-end learning without pre-training performs best on average, which calls into question the need for specialized embeddings. By varying the amount of labeled data, we eventually derive guidelines for when embeddings offer advantages and when end-to-end learning is preferable for binary code analysis.
Authors: Alwin Maier, Felix Weißberg, Konrad Rieck

Cross-Language Differential Testing of JSON Parsers
Abstract: JSON is a widely used format for representing data on the Internet. Unfortunately, the format is imprecisely specified, which poses the risk of confusion and ambiguity when processing sensitive data. While previous work has focused on manual analysis of parsers, an automatic analysis of the interplay of multiple parsers resulting from this imprecision has received little attention so far. In this paper, we address this problem and propose a framework for differential testing of JSON parsers tailored towards discovering semantic discrepancies. To spot these differences automatically, we overcome two challenges: First, we introduce a consensus-based normalization of JSON that enables us to analyze data semantics in absence of a precise specification. Second, we propose a novel mechanism for tracking test coverage across runtime environments, so that confusions between parsers written in C, C++, Rust, Java, and Python can be detected simultaneously. In a comparative analysis of 22 JSON parsers, we uncover various semantic discrepancies, ranging from minor inconsistencies in the representation of numbers and strings to severe confusions in the handling of object keys and values. We illustrate the security impact of these discrepancies in different case studies, echoing recent efforts to enforce a stricter specification for JSON in security applications.
Authors: Jonas Möller, Felix Weißberg, Lukas Pirch, Thorsten Eisenhofer, Konrad Rieck

Battle of Wits: To What Extent Can Fraudsters Disguise Their Tracks in International bypass Fraud?*
Abstract: International bypass fraud, also known as SIMBox fraud, involves diverting international cellular voice traffic from regulated routes and rerouting it as local calls in the destination country. It has significantly affected cellular networks worldwide, causing $3.11 billion in losses annually and posing threats to national security. Yet, SIMBox fraud remains an ongoing challenge, eluding operators' detection due to the continual refinement of fraudulent behavior that is often overlooked in the design and validation of detection methods. This paper introduces a game-based formalization of the SIMBox fraud problem, delineating two key players, the adversary and the investigator, along with their strategies and a set of metrics gauging their efficacy in the game. We develop a practical framework for the empirical evaluation of the fraud, incorporating current adversary and investigator capabilities and accommodating seamless adaptation to the evolving nature of fraud. Our analysis identifies up to 345,600,000 possible adversary strategies from the functionalities of in-market SIMBox appliances. The most sophisticated strategies decisively outperform the most efficient existing detection methods, underscoring the literature's lack of awareness of fraud capabilities. Furthermore, we uncover fraud vulnerabilities and discuss their implications for enhancing future detection strategies in practice. In essence, our work introduces a novel paradigm in SIMBox fraud detection that adapts seamlessly to the ever-changing landscape of fraud, treating it as a fundamental aspect of the detection strategy.
Authors: Anne Josiane Kouam, Aline Carneiro Viana, Alain Tchana
* Dr. Anne Josiane Kouam is a postdoctoral researcher in the Machine Learning and Security group. The research for the paper “Battle of Wits: To What Extent Can Fraudsters Disguise Their Tracks in International bypass Fraud?” was not conducted within BIFOLD.