Exploring the Potential of LLMs for Code Deobfuscation

David Beste

Grégoire Menguy

Hossein Hajipour

Mario Fritz

Antonio Emanuele Cinà

Sébastien Bardin

Thorsten Holz

Thorsten Eisenhofer

Lea Schönherr

July 10, 2025

Code obfuscation alters software code to conceal its logic while retaining functionality, aiding intellectual property protection but hindering security audits and malware analysis. To address this, automated deobfuscation techniques have been developed, though existing approaches remain constrained by limited scope and specificity. Motivated by these challenges, this paper explores a novel approach for code deobfuscation based on Large Language Models (LLMs). First, we investigate the general capabilities of LLMs in reducing code complexity by choosing five different source-to-source obfuscation methods. Despite challenges regarding semantic correctness, our findings indicate that LLMs can be very effective in this task. Building on this, we fine-tune two versatile models capable of simplifying code obfuscated through up to seven different chained obfuscation transformations while consistently outperforming deobfuscation based on compiler optimizations and general-purpose LLMs. Our best model demonstrates an average Halstead metric program length reduction of 89.21% for our most challenging scenario. Finally, we conduct a memorization test to assess whether performance stems from memorized code rather than true deobfuscation capabilities, which our models pass.
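To make the reported metric concrete, the following is a minimal sketch of how a Halstead program length reduction could be computed. Halstead program length is the total number of operators plus the total number of operands. The regex tokenizer, the helper names (`halstead_length`, `reduction_percent`), and the sample expressions are illustrative assumptions; the abstract does not say which tooling or tokenization the authors use.

```python
import re

# Assumption: a crude tokenizer for C-like expressions. Identifiers and integer
# literals count as operands; every other non-space character is an operator.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")


def is_operand(token: str) -> bool:
    """Treat identifiers and numeric literals as operands."""
    return token[0].isalnum() or token[0] == "_"


def halstead_length(source: str) -> int:
    """Halstead program length N = N1 + N2 (total operators + total operands)."""
    tokens = TOKEN.findall(source)
    n_operands = sum(1 for t in tokens if is_operand(t))
    n_operators = len(tokens) - n_operands
    return n_operators + n_operands


def reduction_percent(before: int, after: int) -> float:
    """Relative reduction in program length, as a percentage of the original."""
    return 100.0 * (before - after) / before


# Hypothetical example: a mixed boolean-arithmetic encoding of a + b.
obfuscated = "x = (a ^ b) + 2 * (a & b);"
simplified = "x = a + b;"

n_before = halstead_length(obfuscated)
n_after = halstead_length(simplified)
print(n_before, n_after, reduction_percent(n_before, n_after))
```

A length reduction alone does not show that the simplified code is semantically equivalent to the original, which is why the abstract discusses correctness separately.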

https://doi.org/10.1007/978-3-031-97620-9_15

