---
license: mit
---

SK²Decompile: LLM-based Two-Phase Binary Decompilation from Skeleton to Skin

SK²Decompile is a novel two-phase framework for binary decompilation using Large Language Models (LLMs). Our approach decomposes the complex decompilation task into two manageable phases:

* Phase 1 Structure Recovery (Skeleton): Transform binary/pseudo-code into obfuscated intermediate representations (current model)
* Phase 2 Identifier Naming (Skin): Generate human-readable source code with meaningful identifiers

🤗 [HF Link](https://huggingface.co/LLM4Binary/sk2decompile-ident-6.7)

Usage:

0. Install `vllm` and `transformers` via `pip`; install `clang-format` via `apt`.
1. Prepare a Linux-x64 executable file (ELF).
2. Use IDA to decompile it (you can also simply use this website: [https://dogbolt.org/](https://dogbolt.org/)).
3. Convert the data into the corresponding format ([https://huggingface.co/LLM4Binary/sk2decompile-struct-6.7b/blob/main/reverse_sample.json](https://huggingface.co/LLM4Binary/sk2decompile-struct-6.7b/blob/main/reverse_sample.json)):
   ```bash
   python normalize_pseudo.py --input_json reverse_sample.json --output_json reverse_sample.json
   ```
4. Run inference:
   ```bash
   python sk2decompile.py --dataset_path reverse_sample.json \
       --model_path LLM4Binary/sk2decompile-struct-6.7b \
       --recover_model_path LLM4Binary/sk2decompile-ident-6.7b
   ```

**Project overview:** [https://github.com/albertan017/LLM4Decompile/tree/main/sk2decompile](https://github.com/albertan017/LLM4Decompile/tree/main/sk2decompile)

**Notes:**

* IDA decompilation results should be preprocessed before inference.
* Use `vllm` to recover function structure (`sk2decompile-struct`) and variable names (`sk2decompile-ident`) step by step.
* Training was done on C language Linux-x64 code with IDA pseudocode; performance may degrade for other languages or architectures.
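
The step-by-step flow described in the notes (structure recovery first, then identifier naming) can be sketched roughly as follows. This is a minimal illustration, not the logic of `sk2decompile.py`. The helper names `make_vllm_generator` and `two_phase_decompile` are hypothetical. The sketch also assumes the normalized pseudocode is passed to the model as the raw prompt, whereas the real script may wrap it in a prompt template.

```python
from typing import Callable, List

# A "stage" maps a batch of input texts to a batch of output texts.
Stage = Callable[[List[str]], List[str]]


def make_vllm_generator(model_path: str, max_tokens: int = 2048) -> Stage:
    """Wrap a vLLM model as a batch text-to-text function (hypothetical helper).

    Assumption: greedy decoding and raw prompts; the actual prompt format
    used by sk2decompile.py is not reproduced here.
    """
    # Imported lazily so the rest of the sketch works without vllm installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_path)
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)

    def generate(prompts: List[str]) -> List[str]:
        # vLLM returns one RequestOutput per prompt, in prompt order.
        outputs = llm.generate(prompts, params)
        return [out.outputs[0].text for out in outputs]

    return generate


def two_phase_decompile(
    pseudo_functions: List[str],
    recover_structure: Stage,
    recover_names: Stage,
) -> List[str]:
    """Chain the two phases: skeleton (structure) first, then skin (names)."""
    skeletons = recover_structure(pseudo_functions)  # Phase 1
    return recover_names(skeletons)                  # Phase 2


# Example wiring with the model names from the usage steps (needs a GPU and vllm):
# recover_structure = make_vllm_generator("LLM4Binary/sk2decompile-struct-6.7b")
# recover_names = make_vllm_generator("LLM4Binary/sk2decompile-ident-6.7b")
# sources = two_phase_decompile([normalized_pseudocode], recover_structure, recover_names)
```

Passing each phase in as a plain callable keeps the chaining logic independent of the model backend, so it can be exercised with stub functions. Holding both 6.7b models in one process may exceed GPU memory on smaller cards, in which case the phases can be run one after the other.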