Hopper is a tool for generating fuzzing test cases for libraries automatically using interpretative fuzzing. It transforms the problem of library fuzzing into the problem of interpreter fuzzing, enabling exploration of a vast range of API usages for library fuzzing out of the box. Some key features of Hopper include:
- Interpretative API invocation without any fuzz driver.
- Type-aware mutation for arguments.
- Automatic learning of intra- and inter-API constraints.
- Binary instrumentation support.
To learn more about Hopper, check out our paper at CCS '23.
- Linux-amd64 (Tested on Ubuntu 20.04 and Debian Buster)
- Rust stable (>= 1.60), can be obtained using rustup
- Clang (>= 5.0, see Install Clang). rust-bindgen uses libclang to preprocess, parse, and type-check C and C++ header files.
./build.sh
The script creates an `install` directory in Hopper's root directory, after which you can use the `hopper` command.
To use the command from anywhere, add the project directory to your PATH variable.
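A minimal sketch of the PATH setup, assuming Hopper was cloned to `$HOME/hopper` (a hypothetical location; substitute your own checkout path):

```shell
# Hypothetical checkout location; adjust to where you cloned Hopper.
HOPPER_ROOT="${HOPPER_ROOT:-$HOME/hopper}"

# Prepend the project directory to PATH only if it is not already there.
case ":$PATH:" in
  *":$HOPPER_ROOT:"*) ;;
  *) export PATH="$HOPPER_ROOT:$PATH" ;;
esac
```

Adding these lines to your shell profile (e.g. `~/.bashrc`) makes the change persistent.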
Alternatively, you can use the Dockerfile, which builds the requirements and Hopper.
docker build -t hopper ./
docker run --name hopper_dev --privileged -v /path-to-lib:/fuzz -it --rm hopper /bin/bash
Take `cJSON` as an example (More examples).
hopper compile --header ./cJSON.h --library ./libcjson.so --output output
Use `hopper compile --help` to see detailed usage. If compilation reports errors about the header file, refer to the usage of rust-bindgen, which Hopper uses to parse header files.
You may need to wrap the header file in a new header that supplies the missing definitions.
Hopper uses E9Patch to instrument binaries by default. Optionally, you can use LLVM for source code instrumentation.
After running `compile`, you will find the following files in the output directory:
- `bin/hopper-fuzzer`: generates inputs, maintains states, and uses `harness` to execute the inputs.
- `bin/hopper-harness`: executes the inputs.
- `bin/hopper-translate`: translates inputs to C source code.
- `bin/hopper-generator`: replays the generation process.
- `bin/hopper-sanitizer`: sanitizes and minimizes crashes.
- If there are multiple header files, create a new header file that includes all of them.
- If the header files are compiled depending on specific environment variables, you can set them via `BINDGEN_EXTRA_CLANG_ARGS`.
- If the header file includes API functions that you do not want to test, use `--func-pattern` to filter them while running the fuzzer.
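The first two tips can be sketched as follows. The header names `a.h` and `b.h`, the wrapper name `all.h`, and the macro `ENABLE_FEATURE_X` are all hypothetical placeholders.

```shell
# Create a wrapper header that pulls in every header you want Hopper to see.
# a.h and b.h are hypothetical names; replace them with your library's headers.
cat > all.h <<'EOF'
#include "a.h"
#include "b.h"
EOF

# Extra flags that rust-bindgen forwards to libclang, e.g. a macro that the
# headers check at preprocessing time (ENABLE_FEATURE_X is hypothetical).
export BINDGEN_EXTRA_CLANG_ARGS="-DENABLE_FEATURE_X=1"
```

The wrapper can then be passed to `hopper compile` as `--header ./all.h`.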
- `HOPPER_MAP_SIZE_POW2`: controls the size of the coverage path. The default value is 16, and it should be in the range [16, 20], e.g. `HOPPER_MAP_SIZE_POW2=18`.
- `HOPPER_INST_RATIO`: controls how likely a block is to be chosen for instrumentation. The default value is 100, and it should be in the range (0, 100], e.g. `HOPPER_INST_RATIO=75`.
- `HOPPER_INCLUDE_SEARCH_PATH`: adds a search path for files included by the header files, e.g. `HOPPER_INCLUDE_SEARCH_PATH=../`.
- `HOPPER_FUNC_BLACKLIST`: lists functions that Hopper will not compile; `bindgen` will not generate code for them, e.g. `HOPPER_FUNC_BLACKLIST=f1,f2`.
- `HOPPER_TYPE_BLACKLIST`: lists types that Hopper will not compile; `bindgen` will not generate code for them, e.g. `HOPPER_TYPE_BLACKLIST=type1,type2`.
- `HOPPER_ITEM_BLACKLIST`: lists items (constants/variables) that Hopper will not compile; `bindgen` will not generate code for them, e.g. `HOPPER_ITEM_BLACKLIST=IPPORT_RESERVED`.
- `HOPPER_CUSTOM_OPAQUE_LIST`: lists custom types to be treated as opaque, e.g. `HOPPER_CUSTOM_OPAQUE_LIST=type1`.
- `HOPPER_FUZZ_INLINE_FUNCTION`: includes inline functions as fuzzing targets; see the FAQ in bindgen.
- You can set the arguments and environment variables for compiling and running in a configuration file named `hopper.config`; see `examples/*` for details.
- Reduce density: if the density is larger than 20%, edge IDs are likely to have hash collisions. You can either (a) increase `HOPPER_MAP_SIZE_POW2` or (b) reduce `HOPPER_INST_RATIO`.
- Multiple libraries: either (1) merge the archives into one shared library, e.g. `gcc -shared -o c.so -Wl,--whole-archive a.a b.a -Wl,--no-whole-archive`, or (2) pass all of them to the Hopper compiler via `--library a.so b.so`.
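To make the density trade-off concrete, the sketch below assumes (from the variable's name, not from Hopper's source) that the coverage map holds 2^`HOPPER_MAP_SIZE_POW2` slots, and rejects values outside the documented range. The helper name `hopper_map_slots` is made up for illustration.

```shell
# Print the assumed number of coverage-map slots for a given power of two.
# Returns non-zero if the value is outside the documented range [16, 20].
hopper_map_slots() {
  pow2="$1"
  if [ "$pow2" -lt 16 ] || [ "$pow2" -gt 20 ]; then
    echo "HOPPER_MAP_SIZE_POW2 must be in [16, 20], got $pow2" >&2
    return 1
  fi
  echo $((1 << pow2))
}

hopper_map_slots "${HOPPER_MAP_SIZE_POW2:-16}"
```

Under that assumption, each increment doubles the number of slots, so going from 16 to 18 gives four times as many slots for the same set of edges.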
hopper fuzz output --func-pattern cJSON_*
Use `hopper fuzz output --help` to see detailed usage.
After running `fuzz`, Hopper generates the following directories:
- `queue`: generated normal inputs.
- `hangs`: generated timeout inputs.
- `crashes`: generated crash inputs.
- `misc`: stores temporary files and stats.
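A quick way to check on a campaign is to count the files in each directory. This sketch assumes the output directory is named `output`, as in the commands above; `count_inputs` is a hypothetical helper, not part of Hopper.

```shell
# Count regular files directly inside a directory (0 if it does not exist).
count_inputs() {
  if [ -d "$1" ]; then
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
  else
    echo 0
  fi
}

for dir in queue hangs crashes; do
  echo "$dir: $(count_inputs "output/$dir")"
done
```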
- `DISABLE_CALL_DET`: disables deterministic mutation of calls.
- `DISABLE_GEN_FAIL`: disables generating programs for functions that failed to be invoked.
- `HOPPER_SEED_DIR`: provides seeds for byte-like arguments (default: `output/seeds` if it exists).
- `HOPPER_DICT`: provides a dictionary for byte-like arguments. The grammar is the same as AFL's.
- `HOPPER_API_INSENSITIVE_COV`: disables API-sensitive branch counting.
- `HOPPER_FAST_EXECUTE_LOOP`: number of programs executed (in a loop) for each fork; set it to 0 or 1 to break the loop, e.g. `HOPPER_FAST_EXECUTE_LOOP=10`.
Configure system core dumps the same way as for AFL (on the host, if you run Hopper in a Docker container).
echo core | sudo tee /proc/sys/kernel/core_pattern
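Before starting a long run, you may want to confirm that the setting took effect. The sketch below assumes the AFL convention that a pattern beginning with `|` pipes core dumps to an external handler, which is the situation the command above is meant to avoid. `core_pattern_ok` is a hypothetical helper.

```shell
# Succeed if the core pattern stored in the given file does not pipe to a handler.
# Defaults to the kernel setting; fails if the file cannot be read.
core_pattern_ok() {
  pattern_file="${1:-/proc/sys/kernel/core_pattern}"
  [ -r "$pattern_file" ] || return 1
  pattern="$(head -n 1 "$pattern_file")"
  case "$pattern" in
    "|"*) return 1 ;;
    *) return 0 ;;
  esac
}

if core_pattern_ok; then
  echo "core_pattern is fine"
else
  echo "core_pattern pipes to a handler (or is unreadable)" >&2
fi
```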
By default, Hopper generates inputs for all functions that appear in both the headers and the libraries. There are two ways to filter functions: excluding functions or including functions. This lets Hopper focus on the functions you are interested in.
hopper fuzz output --func-pattern @cJSON_parse,!cJSON_InitHook,cJSON_*
- The pattern can be a function name, e.g. `cJSON_parse`, or a simple pattern, e.g. `cJSON_*`.
- If you have multiple patterns, use `,` to join them, e.g. `cJSON_*,HTTP_*`.
- Use the `@` prefix to restrict the fuzzer to specific functions, while the others remain candidates that provide values for fields or arguments, e.g. `@cJSON_parse,cJSON_*`.
- Use the `!` prefix to exclude specific functions, e.g. `!cJSON_InitHook,cJSON_*`.
The patterns can also be defined in the file passed by `--custom-rules`.
// hopper fuzz output --custom-rules path-to-file
func_target cJSON_parse
func_exclude cJSON_InitHook
func_include cJSON_*,HTTP_*
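As a sketch, the same rules can be written to a file from the shell. The file name `custom_rules.txt` is arbitrary.

```shell
# Write the filter rules shown above into a rules file (the name is arbitrary).
cat > custom_rules.txt <<'EOF'
func_target cJSON_parse
func_exclude cJSON_InitHook
func_include cJSON_*,HTTP_*
EOF

# Then pass the file to the fuzzer:
#   hopper fuzz output --custom-rules custom_rules.txt
```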
Hopper infers both intra- and inter-API constraints in order to invoke the APIs correctly.
The constraints are written to `output/misc/constraint.config`. You can remove the file to reset the constraints.
Additionally, users can define a file that describes custom constraints for API invocations and pass it via `--custom-rules`. These constraints override the inferred ones.
// hopper fuzz output --custom-rules path-to-file
// Grammar:
// func, type : prefix for adding a rule for function or type
// $[0-9]+ : function's i-th argument, or index in array
// [a-zA-Z_]+ : object field
// 0, 128 .. : integer constants
// "xxxx" : string constants
// methods : $len, $range, $null, $non_null, $need_init, $read_file, $write_file, $ret_from, $cast_from, $use, $arr_len, $opaque, $len_factors
// others : pointer(&) , option(?), e.g &.$0.len, `len` field in the pointer's first element
//
// Set one argument in a function to be specific constant
func test_add[$0] = 128
// One argument must be the length of another one
func test_arr[$1] = $len($0)
// Or one field must be the length of another field
func test_arr[$0][len] = $len([$0][name])
// One argument must be in a certain range
func test_arr[$1] = $range(0, $len($0))
// Argument should be non-null
func test_non_null[$0] = $non_null
// Argument should be null
func test_null[$0] = $null
// Argument should be specific string
func test_magic[$0] = "magic"
// Argument should be a file and the file will be read
func test_path[$0] = $read_file
// Argument should use the return value of a specific function
func test_use[$0] = $ret_from(test_create)
// Argument should be specific type for void pointer. The type should start with *mut or *const.
func test_void[$0] = $cast_from(*mut u8)
// The array is supposed to have a minimal length
func test_void[$0][&] = $arr_len(256)
// The array's length is formed by the factors
func fread[$0][&] = $len_factors(1, $2)
// Or
func gzfread[$0][&] = $len_factors($1, $2)
// Field in argument should be specific constant
func test_field[$0][len] = 128
// Deeper fields
func test_field[$0][&.elements.$0] = 128
// One field `len` in a type must be the length of another field `p`
type ArrayWrap[len] = $len(p)
// One nested union `inner_union` in a type must be set to `member2`
type ComplicatedStruct[inner_union] = $use(member2)
// Type is opaque and is only used through an opaque pointer
type Partial = $opaque
// A type should be initialized with a specific function
type Partial = $init_with(test_init, 0)
// ctx: set context for specific function
// Add a context for function
ctx test_use[$0] <- test_init
// Add implicit context
ctx test_use[*] <- test_init
// Add optional context that preferred to use
ctx test_use[$0] <- test_init ?
// Add forbidden context
ctx test_use[$0] <- ! test_init
// alias: alias types across different functions
alias handleA <- useA($0),createA($ret),freeA($0)
// assert: adding specific assertions for calls
assert test_one == 1
assert test_non_zero != 0
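As an illustration of the grammar above applied to a real API, the sketch below writes a few rules for cJSON, whose header declares `cJSON_ParseWithLength(const char *value, size_t buffer_length)` and `cJSON_PrintPreallocated(cJSON *item, char *buffer, const int length, const cJSON_bool format)`. The rules are assumptions about sensible constraints for those signatures and have not been validated against Hopper; the file name `cjson_rules.txt` is arbitrary.

```shell
# Write illustrative constraints for two cJSON functions.
cat > cjson_rules.txt <<'EOF'
// The input buffer must not be null
func cJSON_ParseWithLength[$0] = $non_null
// The second argument is the length of the input buffer
func cJSON_ParseWithLength[$1] = $len($0)
// The third argument is the length of the output buffer
func cJSON_PrintPreallocated[$2] = $len($1)
EOF

# Then pass the file to the fuzzer:
#   hopper fuzz output --custom-rules cjson_rules.txt
```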
If there is a `seeds` directory (set by `HOPPER_SEED_DIR`), Hopper will try to read the files inside it and use them as seeds for byte arguments (e.g. `char*`). You can also provide seeds for a specific argument via its parameter name, e.g. create a subdirectory named `@buf` for a parameter whose name is `buf`.
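A minimal sketch of such a layout, using the default location `output/seeds` unless `HOPPER_SEED_DIR` is set. The file names and their JSON contents are hypothetical examples.

```shell
# Seed location: HOPPER_SEED_DIR if set, otherwise the documented default.
seed_dir="${HOPPER_SEED_DIR:-output/seeds}"

# General seeds, available to any byte-like argument.
mkdir -p "$seed_dir"
printf '%s' '{"name": "hopper", "id": 1}' > "$seed_dir/simple.json"

# Seeds used only for parameters named `buf`.
mkdir -p "$seed_dir/@buf"
printf '%s' '[1, 2, 3]' > "$seed_dir/@buf/array.json"
```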
Hopper uses Rust's log crate to print log information. The default log level is `INFO`. To print all logging information (`DEBUG` and `TRACE`), set the environment variable `LOG_TYPE` when running Hopper, e.g. `LOG_TYPE=trace ./hopper`.
The detailed logs are written to `output/fuzzer_r*.log` and `output/harness_r*.log`.
Hopper can reproduce the execution of programs stored in the output directory.
`hopper-harness` parses and explains the inputs using Hopper's runtime. It prints the internal states during execution in detail.
./bin/hopper-harness ./queue/id_000000
`hopper-translate` translates an input to C source code. The C files can serve as a witness when reporting issues.
./bin/hopper-translate --input ./queue/id_000000 --header path-to/xx.h --output test.c
# then compile it with specific library
gcc -I/path-to-head test.c -L/path-to-lib -l:libcjson.so -o test
`hopper-generator` replays the generation of an input without executing it. You can use it to analyse how the input was generated or mutated.
./bin/hopper-generator ./queue/id_000000
`hopper-sanitizer` minimizes and verifies the crashes generated by Hopper. It excludes crashes that violate constraints and de-duplicates crashes according to their call stacks.
./bin/hopper-sanitizer
- Run all testcases
RUST_BACKTRACE=1 cargo test -- --nocapture
- Compile the libraries' source code with LLVM source-based code coverage instrumentation. You should set the compiler flags, e.g.
export CFLAGS="${CFLAGS:-} -fprofile-instr-generate -fcoverage-mapping -gline-tables-only -g"
make
- Compile the libraries with the `cov` instrumentation mode, e.g.
hopper compile --instrument cov --header ./cJSON.h --library ./libcjson_cov.so --output output_cov
- Run the interpreter with all generated seed inputs (SEED_DIR).
# run hopper and use llvm-cov to compute the coverage.
SEED_DIR=./output/queue hopper cov output_cov
We have listed some tasks in Roadmap. If you are interested, please feel free to discuss with us and contribute your code.
- Zero `cargo check` warnings
- Zero `cargo clippy` warnings
- Zero `FAILED` in `cargo test`
- Try to write tests for your code
perf record --call-graph=dwarf ./bin/hopper-fuzzer
# use flamegraph directly
perf script | stackcollapse-perf.pl | rust-unmangle | flamegraph.pl > flame.svg
# use inferno
perf script | inferno-collapse-perf | inferno-flamegraph > flamegraph.svg
perf produces a large amount of intermediate data for analysis, so do not run the fuzzer for more than 2 minutes.