experimental: add test-to-harness conversion logic #495

Merged: DavidKorczynski merged 4 commits into main from enable-test-writer on Jul 18, 2024

DavidKorczynski (Collaborator) commented Jul 17, 2024

Adds a fuzz harness heuristic that converts existing tests into harnesses. At this stage it does not rely on FI (Fuzz Introspector). We simply (1) find test files in the target project; (2) read them; (3) for each test file, use a simple prompt to convert it into a harness.
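
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `TestCase` container, the path-based "contains `test`" heuristic, the extension list, and the prompt wording are all assumptions made for the example.

```python
import os
from dataclasses import dataclass
from typing import List

# Assumed heuristics; the PR description does not state how tests are detected.
TEST_HINT = 'test'
SOURCE_EXTS = ('.c', '.cc', '.cpp', '.cxx')


@dataclass
class TestCase:
  """Hypothetical container for one test file."""
  path: str
  test_content: str


def find_test_files(project_dir: str) -> List[str]:
  """Step 1: collect C/C++ sources whose relative path mentions 'test'."""
  found = []
  for root, _dirs, files in os.walk(project_dir):
    for name in files:
      path = os.path.join(root, name)
      rel = os.path.relpath(path, project_dir).lower()
      if name.lower().endswith(SOURCE_EXTS) and TEST_HINT in rel:
        found.append(path)
  return sorted(found)


def read_test_cases(paths: List[str]) -> List[TestCase]:
  """Step 2: read each test file into memory."""
  cases = []
  for path in paths:
    with open(path, 'r', encoding='utf-8', errors='replace') as f:
      cases.append(TestCase(path=path, test_content=f.read()))
  return cases


def build_prompt(test_case: TestCase) -> str:
  """Step 3: wrap one test file in a conversion prompt (illustrative wording)."""
  return ('Convert the following test into a libFuzzer harness that defines '
          'LLVMFuzzerTestOneInput and passes the fuzz input to the code '
          'under test.\n'
          f'<test path="{test_case.path}">\n'
          f'{test_case.test_content}\n'
          '</test>\n')
```

Each prompt would then be sent to the model once per test file, so a project with several test files yields several candidate harnesses.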

It already outperforms on some existing projects, e.g. https://github.com/jkuhlmann/cgltf/blob/master/test/main.c

In this case, the generated harness looks quite nice:

// Heuristic: TestConverterPrompt :: Target: 
#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CGLTF_IMPLEMENTATION
#include "cgltf.h"

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
    if (size < 1) {
        return 0;
    }

    cgltf_options options;
    memset(&options, 0, sizeof(cgltf_options));
    cgltf_data* parsed_data = NULL;
    cgltf_result result;

    // Parse input data
    result = cgltf_parse(&options, data, size, &parsed_data);

    if (result == cgltf_result_success) {
        result = cgltf_validate(parsed_data);
    }

    if (result == cgltf_result_success) {
        // Use the parsed data in some way
        // For example, print file type and mesh count
        printf("Type: %u\n", parsed_data->file_type);
        printf("Meshes: %u\n", (unsigned)parsed_data->meshes_count);
    }

    cgltf_free(parsed_data);

    return 0;
}

Ref: #494

DavidKorczynski (Collaborator, Author) commented:

/gcbrun skip

Signed-off-by: David Korczynski <[email protected]>
DavidKorczynski (Collaborator, Author) commented:

/gcbrun skip

DonggeLiu (Collaborator) left a comment:

Thanks! Some nits:

name = 'TestConverterPrompt'

def __init__(self, introspector_report: Dict[str, Any],
             all_header_files: List[str], test_dir: str):
DonggeLiu (Collaborator) commented:

nit: use lowercase dict[...] and list[...]. Same below.

DavidKorczynski (Collaborator, Author) replied:

Can't do, see here: #454 (comment)

# was found empirically to be valuable.
macros_defined_in_test = []
for line in test_case.test_content.split('\n'):
  if '#define' in line and len(line.split(' ')) == 2:
DonggeLiu (Collaborator) commented:

Nit: Maybe exclude commented-out lines later?

// #define a b
/*
#define a b
*/

DavidKorczynski (Collaborator, Author) replied:

I'll leave this for now: it works well, but ultimately I want stronger logic for macros (e.g. IR/AST-based analysis). I'll monitor whether it shows up as a limitation.
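
For reference, one possible way to apply the reviewer's suggestion without IR/AST support is to strip C comments before scanning for value-less macros such as `#define CGLTF_IMPLEMENTATION`. This is only a sketch under assumptions: the function name `value_less_macros` is hypothetical, it returns macro names rather than whole lines, it tokenizes on any whitespace instead of a single space, and the regex does not handle comment markers inside string literals.

```python
import re
from typing import List

# Matches /* ... */ block comments (possibly multi-line) and // line comments.
# Simplification: comment markers inside string literals are not special-cased.
_COMMENT_RE = re.compile(r'/\*.*?\*/|//[^\n]*', re.DOTALL)


def value_less_macros(test_content: str) -> List[str]:
  """Returns names of macros defined without a value, ignoring comments."""
  stripped = _COMMENT_RE.sub('', test_content)
  macros = []
  for line in stripped.split('\n'):
    tokens = line.split()
    # Exactly two tokens: '#define' followed by the macro name.
    if len(tokens) == 2 and tokens[0] == '#define':
      macros.append(tokens[1])
  return macros
```

With this variant, a `#define` that sits alone on a line inside a `/* ... */` block is no longer picked up, while macros that carry a value are still skipped as in the two-token check above.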

DavidKorczynski (Collaborator, Author) commented:

/gcbrun skip

Signed-off-by: David Korczynski <[email protected]>
DavidKorczynski merged commit 4309def into main on Jul 18, 2024 (6 checks passed)
DavidKorczynski deleted the enable-test-writer branch on July 18, 2024 at 10:34
arthurscchan pushed a commit to arthurscchan/oss-fuzz-gen that referenced this pull request Jul 24, 2024