Detect fuzzing issues by considering past results #2054

phi-go opened this issue Feb 4, 2025 · 4 comments
@phi-go

phi-go commented Feb 4, 2025

Hello, as part of some research we analyzed fuzzer performance degradation by looking at the reasons why fuzzing coverage decreases for C/C++ projects in OSS-Fuzz. We found that there are several types of issues that are easier to detect by comparing against past reports.

I would be happy to implement these metrics if you are interested.

  • Detecting coverage drops would be a generic way to detect degradation; this is already discussed in: idea: treat a major coverage drop an issue! google/oss-fuzz#11398. A threshold would need to be decided here, maybe a percentage or an absolute number of lines (a rough sketch of such checks follows this list).
  • A common reason for large coverage drops is the vendoring of third-party library code, though sometimes it is project-specific code as well. If you agree that library code should not be included in the coverage measurement, large changes should cause an alert and be ignored. See grpc-httpjson-transcoding as an example: by itself it is a few hundred lines of code with close to 100% coverage, but it vendored 100k lines of library code.
  • Compare the fuzz targets over time. It sometimes happens that a project starts to have a partial build failure that only stops one (or a few) fuzz targets from building, while not necessarily causing a build failure issue to be created for the project. For example, this happened with curl: idea: treat a major coverage drop an issue! google/oss-fuzz#11398 (comment)
  • The number of corpus entries is normally quite stable, but due to the way coverage is collected it can fluctuate and drop to a fraction of the real size: Reported coverage results do not match corpus google/oss-fuzz#12986 and Understanding inconsistent coverage reports google/oss-fuzz#11935. So this could be detected by looking at past corpus sizes. Though, if I understand correctly, the seed corpus is combined across fuzz targets? Alternatively, an expected number of corpus entries per covered code branch/line could be decided. For example, covering 10k lines with five corpus entries does not seem like effective fuzzing.
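
A minimal sketch of what such checks could look like, assuming past and current reports are available as plain per-target dictionaries of covered lines and corpus sizes; the report shape and the thresholds below are placeholders, not the actual OSS-Fuzz/Fuzz Introspector data model:

```python
# Hypothetical report shape: {"targets": {name: {"covered_lines": int, "corpus_size": int}}}
COVERAGE_DROP_RATIO = 0.20   # flag drops of more than 20% covered lines (placeholder threshold)
CORPUS_SHRINK_RATIO = 0.50   # flag corpora that shrink to less than half of the past size


def detect_degradation(past: dict, current: dict) -> list[str]:
    """Compare a current report against a past one and return human-readable warnings."""
    warnings = []
    past_targets = past["targets"]
    cur_targets = current["targets"]

    # Fuzz targets that existed before but no longer show up (e.g. a partial build failure).
    for missing in sorted(set(past_targets) - set(cur_targets)):
        warnings.append(f"fuzz target disappeared: {missing}")

    for name, cur in cur_targets.items():
        old = past_targets.get(name)
        if old is None:
            continue  # newly added target, nothing to compare against

        # Major coverage drop for this target.
        if old["covered_lines"] and cur["covered_lines"] < old["covered_lines"] * (1 - COVERAGE_DROP_RATIO):
            warnings.append(
                f"{name}: covered lines dropped from {old['covered_lines']} to {cur['covered_lines']}"
            )

        # Corpus suddenly much smaller than it used to be.
        if old["corpus_size"] and cur["corpus_size"] < old["corpus_size"] * CORPUS_SHRINK_RATIO:
            warnings.append(
                f"{name}: corpus shrank from {old['corpus_size']} to {cur['corpus_size']} entries"
            )

    return warnings
```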

This is also related to diffing runs: #734

I can also provide more examples if you want; I just wanted to keep it short.

@DavidKorczynski

I like these ideas a lot and would be more than happy to review PRs.

Regarding third-party code, my personal position is that any third-party code in your target is, from a security standpoint, the same as your own code, as long as it's reachable/triggerable from untrusted input. So I think it's a bit more nuanced than just excluding third-party code.

In general I like the direction of these ideas and would be happy to land them. I think these would require most changes to be done in the webapp rather than core, but I am happy in either case to review and get PRs landed.

@phi-go

phi-go commented Feb 11, 2025

Happy to hear you are interested. It will take a bit before I have some real results as I'm still getting familiar with the code.

Regarding third-party code, my personal position is that any third-party code in your target is, from a security standpoint, the same as your own code, as long as it's reachable/triggerable from untrusted input. So I think it's a bit more nuanced than just excluding third-party code.

I understand your point to be that third-party code included in the project can have the same impact on security as project code. I definitely agree; however, what I am not quite sure about is who is responsible for testing/fuzzing the third-party code. So maybe we can discuss this a bit.

Thinking about this some more, we could differentiate between:

  • (1.) Code that is actually vendored, i.e. copied into the repo
  • Code that is included only as a dependency, which can again be split in two:
    • (2.) a dependency that is already fuzzed separately
    • (3.) a dependency that is not fuzzed separately

I would only exclude code coverage for category 2. I guess the alternative would be to duplicate the fuzzer harnesses for this dependency, which seems wasteful to me. There is, however, the argument that the project might use the library code in a specific way that is not already tested for.

For me the big reason to exclude code coverage of these dependencies is to make the coverage metric more meaningful. Coming back to the grpc-httpjson-transcoding example, I actually made a mistake: the code is not vendored but should be of category 2. So if the "real" coverage of this project drops we would not really know; a current introspector report also seems to suggest that there is hardly any fuzzing going on. Is this just because the runtime coverage is higher than the statically reachable code?

@DavidKorczynski

however, what I am not quite sure about is who is responsible for testing/fuzzing the third-party code. So maybe we can discuss this a bit.

This is more of a policy question and is not a technical security question as such. Security issues are security issues independently of who wrote/maintains the code. To this end, it depends on what your goal is with your efforts IMO.

I think a great feature would be the ability to show coverage of various categories. For example, currently we have one code coverage number (actually a bit more, since we have all harnesses combined and code coverage of each harness individually). However, we could easily imagine reporting multiple coverage numbers, e.g. for all the types of dependencies you mention above. I think that would be very useful -- in short, I would not exclude something, but try and give more refined data about the coverage, e.g. multiple different code coverage statistics.
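
To illustrate the direction (not the actual report pipeline), here is a minimal sketch of computing per-category coverage numbers, assuming per-file coverage is available as (path, covered_lines, total_lines) tuples and that project code and dependency code can be told apart by path prefixes; the prefixes below are made-up placeholders:

```python
from collections import defaultdict

# Placeholder path prefixes; a real implementation would derive these from the
# build setup (e.g. bazel external repos, vendored directories) per project.
CATEGORY_PREFIXES = {
    "project": ["/src/grpc-httpjson-transcoding/src/"],
    "dependencies": ["/external/", "/bazel-out/"],
}


def coverage_by_category(files):
    """files: iterable of (path, covered_lines, total_lines) tuples from a coverage report."""
    totals = defaultdict(lambda: [0, 0])  # category -> [covered, total]
    for path, covered, total in files:
        category = "other"
        for name, prefixes in CATEGORY_PREFIXES.items():
            if any(prefix in path for prefix in prefixes):
                category = name
                break
        totals[category][0] += covered
        totals[category][1] += total
    # Return one percentage per category instead of a single combined number.
    return {
        name: (100.0 * covered / total if total else 0.0)
        for name, (covered, total) in totals.items()
    }
```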

Going with the grpc-httpjson-transcoding example, it would be great to get some form of "the code you maintain has (x) coverage, the dependencies have (y) code coverage" etc. It would further be cool if we could draw connections amongst projects, e.g. "libpng is being fuzzed across 50 different OSS-Fuzz projects, project x has x1% coverage of libpng, project y has y1% coverage of libpng" etc.

Note that the grpc-httpjson-transcoding example could almost be classified as a bazel build system problem rather than a "grpc-httpjson-transcoding problem". For example, Envoy, which is also compiled with bazel, has almost a million lines of code in its dependencies, which are all included in the coverage report: https://storage.googleapis.com/oss-fuzz-coverage/envoy/reports/20240919/linux/proc/self/cwd/report.html

Note that maintainers can at will exclude code coverage from reports if they so desire, e.g. for the grpc-httpjson-transcoding project. Maintainers can exclude instrumentation of parts of the code (including in coverage builds only) and can also exclude paths, e.g. https://github.com/google/oss-fuzz/blob/2772bd6f5d11a44d068b86b8ce7732ab9aff6eb2/projects/libsndfile/project.yaml#L11 So already at this stage maintainers have the flexibility to adjust code coverage reports based on what they would like to see code coverage of. We could naturally look at automation here nonetheless.

@phi-go

phi-go commented Feb 13, 2025

This is more of a policy question and is not a technical security question as such. Security issues are security issues independently of who wrote/maintains the code. To this end, it depends on what your goal is with your efforts IMO.

My goal in general is to make it more visible to maintainers when something might be wrong with the fuzzing setup, to avoid situations where the setup is ineffective but there is no feedback. That is why I wanted to make coverage more meaningful, as in focusing on code coverage the maintainers can actually improve. But I think your suggestion of splitting coverage solves this in a better way.

I think a great feature would be the ability to show coverage of various categories.

Agreed, this is a better approach than my suggestion, though I expect it will complicate things in some places. Some things that should be decided:

  • Should we warn of coverage drops for each dependency?
  • How should this be handled for the bounty program?

Going with the grpc-httpjson-transcoding example, it would be great to get some form of "the code you maintain has (x) coverage, the dependencies have (y) code coverage" etc. It would further be cool if we could draw connections amongst projects, e.g. "libpng is being fuzzed across 50 different OSS-Fuzz projects, project x has x1% coverage of libpng, project y has y1% coverage of libpng" etc.

Yeah, this would be nice. Do you already have a plan for how to identify that a dependency is the same across projects? If not, I would probably start by looking at common ways code is structured, for example how bazel lays out external code.
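
Purely as a sketch of that direction, and assuming dependency sources show up under bazel's external/<repo_name>/ layout in the coverage file paths (an assumption about the report contents, not something guaranteed), the dependency names could be recovered roughly like this:

```python
import re

# bazel checks out external repositories under an "external/<repo_name>/" directory,
# so a dependency name can often be recovered from a coverage file path.
# The regex, and the idea that such paths appear in the report, are assumptions.
EXTERNAL_RE = re.compile(r"/external/([^/]+)/")


def dependencies_in_report(file_paths):
    """Return the set of bazel external repository names seen in a coverage report."""
    deps = set()
    for path in file_paths:
        match = EXTERNAL_RE.search(path)
        if match:
            deps.add(match.group(1))
    return deps


def projects_per_dependency(reports):
    """reports: dict of project name -> iterable of coverage file paths.

    Returns a mapping of dependency name -> set of projects that include it,
    which is the cross-project view ("libpng is fuzzed across N projects").
    """
    seen = {}
    for project, paths in reports.items():
        for dep in dependencies_in_report(paths):
            seen.setdefault(dep, set()).add(project)
    return seen
```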

Also, just to mention it: the dependencies might have different versions or might have been patched for the project. So I think we should not aggregate over these reports (you did not suggest this, I just wanted to mention it).

Note that maintainers can at will exclude code coverage from reports if they so desire.

Yeah, that is one reason why I suggested this originally.

We could naturally look at automation here nonetheless.

Automating this seems a tad dangerous to me. When would you want to automate excluding code coverage?
