
Introduce a source generator and end-to-end compile benchmarks (carbon-language#4124)

The big addition here is a very, very rough and very early skeleton of a
source code generator framework. This builds upon the lexer's identifier
synthesis logic, improving on its framework and wiring it up with the
most rudimentary of source file generation. This is just enough to
roughly replicate my "big API file" source code benchmarks.

The source generation works *very* hard to both vary the structure and
content of the source as much as possible while ensuring the same
*total* amount of each construct is in use, from bytes in identifiers to
line breaks, parameters, etc. This lets us generate randomly structured
inputs that should consistently take the exact same amount of total work
to compile.
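The balancing idea above can be illustrated with a small sketch. It assumes a hypothetical `DistributeTotal` helper, which is not the actual API of `source_gen.h` (this commit excerpt doesn't show it). A fixed budget, such as total identifier bytes or total parameters, is split across items with randomized individual sizes that always sum to exactly the same total:

```cpp
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: splits `total` into `count` values, each within
// [min_value, max_value], so every generated file uses exactly the same total
// amount of a construct while the per-item sizes are randomized.
std::vector<int> DistributeTotal(int total, int count, int min_value,
                                 int max_value, std::mt19937& rng) {
  assert(count > 0);
  assert(total >= count * min_value && total <= count * max_value);
  // Start every item at the minimum, then hand out the remainder one unit at a
  // time to randomly chosen items that still have room below the maximum.
  std::vector<int> values(count, min_value);
  int remaining = total - count * min_value;
  std::uniform_int_distribution<int> pick(0, count - 1);
  while (remaining > 0) {
    int& value = values[pick(rng)];
    if (value < max_value) {
      ++value;
      --remaining;
    }
  }
  return values;
}
```

Handing out units one at a time keeps every value within bounds by construction. The result is random, but not uniform over all possible splits, which is usually acceptable for benchmark inputs.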

The complex identifier synthesis logic from the lexer's benchmark is
moved over here, and the lexer now uses the source generator's APIs for
identifiers. The other source synthesis in the lexer's benchmark hasn't
moved over yet, but should likely be absorbed here gradually as it can be
refactored into a more principled and reusable form. Some bits may stay,
of course, if they're just too lexer-specific.
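The identifier synthesis itself isn't shown in this excerpt. The `source_gen_lib` target depends on `//toolchain/lex:token_kind` and `//common:set`, which suggests that it must avoid keywords and track uniqueness. The following is a minimal sketch under that assumption. It uses a hypothetical `SynthesizeIdentifiers` function and a small, illustrative keyword subset rather than the lexer's real keyword list:

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical sketch: returns one unique identifier per entry in `lengths`.
// Each identifier starts with a letter, continues with letters, digits, or
// underscores, and is redrawn if it collides with a keyword or an earlier
// identifier. Assumes enough distinct identifiers exist for each length.
std::vector<std::string> SynthesizeIdentifiers(const std::vector<int>& lengths,
                                               std::mt19937& rng) {
  // Illustrative keyword subset only; a real generator would consult the
  // lexer's full list of keyword tokens.
  static const std::unordered_set<std::string> kKeywords = {
      "fn", "var", "let", "if", "else", "class", "return", "while"};
  static const std::string kStart =
      "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
  static const std::string kRest = kStart + "0123456789_";
  std::uniform_int_distribution<std::size_t> start_dist(0, kStart.size() - 1);
  std::uniform_int_distribution<std::size_t> rest_dist(0, kRest.size() - 1);

  std::unordered_set<std::string> seen;
  std::vector<std::string> result;
  result.reserve(lengths.size());
  for (int length : lengths) {
    assert(length > 0);
    std::string id;
    do {
      id.assign(1, kStart[start_dist(rng)]);
      while (static_cast<int>(id.size()) < length) {
        id += kRest[rest_dist(rng)];
      }
    } while (kKeywords.count(id) > 0 || seen.count(id) > 0);
    seen.insert(id);
    result.push_back(id);
  }
  return result;
}
```

If the `lengths` vector's total is held fixed, the number of identifier bytes stays constant across runs while the identifiers themselves vary.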

Next, this adds a simple end-to-end compile benchmark for the driver
that directly and much more clearly reproduces all the measurements I've
done manually up until now. It should also be easy to extend to more
patterns over time as we add support to the source generator to produce
those patterns.
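The sample output below appears to be Google Benchmark output, with benchmarks templated on a compile phase. The following standard-library-only sketch shows the shape of that measurement. The `Phase` values match the benchmark names, but `RunThroughPhase` and `MeasureLinesPerSecond` are hypothetical, and the "work" is a stand-in hash rather than the real driver. Each instantiation times repeated runs over the same source and reports throughput in lines per second:

```cpp
#include <chrono>
#include <cstdint>
#include <string>

// Phases named after the benchmark instantiations. This sketch assumes each
// benchmark runs the pipeline up to and including its phase.
enum class Phase { Lex, Parse, Check };

// Stand-in for real compiler work: later phases do more passes over the
// source. The returned hash keeps the work from being optimized away.
template <Phase P>
std::uint64_t RunThroughPhase(const std::string& source) {
  std::uint64_t hash = 0;
  int passes = static_cast<int>(P) + 1;
  for (int i = 0; i < passes; ++i) {
    for (char c : source) hash = hash * 31 + static_cast<unsigned char>(c);
  }
  return hash;
}

// Times `iterations` runs over the same input and returns source lines
// processed per second, analogous to the "Lines" rate column.
template <Phase P>
double MeasureLinesPerSecond(const std::string& source, int lines,
                             int iterations) {
  volatile std::uint64_t sink = 0;
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i) {
    sink = sink + RunThroughPhase<P>(source);
  }
  std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return static_cast<double>(lines) * iterations / elapsed.count();
}
```

Holding the input fixed and varying only the phase makes the per-phase rates directly comparable. In the real harness, Google Benchmark picks the iteration count and formats the rate column itself.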

Last but not least, I've added a tiny CLI to the source generator so
that you can generate source code manually. This is especially nice for
generating demo source code to actually run through the driver or look
at in an editor. The CLI can also generate C++ source code which lets us
do some minimal comparative benchmarking between Carbon and C++/Clang.
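The CLI's flags and exact output aren't shown in this excerpt. The comparison it enables depends on emitting structurally identical code in both languages. Here is a minimal sketch using hypothetical `FunctionDecl`, `EmitCarbon`, and `EmitCpp` names, with 32-bit integer parameters throughout:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical declaration model: a name plus parameter names, all typed as
// 32-bit integers, with a 32-bit integer return type.
struct FunctionDecl {
  std::string name;
  std::vector<std::string> params;
};

// Emits a Carbon forward declaration, e.g. `fn Add(a: i32, b: i32) -> i32;`.
std::string EmitCarbon(const FunctionDecl& decl) {
  std::string out = "fn " + decl.name + "(";
  for (std::size_t i = 0; i < decl.params.size(); ++i) {
    if (i > 0) out += ", ";
    out += decl.params[i] + ": i32";
  }
  return out + ") -> i32;\n";
}

// Emits the equivalent C++ declaration in trailing-return style. This assumes
// `int` is 32 bits, as on mainstream platforms.
std::string EmitCpp(const FunctionDecl& decl) {
  std::string out = "auto " + decl.name + "(";
  for (std::size_t i = 0; i < decl.params.size(); ++i) {
    if (i > 0) out += ", ";
    out += "int " + decl.params[i];
  }
  return out + ") -> int;\n";
}
```

Driving both emitters from the same randomly generated declarations gives Carbon and Clang the same amount of work.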

There are a huge number of TODOs in the source generation framework. This
is going to be a large ongoing effort, I suspect.

There are also a bunch of rough edges I've left to try and get this out
for review sooner. I've left TODOs for refactorings that really need to
be done here, but I'm hoping these can be follow-ups. If not, please
flag them and I'll try to layer them on here.

Sample compile benchmark output, nicely showing where we are w.r.t. our
goal speeds (2x behind on lex and check, 5x on parse) at least on a
recent AMD server CPU:
```
------------------------------------------------------------------------------------------------------
Benchmark                                                 Time             CPU   Iterations      Lines
------------------------------------------------------------------------------------------------------
BM_CompileAPIFileDenseDecls<Phase::Lex>/256           29420 ns        29419 ns        22860 6.62847M/s
BM_CompileAPIFileDenseDecls<Phase::Lex>/1024         146130 ns       146128 ns         4840 6.69959M/s
BM_CompileAPIFileDenseDecls<Phase::Lex>/4096         601584 ns       601577 ns         1020 6.69573M/s
BM_CompileAPIFileDenseDecls<Phase::Lex>/16384       2547578 ns      2547313 ns          280   6.404M/s
BM_CompileAPIFileDenseDecls<Phase::Lex>/65536      10816591 ns     10816389 ns           80 6.05193M/s
BM_CompileAPIFileDenseDecls<Phase::Lex>/262144     52191320 ns     52189828 ns           20 5.02261M/s
BM_CompileAPIFileDenseDecls<Phase::Parse>/256        101706 ns       101698 ns         6900 1.91745M/s
BM_CompileAPIFileDenseDecls<Phase::Parse>/1024       512161 ns       512162 ns         1380  1.9115M/s
BM_CompileAPIFileDenseDecls<Phase::Parse>/4096      2078426 ns      2078430 ns          340   1.938M/s
BM_CompileAPIFileDenseDecls<Phase::Parse>/16384     8795786 ns      8795583 ns          100 1.85468M/s
BM_CompileAPIFileDenseDecls<Phase::Parse>/65536    35073596 ns     35072973 ns           20 1.86639M/s
BM_CompileAPIFileDenseDecls<Phase::Parse>/262144  151100688 ns    151097370 ns           20 1.73483M/s
BM_CompileAPIFileDenseDecls<Phase::Check>/256        957059 ns       957049 ns          740 203.751k/s
BM_CompileAPIFileDenseDecls<Phase::Check>/1024      1956134 ns      1955985 ns          360 500.515k/s
BM_CompileAPIFileDenseDecls<Phase::Check>/4096      5797864 ns      5797417 ns          120 694.792k/s
BM_CompileAPIFileDenseDecls<Phase::Check>/16384    21219608 ns     21217584 ns           40 768.843k/s
BM_CompileAPIFileDenseDecls<Phase::Check>/65536    96311116 ns     96302334 ns           20 679.734k/s
BM_CompileAPIFileDenseDecls<Phase::Check>/262144  371637963 ns    371609964 ns           20 705.387k/s
```

Lest someone think this is *bad*, the fact that we're already within 2x
of our rather audacious goals makes me quite happy. =D
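The sample output can be sanity-checked with some quick arithmetic. This assumes the `Lines` column is a Google Benchmark rate counter, which divides by CPU time by default. Multiplying each rate by its CPU time then recovers the number of lines in the generated file, and for the `/256` rows all three phases agree:

$$
\begin{aligned}
\text{lines per file} &= \text{rate} \times t_{\text{CPU}} \\
\text{Lex: } & 6.62847\times10^{6}\,\tfrac{\text{lines}}{\text{s}} \times 29419\times10^{-9}\,\text{s} \approx 195.0 \\
\text{Parse: } & 1.91745\times10^{6}\,\tfrac{\text{lines}}{\text{s}} \times 101698\times10^{-9}\,\text{s} \approx 195.0 \\
\text{Check: } & 203.751\times10^{3}\,\tfrac{\text{lines}}{\text{s}} \times 957049\times10^{-9}\,\text{s} \approx 195.0
\end{aligned}
$$

So each phase at `/256` is measured over the same roughly 195-line input, and the `/256` argument is not itself a line count.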

---------

Co-authored-by: Jon Ross-Perkins
Co-authored-by: Richard Smith
3 people authored Aug 13, 2024
1 parent e62973a commit a9c815c
Showing 13 changed files with 1,562 additions and 181 deletions.
common/BUILD: 3 additions, 0 deletions
```diff
@@ -38,7 +38,10 @@ cc_library(
 cc_library(
     name = "benchmark_main",
     srcs = ["benchmark_main.cpp"],
+    hdrs = ["benchmark_main.h"],
     deps = [
+        ":check",
+        ":exe_path",
         ":init_llvm",
         "@abseil-cpp//absl/flags:parse",
         "@google_benchmark//:benchmark",
```
common/benchmark_main.cpp: 24 additions, 0 deletions

@@ -2,17 +2,41 @@
```cpp
// Exceptions. See /LICENSE for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

#include "common/benchmark_main.h"

#include <benchmark/benchmark.h>

#include <string>

#include "absl/flags/parse.h"
#include "common/check.h"
#include "common/exe_path.h"
#include "common/init_llvm.h"
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/StringRef.h"

static bool after_main = false;
static llvm::StringRef exe_path;

namespace Carbon::Testing {

auto GetBenchmarkExePath() -> llvm::StringRef {
  CARBON_CHECK(after_main)
      << "Must not query the executable path until after `main` is entered!";
  return exe_path;
}

}  // namespace Carbon::Testing

// TODO: Refactor this to share code with `gtest_main.cpp`.
auto main(int orig_argc, char** orig_argv) -> int {
  // Do LLVM's initialization first, this will also transform UTF-16 to UTF-8.
  Carbon::InitLLVM init_llvm(orig_argc, orig_argv);

  std::string exe_path_storage = Carbon::FindExecutablePath(orig_argv[0]);
  exe_path = exe_path_storage;
  after_main = true;

  // Inject a flag to override the defaults for benchmarks. This can still be
  // disabled by user arguments.
  llvm::SmallVector<char*> injected_argv_storage(orig_argv,
```
common/benchmark_main.h (new file): 22 additions, 0 deletions
```cpp
// Part of the Carbon Language project, under the Apache License v2.0 with LLVM
// Exceptions. See /LICENSE for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

#ifndef CARBON_COMMON_BENCHMARK_MAIN_H_
#define CARBON_COMMON_BENCHMARK_MAIN_H_

#include "llvm/ADT/StringRef.h"

// When using the Carbon `main` function for benchmarks, we export some extra
// information about the test binary that can be accessed with this header.
//
// TODO: Refactor this to share code with `gtest_main.h`.

namespace Carbon::Testing {

// Returns the executable path of the benchmark binary.
auto GetBenchmarkExePath() -> llvm::StringRef;

}  // namespace Carbon::Testing

#endif  // CARBON_COMMON_BENCHMARK_MAIN_H_
```
testing/base/BUILD: 44 additions, 1 deletion
```diff
@@ -5,7 +5,7 @@
 # Trivial, single-file testing libraries. More complex libraries should get
 # their own directory.
 
-load("@rules_cc//cc:defs.bzl", "cc_library", "cc_test")
+load("@rules_cc//cc:defs.bzl", "cc_binary", "cc_library", "cc_test")
 
 package(default_visibility = ["//visibility:public"])
 
@@ -40,6 +40,49 @@ cc_test(
     ],
 )
 
+cc_library(
+    name = "source_gen_lib",
+    testonly = 1,
+    srcs = ["source_gen.cpp"],
+    hdrs = ["source_gen.h"],
+    deps = [
+        "//common:check",
+        "//common:map",
+        "//common:set",
+        "//toolchain/lex:token_kind",
+        "@abseil-cpp//absl/random",
+        "@llvm-project//llvm:Support",
+    ],
+)
+
+cc_test(
+    name = "source_gen_test",
+    size = "small",
+    srcs = ["source_gen_test.cpp"],
+    deps = [
+        ":gtest_main",
+        ":source_gen_lib",
+        "//common:set",
+        "//toolchain/driver",
+        "@googletest//:gtest",
+        "@llvm-project//llvm:Support",
+    ],
+)
+
+cc_binary(
+    name = "source_gen",
+    testonly = 1,
+    srcs = ["source_gen_main.cpp"],
+    deps = [
+        ":source_gen_lib",
+        "//common:bazel_working_dir",
+        "//common:command_line",
+        "//common:init_llvm",
+        "//common:ostream",
+        "@llvm-project//llvm:Support",
+    ],
+)
+
 cc_library(
     name = "test_raw_ostream",
     testonly = 1,
```
