One Generation Step For All Files #8

Open
wants to merge 11 commits into
base: dev

LittleCoinCoin

[Warning] This pull request is based on the modifications under consideration in #7. Hence, commits f52d6ce, 11982dc, 4b9506c, and 996fe0f do not include modifications directly related to this pull request.

Motivation

Kodgen is currently well suited to generating (at least) one new file for each parsed file, but it is impractical for generating (at least) one new file for all parsed files. The need for such a generation feature might arise when one wishes to automatically generate a class that collects logic related to many other classes defined in different files.
The use case that motivated this pull request is to be able to have a class containing vectors of objects to store and manipulate them.

Goal

Modify (improve?) Kodgen's API to be able to switch between the current (unique) generation scheme that is One-Generate-For-Each-File and a new generation strategy that would be One-Generate-For-All-Files.

[Note] One-Generate should not be understood as "Output One File", but rather as the generation step in its entirety that might output several files depending on the user's code.

Suggested solution

The solution in this pull request relies on the existing API, but slightly modifies the internal execution flow. The proposed changes are based on the observation that preGenerateCode and postGenerateCode names may suggest a different behavior from what is actually implemented. Indeed, both methods appear in the CodeGenUnit::generateCode in the following order:

generateCode(...)
{
  ...
  preGenerateCode(...)
  ...
  initialGenerateCodeInternal(...) --> // different function body
    for each CodeGenerator:
      initialGenerateCode(...)
    <-- // going back to generateCode body
  ...
  foreachCodeGenEntityPair(...)
  ...
  finalGenerateCodeInternal(...) --> // different function body
    for each CodeGenerator:
      finalGenerateCode(...)
    <-- // going back to generateCode body
  ...
  postGenerateCode(...)
}

Note that the original API documentation of preGenerateCode and postGenerateCode says that they run right before and after foreachCodeGenEntityPair, whereas this description actually matches initialGenerateCode and finalGenerateCode.

In any case, both functions run every time generateCode is called, that is, for each parsed file (see CodeGenManager::processFile). This means preGenerateCode and postGenerateCode are not executed strictly outside the generation step associated with a parsed file, and hence are not truly pre and post steps.

Using the functions in a way closer to what their names suggest makes it easy to support both One-Generate-For-Each-File and One-Generate-For-All-Files. Concretely, the solution moves preGenerateCode and postGenerateCode outside of generateCode and introduces a new enum called EGenerationStrategies to select between the legacy generation scheme, One-Generate-For-Each-File, and the new generation scheme, One-Generate-For-All-Files.
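To make the difference in call order concrete, here is a minimal sketch. `TraceUnit`, `Strategy`, and `runStrategy` are hypothetical stand-ins written for this illustration; they are not Kodgen types, and the real signatures differ. The sketch only records the order in which the hooks would fire under each scheme once preGenerateCode and postGenerateCode live outside generateCode.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a code generation unit: it only records the
// order in which its hooks are invoked. These are NOT Kodgen's signatures.
struct TraceUnit
{
    std::vector<std::string> trace;

    void preGenerateCode()  { trace.push_back("pre"); }
    void postGenerateCode() { trace.push_back("post"); }

    // Per-file generation step: initial hook, entity traversal, final hook.
    void generateCode(std::string const& file)
    {
        trace.push_back("initial:" + file);
        trace.push_back("entities:" + file);
        trace.push_back("final:" + file);
    }
};

// Hypothetical two-value enum; the PR's EGenerationStrategies has more values.
enum class Strategy { OneGenerateForEachFile, OneGenerateForAllFiles };

std::vector<std::string> runStrategy(Strategy strategy, std::vector<std::string> const& files)
{
    TraceUnit unit;

    if (strategy == Strategy::OneGenerateForEachFile)
    {
        // Legacy scheme: pre/post wrap every single file.
        for (std::string const& file : files)
        {
            unit.preGenerateCode();
            unit.generateCode(file);
            unit.postGenerateCode();
        }
    }
    else
    {
        // New scheme: pre/post run once around all files.
        unit.preGenerateCode();
        for (std::string const& file : files)
        {
            unit.generateCode(file);
        }
        unit.postGenerateCode();
    }

    return unit.trace;
}
```

With two input files, the first strategy records `pre` and `post` twice each, while the second records them once, at the very beginning and the very end of the trace.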

Major changes to Code Generation Process

Modifications to CodeGenUnit

  • BREAKING CHANGE: createCodeGenEnv, preGenerateCode, and postGenerateCode are now public instead of protected. See 54dfac8
  • BREAKING CHANGE: preGenerateCode no longer takes a FileParsingResult. See 3c1c709
    • In the case of One-Generate-For-All-Files, it is unclear which FileParsingResult preGenerateCode should receive.
    • Instead, the FileParsingResult is set in the CodeGenEnv at the top of generateCode.
    • Thus, the fix for One-Generate-For-Each-File users is simply to move the code using FileParsingResult from preGenerateCode to initialGenerateCode.
  • Corresponding declarations in the derived class MacroCodeGenUnit were updated to reflect these changes.
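The migration described above could look roughly like the following sketch. All types here (`FileParsingResult`, `CodeGenEnv`, `MyCodeGenUnit`) are simplified, hypothetical stand-ins and do not reproduce Kodgen's real declarations. In particular, the sketch collapses the unit and its code generators into one class, whereas the execution flow shown earlier calls initialGenerateCode on each CodeGenerator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins; NOT the real Kodgen types or signatures.
struct FileParsingResult
{
    std::string parsedFile;
    std::size_t entityCount = 0;
};

struct CodeGenEnv
{
    FileParsingResult const* fileParsingResult = nullptr;
};

class MyCodeGenUnit
{
public:
    // After the change: no FileParsingResult parameter, so only
    // file-independent setup belongs here.
    bool preGenerateCode(CodeGenEnv& /*env*/)
    {
        _seenFiles.clear();
        return true;
    }

    // Per-file logic that used to live in preGenerateCode moves here and
    // reads the parsing result from the environment instead.
    bool initialGenerateCode(CodeGenEnv& env)
    {
        if (env.fileParsingResult == nullptr)
        {
            return false;
        }
        _seenFiles.push_back(env.fileParsingResult->parsedFile);
        return true;
    }

    // Mimics the top of generateCode: the result is stored in the
    // environment before any per-file hook runs.
    bool generateCode(FileParsingResult const& result, CodeGenEnv& env)
    {
        env.fileParsingResult = &result;
        return initialGenerateCode(env);
    }

    std::vector<std::string> const& seenFiles() const { return _seenFiles; }

private:
    std::vector<std::string> _seenFiles;
};
```

Because the environment is populated before initialGenerateCode runs, the moved code sees the same parsing result it previously received as an argument.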

New Generation Strategies

  • Introduced EGenerationStrategies enum to define various generation strategies, including ForceReparseAll, ForceRegenerateAll, OneGenerateForEachFile, and OneGenerateForAllFiles. (Kodgen/Include/Kodgen/CodeGen/EGenerationStrategies.h). See 5ea0e22
    • ForceReparseAll and ForceRegenerateAll are notably used to control in finer detail the parsing and generation behavior when going over all detected files in CodeGenManager::identifyFilesToProcess. This was previously controlled by a single boolean value, forceRegenerateAll. See 5ea0e22
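As an illustration of how such strategies can be combined, here is a sketch of a flag-style enum. The enumerator names come from this pull request, but the underlying type, the numeric values, the `None` enumerator, and the helper functions are assumptions made for this example; the actual definition is in EGenerationStrategies.h.

```cpp
#include <cassert>
#include <cstdint>

// Enumerator names are taken from the PR; the values, the underlying type,
// and the None enumerator are assumptions for this sketch.
enum class EGenerationStrategies : std::uint8_t
{
    None                   = 0,
    ForceReparseAll        = 1 << 0,
    ForceRegenerateAll     = 1 << 1,
    OneGenerateForEachFile = 1 << 2,
    OneGenerateForAllFiles = 1 << 3
};

// Hypothetical helper: combines two sets of flags.
constexpr EGenerationStrategies operator|(EGenerationStrategies lhs, EGenerationStrategies rhs)
{
    return static_cast<EGenerationStrategies>(
        static_cast<std::uint8_t>(lhs) | static_cast<std::uint8_t>(rhs));
}

// Hypothetical helper: true if 'flag' is set in 'value'.
constexpr bool hasStrategy(EGenerationStrategies value, EGenerationStrategies flag)
{
    return (static_cast<std::uint8_t>(value) & static_cast<std::uint8_t>(flag)) != 0;
}
```

A caller could then request, for example, `EGenerationStrategies::ForceReparseAll | EGenerationStrategies::OneGenerateForAllFiles`, which a single boolean could not express.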

Modifications to CodeGenManager

See 3c52973 for all of the below:

  • processFile was renamed oneGenerateForEachParsedFile
  • Added oneGenerateForAllParsedFiles to handle the new strategy. It is based upon oneGenerateForEachParsedFile.
    • The distribution of the parsing is also multi-threaded
    • The generation process (reading the parsed code and building std::string objects before assembling everything in a file) is NOT multi-threaded anymore. Given that there is only one generation step, centralized in a unique CodeGenUnit, multi-threading would lead to data races in which the std::string in the code generation unit might be appended to at the same time by different generation tasks. A solution to these data races was not investigated.
  • Updated the run method to take an EGenerationStrategies argument instead of the original boolean forceRegenerateAll, which could only influence the identification of source files on which the generation process should be applied.
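The split between parallel parsing and sequential generation can be sketched as follows. This is not the implementation from the commit: `ParsedFile`, `parseFile`, and `oneGenerateForAllSketch` are hypothetical names, and `std::async` is used here purely for brevity in place of whatever threading mechanism Kodgen uses internally.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for the result of parsing one file.
struct ParsedFile
{
    std::string name;
    std::vector<std::string> entities;
};

// Stand-in parser: pretends each file declares exactly one entity.
ParsedFile parseFile(std::string const& name)
{
    return ParsedFile{name, {name + "::Entity"}};
}

std::string oneGenerateForAllSketch(std::vector<std::string> const& files)
{
    // Parsing is distributed: one asynchronous task per file.
    std::vector<std::future<ParsedFile>> tasks;
    tasks.reserve(files.size());
    for (std::string const& file : files)
    {
        tasks.push_back(std::async(std::launch::async, parseFile, file));
    }

    // Generation is sequential: a single thread appends to the shared
    // string, so no synchronization is needed.
    std::string generated = "// begin\n"; // plays the role of the pre step
    for (std::future<ParsedFile>& task : tasks)
    {
        ParsedFile const parsed = task.get();
        for (std::string const& entity : parsed.entities)
        {
            generated += "// data: " + entity + "\n";
        }
    }
    generated += "// end\n"; // plays the role of the post step

    return generated;
}
```

Collecting the futures in input order keeps the generated text deterministic even though the parsing tasks may finish in any order.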

Demonstration

You can check commit 84caea5

The example project in kodgen was updated to include a demonstration of the usage of the Generation Strategies. In particular, a new set of objects deriving from CodeGenUnit, CodeGenUnitSettings, CodeGenEnvironment, CodeGenModule, and PropertyCodeGen was implemented. These are respectively called DataStateCGU, DataStateCGUS, DataStateCGE, DataStateCGM, and DataPCG. They also serve as a demonstration of how it is possible to build one's own code generation unit and its dependencies in addition to the existing MacroCodeGenUnit & Co.

DataStateCGU & Co are used to generate a class DataState (which can be renamed in the settings) that stores a variable number of data. In fact, every class (or struct) marked with kodgen's corresponding macro and that is further marked with the property "Data" will be added to the data state.

"Data" was added to the already existing example classes SomeClass and SomeOtherClass. As a result, the generated data state contains two std::vectors to store instances of these classes (that can be added with a generated DataState::EmplaceBackData), and that can, of course, be accessed with a getter.

However, the content of the data state is not as simple as it could be, simply to exemplify how complex we can get with kodgen. In particular, the getter is a function pointer in an array one can access thanks to an enum. In addition, the getter and emplace back functions have constexpr if statements to help the compiler.
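For readers who want a feel for the kind of class described above, here is a hand-written approximation. It is not the generated output from commit 84caea5: the member names, the `EDataType` enum, the `Size` accessor, and the constructors given to `SomeClass` and `SomeOtherClass` are all assumptions for this sketch, and the function pointer array is simplified to return element counts rather than the vectors themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Simplified stand-ins for the example classes marked with "Data".
struct SomeClass      { int value;   explicit SomeClass(int v) : value(v) {} };
struct SomeOtherClass { float value; explicit SomeOtherClass(float v) : value(v) {} };

class DataState
{
public:
    // Hypothetical enum used to index the function pointer array below.
    enum class EDataType : std::size_t { SomeClassData = 0, SomeOtherClassData, Count };

    // Getter selected at compile time with constexpr if.
    template <typename T>
    std::vector<T>& GetData()
    {
        if constexpr (std::is_same_v<T, SomeClass>)
            return _someClassData;
        else if constexpr (std::is_same_v<T, SomeOtherClass>)
            return _someOtherClassData;
        else
            static_assert(sizeof(T) == 0, "T is not stored in DataState");
    }

    // Constructs a new element in place in the vector matching T.
    template <typename T, typename... Args>
    T& EmplaceBackData(Args&&... args)
    {
        return GetData<T>().emplace_back(std::forward<Args>(args)...);
    }

    // Runtime access through an array of function pointers indexed by the enum.
    std::size_t Size(EDataType type) const
    {
        using SizeGetter = std::size_t (*)(DataState const&);
        static const SizeGetter getters[] = {
            [](DataState const& s) -> std::size_t { return s._someClassData.size(); },
            [](DataState const& s) -> std::size_t { return s._someOtherClassData.size(); }
        };
        return getters[static_cast<std::size_t>(type)](*this);
    }

private:
    std::vector<SomeClass>      _someClassData;
    std::vector<SomeOtherClass> _someOtherClassData;
};
```

Usage would follow the pattern `state.EmplaceBackData<SomeClass>(42)` followed by `state.GetData<SomeClass>()`, with one branch per class marked "Data".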

LittleCoinCoin added 11 commits November 18, 2024 12:51
Major:
- The toml dependency of Kodgen extensively uses the value of macro *__cplusplus* in order to discriminate which version of the *std* is used.
- However, [since Visual Studio 2017 15.7](https://learn.microsoft.com/en-us/cpp/build/reference/zc-cplusplus?view=msvc-170), Microsoft leaves the value of *__cplusplus* equal to 199711L unless the compile option `/Zc:__cplusplus` is used.
- Should `/Zc:__cplusplus` be used, the value of *__cplusplus* will be updated depending on the value of `/std:c++XX` (from /std:c++14 onward)
- Therefore, if we want the CMake instruction `target_compile_features(${KodgenTargetLibrary} PUBLIC cxx_std_17)` to have the expected reach when using MSVC (after 15.7), we MUST add the option `/Zc:__cplusplus`
- Here is the table to check that 1914 is equivalent to VS 2017 15.7 : https://learn.microsoft.com/en-us/cpp/overview/compiler-versions?view=msvc-170#version-macros
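To see why the macro value matters, the small helper below maps a `__cplusplus` value to the standard it denotes. The function name `standardName` is made up for this illustration; the numeric thresholds are the values defined by the C++ standards.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: maps a value of the __cplusplus macro to the
// language standard it denotes.
inline std::string standardName(long value)
{
    if (value >= 202002L) return "C++20 or later";
    if (value >= 201703L) return "C++17";
    if (value >= 201402L) return "C++14";
    if (value >= 201103L) return "C++11";
    return "C++98/03";
}
```

Calling `standardName(__cplusplus)` in a translation unit built with MSVC and `/std:c++17` but without `/Zc:__cplusplus` would take the last branch, since the macro stays at 199711L; this is the situation in which code that discriminates on the macro misjudges the available standard.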
Major:
- When the user specifies msvc as the compiler for the code generated by Kodgen, `vswhere` is used to check that msvc is indeed available on the user's system.
- In particular, it is searching for the latest x86/x64 build tools indicated by the ID: `Microsoft.VisualStudio.Component.VC.Tools.x86.x64`
- However, if this component was installed through _Microsoft BuildTools_ instead of _Visual Studio Community_, _Professional_, or _Enterprise_, the command would not pick up the component as expected. See issue for the reason: microsoft/vswhere#22
- In that case, users must opt in by passing the `-products` argument. Considering the goal is to find at least one such set of MSVC build tools, passing `-products *` seems like the most general method.
Major:
- Current version of vswhere in the project was 2.8
- The latest build released is [version 3.1.7](https://github.com/microsoft/vswhere/releases/tag/3.1.7)
Major:
- `preGenerateCode` and `postGenerateCode` are now supposed to be run outside of `generateCode`
  - this makes sense semantically as we also find the function names `initialGenerateCode` and `finalGenerateCode` inside `generateCode`, whose roles are currently no different.
    - Moreover, the documentation for `preGenerateCode` said "Called just before CodeGenUnit::foreachModuleEntityPair." when this description actually corresponds to `initialGenerateCode` (same for `postGenerateCode`/`finalGenerateCode`)
  - In addition, this unlocks the possibility to run the said functions independently of everything in `generateCode`. Thus, in a future commit, it will be possible to run code only once per CodeGenUnit, before and after `generateCode`
- This has some consequences:
  - `preGenerateCode` and `postGenerateCode` are now public
  - `createCodeGenEnv` also needed to be taken outside of `generateCode` so that the same environment is used for all three generation functions.
  - The overrides of the corresponding functions in MacroCodeGenUnit were also made public

- This was confirmed to induce NO GENERATION DIFFERENCES on the example project. It is, for now, a refactoring that does not impact the generated code and should therefore be backward compatible.
Major:
- Added an enum to set high-level flags that orient how the generation is handled.
  - The formatting of the enum follows the one from `EEntityType`
- The usefulness of this enum will become apparent in future commits:
  - we will use it to decide whether a file should be generated for every parsed file,
  - or whether a unique file will be generated from all parsed files.
- In the current state, this enum is used to replace the boolean previously called `forceRegenerateAll` that was eventually passed down to `CodeGenManager::identifyFilesToProcess` through `CodeGenManager::run`

Remark:
- We could add a field `generationStrategies` to the code manager. It could also be part of the settings the user can set in the TOML file
Major:
- We are using the information in the `generationStrategies` to indicate whether we want to follow the legacy One-File-Generated-For-Each-Parsed-File scheme
- Renamed the function (`processFiles` --> `oneGenerateForEachParsedFile`) to better match the intent.
Major:
- The function which parses all files and runs the generation only once for all
- The parsing follows the same multi-threaded approach as before
- But the generation is single-threaded, and `preGenerateCode` and `postGenerateCode` are run only once per CodeGenUnit, as their semantic meaning implies.
- Thanks to this, if we keep following the recommendation of the API to use `postGenerateCode` as the place where the files are written (i.e. where the generation is finalized), this will result in generating the files only once per CodeGenUnit.
Major:
- `CodeGenUnit::preGenerateCode` was requiring FileParsingResult as an argument for the sole purpose of setting its value in the CodeGenEnv of the generation unit
- Given that the presence of this argument posed a problem for how we wanted to use `preGenerateCode` in the new function `CodeGenManager::oneGenerateForAllParsedFiles`, we decided to remove it from there and instead set the value at the beginning of the implementation of `CodeGenUnit::generateCode`
- This means `preGenerateCode` cannot use information from FileParsingResult anymore, BUT:
  - As noted in previous commit messages, the semantic usage of `preGenerateCode` was redundant with the function `initialGenerateCode`
  - meaning that any user who was using FileParsingResult in `preGenerateCode` can "simply" copy-paste that code into the body of `initialGenerateCode` with the same effect
  - this is a compromise to also have access to the new function `CodeGenManager::oneGenerateForAllParsedFiles`

Minor:
- The declaration and definitions of `preGenerateCode` were adapted in `MacroCodeGenUnit` to match the changes

Remark:
- Following this modification, we could manage to generate code as expected with `CodeGenManager::oneGenerateForAllParsedFiles` on an example project external to this repo.
  - Example internal to this repo to come in future commits.
Major:
- Nothing big, we are simply relocating existing source files of the example in preparation of the files we are going to add to demonstrate the generation scheme of One-Gen-For-All-Files
Major:
- The purpose of the example is to reuse the existing `SomeClass` and `SomeOtherClass`, and generate another class `DataState` which takes the names of these entities for the purpose of storing them in vectors.
  - I made it overly complicated for the purpose of exemplifying possible code generation
  - The data vectors can be accessed with a generated enum
  - The data can be emplaced back and retrieved with function pointers to the getters
  - The function pointers are in an array that can be accessed with the enum
  - The getters and emplace functions are templated and `constexpr` tests are used to reduce the code the compiler has to process.
- For the generation, we added a custom `CodeGenUnit` with its settings, environment, one module and one property code gen.
  - The settings include the choice for the namespace, class name, filename and extension.
  - The environment is used to record whether an entity is the first one used for generation, as the final code contains `if/else if/else` statements. The source file also contains an enum to indicate locations for the generated code.
  - The module controls where to put the generated code thanks to the code location enum in the environment source file. This copies the system used in the `MacroCodeGenUnit`
  - The property code gen uses a new keyword `Data` that we added to mark `SomeClass` and `SomeOtherClass`
  - The generator's `main.cpp` includes the new code gen unit using the strategy `kodgen::EGenerationStrategies::OneGenerateForAllFiles`
- The example `main.cpp` was updated to include the datastate and emplace `SomeClass` and `SomeOtherClass`, and use these.