feat(source-nodes): optimize node sourcing and transformation for improved performance #703

@demirgazetic commented Jul 9, 2024

Changes made:

  • Implement concurrent fetching of pages using Promise.all to speed up data retrieval.
  • Refactor variable names for better readability and maintainability.
  • Enhance node processing logic to handle content fields more efficiently.
  • Ensure robust handling of datasource entries and their dimensions.
  • Maintain support for local assets with improved caching mechanism.
  • Remove unused code.
  • Fix linting issues.
  • Update README.md.

Reason for the Change

Our goal was to enable incremental builds in our Gatsby project using the official gatsby-source-storyblok plugin. During implementation, we noticed that the source-and-transform-nodes step was significantly slow, which led us to reconsider using this plugin.

After reviewing the source code of the Storyblok plugin, we identified several potential improvements. This pull request (PR) details the enhancements we've made.

Key Changes and Benefits

  1. Sequential Fetching to Concurrent Fetching:

    • Improved Performance: Parallel requests significantly reduce fetch time, especially for large datasets.
    • Rate Limit Handling: Concurrent fetching allows rate limits and server load to be managed more carefully.
    • High-Performance Applications: Although more complex, this approach is necessary to optimize high-performance applications.

    By implementing concurrent fetching, we observed a 20%-30% speed increase for large spaces with over 5000 pages and more than 20 datasources, some containing over 10,000 values.

  2. Optimizing Datasource Fetching:

    • Focused Fetching: Some datasources (in our case, icons) are used only internally in Storyblok and are not needed on the frontend. By fetching only the datasources that are actually needed, we optimized performance further.
    • New Option includeDatasources: This option allows specifying which datasources to fetch, avoiding unnecessary data retrieval.

    Example configuration in gatsby-config.js:

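    // gatsby-config.js
    module.exports = {
      plugins: [
        {
          resolve: 'gatsby-source-storyblok',
          options: {
            accessToken: 'YOUR_TOKEN',
            version: 'draft',
            resolveRelations: [''],
            includeLinks: false,
            // Only these datasources are fetched; omit the option to fetch all,
            // or pass [] to skip datasources entirely.
            includeDatasources: ['datasource1', 'datasource2', 'datasource3']
          }
        }
      ]
    };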

    Implementation logic:

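    // An empty includeDatasources array ([]) leaves this empty,
    // so no datasources are fetched.
    let datasources = [];
    if (options.includeDatasources === undefined) {
      // Option not set: keep the previous behaviour and fetch every datasource.
      datasources = await fetchAllDatasources();
    } else if (options.includeDatasources.length > 0) {
      // Only the datasources listed by the user are fetched.
      datasources = options.includeDatasources;
    }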

  3. Fetching Tags:

    Since not everyone uses Storyblok tags, an option to disable tag fetching also saves time for those projects.
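A minimal sketch of the sequential-to-concurrent change described in item 1, for illustration only. It assumes a hypothetical `fetchPage(page, perPage)` that returns `{ total, stories }` for one page; the stub data, `fetchPage`, and `fetchAllStories` are not the plugin's actual code. The first request discovers the total, then all remaining pages are requested at once and awaited with `Promise.all`, which keeps results in page order.

```javascript
// Hypothetical stand-in for one paginated API call; the real plugin
// talks to the Storyblok API instead of generating stub stories.
const TOTAL_STORIES = 250;

async function fetchPage(page, perPage) {
  const start = (page - 1) * perPage;
  const end = Math.min(start + perPage, TOTAL_STORIES);
  const stories = [];
  for (let i = start; i < end; i++) {
    stories.push({ id: i + 1 });
  }
  return { total: TOTAL_STORIES, stories };
}

// Fetch page 1 to learn the total, then all remaining pages concurrently.
async function fetchAllStories(perPage = 100) {
  const first = await fetchPage(1, perPage);
  const lastPage = Math.ceil(first.total / perPage);

  // Every remaining request starts immediately instead of one after another.
  const requests = [];
  for (let page = 2; page <= lastPage; page++) {
    requests.push(fetchPage(page, perPage));
  }

  // Promise.all resolves in input order, so stories stay in page order.
  const rest = await Promise.all(requests);
  return first.stories.concat(...rest.map((r) => r.stories));
}

fetchAllStories().then((stories) => {
  console.log(`fetched ${stories.length} stories`);
});
```

Firing every remaining request at once can still hit the API rate limit on very large spaces, so a production version may want to cap how many requests are in flight at a time.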

Summary

These changes result in significant performance improvements, making the plugin more suitable for large projects. By implementing concurrent fetching and selective datasource fetching, we have optimized the source and transform nodes step, making the Gatsby build process more efficient.

@schabibi1 added the investigation label on Jul 30, 2024
Two review threads on lib/src/sync.js (resolved)
@schabibi1

@demirgazetic Thank you for creating a PR! Also, thank you for providing me with the details above.

I left a few questions in the review. Please feel free to reply there, as there are a few points I would like to hear more about.
In my opinion, switching from sequential to concurrent fetching sounds good for scalable Gatsby projects, as it minimizes the risk of hitting the rate limit unnecessarily.
Since there is only so much that can be done on the Gatsby side to improve performance, using Promise.all so that the plugin resolves only once the required arrays of Promises (i.e. stories, datasources, and tags) have resolved seems like the best you can do on your side.

You also removed the old code from gatsby-node.js and reduced nesting by using getAll where possible.
Making includeDatasources and includeTags opt-in options in gatsby-config is a good approach, just as we already did for links with includeLinks.

Since Gatsby users constantly need to keep the rate limit and performance cost in mind, opt-in options in the plugin options object that Gatsby provides look like the way to go. Again, that is the same approach we took for includeLinks.
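The fetch flow described in this review can be sketched roughly as follows. Everything here is hypothetical: the fetcher stubs stand in for real Storyblok API calls, and treating `includeTags` as a boolean that defaults to `true` is an assumption, since neither the PR description nor the review states its exact semantics. Only the `includeDatasources` rules follow the implementation logic quoted in the PR description.

```javascript
// Hypothetical fetchers standing in for the plugin's real API calls.
const fetchStories = async () => [{ slug: 'home' }, { slug: 'about' }];
const fetchAllDatasourceSlugs = async () => ['icons', 'colors', 'countries'];
const fetchDatasourceEntries = async (slug) => [{ datasource: slug, value: `${slug}-1` }];
const fetchTags = async () => [{ name: 'news' }];

// includeDatasources rules from the PR description:
// undefined -> every datasource, [] -> none, ['a', 'b'] -> only those slugs.
async function resolveDatasourceSlugs(options) {
  if (options.includeDatasources === undefined) {
    return fetchAllDatasourceSlugs();
  }
  return options.includeDatasources;
}

async function fetchDatasources(options) {
  const slugs = await resolveDatasourceSlugs(options);
  // Entries of each selected datasource are also fetched concurrently.
  const perDatasource = await Promise.all(slugs.map((slug) => fetchDatasourceEntries(slug)));
  return perDatasource.flat();
}

async function sourceEverything(options = {}) {
  // Assumption: includeTags is a boolean that defaults to true.
  const includeTags = options.includeTags !== false;

  // Stories, datasources and tags are requested at the same time;
  // Promise.all resolves once all three finish (or rejects on the first error).
  const [stories, datasourceEntries, tags] = await Promise.all([
    fetchStories(),
    fetchDatasources(options),
    includeTags ? fetchTags() : Promise.resolve([]),
  ]);

  return { stories, datasourceEntries, tags };
}

sourceEverything({ includeDatasources: ['colors'], includeTags: false }).then((result) => {
  console.log(result.datasourceEntries.length, result.tags.length); // 1 0
});
```

Skipped work resolves immediately to an empty array, so callers always receive the same result shape regardless of which options are set.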

…ation logic

- Updated the pagination logic to use a single `page` variable for both the initial response and the for loop.
- Changed the calculation of `lastPage` by dividing the total number by 100 instead of 25 to optimize the page count.
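The effect of the divisor change can be illustrated with a small helper. This is a sketch, assuming the "total number" is the total story count returned with the first response and that 100 is the page size now requested (Storyblok's Content Delivery API defaults to 25 items per page and accepts up to 100 via `per_page`); `lastPageFor` and `remainingPages` are hypothetical names. `Math.ceil` ensures a partially filled last page is still fetched.

```javascript
// Pages needed to cover `total` items at `perPage` items per page.
// At least one page is always requested, even when total is 0.
function lastPageFor(total, perPage) {
  return Math.max(1, Math.ceil(total / perPage));
}

// Page numbers left to fetch after the initial page-1 response,
// reusing one `page` counter for the loop.
function remainingPages(total, perPage) {
  const lastPage = lastPageFor(total, perPage);
  const pages = [];
  for (let page = 2; page <= lastPage; page++) {
    pages.push(page);
  }
  return pages;
}

// 5000 stories: 200 pages at 25 per page vs. 50 pages at 100 per page.
console.log(lastPageFor(5000, 25), lastPageFor(5000, 100)); // 200 50
```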
@schabibi1 merged commit 2adeafa into storyblok:master on Aug 13, 2024
2 checks passed

🎉 This issue has been resolved in version 7.1.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

Labels: investigation ([Issue] The Storyblok team is investigating), released