Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential performance improvement: "Fast forward content streams" #5264

Open
Tracked by #4388
bwaidelich opened this issue Sep 25, 2024 · 1 comment
Open
Tracked by #4388

Comments

@bwaidelich
Copy link
Member

bwaidelich commented Sep 25, 2024

TL;DR: Today we always copy all edges when bringing a workspace in sync even if it does not contain any changes. It might be a worthwhile performance improvement to only "patch" the existing content stream

Imagine the following scenario:

With this simplified graph:

graph TD
    root --> a
    root --> b
    a --> a1
    a --> a2
    b --> b1

    root --> a
    root --> b
    a --> a1
    a --> a2
    b --> b1
    
    linkStyle 5,6,7,8,9 stroke:blue;
Loading

(black = live content stream, blue = user content stream)

When node a is modified in the live workspace, a copy is created and the edges moved like this:

graph TD
    root --> a'
    root --> b
    a' --> a1
    a' --> a2
    b --> b1

    root --> a
    root --> b
    a --> a1
    a --> a2
    b --> b1

    linkStyle 0,2,3 stroke-width: 2px;
    linkStyle 5,6,7,8,9 stroke:blue;
Loading

(bold = new edges)

As a result, the user workspace is out of date and needs to be synced.
Today this is done by creating a new content stream, i.e. copying all edges from the target (live) workspace:

graph TD
    root --> a'
    root --> b
    a' --> a1
    a' --> a2
    b --> b1

    root --> a
    root --> b
    a --> a1
    a --> a2
    b --> b1

    root --> a'
    root --> b
    a' --> a1
    a' --> a2
    b --> b1

    linkStyle 5,6,7,8,9 stroke:blue;
    linkStyle 10,11,12,13,14 stroke:red, stroke-width: 2px;
Loading

(red = edges for the new user content stream)

In reality this means that loads of edges are now the same in all three content streams.

The resulting events are

%%{init: { 'gitGraph': {'showBranches': true, 'showCommitLabel':true,'mainBranchName': 'live'}} }%%
      gitGraph
        commit id:"Root WS Created"
        branch contentstream-u1
        commit id:"CS Forked"
        branch workspace-u
        commit id:"WS Created"
        checkout live
        commit id:"Node Created"
        branch contentstream-u2
        commit id:"CS Forked'"
        checkout workspace-u
        commit id:"WS Rebased"
Loading

Suggestion

Instead of starting a new content stream with the ContentStreamWasForked event we could publish some new event (e.g. ContentStreamWasSynced.. that contains the new versionOfSourceContentStream) and then "only" add/remove edges that are affected:

graph TD
    root --> a'
    root --> b
    a' --> a1
    a' --> a2
    b --> b1

    root --> a'
    root --> b
    a' --> a1
    a' --> a2
    b --> b1

    linkStyle 6,9 stroke:blue;
    linkStyle 5,7,8 stroke:blue, stroke-width: 2px;
Loading

And the resulting events:

%%{init: { 'gitGraph': {'showBranches': true, 'showCommitLabel':true,'mainBranchName': 'live'}} }%%
      gitGraph
        commit id:"Root WS Created"
        branch contentstream-u1
        commit id:"CS Forked"
        branch workspace-u
        commit id:"WS Created"
        checkout live
        commit id:"Node Created"
        checkout contentstream-u1
        commit id:"CS Synced"
Loading

Of course this can only work if the content stream itself does not contain any changes.

Considerations

Most/all places that currently react to ContentStreamWasForked events (currently that is the ContentGraph-, ContentStream and AssetUsage-projection) need to also handle the new ContentStreamWasSynced (or similar) event.
Furthermore the WorkspaceWasRebased event is no longer published in these cases – So probably the Neos content cache flusher needs to handle the new event, too.

The most complex part is probably the actual performance optimization to create only the missing edges. In a first implementation we could simplify this by always removing and re-creating all edges, allowing the main performance gain to be done in a non-breaking manner

Related: #4388

@bwaidelich
Copy link
Member Author

bwaidelich commented Sep 26, 2024

Example

Imagine there are two nodes a, b and c in live (black) and user workspace (blue):

graph TD
    root --> a
    root --> b
    root --> c

    root --> a
    root --> b
    root --> c
    
    linkStyle 2,3,4 stroke:blue;
Loading

In live, someone

  • edits node a (so a copy a' is created and the edge is moved)
  • deletes node b (so the edge is removed)
  • adds a node d (so a new node d is added and the corresponding edge)

the resulting hierarchy relations are (bold = new edges, dashed = removed edges)

graph TD
    root --> a
    root --> a'
    root --> b
    root --> c
    root --> d

    root --> a
    root --> b
    root --> c
    
    linkStyle 0,2 stroke-dasharray: 2;
    linkStyle 1,4 stroke-width: 2px;
    linkStyle 5,6,7 stroke:blue;
Loading

To sync the user workspace, we first find and remove all edges that are exclusive to the user workspace:

DELETE
FROM
  cr_default_p_graph_hierarchyrelation
WHERE
  (dimensionspacepointhash, parentnodeanchor, childnodeanchor, contentstreamid)
  IN (
    SELECT
      dimensionspacepointhash, parentnodeanchor, childnodeanchor, contentstreamid
    FROM
      cr_default_p_graph_hierarchyrelation h_source
    WHERE
      h_source.contentstreamid = '<user-cs-id>'
      AND NOT EXISTS (
        SELECT
          h_target.*
        FROM
          cr_default_p_graph_hierarchyrelation h_target
        WHERE
          h_target.contentstreamid = '<live-cs-id>'
          AND h_target.dimensionspacepointhash = h_source.dimensionspacepointhash
          AND h_target.parentnodeanchor = h_source.parentnodeanchor
          AND h_target.childnodeanchor = h_source.childnodeanchor
      )
  );

and get the resulting graph:

graph TD
    root --> a'
    root --> c
    root --> d

    root --> a
    root --> b
    root --> c
    
    linkStyle 3,4 stroke:blue, stroke-width: 2px, stroke-dasharray: 2;
    linkStyle 5 stroke:blue;
Loading

..and then copy all edges that are exclusive to the live workspace:

INSERT INTO cr_default_p_graph_hierarchyrelation (position, dimensionspacepointhash, parentnodeanchor, childnodeanchor, contentstreamid, subtreetags)
SELECT
      position, dimensionspacepointhash, parentnodeanchor, childnodeanchor, '<user-cs-id>' contentstreaid, subtreetags
    FROM
      cr_default_p_graph_hierarchyrelation h_target
    WHERE
      h_target.contentstreamid = '<live-cs-id>'
      AND NOT EXISTS (
        SELECT
          h_source.*
        FROM
          cr_default_p_graph_hierarchyrelation h_source
        WHERE
          h_source.contentstreamid = '<user-cs-id>'
          AND h_source.dimensionspacepointhash = h_target.dimensionspacepointhash
          AND h_source.parentnodeanchor = h_target.parentnodeanchor
          AND h_source.childnodeanchor = h_target.childnodeanchor
      );

which leads to the graph:

graph TD
    root --> a'
    root --> c
    root --> d

    root --> a'
    root --> c
    root --> d

    a
    b
    
    linkStyle 3,5 stroke:blue, stroke-width: 2px;
    linkStyle 4 stroke:blue;
Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Prioritized 🔥
Development

No branches or pull requests

1 participant