Here is a summary of my understanding of The Graph. Please correct any misunderstandings on my part:
The Graph is
a standard for specifying a GraphQL API, together with associated WASM blockchain data processing routines, called mappings, which maintain the underlying data for this API. A particular instantiation of these two concepts is called a subgraph, so there would, for example, eventually be a Joystream subgraph. Currently, this standard only covers Ethereum.
a set of tools, the centrepiece of which is a Rust-based API serving node that can load a subgraph dynamically. Currently, this tooling only works for Ethereum.
a future network of node operators who will operate infrastructure for different subgraphs. The key goal here is to incentivise these operators to provide quality service at scale and to return honest query results. How this is to happen is yet to be resolved. All current uses of The Graph rely on a trusted operator, such as the DApp developer.
Using The Graph for our query node
There is a good chance that The Graph, both the standard and the tools, is coming to Substrate.
The timeline for when anything production-ready would be available is, however, very uncertain.
There are a number of plausible benefits of relying on The Graph rather than rolling our own full-stack bespoke solution:
Better tooling: They are writing a high-performance query node, and have a large team (15+) working on improving and maintaining it, as well as substantial community buy-in, even at an early stage. Our own solution is entirely bespoke, written largely in Python and TypeScript, and has much less surrounding tooling and documentation.
Outsourcing unresolved hard problems: There are some important hard problems that need to be resolved, such as how to deal with in-flight runtime upgrades, or how to authenticate the responses of the query node. There is a much greater chance that The Graph will solve these problems better than we would, and even if not, we have other areas of focus which are worth trading off against investing in the query node.
Follow a standard: It will be easier for new developers in the Substrate ecosystem to contribute to and improve our query infrastructure if it follows a familiar standard. If The Graph comes to Substrate, many will adopt it, and thus there will be a larger pool of trained developers who can improve the query node at a lower barrier to entry.
Free features: Things like filtering, sorting, pagination and, in the future, aggregate functions with grouping are part of the well-designed framework, so you get them for free without any extra coding. Without The Graph, we would have to replicate this in each query by hand, or at least build some reusable abstraction that we can inject into our manually written query resolvers, as The Graph has already done.
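To make the "reusable abstraction" in the last point concrete, here is a minimal sketch of what we would have to write and maintain ourselves. It assumes entities are held in memory as flat records; the names `Entity`, `ListOptions` and `queryList` are hypothetical, and the option names only loosely echo the kind of arguments The Graph generates, not its actual API.

```typescript
// Hypothetical flat entity record; real entities would have richer types.
type Entity = Record<string, string | number>;

// Hypothetical list-query options; the field names are illustrative only.
interface ListOptions {
  where?: Entity;                   // equality filters on entity fields
  orderBy?: string;                 // field to sort by
  orderDirection?: "asc" | "desc";  // defaults to ascending
  skip?: number;                    // pagination offset, defaults to 0
  first?: number;                   // page size, defaults to all remaining items
}

// Compare two field values: numerically if both are numbers, else as strings.
function compareValues(x: unknown, y: unknown): number {
  if (typeof x === "number" && typeof y === "number") return x - y;
  const sx = String(x);
  const sy = String(y);
  return sx < sy ? -1 : sx > sy ? 1 : 0;
}

// Apply filtering, then sorting, then pagination to an in-memory entity list.
function queryList(items: Entity[], opts: ListOptions = {}): Entity[] {
  let result = items.slice(); // never mutate the caller's array
  const where = opts.where;
  if (where !== undefined) {
    const keys = Object.keys(where);
    result = result.filter((item) => keys.every((k) => item[k] === where[k]));
  }
  const orderBy = opts.orderBy;
  if (orderBy !== undefined) {
    const sign = opts.orderDirection === "desc" ? -1 : 1;
    result.sort((a, b) => sign * compareValues(a[orderBy], b[orderBy]));
  }
  const skip = opts.skip ?? 0;
  const end = opts.first === undefined ? result.length : skip + opts.first;
  return result.slice(skip, end);
}
```

Every manually written resolver would call something like `queryList(videos, { where: { channel: "c1" }, orderBy: "views", first: 10 })`. That is the per-query plumbing The Graph would give us without any code of our own.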
Impositions of The Graph
These are the main design constraints we must currently respect in order to make our API and blockchain data processors maximally transferable to a future version of The Graph for Substrate.
Join-free queries: The Graph requires that each query targets exactly one entity type at the data layer. It accepts no user-defined type arguments and does not allow the developer to write query resolvers. Instead, an automatic query resolver is supplied, which simply looks up instances of that single entity type in the data layer. This means that if a desired query needs an implicit join across multiple different entities in the Substrate storage layer, then the entity type in The Graph must be this join product itself. Critically, even with this approach, we cannot replicate every conceivable join query at this stage, because aggregators are not yet available.
Pure mappings: It appears that The Graph allows you to write mappings that key off one of the following: contract calls, block arrival, or contract events. This means that each of these triggers must itself contain all the information needed to perform the required mapping. For example, when a particular event occurs, the event parameters defined by the contract author must include everything the mapping author needs to work out what side effect the event has on the set of entities in the API. This is not the case for many of the events currently defined in our Substrate runtime. So far this has not been a problem in our own bespoke node, because Substrate events exposed by the Harvester include information about the initial call that triggered them, and together this has always been sufficient.
No filtering, sorting, pagination: This is not really a requirement per se; it is just that, if we add these by hand, we will be duplicating work we would later get for free. So perhaps the best approach is to add them by hand only if we absolutely need them for our UX in the interim.
Write mappings with AssemblyScript in mind: The Graph has tooling for compiling a subset of TypeScript down to WASM. We should write our data processors with this in mind, by sticking as closely as possible to the subset of TypeScript available in AssemblyScript.
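As a rough illustration of how the join-free, pure-mapping and AssemblyScript constraints combine, here is a sketch of a data processor written in that spirit. Everything in it is hypothetical: `VideoWithChannel`, `VideoCreatedEvent`, `EntityStore` and `handleVideoCreated` are invented for the example and correspond neither to The Graph's API nor to events in our runtime. The code is plain TypeScript that merely avoids features which, to my understanding, AssemblyScript restricts (`any`, closures, union types other than nullable); it has not been compiled with AssemblyScript.

```typescript
// Hypothetical denormalised entity: the "join product" of a video and its
// channel, stored as one entity type so no join is needed at query time.
class VideoWithChannel {
  id: string;
  title: string;
  channelId: string;
  channelTitle: string; // copied from the channel when the entity is written

  constructor(id: string, title: string, channelId: string, channelTitle: string) {
    this.id = id;
    this.title = title;
    this.channelId = channelId;
    this.channelTitle = channelTitle;
  }
}

// Hypothetical event whose parameters carry everything the mapping needs.
class VideoCreatedEvent {
  videoId: string;
  videoTitle: string;
  channelId: string;
  channelTitle: string;

  constructor(videoId: string, videoTitle: string, channelId: string, channelTitle: string) {
    this.videoId = videoId;
    this.videoTitle = videoTitle;
    this.channelId = channelId;
    this.channelTitle = channelTitle;
  }
}

// Minimal stand-in for the entity store a query node would provide.
class EntityStore {
  private items: Map<string, VideoWithChannel> = new Map<string, VideoWithChannel>();

  set(entity: VideoWithChannel): void {
    this.items.set(entity.id, entity);
  }

  // Returns null (not undefined) for a missing id, keeping to nullable types.
  get(id: string): VideoWithChannel | null {
    return this.items.has(id) ? (this.items.get(id) as VideoWithChannel) : null;
  }
}

// Pure mapping: reads only the event parameters and writes one entity.
function handleVideoCreated(event: VideoCreatedEvent, store: EntityStore): void {
  const entity = new VideoWithChannel(
    event.videoId,
    event.videoTitle,
    event.channelId,
    event.channelTitle
  );
  store.set(entity);
}
```

The cost of this design shows up on the write path: if a channel were renamed, a separate mapping would have to rewrite every `VideoWithChannel` that copied the old title, because there is no query-time join to pick up the change.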
Risks
The Graph may never arrive for Substrate, in which case the costs of respecting some of these constraints will never be recouped.
The Graph for Substrate may end up being materially different from the existing The Graph for Ethereum, in which case some of the listed constraints may not apply, or there may be new constraints we have not taken into account, all of which would raise the cost of the transition.
Clarification: It appears that event handlers do indeed allow you to recover the initial transaction, and corresponding payload, responsible for the event. This means that just processing events should be sufficient to construct state for any query we would like, so long as events supply all required information about side-effects. They need not copy over tx parameters.
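A sketch of what this clarification implies for a mapping, under stated assumptions: the `Transaction`, `ChainEvent` and `Member` shapes and the `handleMemberRegistered` handler below are hypothetical and do not reproduce The Graph's actual event API. The point is only that the event supplies the side effect (here, the assigned member id), while the originating transaction supplies the call parameters (here, the handle).

```typescript
// Hypothetical shapes; they do not reproduce The Graph's actual event API.
class Transaction {
  hash: string;
  sender: string;
  payload: Map<string, string>; // decoded call arguments

  constructor(hash: string, sender: string, payload: Map<string, string>) {
    this.hash = hash;
    this.sender = sender;
    this.payload = payload;
  }
}

class ChainEvent {
  name: string;
  params: Map<string, string>; // side effects reported by the runtime
  transaction: Transaction;    // the call that triggered this event

  constructor(name: string, params: Map<string, string>, transaction: Transaction) {
    this.name = name;
    this.params = params;
    this.transaction = transaction;
  }
}

class Member {
  id: string;
  handle: string;
  registeredBy: string;

  constructor(id: string, handle: string, registeredBy: string) {
    this.id = id;
    this.handle = handle;
    this.registeredBy = registeredBy;
  }
}

// Builds the entity from the event's side effect plus the transaction payload.
function handleMemberRegistered(event: ChainEvent, members: Map<string, Member>): void {
  const id = event.params.get("memberId");               // side effect: only the event knows it
  const handle = event.transaction.payload.get("handle"); // call input: not copied into the event
  if (id === undefined || handle === undefined) {
    return; // skip malformed input rather than write a partial entity
  }
  members.set(id, new Member(id, handle, event.transaction.sender));
}
```

Under this reading, the event only needs to carry what the call itself could not know in advance, such as identifiers assigned during execution.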