Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Make CollectionDescription.Name Option #2223

Merged
merged 6 commits into from
Jan 19, 2024

Conversation

AndrewSisley
Copy link
Contributor

@AndrewSisley AndrewSisley commented Jan 16, 2024

Relevant issue(s)

Partially resolves #2198

Description

Makes CollectionDescription.Name Option. Changes secondary indexes to be indexed by Collection ID. Moves view queries into Sources property.

Each change has been described in the commit body of the respective commit.

Sorry if this PR feels I bit disjointed, I did plan on fully migrating all Sources to the new Sources property (the changes to Name are required for that), but the PR grew a fair bit when I found the need to change the secondary index index, and given that others are actively working in this space I thought it best to try and get this merged earlier.

  • Do not merge this until after 0.9 is cut.

@AndrewSisley AndrewSisley added area/query Related to the query component area/collections Related to the collections system refactor This issue specific to or requires *notable* refactoring of existing codebases and components labels Jan 16, 2024
@AndrewSisley AndrewSisley requested a review from a team January 16, 2024 17:40
@AndrewSisley AndrewSisley self-assigned this Jan 16, 2024
Copy link

codecov bot commented Jan 16, 2024

Codecov Report

Attention: 87 lines in your changes are missing coverage. Please review.

Comparison is base (316c8d7) 74.42% compared to head (3610f78) 74.14%.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2223      +/-   ##
===========================================
- Coverage    74.42%   74.14%   -0.28%     
===========================================
  Files          256      256              
  Lines        25508    25627     +119     
===========================================
+ Hits         18983    19000      +17     
- Misses        5246     5321      +75     
- Partials      1279     1306      +27     
Flag Coverage Δ
all-tests 74.14% <57.35%> (-0.28%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
cli/collection.go 58.90% <100.00%> (ø)
client/errors.go 62.71% <ø> (ø)
db/backup.go 59.64% <100.00%> (ø)
db/collection_update.go 71.28% <100.00%> (ø)
db/view.go 64.71% <100.00%> (+0.71%) ⬆️
planner/datasource.go 88.00% <100.00%> (ø)
planner/scan.go 89.20% <100.00%> (-1.41%) ⬇️
planner/select.go 83.50% <100.00%> (+0.05%) ⬆️
planner/type_join.go 79.10% <100.00%> (ø)
planner/view.go 78.43% <100.00%> (+0.88%) ⬆️
... and 9 more

... and 6 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 316c8d7...3610f78. Read the comment docs.

Copy link
Contributor

@islamaliev islamaliev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I just have an improvement suggestion.

@@ -32,12 +33,11 @@ type CollectionDescription struct {
// The ID of the schema version that this collection is at.
SchemaVersionID string

// BaseQuery contains the base query of this view, if this collection is a view.
// Sources is the set of sources from which this collection draws data from.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: 2 times "from"

Copy link
Contributor Author

@AndrewSisley AndrewSisley Jan 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice spot, thanks 😁

  • Fix wording

@@ -313,6 +313,11 @@ func findFilteredByIndexedField(scanNode *scanNode) immutable.Option[client.Fiel
}

func (n *selectNode) initFields(selectReq *mapper.Select) ([]aggregateNode, error) {
var isQuerySource bool
if n.collection != nil && len(n.collection.Description().Sources) != 0 {
_, isQuerySource = n.collection.Description().Sources[0].(*client.QuerySource)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: don't you think it makes sense to extract this method to CollectionDescription?

It is used in quite some places and [0] indexing is a bit bothering. Would be nice to have it all in one place.

Maybe it should be GetAllQuerySources or HasQuerySource

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to make the [0] indexing more documented or tucked in one place like this suggestion

Copy link
Contributor Author

@AndrewSisley AndrewSisley Jan 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: don't you think it makes sense to extract this method to CollectionDescription?

Yeah, I thought about it and then shied away as I didn't want to clutter the public object, is also only actually used in two places I think - there are a couple of other similar but necessarily different places as well that make it look more duplicated than it is.

Any function added (e.g. HasQuerySource) would really need to be duplicated per type of source.
Your GetAllQuerySources gave me an idea whilst I wrote that last bit out, I'm probably going to add a GetSources[Type]() Type function - thanks!

  • Add GetSources[Type]() Type func

Would be nice to make the [0] indexing more documented or tucked in one place like this suggestion

I'm not sure a comment would add anything here given the name of the variable and the nature of the current code. The usage of the isQuerySource is also documented already.

Copy link
Contributor Author

@AndrewSisley AndrewSisley Jan 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetSources[Type]() Type

Golang doesn't actually support this yet... The func will not be declared on the CollectionDescription object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Golang doesn't actually support this yet... The func will not be declared on the CollectionDescription object.

I added func (col CollectionDescription) QuerySources() []*QuerySource (which calls a private generic func) as workaround, preferring a larger number of funcs on CollectionDescription over a static one floating about in the public package.

@@ -81,7 +81,7 @@ func MakeCollectionCommand(cfg *config.Config) *cobra.Command {
fetchedCols := cols
cols = nil
for _, c := range fetchedCols {
if c.Name() == name {
if c.Name().Value() == name {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Why have you skipped HasValue() here before calling Value(). If IRRC the value returned is invalid unless HasValue() == true

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name cannot be "" here, and at the moment fetchedCols cannot contain any nameless collections.

The code that populates fetchedCols will likely need restructuring somewhat in the followup PR that sorts out the version stuff, this line will be looked at again at that point.

@@ -32,7 +34,7 @@ type CollectionDefinition struct {
// Many functions on this object will interact with the underlying datastores.
type Collection interface {
// Name returns the name of this collection.
Name() string
Name() immutable.Option[string]

// ID returns the ID of this Collection.
ID() uint32
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Will ID always be there even if Name() isn't? or should this also be optional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ID will always be there.

@@ -78,13 +80,22 @@ func (col CollectionDescription) GetFieldByRelation(
schema *SchemaDescription,
) (FieldDescription, bool) {
for _, field := range schema.Fields {
if field.RelationName == relationName && !(col.Name == otherCollectionName && otherFieldName == field.Name) {
if field.RelationName == relationName && !(col.Name.Value() == otherCollectionName && otherFieldName == field.Name) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Why have you skipped HasValue() here before calling Value(). If IRRC the value returned is invalid unless HasValue() == true

question: Same as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar answer to the above in that Name will always have a value here, and this function will probably to change to work with ID instead anyway.

@@ -291,22 +292,28 @@ func NewCollectionSchemaVersionKeyFromString(key string) (CollectionSchemaVersio
}

// NewCollectionIndexKey creates a new CollectionIndexKey from a collection name and index name.
func NewCollectionIndexKey(colID, indexName string) CollectionIndexKey {
return CollectionIndexKey{CollectionName: colID, IndexName: indexName}
func NewCollectionIndexKey(colID immutable.Option[uint32], indexName string) CollectionIndexKey {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Related to my question above, since ID is optional (due to I guess name being optional) then should probably change on Collection interface as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you misunderstand, if colID here is optional it will create an empty key that allows scanning over the entire prefix. CollectionDescription.ID is not optional.

@@ -137,7 +137,7 @@ func (db *db) basicExport(ctx context.Context, txn datastore.Txn, config *client
}
colNameCache := map[string]struct{}{}
for _, col := range cols {
colNameCache[col.Name()] = struct{}{}
colNameCache[col.Name().Value()] = struct{}{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Why have you skipped HasValue() here before calling Value(). If IRRC the value returned is invalid unless HasValue() == true

question: Same as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name can't be empty here. The code is exporting queryable (i.e. named) data.

Comment on lines +184 to +185
fmt.Sprintf("\"%s\":[", col.Name().Value()),
fmt.Sprintf(" \"%s\": [\n", col.Name().Value()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Why have you skipped HasValue() here before calling Value(). If IRRC the value returned is invalid unless HasValue() == true

question: Same as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name can't be empty here. The code is exporting queryable (i.e. named) data.

@@ -989,7 +1018,7 @@ func (c *collection) save(
if isSecondaryRelationID {
primaryId := val.Value().(string)

err = c.patchPrimaryDoc(ctx, txn, c.Name(), relationFieldDescription, primaryKey.DocID, primaryId)
err = c.patchPrimaryDoc(ctx, txn, c.Name().Value(), relationFieldDescription, primaryKey.DocID, primaryId)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Why have you skipped HasValue() here before calling Value(). If IRRC the value returned is invalid unless HasValue() == true

question: Same as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

save can only be called on named collections, if a collection has no name data can't be saved into it.

ErrFailedToUnmarshalCollection = errors.New(errFailedToUnmarshalCollection)
ErrOperationNotPermittedOnNamelessCols = errors.New(errOperationNotPermittedOnNamelessCols)
ErrFieldNotObject = errors.New("trying to access field on a non object type")
ErrValueTypeMismatch = errors.New("value does not match indicated type")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: would be nice to just remove ErrValueTypeMismatch error as it is not being used anywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • See if ErrValueTypeMismatch can be removed

Copy link
Member

@shahzadlone shahzadlone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good, mainly 2 concerns (most are repetitive comments).

  1. The 0 index being tucked in one spot with documentation would be nice.
  2. Since we have move to using the Option type then just ensuring we use it with the HasValue()

indexes[indexKey.CollectionName] = append(
indexes[indexKey.CollectionName],

col, err := description.GetCollectionByID(ctx, txn, indexKey.CollectionID.Value())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Why have you skipped HasValue() here before calling Value(). If IRRC the value returned is invalid unless HasValue() == true

question: Same as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CollectionID can't be nil here

Comment on lines +87 to +88
indexes[col.Name.Value()] = append(
indexes[col.Name.Value()],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Why have you skipped HasValue() here before calling Value(). If IRRC the value returned is invalid unless HasValue() == true

question: Same as above

Copy link
Contributor Author

@AndrewSisley AndrewSisley Jan 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nameless/dataless collections can't be indexed.

@@ -402,7 +402,7 @@ func (c *collection) makeSelectionPlan(
return nil, ErrInvalidFilter
}

f, err = c.db.parser.NewFilterFromString(c.Name(), fval)
f, err = c.db.parser.NewFilterFromString(c.Name().Value(), fval)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Why have you skipped HasValue() here before calling Value(). If IRRC the value returned is invalid unless HasValue() == true

question: Same as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nameless collections can't be queryed

@@ -432,7 +432,7 @@ func (c *collection) makeSelectionPlan(
func (c *collection) makeSelectLocal(filter immutable.Option[request.Filter]) (*request.Select, error) {
slct := &request.Select{
Field: request.Field{
Name: c.Name(),
Name: c.Name().Value(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Why have you skipped HasValue() here before calling Value(). If IRRC the value returned is invalid unless HasValue() == true

question: Same as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nameless collections can't be queryed

@@ -136,7 +141,7 @@ func newIndexTestFixture(t *testing.T) *indexTestFixture {
func (f *indexTestFixture) createCollectionIndex(
desc client.IndexDescription,
) (client.IndexDescription, error) {
return f.createCollectionIndexFor(f.users.Name(), desc)
return f.createCollectionIndexFor(f.users.Name().Value(), desc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Why have you skipped HasValue() here before calling Value(). If IRRC the value returned is invalid unless HasValue() == true

question: Same as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'user' collection has a name :)

@@ -54,7 +54,7 @@ func (m configsMap) AddForField(typeStr, fieldName string, conf genConfig) {
func validateConfig(types map[string]client.CollectionDefinition, configsMap configsMap) error {
for typeName, typeConfigs := range configsMap {
typeDef := types[typeName]
if typeDef.Description.Name == "" {
if typeDef.Description.Name.Value() == "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Why have you skipped HasValue() here before calling Value(). If IRRC the value returned is invalid unless HasValue() == true

question: Same as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

@@ -71,7 +72,7 @@ func getDocIDsFromDocs(docs []*client.Document) []string {
func filterByCollection(docs []GeneratedDoc, name string) []*client.Document {
var result []*client.Document
for _, doc := range docs {
if doc.Col.Description.Name == name {
if doc.Col.Description.Name.Value() == name {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Why have you skipped HasValue() here before calling Value(). If IRRC the value returned is invalid unless HasValue() == true

question: Same as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

@@ -31,7 +31,7 @@ func parseSDL(gqlSDL string) (map[string]client.CollectionDefinition, error) {
}
result := make(map[string]client.CollectionDefinition)
for _, col := range cols {
result[col.Description.Name] = col
result[col.Description.Name.Value()] = col
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Why have you skipped HasValue() here before calling Value(). If IRRC the value returned is invalid unless HasValue() == true

question: Same as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

@@ -344,15 +344,15 @@ func createGenerateDocs(s *state, docs []gen.GeneratedDoc, nodeID immutable.Opti
if err != nil {
s.t.Fatalf("Failed to generate docs %s", err)
}
createDoc(s, CreateDoc{CollectionID: nameToInd[doc.Col.Description.Name], Doc: docJSON, NodeID: nodeID})
createDoc(s, CreateDoc{CollectionID: nameToInd[doc.Col.Description.Name.Value()], Doc: docJSON, NodeID: nodeID})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Why have you skipped HasValue() here before calling Value(). If IRRC the value returned is invalid unless HasValue() == true

question: Same as above, and other places in this file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can only create docs for named collections

@@ -31,7 +31,7 @@ func parseSDL(gqlSDL string) (map[string]client.CollectionDefinition, error) {
}
result := make(map[string]client.CollectionDefinition)
for _, col := range cols {
result[col.Description.Name] = col
result[col.Description.Name.Value()] = col
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Why have you skipped HasValue() here before calling Value(). If IRRC the value returned is invalid unless HasValue() == true

question: Same as above, and other places in this file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These collections are returned from a parsed SDL, I would be curious as to how one could declare a nameless collection in an SDL doc :)

CollectionDescription.Sources will shortly accomodate two other types:

- CollectionSource - where the source is another CollectionDescription, this will be used for schema versioning - it will point to the CollectionDescription of the previous schema version.
- DataStoreSource - indicates that items are stored directly in the database.

The names of those two additional types are very rough, they may be changed, however I am confident in the need for a Sources property with a QuerySource type populating it.
When making CollectionDescription.Name optional (and eventually mutable) I saw that the secondary indexes definitions are themselves indexed by collection name. This causes problems if we allow collection name to mutate, or be empty, so now it is indexed by the immutable, mandatory, collection id.
This change only changes the type, it does not allow users to mutate it, or set it to none - this will be exposed later.  This is done deliberately to limit the scope of the changes in this commit/PR.  Code referencing the property *should* have been tweaked to handle None values, however this is not going to be properly tested until we expose the functionality.

Name needs to be optional, as this is the mechanic by which we will allow collections at different schema versions to be hidden (by default) from the clients (e.g. gql) - only 'default' schema version collections will have names.  Multiple versions of the same root collection can be named, and they may have different names (e.g. Users_Beta, Users, Users_Deprecated).  Users are free to mutate these names when they like.
Annoyingly Go doesn't permit the generic func to be hosted on the object, and I didn't want to clutter the public space by making the underlying generic function public
@AndrewSisley AndrewSisley merged commit d07bfd1 into sourcenetwork:develop Jan 19, 2024
30 of 31 checks passed
@AndrewSisley AndrewSisley deleted the 2198-col-sources branch January 19, 2024 15:06
shahzadlone pushed a commit to shahzadlone/defradb that referenced this pull request Jan 22, 2024
## Relevant issue(s)

Partially resolves sourcenetwork#2198

## Description

Makes CollectionDescription.Name Option. Changes secondary indexes to be
indexed by Collection ID. Moves view queries into Sources property.
nasdf pushed a commit to nasdf/defradb that referenced this pull request Jan 23, 2024
## Relevant issue(s)

Partially resolves sourcenetwork#2198

## Description

Makes CollectionDescription.Name Option. Changes secondary indexes to be
indexed by Collection ID. Moves view queries into Sources property.
shahzadlone pushed a commit to shahzadlone/defradb that referenced this pull request Feb 23, 2024
## Relevant issue(s)

Partially resolves sourcenetwork#2198

## Description

Makes CollectionDescription.Name Option. Changes secondary indexes to be
indexed by Collection ID. Moves view queries into Sources property.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/collections Related to the collections system area/query Related to the query component refactor This issue specific to or requires *notable* refactoring of existing codebases and components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CollectionDescription Source refactor
3 participants