-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unified schema GitHub #45
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM plus one suggestion and one question.
github/store/db.go
Outdated
if err != nil { | ||
return fmt.Errorf("failed to drop unified view pull_request_reviews: %v", err) | ||
} | ||
// TODO: state value rename COMMENTED to something else? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not add this as an issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense to fix it here. I just want to hear somebody else's opinion on the new name.
Current statuses are approved/rejected/commented. I would rename it to "viewed" (in one of bitbuckets review actually have a boolean variable "approved", which would map nicely on "viewed" but not "commented")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the benefit of "viewed", because I'd say that the purpose of a review is either approve, reject or comment the PR.
Also, I didn't understand you when mentioning BB reviews; but if it means that BB reviews can only happen to approve or not, I'd map them as approved
/commented
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can keep "commented" if you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a strong opinion, for sure, but I'd vote for commented
if you ask ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smacker could you please recap the values for the 3 different providers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
github: APPROVED/CHANGES_REQUESTED/COMMENTED
bbserver: APPROVED/REVIEWED
for bbcould&gitlab there is a boolean field "approved" (according to the table) but maybe if we fetch data from activity endpoints we can get some text values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If have to go for a duality, I'd go for APPROVED/COMMENTED
. Simply because a comment could also not be a review, but a review should always be a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
APPROVED/COMMENTED
sounds good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
github/store/db.go
Outdated
_, err := s.DB.ExecContext(ctx, "DROP MATERIALIZED VIEW IF EXISTS owners") | ||
if err != nil { | ||
return fmt.Errorf("failed to drop unified view owners: %v", err) | ||
} | ||
_, err = s.DB.ExecContext(ctx, fmt.Sprintf(`CREATE MATERIALIZED VIEW owners AS | ||
SELECT login, name | ||
FROM github_organizations_versioned WHERE %v = ANY(versions)`, v)) | ||
if err != nil { | ||
return fmt.Errorf("failed to create unified view owners: %v", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about something like the example below.
Imo, that way it could be reviewed more easily, and it could be also useful for documentation purposes because it will be more clear the mapping between the view, and its underlying data.
var views map[string]string = map[string]string{
"owners": `SELECT login, name
 FROM github_organizations_versioned WHERE %v = ANY(versions)`,
"users": `SELECT login, name
 FROM github_users_versioned WHERE %v = ANY(versions)`,
"issues": `SELECT repository_owner, repository_name, repository_owner || '/' || repository_name AS repository_full_name,
 number, state, title, body,
 created_at, closed_at, updated_at, comments, user_id, user_login, htmlurl AS html_url, labels
 FROM github_issues_versioned WHERE %v = ANY(versions)`,
}
func createMaterializedVire(viewname string, viewFilter, version int) err {
_, err := s.DB.ExecContext(ctx, fmt.Printf("DROP MATERIALIZED VIEW IF EXISTS %s", viewname))
if err != nil {
return fmt.Errorf("failed to drop unified view %s: %v", viewname, err)
}
_, err = s.DB.ExecContext(ctx, fmt.Sprintf(`CREATE MATERIALIZED VIEW %s AS %s`,
fmt.Printf(viewFilter, version),
viewname,
))
if err != nil {
return fmt.Errorf("failed to create unified view %s: %v", viewname, err)
}
return nil
}
func (s *DB) SetActiveVersion(ctx context.Context, v int) error {
for name, query := range views {
if err := createMaterializedVire(name, query, v); err != nil {
return err
}
}
return nil
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, if we can do something like what @dpordomingo proposed I think it would more readable, instead of having a wall of sql statements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if I change to a map it would still be a wall of sql queries. Let me think about how to improve it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't come up with anything better than this: 59c1805
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mhmmm... doesn't change that much in terms of visual perception 😂. I'm ok in both ways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really prefer this other version, because now the SQL code and logic are clearly separated.
thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmm, can you help me to understand the join
?
github/store/db.go
Outdated
SELECT c.repository_owner, c.repository_name, c.repository_owner || '/' || c.repository_name AS repository_full_name, | ||
c.issue_number, c.created_at, c.body, c.user_id, c.user_login, c.htmlurl AS html_url | ||
FROM github_issue_comments_versioned AS c | ||
JOIN github_issues_versioned AS i ON |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why can't we use only the content of github_issue_comments_versioned
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
github_issue_comments_versioned
contains both comments for PRs and Issues.
In unified schema issue_comments
contains only comments for issues and pull_request_comments
only for pull requests. To distinguish them we need this join.
github/store/db.go
Outdated
_, err = s.DB.ExecContext(ctx, fmt.Sprintf(`CREATE MATERIALIZED VIEW pull_request_comments AS | ||
SELECT c.repository_owner, c.repository_name, c.repository_owner || '/' || c.repository_name AS repository_full_name, | ||
issue_number AS pull_request_number, c.created_at, c.body, c.user_id, c.user_login, c.htmlurl AS html_url | ||
FROM github_issue_comments_versioned AS c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why can't we use only the content of github_pull_request_comments_versioned
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note for myself: UNION
is missing here, will need to fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was fixed
@se7entyse7en Yes, I'll rebase and do something with the tests. But if you take a quick look at the changes to make sure you don't have any radical suggestions for improvements that would require rewriting everything, that would be nice. |
@smacker sure! I'll have a look 👍 |
github/store/db.go
Outdated
|
||
_, err := s.DB.ExecContext(ctx, fmt.Sprintf(`CREATE OR REPLACE VIEW organizations AS | ||
_, err := s.DB.ExecContext(ctx, "DROP MATERIALIZED VIEW IF EXISTS owners") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Premature optimization is the root of all evil, but could it be that using https://www.postgresql.org/docs/10/sql-refreshmaterializedview.html could be beneficial in terms of performances?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also once we have multiple providers we need to think about this, as we cannot make each provider's storer run these independently. We would end up with only the data of the last provider that finished if I didn't miss anything. I guess that we would need a single db committer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added drop here to be able to develop first of all. Also, we might want to keep it for the first (maybe unstable) versions to allow changing the views without migration. We will have to improve it at some point but I don't think we should do it right now.
single db committer
at the time of writing this code there was single commiter. It changed only in master. I'll fix it during rebase of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh. maybe you mean distributed mode here, not multiple downloaders that we use in our cli.
yes, there is no support for distributed mode yet. imo better implement it in a separate pr later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, I meant multiple downloaders in terms of multiple storers. If we implement a similar thing for BitBucket, for example, we will have a storer
of a downloader for the first provider that would run SetCurrent
which drops the materialized views and puts data, then a storer
of a downloader for the second provider tha would do the same. So, in the end, we will have only data for the second provider, unless I missed something.
But for sure not something to be addressed now. Just a note for the nex things to do to keep in mind. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we will have only data for the second provider
As we agreed in the DD, for now, we support only 1 provider.
For the problem with distributed mode I have created an issue: #86
github/store/db.go
Outdated
_, err := s.DB.ExecContext(ctx, "DROP MATERIALIZED VIEW IF EXISTS owners") | ||
if err != nil { | ||
return fmt.Errorf("failed to drop unified view owners: %v", err) | ||
} | ||
_, err = s.DB.ExecContext(ctx, fmt.Sprintf(`CREATE MATERIALIZED VIEW owners AS | ||
SELECT login, name | ||
FROM github_organizations_versioned WHERE %v = ANY(versions)`, v)) | ||
if err != nil { | ||
return fmt.Errorf("failed to create unified view owners: %v", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, if we can do something like what @dpordomingo proposed I think it would more readable, instead of having a wall of sql statements.
682f070
to
36ab9cc
Compare
@smacker I've seen that you've changed the name of the views in the tests. Does it need anything else? If yes I could refactor whatever is needed. |
We have new views like this one: https://github.com/src-d/metadata-retrieval/pull/45/files#diff-c52e0395f3662ad3a85b0fd2f30ecf39R77 |
35af75e
to
a69be0e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks!
Ref: #35 Signed-off-by: Maxim Sukharev <[email protected]>
Signed-off-by: Maxim Sukharev <[email protected]>
Signed-off-by: Maxim Sukharev <[email protected]>
Signed-off-by: Maxim Sukharev <[email protected]>
Signed-off-by: Maxim Sukharev <[email protected]>
Signed-off-by: Maxim Sukharev <[email protected]>
Signed-off-by: Maxim Sukharev <[email protected]>
Signed-off-by: Maxim Sukharev <[email protected]>
a69be0e
to
a3a7e1f
Compare
Should I merge it now? |
yes let's merge! 🎉 |
Fix: #35
This breaks backward compatibility. Don't merge yet.
Tests check only provider-specific schema for now.