sourcegraph: multi-tenant zoekt #858

Closed · wants to merge 10 commits

Conversation

@stefanhengl (Member) commented on Nov 8, 2024

This updates webserver and sourcegraph-indexserver to support multi-tenancy.

The change is behind an environment-variable feature flag. Apart from changes to the gRPC message format for IndexOptions and ListResponse (which require a corresponding update in Sourcegraph), this change is a no-op if multi-tenancy is not enabled. In particular, other users of Zoekt should not be affected.

Key changes:

  • sourcegraph-indexserver
    • The List call now returns a map "tenant id -> list of repo ids"; previously it returned a flat list of repo ids (see the sketch after this list).
    • Iterate over the map to fetch index options per tenant.
    • Set the index dir per tenant.
    • Note: Changes are limited to sourcegraph-indexserver. "gitindex" and "builder" are not affected.
  • webserver
    • watcher: watch data/index/tenants for new tenant dirs
    • check that the index path matches the tenant
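
A minimal sketch of the per-tenant indexing loop described above (the map shape and the tenantIndexDir helper are assumptions for illustration, not the PR's actual API):

package main

import (
	"fmt"
	"path/filepath"
	"strconv"
)

// tenantIndexDir mirrors the layout described above: one index subdirectory
// per tenant under the root index dir.
func tenantIndexDir(rootDir string, tenantID int) string {
	return filepath.Join(rootDir, "tenants", strconv.Itoa(tenantID))
}

func main() {
	// Hypothetical "tenant id -> list of repo ids" result of the List call.
	reposByTenant := map[int][]uint32{
		1: {101, 102},
		2: {201},
	}
	// Iterate over the map: resolve the index dir and the repos per tenant.
	for tenantID, repoIDs := range reposByTenant {
		dir := tenantIndexDir("data/index", tenantID)
		fmt.Printf("tenant %d: index %d repos under %s\n", tenantID, len(repoIDs), dir)
	}
}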

Notes:

  • This design will not scale well beyond a hundred tenants, mostly because we iterate over all tenants to get IndexOptions. For better performance we might have to switch the communication and let Sourcegraph push index jobs to Zoekt.
  • The debug pages need more thought. I initially disabled them, but in this PR they are left unchanged. Things like the reindex button on the indexserver debug page will not work, because we don't have the tenant information.

Future:

  • We might consider skipping entire tenant directories instead of checking shard by shard
  • Enable shard merging per tenant
  • Make "force reindex" work if the request comes via the socket connection

Test plan:

  • new unit tests
  • manual testing:
    • I ran this PR together with a multi-tenant instance of Sourcegraph and confirmed that I can run indexed search per tenant as expected.
    • I confirmed backward compatibility by running this PR against a Sourcegraph instance without multi-tenancy.

@cla-bot added the cla-signed label on Nov 8, 2024
@@ -0,0 +1,114 @@
package propagator
@stefanhengl (Member, Author) commented:

This is 100% copy&paste from Sourcegraph.
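
For context, a rough sketch of what a gRPC metadata propagator for a tenant ID can look like (the header name and helper functions below are assumptions for illustration; the actual implementation is the code copied from Sourcegraph):

package propagator

import (
	"context"
	"strconv"

	"google.golang.org/grpc/metadata"
)

// tenantHeader is a hypothetical metadata key used to carry the tenant ID
// across gRPC calls.
const tenantHeader = "x-tenant-id"

// injectTenant attaches the tenant ID to the outgoing gRPC metadata.
func injectTenant(ctx context.Context, tenantID int) context.Context {
	return metadata.AppendToOutgoingContext(ctx, tenantHeader, strconv.Itoa(tenantID))
}

// tenantFromIncoming reads the tenant ID back out of the incoming metadata.
func tenantFromIncoming(ctx context.Context) (int, bool) {
	md, ok := metadata.FromIncomingContext(ctx)
	if !ok {
		return 0, false
	}
	vals := md.Get(tenantHeader)
	if len(vals) == 0 {
		return 0, false
	}
	id, err := strconv.Atoi(vals[0])
	if err != nil {
		return 0, false
	}
	return id, true
}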

notify()

toAdd, toRemove := addOrRemove(watcher.WatchList(), tenant.ListIndexDirs(s.dir))
@stefanhengl (Member, Author) commented:

I think this is pretty robust and simple. However, in the worst case we don't notice a new tenant for 1 minute. Once a tenant has been added to the watcher, new shards are picked up instantly.

Alternatively, we could listen to filesystem events on data/index/tenants and add new tenants as they appear. I tried both versions, and this one makes it more obvious that it reduces to the old code if tenancy is not enforced.
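
A sketch of the diffing step this call performs, assuming a plain set difference between the currently watched dirs and the tenant index dirs on disk (the body below is illustrative, not the PR's exact implementation):

// addOrRemove compares the currently watched directories with the desired
// ones and returns which dirs to start and stop watching.
func addOrRemove(watched, want []string) (toAdd, toRemove []string) {
	watchedSet := make(map[string]struct{}, len(watched))
	for _, d := range watched {
		watchedSet[d] = struct{}{}
	}
	wantSet := make(map[string]struct{}, len(want))
	for _, d := range want {
		wantSet[d] = struct{}{}
		if _, ok := watchedSet[d]; !ok {
			toAdd = append(toAdd, d)
		}
	}
	for _, d := range watched {
		if _, ok := wantSet[d]; !ok {
			toRemove = append(toRemove, d)
		}
	}
	return toAdd, toRemove
}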

A reviewer (Member) replied:

Alternatively, as part of scan you could just always check the difference between WatchList and the dirs to watch? WatchList itself just takes a read lock and builds an in-memory slice. ListIndexDirs has the same (or better) perf as the glob you do in scan.

@@ -131,7 +132,7 @@ type sourcegraphClient struct {
}

func (s *sourcegraphClient) List(ctx context.Context, indexed []uint32) (*SourcegraphListResult, error) {
repos, err := s.listRepoIDs(ctx, indexed)
reposIter, repos, err := s.listRepoIDs(ctx, indexed)
@stefanhengl (Member, Author) commented:

The changes in List and s.listRepoIDs are probably the core of the whole change.

@stefanhengl requested review from a team and eseliger on November 8, 2024 13:07
@stefanhengl marked this pull request as ready for review on November 8, 2024 13:07
@eseliger (Member) left a comment:

Seems fine from my non-zoekt-contributor side. Left a few smaller comments inline.

I think the level of "protection" against fat-fingering here is a bit lower than in some other components in sg/sg, where we, for example, go through the low-level GitserverFS interface to get a path and reject even returning it when no tenant is in the context, and where we use mechanisms like the tenant iterator and periodic routines to make sure a tenant is in ctx, never allowing a tenant ID to be passed around and converted into the corresponding ctx.
But I think we can get closer to that level once we utilize Sourcegraph components to do a push-based flow.

grpc/propagator/propagator.go (resolved)
cmd/zoekt-webserver/main.go (resolved)
cmd/zoekt-sourcegraph-indexserver/index.go (outdated, resolved)
Comment on lines 15 to 22
// ContextIndexDir returns a context and index dir for the given tenant ID.
func ContextIndexDir(tenantID int, repoDir string) (context.Context, string) {
	if !EnforceTenant() {
		// Default to tenant 1 if enforcement is disabled.
		return tenanttype.WithTenant(context.Background(), 1), repoDir
	}
	return tenanttype.WithTenant(context.Background(), tenantID), filepath.Join(repoDir, TenantsDir, strconv.Itoa(tenantID))
}
A reviewer (Member) commented:

This is effectively a "give me tenant ID, receive context for it" method that we hard-don't use in sg/sg. Wondering if there's an easy enough way to work around this, or if the solution would be the push model we talked about.

At the very least, we should add a //🚨 SECURITY: style comment here saying not to misuse this :)
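
For illustration, the suggested annotation could look something like this (the wording is just an example):

// 🚨 SECURITY: ContextIndexDir mints a tenant context from a bare tenant ID.
// Only call it with tenant IDs received from a trusted source (here, the
// Sourcegraph List response); never derive the ID from user input.
func ContextIndexDir(tenantID int, repoDir string) (context.Context, string) {
	// ... unchanged body ...
}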

	if !EnforceTenant() {
		return key + "1"
	}
	tnt, err := tenanttype.FromContext(ctx)
A reviewer (Member) commented:

In sg/sg, we log the callers of FromContext to a pprof profile called missing_tenant, generating profiles where we can spot missing-tenant errors quickly. Would that be worth adding here as well?
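
If we did, a sketch using a custom runtime/pprof profile might look like this (the profile name, helper, and placement are assumptions, not the sg/sg implementation):

package tenant

import "runtime/pprof"

// missingTenantProfile collects the stacks of callers that reach FromContext
// without a tenant in the context; it can be inspected via the pprof endpoints.
var missingTenantProfile = pprof.NewProfile("missing_tenant")

// recordMissingTenant adds the caller's stack to the profile. Each call uses a
// fresh key because Profile.Add panics on duplicate values, so the profile
// grows with the number of misses, which is acceptable for debugging.
func recordMissingTenant() {
	missingTenantProfile.Add(new(byte), 1) // skip=1 starts the stack at our caller
}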

internal/tenant/index.go (outdated, resolved)
internal/tenant/query.go (outdated, resolved)
internal/tenant/query.go (outdated, resolved)
internal/tenant/query.go (resolved)
@jtibshirani (Member) left a comment:

Nice :) I left a couple of questions and comments.

I didn't see any new tests or updates at the zoekt-sourcegraph-indexserver level. It feels valuable to have some end-to-end checks with more than one tenant. What do you think?

metricResolveRevisionDuration.WithLabelValues("false").Observe(duration.Seconds())
tr.LazyPrintf("failed fetching options batch: %v", err)
tr.SetError()
// This does not scale well for large numbers of tenants with small numbers of
A reviewer (Member) commented:

Since we are already introducing the notion of "tenant ID" to the Sourcegraph List call, why not also introduce it to GetIndexOptions? That would make the way we collect all the indexing information consistent, instead of what we do now (sometimes materializing the tenant ID in the request/response, other times using ctx).

eval.go Outdated
@@ -134,13 +135,17 @@ func (o *SearchOptions) SetDefaults() {
}

func (d *indexData) Search(ctx context.Context, q query.Q, opts *SearchOptions) (sr *SearchResult, err error) {
var res SearchResult
A reviewer (Member) commented:

For other tenant-aware services, we've tried to design things so there's a "choke point" that clearly enforces tenancy. Often this is a single interface and file guarded by //🚨 SECURITY: comments. Right now, we are only enforcing tenancy on Search, but not on List. And there are other places where we access index data, for example to get metadata or ngram stats.

I wonder if we could push this down further to where index data is loaded/ read. Like reader.readTOC and reader.readIndexData? That would help establish a single place on the read path where tenancy is enforced, as close to the data access as possible.

@stefanhengl (Member, Author) replied:

Good point. Now we wrap indexdata (which is a Searcher) in a tenantAwareSearcher that handles access. We use a similar pattern in Sourcegraph. WDYT?
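
Roughly, the wrapper pattern described here looks like the following fragment (the tenant accessor and the mismatch behaviour are illustrative assumptions; imports are omitted):

// tenantAwareSearcher wraps indexData and enforces that the tenant in the
// request context matches the tenant the shard belongs to.
type tenantAwareSearcher struct {
	d        Searcher // the wrapped indexData
	tenantID int      // tenant the shard belongs to
}

func (s *tenantAwareSearcher) Search(ctx context.Context, q query.Q, opts *SearchOptions) (*SearchResult, error) {
	tnt, err := tenanttype.FromContext(ctx)
	if err != nil {
		return nil, err // no tenant in context: refuse access
	}
	if tnt.ID() != s.tenantID {
		// Wrong tenant: behave as if the shard does not exist.
		return &SearchResult{}, nil
	}
	return s.d.Search(ctx, q, opts)
}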

@keegancsmith (Member) left a comment:

Let's chat over a Zoom. But I'm slightly uneasy about how the idea of a tenant has spread all over the codebase, in particular the many calls to the tenant package which then adjust behaviour. I'm alright with that in sourcegraph-indexserver, but I think we should make the rest of zoekt less tenant-specific and instead encode more explicitly what exactly has changed.

From a normal zoekt perspective I think two things have changed:

  • shards can appear in multiple directories
  • we optionally enforce that we only search a specific directory

I'm surprised that you had to make changes to the root zoekt module. Previously, being aware of file location was only something the shards package ever did. I would expect that when reading in a shard there, you could do your wrapping (or something else) to ensure we only search certain shards.

return indexData, nil
return &tenantAwareSearcher{d: indexData}, nil
A reviewer (Member) commented:

A simplification: rather than making tenantAwareSearcher check EnforcementMode on every request, only wrap if enforcement mode is on. I think that better justifies creating this wrapper. Otherwise, I'd prefer removing the wrapper and moving the checks into indexData.
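
In code terms, the suggestion amounts to something like this at the construction site (sketch; names as in the diff above):

// Only pay for the wrapper when tenant enforcement is on; otherwise return
// the index data unwrapped, as before.
if !tenant.EnforceTenant() {
	return indexData, nil
}
return &tenantAwareSearcher{d: indexData}, nil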


// ListIndexDirs returns a list of index directories for all tenants. If tenant
// enforcement is disabled, the list is []string{indexDir}.
func ListIndexDirs(indexDir string) []string {
A reviewer (Member) commented:

This seems very fragile: if any non-tenant dir appears in here, other places may fail. Maybe each tenant dir should rather have something like a tenant- prefix?

Glob is often used underneath. I wonder if instead always doing tenant-*/*.zoekt would be appropriate?
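
With a tenant- prefix, shard discovery could reduce to a single glob (sketch; the exact prefix and layout are the open question above):

// Match only shards that live under an explicit tenant directory, ignoring
// anything else that may appear in the index root.
shards, err := filepath.Glob(filepath.Join(indexDir, "tenant-*", "*.zoekt"))
if err != nil {
	return nil, err
}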

@stefanhengl (Member, Author) commented:

Abandoning this in favor of #859

5 participants