sourcegraph: multi-tenant zoekt #858

Closed · wants to merge 10 commits

Conversation

@stefanhengl (Member) commented on Nov 8, 2024

This updates webserver and sourcegraph-indexserver to support multi-tenancy.

The change is behind an environment-variable feature flag. Apart from changes to the gRPC message format for IndexOptions and ListResponse (which require a corresponding update in Sourcegraph), this change is a no-op if multi-tenancy is not enabled. In particular, other users of Zoekt should not be affected.

Key changes:

  • sourcegraph-indexserver
    • The List call now returns a map "tenant id -> list of repo ids"; previously it returned a flat list of repo ids (see the sketch after this list).
    • Iterate over the map to fetch index options per tenant.
    • Set the index dir per tenant.
    • Note: Changes are limited to sourcegraph-indexserver. "gitindex" and "builder" are not affected.
  • webserver
    • watcher: watch data/index/tenants for new tenant dirs
    • check that the index path matches the tenant
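
A minimal sketch of the per-tenant indexing loop described above (the map shape and the tenantIndexDir helper are assumptions for illustration, not the PR's actual API):

package main

import (
	"fmt"
	"path/filepath"
	"strconv"
)

// tenantIndexDir mirrors the layout described above: one index subdirectory
// per tenant under the root index dir.
func tenantIndexDir(rootDir string, tenantID int) string {
	return filepath.Join(rootDir, "tenants", strconv.Itoa(tenantID))
}

func main() {
	// Hypothetical "tenant id -> list of repo ids" result of the List call.
	reposByTenant := map[int][]uint32{
		1: {101, 102},
		2: {201},
	}
	// Iterate over the map: resolve the index dir and the repos per tenant.
	for tenantID, repoIDs := range reposByTenant {
		dir := tenantIndexDir("data/index", tenantID)
		fmt.Printf("tenant %d: index %d repos under %s\n", tenantID, len(repoIDs), dir)
	}
}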

Notes:

  • This design will not scale well beyond a hundred tenants, mostly because we iterate over all tenants to get IndexOptions. For better performance we might have to switch the communication and let Sourcegraph push index jobs to Zoekt.
  • The debug pages need more thought. I initially disabled them, but in this PR they are left unchanged. Things like the reindex button on the indexserver debug page will not work, because we don't have the tenant information.

Future:

  • We might consider skipping entire tenant directories instead of checking shard by shard
  • Enable shard merging per tenant
  • Make "force reindex" work if the request comes via the socket connection

Test plan:

  • new unit tests
  • manual testing:
    • I ran this PR together with a multi-tenant instance of Sourcegraph and confirmed that I can run indexed search per tenant as expected.
    • I confirmed backward compatibility by running this PR against a Sourcegraph instance without multi-tenancy.

@cla-bot added the cla-signed label on Nov 8, 2024
@@ -0,0 +1,114 @@
package propagator
@stefanhengl (Member, Author) commented:

This is 100% copy&paste from Sourcegraph.
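
For context, a rough sketch of what a gRPC metadata propagator for a tenant ID can look like (the header name and helper functions below are assumptions for illustration; the actual implementation is the code copied from Sourcegraph):

package propagator

import (
	"context"
	"strconv"

	"google.golang.org/grpc/metadata"
)

// tenantHeader is a hypothetical metadata key used to carry the tenant ID
// across gRPC calls.
const tenantHeader = "x-tenant-id"

// injectTenant attaches the tenant ID to the outgoing gRPC metadata.
func injectTenant(ctx context.Context, tenantID int) context.Context {
	return metadata.AppendToOutgoingContext(ctx, tenantHeader, strconv.Itoa(tenantID))
}

// tenantFromIncoming reads the tenant ID back out of the incoming metadata.
func tenantFromIncoming(ctx context.Context) (int, bool) {
	md, ok := metadata.FromIncomingContext(ctx)
	if !ok {
		return 0, false
	}
	vals := md.Get(tenantHeader)
	if len(vals) == 0 {
		return 0, false
	}
	id, err := strconv.Atoi(vals[0])
	if err != nil {
		return 0, false
	}
	return id, true
}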

notify()

toAdd, toRemove := addOrRemove(watcher.WatchList(), tenant.ListIndexDirs(s.dir))
@stefanhengl (Member, Author) commented:

I think this is pretty robust and simple. However, in the worst case we don't notice a new tenant for 1 minute. Once a tenant has been added to the watcher, new shards are picked up instantly.

Alternatively, we could listen to filesystem events on data/index/tenants and add new tenants as they appear. I tried both versions, and this one makes it more obvious that it reduces to the old code if tenancy is not enforced.
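
A sketch of the diffing step this call performs, assuming a plain set difference between the currently watched dirs and the tenant index dirs on disk (the body below is illustrative, not the PR's exact implementation):

// addOrRemove compares the currently watched directories with the desired
// ones and returns which dirs to start and stop watching.
func addOrRemove(watched, want []string) (toAdd, toRemove []string) {
	watchedSet := make(map[string]struct{}, len(watched))
	for _, d := range watched {
		watchedSet[d] = struct{}{}
	}
	wantSet := make(map[string]struct{}, len(want))
	for _, d := range want {
		wantSet[d] = struct{}{}
		if _, ok := watchedSet[d]; !ok {
			toAdd = append(toAdd, d)
		}
	}
	for _, d := range watched {
		if _, ok := wantSet[d]; !ok {
			toRemove = append(toRemove, d)
		}
	}
	return toAdd, toRemove
}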

A reviewer (Member) replied:

Alternatively, as part of scan you could just always check the difference between WatchList and the dirs to watch? WatchList itself just takes a read lock and builds an in-memory slice. ListIndexDirs has the same (or better) perf as the glob you do in scan.

@@ -131,7 +132,7 @@ type sourcegraphClient struct {
}

func (s *sourcegraphClient) List(ctx context.Context, indexed []uint32) (*SourcegraphListResult, error) {
repos, err := s.listRepoIDs(ctx, indexed)
reposIter, repos, err := s.listRepoIDs(ctx, indexed)
@stefanhengl (Member, Author) commented:

The changes in List and s.listRepoIDs are probably the core of the whole change.

@stefanhengl requested review from a team and eseliger on November 8, 2024 13:07
@stefanhengl marked this pull request as ready for review on November 8, 2024 13:07
@eseliger (Member) left a comment:

Seems fine from my non-zoekt-contributor side. Left a few smaller comments inline.

I think the level of "protection" against fat-fingering here is a bit lower than in some other components in sg/sg, where we, for example, go through the low-level GitserverFS interface to get a path and reject even returning it when no tenant is in the context, and where we use mechanisms like the tenant iterator and periodic routines to make sure a tenant is in ctx, never allowing a tenant ID to be passed around and converted into the corresponding ctx.
But I think we can get closer to that level once we utilize Sourcegraph components to do a push-based flow.

grpc/propagator/propagator.go (resolved)
cmd/zoekt-webserver/main.go (resolved)
cmd/zoekt-sourcegraph-indexserver/index.go (outdated, resolved)
Comment on lines 15 to 22
// ContextIndexDir returns a context and index dir for the given tenant ID.
func ContextIndexDir(tenantID int, repoDir string) (context.Context, string) {
	if !EnforceTenant() {
		// Default to tenant 1 if enforcement is disabled.
		return tenanttype.WithTenant(context.Background(), 1), repoDir
	}
	return tenanttype.WithTenant(context.Background(), tenantID), filepath.Join(repoDir, TenantsDir, strconv.Itoa(tenantID))
}
A reviewer (Member) commented:

This is effectively a "give me tenant ID, receive context for it" method that we hard-don't use in sg/sg. Wondering if there's an easy enough way to work around this, or if the solution would be the push model we talked about.

At the very least, we should add a //🚨 SECURITY: style comment here saying not to misuse this :)
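
For illustration, the suggested annotation could look something like this (the wording is just an example):

// 🚨 SECURITY: ContextIndexDir mints a tenant context from a bare tenant ID.
// Only call it with tenant IDs received from a trusted source (here, the
// Sourcegraph List response); never derive the ID from user input.
func ContextIndexDir(tenantID int, repoDir string) (context.Context, string) {
	// ... unchanged body ...
}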

	if !EnforceTenant() {
		return key + "1"
	}
	tnt, err := tenanttype.FromContext(ctx)
A reviewer (Member) commented:

In sg/sg, we log the callers of FromContext to a pprof profile called missing_tenant, generating profiles where we can spot missing-tenant errors quickly. Would that be worth adding here as well?
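
If we did, a sketch using a custom runtime/pprof profile might look like this (the profile name, helper, and placement are assumptions, not the sg/sg implementation):

package tenant

import "runtime/pprof"

// missingTenantProfile collects the stacks of callers that reach FromContext
// without a tenant in the context; it can be inspected via the pprof endpoints.
var missingTenantProfile = pprof.NewProfile("missing_tenant")

// recordMissingTenant adds the caller's stack to the profile. Each call uses a
// fresh key because Profile.Add panics on duplicate values, so the profile
// grows with the number of misses, which is acceptable for debugging.
func recordMissingTenant() {
	missingTenantProfile.Add(new(byte), 1) // skip=1 starts the stack at our caller
}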

internal/tenant/index.go (outdated, resolved)
internal/tenant/query.go (outdated, resolved)
internal/tenant/query.go (outdated, resolved)
internal/tenant/query.go (resolved)
@jtibshirani (Member) left a comment:

Nice :) I left a couple of questions and comments.

I didn't see any new tests or updates at the zoekt-sourcegraph-indexserver level. It feels valuable to have some end-to-end checks with more than one tenant. What do you think?

metricResolveRevisionDuration.WithLabelValues("false").Observe(duration.Seconds())
tr.LazyPrintf("failed fetching options batch: %v", err)
tr.SetError()
// This does not scale well for large numbers of tenants with small numbers of
A reviewer (Member) commented:

Since we are already introducing the notion of "tenant ID" to the Sourcegraph List call, why not also introduce it to GetIndexOptions? That would make the way we collect all the indexing information consistent, instead of what we do now (sometimes materializing the tenant ID in the request/response, other times using ctx).

eval.go Outdated
@@ -134,13 +135,17 @@ func (o *SearchOptions) SetDefaults() {
}

func (d *indexData) Search(ctx context.Context, q query.Q, opts *SearchOptions) (sr *SearchResult, err error) {
var res SearchResult
A reviewer (Member) commented:

For other tenant-aware services, we've tried to design things so there's a "choke point" that clearly enforces tenancy. Often this is a single interface and file guarded by //🚨 SECURITY: comments. Right now, we are only enforcing tenancy on Search, but not on List. And there are other places where we access index data, for example to get metadata or ngram stats.

I wonder if we could push this down further to where index data is loaded/ read. Like reader.readTOC and reader.readIndexData? That would help establish a single place on the read path where tenancy is enforced, as close to the data access as possible.

@stefanhengl (Member, Author) replied:

Good point. Now we wrap indexdata (which is a Searcher) in a tenantAwareSearcher that handles access. We use a similar pattern in Sourcegraph. WDYT?
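
Roughly, the wrapper pattern described here looks like the following fragment (the tenant accessor and the mismatch behaviour are illustrative assumptions; imports are omitted):

// tenantAwareSearcher wraps indexData and enforces that the tenant in the
// request context matches the tenant the shard belongs to.
type tenantAwareSearcher struct {
	d        Searcher // the wrapped indexData
	tenantID int      // tenant the shard belongs to
}

func (s *tenantAwareSearcher) Search(ctx context.Context, q query.Q, opts *SearchOptions) (*SearchResult, error) {
	tnt, err := tenanttype.FromContext(ctx)
	if err != nil {
		return nil, err // no tenant in context: refuse access
	}
	if tnt.ID() != s.tenantID {
		// Wrong tenant: behave as if the shard does not exist.
		return &SearchResult{}, nil
	}
	return s.d.Search(ctx, q, opts)
}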

@keegancsmith (Member) left a comment:

Let's chat over a Zoom. But I'm slightly uneasy about how the idea of a tenant has spread all over the codebase, in particular the many calls to the tenant package which then adjust behaviour. I'm alright with that in sourcegraph-indexserver, but I think we should make the rest of zoekt less tenant-specific and instead encode more explicitly what exactly has changed.

From a normal zoekt perspective I think two things have changed:

  • shards can appear in multiple directories
  • we optionally enforce that we only search a specific directory

I'm surprised that you had to make changes to the root zoekt module. Previously, being aware of file location was only something the shards package ever did. I would expect that when reading in a shard there, you could do your wrapping (or something else) to ensure we only search certain shards.

return indexData, nil
return &tenantAwareSearcher{d: indexData}, nil
A reviewer (Member) commented:

A simplification: rather than making tenantAwareSearcher check EnforcementMode on every request, only wrap if enforcement mode is on. I think that better justifies creating this wrapper. Otherwise, I'd prefer removing the wrapper and moving the checks into indexData.
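
In code terms, the suggestion amounts to something like this at the construction site (sketch; names as in the diff above):

// Only pay for the wrapper when tenant enforcement is on; otherwise return
// the index data unwrapped, as before.
if !tenant.EnforceTenant() {
	return indexData, nil
}
return &tenantAwareSearcher{d: indexData}, nil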


// ListIndexDirs returns a list of index directories for all tenants. If tenant
// enforcement is disabled, the list is []string{indexDir}.
func ListIndexDirs(indexDir string) []string {
A reviewer (Member) commented:

This seems very fragile: if any non-tenant dir appears in here, other places may fail. Maybe each tenant dir should rather have something like a tenant- prefix?

Glob is often used underneath. I wonder if instead always doing tenant-*/*.zoekt would be appropriate?
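
With a tenant- prefix, shard discovery could reduce to a single glob (sketch; the exact prefix and layout are the open question above):

// Match only shards that live under an explicit tenant directory, ignoring
// anything else that may appear in the index root.
shards, err := filepath.Glob(filepath.Join(indexDir, "tenant-*", "*.zoekt"))
if err != nil {
	return nil, err
}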

@stefanhengl (Member, Author) commented:

Abandoning this in favor of #859

5 participants