[Bug]: [Urgent] Failed to LoadSegment because index file index_null_offset is missing #39881

Open
1 task done
Andy6132024 opened this issue Feb 14, 2025 · 9 comments
Labels
kind/bug Issues or changes related a bug triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@Andy6132024

Andy6132024 commented Feb 14, 2025

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: v2.5.4
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): kafka 
- SDK version(e.g. pymilvus v2.0.0rc2): 2.5.4
- OS(Ubuntu or CentOS): RockyLinux
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

After upgrading from Milvus v2.5.3 to v2.5.4, one collection in particular got stuck at loading. Unfortunately, this collection holds very important data, and the problem has become a blocker for our developers.

The querynode logs indicate that an index file named index_null_offset was missing.

[2025/02/13 10:51:10.700 +00:00] [WARN] [cluster/worker.go:105] ["failed to call LoadSegments via grpc worker"] [traceID=e8990d64d2ae223d2f828c7fae6fa364] [workerID=2197] [error="At LoadSegment: Error in GetObjectSize[errcode:404, exception:, errmessage:No response body., params:params, bucket=milvus-bucket, object=file/index_files/455538097355005125/0/454664253305745956/455538097355005123/index_null_offset]"]
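
When checking the object store for the missing file, it can help to pull the bucket and object key out of the error text. The following is a minimal sketch, assuming the message always has the shape of the single log line quoted above; the regular expression and the helper name `parse_missing_object` are illustrative and not part of Milvus.

```python
import re

# Error text copied from the querynode log line quoted above.
LOG_ERROR = (
    "At LoadSegment: Error in GetObjectSize[errcode:404, exception:, "
    "errmessage:No response body., params:params, bucket=milvus-bucket, "
    "object=file/index_files/455538097355005125/0/454664253305745956/"
    "455538097355005123/index_null_offset]"
)

# Hypothetical pattern, derived only from the one message shown in this issue.
PATTERN = re.compile(
    r"errcode:(?P<errcode>\d+).*?"
    r"bucket=(?P<bucket>[^,\]]+),\s*"
    r"object=(?P<object>[^\]]+)\]"
)


def parse_missing_object(message):
    """Return the error code, bucket, and object key, or None if no match."""
    match = PATTERN.search(message)
    if match is None:
        return None
    key = match.group("object")
    # Split the key into its directory-like prefix and the final file name.
    prefix, _, file_name = key.rpartition("/")
    return {
        "errcode": int(match.group("errcode")),
        "bucket": match.group("bucket"),
        "object": key,
        "prefix": prefix,
        "file_name": file_name,
    }


info = parse_missing_object(LOG_ERROR)
print(info)
```

With the bucket and prefix in hand, listing that prefix in the object store (using whatever client matches the deployment) should show which index files exist for the segment and confirm whether index_null_offset is the only one missing.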

I searched the existing issues and found one very similar issue, but it seems it was not fully resolved in v2.5.4 (I tried dropping and re-creating the indexes, but to no avail). Another related issue is this one, which also reported files missing from the index_files folder.

The collection that got stuck at loading uses the new Full Text Search feature in v2.5, which uses the BM25 algorithm to automatically convert raw text into sparse vectors. I am not sure whether this helps identify the root cause. Here is the pseudocode of its schema:

from pymilvus import CollectionSchema, DataType, FieldSchema, Function, FunctionType

# Primary key: caller-supplied string ID.
id = FieldSchema(
    name="id",
    dtype=DataType.VARCHAR,
    max_length=36,
    is_primary=True,
    auto_id=False,
)
# Dense embedding.
vector = FieldSchema(
    name="vector",
    dtype=DataType.FLOAT_VECTOR,
    dim=1536,
)
# Used as the partition key below.
year_month = FieldSchema(
    name="year_month",
    dtype=DataType.INT64,
)
# Raw text, analyzed for Full Text Search and text match.
text = FieldSchema(
    name="text",
    dtype=DataType.VARCHAR,
    max_length=65535,
    enable_analyzer=True,
    enable_match=True,
)
# Output field of the BM25 function.
sparse_vector = FieldSchema(
    name="sparse_vector",
    dtype=DataType.SPARSE_FLOAT_VECTOR,
)

bm25_function = Function(
    name="text_bm25_emb",
    input_field_names=["text"],
    output_field_names=["sparse_vector"],
    function_type=FunctionType.BM25,
)

schema = CollectionSchema(
    fields=[id, vector, year_month, text, sparse_vector],
    description="test",
    enable_dynamic_field=True,
    partition_key_field="year_month",
)

Expected Behavior

Collection is loaded successfully after upgrading to v2.5.4

Steps To Reproduce

Create a collection in v2.5.3 or lower using the schema above, then upgrade Milvus to v2.5.4. Check whether the collection can be loaded.

Milvus Log

No response

Anything else?

No response

@Andy6132024 Andy6132024 added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 14, 2025
@yanliang567
Contributor

Trying to reproduce the issue in-house.

@melon1705

I encountered the same problem, reported in https://github.com/milvus-io/milvus/issues/39889. How can it be solved?

@melon1705

I encountered the same problem in #39889. How can it be solved?

@yanliang567
Contributor

I could not reproduce the issue when upgrading Milvus from v2.5.3 to v2.5.4. Here is what I did:

  1. deploy milvus v2.5.3 on k8s
  2. create a collection with schema above
  3. insert 2000 entities of fake data, build index and load
  4. verify search successfully
  5. upgrade Milvus to v2.5.4
  6. verify search successfully
  7. release and reload the collection and verify search successfully

@yanliang567
Contributor

@Andy6132024 could you please attach the full Milvus logs for investigation?
If you installed Milvus with k8s, please refer to this doc to export the whole Milvus logs.
If you installed Milvus with docker-compose, please use docker-compose logs > milvus.log to export the logs.

/assign @Andy6132024
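
Once the logs are exported, the relevant entries can be narrowed down before attaching them. The following is a minimal sketch, assuming every entry follows the bracketed layout of the single log line quoted in this issue (`[timestamp] [LEVEL] [file:line] ...`); the helper name `filter_log` and the abbreviated sample lines are illustrative only.

```python
import re

# Assumed entry layout, based on the one log line quoted in this issue:
# [timestamp] [LEVEL] [file:line] ["message"] [key=value] ...
# search() is used rather than match() so that a container-name prefix,
# as added by some log exporters, does not prevent a match.
HEADER = re.compile(
    r"\[(?P<ts>\d{4}/\d{2}/\d{2} [^\]]+)\] "
    r"\[(?P<level>[A-Z]+)\] "
    r"\[(?P<source>[^\]]+)\]"
)


def filter_log(lines, keyword, levels=("WARN", "ERROR")):
    """Yield (timestamp, level, source, line) for entries that match."""
    for line in lines:
        header = HEADER.search(line)
        if header is None:
            continue  # not a recognizable log entry
        if header.group("level") in levels and keyword in line:
            yield (
                header.group("ts"),
                header.group("level"),
                header.group("source"),
                line.rstrip("\n"),
            )


# Shortened, illustrative sample; the second line is hypothetical.
sample = [
    '[2025/02/13 10:51:10.700 +00:00] [WARN] [cluster/worker.go:105] '
    '["failed to call LoadSegments via grpc worker"] '
    '[error="object=file/index_files/index_null_offset"]\n',
    '[2025/02/13 10:51:11.000 +00:00] [INFO] [example/file.go:1] '
    '["hypothetical unrelated line"]\n',
]

hits = list(filter_log(sample, "index_null_offset"))
for ts, level, source, _ in hits:
    print(ts, level, source)
```

To run it against the exported file, pass an open file object instead of the sample list, for example `filter_log(open("milvus.log", encoding="utf-8", errors="replace"), "index_null_offset")`.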

@yanliang567 yanliang567 added triage/needs-information Indicates an issue needs more information in order to work on it. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 14, 2025
@melon1705

> I did not reproduce the issue when upgrading Milvus from v2.5.3 to v2.5.4. What I did is
>
>   1. deploy milvus v2.5.3 on k8s
>   2. create a collection with schema above
>   3. insert 2000 entities of fake data, build index and load
>   4. verify search successfully
>   5. upgrade Milvus to v2.5.4
>   6. verify search successfully
>   7. release and reload the collection and verify search successfully

The cluster needs to have been running for some time (usually more than a week) before the error is suddenly reported.

@Andy6132024
Author

Andy6132024 commented Feb 14, 2025

> @Andy6132024 could you please attach the full milvus logs for investigation? If you install Milvus with k8s, please refer this doc to export the whole Milvus logs. If you install Milvus with docker-compose, please use docker-compose logs > milvus.log to export the logs.
>
> /assign @Andy6132024

Hi @yanliang567, you need far more entities to reproduce this issue. 2000 entities will only give you one segment. The collection that hit this issue has 4.7 million entities and about 20 segments.
I will try to find more logs, but I would appreciate it if you could ask the Milvus developers to look into it. This is a blocker for us.

/assign @yanliang567
cc @xiaofan-luan
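
For a reproduction attempt at the larger scale described above, fake rows matching the schema could be generated in batches. The following is a minimal sketch under stated assumptions: the word list, the `YYYYMM` encoding of `year_month`, and the helper names `fake_rows` and `batches` are all hypothetical, and `sparse_vector` is left out on the assumption that the BM25 function fills it from `text`.

```python
import random
import uuid

DIM = 1536  # matches the "vector" field in the schema above

# Hypothetical vocabulary for the fake "text" field.
WORDS = ["milvus", "index", "segment", "vector", "search", "load", "query"]


def fake_rows(count, seed=0):
    """Yield `count` fake rows shaped like the schema in this issue."""
    rng = random.Random(seed)
    for _ in range(count):
        yield {
            # 36-character UUID string, fitting max_length=36.
            "id": str(uuid.UUID(int=rng.getrandbits(128), version=4)),
            "vector": [rng.random() for _ in range(DIM)],
            # Assumed YYYYMM encoding; the real values are not given.
            "year_month": 202400 + rng.randint(1, 12),
            "text": " ".join(rng.choice(WORDS) for _ in range(20)),
        }


def batches(rows, size):
    """Group an iterable of rows into lists of at most `size` rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


for batch in batches(fake_rows(250), 100):
    print(len(batch))
```

Each batch could then be passed to whichever insert call the test harness uses; raising the count into the millions should produce multiple segments rather than one.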

@yanliang567
Contributor

/assign @smellthemoon
Could you please take a look? It seems to be the same error as in #35741.

@smellthemoon
Contributor

This is the same issue as #39889. Let's discuss it over there.


4 participants