
[🙏] When using CSV as input, what does source_column mean? #923

Answered by natoverse
KylinMountain asked this question in Q&A

source_column is a metadata field that indicates where the document in that row came from (e.g., an author or a website). It was added so that information about the documents could be reconstructed in UIs after summarization. It is not used in the indexing process.
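To make the distinction concrete, here is a minimal plain-Python sketch of the idea: the text column is what an indexer would consume, while the source column is only carried along so it can be looked up later. The column names "text" and "source" and the helpers load_documents, indexable_text and source_lookup are hypothetical illustrations, not GraphRAG's actual loader or API.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Document:
    """One input row: text that gets indexed, plus source metadata that does not."""
    id: int
    text: str
    source: str  # metadata only, e.g. an author or a website


def load_documents(csv_text: str, text_column: str = "text",
                   source_column: str = "source") -> list[Document]:
    """Parse CSV rows into Documents (hypothetical column names)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        Document(id=i, text=row[text_column], source=row.get(source_column, ""))
        for i, row in enumerate(reader)
    ]


def indexable_text(docs: list[Document]) -> list[str]:
    # Only the text would be handed to an indexing step;
    # the source field is deliberately ignored here.
    return [d.text for d in docs]


def source_lookup(docs: list[Document]) -> dict[int, str]:
    # After summarization, a UI could map document ids back to their origin.
    return {d.id: d.source for d in docs}


csv_text = "text,source\nGraphs are useful.,blog.example.com\nRAG needs context.,Jane Doe\n"
docs = load_documents(csv_text)
print(indexable_text(docs))  # the source column never reaches this list
print(source_lookup(docs))   # but it is still available for display
```

Keeping the metadata in a separate lookup rather than mixing it into the text is what lets it stay out of indexing while remaining recoverable afterwards.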

I think your suggestion is probably best: merge the related rows into a single row so that you have one row per document. You should be able to achieve this with the GRAPHRAG_CHUNK_BY_COLUMNS setting. We group the rows by those columns before splitting them into chunks.
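A rough sketch of "group by columns, then chunk" in plain Python follows. It is an illustration under stated assumptions, not GraphRAG's implementation. It assumes hypothetical column names ("source", "text") and hypothetical helpers (group_rows, chunk_text, chunk_documents), and it splits by characters where GraphRAG chunks by tokens. It shows why grouping first matters: rows that share the grouping columns become one document, so a chunk can span several of those rows but never mixes text from different documents.

```python
def group_rows(rows: list[dict], by_columns: list[str],
               text_column: str = "text") -> list[dict]:
    """Merge rows sharing the same values in by_columns into one document."""
    groups: dict[tuple, list[str]] = {}  # dicts keep first-seen key order
    for row in rows:
        key = tuple(row[c] for c in by_columns)
        groups.setdefault(key, []).append(row[text_column])
    return [
        {**dict(zip(by_columns, key)), text_column: "\n".join(texts)}
        for key, texts in groups.items()
    ]


def chunk_text(text: str, size: int) -> list[str]:
    """Split text into fixed-size character windows (stand-in for token chunking)."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


def chunk_documents(rows: list[dict], by_columns: list[str], size: int,
                    text_column: str = "text") -> list[dict]:
    """Group first, then chunk each merged document independently."""
    chunks = []
    for doc in group_rows(rows, by_columns, text_column):
        for piece in chunk_text(doc[text_column], size):
            chunks.append({**{c: doc[c] for c in by_columns}, "chunk": piece})
    return chunks


rows = [
    {"source": "a", "text": "Hello"},
    {"source": "b", "text": "Other"},
    {"source": "a", "text": "World"},
]
print(group_rows(rows, ["source"]))
print(chunk_documents(rows, ["source"], size=8))
```

Here the two "a" rows are merged into one document before chunking, so the first chunk for "a" spans both of its original rows while "b" stays separate.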

Replies: 1 comment, 3 replies

Participants: @natoverse, @KylinMountain, @YepJin
Answer selected by KylinMountain
Category: Q&A