[🙏] When using CSV as input, what does source_column mean? #923
Answered by natoverse
KylinMountain asked this question in Q&A
I have a conversation that may be split across several rows, but the rows share the same conversation ID, given in the conversation id column. If I set source_column to the conversation id column, will GraphRAG treat those rows as part of the same conversation? Will the entities in the conversation be extracted together? If not, what would be a solution for this case? Manually merging them into one row?
Answered by natoverse on Aug 14, 2024
@natoverse Could you please help answer this question? It is very important to me. Thanks!
`source_column` is a metadata field that indicates where the document in a given row came from (e.g., author, website, etc.). It was added so that we could reconstruct information about the documents in UIs post-summarization, and it is not used in the indexing process.

I think your suggestion is probably best: merge them into a single row so you have one document per conversation. You should be able to achieve this using the `GRAPHRAG_CHUNK_BY_COLUMNS` setting, which we do a groupby on before splitting the chunks.
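If you would rather do the merge yourself before indexing, here is a minimal sketch of the "one row per conversation" approach using only the Python standard library. The column names `conversation_id` and `text`, the sample rows, and the helper `merge_rows_by_conversation` are all hypothetical and not part of GraphRAG; adjust them to match your CSV.

```python
import csv
import io


def merge_rows_by_conversation(rows, id_column="conversation_id", text_column="text"):
    """Join the text of all rows sharing an ID into one row per conversation.

    Row order within each conversation is preserved, and conversations
    appear in the order their first row was seen.
    """
    merged = {}
    for row in rows:
        merged.setdefault(row[id_column], []).append(row[text_column])
    return [
        {id_column: conv_id, text_column: "\n".join(parts)}
        for conv_id, parts in merged.items()
    ]


# Hypothetical input: conversation c1 is split across two rows.
raw_csv = (
    "conversation_id,text\n"
    "c1,Hi there\n"
    "c2,Is the build green?\n"
    "c1,Can you review my PR?\n"
)

rows = list(csv.DictReader(io.StringIO(raw_csv)))
merged_rows = merge_rows_by_conversation(rows)

# Write the merged rows back out as CSV (to a string here; use a file in practice).
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=["conversation_id", "text"])
writer.writeheader()
writer.writerows(merged_rows)
merged_csv = output.getvalue()
```

The merged CSV has one row per conversation ID, so each conversation reaches the indexer as a single document. Joining with a newline is just one choice; any separator that keeps turns distinguishable should work.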