# Copyright 2023 OpenSPG Authors
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
# in compliance with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software distributed under the License
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
# or implied.
import os
from kag.builder.component.reader import (
    DocxReader, PDFReader,
    TXTReader, MarkDownReader,
    CSVReader, JSONReader
)
from kag.builder.component.splitter import LengthSplitter, OutlineSplitter
from knext.builder.builder_chain_abc import BuilderChainABC
from kag.builder.component.extractor import KAGExtractor
from kag.builder.component.vectorizer.batch_vectorizer import BatchVectorizer
from kag.builder.component.writer import KGWriter
from kag.solver.logic.solver_pipeline import SolverPipeline
import logging
from kag.common.env import init_kag_config

file_path = os.path.dirname(__file__)

suffix_mapping = {
    "docx": DocxReader,
    "pdf": PDFReader,
    "txt": TXTReader,
    "md": MarkDownReader,
    "json": JSONReader,
    "csv": CSVReader,
}
class SanGuoDemoBuildChain(BuilderChainABC):
    def build(self, **kwargs):
        file_path = kwargs.get("file_path", "a.docx")
        suffix = file_path.split(".")[-1]
        # Use .get() so an unsupported suffix reaches the check below
        # instead of raising KeyError on the dictionary lookup first.
        reader_cls = suffix_mapping.get(suffix)
        if reader_cls is None:
            raise NotImplementedError
        reader = reader_cls()

        project_id = int(os.getenv("KAG_PROJECT_ID"))
        splitter = LengthSplitter(split_length=1000, window_length=100)
        vectorizer = BatchVectorizer()
        extractor = KAGExtractor(project_id=project_id)
        writer = KGWriter()

        # Pipeline: read -> split -> extract -> vectorize -> write
        chain = reader >> splitter >> extractor >> vectorizer >> writer
        return chain


def buildKG(test_file, **kwargs):
    chain = SanGuoDemoBuildChain(file_path=test_file)
    chain.invoke(test_file, max_workers=1)


if __name__ == "__main__":
    test_txt = os.path.join(file_path, "./data/三国测试.txt")
    buildKG(test_txt)
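Before running a build like the one above, it can help to rule out input problems first. The following is a minimal sketch, not part of KAG: `preflight` and `SUPPORTED_SUFFIXES` are hypothetical names introduced here, and the checks simply mirror what the script relies on (a suffix present in `suffix_mapping`, an existing input file, and an integer `KAG_PROJECT_ID`).

```python
import os
from pathlib import Path

# Hypothetical helper: mirrors the keys of suffix_mapping in the script above.
SUPPORTED_SUFFIXES = {"docx", "pdf", "txt", "md", "json", "csv"}


def preflight(file_path, env=None):
    """Return a list of problems found; an empty list means the inputs look usable."""
    env = os.environ if env is None else env
    problems = []

    # Same suffix rule as the build chain: text after the last dot, case-sensitive.
    suffix = file_path.split(".")[-1]
    if suffix not in SUPPORTED_SUFFIXES:
        problems.append(f"unsupported suffix: {suffix!r}")

    if not Path(file_path).is_file():
        problems.append(f"file not found: {file_path}")

    # The script calls int(os.getenv("KAG_PROJECT_ID")), which fails if unset.
    project_id = env.get("KAG_PROJECT_ID")
    if project_id is None:
        problems.append("KAG_PROJECT_ID is not set")
    elif not project_id.strip().isdigit():
        problems.append(f"KAG_PROJECT_ID is not an integer: {project_id!r}")

    return problems
```

Calling `preflight(test_txt)` before `buildKG(test_txt)` and printing the returned list would surface these issues without touching the KAG pipeline itself.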
I am using the default schema. My data document is as follows:
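The build process ran without any problems. My indexer.py file also follows the example in the user manual; the code is as follows:
After extraction, the neo4j result is as follows: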
I then asked a question using the example query code from the manual:
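For this question, I obtained the following sub_querys and logic_forms:

chunk_retriever: In /KAG/kag/solver/logic/core_modules/lf_executor.py, when self.chunk_retriever performs recall, it first runs NER on my question. That result is also fine, and I got the following:
However, when self.match_entities is used to recall passages, that is, when the code below is called to recall from the text chunks, nothing is recalled:
nontyped_nodes likewise recalls nothing.

kg_retriever: Because the first operation is get_spo, execution enters the logic below:
But the resulting cur_spo_set is also None.
So the sub_answer is "i don't know", and the following line is then run:
But the resulting docs_with_score is still empty.
So the result of my first operation is:

What is strange is that the neo4j graph clearly contains a node for 关羽 (Guan Yu), so why does the text chunk recall return nothing?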
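One way to double-check the claim that the node exists is to look it up directly in neo4j, both by exact name and by substring, and compare the stored name with the entity string produced by NER. The sketch below only builds a parameterized Cypher query. `build_node_lookup` is a hypothetical helper, and the assumption that nodes carry a `name` property is not confirmed by anything in this report.

```python
def build_node_lookup(name, exact=True, limit=10):
    """Build a parameterized Cypher query that finds nodes by their name property.

    Assumption: entity nodes store their display name in a property called
    `name`. Adjust the property if the graph uses a different one.
    """
    # Exact match shows whether the NER string equals the stored name;
    # CONTAINS reveals near matches such as extra whitespace or suffixes.
    operator = "=" if exact else "CONTAINS"
    query = (
        f"MATCH (n) WHERE n.name {operator} $name "
        "RETURN labels(n) AS labels, n.name AS name LIMIT $limit"
    )
    return query, {"name": name, "limit": limit}


# Usage with the official neo4j Python driver (URI and credentials are placeholders):
#   from neo4j import GraphDatabase
#   driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
#   with driver.session() as session:
#       query, params = build_node_lookup("关羽", exact=False)
#       for record in session.run(query, params):
#           print(record["labels"], repr(record["name"]))
```

If the exact query returns nothing while the substring query returns a row, the stored name differs from the NER output, and printing it with `repr` makes hidden characters visible.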