The logic_form_plan.py implementation doesn't feel quite right #94

Open
tpoisonooo opened this issue Dec 4, 2024 · 5 comments

@tpoisonooo
Contributor

tpoisonooo commented Dec 4, 2024

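I'm using qwen2.5_7B_instruct.

Current problem

The default prompt doesn't make clear what the LLM is supposed to do, so the output of default_lf_planner.py is always an empty list [].

Proposed change

New prompt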

You are a programmer. Read the function_description and the user input, then call the appropriate functions.

## function_description
[
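      {
          "functionName": "get_spo",
          "function_declaration": "get_spo(s=s_alias:entity_type[entity_name], p=p_alias:edge_type, o=o_alias:entity_type[entity_name], p.edge_type=value)",
          "description": "Find SPO information. s is the subject and o is the object, each written as variable_name:entity_type[entity_name]; the entity name is optional and should be given when there is a specific entity to query. p is the predicate, i.e. a relation or attribute, written as variable_name:edge_type or attribute_type. Every variable is assigned a variable name that later mentions use to refer to it. Note that s, p, and o must not appear more than once in the same expression. When a variable refers to a previously introduced variable name, the name must match exactly and only the variable name is given; the entity type is specified only when the variable is first introduced."
      },
      {
          "functionName": "count",
          "function_declaration": "count_alias=count(alias)",
          "description": "Count the number of nodes. The argument is the set of nodes to count and can only be a variable name that appears in get_spo. count_alias is the variable name for the result, which must be of int type; it can be referenced later."
      },
      {
          "functionName": "sum",
          "function_declaration": "sum(alias, num1, num2, ...)->sum_alias",
          "description": "Sum data. The arguments are the values to sum, which can be numbers or previously introduced variable names, and their content must be numeric. sum_alias is the variable name for the result, which must be numeric; it can be referenced later."
      },
      {
          "functionName": "sort",
          "function_declaration": "sort(set=alias, orderby=o_alias or count_alias or sum_alias, direction=min or max, limit=N)",
          "description": "Sort a set of nodes. set is the set of nodes to sort and can only be a variable name that appears in get_spo. orderby is the sort key, the name of a relation or attribute of the nodes; if it was mentioned earlier, refer to it by its alias. direction is the sort direction and can only be min (ascending) or max (descending). limit caps the number of results and is of int type. Can be used as the final output."
      },
      {
          "functionName": "get",
          "function_declaration": "get(alias)",
          "description": "Return the information represented by the given alias, which can be an entity, a relation path, or an attribute value obtained from get_spo. Can be used as the final output."
      }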
]


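## Examples
[
        {
            "query": "Who is Wu Jing?",
            "answer": "Step1:Look up Wu Jing\nAction1:get_spo(s=s1:PublicFigure[Wu Jing], p=p1, o=o1)\nStep2:Output s1\nAction2:get(s1)"
        },
        {
            "query": "What is 30+6 plus the age of Huawei's founder in 2024?",
            "answer": "Step1:What is 30+6?\nAction1:sum(30,6)->sum1\nStep2:Who is the founder of Huawei?\nAction2:get_spo(s=s2:Company[Huawei],p=p2:Founder,o=o2)\nStep3:In what year was Huawei's founder born?\nAction3:get_spo(s=o2,p=p3:BirthYear,o=o3)\nStep4:How old is Huawei's founder in 2024?\nAction4:sum(2024,-o3)->sum4\nStep5:What is the result of 30+6 plus the age of Huawei's founder in 2024?\nAction5:sum(sum1,sum4)->sum5\nStep6:Output sum5\nAction6:get(sum5)"
        }
]

## User input
What are knowledge graphs used for, and can they be used to handle genetics problems?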
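
For illustration, the layout of the proposed prompt (an instruction, then the function list, the examples, and the user input) could be assembled programmatically roughly as follows. This is only a sketch: `build_planner_prompt` and its parameters are hypothetical names, the section headings are English renderings of the ones in the proposal, and nothing here is taken from KAG's own code.

```python
import json


def build_planner_prompt(functions, examples, user_input):
    """Assemble a planner prompt: instruction, function list, examples, user input.

    Hypothetical helper for illustration; not part of KAG.
    """
    instruction = (
        "You are a programmer. Read the function_description and the user input, "
        "then call the appropriate functions."
    )
    sections = [
        instruction,
        # ensure_ascii=False keeps non-ASCII entity names readable in the prompt.
        "## function_description\n" + json.dumps(functions, ensure_ascii=False, indent=2),
        "## examples\n" + json.dumps(examples, ensure_ascii=False, indent=2),
        "## user input\n" + user_input,
    ]
    return "\n\n".join(sections)


prompt = build_planner_prompt(
    functions=[{"functionName": "get", "function_declaration": "get(alias)"}],
    examples=[{"query": "q", "answer": "Step1:...\nAction1:get(s1)"}],
    user_input="What are knowledge graphs used for?",
)
```

Putting the explicit instruction first is the point of the proposal: the old prompt leaves its "instruction" field empty.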

Result with the new prompt

image
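
To make the expected output format concrete, here is a minimal sketch of how a `StepN:` / `ActionN:` answer could be split into structured steps. It is an assumption about one reasonable way to parse this format, not the parsing that default_lf_planner.py actually performs (that code is not shown in this thread); `parse_plan` is a hypothetical name. The sample text is one of the cases from the existing prompt.

```python
import re

# Matches "Step3:..." or "Action3:...", accepting an ASCII or full-width colon.
LINE_RE = re.compile(r"^(Step|Action)(\d+)\s*[::]\s*(.*)$")


def parse_plan(text):
    """Group Step/Action lines by their index and return them in step order."""
    steps = {}
    for raw in text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # skip anything that is not a Step/Action line
        kind, idx, body = m.group(1), int(m.group(2)), m.group(3).strip()
        entry = steps.setdefault(idx, {"step": "", "action": ""})
        entry[kind.lower()] = body
    return [steps[i] for i in sorted(steps)]


sample = (
    "Step1:Which State Where Pocahontas Mounds is Located ?\n"
    "Action1:get_spo(s=s1:HistoricalSite[Pocahontas Mounds], p=p1:LocatedIn, o=o1:State)\n"
    "Step2:When did this state become a part of the United States ?\n"
    "Action2:get_spo(s=o1, p=p2:YearOfBecamingPartofTheUnitedStates, o=o2:Date)"
)
plan = parse_plan(sample)
```

With a parser like this, a reply that contains no Step/Action lines at all yields an empty list, which is consistent with the empty `[]` symptom described above.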

@tpoisonooo
Contributor Author

Old prompt

{
    "instruction": "",
    "function_description": "functionName is operator name;the function format is functionName(arg_name1=arg_value1,[args_name2=arg_value2, args_name3=arg_value3]); the parentheses contain the arguments, arguments enclosed in [] are optional, and arguments not enclosed in [] are required",
    "function": [
      {
          "functionName": "get_spo",
          "function_declaration": "get_spo(s=s_alias:entity_type[entity_name], p=p_alias:edge_type, o=o_alias:entity_type[entity_name])",
          "description": "Find SPO information. 's' represents the subject, 'o' represents the object, and they are denoted as variable_name:entity_type[entity_name]. The entity name is an optional parameter and should be provided when there is a specific entity to query. 'p' represents the predicate, which can be a relationship or attribute, denoted as variable_name:edge_type_or_attribute_type. Each variable is assigned a unique variable name, which is used for reference in subsequent mentions. Note that 's', 'p', and 'o' should not appear repeatedly within the same expression; only one set of SPO should be queried at a time. When a variable is a reference to a previously mentioned variable name, the variable name must match the previously mentioned variable name, and only the variable name needs to be provided; the entity type is only given when it is first introduced."
      },
      {
          "functionName": "count",
          "function_declaration": "count(alias)->count_alias",
          "description": "Count the number of nodes. The parameter should be a specified set of nodes to count, and it can only be variable names that appear in the get_spo query. The variable name 'count_alias' represents the counting result, which must be of int type, and this variable name can be used for reference in subsequent mentions."
      },
      {
          "functionName": "sum",
          "function_declaration": "sum(alias, num1, num2, ...)->sum_alias",
          "description": "Calculate the sum of data. The parameter should be a specified set to sum, which can be either numbers or variable names mentioned earlier, and its content must be of numeric type. The variable name 'sum_alias' represents the result of the calculation, which must be of numeric type, and this variable name can be used for reference in subsequent mentions."
      },
      {
          "functionName": "sort",
          "function_declaration": "sort(set=alias, orderby=o_alias or count_alias or sum_alias, direction=min or max, limit=N)",
          "description": "Sort a set of nodes. The 'set' parameter specifies the set of nodes to be sorted and can only be variable names that appear in the get_spo query. The 'orderby' parameter specifies the basis for sorting, which can be the relationship or attribute name of the nodes. If it has been mentioned earlier, an alias should be used. The 'direction' parameter specifies the sorting order, which can only be 'min' (ascending) or 'max' (descending). The 'limit' parameter specifies the limit on the number of output results and must be of int type. The sorted result can be used as the final output."
      },
      {
          "functionName": "compare",
          "function_declaration": "compare(set=[alias1, alias2, ...], op=min|max)",
          "description": "Compare nodes or numeric values. The 'set' parameter specifies the set of nodes or values to be compared, which can be variable names that appear in the get_spo query or constants. The 'op' parameter specifies the comparison operation: 'min' to find the smallest and 'max' to find the largest."
      },
      {
          "functionName": "get",
          "function_declaration": "get(alias)",
          "description": "Return the information represented by a specified alias. This can be an entity, a relationship path, or an attribute value obtained in the get_spo query. It can be used as the final output result."
      }
    ],
    "cases": [
        {
            "query": "Which sports team for which Cristiano Ronaldo played in 2011 was founded last ?",
            "answer": "Step1:Which Sports Teams Cristiano Ronaldo Played for in 2011 ?
Action1:get_spo(s=s1:Player[Cristiano Ronaldo],p=p1:PlayedForIn2011Year,o=o1:SportsTeam)
Step2:In which year were these teams established ?
Action2:get_spo(s=o1,p=p2:FoundationYear,o=o2:Year)
Step3:Which team was founded last ?
Action3:sort(set=o1, orderby=o2, direction=max, limit=1)"
        },
        {
            "query": "Who was the first president of the association which published Journal of Psychotherapy Integration?",
            "answer": "Step1:Which association that publishes the Journal of Psychotherapy Integration ?
Action1:get_spo(s=s1:Journal[Journal of Psychotherapy Integration],p=p1:Publish,o=o1:Association)
Step2:Who was the first president of that specific association?
Action2:get_spo(s=o1,p=p2:FirstPresident,o=o2:Person)"
        },
        {
            "query": "When did the state where Pocahontas Mounds is located become part of the United States?",
            "answer": "Step1:Which State Where Pocahontas Mounds is Located ?
Action1:get_spo(s=s1:HistoricalSite[Pocahontas Mounds], p=p1:LocatedIn, o=o1:State)
Step2:When did this state become a part of the United States ?
Action2:get_spo(s=o1, p=p2:YearOfBecamingPartofTheUnitedStates, o=o2:Date)"
        },
        {
            "query": "Which of the two tornado outbreaks killed the most people?",
            "answer": "Step1:Which is the first tornado outbreaks ?
Action1:get_spo(s=s1:Event[Tornado Outbreak], p=p1:TheFirst, o=o1:Event)
Step2:Which is the second tornado outbreaks ?
Action2:get_spo(s=s2:Event[Tornado Outbreak], p=p2:TheSecond, o=o2:Event)
Step3:How many people died in the first tornado outbreak ?
Action3:get_spo(s=s1, p=p3:KilledPeopleNumber, o=o3:Number)
Step4:How many people died in the second tornado outbreak ?
Action4:get_spo(s=s2, p=p4:KilledPeopleNumber, o=o4:Number)
Step5:To compare the death toll between two tornado outbreaks to determine which one had more fatalities.
Action5:compare(set=[o3,o4], op=max)"
        }
    ],
    "output_format": "Only output words in answer, for examples: `Step`, `Action` content",
    "query": "What are knowledge graphs used for, and can they be used to handle genetics problems?"
}   
    

Result with the old prompt: the model does not follow the instructions.

image
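
One way to detect this kind of non-compliance automatically is to check that every generated action calls one of the functions declared in the prompt. The sketch below is an assumption about how such a check could look, not something KAG is known to do; `unknown_calls` is a hypothetical helper, and the allowed set simply lists the functionName values from the prompt in this comment.

```python
import re

# Function names declared in the prompt's "function" list.
ALLOWED_FUNCTIONS = {"get_spo", "count", "sum", "sort", "compare", "get"}

# Optional "alias=" prefix (e.g. "count1=count(o1)"), then the function name and "(".
CALL_RE = re.compile(r"^\s*(?:\w+\s*=\s*)?([A-Za-z_]\w*)\s*\(")


def unknown_calls(actions, allowed=ALLOWED_FUNCTIONS):
    """Return the actions that do not start with a call to an allowed function."""
    bad = []
    for action in actions:
        m = CALL_RE.match(action)
        if m is None or m.group(1) not in allowed:
            bad.append(action)
    return bad


flagged = unknown_calls([
    "get_spo(s=o1,p=p2:FoundationYear,o=o2:Year)",
    "sort(set=o1, orderby=o2, direction=max, limit=1)",
    "count1=count(o1)",
    "Journal(s=s1,p=p1:Publish,o=o1:Association)",  # hypothetical malformed action
    "Knowledge graphs are used to organize information.",  # free text, not a call
])
```

Here `flagged` contains only the last two entries, so a plan made entirely of free text would be rejected as a whole.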

@tpoisonooo
Contributor Author

How about I make the change myself? cc @northmachine

@tpoisonooo
Contributor Author

Got it working.

image

@thundax-lyp
Contributor

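I couldn't reproduce this behavior.
If this is a domain-specific prompt, you can request to add your own domain directory (YOUR_NAME) under /kag/solver/prompt, then set the environment variable KAG_PROMPT_BIZ_SCENE={YOUR_NAME}.

Alternatively, in your own code, you can pass your KagReasoner into SolverPipeline and specify your lf_planner in the reasoner.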

SolverPipeline(reasoner=DefaultReasoner(lf_planner=YourLFPlanner()))
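
For the environment-variable route, a minimal sketch in Python might look like the following. The value "my_domain" is a placeholder for YOUR_NAME, and the assumption that the variable has to be set before KAG is imported or a solver is created is mine; the thread does not say when KAG reads it.

```python
import os

# "my_domain" is a placeholder for the YOUR_NAME directory added under /kag/solver/prompt.
os.environ["KAG_PROMPT_BIZ_SCENE"] = "my_domain"

# Assumption: KAG consults this variable when it looks up prompts, so it is set
# here before any KAG module is imported or any solver object is created.
biz_scene = os.environ["KAG_PROMPT_BIZ_SCENE"]
```

Exporting the variable in the shell before starting the process would have the same effect.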

@caszkgui
Collaborator

caszkgui commented Dec 5, 2024

How about I make the change myself? cc @northmachine

Thanks for your advice. You can create a pull request to merge your code.
