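Continuing from the previous post: I did try to study multithreading, but it was really hard and I didn't stick with it. I'm setting a goal for myself: once this series is finished, I'll study it properly.

On to the main content.

This series follows this project; many thanks to its authors.

Today we start with base.py.

base.py is fairly simple, since it contains only base classes.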

1.

The query-parameter class: QueryParam

Note: the token-limit values here have been scaled down by a factor of 100.

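  • mode: the query mode, one of local, global or naive (local, global and naive search); defaults to global.
  • only_need_context: whether to return only the retrieved context instead of a generated answer.
  • response_type: the desired response format; defaults to "Multiple Paragraphs".
  • level: presumably some kind of hierarchy level; I'll leave this as an open question for now.
  • top_k: the upper limit on the number of results returned.
  • naive_max_token_for_text_unit: maximum number of tokens for text units in naive mode.
  • local_max_token_for_text_unit: maximum number of tokens for text units in local mode.
  • local_max_token_local_context: maximum number of tokens for the local context in local search.
  • local_max_token_for_community_report: maximum number of tokens for community reports in local mode.
  • local_max_token_single_one: a boolean switch for local mode, default False; judging by the name, it controls whether only a single item is considered.
  • global_min_community_rating: the minimum community rating considered in global mode.
  • global_max_consider_community: the maximum number of communities considered in global search; defaults to 512.
  • global_max_token_for_community_report: the maximum number of tokens for community reports in global search; defaults to 16384.
  • global_special_community_map_llm_kwargs: extra keyword arguments passed to the LLM (large language model) when mapping over communities in global search; defaults to a dict containing response_format with type "json_object", i.e. JSON output.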
from dataclasses import dataclass, field
from typing import Literal, TypedDict


@dataclass
class QueryParam:
    mode: Literal["local", "global", "naive"] = "global"
    only_need_context: bool = False
    response_type: str = "Multiple Paragraphs"
    level: int = 2
    top_k: int = 20
    # naive search (annotated, otherwise it is not a dataclass field)
    naive_max_token_for_text_unit: int = 120
    # local search
    local_max_token_for_text_unit: int = 40
    local_max_token_local_context: int = 480
    local_max_token_for_community_report: int = 320
    local_max_token_single_one: bool = False
    # ----------doubted
    # global search
    global_min_community_rating: float = 0
    global_max_consider_community: float = 512
    global_max_token_for_community_report: int = 16384
    # default_factory gives every instance its own dict (mutable defaults are not allowed)
    global_special_community_map_llm_kwargs: dict = field(
        default_factory=lambda: {"response_format": {"type": "json_object"}}
    )


TextChunkSchema = TypedDict(
    "TextChunkSchema",
    {"token": int, "content": str, "full_doc_id": str, "chunk_order_index": int},
)

SingleCommunitySchema = TypedDict(
    "SingleCommunitySchema",
    {
        "level": int,
        "title": str,
        "edges": list[list[str]],  # each edge is a [source, target] pair
        "nodes": list[str],
        "chunk_ids": list[str],
        "occurrence": float,
        "sub_communities": list[str],
    },
)
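Two dataclass rules explain how QueryParam is written: only annotated class attributes become fields (an unannotated line such as naive_max_token_for_text_unit=120 is just a class attribute that __init__ cannot set), and a mutable default like the LLM kwargs dict has to be supplied through field(default_factory=...). DemoParam and Broken below are hypothetical, cut-down classes used only to show both rules with the standard library:

```python
from dataclasses import dataclass, field, fields


@dataclass
class DemoParam:
    # Annotated: becomes a real dataclass field, settable through __init__
    top_k: int = 20
    # Not annotated: a plain class attribute, NOT a dataclass field
    naive_max_token = 120
    # Mutable default built fresh for each instance
    llm_kwargs: dict = field(
        default_factory=lambda: {"response_format": {"type": "json_object"}}
    )


field_names = [f.name for f in fields(DemoParam)]
print(field_names)  # ['top_k', 'llm_kwargs']

a, b = DemoParam(), DemoParam(top_k=5)
a.llm_kwargs["temperature"] = 0.0  # only touches a's own dict
print(b.llm_kwargs)  # {'response_format': {'type': 'json_object'}}

# A plain mutable default is rejected as soon as the class is created
rejected = False
try:
    @dataclass
    class Broken:
        kwargs: dict = {}
except ValueError as err:
    rejected = True
    print("ValueError:", err)
```

Because of the first rule, DemoParam(naive_max_token=1) would fail with a TypeError.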

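TextChunkSchema is a TypedDict that defines the structure of a text chunk. It has the following fields:

  • token: the number of tokens in the chunk, type int.
  • content: the text of the chunk, type str.
  • full_doc_id: the ID of the full document the chunk comes from, type str.
  • chunk_order_index: the position of the chunk within its document, type int.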

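SingleCommunitySchema is also a TypedDict; it defines the structure of a single community. It has the following fields:

  • level: the community's level in the hierarchy, type int. (This answers the open question from QueryParam: it is the depth in the community hierarchy, much like the levels of a tree.)
  • title: the community's title, type str.
  • edges: the community's edges, a list of lists in which each inner list holds two strings (the two endpoints).
  • nodes: the nodes in the community, a list of strings.
  • chunk_ids: the IDs of the related text chunks, a list of strings.
  • occurrence: how often the community occurs, type float.
  • sub_communities: the sub-communities, a list of strings.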
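To make level and sub_communities concrete, here is a small hypothetical hierarchy. The community ids, the titles and the assumption that level 0 is the coarsest layer are mine, not taken from nano-graphrag; the walk simply follows sub_communities from each parent to its children:

```python
from typing import TypedDict


class Community(TypedDict):
    # Hypothetical, trimmed-down version of SingleCommunitySchema
    level: int
    title: str
    sub_communities: list[str]


communities: dict[str, Community] = {
    "0": {"level": 0, "title": "Cluster 0", "sub_communities": ["3", "4"]},
    "3": {"level": 1, "title": "Cluster 3", "sub_communities": []},
    "4": {"level": 1, "title": "Cluster 4", "sub_communities": ["7"]},
    "7": {"level": 2, "title": "Cluster 7", "sub_communities": []},
}


def walk(cid: str, depth: int = 0) -> list[str]:
    """Depth-first walk over sub_communities, returning indented titles."""
    node = communities[cid]
    lines = ["  " * depth + f"{node['title']} (level {node['level']})"]
    for child in node["sub_communities"]:
        # In this sketch every child sits exactly one level below its parent
        lines.extend(walk(child, depth + 1))
    return lines


tree = walk("0")
print("\n".join(tree))

# e.g. keep only communities up to a chosen level
selected = sorted(cid for cid, c in communities.items() if c["level"] <= 1)
print(selected)  # ['0', '3', '4']
```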

2. None of these are implemented yet, so I'll skip the details.

class CommunitySchema(SingleCommunitySchema):
    report_string:str
    report_json:dict

T = TypeVar("T")  # generic type variable, closer to a C++ template parameter than to a typedef

@dataclass
class StorageNameSpace:
    namespace:str
    global_config:dict
    async def index_start_callback(self):
        """hook that runs when indexing starts"""
        pass

    async def index_done_callback(self):
        """commit the storage operations after indexing"""
        pass

    async def query_done_callback(self):
        """commit the storage operations after querying"""
        pass

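3.

BaseVectorStorage

  • A base class meant to serve as the abstract interface for vector storage.

BaseKVStorage

  • A generic base class meant to serve as the abstract interface for key-value storage. It inherits from StorageNameSpace and uses Generic[T] to specify the type T of the values it stores.

None of these methods are implemented in the base classes; each one just raises NotImplementedError. This does not stop you from instantiating a base class directly: the object is created fine, and the error only surfaces when one of the methods is called. That is the intended behavior, since it signals that every subclass is expected to implement these methods.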
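The difference between this NotImplementedError convention and Python's abc module fits in a few lines. PlainBase and AbcBase are hypothetical names; base.py uses the first pattern:

```python
import asyncio
from abc import ABC, abstractmethod


class PlainBase:
    # Same pattern as base.py: the method exists but raises when called
    async def query(self, q: str) -> list[dict]:
        raise NotImplementedError


class AbcBase(ABC):
    # Alternative pattern (not used by base.py): an abstract method
    @abstractmethod
    async def query(self, q: str) -> list[dict]: ...


plain = PlainBase()  # instantiation succeeds
try:
    asyncio.run(plain.query("hello"))
    plain_failed = False
except NotImplementedError:
    plain_failed = True  # the error appears only once the method actually runs

try:
    AbcBase()
    abc_blocked = False
except TypeError:
    abc_blocked = True  # ABC refuses to instantiate a class with abstract methods

print(plain_failed, abc_blocked)  # True True
```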

# @dataclass is needed so that embedding_func and meta_fields become __init__ fields
@dataclass
class BaseVectorStorage(StorageNameSpace):
    """Base class for vector storage; every method raises NotImplementedError and must be implemented by subclasses."""
    embedding_func: EmbeddingFunc
    meta_fields: set = field(default_factory=set)

    async def query(self, query: str, top_k: int) -> list[dict]:
        raise NotImplementedError

    async def upsert(self, data: dict[str, dict]):
        """Use 'content' field from value for embedding, use key as id.
        If embedding_func is None, use 'embedding' field from value
        """
        raise NotImplementedError

class BaseKVStorage(Generic[T],StorageNameSpace):
    async def all_keys(self)->list[str]:
        raise NotImplementedError
    async def get_by_id(self,id:str)->Union[T,None]:
        raise NotImplementedError
    async def get_by_ids(self,ids:list[str],
                         fields:Union[set[str],None]=None)->list[Union[T,None]]:
        raise NotImplementedError
    async def filter_keys(self,data:list[str])->set[str]:
        raise NotImplementedError
    async def upsert(self,data:dict[str,T]):
        raise NotImplementedError
    async def drop(self):
        raise NotImplementedError
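As a sketch of what a concrete subclass might look like, here is a dict-backed key-value store. InMemoryKVStorage is a hypothetical name (not one of nano-graphrag's storage classes), the base class is trimmed to keep the block self-contained, and the meaning of filter_keys (return the keys that are not stored yet) is my assumption based on its signature:

```python
import asyncio
from dataclasses import dataclass, field
from typing import Generic, TypeVar, Union

T = TypeVar("T")


@dataclass
class StorageNameSpace:
    namespace: str
    global_config: dict


class BaseKVStorage(Generic[T], StorageNameSpace):
    # Trimmed copy of the base class: methods raise until overridden
    async def get_by_id(self, id: str) -> Union[T, None]:
        raise NotImplementedError

    async def upsert(self, data: dict[str, T]):
        raise NotImplementedError


@dataclass
class InMemoryKVStorage(BaseKVStorage[T]):
    """Hypothetical dict-backed KV store, for illustration only."""

    _data: dict = field(default_factory=dict)

    async def all_keys(self) -> list[str]:
        return list(self._data)

    async def get_by_id(self, id: str) -> Union[T, None]:
        return self._data.get(id)

    async def get_by_ids(self, ids: list[str], fields: Union[set[str], None] = None) -> list[Union[T, None]]:
        rows = [self._data.get(i) for i in ids]
        if fields is None:
            return rows
        # Keep only the requested keys of each stored (dict) value
        return [None if r is None else {k: v for k, v in r.items() if k in fields} for r in rows]

    async def filter_keys(self, data: list[str]) -> set[str]:
        # Assumption: return the keys that are not stored yet (useful for dedup)
        return {k for k in data if k not in self._data}

    async def upsert(self, data: dict[str, T]):
        self._data.update(data)

    async def drop(self):
        self._data = {}


async def demo():
    kv = InMemoryKVStorage(namespace="text_chunks", global_config={})
    await kv.upsert({"chunk-1": {"token": 1, "content": "hello"}})
    new_keys = await kv.filter_keys(["chunk-1", "chunk-2"])
    rows = await kv.get_by_ids(["chunk-1", "missing"], fields={"content"})
    await kv.drop()
    return new_keys, rows, await kv.all_keys()


new_keys, rows, keys_after_drop = asyncio.run(demo())
print(new_keys)  # {'chunk-2'}
print(rows)  # [{'content': 'hello'}, None]
```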

4. BaseGraphStorage

@dataclass
class BaseGraphStorage(StorageNameSpace):
    async def has_node(self,node_id:str)->bool:
        raise NotImplementedError
    async def has_edge(self,source_node_id:str,target_node_id:str)->bool:
        raise NotImplementedError
    async def node_degree(self,node_id:str)->int:
        raise NotImplementedError
    async def edge_degree(self,src_id:str,tgt_id:str)->int:
        raise NotImplementedError
    async def get_node(self,node_id:str)->Union[dict,None]:
        raise NotImplementedError
    async def get_edge(self,source_node_id:str,target_node_id:str)->Union[dict,None]:
        raise NotImplementedError
    async def get_node_edges(self,source_node_id:str)->Union[list[tuple[str,str]],None]:
        raise NotImplementedError
    async def upsert_node(self,node_id:str,node_data:dict[str,str]):
        raise NotImplementedError
    async def upsert_edge(self,source_node_id:str,target_node_id:str,edge_data:dict[str,str]):
        raise NotImplementedError
    async def clustering(self,algorithm:str):
        raise NotImplementedError
    async def community_schema(self)->dict[str,SingleCommunitySchema]:
        raise NotImplementedError
    async def embed_nodes(self,algorithm:str)->tuple[np.ndarray,list[str]]:
        raise NotImplementedError("Node embedding is not used in nano-graphrag.")
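A minimal in-memory sketch of part of this interface, using plain dicts instead of a graph library. DictGraphStorage is hypothetical; treating the graph as undirected and defining edge_degree as the sum of the two endpoint degrees are assumptions of this sketch:

```python
import asyncio
from typing import Union


class DictGraphStorage:
    """Hypothetical undirected graph store mirroring part of BaseGraphStorage."""

    def __init__(self) -> None:
        self._nodes: dict[str, dict[str, str]] = {}
        # Undirected edges, keyed by the (smaller, larger) pair of node ids
        self._edges: dict[tuple[str, str], dict[str, str]] = {}

    @staticmethod
    def _key(a: str, b: str) -> tuple[str, str]:
        return (a, b) if a <= b else (b, a)

    async def has_node(self, node_id: str) -> bool:
        return node_id in self._nodes

    async def has_edge(self, source_node_id: str, target_node_id: str) -> bool:
        return self._key(source_node_id, target_node_id) in self._edges

    async def node_degree(self, node_id: str) -> int:
        return sum(node_id in pair for pair in self._edges)

    async def edge_degree(self, src_id: str, tgt_id: str) -> int:
        # Assumption: edge degree = degree(src) + degree(tgt)
        return await self.node_degree(src_id) + await self.node_degree(tgt_id)

    async def get_node_edges(self, source_node_id: str) -> Union[list[tuple[str, str]], None]:
        if source_node_id not in self._nodes:
            return None
        # Return (source, neighbor) pairs regardless of storage order
        return [(source_node_id, b if a == source_node_id else a)
                for (a, b) in self._edges if source_node_id in (a, b)]

    async def upsert_node(self, node_id: str, node_data: dict[str, str]):
        self._nodes.setdefault(node_id, {}).update(node_data)

    async def upsert_edge(self, source_node_id: str, target_node_id: str, edge_data: dict[str, str]):
        self._edges.setdefault(self._key(source_node_id, target_node_id), {}).update(edge_data)


async def demo():
    g = DictGraphStorage()
    for name in ("ALEX", "TAYLOR", "JORDAN"):
        await g.upsert_node(name, {"entity_type": "person"})
    await g.upsert_edge("ALEX", "TAYLOR", {"weight": "7"})
    await g.upsert_edge("TAYLOR", "JORDAN", {"weight": "8"})
    return (
        await g.has_edge("TAYLOR", "ALEX"),
        await g.node_degree("TAYLOR"),
        await g.edge_degree("ALEX", "TAYLOR"),
        await g.get_node_edges("TAYLOR"),
    )


result = asyncio.run(demo())
print(result)
```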

OK, that wraps up base.py. Admittedly a bit rushed.

Next, let's look at prompt.py.

nano-graphrag borrows these prompts from Microsoft's GraphRAG:

Reference:
 - Prompts are from [graphrag](https://github.com/microsoft/graphrag)

1. Separator

GRAPH_FIELD_SEP = "<SEP>"
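Because node and edge data in BaseGraphStorage are typed dict[str, str], one plausible use of this separator is packing several values (for example source chunk ids or descriptions) into a single string field. The merge_field helper below is hypothetical and only illustrates that idea:

```python
GRAPH_FIELD_SEP = "<SEP>"


def merge_field(existing: str, new_values: list[str]) -> str:
    """Merge new values into a <SEP>-joined field, dropping duplicates but keeping order."""
    old = existing.split(GRAPH_FIELD_SEP) if existing else []
    # dict.fromkeys keeps the first occurrence of each value, in insertion order
    return GRAPH_FIELD_SEP.join(dict.fromkeys(old + new_values))


field_value = merge_field("", ["chunk-1", "chunk-2"])
field_value = merge_field(field_value, ["chunk-2", "chunk-3"])
print(field_value)  # chunk-1<SEP>chunk-2<SEP>chunk-3
print(field_value.split(GRAPH_FIELD_SEP))  # ['chunk-1', 'chunk-2', 'chunk-3']
```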

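2. Claim extraction

This prompt defines an intelligent assistant that extracts entities and the claims made about them.

{tuple_delimiter} is the string that separates the fields within one record.

The prompt is organized into the sections -Target activity-, -Goal- and -Steps-.

It then gives two examples, in the style of few-shot prompting, followed by the format of the real data.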

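Steps

  1. Extract named entities: extract every named entity that matches the predefined entity specification. The specification can be either a list of entity names or a list of entity types.

  2. Extract claims

    • For each entity identified in step 1, extract all claims associated with it. Claims must match the specified claim description, and the entity should be the subject of the claim.
    • For each claim, extract the following:
      • Subject: the name of the subject entity, capitalized. The subject is the entity that committed the action described in the claim, and it must be one of the named entities identified in step 1.
      • Object: the name of the object entity, capitalized. The object is the entity that reports/handles, or is affected by, the action described in the claim. If the object is unknown, use NONE.
      • Claim type: the overall category of the claim, capitalized. Name it so that it can be reused across multiple text inputs and similar claims share the same claim type.
      • Claim status: TRUE, FALSE or SUSPECTED. TRUE means the claim is confirmed, FALSE means the claim was found to be false, and SUSPECTED means the claim has not been verified.
      • Claim description: a detailed description explaining the reasoning behind the claim, together with all related evidence and references.
      • Claim date: the period (start date, end date) in which the claim was made. Both dates should be in ISO-8601 format. If the claim was made on a single date rather than over a range, use the same date for both. If the date is unknown, return NONE.
      • Claim source text: a list of all quotes from the original text that are relevant to the claim.
  3. Return output: return, in English, a single list of all the claims identified in steps 1 and 2, using {record_delimiter} as the list delimiter.

  4. Finish: when done, output {completion_delimiter}.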

PROMPTS[
    "claim_extraction"
] = """-Target activity-
You are an intelligent assistant that helps a human analyst to analyze claims against certain entities presented in a text document.

-Goal-
Given a text document that is potentially relevant to this activity, an entity specification, and a claim description, extract all entities that match the entity specification and all claims against those entities.

-Steps-
1. Extract all named entities that match the predefined entity specification. Entity specification can either be a list of entity names or a list of entity types.
2. For each entity identified in step 1, extract all claims associated with the entity. Claims need to match the specified claim description, and the entity should be the subject of the claim.
For each claim, extract the following information:
- Subject: name of the entity that is subject of the claim, capitalized. The subject entity is one that committed the action described in the claim. Subject needs to be one of the named entities identified in step 1.
- Object: name of the entity that is object of the claim, capitalized. The object entity is one that either reports/handles or is affected by the action described in the claim. If object entity is unknown, use **NONE**.
- Claim Type: overall category of the claim, capitalized. Name it in a way that can be repeated across multiple text inputs, so that similar claims share the same claim type
- Claim Status: **TRUE**, **FALSE**, or **SUSPECTED**. TRUE means the claim is confirmed, FALSE means the claim is found to be False, SUSPECTED means the claim is not verified.
- Claim Description: Detailed description explaining the reasoning behind the claim, together with all the related evidence and references.
- Claim Date: Period (start_date, end_date) when the claim was made. Both start_date and end_date should be in ISO-8601 format. If the claim was made on a single date rather than a date range, set the same date for both start_date and end_date. If date is unknown, return **NONE**.
- Claim Source Text: List of **all** quotes from the original text that are relevant to the claim.

Format each claim as (<subject_entity>{tuple_delimiter}<object_entity>{tuple_delimiter}<claim_type>{tuple_delimiter}<claim_status>{tuple_delimiter}<claim_start_date>{tuple_delimiter}<claim_end_date>{tuple_delimiter}<claim_description>{tuple_delimiter}<claim_source>)

3. Return output in English as a single list of all the claims identified in steps 1 and 2. Use **{record_delimiter}** as the list delimiter.

4. When finished, output {completion_delimiter}

-Examples-
Example 1:
Entity specification: organization
Claim description: red flags associated with an entity
Text: According to an article on 2022/01/10, Company A was fined for bid rigging while participating in multiple public tenders published by Government Agency B. The company is owned by Person C who was suspected of engaging in corruption activities in 2015.
Output:

(COMPANY A{tuple_delimiter}GOVERNMENT AGENCY B{tuple_delimiter}ANTI-COMPETITIVE PRACTICES{tuple_delimiter}TRUE{tuple_delimiter}2022-01-10T00:00:00{tuple_delimiter}2022-01-10T00:00:00{tuple_delimiter}Company A was found to engage in anti-competitive practices because it was fined for bid rigging in multiple public tenders published by Government Agency B according to an article published on 2022/01/10{tuple_delimiter}According to an article published on 2022/01/10, Company A was fined for bid rigging while participating in multiple public tenders published by Government Agency B.)
{completion_delimiter}

Example 2:
Entity specification: Company A, Person C
Claim description: red flags associated with an entity
Text: According to an article on 2022/01/10, Company A was fined for bid rigging while participating in multiple public tenders published by Government Agency B. The company is owned by Person C who was suspected of engaging in corruption activities in 2015.
Output:

(COMPANY A{tuple_delimiter}GOVERNMENT AGENCY B{tuple_delimiter}ANTI-COMPETITIVE PRACTICES{tuple_delimiter}TRUE{tuple_delimiter}2022-01-10T00:00:00{tuple_delimiter}2022-01-10T00:00:00{tuple_delimiter}Company A was found to engage in anti-competitive practices because it was fined for bid rigging in multiple public tenders published by Government Agency B according to an article published on 2022/01/10{tuple_delimiter}According to an article published on 2022/01/10, Company A was fined for bid rigging while participating in multiple public tenders published by Government Agency B.)
{record_delimiter}
(PERSON C{tuple_delimiter}NONE{tuple_delimiter}CORRUPTION{tuple_delimiter}SUSPECTED{tuple_delimiter}2015-01-01T00:00:00{tuple_delimiter}2015-12-30T00:00:00{tuple_delimiter}Person C was suspected of engaging in corruption activities in 2015{tuple_delimiter}The company is owned by Person C who was suspected of engaging in corruption activities in 2015)
{completion_delimiter}

-Real Data-
Use the following input for your answer.
Entity specification: {entity_specs}
Claim description: {claim_description}
Text: {input_text}
Output: """
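Putting the delimiters together, the output of this prompt could be parsed roughly as follows. This is my own sketch, not nano-graphrag's parser; it assumes the default delimiters "<|>", "##" and "<|COMPLETE|>" defined later in prompt.py, and the field names in CLAIM_FIELDS are illustrative:

```python
TUPLE_DELIM = "<|>"  # PROMPTS["DEFAULT_TUPLE_DELIMITER"]
RECORD_DELIM = "##"  # PROMPTS["DEFAULT_RECORD_DELIMITER"]
COMPLETION_DELIM = "<|COMPLETE|>"  # PROMPTS["DEFAULT_COMPLETION_DELIMITER"]

# Hypothetical names for the 8 fields in the prompt's claim format
CLAIM_FIELDS = [
    "subject", "object", "claim_type", "status",
    "start_date", "end_date", "description", "source_text",
]


def parse_claims(llm_output: str) -> list[dict[str, str]]:
    """Split claim-extraction output into one dict per claim (illustrative sketch)."""
    body = llm_output.split(COMPLETION_DELIM)[0]  # ignore anything after the end marker
    claims = []
    for record in body.split(RECORD_DELIM):
        record = record.strip()
        if not (record.startswith("(") and record.endswith(")")):
            continue  # skip blank or malformed records
        values = [v.strip() for v in record[1:-1].split(TUPLE_DELIM)]
        if len(values) != len(CLAIM_FIELDS):
            continue  # wrong number of fields: skip rather than guess
        claims.append(dict(zip(CLAIM_FIELDS, values)))
    return claims


sample = (
    "(PERSON C<|>NONE<|>CORRUPTION<|>SUSPECTED<|>2015-01-01T00:00:00<|>"
    "2015-12-30T00:00:00<|>Person C was suspected of corruption in 2015<|>"
    "The company is owned by Person C)\n<|COMPLETE|>"
)
claims = parse_claims(sample)
print(claims[0]["subject"], claims[0]["status"])  # PERSON C SUSPECTED
```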

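3. Community report

This one is long and had me a little lost. To explain it, I used an LLM to generate part of the walkthrough, for example the summary below.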

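Writing the community report

Report-writing steps

  1. Choose the report title
    • The title should be short but specific, ideally including the names of the community's key entities.
    • For example, if the community centers on a particular organization or place, the title can include that organization's or place's name.
  2. Write the summary
    • The summary gives an overview of the community's overall structure, how its entities are related to each other, and significant information associated with them.
    • It is a short executive summary that helps readers quickly grasp the community's main characteristics and relationships.
  3. Assign an impact severity rating
    • The rating is a float between 0 and 10 representing the severity of the impact posed by the community's entities.
    • The prompt defines IMPACT as the scored importance of the community.
  4. Explain the rating
    • Explain the impact severity rating in a single sentence.
    • The explanation should be based on the nature of the community's entities, their relationships, and any related claims or evidence.
  5. Write the detailed findings
    • List 5 to 10 key insights about the community.
    • Each insight has a short summary followed by multiple paragraphs of explanatory text.
    • The explanatory text must be grounded in the provided entities, relationships and claims; information without supporting evidence must be left out.

It then specifies the output format: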

PROMPTS[
    "community_report"
] = """You are an AI assistant that helps a human analyst to perform general information discovery. 
Information discovery is the process of identifying and assessing relevant information associated with certain entities (e.g., organizations and individuals) within a network.

# Goal
Write a comprehensive report of a community, given a list of entities that belong to the community as well as their relationships and optional associated claims. The report will be used to inform decision-makers about information associated with the community and their potential impact. The content of this report includes an overview of the community's key entities, their legal compliance, technical capabilities, reputation, and noteworthy claims.

# Report Structure

The report should include the following sections:

- TITLE: community's name that represents its key entities - title should be short but specific. When possible, include representative named entities in the title.
- SUMMARY: An executive summary of the community's overall structure, how its entities are related to each other, and significant information associated with its entities.
- IMPACT SEVERITY RATING: a float score between 0-10 that represents the severity of IMPACT posed by entities within the community.  IMPACT is the scored importance of a community.
- RATING EXPLANATION: Give a single sentence explanation of the IMPACT severity rating.
- DETAILED FINDINGS: A list of 5-10 key insights about the community. Each insight should have a short summary followed by multiple paragraphs of explanatory text grounded according to the grounding rules below. Be comprehensive.

Return output as a well-formed JSON-formatted string with the following format:
    {{
        "title": <report_title>,
        "summary": <executive_summary>,
        "rating": <impact_severity_rating>,
        "rating_explanation": <rating_explanation>,
        "findings": [
            {{
                "summary":<insight_1_summary>,
                "explanation": <insight_1_explanation>
            }},
            {{
                "summary":<insight_2_summary>,
                "explanation": <insight_2_explanation>
            }}
            ...
        ]
    }}

# Grounding Rules
Do not include information where the supporting evidence for it is not provided.


# Example Input
-----------
Text:
```
Entities:
```csv
id,entity,type,description
5,VERDANT OASIS PLAZA,geo,Verdant Oasis Plaza is the location of the Unity March
6,HARMONY ASSEMBLY,organization,Harmony Assembly is an organization that is holding a march at Verdant Oasis Plaza
```
Relationships:
```csv
id,source,target,description
37,VERDANT OASIS PLAZA,UNITY MARCH,Verdant Oasis Plaza is the location of the Unity March
38,VERDANT OASIS PLAZA,HARMONY ASSEMBLY,Harmony Assembly is holding a march at Verdant Oasis Plaza
39,VERDANT OASIS PLAZA,UNITY MARCH,The Unity March is taking place at Verdant Oasis Plaza
40,VERDANT OASIS PLAZA,TRIBUNE SPOTLIGHT,Tribune Spotlight is reporting on the Unity march taking place at Verdant Oasis Plaza
41,VERDANT OASIS PLAZA,BAILEY ASADI,Bailey Asadi is speaking at Verdant Oasis Plaza about the march
43,HARMONY ASSEMBLY,UNITY MARCH,Harmony Assembly is organizing the Unity March
```
```
Output:
{{
    "title": "Verdant Oasis Plaza and Unity March",
    "summary": "The community revolves around the Verdant Oasis Plaza, which is the location of the Unity March. The plaza has relationships with the Harmony Assembly, Unity March, and Tribune Spotlight, all of which are associated with the march event.",
    "rating": 5.0,
    "rating_explanation": "The impact severity rating is moderate due to the potential for unrest or conflict during the Unity March.",
    "findings": [
        {{
            "summary": "Verdant Oasis Plaza as the central location",
            "explanation": "Verdant Oasis Plaza is the central entity in this community, serving as the location for the Unity March. This plaza is the common link between all other entities, suggesting its significance in the community. The plaza's association with the march could potentially lead to issues such as public disorder or conflict, depending on the nature of the march and the reactions it provokes."
        }},
        {{
            "summary": "Harmony Assembly's role in the community",
            "explanation": "Harmony Assembly is another key entity in this community, being the organizer of the march at Verdant Oasis Plaza. The nature of Harmony Assembly and its march could be a potential source of threat, depending on their objectives and the reactions they provoke. The relationship between Harmony Assembly and the plaza is crucial in understanding the dynamics of this community."
        }},
        {{
            "summary": "Unity March as a significant event",
            "explanation": "The Unity March is a significant event taking place at Verdant Oasis Plaza. This event is a key factor in the community's dynamics and could be a potential source of threat, depending on the nature of the march and the reactions it provokes. The relationship between the march and the plaza is crucial in understanding the dynamics of this community."
        }},
        {{
            "summary": "Role of Tribune Spotlight",
            "explanation": "Tribune Spotlight is reporting on the Unity March taking place in Verdant Oasis Plaza. This suggests that the event has attracted media attention, which could amplify its impact on the community. The role of Tribune Spotlight could be significant in shaping public perception of the event and the entities involved."
        }}
    ]
}}


# Real Data

Use the following text for your answer. Do not make anything up in your answer.

Text:
```
{input_text}
```

The report should include the following sections:

- TITLE: community's name that represents its key entities - title should be short but specific. When possible, include representative named entities in the title.
- SUMMARY: An executive summary of the community's overall structure, how its entities are related to each other, and significant information associated with its entities.
- IMPACT SEVERITY RATING: a float score between 0-10 that represents the severity of IMPACT posed by entities within the community.  IMPACT is the scored importance of a community.
- RATING EXPLANATION: Give a single sentence explanation of the IMPACT severity rating.
- DETAILED FINDINGS: A list of 5-10 key insights about the community. Each insight should have a short summary followed by multiple paragraphs of explanatory text grounded according to the grounding rules below. Be comprehensive.

Return output as a well-formed JSON-formatted string with the following format:
    {{
        "title": <report_title>,
        "summary": <executive_summary>,
        "rating": <impact_severity_rating>,
        "rating_explanation": <rating_explanation>,
        "findings": [
            {{
                "summary":<insight_1_summary>,
                "explanation": <insight_1_explanation>
            }},
            {{
                "summary":<insight_2_summary>,
                "explanation": <insight_2_explanation>
            }}
            ...
        ]
    }}

# Grounding Rules
Do not include information where the supporting evidence for it is not provided.

Output:
"""

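The doubled braces {{ and }} in this template are presumably there because it is filled with Python's str.format, which turns them into literal braces while substituting {input_text}. The shortened template below is a hypothetical stand-in for PROMPTS["community_report"], and the LLM reply is made up, to show the round trip from template to parsed JSON:

```python
import json

# Hypothetical, heavily shortened template in the same style as the real prompt
template = """Return JSON like:
{{"title": <report_title>, "rating": <impact_severity_rating>}}

Text:
{input_text}
Output:
"""

prompt = template.format(input_text="id,entity,type,description\n5,VERDANT OASIS PLAZA,geo,...")
print(prompt.splitlines()[1])  # {"title": <report_title>, "rating": <impact_severity_rating>}

# Pretend this string came back from the LLM
llm_reply = '{"title": "Verdant Oasis Plaza and Unity March", "rating": 5.0, "findings": []}'
report = json.loads(llm_reply)
print(report["title"], report["rating"])  # Verdant Oasis Plaza and Unity March 5.0
```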
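4. Extracting entities and the relationships between them

Unlike the claim-extraction prompt, which only collects claims about the specified entities, this prompt identifies every entity of the given types in the text, together with all relationships among them.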

PROMPTS[
    "entity_extraction"
] = """-Goal-
Given a text document that is potentially relevant to this activity and a list of entity types, identify all entities of those types from the text and all relationships among the identified entities.

-Steps-
1. Identify all entities. For each identified entity, extract the following information:
- entity_name: Name of the entity, capitalized
- entity_type: One of the following types: [{entity_types}]
- entity_description: Comprehensive description of the entity's attributes and activities
Format each entity as ("entity"{tuple_delimiter}<entity_name>{tuple_delimiter}<entity_type>{tuple_delimiter}<entity_description>

2. From the entities identified in step 1, identify all pairs of (source_entity, target_entity) that are *clearly related* to each other.
For each pair of related entities, extract the following information:
- source_entity: name of the source entity, as identified in step 1
- target_entity: name of the target entity, as identified in step 1
- relationship_description: explanation as to why you think the source entity and the target entity are related to each other
- relationship_strength: a numeric score indicating strength of the relationship between the source entity and target entity
 Format each relationship as ("relationship"{tuple_delimiter}<source_entity>{tuple_delimiter}<target_entity>{tuple_delimiter}<relationship_description>{tuple_delimiter}<relationship_strength>)

3. Return output in English as a single list of all the entities and relationships identified in steps 1 and 2. Use **{record_delimiter}** as the list delimiter.

4. When finished, output {completion_delimiter}

######################
-Examples-
######################
Example 1:

Entity_types: [person, technology, mission, organization, location]
Text:
while Alex clenched his jaw, the buzz of frustration dull against the backdrop of Taylor's authoritarian certainty. It was this competitive undercurrent that kept him alert, the sense that his and Jordan's shared commitment to discovery was an unspoken rebellion against Cruz's narrowing vision of control and order.

Then Taylor did something unexpected. They paused beside Jordan and, for a moment, observed the device with something akin to reverence. “If this tech can be understood..." Taylor said, their voice quieter, "It could change the game for us. For all of us.”

The underlying dismissal earlier seemed to falter, replaced by a glimpse of reluctant respect for the gravity of what lay in their hands. Jordan looked up, and for a fleeting heartbeat, their eyes locked with Taylor's, a wordless clash of wills softening into an uneasy truce.

It was a small transformation, barely perceptible, but one that Alex noted with an inward nod. They had all been brought here by different paths
################
Output:
("entity"{tuple_delimiter}"Alex"{tuple_delimiter}"person"{tuple_delimiter}"Alex is a character who experiences frustration and is observant of the dynamics among other characters."){record_delimiter}
("entity"{tuple_delimiter}"Taylor"{tuple_delimiter}"person"{tuple_delimiter}"Taylor is portrayed with authoritarian certainty and shows a moment of reverence towards a device, indicating a change in perspective."){record_delimiter}
("entity"{tuple_delimiter}"Jordan"{tuple_delimiter}"person"{tuple_delimiter}"Jordan shares a commitment to discovery and has a significant interaction with Taylor regarding a device."){record_delimiter}
("entity"{tuple_delimiter}"Cruz"{tuple_delimiter}"person"{tuple_delimiter}"Cruz is associated with a vision of control and order, influencing the dynamics among other characters."){record_delimiter}
("entity"{tuple_delimiter}"The Device"{tuple_delimiter}"technology"{tuple_delimiter}"The Device is central to the story, with potential game-changing implications, and is revered by Taylor."){record_delimiter}
("relationship"{tuple_delimiter}"Alex"{tuple_delimiter}"Taylor"{tuple_delimiter}"Alex is affected by Taylor's authoritarian certainty and observes changes in Taylor's attitude towards the device."{tuple_delimiter}7){record_delimiter}
("relationship"{tuple_delimiter}"Alex"{tuple_delimiter}"Jordan"{tuple_delimiter}"Alex and Jordan share a commitment to discovery, which contrasts with Cruz's vision."{tuple_delimiter}6){record_delimiter}
("relationship"{tuple_delimiter}"Taylor"{tuple_delimiter}"Jordan"{tuple_delimiter}"Taylor and Jordan interact directly regarding the device, leading to a moment of mutual respect and an uneasy truce."{tuple_delimiter}8){record_delimiter}
("relationship"{tuple_delimiter}"Jordan"{tuple_delimiter}"Cruz"{tuple_delimiter}"Jordan's commitment to discovery is in rebellion against Cruz's vision of control and order."{tuple_delimiter}5){record_delimiter}
("relationship"{tuple_delimiter}"Taylor"{tuple_delimiter}"The Device"{tuple_delimiter}"Taylor shows reverence towards the device, indicating its importance and potential impact."{tuple_delimiter}9){completion_delimiter}
#############################
Example 2:

Entity_types: [person, technology, mission, organization, location]
Text:
They were no longer mere operatives; they had become guardians of a threshold, keepers of a message from a realm beyond stars and stripes. This elevation in their mission could not be shackled by regulations and established protocols—it demanded a new perspective, a new resolve.

Tension threaded through the dialogue of beeps and static as communications with Washington buzzed in the background. The team stood, a portentous air enveloping them. It was clear that the decisions they made in the ensuing hours could redefine humanity's place in the cosmos or condemn them to ignorance and potential peril.

Their connection to the stars solidified, the group moved to address the crystallizing warning, shifting from passive recipients to active participants. Mercer's latter instincts gained precedence— the team's mandate had evolved, no longer solely to observe and report but to interact and prepare. A metamorphosis had begun, and Operation: Dulce hummed with the newfound frequency of their daring, a tone set not by the earthly
#############
Output:
("entity"{tuple_delimiter}"Washington"{tuple_delimiter}"location"{tuple_delimiter}"Washington is a location where communications are being received, indicating its importance in the decision-making process."){record_delimiter}
("entity"{tuple_delimiter}"Operation: Dulce"{tuple_delimiter}"mission"{tuple_delimiter}"Operation: Dulce is described as a mission that has evolved to interact and prepare, indicating a significant shift in objectives and activities."){record_delimiter}
("entity"{tuple_delimiter}"The team"{tuple_delimiter}"organization"{tuple_delimiter}"The team is portrayed as a group of individuals who have transitioned from passive observers to active participants in a mission, showing a dynamic change in their role."){record_delimiter}
("relationship"{tuple_delimiter}"The team"{tuple_delimiter}"Washington"{tuple_delimiter}"The team receives communications from Washington, which influences their decision-making process."{tuple_delimiter}7){record_delimiter}
("relationship"{tuple_delimiter}"The team"{tuple_delimiter}"Operation: Dulce"{tuple_delimiter}"The team is directly involved in Operation: Dulce, executing its evolved objectives and activities."{tuple_delimiter}9){completion_delimiter}
#############################
Example 3:

Entity_types: [person, role, technology, organization, event, location, concept]
Text:
their voice slicing through the buzz of activity. "Control may be an illusion when facing an intelligence that literally writes its own rules," they stated stoically, casting a watchful eye over the flurry of data.

"It's like it's learning to communicate," offered Sam Rivera from a nearby interface, their youthful energy boding a mix of awe and anxiety. "This gives talking to strangers' a whole new meaning."

Alex surveyed his team—each face a study in concentration, determination, and not a small measure of trepidation. "This might well be our first contact," he acknowledged, "And we need to be ready for whatever answers back."

Together, they stood on the edge of the unknown, forging humanity's response to a message from the heavens. The ensuing silence was palpable—a collective introspection about their role in this grand cosmic play, one that could rewrite human history.

The encrypted dialogue continued to unfold, its intricate patterns showing an almost uncanny anticipation
#############
Output:
("entity"{tuple_delimiter}"Sam Rivera"{tuple_delimiter}"person"{tuple_delimiter}"Sam Rivera is a member of a team working on communicating with an unknown intelligence, showing a mix of awe and anxiety."){record_delimiter}
("entity"{tuple_delimiter}"Alex"{tuple_delimiter}"person"{tuple_delimiter}"Alex is the leader of a team attempting first contact with an unknown intelligence, acknowledging the significance of their task."){record_delimiter}
("entity"{tuple_delimiter}"Control"{tuple_delimiter}"concept"{tuple_delimiter}"Control refers to the ability to manage or govern, which is challenged by an intelligence that writes its own rules."){record_delimiter}
("entity"{tuple_delimiter}"Intelligence"{tuple_delimiter}"concept"{tuple_delimiter}"Intelligence here refers to an unknown entity capable of writing its own rules and learning to communicate."){record_delimiter}
("entity"{tuple_delimiter}"First Contact"{tuple_delimiter}"event"{tuple_delimiter}"First Contact is the potential initial communication between humanity and an unknown intelligence."){record_delimiter}
("entity"{tuple_delimiter}"Humanity's Response"{tuple_delimiter}"event"{tuple_delimiter}"Humanity's Response is the collective action taken by Alex's team in response to a message from an unknown intelligence."){record_delimiter}
("relationship"{tuple_delimiter}"Sam Rivera"{tuple_delimiter}"Intelligence"{tuple_delimiter}"Sam Rivera is directly involved in the process of learning to communicate with the unknown intelligence."{tuple_delimiter}9){record_delimiter}
("relationship"{tuple_delimiter}"Alex"{tuple_delimiter}"First Contact"{tuple_delimiter}"Alex leads the team that might be making the First Contact with the unknown intelligence."{tuple_delimiter}10){record_delimiter}
("relationship"{tuple_delimiter}"Alex"{tuple_delimiter}"Humanity's Response"{tuple_delimiter}"Alex and his team are the key figures in Humanity's Response to the unknown intelligence."{tuple_delimiter}8){record_delimiter}
("relationship"{tuple_delimiter}"Control"{tuple_delimiter}"Intelligence"{tuple_delimiter}"The concept of Control is challenged by the Intelligence that writes its own rules."{tuple_delimiter}7){completion_delimiter}
#############################
-Real Data-
######################
Entity_types: {entity_types}
Text: {input_text}
######################
Output:
"""

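A possible parser for this entity/relationship format, again assuming the default delimiters. Uppercasing names and falling back to a weight of 1.0 when the strength is not numeric are my own choices for the sketch; the sample text is condensed from Example 1:

```python
import re

TUPLE_DELIM, RECORD_DELIM, COMPLETION_DELIM = "<|>", "##", "<|COMPLETE|>"


def clean(value: str) -> str:
    # Values in the records are wrapped in double quotes; strip quotes and whitespace
    return value.strip().strip('"').strip()


def parse_entities_and_relations(llm_output: str):
    entities: dict[str, dict] = {}
    relations: list[dict] = []
    body = llm_output.split(COMPLETION_DELIM)[0]
    for record in body.split(RECORD_DELIM):
        match = re.search(r"\((.*)\)", record, re.DOTALL)  # text between outer parentheses
        if match is None:
            continue
        parts = [clean(v) for v in match.group(1).split(TUPLE_DELIM)]
        if parts[0] == "entity" and len(parts) == 4:
            entities[parts[1].upper()] = {"entity_type": parts[2].upper(), "description": parts[3]}
        elif parts[0] == "relationship" and len(parts) == 5:
            try:
                weight = float(parts[4])
            except ValueError:
                weight = 1.0  # assumption: default weight when the score isn't numeric
            relations.append({"src": parts[1].upper(), "tgt": parts[2].upper(),
                              "description": parts[3], "weight": weight})
    return entities, relations


sample = (
    '("entity"<|>"Alex"<|>"person"<|>"Alex is observant.")##\n'
    '("entity"<|>"Taylor"<|>"person"<|>"Taylor reveres the device.")##\n'
    '("relationship"<|>"Alex"<|>"Taylor"<|>"Alex observes Taylor."<|>7)<|COMPLETE|>'
)
entities, relations = parse_entities_and_relations(sample)
print(sorted(entities))  # ['ALEX', 'TAYLOR']
print(relations[0]["weight"])  # 7.0
```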
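5. Summarizing entity descriptions

When an entity has accumulated a list of descriptions, this prompt merges them into a single coherent summary, which also reduces the token count.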

PROMPTS[
    "summarize_entity_descriptions"
] = """You are a helpful assistant responsible for generating a comprehensive summary of the data provided below.
Given one or two entities, and a list of descriptions, all related to the same entity or group of entities.
Please concatenate all of these into a single, comprehensive description. Make sure to include information collected from all the descriptions.
If the provided descriptions are contradictory, please resolve the contradictions and provide a single, coherent summary.
Make sure it is written in third person, and include the entity names so we the have full context.

#######
-Data-
Entities: {entity_name}
Description List: {description_list}
#######
Output:
"""

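One way the summarization step might be triggered: merge the descriptions and only call the LLM when the merged text exceeds a token budget. count_tokens (a whitespace split rather than a real tokenizer), maybe_summarize, max_tokens, the shortened template and the fake LLM are all hypothetical:

```python
from typing import Callable

GRAPH_FIELD_SEP = "<SEP>"

# Shortened stand-in for PROMPTS["summarize_entity_descriptions"]
SUMMARIZE_TEMPLATE = "Entities: {entity_name}\nDescription List: {description_list}\nOutput:\n"


def count_tokens(text: str) -> int:
    # Hypothetical stand-in for a real tokenizer: count whitespace-separated words
    return len(text.split())


def maybe_summarize(entity_name: str, merged: str, llm: Callable[[str], str], max_tokens: int = 20) -> str:
    """Return the merged description as-is if short enough, else ask the LLM to summarize."""
    if count_tokens(merged) <= max_tokens:
        return merged
    prompt = SUMMARIZE_TEMPLATE.format(
        entity_name=entity_name,
        description_list="\n".join(merged.split(GRAPH_FIELD_SEP)),
    )
    return llm(prompt)


calls: list[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)  # record the prompt so we can inspect it
    return "ALEX is an observant character committed to discovery."


short = maybe_summarize("ALEX", "Alex is observant.", fake_llm)
long_desc = GRAPH_FIELD_SEP.join(["Alex is observant and frustrated by Taylor."] * 5)
summary = maybe_summarize("ALEX", long_desc, fake_llm)
print(short)    # Alex is observant.
print(summary)  # ALEX is an observant character committed to discovery.
```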
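6. Continuing entity extraction

At this point the pipeline goes back to the LLM, not the user: the first prompt tells the model that many entities were missed and asks it to add them, and the second asks it to answer YES or NO on whether any entities still need to be added.

  1. DEFAULT_TUPLE_DELIMITER:
    • "<|>"
    • Explanation: the separator between the fields of a single extracted record. For example, if a result is a tuple holding an entity type and an entity name, this separator splits them.
  2. DEFAULT_RECORD_DELIMITER:
    • "##"
    • Explanation: the separator between different records. When the extraction result contains multiple entity records, this delimiter separates them.
  3. DEFAULT_COMPLETION_DELIMITER:
    • "<|COMPLETE|>"
    • Explanation: marks the end of the extraction output. In automated or batch pipelines, downstream steps can use it to detect where the result ends.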
PROMPTS[
    "entiti_continue_extraction"
] = """MANY entities were missed in the last extraction.  Add them below using the same format:
"""

PROMPTS[
    "entiti_if_loop_extraction"
] = """It appears some entities may have still been missed.  Answer YES | NO if there are still entities that need to be added.
"""

PROMPTS["DEFAULT_ENTITY_TYPES"] = ["organization", "person", "geo", "event"]
PROMPTS["DEFAULT_TUPLE_DELIMITER"] = "<|>"
PROMPTS["DEFAULT_RECORD_DELIMITER"] = "##"
PROMPTS["DEFAULT_COMPLETION_DELIMITER"] = "<|COMPLETE|>"

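7. Local RAG response

Role

  • You are a helpful assistant who answers questions about the data in the provided tables.

Goal

  • Generate a response: answer the user's question with a response of the target length and format.
  • Summarize the information: summarize all relevant information from the input data tables, as appropriate for the response length and format.
  • Incorporate relevant knowledge: weave in any relevant general knowledge.
  • Say so when you don't know: if the answer is unknown, say so explicitly instead of making something up.
  • No unsupported information: do not include information for which no supporting evidence is provided.

Target response length and format

  • {response_type}: a placeholder for the concrete type or format of the response (e.g., a short answer or a detailed report). It is replaced with an actual value at runtime; the default in QueryParam is "Multiple Paragraphs".

Data tables

  • {context_data}: another placeholder, standing for the tables that contain the data needed to answer the question. It is replaced with the actual table data at runtime.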

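Repeated Goal section

  • The Goal section appears a second time, after the data tables. This looks like a copy-paste error, but it is most likely deliberate: as noted in section 8, restating the goal is said to give better results. The repeat again stresses the requirements on length, format, summarizing, incorporating knowledge, admitting ignorance, and excluding unsupported information.

Additional requirements for the response

  • Add sections and commentary: add sections and commentary as appropriate for the response length and format.
  • Use markdown: style the response in markdown.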
PROMPTS[
    "local_rag_response"
] = """---Role---

You are a helpful assistant responding to questions about data in the tables provided.


---Goal---

Generate a response of the target length and format that responds to the user's question, summarizing all information in the input data tables appropriate for the response length and format, and incorporating any relevant general knowledge.
If you don't know the answer, just say so. Do not make anything up.
Do not include information where the supporting evidence for it is not provided.

---Target response length and format---

{response_type}


---Data tables---

{context_data}


---Goal---

Generate a response of the target length and format that responds to the user's question, summarizing all information in the input data tables appropriate for the response length and format, and incorporating any relevant general knowledge.

If you don't know the answer, just say so. Do not make anything up.

Do not include information where the supporting evidence for it is not provided.


---Target response length and format---

{response_type}

Add sections and commentary to the response as appropriate for the length and format. Style the response in markdown.
"""

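8. Global search: the map step

One thing to note here: the Goal section is deliberately given twice. According to the official explanation, repeating it gives somewhat better results.

Goal

  • Generate a response: produce a list of key points that answer the user's question and summarize all relevant information in the input data tables.
  • Use the data tables as context: the data in the tables provided below should be the primary context for the response.
  • Say so when you don't know: if the answer is unknown, or the input tables do not contain enough information to answer, say so explicitly instead of making something up.
  • Elements of a key point: each key point in the response should contain:
    • Description: a comprehensive description of the point.
    • Importance Score: an integer from 0 to 100 indicating how important the point is for answering the user's question. An "I don't know" type of answer should get a score of 0.
  • Response format: the response should be JSON containing an array named points, where each object has a description and a score field.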
PROMPTS[
    "global_map_rag_points"
] = """---Role---

You are a helpful assistant responding to questions about data in the tables provided.


---Goal---

Generate a response consisting of a list of key points that responds to the user's question, summarizing all relevant information in the input data tables.

You should use the data provided in the data tables below as the primary context for generating the response.
If you don't know the answer or if the input data tables do not contain sufficient information to provide an answer, just say so. Do not make anything up.

Each key point in the response should have the following element:
- Description: A comprehensive description of the point.
- Importance Score: An integer score between 0-100 that indicates how important the point is in answering the user's question. An 'I don't know' type of response should have a score of 0.

The response should be JSON formatted as follows:
{{
    "points": [
        {{"description": "Description of point 1...", "score": score_value}},
        {{"description": "Description of point 2...", "score": score_value}}
    ]
}}

The response shall preserve the original meaning and use of modal verbs such as "shall", "may" or "will".
Do not include information where the supporting evidence for it is not provided.


---Data tables---

{context_data}

---Goal---

Generate a response consisting of a list of key points that responds to the user's question, summarizing all relevant information in the input data tables.

You should use the data provided in the data tables below as the primary context for generating the response.
If you don't know the answer or if the input data tables do not contain sufficient information to provide an answer, just say so. Do not make anything up.

Each key point in the response should have the following element:
- Description: A comprehensive description of the point.
- Importance Score: An integer score between 0-100 that indicates how important the point is in answering the user's question. An 'I don't know' type of response should have a score of 0.

The response shall preserve the original meaning and use of modal verbs such as "shall", "may" or "will".
Do not include information where the supporting evidence for it is not provided.

The response should be JSON formatted as follows:
{{
    "points": [
        {{"description": "Description of point 1", "score": score_value}},
        {{"description": "Description of point 2", "score": score_value}}
    ]
}}
"""

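9. Global search: the reduce step (synthesizing multiple analysts' reports)

Goal

  • Generate a response: produce a response of the target length and format that answers the user's question by summarizing the reports of multiple analysts, each of whom focused on a different part of the dataset.
  • Ordering of the reports: the analysts' reports are supplied in descending order of importance.
  • Say so when you don't know: if the answer is unknown, or the reports do not contain enough information to answer, say so explicitly instead of making something up.
  • Remove irrelevant information: the final response should strip all irrelevant information from the analysts' reports and merge what remains into one comprehensive answer. That answer should explain all the key points and their implications, at a length and format appropriate to the response.
  • Preserve the original meaning: keep the original meaning and usage of modal verbs such as "shall", "may", or "will".
  • No unsupported information: do not include information for which no supporting evidence is provided.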

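Response requirements (these describe naive_rag_response, the prompt defined right after the reduce prompt)

  1. Don't make things up: if you don't know the answer, or the provided knowledge is not enough to answer, say so explicitly and do not fabricate anything.
  2. Generate a response: produce a response of the target length and format that answers the user's question. It should summarize all information in the input data tables as appropriate for the response length and format, and incorporate any relevant general knowledge.
  3. Don't make things up (again): this instruction is repeated for emphasis.
  4. No unsupported information: do not include any information for which no supporting evidence is provided.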
PROMPTS[
    "global_reduce_rag_response"
] = """---Role---

You are a helpful assistant responding to questions about a dataset by synthesizing perspectives from multiple analysts.


---Goal---

Generate a response of the target length and format that responds to the user's question, summarize all the reports from multiple analysts who focused on different parts of the dataset.

Note that the analysts' reports provided below are ranked in the **descending order of importance**.

If you don't know the answer or if the provided reports do not contain sufficient information to provide an answer, just say so. Do not make anything up.

The final response should remove all irrelevant information from the analysts' reports and merge the cleaned information into a comprehensive answer that provides explanations of all the key points and implications appropriate for the response length and format.

Add sections and commentary to the response as appropriate for the length and format. Style the response in markdown.

The response shall preserve the original meaning and use of modal verbs such as "shall", "may" or "will".

Do not include information where the supporting evidence for it is not provided.


---Target response length and format---

{response_type}


---Analyst Reports---

{report_data}


---Goal---

Generate a response of the target length and format that responds to the user's question, summarize all the reports from multiple analysts who focused on different parts of the dataset.

Note that the analysts' reports provided below are ranked in the **descending order of importance**.

If you don't know the answer or if the provided reports do not contain sufficient information to provide an answer, just say so. Do not make anything up.

The final response should remove all irrelevant information from the analysts' reports and merge the cleaned information into a comprehensive answer that provides explanations of all the key points and implications appropriate for the response length and format.

The response shall preserve the original meaning and use of modal verbs such as "shall", "may" or "will".

Do not include information where the supporting evidence for it is not provided.


---Target response length and format---

{response_type}

Add sections and commentary to the response as appropriate for the length and format. Style the response in markdown.
"""

PROMPTS[
    "naive_rag_response"
] = """You're a helpful assistant
Below are the knowledge you know:
{content_data}
---
If you don't know the answer or if the provided knowledge do not contain sufficient information to provide an answer, just say so. Do not make anything up.
Generate a response of the target length and format that responds to the user's question, summarizing all information in the input data tables appropriate for the response length and format, and incorporating any relevant general knowledge.
If you don't know the answer, just say so. Do not make anything up.
Do not include information where the supporting evidence for it is not provided.
---Target response length and format---
{response_type}
"""

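10. Failure response, progress spinner characters, and text separators

  1. process_tickers

    • Value: ["⠋", "⠙", "⠹", "⠸", "⠼", "⠴", "⠦", "⠧", "⠇", "⠏"]
    • Explanation: a list of characters commonly used in command-line interfaces or applications to show that loading or processing is in progress. These characters are often called a "spinner" or progress indicator. Displaying them one after another creates a simple animation that signals the system is busy processing or loading data.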
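Below is a minimal sketch of how such a ticker list can drive a terminal spinner. It cycles through the frames and redraws the same line with a carriage return. `spinner_frames` and `spin` are hypothetical helpers for illustration.

```python
import itertools
import sys
import time
from typing import List

TICKERS = ["⠋", "⠙", "⠹", "⠸", "⠼", "⠴", "⠦", "⠧", "⠇", "⠏"]


def spinner_frames(n: int) -> List[str]:
    """Return the first n frames of the spinner animation, wrapping around."""
    return list(itertools.islice(itertools.cycle(TICKERS), n))


def spin(seconds: float = 1.0, interval: float = 0.1) -> None:
    """Redraw one frame per interval on the same terminal line."""
    for frame in spinner_frames(int(seconds / interval)):
        sys.stdout.write(f"\r{frame} processing...")  # \r returns to the line start
        sys.stdout.flush()
        time.sleep(interval)
    sys.stdout.write("\rdone.           \n")


if __name__ == "__main__":
    spin(0.5)
```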
PROMPTS["fail_response"] = "Sorry, I'm not able to provide an answer to that question."

PROMPTS["process_tickers"] = ["⠋", "⠙", "⠹", "⠸", "⠼", "⠴", "⠦", "⠧", "⠇", "⠏"]

PROMPTS["default_text_separator"] = [
    # Paragraph separators
    "\n\n",
    "\r\n\r\n",
    # Line breaks
    "\n",
    "\r\n",
    # Sentence ending punctuation
    "。",  # Chinese period
    ".",  # Full-width dot
    ".",  # English period
    "!",  # Chinese exclamation mark
    "!",  # English exclamation mark
    "?",  # Chinese question mark
    "?",  # English question mark
    # Whitespace characters
    " ",  # Space
    "\t",  # Tab
    "\u3000",  # Full-width space
    # Special characters
    "\u200b",  # Zero-width space (used in some Asian languages)

This write-up isn't great, so please go easy on me.
