reference:

LangChain & DeepLearning.AI

Overview

LangChain is an open-source development framework for building LLM applications. It currently ships as two packages, Python and JavaScript, both focused on composition and modularity.

Features:

  • Many individual components that can be used with each other or on their own

  • A wide range of use cases, and it is very easy to get started

Key Components

  • Models

    • LLMs: 20+ integrations
    • Chat Models
    • Text Embedding Models: 10+ integrations
  • Prompts

    • Prompt Templates
    • Output Parsers: 5+ implementations
      • Retry/fixing logic
    • Example Selectors: 5+ implementations
  • Indexes: ways of ingesting data so that it can be combined with models

    • Document Loaders: 50+ implementations
    • Text Splitters: 10+ implementations
    • Vector stores: 10+ integrations
    • Retrievers: 5+ integrations/implementations
  • Chains

    • Prompt + LLM + Output parsing
    • Can be used as building blocks for longer chains
    • More application specific chains: 20+ types
  • Agents

    • Agent Types
    • Agent Toolkits

Models, Prompts and Parsers

The following code is meant to be run in a Jupyter notebook.

Chat API: OpenAI

We start by calling GPT in the ordinary way, and use that to introduce calling GPT through LangChain.

Calling the OpenAI API directly

import openai
import os
#pip install python-dotenv
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file
openai.api_key = os.environ['OPENAI_API_KEY']
# Helper function for calling the ChatGPT API
def get_completion(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0,
    )
    return response.choices[0].message["content"]
get_completion("What is 1+1?")

‘1+1 equals 2.’

To motivate LangChain's abstractions for models, prompts and parsers, suppose we receive an email from a customer written in a language other than standard English.

customer_email = """
Arrr,I be fuming that me blender lid \
flew off and splattered me kitchen walls \
with smoothie!And to make matters worse,\
the warranty don't cover the cost of \
cleaning up me kitchen.I need yer help \
right now,matey!
"""

Ask the LLM to translate the text into American English, in a calm and respectful tone.

style = """American English \
in a calm and respectful tone.
"""
prompt = f"""Translate the text \
that is delimited by triple backticks
into a style that is {style}.
text: ```{customer_email}```
"""
print(prompt)

Translate the text that is delimited by triple backticks
into a style that is American English in a calm and respectful tone.
.
text: ```
Arrr,I be fuming that me blender lid flew off and splattered me kitchen walls with smoothie!And to make matters worse,the warranty don't cover the cost of cleaning up me kitchen.I need yer help right now,matey!
```

response = get_completion(prompt)  # call GPT
response

‘I am quite frustrated that my blender lid unexpectedly flew off and made a mess of my kitchen walls with smoothie! Additionally, the warranty does not cover the expenses for cleaning up my kitchen. I kindly request your assistance at this moment, my friend.’

Now imagine different customers writing their emails in different languages: English, French, German, and so on. We would need to generate a whole series of prompts to produce these translations.

How does LangChain do this more elegantly?

Its Model abstraction wraps a number of commonly used LLM APIs.

from langchain.chat_models import ChatOpenAI
# This is LangChain's abstraction over the OpenAI chat API
chat = ChatOpenAI()
chat

ChatOpenAI(cache=None, verbose=False, callbacks=None, callback_manager=None, tags=None, metadata=None, client=<class 'openai.api_resources.chat_completion.ChatCompletion'>, model_name='gpt-3.5-turbo', temperature=0.7, model_kwargs={}, openai_api_key='sk-***', openai_api_base='', openai_organization='', openai_proxy='', request_timeout=None, max_retries=6, streaming=False, n=1, max_tokens=None, tiktoken_model_name=None)

All of the above are parameters that can be set when constructing ChatOpenAI(), for example ChatOpenAI(temperature=0.0).
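
For example, to make the outputs reproducible for this use case, the temperature can be set to 0 when constructing the object:

# Re-create the chat model with temperature 0 so the outputs are deterministic
chat = ChatOpenAI(temperature=0.0)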

Define the template

template_string = """Translate the text \
that is delimited by triple backticks \
into a style that is {style}. \
text: ```{text}```
"""

To reuse this template, we import LangChain's prompt template class.

from langchain.prompts import ChatPromptTemplate
prompt_template = ChatPromptTemplate.from_template(template_string)

You can inspect the prompt object with prompt_template.messages[0].prompt

and see which variables the template expects with prompt_template.messages[0].prompt.input_variables
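
As a quick check, both inspection calls can be run directly; for this template the input variables should be style and text:

print(prompt_template.messages[0].prompt)
print(prompt_template.messages[0].prompt.input_variables)  # expected: ['style', 'text']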

Define the values to insert into the template

customer_style = """American English \
in a calm and respectful tone.
"""
customer_email = """
Arrr,I be fuming that me blender lid \
flew off and splattered me kitchen walls \
with smoothie!And to make matters worse,\
the warranty don't cover the cost of \
cleaning up me kitchen.I need yer help \
right now,matey!
"""

Format with the template

customer_messages = prompt_template.format_messages(
    style=customer_style,
    text=customer_email)
print(type(customer_messages))     # a list
print(type(customer_messages[0]))  # the first element is a HumanMessage built from the template
print(customer_messages[0])

<class ‘list’>
<class ‘langchain.schema.messages.HumanMessage’>
content=“Translate the text that is delimited by triple backticks into a style that is American English in a calm and respectful tone.\n. text: ```\nArrr,I be fuming that me blender lid flew off and splattered me kitchen walls with smoothie!And to make matters worse,the warranty don’t cover the cost of cleaning up me kitchen.I need yer help right now,matey!\n```\n” additional_kwargs={} example=False

Call GPT

customer_response = chat(customer_messages)
customer_response.content

“Arrr, I’m quite upset that my blender lid flew off and splattered my kitchen walls with smoothie! And to make matters worse, the warranty doesn’t cover the cost of cleaning up my kitchen. I would greatly appreciate your help at this moment, matey!”

A customer's email might be in another language, or we might want a different target style; in either case we only need to change the corresponding values when formatting the template.
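
For example, to reply to the customer in a different style, the same prompt_template can be reused and only the values change. The service_reply text and pirate style below are illustrative, not part of the notebook code shown above:

# Hypothetical reply from customer service that we want restyled
service_reply = """Hey there customer, \
the warranty does not cover \
cleaning expenses for your kitchen."""

# A different target style for the same template
service_style_pirate = """\
a polite tone that speaks in English Pirate\
"""

service_messages = prompt_template.format_messages(
    style=service_style_pirate,
    text=service_reply)

service_response = chat(service_messages)
print(service_response.content)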

Why use prompt templates

Why use a prompt template rather than just a string?

As we build more complex applications, prompts can become quite long and detailed. A template is an abstraction over the prompt that improves reusability in complex applications; getting more precise results often requires a much more detailed prompt.


LangChain also has built-in prompts for common operations such as summarization, question answering, and connecting to databases or other APIs. By using these built-in prompts you can quickly build working applications without having to write the prompts yourself.
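
As a rough sketch of what a built-in prompt looks like in use (assuming the summarization chain available in this version of LangChain; the document text is made up):

from langchain.chains.summarize import load_summarize_chain
from langchain.schema import Document

# load_summarize_chain bundles a ready-made summarization prompt with the model
docs = [Document(page_content="LangChain is an open-source framework for building LLM applications...")]
summarize_chain = load_summarize_chain(chat, chain_type="stuff")
print(summarize_chain.run(docs))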

Output Parsing

Templates can also prompt the language model to generate output in a specific format, for example using specific keywords.

The example below uses the ReAct framework, LangChain's default chain-of-thought reasoning framework for models.

Chain of thought: gives the model an explicit reasoning process, so that it reaches more accurate conclusions (this becomes very useful later with Agents).

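
As a rough illustration of the ReAct format (made-up content, not actual model output, and lookup_order is a hypothetical tool), the completion is laid out with keywords that an output parser can pick out, interleaving thoughts with tool calls:

# Made-up example of a ReAct-style completion; keywords such as Thought, Action,
# Observation and Final Answer are what the parser looks for
react_trace = """\
Thought: I need to find out when the customer's order shipped.
Action: lookup_order
Action Input: "order 12345"
Observation: The order shipped two days ago.
Thought: I now know the final answer.
Final Answer: Your order shipped two days ago.
"""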

Combining templates with parsers gives a clean abstraction for specifying the input to the LLM, while letting the parser correctly interpret the LLM's output.

Output Parser example

Parsing the LLM's JSON output with LangChain

Example: extract information from a product review and format the output as JSON.

# An example of the desired output
{
    "gift": False,                       # was the item a gift for someone else
    "delivery_days": 5,                  # how many days delivery took
    "price_value": "pretty affordable"   # what the review says about the price
}
customer_review = """\
This leaf blower is pretty amazing. It has four settings:\
candle blower, gentle breeze, windy city, and tornado. \
It arrived in two days, just in time for my wife's \
anniversary present. \
I think my wife liked it so much she was speechless. \
So far I've been the only one using it, and I've been \
using it every other morning to clear the leaves on our lawn. \
It's slightly more expensive than the other leaf blowers \
out there, but I think it's worth it for the extra features.
"""

review_template = """\
For the following text, extract the following information:

gift: Was the item purchased as a gift or present for someone else? \
Answer True if yes, False if not or unknown.

delivery_days: How many days did it take for the product \
to arrive? If this information is not found, output -1.

price_value: Extract any sentences about the value or price,\
and output them as a comma separated Python list.

Format the output as JSON with the following keys:
gift
delivery_days
price_value

text: {text}
"""
from langchain.prompts import ChatPromptTemplate

prompt_template = ChatPromptTemplate.from_template(review_template)
print(prompt_template)
messages = prompt_template.format_messages(text=customer_review)
chat = ChatOpenAI(temperature=0.0)
response = chat(messages)
print(response.content)

Check the type of the response: type(response.content) shows it is actually a string, so trying to index it with a key would raise an error.
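
A quick check makes the point; the content is a plain string, so key lookups fail:

print(type(response.content))   # <class 'str'>
# The next line would raise an error, because a string has no .get() method
# response.content.get('gift')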

Parsing the LLM's output string into a Python dictionary

from langchain.output_parsers import ResponseSchema
from langchain.output_parsers import StructuredOutputParser
gift_schema = ResponseSchema(
    name="gift",
    description="Was the item purchased as a gift for someone else? "
                "Answer True if yes, False if not or unknown.")
delivery_days_schema = ResponseSchema(
    name="delivery_days",
    description="How many days did it take for the product to arrive? "
                "If this information is not found, output -1.")
price_value_schema = ResponseSchema(
    name="price_value",
    description="Extract any sentences about the value or price, "
                "and output them as a comma separated Python list.")

response_schemas = [gift_schema,
                    delivery_days_schema,
                    price_value_schema]
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
format_instructions = output_parser.get_format_instructions()
print(format_instructions)
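
The printed instructions look roughly like the following (approximate; the exact wording is generated by StructuredOutputParser and may differ):

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
    "gift": string  // Was the item purchased as a gift for someone else? Answer True if yes, False if not or unknown.
    "delivery_days": string  // How many days did it take for the product to arrive? If this information is not found, output -1.
    "price_value": string  // Extract any sentences about the value or price, and output them as a comma separated Python list.
}
```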

The new formatting template

review_template_2 = """\
For the following text, extract the following information:

gift: Was the item purchased as a gift for someone else? \
Answer True if yes, False if not or unknown.

delivery_days: How many days did it take for the product\
to arrive? If this information is not found, output -1.

price_value: Extract any sentences about the value or price,\
and output them as a comma separated Python list.

text: {text}

{format_instructions}
"""

prompt = ChatPromptTemplate.from_template(template=review_template_2)

messages = prompt.format_messages(text=customer_review,
                                  format_instructions=format_instructions)
print(messages[0].content)
response = chat(messages)
print(response.content)

```json
{
    "gift": false,
    "delivery_days": "2",
    "price_value": "It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features."
}
```

Convert it to a dictionary:

output_dict = output_parser.parse(response.content)
output_dict

Now values can be looked up by key:

output_dict.get('delivery_days')
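
Based on the JSON response shown above, output_dict is roughly the following dictionary, so the lookup returns '2':

# Roughly the parsed result (derived from the JSON output above)
{'gift': False,
 'delivery_days': '2',
 'price_value': "It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features."}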

LLMs

There are two types of language models, which in LangChain are called:

  • LLM: a language model that takes a string as input and returns a string
  • ChatModels: a language model that takes a list of messages as input and returns a message

The input/output of an LLM is easy to understand: a string. But what about ChatModels? There the input is a list of ChatMessages and the output is a single ChatMessage. A ChatMessage has two required components:

  • content: This is the content of the message.
  • role: This is the role of the entity the ChatMessage is coming from.

LangChain provides several objects to conveniently distinguish between these roles:

  • HumanMessage: A ChatMessage coming from a human/user.
  • AIMessage: A ChatMessage coming from an AI/assistant.
  • SystemMessage: A ChatMessage coming from the system.
  • FunctionMessage: A ChatMessage coming from a function call.

If none of those roles sound right, there is also a ChatMessage class where you can specify the role manually.
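
A minimal sketch of the generic ChatMessage class with a hand-picked role (the role name below is arbitrary):

from langchain.schema import ChatMessage

# Any role string can be supplied when the built-in roles don't fit
msg = ChatMessage(role="reviewer", content="Please double-check this answer.")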

LangChain provides a standard interface for both, but understanding the difference is useful when constructing prompts for a given language model.

The standard interface LangChain provides has two methods:

  • predict: takes in a string, returns a string
  • predict_messages: takes in a list of messages, returns a message

Let's see how to work with these different types of models and these different types of inputs. First, let's import an LLM and a ChatModel.

from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI

llm = OpenAI()
chat_model = ChatOpenAI()

llm.predict("hello")

chat_model.predict("hi!")

The OpenAI and ChatOpenAI objects are essentially just configuration objects. You can initialize them with parameters such as temperature and then pass them around.
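
For instance (the temperature value below is arbitrary, just to illustrate the pattern):

# Both wrappers accept generation parameters at construction time
llm = OpenAI(temperature=0.9)
chat_model = ChatOpenAI(temperature=0.9)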

Next, let's use the predict method to run over a string input.

text = "What would be a good company name \
for a company that makes colorful socks?"

llm.predict(text)

chat_model.predict(text)

Finally, let's use the predict_messages method to run over a list of messages.

from langchain.schema import HumanMessage
text = "What would be a good company name for a company \
that makes colorful socks?"
messages = [HumanMessage(content=text)]

llm.predict_messages(messages)

chat_model.predict_messages(messages)

Prompt templates

The following is how the official documentation describes templates.

Most LLM applications do not pass user input directly into the LLM. Usually they add the user input to a larger piece of text, called a prompt template, which provides additional context about the specific task at hand.

In the previous example, the text we passed to the model contained instructions to generate a company name. For our application, it would be great if the user only had to provide a description of the company/product, without having to worry about giving the model instructions.

PromptTemplates help with exactly this! They bundle up all the logic for going from user input to a fully formatted prompt. This can start off very simple; for example, the prompt to produce the string above is just:

from langchain.prompts import PromptTemplate

prompt = PromptTemplate.from_template("What is a good name for a company that makes {product}?")
# Or equivalently:
# prompt = PromptTemplate(
#     input_variables=["product"],
#     template="What is a good name for a company that makes {product}?")


prompt.format(product="colorful socks")

What is a good name for a company that makes colorful socks?

PromptTemplates can also be used to produce a list of messages. In this case the prompt contains not only information about the content, but also about each message (its role, its position in the list, and so on). Here, what happens most often is that a ChatPromptTemplate is a list of ChatMessageTemplates. Each ChatMessageTemplate contains instructions for how to format that ChatMessage: its role and its content. Let's look at the example below:

from langchain.prompts.chat import ChatPromptTemplate

template = "You are a helpful assistant that translates {input_language} to {output_language}."
human_template = "{text}"

chat_prompt = ChatPromptTemplate.from_messages([
("system", template),
("human", human_template),
])

chat_prompt.format_messages(input_language="English", output_language="French", text="I love programming.")

A ChatPromptTemplate can also be constructed in other ways; see the Prompts section of the documentation for more details.
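
One such alternative (a sketch assuming the message prompt template classes in the same langchain.prompts.chat module) builds the equivalent chat prompt from explicit message templates:

from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

# Build one template object per message, each with its own role
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)

chat_prompt = ChatPromptTemplate.from_messages(
    [system_message_prompt, human_message_prompt])

chat_prompt.format_messages(input_language="English",
                            output_language="French",
                            text="I love programming.")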