RAG (1): Pipeline

Basic RAG Pipeline

# Set up a basic RAG pipeline
# (these import paths are those of llama_index releases before 0.10)
from llama_index import SimpleDirectoryReader
from llama_index import Document
from llama_index import VectorStoreIndex
from llama_index import ServiceContext
from llama_index.llms import OpenAI

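# Load the documents
documents = SimpleDirectoryReader(
    input_files=["./eBook-How-to-Build-a-Career-in-AI.pdf"]
).load_data()

# Merge the loaded documents into a single Document;
# the text is split into chunks later, when the index is built
document = Document(text="\n\n".join([doc.text for doc in documents]))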

# Configure the LLM and the embedding model
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
service_context = ServiceContext.from_defaults(
    llm=llm, embed_model="local:BAAI/bge-small-en-v1.5"
)

# Build the vector index
index = VectorStoreIndex.from_documents([document],
                                        service_context=service_context)

# Create a query engine from the index
query_engine = index.as_query_engine()

# Send a query and get the response
response = query_engine.query(
    "What are steps to take when finding projects to build your experience?"
)
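
To make the steps above more concrete, here is a minimal, library-free sketch of what a vector index and query engine do conceptually: embed each chunk, rank chunks by similarity to the question, and place the best ones in the prompt. Everything in it is an assumption made for illustration. The word-count "embedding", the helper names (`embed`, `cosine`, `retrieve`) and the sample sentences are hypothetical, and this is not how llama_index or the bge model work internally.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, top_k=2):
    """Return the top_k chunks ranked by similarity to the question."""
    q_vec = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, embed(c)), reverse=True)
    return ranked[:top_k]


# Hypothetical sample chunks, not taken from the eBook.
chunks = [
    "Start with small projects and build your experience step by step.",
    "Networking helps you find a job in AI.",
    "Learn linear algebra and statistics for machine learning.",
]
question = "How do I find projects to build experience?"
top_chunks = retrieve(question, chunks, top_k=1)

# In a real pipeline the retrieved text is placed in the LLM prompt as context.
prompt = "Context:\n" + "\n".join(top_chunks) + "\n\nQuestion: " + question
print(prompt)
```

A real embedding model captures meaning rather than exact word overlap, but the retrieve-then-prompt flow is the same.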

Evaluation with answer relevance, context relevance, and groundedness

# Evaluate the RAG pipeline
# Load the evaluation questions, one per line
eval_questions = []
with open('eval_questions.txt', 'r') as file:
    for line in file:
        # Strip the trailing newline and surrounding whitespace
        item = line.strip()
        print(item)
        eval_questions.append(item)

# You can try your own question:
new_question = "What is the right AI job for me?"
eval_questions.append(new_question)

# Initialize TruLens and reset its database
from trulens_eval import Tru
tru = Tru()
tru.reset_database()

# Create a recorder that applies the three evaluation criteria
from utils import get_prebuilt_trulens_recorder
tru_recorder = get_prebuilt_trulens_recorder(query_engine,
                                             app_id="Direct Query Engine")

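# Run the evaluation questions through the query engine while recording
with tru_recorder as recording:
    for question in eval_questions:
        response = query_engine.query(question)

# Fetch the records and feedback results (an empty app_ids list returns all apps)
records, feedback = tru.get_records_and_feedback(app_ids=[])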
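
The three criteria each score a different link in the pipeline: answer relevance asks whether the answer addresses the question, context relevance asks whether the retrieved chunks are relevant to the question, and groundedness asks whether the answer is supported by the retrieved chunks. As a rough sketch of how per-question scores might be rolled up into one number per criterion, assuming each record carries a score between 0 and 1 for each criterion (the `records_example` values below are made-up placeholders, not results from this pipeline, and `summarize` is a hypothetical helper):

```python
from statistics import mean

# Hypothetical per-question scores in [0, 1]; placeholder values, not real results.
records_example = [
    {"Answer Relevance": 0.9, "Context Relevance": 0.5, "Groundedness": 0.8},
    {"Answer Relevance": 1.0, "Context Relevance": 0.3, "Groundedness": 0.6},
]


def summarize(rows):
    """Average each feedback score across all recorded questions."""
    metrics = rows[0].keys()
    return {m: mean(r[m] for r in rows) for m in metrics}


summary = summarize(records_example)
for metric, score in summary.items():
    print(f"{metric}: {score:.2f}")
```

Comparing these averages across query engines is what makes it possible to tell whether a change to retrieval actually helped.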

Sentence Window Retrieval

# Use other retrieval methods to improve retrieval precision
# 1. Sentence window retrieval
from utils import build_sentence_window_index
from utils import get_sentence_window_query_engine

## Build the index
sentence_index = build_sentence_window_index(
    document,
    llm,
    embed_model="local:BAAI/bge-small-en-v1.5",
    save_dir="sentence_index"
)

## Create a query engine from the index
sentence_window_engine = get_sentence_window_query_engine(sentence_index)

## Reset the database (this clears earlier results) and set up a recorder for this engine
tru.reset_database()
tru_recorder_sentence_window = get_prebuilt_trulens_recorder(
    sentence_window_engine,
    app_id="Sentence Window Query Engine"
)

## Run the sentence window query engine on the evaluation questions
for question in eval_questions:
    with tru_recorder_sentence_window as recording:
        response = sentence_window_engine.query(question)
        print(question)
        print(str(response))

## View the leaderboard
tru.get_leaderboard(app_ids=[])

## Show the results in the dashboard
tru.run_dashboard()
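
The idea behind sentence window retrieval is to match the question against single sentences, which keeps the match precise, and then hand the LLM the matched sentence together with its neighbours, which gives it enough context. The following is a toy sketch of that idea only, under stated assumptions: the naive sentence splitter, the word-overlap similarity, all helper names and the sample text are hypothetical, and the code does not reproduce what `build_sentence_window_index` or `get_sentence_window_query_engine` do internally.

```python
import re


def split_sentences(text):
    """Naive sentence splitter on ., ! and ? (a real parser is more careful)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_windows(sentences, window_size=1):
    """Pair each sentence with a window of its neighbours on both sides."""
    nodes = []
    for i, sentence in enumerate(sentences):
        start = max(0, i - window_size)
        end = min(len(sentences), i + window_size + 1)
        nodes.append({"sentence": sentence, "window": " ".join(sentences[start:end])})
    return nodes


def words(text):
    """Lowercase word set used by the toy similarity below."""
    return set(re.findall(r"[a-z]+", text.lower()))


def query_window(question, nodes):
    """Match on the single sentence, but return its surrounding window."""
    best = max(nodes, key=lambda n: len(words(question) & words(n["sentence"])))
    return best["window"]


# Hypothetical sample text, not taken from the eBook.
text = (
    "Pick a project you care about. Build a portfolio of small projects first. "
    "Then share your work with others. Feedback helps you improve."
)
nodes = build_windows(split_sentences(text), window_size=1)
context = query_window("How do I build a portfolio?", nodes)
print(context)
```

A larger `window_size` gives the LLM more surrounding text at the cost of a longer prompt, so it is a parameter worth evaluating rather than guessing.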

Auto-merging Retrieval

# 2. Auto-merging retrieval
from utils import build_automerging_index
from utils import get_automerging_query_engine

## Build the index
automerging_index = build_automerging_index(
    documents,
    llm,
    embed_model="local:BAAI/bge-small-en-v1.5",
    save_dir="merging_index"
)

## Create a query engine from the index
automerging_query_engine = get_automerging_query_engine(
    automerging_index,
)

## Retrieved chunks are merged while the query is answered
auto_merging_response = automerging_query_engine.query(
    "How do I build a portfolio of AI projects?"
)

## Reset the database (this clears earlier results) and set up a recorder for this engine
tru.reset_database()
tru_recorder_automerging = get_prebuilt_trulens_recorder(automerging_query_engine,
                                                         app_id="Automerging Query Engine")

## Run the auto-merging query engine on the evaluation questions
for question in eval_questions:
    with tru_recorder_automerging as recording:
        response = automerging_query_engine.query(question)
        print(question)
        print(response)

## View the leaderboard
tru.get_leaderboard(app_ids=[])

## Show the results in the dashboard
tru.run_dashboard()
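
Auto-merging retrieval stores the text as a hierarchy of larger parent chunks and smaller leaf chunks. Retrieval runs over the leaves, and when enough leaves of the same parent are retrieved, they are replaced by the parent so the LLM sees one coherent passage instead of scattered fragments. The sketch below illustrates only that merge rule on a single level of hierarchy. The `auto_merge` helper, the chunk ids and the 0.5 threshold are assumptions for illustration, and the real retriever behind `get_automerging_query_engine` is more elaborate.

```python
from collections import defaultdict


def auto_merge(retrieved_leaf_ids, parent_of, children_of, threshold=0.5):
    """Replace retrieved leaves by their parent when enough siblings were retrieved."""
    hits_per_parent = defaultdict(list)
    for leaf_id in retrieved_leaf_ids:
        hits_per_parent[parent_of[leaf_id]].append(leaf_id)

    merged = []
    for parent_id, leaves in hits_per_parent.items():
        ratio = len(leaves) / len(children_of[parent_id])
        if ratio > threshold:
            merged.append(parent_id)  # enough siblings: return the larger parent chunk
        else:
            merged.extend(leaves)     # too few siblings: keep the small leaf chunks
    return merged


# Hypothetical two-level hierarchy: each parent chunk has four leaf chunks.
children_of = {
    "parent_A": ["A1", "A2", "A3", "A4"],
    "parent_B": ["B1", "B2", "B3", "B4"],
}
parent_of = {leaf: parent for parent, leaves in children_of.items() for leaf in leaves}

# Suppose similarity search returned three leaves of A and one leaf of B.
retrieved = ["A1", "A2", "B3", "A4"]
merged_ids = auto_merge(retrieved, parent_of, children_of)
print(merged_ids)
```

Here three of the four leaves under `parent_A` were retrieved, so they collapse into the parent, while the single hit under `parent_B` stays a leaf.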
