1、fetch phbase工作流程

  1. The coordinating node identifies which documents need to be fetched and issues a multi GETrequest to the relevant shards.
  2. Each shard loads the documents and enriches them, if required, and then returns the documents to the coordinating node.
  3. Once all documents have been fetched, the coordinating node returns the results to the client.

The coordinating node first decides which documents actually need to be fetched. For instance, if our query specified { "from": 90, "size": 10 }, the first 90 results would be discarded and only the next 10 results would need to be retrieved. These documents may come from one, some, or all of the shards involved in the original search request.

The coordinating node builds a multi-get request for each shard that holds a pertinent document and sends the request to the same shard copy that handled the query phase.

The shard loads the document bodies—the _source field—and, if requested, enriches the results with metadata and search snippet highlighting. Once the coordinating node receives all results, it assembles them into a single response that it returns to the client.

 
 

(1)各个shard 返回的只是各文档的id和排序值( IDs and sort values ,coordinate node根据这些id&sort values 构建完priority queue之后,然后把程序需要的document 的id发送mget请求去所有shard上获取对应的document

(2)各个shard将document返回给coordinate node

(3)coordinate node将合并后的document结果返回给client客户端

 
 

2、一般搜索,如果不加from和size,就默认搜索前10条,按照_score排序

 
 

延伸阅读

Deep Pagination

The query-then-fetch process supports pagination with the from and size parameters, but within limits. Remember that each shard must build a priority queue of length from + size, all of which need to be passed back to the coordinating node. And the coordinating node needs to sort through number_of_shards * (from + size) documents in order to find the correct size documents.

Depending on the size of your documents, the number of shards, and the hardware you are using, paging 10,000 to 50,000 results (1,000 to 5,000 pages) deep should be perfectly doable. But with big-enough from values, the sorting process can become very heavy indeed, using vast amounts of CPU, memory, and bandwidth. For this reason, we strongly advise against deep paging.

In practice, "deep pagers" are seldom human anyway. A human will stop paging after two or three pages and will change the search criteria. The culprits are usually bots or web spiders that tirelessly keep fetching page after page until your servers crumble at the knees.

If you do need to fetch large numbers of docs from your cluster, you can do so efficiently by disabling sorting with the scroll query, which we discuss later in this chapter.

 
 

 
 

 
 

58.fetch phbase的更多相关文章

  1. ES--06

    第51.初识搜索引擎_上机动手实战多搜索条件组合查询 课程大纲 GET /website/article/_search{ "query": { "bool": ...

  2. Swift - 后台获取数据(Background Fetch)的实现

    前面讲了如何让程序申请后台短时运行.但这个额外延长的时间毕竟有限.所以从iOS7起又引入两种在后台运行任务的方式:后台获取和后台通知. 1,后台获取介绍 后台获取(Background Fetch)是 ...

  3. FW: How to use Hibernate Lazy Fetch and Eager Fetch Type – Spring Boot + MySQL

    原帖 https://grokonez.com/hibernate/use-hibernate-lazy-fetch-eager-fetch-type-spring-boot-mysql In the ...

  4. python 读写西门子PLC 包含S7协议和Fetch/Write协议,s7支持200smart,300PLC,1200PLC,1500PLC

    本文将使用一个gitHub开源的组件技术来读写西门子plc数据,使用的是基于以太网的TCP/IP实现,不需要额外的组件,读取操作只要放到后台线程就不会卡死线程,本组件支持超级方便的高性能读写操作 nu ...

  5. kafka启动失败错误:: replica.fetch.max.bytes should be equal or greater than message.max.bytes

    1 详细异常 2019-10-14 14:38:21,260 FATAL kafka.Kafka$: java.lang.IllegalArgumentException: requirement f ...

  6. Git 少用 Pull 多用 Fetch 和 Merge

    本文有点长而且有点乱,但就像Mark Twain Blaise Pascal的笑话里说的那样:我没有时间让它更短些.在Git的邮件列表里有很多关于本文的讨论,我会尽量把其中相关的观点列在下面. 我最常 ...

  7. git提示:Fatal:could not fetch refs from ....

    在git服务器上新建项目提示: Fatal:could not fetch refs from git..... 百度搜索毫无头绪,最后FQgoogle,找到这篇文章http://www.voidcn ...

  8. sublime 插件推荐: Nettuts+ Fetch

    Nettuts+ Fetch github地址:Nettuts-Fetch 在sublime中直接用 ctrl+shift+P -> pci -> Nettuts-Fetch 即可下载 这 ...

  9. git pull和git fetch的区别

    Git中从远程的分支获取最新的版本到本地有这样2个命令:1. git fetch:相当于是从远程获取最新版本到本地,不会自动merge Git fetch origin master git log ...

随机推荐

  1. System.IO.Path 操作

    System.IO.Path 分类: C#2011-03-23 10:54 1073人阅读 评论(0) 收藏 举报 扩展磁盘string2010c System.IO.Path提供了一些处理文件名和路 ...

  2. golomb哥伦布编码——本质上就是通过0来区分商和余数

    哥伦布编码是一个针对整数的变长编码方式,详细介绍可以看维基百科.这里简单介绍下: 哥伦布编码使用指定的整数 M 把输入的整数分成两部分:商数 q.余数 r. 商数当做一元编码,而余数放在后面做为可缩短 ...

  3. [Apple开发者帐户帮助]二、管理你的团队(1)邀请团队成员

    组织可以选择向其开发团队添加其他人员.如果您已加入Apple开发者计划,您将在App Store Connect中管理团队成员.有关详细信息,请转到在App Store Connect帮助中添加用户. ...

  4. BN 详解和使用Tensorflow实现(参数理解)

    Tensorflow   BN具体实现(多种方式): 理论知识(参照大佬):https://blog.csdn.net/hjimce/article/details/50866313 补充知识: ① ...

  5. Oracle调整内存参后报ORA-00844和ORA-00851

    数据库服务器内存由16G增加为64G,为充分利用内存资源,对Oracle内存参数做了如下调整: SQL>alter system set sga_max_size=40960M scope=sp ...

  6. Windows 环境下 Docker 使用及配置

    原文引用: https://www.cnblogs.com/moashen/p/8067612.html 我们可以使用以下两种方式在Windows环境下使用docker: 1. 直接安装: Docke ...

  7. html中canvas渲染图片,并转化成base64格式保存

    最近在做一个上传头像然后保存显示的功能,因为涉及到裁剪大小和尺寸比例,所以直接上传图片再展示的话,就会出现问题,所以就想用canvas来渲染裁剪后的图片,然后转化成base64格式的图片再存储,这样取 ...

  8. https 结合使用 对称加密和非对称加密

    (一)对称加密(Symmetric Cryptography) ---共享密钥加密 对称加密是最快速.最简单的一种加密方式,加密(encryption)与解密(decryption)用的是同样的密钥( ...

  9. ie8及其以下版本兼容性问题之圆角

    解决办法:在http://css3pie.com/页面下载一个PIE.htc的文件,加载到根目录下,然后在css中加上一句behavior:url(../js/PIE.htc);如下: .border ...

  10. 菜鸟使用 centOS 安装 redis 并放入service 启动 记录

    1.下载redis: wget http://download.redis.io/releases/redis-2.8.17.tar.gz 若wget 不可用,请先安装wget yum install ...