Elasticsearch (GEO) Spatial Search Queries

Elasticsearch (GEO) spatial search queries, Python version

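1. Elasticsearch

There is no need to dwell on how powerful ES is: once you have installed the plugins and set up a cluster, you have a working search system.

Cluster tuning and query optimization are a separate topic, of course. This post just records the ES spatial search feature I used recently.

2. ES GEO spatial search

Spatial search, as the name suggests, lets you search by spatial distance and by positional relationship. There are many spatial indexing algorithms and libraries to choose from.

ES has this kind of indexing built in. The details follow.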

Step 1: Create the index

def create_index():
    mapping = {
        "mappings": {
            "poi": {
                "_routing": {
                    "required": True,
                    "path": "city_id"
                },
                "properties": {
                    "id": {
                        "type": "integer"
                    },
                    "geofence_type": {
                        "type": "integer"
                    },
                    "city_id": {
                        "type": "integer"
                    },
                    "city_name": {
                        "type": "string",
                        "index": "not_analyzed"
                    },
                    "activity_id": {
                        "type": "integer"
                    },
                    "post_date": {
                        "type": "date"
                    },
                    "rank": {
                        "type": "float"
                    },
                    # Points and arbitrary shapes both use geo_shape;
                    # the concrete shape is selected by the "type" key,
                    # which lives in the document data
                    "location_point": {
                        "type": "geo_shape"
                    },
                    "location_shape": {
                        "type": "geo_shape"
                    },
                    # Computing the distance between points requires a geo_point field
                    "point": {
                        "type": "geo_point"
                    }
                }
            }
        }
    }
    # The index can also be created without a mapping
    es.create_index(index='mapapp', body=mapping)
    # set_mapping = es_dsl.set_mapping('mapapp', 'poi', body=mapping)

Here we create an index named mapapp, with its mapping configured as shown in mapping.
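The mapping above uses syntax from older ES releases: the path setting under _routing was removed in ES 2.0, the string type with not_analyzed was replaced by keyword in ES 5.x, and typeless mappings (no poi level) are the default from ES 7. As a rough sketch only, assuming a 7.x or later cluster, a comparable mapping might look like this. The function name build_mapping is made up for illustration, and the routing value would have to be passed with each request instead of being read from city_id.

```python
import json


def build_mapping():
    """Return a typeless mapping roughly equivalent to the one in create_index().

    Assumption: ES 7.x or later. This is not the author's original mapping.
    """
    return {
        "mappings": {
            # Routing can still be made mandatory, but its value must be
            # supplied on each index/search request.
            "_routing": {"required": True},
            "properties": {
                "id": {"type": "integer"},
                "geofence_type": {"type": "integer"},
                "city_id": {"type": "integer"},
                # keyword is the exact-match replacement for
                # string + not_analyzed
                "city_name": {"type": "keyword"},
                "activity_id": {"type": "integer"},
                "post_date": {"type": "date"},
                "rank": {"type": "float"},
                "location_point": {"type": "geo_shape"},
                "location_shape": {"type": "geo_shape"},
                "point": {"type": "geo_point"},
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_mapping(), indent=2))
```

The resulting dict is the request body for index creation; how it is sent depends on the client in use.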

2. Bulk-insert data (bulk)

import json

import xlrd

# es, es_handler and GEODocument are the author's own helpers; they are not shown in this post.
def bulk():
    # actions only needs to be an iterable; it does not have to be a list
    workbooks = xlrd.open_workbook('./geo_data.xlsx')
    table = workbooks.sheets()[1]
    colname = list()
    actions = list()
    for i in range(table.nrows):
        # The first row holds the column names
        if i == 0:
            colname = table.row_values(i)
            continue
        # Columns 7-9 hold JSON strings: two geo_shape values and one geo_point
        geo_shape_point = json.loads(table.row_values(i)[7])
        geo_shape_shape = json.loads(table.row_values(i)[8])
        geo_point = json.loads(table.row_values(i)[9])
        raw_data = table.row_values(i)[:7]
        raw_data.extend([geo_shape_point, geo_shape_shape, geo_point])
        source = dict(zip(colname, raw_data))
        geo = GEODocument(**source)
        action = {
            "_index": "mapapp",
            "_type": "poi",
            "_id": table.row_values(i)[0],
            "_routing": geo.city_id,
            #"_source": source,
            "_source": geo.to_json(),
        }
        actions.append(action)
    es.bulk(index='mapapp', actions=actions, es=es_handler, max=25)
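As the comment in bulk() notes, actions only has to be iterable. A minimal sketch of that idea follows, using a generator so the whole sheet never has to sit in a list. iter_actions and its parameters are hypothetical names, and the rows are plain Python lists here rather than an xlrd sheet, so the example runs without the spreadsheet.

```python
def iter_actions(colnames, rows, index="mapapp", doc_type="poi"):
    """Yield one bulk action per data row (rows must not include the header)."""
    for row in rows:
        source = dict(zip(colnames, row))
        yield {
            "_index": index,
            "_type": doc_type,
            "_id": source["id"],
            "_routing": source["city_id"],
            "_source": source,
        }


if __name__ == "__main__":
    cols = ["id", "city_id", "city_name", "point"]
    data = [
        [1, 1, u"北京", {"lat": 55.75, "lon": 37.616667}],
        [2, 1, u"北京", {"lat": 48.8567, "lon": 2.3508}],
    ]
    for action in iter_actions(cols, data):
        print(action["_id"], action["_routing"])
```

With the official elasticsearch-py client, an iterable like this could be handed to elasticsearch.helpers.bulk(client, actions); the es.bulk wrapper used in this post may behave differently.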

Load the test data. The rows in geo_data look like this:

id    geofence_type    city_id    city_name    activity_id    post_date    rank    location_point    location_shape    point

1    1    1    北京    100301    2016/10/20    100.30     {"type":"point","coordinates":[55.75,37.616667]}    {"type":"polygon","coordinates":[[[22,22],[4.87463,52.37254],[4.87875,52.36369],[22,22]]]}    {"lat":55.75,"lon":37.616667}

2    1    1    北京    100302    2016/10/21    12.00     {"type":"point","coordinates":[55.75,37.616668]}    {"type":"polygon","coordinates":[[[0,0],[4.87463,52.37254],[4.87875,52.36369],[0,0]]]}    {"lat":48.8567,"lon":2.3508}

3    1    1    北京    100303    2016/10/22    3432.23     {"type":"point","coordinates":[55.75,37.616669]}    {"type":"polygon","coordinates":[[[4.8833,52.38617],[4.87463,52.37254],[4.87875,52.36369],[4.8833,52.38617]]]}    {"lat":32.75,"lon":37.616668}

4    1    1    北京    100304    2016/10/23    246.80     {"type":"point","coordinates":[52.4796, 2.3508]}    {"type":"polygon","coordinates":[[[4.8833,52.38617],[4.87463,52.37254],[4.87875,52.36369],[4.8833,52.38617]]]}    {"lat":11.56,"lon":37.616669}
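One thing to watch in data like this: a geo_point object names lat and lon explicitly, whereas GeoJSON-style geo_shape coordinates are ordered [lon, lat]. In the first sample row, location_point is [55.75, 37.616667] while point is lat 55.75, lon 37.616667, so the two columns describe different places unless that is intended. A small hypothetical helper (not part of any ES client) that derives the shape from the geo_point avoids the mix-up:

```python
def geo_point_to_shape(point):
    """Convert a {"lat": ..., "lon": ...} geo_point into a geo_shape point.

    GeoJSON order is [longitude, latitude].
    """
    return {"type": "point", "coordinates": [point["lon"], point["lat"]]}


if __name__ == "__main__":
    print(geo_point_to_shape({"lat": 55.75, "lon": 37.616667}))
```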

3. GEO query: distance between two points

# Distance between points
# Sort by distance in ascending order; with size 1 the hit is the nearest point
def sort_by_distance():
    body = {
        "from": 0,
        "size": 1,
        "query": {
            "bool": {
                "must": [{
                    "term": {
                        "geofence_type": 1
                    }
                }, {
                    "term": {
                        "city_id": 1
                    }
                }]
            }
        },
        "sort": [{
            "_geo_distance": {
                "point": {
                    "lat": 8.75,
                    "lon": 37.616
                },
                "unit": "km",
                "order": "asc"
            }
        }]
    }
    for i in es.search(index='mapapp', doc_type='poi', body=body)['hits']['hits']:
        print(type(i), i)
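To make the effect of the _geo_distance sort concrete, the same ordering can be reproduced client-side on the four sample point values. This is only an illustration: it uses the haversine formula with a mean Earth radius, which is a stand-in and not necessarily the exact computation ES performs, and the names haversine_km, POINTS and ids_by_distance are made up for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon pairs."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# The "point" column of the four sample rows, keyed by document id
POINTS = {
    1: {"lat": 55.75, "lon": 37.616667},
    2: {"lat": 48.8567, "lon": 2.3508},
    3: {"lat": 32.75, "lon": 37.616668},
    4: {"lat": 11.56, "lon": 37.616669},
}


def ids_by_distance(origin, points):
    """Return document ids sorted by ascending distance from origin."""
    return sorted(
        points,
        key=lambda pid: haversine_km(origin["lat"], origin["lon"],
                                     points[pid]["lat"], points[pid]["lon"]),
    )


if __name__ == "__main__":
    origin = {"lat": 8.75, "lon": 37.616}
    print(ids_by_distance(origin, POINTS))
```

With size set to 1, the query in sort_by_distance() keeps only the first entry of such an ordering.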

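4. GEO query: bounding-box filter

Tip: as is well known, ES caches filters, so when optimizing queries it is common to pull frequently used conditions out into filters. Unfortunately, GEO filters are not cached by default, so there is nothing to gain from doing that here. To keep the distinction clear, the examples use post_filter, which filters after the query has run; the examples below all follow the same pattern.

# Bounding-box filter: use a box to select points and shapes
# This one implements selection by rectangle
# post_filter filters the query results afterwards; it is commonly used together with aggs
def bounding_filter():
    body = {
        "from": 0,
        "size": 1,
        "query": {
            "bool": {
                "must": [{
                    "term": {
                        "geofence_type": 1
                    }
                }, {
                    "term": {
                        "city_id": 1
                    }
                }]
            }
        },
        "post_filter": {
            "geo_shape": {
                "location_point": {
                    "shape": {
                        "type": "envelope",
                        "coordinates": [[52.4796, 2.3508], [48.8567, -1.903]]
                    },
                    "relation": "within"
                }
            }
        }
    }
    for i in es.search(index='mapapp', doc_type='poi', body=body)['hits']['hits']:
        print(type(i), i)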
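For geo_shape, an envelope is given as two corners, upper-left then lower-right, each in [lon, lat] order, that is [[min_lon, max_lat], [max_lon, min_lat]]. Since this ordering is easy to get wrong, a small helper can build it from any two opposite corners. The helper below is hypothetical (not part of any ES client) and deliberately ignores boxes that cross the antimeridian.

```python
def envelope(lat_a, lon_a, lat_b, lon_b):
    """Build a geo_shape envelope from two opposite corners.

    Output order is [[min_lon, max_lat], [max_lon, min_lat]].
    Assumption: the box does not cross the 180 degree meridian.
    """
    return {
        "type": "envelope",
        "coordinates": [
            [min(lon_a, lon_b), max(lat_a, lat_b)],
            [max(lon_a, lon_b), min(lat_a, lat_b)],
        ],
    }


if __name__ == "__main__":
    # Two corners given as (lat, lon) pairs
    print(envelope(52.4796, -1.903, 48.8567, 2.3508))
```

The returned dict can be used as the "shape" value inside a geo_shape filter.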

5. GEO query: circle selection

# Circle selection: use a circle to select points
# post_filter filters the query results afterwards; it is commonly used together with aggs
def circle_filter():
    body = {
        "from": 0,
        "size": 1,
        "query": {
            "bool": {
                "must": [{
                    "term": {
                        "geofence_type": 1
                    }
                }, {
                    "term": {
                        "city_id": 1
                    }
                }]
            }
        },
        "post_filter": {
            "geo_shape": {
                "location_point": {
                    "shape": {
                        "type": "circle",
                        "radius": "10000km",
                        "coordinates": [22, 45]
                    },
                    "relation": "within"
                }
            }
        }
    }
    for i in es.search(index='mapapp', doc_type='poi', body=body)['hits']['hits']:
        print(type(i), i)

6. GEO query: reverse selection

# Reverse selection: when the point falls inside a shape, that shape is returned
# post_filter filters the query results afterwards; it is commonly used together with aggs
# Also includes a regexp match
def intersects():
    body = {
        "from": 0,
        "size": 1,
        "query": {
            "bool": {
                "must": [{
                    "term": {
                        "geofence_type": 1
                    }
                }, {
                    "regexp": {
                        "city_name": u".*北京.*"
                    }
                }, {
                    "term": {
                        "city_id": 1
                    }
                }]
            }
        },
        "post_filter": {
            "geo_shape": {
                "location_shape": {
                    "shape": {
                        "type": "point",
                        "coordinates": [22, 22]
                    },
                    "relation": "intersects"
                }
            }
        }
    }
    for i in es.search(index='mapapp', doc_type='poi', body=body)['hits']['hits']:
        print(type(i), i)
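The envelope, circle and point examples share one structure: a geo_shape filter on a field, with a shape and a relation, attached as post_filter to the same bool query. A sketch of how that repetition could be factored out is below; geo_shape_filter and search_body are hypothetical helper names, not functions from the post or from any ES client.

```python
def geo_shape_filter(field, shape, relation="intersects"):
    """Build the geo_shape clause used by the three post_filter examples."""
    return {"geo_shape": {field: {"shape": shape, "relation": relation}}}


def search_body(must, post_filter, size=1):
    """Wrap a list of must-clauses and a post_filter into a search body."""
    return {
        "from": 0,
        "size": size,
        "query": {"bool": {"must": must}},
        "post_filter": post_filter,
    }


if __name__ == "__main__":
    must = [{"term": {"geofence_type": 1}}, {"term": {"city_id": 1}}]
    circle = {"type": "circle", "radius": "10000km", "coordinates": [22, 45]}
    body = search_body(must, geo_shape_filter("location_point", circle, "within"))
    print(body)
```

Swapping the shape dict and the relation is then enough to move between the box, circle and reverse-selection variants.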

7. Finally, two spatial aggregation examples for reference

# Spatial aggregation
# Aggregate by distance from a center point
def aggs_geo_distance():
    body = {
        "aggs": {
            "aggs_geopoint": {
                "geo_distance": {
                    "field": "point",
                    "origin": {
                        "lat": 51.5072222,
                        "lon": -0.1275
                    },
                    "unit": "km",
                    "ranges": [
                        {
                            "to": 1000
                        },
                        {
                            "from": 1000,
                            "to": 3000
                        },
                        {
                            "from": 3000
                        }
                    ]
                }
            }
        }
    }
    for i in es.search(index='mapapp', doc_type='poi', body=body)['aggregations']['aggs_geopoint']['buckets']:
        print(type(i), i)
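The ranges in a geo_distance aggregation behave like ordinary range buckets: from is inclusive and to is exclusive, so a document exactly 1000 km from the origin lands in the 1000 to 3000 bucket. The following client-side sketch mimics that assignment for a single distance; bucket_for is a hypothetical name and the code has nothing to do with how ES implements the aggregation internally.

```python
RANGES = [
    {"to": 1000},
    {"from": 1000, "to": 3000},
    {"from": 3000},
]


def bucket_for(distance_km, ranges=RANGES):
    """Return the (from, to) pair of the range containing distance_km.

    "from" is inclusive, "to" is exclusive; a missing bound is open-ended
    and reported as None. Returns None if no range matches.
    """
    for r in ranges:
        low = r.get("from", float("-inf"))
        high = r.get("to", float("inf"))
        if low <= distance_km < high:
            return (r.get("from"), r.get("to"))
    return None


if __name__ == "__main__":
    for d in (12.5, 1000, 2999.9, 8000):
        print(d, bucket_for(d))
```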

# Spatial aggregation
# Geohash algorithm, grid aggregation
# Two aggregations in one request
def aggs_geohash_grid():
    body = {
        "aggs": {
            "new_york": {
                "geohash_grid": {
                    "field": "point",
                    "precision": 5
                }
            },
            "map_zoom": {
                "geo_bounds": {
                    "field": "point"
                }
            }
        }
    }
    for i in es.search(index='mapapp', doc_type='poi', body=body)['aggregations']['new_york']['buckets']:
        print(type(i), i)
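geohash_grid groups points by geohash cell, and precision 5 means each cell is identified by a five-character geohash. For readers unfamiliar with the encoding, here is a compact sketch of the standard geohash algorithm, written independently of ES; geohash_encode is an illustrative name.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=5):
    """Encode a lat/lon pair as a geohash string of the given length."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    value = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            value = value << 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)


if __name__ == "__main__":
    print(geohash_encode(57.64911, 10.40744, precision=5))
```

Points that share the same five-character prefix fall into the same bucket of a precision 5 grid.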
