Published 2021-03-04, updated 2021-12-28. Category: Big Data. 19-minute read (about 2,839 characters).

Using ES

Diving into the functionality

Logical layout

Elasticsearch is document-oriented: the document is the smallest unit that can be indexed and searched.

{
    "name": "Elasticsearch Denver",
    "organizer": "Lee",
    "location": "Denver, Colorado, USA"
}

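A type is a logical container for documents, much as a table contains rows. Documents with different structures go into different types. The definition of the fields within a type is called its mapping, which is comparable to a table schema. An index is a container for types, comparable to a database. The overall structure is:

ES: Document - Type - Index
Relational equivalent: Row - Table - Database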

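Physical layout

By default, Elasticsearch splits each index into 5 shards, and each shard has one replica (this is the default in the older releases these notes describe; 7.0 and later default to 1 primary shard). A shard is the smallest unit that ES moves between nodes.
A node is a single running instance of ES.
A shard is a Lucene index: a directory of files that holds an inverted index.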
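
To make the idea of an inverted index concrete, here is a minimal Python sketch. It is a toy model and not how Lucene stores data on disk: the build_inverted_index helper, the sample documents, and the whitespace tokenization are all assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical documents, keyed by id.
docs = {
    1: "Elasticsearch Denver",
    2: "Elasticsearch San Francisco",
    3: "Denver Clojure",
}
index = build_inverted_index(docs)
print(index["elasticsearch"])  # ids of the documents containing the term
print(index["denver"])
```

Looking up a term is then a single dictionary access, which is why searching by term is cheap compared with scanning every document.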

Creating an index

curl -XPUT 'localhost:9200/get-together/group/1?pretty' -d '{
    "name": "Elasticsearch Denver",
    "organizer": "Lee"
}'

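This curl command indexes a document with id 1 into the group type of the get-together index, with the fields name and organizer. The index and type are created automatically if they do not exist yet, and ES detects the type of each field on its own, which produces the schema (mapping).

-X sets the HTTP method, which defaults to GET. Both -XPUT and -X PUT work.

/index/type/_mapping returns the mapping of the given index and type.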
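
The same indexing request can be prepared from Python with only the standard library. This is a sketch under assumptions: it presumes a node listening on localhost:9200, the Content-Type header is an addition that newer ES releases expect, and the actual send is commented out so the snippet runs without a cluster.

```python
import json
import urllib.request

# Assumed local node; index, type, and id mirror the curl example.
url = "http://localhost:9200/get-together/group/1?pretty"
body = {"name": "Elasticsearch Denver", "organizer": "Lee"}

request = urllib.request.Request(
    url,
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",  # same role as -XPUT in curl
)

# Sending is left commented out so the snippet runs without a cluster:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
print(request.get_method(), request.full_url)
```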

Searching for data

_search is the endpoint used for searching.

curl 'localhost:9200/get-together/_search?q=sample&pretty'

To search several indices, separate their names with commas; _all searches every index.

Indexing, updating, deleting data

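A type in ES is only an abstraction: different types within the same index are not physically separated, and the type is just another field of the document.
ES has three kinds of field types: core types, arrays and multi-fields, and predefined fields.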

Core Types

String

String fields can be analyzed, that is, split into terms (a term is a single word, the basic unit of search). Strings are analyzed by default; to turn this off, set:

"index":"not_analyzed"

not_analyzed means the whole string is indexed as a single term, and the individual words in it are not indexed separately.
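
The difference between the two settings can be sketched in a few lines of Python. The analyzed_terms function is only a rough stand-in for a real analyzer (it lowercases and splits on non-alphanumeric characters); both function names are hypothetical.

```python
import re


def analyzed_terms(text):
    """Rough stand-in for an analyzer: lowercase, then split on non-alphanumerics."""
    return [term for term in re.split(r"[^0-9a-z]+", text.lower()) if term]


def not_analyzed_terms(text):
    """With not_analyzed, the whole string is indexed as one term."""
    return [text]


name = "Elasticsearch Denver"
print(analyzed_terms(name))
print(not_analyzed_terms(name))
```

In this model a lookup for the term denver finds the analyzed field, while the not_analyzed field only matches the exact original string.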

Numeric

byte, short (16 bit), integer, long (64 bit), float, double

Date

Dates use the date type, and format sets the expected date pattern:

"format":"MM DD YYYY"

Boolean

Stores true or false.

Arrays and multi-fields

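A single field can hold multiple values, for example:

curl -XPUT 'localhost:9200/blog/posts/1' -d '{
    "tags": ["first", "initial"]
}'

Every core type supports arrays, and a field can hold single values and arrays interchangeably without any change to the mapping.

The difference between arrays and multi-fields:

arrays index more data with the same settings;
multi-fields index the same data several times with different settings.

For example, one field can be indexed both as analyzed and as not_analyzed.

A single field can be upgraded to a multi-field without reindexing, but not the other way around.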

Pre-defined fields

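Predefined fields all start with an underscore, and ES maintains their values. They fall into the following groups:

Control how documents are stored and searched: _source, _all
Identify a document: _uid, _id, _type, _index
Add new properties: _size, _timestamp, _ttl
Control shard placement: _routing, _parent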

Update and delete

Use _update to update a document:

curl -XPOST 'localhost:9200/get-together/group/2/_update' -d '{
    "doc": {
        "organizer": "Roy"
    }
}'
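
The effect of sending a partial document under "doc" can be modeled as a merge into the stored document. The sketch below is a simplified illustration: apply_partial_update is a hypothetical helper, the stored document is made up, and only a shallow merge is modeled.

```python
def apply_partial_update(stored, update_body):
    """Return a copy of the stored document with the fields under "doc" merged in."""
    merged = dict(stored)
    merged.update(update_body["doc"])  # shallow merge only; nested objects are not modeled
    return merged


# Hypothetical stored document for group/2.
stored = {"name": "Elasticsearch San Francisco", "organizer": "Lee"}
updated = apply_partial_update(stored, {"doc": {"organizer": "Roy"}})
print(updated)
```

Fields that the update does not mention, such as name here, keep their previous values.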

Delete a single document:

curl -XDELETE 'localhost:9200/online-shop/shirts/1'

Delete an entire type, or delete the documents that match a query:

curl -XDELETE 'localhost:9200/online-shop/shirts'
curl -XDELETE 'localhost:9200/get-together/_query?q=elasticsearch'

An index can be closed and reopened:

curl -XPOST 'localhost:9200/online-shop/_close'
curl -XPOST 'localhost:9200/online-shop/_open'

Searching your data

Structure of a search request

Specifying the search scope

REST searches go through the _search endpoint and can be sent with GET or POST.

curl 'localhost:9200/_search' -d '...'  # search the whole cluster
curl 'localhost:9200/get-together/_search' -d '...'  # search the get-together index
curl 'localhost:9200/get-together/event/_search' -d '...'  # search the event type in the get-together index
curl 'localhost:9200/_all/event/_search' -d '...'  # search the event type in every index
curl 'localhost:9200/*/event/_search' -d '...'  # same as above
curl 'localhost:9200/get-together,other/event,group/_search' -d '...'  # search the event and group types in the get-together and other indices
curl 'localhost:9200/+get-toge*,-get-together/_search' -d '...'  # search every index whose name starts with get-toge, except get-together
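
The path patterns listed above follow one rule: an optional comma-separated index segment, an optional comma-separated type segment, then _search. A small Python sketch of that rule follows; search_path is a hypothetical helper, not part of any client library.

```python
def search_path(indices=(), types=()):
    """Build the _search path for the given index and type names."""
    parts = []
    if indices:
        parts.append(",".join(indices))
    elif types:
        # A type segment needs an index segment in front of it.
        parts.append("_all")
    if types:
        parts.append(",".join(types))
    parts.append("_search")
    return "/" + "/".join(parts)


print(search_path())
print(search_path(["get-together", "other"], ["event", "group"]))
print(search_path(types=["event"]))
```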

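Components of a search request

query: the search criteria, written in the query DSL and filter DSL, which determine which documents are returned;
size: the number of documents to return;
from: the offset of the first result, used for paging;
_source: how the original document is returned; by default the whole _source is returned;
sort: the sort order.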
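
As an illustration of how these components fit together, the following Python sketch assembles a request body and derives from out of a page number. The build_search_body helper and its 1-based page parameter are assumptions introduced for the example.

```python
import json


def build_search_body(query, page=1, page_size=10, source=None, sort=None):
    """Assemble a search body; page numbers start at 1."""
    body = {
        "query": query,
        "from": (page - 1) * page_size,  # number of hits to skip
        "size": page_size,
    }
    if source is not None:
        body["_source"] = source
    if sort is not None:
        body["sort"] = sort
    return body


body = build_search_body({"match_all": {}}, page=2, source=["name", "date"])
print(json.dumps(body, indent=4))
```

Leaving _source and sort out of the body when they are not set keeps the ES defaults in effect.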

URL-based search requests

curl 'localhost:9200/get-together/_search?from=10&size=10'  # paging
curl 'localhost:9200/get-together/_search?sort=date:asc'  # sort by the date field, ascending
...._source=title,date  # return only title and date from _source

The form q=field:keyword restricts the search to documents whose given field contains the keyword:

curl 'localhost:9200/get-together/_search?sort=date:asc&q=title:elasticsearch'
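
When such URLs are built in code, the parameters are better encoded than concatenated by hand. Below is a minimal Python sketch using the standard library; search_url and the base address are assumptions for illustration.

```python
from urllib.parse import urlencode


def search_url(base, index, **params):
    """Build a URL-based search request; parameter order follows the call."""
    # safe=":" keeps field:keyword and field:order readable in the URL.
    return f"{base}/{index}/_search?{urlencode(params, safe=':')}"


url = search_url(
    "http://localhost:9200",  # assumed local node
    "get-together",
    sort="date:asc",
    q="title:elasticsearch",
)
print(url)
```

Spaces and other reserved characters in the keyword are escaped by urlencode, which a hand-built string would not do.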

Body-based search requests

curl 'localhost:9200/get-together/_search' -d '{
    "query": {
        "match_all": {}
    },
    "from": 10,
    "size": 10
}'

To limit which _source fields are returned:

curl 'localhost:9200/get-together/_search' -d '{
    "query": {
        "match_all": {}
    },
    "_source": ["name", "date"]
}'

_source also accepts wildcards such as name.*, as well as include and exclude lists:

curl 'localhost:9200/get-together/_search' -d '{
    "query": {
        "match_all": {}
    },
    "_source": {
        "include": ["location.*", "date"],
        "exclude": ["location.geolocation"]
    }
}'

A sort order can likewise be specified per field:

curl 'localhost:9200/get-together/_search' -d '{
    "query": {
        "match_all": {}
    },
    "sort": [
        {"created_on": "asc"},
        {"name": "desc"},
        "_score"
    ]
}'

Structure of the response

A sample search request:

http://192.168.99.100:32769/get-together/_search?q=title:elasticsearch&_source=title,date

ES returns results in the following format:

{
    "took": 4,  // time the search took, in milliseconds
    "timed_out": false,  // whether any shard timed out
    "_shards": {
        "total": 2,  // number of shards that took part in the search
        "successful": 2,  // shards that responded successfully
        "failed": 0  // shards that failed
    },
    "hits": {  // the matching data
        "total": 7,  // number of matching documents
        "max_score": 0.9904146,  // highest relevance score
        "hits": [
            {
                "_index": "get-together",  // index name
                "_type": "event",  // type name
                "_id": "103",  // document id
                "_score": 0.9904146,  // relevance score
                "_routing": "2",
                "_parent": "2",
                "_source": {  // the data of each returned field
                    "title": "Introduction to Elasticsearch"
                }
            },
            {
                "_index": "get-together",
                "_type": "event",
                "_id": "105",
                "_score": 0.9904146,
                "_routing": "2",
                "_parent": "2",
                "_source": {
                    "title": "Elasticsearch and Logstash"
                }
            },
            ...
        ]
    }
}
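
Reading such a response in code mostly means walking the nested hits structure. The Python sketch below parses a trimmed, hypothetical response of the same shape; it assumes hits.total is a plain number, as in the sample above, and summarize is an illustrative helper.

```python
import json

# A trimmed, hypothetical response in the shape shown above.
raw = """
{
    "took": 4,
    "timed_out": false,
    "_shards": {"total": 2, "successful": 2, "failed": 0},
    "hits": {
        "total": 7,
        "max_score": 0.9904146,
        "hits": [
            {"_id": "103", "_score": 0.9904146,
             "_source": {"title": "Introduction to Elasticsearch"}},
            {"_id": "105", "_score": 0.9904146,
             "_source": {"title": "Elasticsearch and Logstash"}}
        ]
    }
}
"""


def summarize(response):
    """Pull out the total, a completeness flag, and the returned titles."""
    shards = response["_shards"]
    return {
        "total": response["hits"]["total"],
        # Results are complete only if nothing timed out and no shard failed.
        "complete": not response["timed_out"] and shards["failed"] == 0,
        "titles": [hit["_source"]["title"] for hit in response["hits"]["hits"]],
    }


summary = summarize(json.loads(raw))
print(summary)
```

Note that total counts every matching document, while the inner hits list only holds the page that was returned.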

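Query and filter DSL

Match query and term filter

A match query restricts the results to documents that satisfy a condition; when no query is specified, ES falls back to match_all.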

"query":{
    "match":{
        "title":"hadoop"
    }
}

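A filter is similar to a query; the differences are the effect on scoring and on performance. A query computes a score for each matching document, whereas a filter only decides whether a document matches. Filters are therefore faster than regular queries, and their results can be cached.

"query":{
    "filtered": {  // the query type: a filtered query
        "query": {
            "match": {
                "title":"hadoop"
            }
        },
        "filter": {
            "term": {
                "host": "andy"
            }
        }
    }
}

As shown above, a filtered query has two parts: a query and a filter. The term filter keeps only the documents whose host field contains the term andy. ES uses a bitset to record whether each document matches the filter, and the bitset can be cached for reuse by later searches.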
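
The bitset idea can be modeled in a few lines of Python. This is a conceptual sketch only, not a description of ES internals: term_bitset, FilterCache, and the sample documents are all hypothetical, and a Python integer stands in for the bitset.

```python
def term_bitset(docs, field, term):
    """Bit i is set when document i has exactly `term` in `field`."""
    bits = 0
    for position, doc in enumerate(docs):
        if doc.get(field) == term:
            bits |= 1 << position
    return bits


class FilterCache:
    """Remembers one bitset per (field, term) so repeated filters are not recomputed."""

    def __init__(self, docs):
        self.docs = docs
        self.bitsets = {}
        self.computed = 0  # how many bitsets had to be built

    def matching(self, field, term):
        key = (field, term)
        if key not in self.bitsets:
            self.bitsets[key] = term_bitset(self.docs, field, term)
            self.computed += 1
        bits = self.bitsets[key]
        return [i for i in range(len(self.docs)) if (bits >> i) & 1]


docs = [{"host": "andy"}, {"host": "lee"}, {"host": "andy"}]
cache = FilterCache(docs)
print(cache.matching("host", "andy"))
print(cache.matching("host", "andy"))  # served from the cached bitset
print(cache.computed)
```

Because a filter answers only yes or no per document, its result fits in one bit per document, which is what makes it cheap to cache; a scored query has no such compact form.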

Common queries and filters

match_all

Matches all documents.

{
    "query": {
        "match_all":{}
    }
}

A filter can be added to narrow the results:

{
    "query": {
        "filtered": {
            "query": {
                "match_all":{}
            },
            "filter": {
                ...
            }
        }
    }
}

query_string

In URL-based requests, query_string is what runs when you pass q=xxx. In a request body it looks like this:

{
    "query": {
        "query_string": {
            "query": "nosql"
        }
    }
}

By default, query_string searches the _all field. To target a specific field, use the form description:nosql, or set default_field:

{
    "query": {
        "query_string": {
            "default_field":"description",
            "query": "nosql"
        }
    }
}

Conditions can be combined with AND and OR (which must be uppercase), and a leading - excludes a condition, for example:

name:nosql AND -description:mongodb

query_string is very powerful, so exposing it directly to end users is not recommended.

term query and term filter

A term query lets you specify the field and the term to search for directly. The term is not analyzed, so it only finds exact matches.

{
    "query": {
        "term": {
            "tags": "Elasticsearch"
        }
    },
    "_source":["name","tags"]
}

A term filter limits the result set in the same way. Because it computes no score, it can be paired with match_all:

{
    "query": {
        "filtered": {
            "query": {
                "match_all": {}
            },
            "filter": {
                "term": {
                    "tags": "Elasticsearch"
                }
            }
        }
    },
    "_source":["name","tags"]
}

Terms query

The terms query searches for multiple values:

{
    "query": {
        "terms": {
            "tags": ["Elasticsearch", "hadoop"]
        }
    },
    "_source":["name","tags"]
}

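The minimum_should_match parameter sets the minimum number of terms that must match.

Match query

Like the query_string query, the match query can target a specific field or search _all. It can behave in several ways, mainly boolean and phrase.

Boolean behavior

By default, the match query uses boolean behavior with the OR operator: searching for "Elasticsearch Denver" is treated as Elasticsearch OR Denver, so a hit on either term is enough.
To require both terms, set the operator explicitly:

{
    "query": {
        "match": {
            "name": {
                "query": "Elasticsearch Denver",
                "operator": "and"
            }
        }
    }
}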
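
The effect of the operator can be sketched in Python as the difference between any and all. The match function below is a simplified model under assumptions: it tokenizes by lowercasing and splitting on whitespace, and it ignores scoring entirely.

```python
def match(field_text, query_text, operator="or"):
    """Model of boolean match behavior on whitespace-separated, lowercased terms."""
    field_terms = set(field_text.lower().split())
    query_terms = query_text.lower().split()
    combine = all if operator == "and" else any
    return combine(term in field_terms for term in query_terms)


print(match("Elasticsearch San Francisco", "Elasticsearch Denver"))
print(match("Elasticsearch San Francisco", "Elasticsearch Denver", operator="and"))
print(match("Elasticsearch Denver", "Elasticsearch Denver", operator="and"))
```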

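Phrase behavior

If you cannot remember a phrase exactly, use phrase behavior and set slop, the number of other words allowed between the words of the phrase. slop defaults to 0, meaning no other words in between (an exact phrase match).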

{
    "query": {
        "match": {
            "name": {
                "type": "phrase",
                "query": "Elasticsearch Denver",
                "slop": 1
            }
        }
    }
}

The example above matches text such as Elasticsearch xxxxx Denver, where at most one other word separates the two terms.
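
A simplified Python model of slop follows. It is a sketch under stated assumptions: phrase_match is a hypothetical helper, it only counts extra words between terms that appear in the original order, and it does not model the reordering that real slop also permits at a higher cost.

```python
def phrase_match(field_text, phrase, slop=0):
    """True when the phrase terms appear in order with at most `slop` extra words between them."""
    words = field_text.lower().split()
    terms = phrase.lower().split()
    if not terms:
        return False
    for start, word in enumerate(words):
        if word != terms[0]:
            continue
        position = start
        for term in terms[1:]:
            try:
                # Earliest occurrence after the previous term keeps the span minimal.
                position = words.index(term, position + 1)
            except ValueError:
                break
        else:
            extra_words = (position - start) - (len(terms) - 1)
            if extra_words <= slop:
                return True
    return False


print(phrase_match("Elasticsearch in Denver", "Elasticsearch Denver"))
print(phrase_match("Elasticsearch in Denver", "Elasticsearch Denver", slop=1))
```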

Phrase_prefix query

The match_phrase_prefix query matches the last term of the query text as a prefix. It is commonly used for search-box autocomplete.

{
    "query": {
        "match": {
            "name": {
                "type": "phrase_prefix",
                "query": "Elasticsearch den",
                "max_expansions": 1
            }
        }
    },
    "_source":["name"]
}

Match multiple fields with multi_match

multi_match searches for a value across multiple fields:

{
    "query": {
        "multi_match": {
            "query": "elasticsearch hadoop",
            "fields": ["name", "description"]
        }
    }
}

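Combining queries

Bool query

The bool query combines any number of subqueries, each marked as must, should, or must_not.

must: the clause has to match (AND)
should: the clause may match (OR)
must_not: the clause must not match (NOT)

{
    "query": {
        "bool": {
            "must": [
                {
                    "term": {
                        "attendees": "david"
                    }
                }
            ],
            "should": [
                {
                    "term": {
                        "attendees": "clint"
                    }
                }
            ],
            ....
        }
    }
}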
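
How the three clause kinds decide a match can be sketched in Python. This is a simplified model, not the ES implementation: bool_matches is hypothetical, clauses are plain (field, term) pairs, scoring is ignored, and it assumes the usual default that at least one should clause has to match only when there are no must clauses.

```python
def bool_matches(doc, must=(), should=(), must_not=()):
    """Each clause is a (field, term) pair tested against a list-valued field."""

    def has(clause):
        field, term = clause
        return term in doc.get(field, [])

    if not all(has(clause) for clause in must):
        return False
    if any(has(clause) for clause in must_not):
        return False
    # Assumed default: with no must clauses, at least one should clause has to match.
    if should and not must:
        return any(has(clause) for clause in should)
    return True


doc = {"attendees": ["david", "clint"]}
print(bool_matches(doc, must=[("attendees", "david")], should=[("attendees", "andy")]))
print(bool_matches(doc, should=[("attendees", "andy")]))
print(bool_matches(doc, must=[("attendees", "david")], must_not=[("attendees", "clint")]))
```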

Bool filter

The same combination can be expressed with filters:

{
    "query": {
        "filtered": {
              "query": {
                "match_all": {}
              },
              "filter": {
                "bool": {
                  "must": [
                        {
                          "term": {
                                "attendees": "david"
                          }
                        }
                  ],
                  "should": [
                        {
                          "term": {
                                "attendees": "clint"
                          }
                        },
                        {
                          "term": {
                                "attendees": "andy"
                          }
                        }
                  ],
                  "must_not": [
                        {
                          "range" :{
                                "date": {
                                  "lt": "2013-06-30T00:00"
                                }
                          }
                        }
                  ]
                }
              }
        }
    }
}

Beyond match and filter queries

Range query and filter

A range can be specified for numbers, dates, and even strings.
The query version:

{
    "query": {
        "range": {
            "created_on": {
                "gt": "2012-06-01",
                "lt": "2012-09-01"
            }
        }
    }
}

The filter version:

{
    "query": {
        "filtered": {
            "query": {
                "match_all": {}
            },
            "filter": {
                "range": {
                    "created_on": {
                        "gt": "2012-06-01",
                        "lt": "2012-09-01"
                    }
                }
            }
        }
    }
}

gt: greater than
gte: greater than or equal to
lt: less than
lte: less than or equal to
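
The four bounds map directly onto comparison operators, as this small Python sketch shows. in_range is a hypothetical helper; comparing dates as strings works here only because ISO-formatted dates of equal length sort in chronological order.

```python
import operator

# Each range keyword paired with the comparison it stands for.
RANGE_OPERATORS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, bounds):
    """True when the value satisfies every bound in a range clause."""
    return all(RANGE_OPERATORS[name](value, bound) for name, bound in bounds.items())


created_on = {"gt": "2012-06-01", "lt": "2012-09-01"}
print(in_range("2012-07-15", created_on))
print(in_range("2012-06-01", created_on))  # gt excludes the boundary itself
```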

Prefix query and filter

The prefix query finds terms that start with a given prefix; the prefix itself is not analyzed.

{
    "query": {
        "prefix": {
            "title": "liber"
        }
    }
}

Wildcard query

{
    "query": {
        "wildcard": {
            "title": {
                "wildcard": "ba*n"
            }
        }
    }
}
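
Prefix and wildcard matching both operate on individual terms, which the Python sketch below imitates. It is an approximation: fnmatchcase also understands bracket expressions such as [abc], which are outside what this section describes, and both helper names are hypothetical.

```python
from fnmatch import fnmatchcase


def prefix_match(term, prefix):
    """Model of a prefix query against a single indexed term."""
    return term.startswith(prefix)


def wildcard_match(term, pattern):
    """Model of a wildcard query: * matches any run of characters, ? a single one."""
    return fnmatchcase(term, pattern)


print(prefix_match("liberty", "liber"))
print(wildcard_match("bacon", "ba*n"))
print(wildcard_match("bar", "ba*n"))
```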

Checking whether a field exists

Exists filter

{
    "query": {
        "filtered": {
              "query": {
                "match_all": {}
              },
              "filter": {
                "exists": { "field": "location.geolocation" }
             }
        }
    }
}

Missing filter

{
    "query": {
        "filtered": {
              "query": {
                "match_all": {}
              },
              "filter": {
                "missing": { 
                    "field": "reviews",
                    "existence": true,
                    "null_value": true
                }
             }
        }
    }
}

Choosing the right query

Use case	Query type to use
You want to take input from a user, similar to a Google-style interface, and search for documents with the input.	Use a match query or the simple_query_string query if you want to support +/- and search in specific fields.
You want to take input as a phrase and search for documents containing that phrase, perhaps with some amount of leniency (slop).	Use a match_phrase query with an amount of slop to find phrases similar to what the user is searching for.
You want to combine many different searches or types of searches, creating a single search out of them.	Use the bool query to combine any number of subqueries into a single query.
You want to search for certain words across many fields in a document.	Use the multi_match query, which behaves similarly to the match query but on multiple fields.
You want to return every document from a search.	Use the match_all query to return all documents from a search.
You want to search a field for values that are between two specified values.	Use a range query to search within documents with values between a certain range.
You want to search a field for values that start with a specified string.	Use a prefix query to search for terms starting with a given string.
You want to autocomplete the value of a single word based on what the user has already typed in.	Use a prefix query to send what the user has typed in and get back exact matches starting with the text.
You want to search for all documents that have no value for a specified field.	Use the missing filter to filter out documents that are missing fields.

https://www.ovasty.com/posts/es.html

Author: ovasty

Published: 2021-03-04

Updated: 2021-12-28

Tags: #Big Data, elasticsearch
