(0). For creating the index and its mapping, see:
https://www.lixin.help/2020/11/29/ElasticSearch-Mapping.html
(1). Bulk-insert test data (_bulk)
# Bulk-insert documents
POST _bulk
{"index":{"_index":"hobbys","_id":1001}}
{"id":"1001","name":"张三","age":20,"email":"1111@qq.com","sex":"男","hobby":"羽毛球,乒乓球,足球"}
{"index":{"_index":"hobbys","_id":1002}}
{"id":"1002","name":"李四","age":21,"email":"2222@qq.com","sex":"女","hobby":"羽毛球,乒乓球,足球,蓝球"}
{"index":{"_index":"hobbys","_id":1003}}
{"id":"1003","name":"王五","age":22,"email":"3333@qq.com","sex":"男","hobby":"羽毛球,蓝球,游泳,听音乐"}
{"index":{"_index":"hobbys","_id":1004}}
{"id":"1004","name":"赵六","age":23,"email":"4444@qq.com","sex":"女","hobby":"跑步,游泳"}
{"index":{"_index":"hobbys","_id":1005}}
{"id":"1005","name":"孙七","age":24,"email":"5555@qq.com","sex":"男","hobby":"听音乐,看电影"}
# Add a single document
PUT /hobbys/_doc/1006
{"id":"1006","name":"小玲","age":25,"email":"6666@qq.com","sex":"女","hobby":"乐器,看电影"}
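(2). Search (match)
When ES stores a document, it goes through these steps:
- Look up the analyzer configured for the document field (hobby); here it is Standard.
- Analyze the field value (the Standard analyzer splits English on word boundaries such as spaces and punctuation, and splits Chinese into single characters).
- Build the inverted index from the resulting tokens, and store the original document alongside it.
When ES runs a match query, it goes through these steps:
- match looks up the analyzer of the field being searched (hobby); again this is Standard.
- The search text ("音乐") is analyzed with that analyzer.
- The resulting tokens ("音", "乐") are looked up in the inverted index on each shard.
- The inverted index maps each token to document IDs; ES takes those IDs and fetches the corresponding documents.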
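The indexing and match steps above can be sketched in plain Python. This is a toy model, not how Elasticsearch is implemented: `tokenize` is a hypothetical stand-in for the Standard analyzer that only covers the two cases mentioned here (one token per Chinese character, lowercase ASCII words), and the documents are a subset of the test data inserted in (1).

```python
import re
from collections import defaultdict

# Hypothetical stand-in for the Standard analyzer: each CJK character becomes
# its own token, and runs of ASCII letters/digits become lowercase word tokens.
def tokenize(text):
    return [t.lower() for t in re.findall(r"[\u4e00-\u9fff]|[A-Za-z0-9]+", text)]

# A few hobby values from the test data, keyed by document ID.
docs = {
    "1003": "羽毛球,蓝球,游泳,听音乐",
    "1004": "跑步,游泳",
    "1005": "听音乐,看电影",
    "1006": "乐器,看电影",
}

# Index time: analyze each field value and record token -> document IDs.
inverted_index = defaultdict(set)
for doc_id, hobby in docs.items():
    for token in tokenize(hobby):
        inverted_index[token].add(doc_id)

# Search time: analyze the query text the same way, then union the posting
# lists (match combines tokens with OR by default).
def match(query_text):
    hits = set()
    for token in tokenize(query_text):
        hits |= inverted_index.get(token, set())
    return sorted(hits)

print(tokenize("音乐"))  # ['音', '乐']
print(match("音乐"))     # ['1003', '1005', '1006']
```

Document 1006 ("乐器,看电影") is returned even though it never mentions 音乐, because it shares the single token "乐" with the query. The sketch ignores scoring entirely; it only shows which documents are candidates.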
Summary:
match analyzes the search text before looking it up.
# Search for hobbies containing 音乐 (music)
# Rough SQL analogy: SELECT * FROM hobbys WHERE hobby LIKE '%音%' OR hobby LIKE '%乐%';
POST /hobbys/_search
{
"query": {
"match": {
"hobby": "音乐"
}
},
"highlight": {
"fields": {
"hobby":{}
}
}
}
# Result set for the search on hobbies containing 音乐
{
"took" : 802,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.8456821,
"hits" : [
{
"_index" : "hobbys",
"_type" : "_doc",
"_id" : "1005",
"_score" : 1.8456821,
"_source" : {
"id" : "1005",
"name" : "孙七",
"age" : 24,
"email" : "5555@qq.com",
"sex" : "男",
"hobby" : "听音乐,看电影"
},
"highlight" : {
"hobby" : [
"听<em>音</em><em>乐</em>,看电影"
]
}
},
{
"_index" : "hobbys",
"_type" : "_doc",
"_id" : "1003",
"_score" : 1.4829273,
"_source" : {
"id" : "1003",
"name" : "王五",
"age" : 22,
"email" : "3333@qq.com",
"sex" : "男",
"hobby" : "羽毛球,蓝球,游泳,听音乐"
},
"highlight" : {
"hobby" : [
"羽毛球,蓝球,游泳,听<em>音</em><em>乐</em>"
]
}
},
{
"_index" : "hobbys",
"_type" : "_doc",
"_id" : "1006",
"_score" : 0.7909737,
"_source" : {
"id" : "1006",
"name" : "小玲",
"age" : 25,
"email" : "6666@qq.com",
"sex" : "女",
"hobby" : "乐器,看电影"
},
"highlight" : {
"hobby" : [
"<em>乐</em>器,看电影"
]
}
}
]
}
}
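(3). Exact match (term/terms)
term is mainly used to match exact values such as numbers, dates, and booleans,
or not_analyzed strings (text that is not run through an analyzer, i.e. the keyword type).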
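A small Python sketch of why term suits keyword-style fields: the query value is compared with the indexed tokens as-is, with no analysis step. The two token sets below are hand-written assumptions about what a per-character analyzer and a keyword field would each store for one hobby value; they are illustrative, not output taken from Elasticsearch.

```python
# Assumed index contents for the same hobby value under two mappings.
text_field_tokens = {"听", "音", "乐", "看", "电", "影"}  # analyzed text field
keyword_field_tokens = {"听音乐,看电影"}                   # keyword: one whole token

def term_query(indexed_tokens, value):
    # term does not analyze the value; it must equal an indexed token exactly.
    return value in indexed_tokens

print(term_query(text_field_tokens, "音乐"))             # False
print(term_query(text_field_tokens, "音"))               # True
print(term_query(keyword_field_tokens, "听音乐,看电影"))  # True
print(term_query(keyword_field_tokens, "音乐"))          # False
```

In this model, a term lookup for "音乐" on the analyzed field finds nothing because no single token equals the two-character string, which is the usual reason to prefer match for text fields and term for exact values.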
# Exact lookup: sex is male
# SELECT * FROM hobbys WHERE sex = '男';
GET /hobbys/_search
{
"query": {
"terms": {
"sex": [
"男"
]
}
}
}
# Exact lookup: age is 20, 21, or 22
# SELECT * FROM hobbys WHERE age IN (20,21,22);
GET /hobbys/_search
{
"query": {
"terms": {
"age": [ 20,21,22 ]
}
}
}
# Exact lookup by email (works as written only if email is not analyzed, e.g. mapped as keyword)
# SELECT * FROM hobbys WHERE email = '5555@qq.com';
GET /hobbys/_search
{
"query": {
"terms": {
"email": [
"5555@qq.com"
]
}
}
}
(4). Range query (range)
gt : greater than
gte : greater than or equal to
lt : less than
lte : less than or equal to
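The four operators can be read as ordinary comparisons that must all hold at once. A minimal Python sketch, where `RANGE_OPS` and `in_range` are hypothetical names made up for this illustration:

```python
import operator

# Map each range operator name to the comparison it stands for.
RANGE_OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}

def in_range(value, bounds):
    # Every bound that is supplied must hold for the value to be in range.
    return all(RANGE_OPS[name](value, limit) for name, limit in bounds.items())

ages = [20, 21, 22, 23, 24, 25]
print([a for a in ages if in_range(a, {"gte": 20, "lte": 24})])  # [20, 21, 22, 23, 24]
print([a for a in ages if in_range(a, {"gt": 20, "lt": 24})])    # [21, 22, 23]
```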
# Range query
# SELECT * FROM hobbys WHERE age >= 20 AND age <= 24;
GET /hobbys/_search
{
"query": {
"range": {
"age": {
"gte": 20,
"lte": 24
}
}
}
}
(5). Exists
Checks that a field is not empty (IS NOT NULL).
# Add a document that has no hobby field.
PUT /hobbys/_doc/1007
{"id":"1007","name":"小何","age":26,"email":"7777@qq.com","sex":"男"}
# Find documents whose hobby field is not empty (the document with id 1007 does not appear in the result set)
# SELECT * FROM hobbys WHERE hobby IS NOT NULL;
GET /hobbys/_search
{
"query": {
"exists": {
"field": "hobby"
}
}
}
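(6). Match query
match is the standard query; whether you need full-text search or exact search, you will end up using it in most cases.
If you run match against a full-text field (text), it first analyzes the query string with the field's analyzer and then searches. If you run match against a field that holds exact values (numbers, dates, booleans, or a not_analyzed string), it searches for the value exactly as given.
# Analyzed text field.
# Rough SQL analogy: SELECT * FROM hobbys WHERE name LIKE '%小%' OR name LIKE '%玲%';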
GET /hobbys/_search
{
"query": {
"match": {
"name": "小玲"
}
},
"highlight": {
"fields": {
"name":{}
}
}
}
# Exact-value field (age is numeric, so the value is not analyzed)
# SELECT * FROM hobbys WHERE age = 20;
GET /hobbys/_search
{
"query": {
"match": {
"age": 20
}
}
}
(7). Bool query
Compound query
must : every clause must match (AND).
must_not : none of the clauses may match (NOT).
should : at least one clause should match (OR).
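The way the three clause types combine can be sketched in Python as predicates over documents. This is a simplified model with hypothetical helper names (`bool_query`, `has_hobby`, `age_20_24`, `is_male`): it ignores scoring, has no filter clause, and only requires a should clause to match when no must clause is present.

```python
# Sample documents taken from the test data (1007 has no hobby field).
docs = [
    {"id": "1001", "age": 20, "sex": "男", "hobby": "羽毛球,乒乓球,足球"},
    {"id": "1004", "age": 23, "sex": "女", "hobby": "跑步,游泳"},
    {"id": "1006", "age": 25, "sex": "女", "hobby": "乐器,看电影"},
    {"id": "1007", "age": 26, "sex": "男"},
]

def bool_query(doc, must=(), must_not=(), should=()):
    # must: every clause has to match.
    if not all(clause(doc) for clause in must):
        return False
    # must_not: no clause may match.
    if any(clause(doc) for clause in must_not):
        return False
    # Simplification: should is only required when there are no must clauses.
    if should and not must:
        return any(clause(doc) for clause in should)
    return True

def has_hobby(doc):
    return "hobby" in doc

def age_20_24(doc):
    return 20 <= doc["age"] <= 24

def is_male(doc):
    return doc["sex"] == "男"

# hobby present AND age in 20..24 AND NOT male
print([d["id"] for d in docs
       if bool_query(d, must=[has_hobby, age_20_24], must_not=[is_male])])  # ['1004']

# male OR age in 20..24
print([d["id"] for d in docs
       if bool_query(d, should=[is_male, age_20_24])])  # ['1001', '1004', '1007']
```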
# Compound query
# SELECT * FROM hobbys WHERE sex = '男' AND age >= 20 AND age <= 24;
POST /hobbys/_search
{
"query": {
"bool": {
"filter": { "match":{ "sex" : "男" } },
"must": { "range":{ "age":{ "gte" : 20, "lte" : 24 }} }
}
}
}
# Bool compound query
# Find males aged 20 to 24 whose hobby field is not empty
# SELECT * FROM hobbys
# WHERE hobby IS NOT NULL
# AND age >= 20 AND age <= 24
# AND sex = '男';
POST /hobbys/_search
{
"query": {
"bool": {
"must": [
{
"exists" :{ "field": "hobby" }
},
{
"range":{
"age":{
"gte":20,
"lte":24
}
}
},
{
"match": {
"sex": "男"
}
}
]
}
}
}
# Bool compound query
# Find documents with age between 20 and 24 whose sex is not "男"
# SELECT * FROM hobbys WHERE hobby IS NOT NULL AND age >= 20 AND age <= 24 AND sex != '男';
POST /hobbys/_search
{
"query": {
"bool": {
"must": [
{
"exists" :{ "field": "hobby" }
},
{
"range":{
"age":{
"gte":20,
"lte":24
}
}
}
],
"must_not": [
{"match":{"sex":"男"}}
]
}
}
}
# exists finds non-NULL fields; must_not negates it.
# SELECT * FROM hobbys WHERE hobby IS NULL;
POST /hobbys/_search
{
"query": {
"bool": {
"must_not": [
{
"exists": {
"field":"hobby"
}
}
]
}
}
}
(8). Filter query (filter)
POST /hobbys/_search
{
"query": {
"bool": {
"filter": {
"terms": {
"age": [ 20,21 ]
}
}
}
}
}
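(9). filter vs. match
A filter clause asks of each document whether the field contains a specific value; the answer is simply yes or no.
Recommendation: for exact-match searches, prefer a filter clause, because filter results can be cached.
A query clause (match) asks how well each document's field value matches the given value. It computes the relevance of every document to the query, assigns each a relevance score (_score), and sorts the matched documents by that score. This kind of scoring is well suited to full-text search, where there is rarely a single fully correct result (and no manual tuning of the results).
A query clause (match) has to find the matching documents and also compute a relevance score for each one, so in general a query clause (match) costs more time than a filter clause, and its results are not cacheable.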
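The difference can be sketched in Python under loud assumptions: `lru_cache` merely stands in for the idea of a filter cache, and the score is a naive count of query characters found in the field, not the relevance formula Elasticsearch uses. The data is a subset of the test documents.

```python
from functools import lru_cache

docs = {
    "1003": "羽毛球,蓝球,游泳,听音乐",
    "1005": "听音乐,看电影",
    "1006": "乐器,看电影",
}
ages = {"1003": 22, "1005": 24, "1006": 25}

# Filter context: a yes/no test per document. The result is just a set of IDs,
# so it can be memoized and reused by later requests.
@lru_cache(maxsize=None)
def filter_age_at_most(limit):
    return frozenset(doc_id for doc_id, age in ages.items() if age <= limit)

# Query context: every matching document also gets a score, and hits are
# sorted by it. The score is a naive character count (an assumption).
def scored_match(query_text):
    scores = {}
    for doc_id, hobby in docs.items():
        score = sum(hobby.count(ch) for ch in query_text)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

print(sorted(filter_age_at_most(24)))  # ['1003', '1005']
print(scored_match("音乐"))            # [('1003', 2), ('1005', 2), ('1006', 1)]
```

Calling `filter_age_at_most(24)` a second time returns the memoized set without rescanning, while `scored_match` has to recompute a score per document on every call.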