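Understanding Elasticsearch in One Article

This article walks through Elasticsearch, a popular engine used for search, log collection, and similar purposes. All examples use version 6.1.1.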

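Terminology

Document
- A unit of user data stored in Elasticsearch (ES)
Index
- A collection of documents that share the same fields
Node
- A single running ES instance; the building block of a cluster
Cluster
- One or more nodes that together serve requests
Shard
- The physical storage unit an index is split into; each shard is itself a complete, independent index. When an index is created, ES splits it into 5 shards by default. The number can be customized at creation time but cannot be changed afterwards.
- There are two kinds of shard: primary and replica (replicas provide data redundancy and spread the query load). The number of replicas per primary shard is configurable and can be changed at any time. Every shard is a Lucene index, and a replica is simply a copy of a primary shard.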

Document

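A document is a JSON object made up of fields. Common field data types:

String: text, keyword
Numeric: long, integer, short, byte, double, float, half_float, scaled_float
Boolean: boolean
Date: date
Binary: binary
Range: integer_range, float_range, long_range, double_range, date_range

Every document is identified by a unique id, which is either:

specified by the user
generated automatically by ES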

Document MetaData

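Metadata fields describe the document itself:

_index  name of the index the document belongs to
_type  name of the type the document belongs to
_id  unique id of the document
_uid  composite id built from _type and _id (in 6.x the type no longer plays a role, so it is the same as _id)
_source  the original JSON of the document; every field value can be read from here
_all  all field values merged into a single field; disabled by default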

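Index

An index is roughly the equivalent of a table in a database.

An index stores documents that share the same structure.

Each index has its own mapping, which defines field names and types.

A cluster can hold many indices. For example, nginx logs can be stored in one index per day:

 nginx-log-2020-06-30
 nginx-log-2020-07-01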
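
As a rough illustration of the one-index-per-day naming scheme, the Python sketch below derives the index names for a date range. The helper names `daily_index_name` and `index_names_between` are hypothetical and exist only for this example; they are not part of Elasticsearch.

```python
from datetime import date, timedelta
from typing import List


def daily_index_name(prefix: str, day: date) -> str:
    """Return an index name such as 'nginx-log-2020-06-30'."""
    return f"{prefix}-{day.isoformat()}"


def index_names_between(prefix: str, start: date, end: date) -> List[str]:
    """List the daily index names for every day from start to end, inclusive."""
    days = (end - start).days
    return [daily_index_name(prefix, start + timedelta(days=i)) for i in range(days + 1)]


names = index_names_between("nginx-log", date(2020, 6, 30), date(2020, 7, 1))
print(names)
```

A log shipper would typically compute the name for the current day before each write, so that every day's documents land in their own index.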

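RESTful API

An Elasticsearch cluster exposes a RESTful API.

REST: Representational State Transfer

The URI identifies the resource, such as an index or a document.

The HTTP method states the operation on that resource: GET, POST, PUT, DELETE, and so on.

Two common ways to talk to the API:

the curl command
Kibana Dev Tools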

Starting ES and Kibana

# Start ES
bin/elasticsearch -Ecluster.routing.allocation.disk.threshold_enabled=false -Epath.data=crmao
# Then open http://localhost:9200 in a browser

# Start Kibana with its default configuration
bin/kibana
# Then open http://127.0.0.1:5601/ in a browser

Index API

PUT /test_index      # create an index named test_index
GET _cat/indices     # list the existing indices

Document API

A document can be created in two ways:

with an explicit id
without an id

Creating a document with an explicit id

# Format: PUT /index_name/type_name/id
PUT /test_index/doc/1
{
    "username":"crmao",
    "age":1
}

If the index does not exist when the document is created, ES creates the index and the type automatically.

Response:
{
  "_index": "test_index",
  "_type": "doc",
  "_id": "1",
  "_version": 1, // incremented by 1 on every operation on this document
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}

Creating a document without an id

POST /test_index/doc
{
    "username":"huang",
    "age":27
}

Response:
{
    "_index": "test_index",
    "_type": "doc",
    "_id": "AWvKlbzNkKWPJLWbc9FF", // id generated automatically
    "_version": 1,
    "result": "created",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "created": true
}

Query API

GET /test_index/doc/1          # fetch one document by id
GET /test_index/doc/_search    # search all documents; returns 10 hits by default

# Search by id
GET /test_index/doc/_search
{
    "query":{
        "term":{
            "_id":"1"
        }
    }
}

Bulk API

Format of a bulk request:

action: index / create / update / delete

metadata: _index, _type, _id

request body: the document _source (a delete action takes no request body)

{ action: { metadata }}
{ request body        }

POST _bulk
{"index":{"_index":"test_index","_type":"doc","_id":3}}
{"username":"maodada1","age":10}
{"delete":{"_index":"test_index","_type":"doc","_id":1}}
{"update":{"_index":"test_index","_type":"doc","_id":"3"}}
{"doc":{"age":22}}
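
The two-line action/body layout is easy to get wrong by hand, so here is a minimal Python sketch that serializes a list of operations into a bulk body. The function name `build_bulk_body` and the tuple layout are assumptions made for this example; only the newline-delimited format itself comes from the bulk API.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, source) tuples into a newline-delimited bulk body.

    source is None for delete actions, which carry no request body.
    """
    lines = []
    for action, metadata, source in operations:
        lines.append(json.dumps({action: metadata}))
        if source is not None:
            lines.append(json.dumps(source))
    # Every line, including the last one, must end with a newline.
    return "\n".join(lines) + "\n"


ops = [
    ("index", {"_index": "test_index", "_type": "doc", "_id": "3"}, {"username": "maodada1", "age": 10}),
    ("delete", {"_index": "test_index", "_type": "doc", "_id": "1"}, None),
    ("update", {"_index": "test_index", "_type": "doc", "_id": "3"}, {"doc": {"age": 22}}),
]
body = build_bulk_body(ops)
print(body)
```

The resulting string could then be sent as the body of a POST to _bulk with any HTTP client.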

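Inverted index

Forward index: maps a document id to the document's content, that is, to the terms it contains

Document id	Document content
1	elasticsearch是最流行的搜索引擎 (Elasticsearch is the most popular search engine)
2	php是世界上最好的语言 (PHP is the best language in the world)
3	搜索引擎是如何诞生的 (How search engines were born)

Inverted index: maps a term to the ids of the documents that contain it

Term	Document id list
elasticsearch	1
流行	1
搜索引擎	1, 3
php	2
世界	2
语言	2
如何	3
诞生	3

Finding the documents that contain 搜索引擎 ("search engine"):

Look up 搜索引擎 in the inverted index to get the matching document ids, 1 and 3
Fetch the full content of documents 1 and 3 through the forward index
Return the final result to the user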
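
The lookup described above can be sketched in a few lines of Python. This is only an illustration: the token lists are hard-coded to match the table rather than produced by a real analyzer, and the function names are hypothetical.

```python
from collections import defaultdict

# Forward index: document id -> content.
forward_index = {
    1: "elasticsearch是最流行的搜索引擎",
    2: "php是世界上最好的语言",
    3: "搜索引擎是如何诞生的",
}

# Assumed analyzer output for each document (hard-coded for the example).
tokens_by_doc = {
    1: ["elasticsearch", "流行", "搜索引擎"],
    2: ["php", "世界", "语言"],
    3: ["搜索引擎", "如何", "诞生"],
}


def build_inverted_index(tokens_by_doc):
    """Map each term to the sorted list of document ids that contain it."""
    inverted = defaultdict(list)
    for doc_id in sorted(tokens_by_doc):
        for term in tokens_by_doc[doc_id]:
            if doc_id not in inverted[term]:
                inverted[term].append(doc_id)
    return dict(inverted)


def search(term, inverted, forward):
    """Resolve a term to document ids, then fetch each document's content."""
    return [(doc_id, forward[doc_id]) for doc_id in inverted.get(term, [])]


inverted_index = build_inverted_index(tokens_by_doc)
results = search("搜索引擎", inverted_index, forward_index)
print(results)
```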

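Inverted index in detail

The inverted index is the core of a search engine. It has two main parts:

term dictionary
posting list

Term dictionary

Records every term that appears in the documents; it is usually fairly large
Records the link from each term to its posting list

Posting list

Records the set of documents that contain a term; it is made up of postings

Each posting mainly contains:

1. Document id, used to fetch the original document
2. Term frequency (TF), the number of times the term occurs in the document, used later for relevance scoring
3. Position, the term's position (there may be several) in the document's token sequence, used for phrase queries
4. Offset, the start and end character offsets of the term in the document, used for highlighting

Taking 搜索引擎 from the forward index table as an example:

docId	TF	position	offset
1	1	2 (the document yields 3 terms and this is the third; positions start at 0)	<18,22>
3	1	0	<0,4>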
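
To make the four posting fields concrete, the following Python sketch builds postings with term frequency, positions, and character offsets. It assumes English text and a simple word-character tokenizer as a stand-in for a real analyzer; `build_postings` is a hypothetical name.

```python
import re
from collections import defaultdict


def build_postings(docs):
    """Map term -> {doc_id: {"tf": ..., "positions": [...], "offsets": [...]}}.

    Tokens are runs of word characters, lowercased.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, match in enumerate(re.finditer(r"\w+", text)):
            term = match.group().lower()
            posting = index[term].setdefault(
                doc_id, {"tf": 0, "positions": [], "offsets": []}
            )
            posting["tf"] += 1
            posting["positions"].append(position)
            posting["offsets"].append((match.start(), match.end()))
    return index


docs = {1: "search engines rank search results", 2: "how search works"}
postings = build_postings(docs)
print(postings["search"])
```

Positions count tokens while offsets count characters, which is why phrase queries rely on the former and highlighting on the latter.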

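Analysis

Analysis is the process of turning text into a sequence of terms (also called tokens); it is also known as text analysis, and ES calls it Analysis. For example:

Text:   elasticsearch 是最流行的搜索引擎
Tokens: elasticsearch, 流行, 搜索引擎

An analyzer is the ES component dedicated to analysis. It is built from three parts:

Character Filters
- Pre-process the raw text, for example by stripping HTML markup
Tokenizer
- Splits the text into terms according to some rule
Token Filters
- Post-process the terms produced by the tokenizer: lowercasing, adding, or removing terms (for example, removing stop words such as 这, 的, and 那, or adding synonyms)

The parts run in this order: Character Filters -> Tokenizer -> Token Filters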
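
The three-stage order can be mimicked with plain Python functions. This is a simplified sketch, not how ES implements analyzers: the tag-stripping regex, the tiny stop word list, and all function names are assumptions for illustration.

```python
import html
import re


def html_strip(text):
    """Character filter: drop tags and decode HTML entities (rough approximation)."""
    return html.unescape(re.sub(r"<[^>]+>", "", text))


def word_tokenizer(text):
    """Tokenizer: emit runs of word characters."""
    return re.findall(r"\w+", text)


def lowercase(tokens):
    """Token filter: lowercase every token."""
    return [t.lower() for t in tokens]


def stop(tokens, stop_words=frozenset({"a", "an", "the", "is", "this"})):
    """Token filter: remove stop words (illustrative list only)."""
    return [t for t in tokens if t not in stop_words]


def analyze(text, char_filters, tokenizer, token_filters):
    """Run character filters, then the tokenizer, then token filters, in that order."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


all_tokens = analyze("Is this <b>a box</b>?", [html_strip], word_tokenizer, [lowercase])
kept_tokens = analyze("Is this <b>a box</b>?", [html_strip], word_tokenizer, [lowercase, stop])
print(all_tokens, kept_tokens)
```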

Analyze API

ES provides an API endpoint, _analyze, for testing analysis and checking the resulting tokens. It can be used in three ways:

by naming an analyzer directly
by naming a field of an index, so that the field's analyzer is used
by assembling a custom analyzer on the fly

Testing a named analyzer

POST 127.0.0.1:9200/_analyze
{
	"analyzer":"standard",
	"text":"hello world!"
}

Response:
{
    "tokens": [
        {
            "token": "hello",
            "start_offset": 0,
            "end_offset": 5,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "world",
            "start_offset": 6,
            "end_offset": 11,
            "type": "<ALPHANUM>",
            "position": 1
        }
    ]
}

Testing a field of an index

POST test_index/_analyze
{
    "field":"username",    // field whose analyzer is used
    "text":"hello world"   // text to analyze
}

Testing a custom analyzer

POST _analyze
{
    "tokenizer":"standard",
    "filter":["lowercase"],
    "text":"Hello World"
}

Built-in analyzers

standard
simple
whitespace
stop
keyword
pattern
Language

POST /_analyze
{
	"analyzer":"simple",
	"text":"The 2 QUICK Brown..."
}

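Chinese analysis

IK
- Splits both Chinese and English text; supports the ik_smart and ik_max_word modes
- Supports custom dictionaries and hot reloading of the dictionary
- https://github.com/medcl/elasticsearch-analysis-ik
jieba
- The most popular Chinese word segmentation library in the Python world; supports segmentation and part-of-speech tagging
- Supports traditional Chinese, custom dictionaries, parallel segmentation, and more
- https://github.com/sing1ee/elasticsearch-jieba-plugin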
Hanlp
THULAC

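Custom analysis

When the built-in analyzers do not meet your needs, you can define your own.

A custom analyzer is built from character filters, a tokenizer, and token filters.

Character filters

Process the raw text before the tokenizer sees it, for example by adding, removing, or replacing characters
Built-in character filters:
- HTML Strip: removes HTML tags and decodes HTML entities
- Mapping: replaces characters according to a mapping
- Pattern Replace: replaces text that matches a regular expression
Character filters affect the position and offset information that the tokenizer computes afterwards

Example:

POST _analyze
{
    "tokenizer":"keyword",
    "char_filter":["html_strip"],
    "text":"<p>I&apos;m so <b>happy</b>!</p>"
}

Response:
{
  "tokens": [
    {
      "token": """

I'm so happy!

""",
      "start_offset": 0,
      "end_offset": 32,
      "type": "word",
      "position": 0
    }
  ]
}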

Tokenizer

Splits the text into terms (tokens) according to some rule
Built-in tokenizers:
- standard: splits on word boundaries
- letter: splits on any non-letter character
- whitespace: splits on whitespace
- UAX URL Email: like standard, but keeps email addresses and URLs as single tokens
- NGram and Edge NGram: split text into n-grams
- Path Hierarchy: splits a file path level by level

Example:

POST /_analyze
{
     "tokenizer":"path_hierarchy",
     "text":"/one/two/three"
}

Response:
{
  "tokens": [
    {
      "token": "/one",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0
    },
    {
      "token": "/one/two",
      "start_offset": 0,
      "end_offset": 8,
      "type": "word",
      "position": 0
    },
    {
      "token": "/one/two/three",
      "start_offset": 0,
      "end_offset": 14,
      "type": "word",
      "position": 0
    }
  ]
}
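
The tokens in that response are simply the successive prefixes of the path. A small Python sketch of the same idea follows; the function `path_hierarchy` is a hypothetical re-implementation for illustration and makes no claim about how ES handles edge cases such as trailing delimiters.

```python
def path_hierarchy(path, delimiter="/"):
    """Emit one token per level of the path, each a prefix of the full path."""
    tokens = []
    end = 0
    while True:
        # Find the next delimiter after the current position.
        end = path.find(delimiter, end + 1)
        if end == -1:
            tokens.append(path)
            break
        tokens.append(path[:end])
    return tokens


tokens = path_hierarchy("/one/two/three")
print(tokens)
```

Indexing every prefix is what lets a query for /one/two match all documents stored beneath that directory.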

Token filters

Add, remove, or modify the terms emitted by the tokenizer
Built-in token filters:
- lowercase: converts every term to lowercase
- stop: removes stop words
- synonym: adds synonym terms
- NGram and Edge NGram: split terms into n-grams

POST _analyze
{
    "text":"a Hello,World!",
    "tokenizer":"standard",
    "filter":[
        "stop",
        "lowercase",
        {
            "type":"ngram", // split each term into n-grams
            "min_gram":4,   // minimum n-gram length
            "max_gram":4    // maximum n-gram length
        }
    ]
}
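
To see what a filter chain like this one does step by step, here is a hedged Python approximation. It assumes a word-character tokenizer and a one-word stop list, so it illustrates the order of operations rather than reproducing ES output exactly; all names are hypothetical.

```python
import re


def ngram(tokens, min_gram, max_gram):
    """Split each token into all substrings of length min_gram..max_gram."""
    out = []
    for token in tokens:
        for start in range(len(token)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(token):
                    out.append(token[start:start + size])
    return out


def run_filters(text):
    tokens = re.findall(r"\w+", text)                  # tokenizer stand-in
    tokens = [t for t in tokens if t not in {"a"}]     # stop (case-sensitive here)
    tokens = [t.lower() for t in tokens]               # lowercase
    return ngram(tokens, 4, 4)                         # ngram with min = max = 4


grams = run_filters("a Hello,World!")
print(grams)
```

Because the stop step in this sketch runs before lowercasing and compares case-sensitively, an uppercase "A" would survive it, which shows why filter order matters.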

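Custom analyzer API

A custom analyzer is defined in the settings of an index. The skeleton below shows where each part goes; the empty {} values are placeholders.

PUT /test_index
{
    "settings":{
        "analysis":{
            "analyzer":{
                "analyzer_name":{
                    "char_filter":{},
                    "tokenizer":{},
                    "filter":{}
                }
            },
            "tokenizer":{},   // custom tokenizer definitions
            "char_filter":{}, // custom character filter definitions
            "filter":{}       // custom token filter definitions
        }
    }
}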

Example:

DELETE /test_index
PUT /test_index
{
    "settings":{
        "analysis":{
            "analyzer":{
                "my_custom_analyzer":{
                    "type":"custom",
                    "char_filter":[
                        "html_strip"
                    ],
                    "tokenizer":"standard",
                    "filter":[
                        "lowercase",
                        "asciifolding"
                    ]
                }
            }
        }
    }
}

Using the custom analyzer

POST /test_index/_analyze
{
    "analyzer":"my_custom_analyzer",
    "text":"Is this <b>a box</b>?"
}

Response:
{
  "tokens": [
    {
      "token": "is",
      "start_offset": 0,
      "end_offset": 2,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "this",
      "start_offset": 3,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "a",
      "start_offset": 11,
      "end_offset": 12,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "box",
      "start_offset": 13,
      "end_offset": 20,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}

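When analysis happens

Analysis runs at two points:

Index time: when a document is created or updated, its fields are analyzed
Search time: the query string is analyzed, and the resulting terms are used for the search

Index-time analysis

Index-time analysis is configured through the analyzer property of each field in the index mapping. If no analyzer is specified, the default standard analyzer is used.

PUT test_index
{
    "mappings":{
        "doc":{
            "properties":{
                "title":{
                    "type":"text",
                    "analyzer":"whitespace" // analyzer for this field
                }
            }
        }
    }
}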

Search-time analysis

The search-time analyzer can be specified in two ways:

per query, through the analyzer parameter
in the index mapping, through search_analyzer (the recommended way)

# Per query
POST /test_index/_search
{
    "query":{
        "match":{
            "message":{
                "query":"hello",
                "analyzer":"standard"
            }
        }
    }
}

# In the mapping
PUT /test_index
{
    "mappings":{
        "doc":{
            "properties":{
                "title":{
                    "type":"text",
                    "analyzer":"whitespace",
                    "search_analyzer":"whitespace"
                }
            }
        }
    }
}

Usually there is no need to set a separate search-time analyzer; just use the index-time analyzer. If the two differ, the query terms may fail to match the indexed terms.
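
The matching problem caused by different index-time and search-time analyzers can be reproduced with a toy model. The two analyzers below are simplified stand-ins written for this example, not the real ES implementations.

```python
def whitespace_analyzer(text):
    """Split on whitespace only; case is preserved."""
    return text.split()


def lowercase_analyzer(text):
    """Split on whitespace, then lowercase every term."""
    return [t.lower() for t in text.split()]


def matches(doc_text, query_text, index_analyzer, search_analyzer):
    """Treat any term shared by the indexed text and the query as a hit."""
    indexed_terms = set(index_analyzer(doc_text))
    query_terms = set(search_analyzer(query_text))
    return bool(indexed_terms & query_terms)


same = matches("Hello World", "Hello", whitespace_analyzer, whitespace_analyzer)
different = matches("Hello World", "Hello", whitespace_analyzer, lowercase_analyzer)
print(same, different)
```

With the same analyzer on both sides the term "Hello" is found; with a lowercasing analyzer at search time the query becomes "hello", which was never indexed.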

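Recommendations for analysis

Decide explicitly whether each field needs analysis. For fields that do not, set the type to keyword, which saves space and improves write performance
Use the _analyze API to check how a document is actually tokenized
Test things hands-on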

Mapping settings

A mapping is similar to a table schema in a database. Its main purposes are to:

define the field names of an index
define the type of each field, such as numeric, string, or boolean
define inverted-index settings, such as whether a field is indexed and whether positions are recorded

Viewing the mapping of an index

GET 127.0.0.1:9200/test_index/_mapping

Response:
{
    "test_index": {
        "mappings": {
            "doc": {  // doc is the type
                "properties": {
                    "age": {
                        "type": "long"
                    },
                    "username": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    }
                }
            }
        }
    }
}

Custom mapping

// Create the index with an explicit mapping
PUT my_index
{
    "mappings": {
        "doc": {  // doc is the type
            "properties": {
                "age": {
                    "type": "long"
                },
                "username": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                }
            }
        }
    }
}

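Once a field's type is set in a mapping, it cannot be changed directly:

The inverted index built by Lucene cannot be modified after it is generated
To change a field type, create a new index and then reindex the data into it

Adding new fields is allowed

The dynamic parameter controls how new fields are handled:
- true (default): new fields are added to the mapping automatically
- false: new fields are not added to the mapping; the document is still written, but the new fields cannot be searched
- strict: the document is rejected with an error

If a field is of type object, dynamic can also be set on that field, in which case it applies only to that field.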

Demo: setting dynamic to false on an index

DELETE /my_index

# 1. Create the index with dynamic set to false
PUT /my_index
{
    "mappings": {
        "doc": {
            "dynamic": false,
            "properties": {
                "age": {
                    "type": "integer"
                },
                "username": {
                    "type": "text"
                }
            }
        }
    }
}

# 2. Index a document that contains a field missing from the mapping
PUT /my_index/doc/1
{
    "username":"nothing here",
    "desc":"nothing here"            // field not defined in the mapping
}

# 3. Fetch the document
GET /my_index/doc/1

Response:
{
    "_index": "my_index",
    "_type": "doc",
    "_id": "1",
    "_version": 1,
    "found": true,
    "_source": {
        "username": "nothing here",
        "desc": "nothing here"
    }
}

# 4. Search on username
GET /my_index/doc/_search
{
    "query":{
        "match":{
            "username":"here"
        }
    }
}

Response: one hit
{
    "took": 36,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.25811607,
        "hits": [
            {
                "_index": "my_index",
                "_type": "doc",
                "_id": "1",
                "_score": 0.25811607,
                "_source": {
                    "username": "nothing here",
                    "desc": "nothing here"
                }
            }
        ]
    }
}

# 5. Search on desc
GET /my_index/doc/_search
{
    "query":{
        "match":{
            "desc":"here"
        }
    }
}

Response: no hits
{
    "took": 17,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 0,
        "max_score": null,
        "hits": []
    }
}
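
The three dynamic settings can be compared side by side with a small model. The Python sketch below is a simplification written for this article: `index_document` is a hypothetical function, and the mapping is reduced to a set of field names.

```python
def index_document(mapping_fields, doc, dynamic="true"):
    """Return (stored_source, searchable_fields); raise in strict mode.

    mapping_fields is the set of field names already in the mapping.
    It is extended in place when dynamic is "true".
    """
    unknown = [name for name in doc if name not in mapping_fields]
    if unknown and dynamic == "strict":
        raise ValueError(f"strict mapping rejects unknown fields: {unknown}")
    if dynamic == "true":
        mapping_fields.update(unknown)
    searchable = {name for name in doc if name in mapping_fields}
    # The stored source always keeps the whole original document.
    return dict(doc), searchable


doc = {"username": "nothing here", "desc": "nothing here"}

source_false, searchable_false = index_document({"age", "username"}, doc, dynamic="false")
source_true, searchable_true = index_document({"age", "username"}, doc, dynamic="true")

try:
    index_document({"age", "username"}, doc, dynamic="strict")
    strict_rejected = False
except ValueError:
    strict_rejected = True

print(searchable_false, searchable_true, strict_rejected)
```

With dynamic set to false the desc field is stored but not searchable, which mirrors the empty result of the search on desc in the demo.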

The copy_to parameter

Copies the value of a field into a target field, which gives an effect similar to _all
The target field does not appear in _source; it is used only for searching

DELETE /my_index
PUT /my_index
{
    "mappings":{
        "doc":{
            "properties":{
                "first_name":{
                    "type":"text",
                    "copy_to":"full_name"
                },
                "last_name":{
                    "type":"text",
                    "copy_to":"full_name"
                },
                "full_name":{
                    "type":"text"
                }
            }
        }
    }
}

PUT /my_index/doc/1
{
    "first_name":"mao",
    "last_name":"zhongyu"
}

GET /my_index/doc/1

Response:
{
    "_index": "my_index",
    "_type": "doc",
    "_id": "1",
    "_version": 1,
    "found": true,
    "_source": {
        "first_name": "mao",
        "last_name": "zhongyu"
    }
}

GET /my_index/doc/_search
{
    "query":{
        "match":{
            "full_name":{
                "query":"mao zhongyu",
                "operator":"and"     // with and, a document must contain both mao and zhongyu to match
            }
        }
    }
}

Response:
{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.51623213,
        "hits": [
            {
                "_index": "my_index",
                "_type": "doc",
                "_id": "1",
                "_score": 0.51623213,
                "_source": {
                    "first_name": "mao",
                    "last_name": "zhongyu"
                }
            }
        ]
    }
}
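
A toy model helps show why the search on full_name succeeds even though full_name never appears in _source. The Python sketch below is an illustration under simplifying assumptions (whitespace tokenization, hypothetical function names), not the ES implementation.

```python
def index_with_copy_to(doc, copy_to):
    """Return (source, indexed_fields).

    copy_to maps a field name to the target field that also receives its value.
    The source is returned unchanged; only the indexed view gains the target.
    """
    indexed = {name: [value] for name, value in doc.items()}
    for field, target in copy_to.items():
        if field in doc:
            indexed.setdefault(target, []).append(doc[field])
    return dict(doc), indexed


def match_and(indexed, field, query):
    """True when every query term occurs among the terms indexed for the field."""
    terms = {t for value in indexed.get(field, []) for t in value.lower().split()}
    return all(q in terms for q in query.lower().split())


source, indexed = index_with_copy_to(
    {"first_name": "mao", "last_name": "zhongyu"},
    {"first_name": "full_name", "last_name": "full_name"},
)
print(source, indexed)
```

Searching first_name alone for both words fails, because neither original field holds both terms; only the combined full_name field does.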

The index parameter

Controls whether the field is indexed. The default is true, meaning the field is indexed; with false the field is not indexed and therefore cannot be searched.

Setting it to false skips the inverted index for that field, which can save a lot of space.

DELETE /my_index
PUT /my_index
{
    "mappings":{
        "doc":{
            "properties":{
                "cookie":{
                    "type":"text",
                    "index":false
                }
            }
        }
    }
}

PUT /my_index/doc/1
{
    "cookie":"name=maozhongyu"
}

// This search fails with an error because cookie is not indexed
GET /my_index/doc/_search
{
	"query":{
        "match":{
            "cookie":"name"
        }
	}
}

The index_options parameter

index_options controls what is recorded in the inverted index. There are 4 settings:

docs: records only the doc id
freqs: records the doc id and term frequencies
positions: records the doc id, term frequencies, and term positions
offsets: records the doc id, term frequencies, term positions, and character offsets

The default for text fields is positions; for all other types it is docs.

The more that is recorded, the more space is used.

Example:

PUT /my_index
{
    "mappings":{
        "doc":{
            "properties":{
                "cookie":{
                    "type":"text",
                    "index_options":"offsets"
                }
            }
        }
    }
}

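The null_value parameter

Defines how a null value in the field is handled. The default is null, meaning no replacement, in which case ES ignores the value. Setting null_value gives the field a default value to index in place of null.

PUT /my_index
{
    "mappings":{
        "doc":{
            "properties":{
                "status_code":{
                    "type":"integer",
                    "null_value":500
                }
            }
        }
    }
}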
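
The effect of null_value on what gets indexed can be pictured with a short Python sketch. The function `indexed_value` is hypothetical and models only the substitution of an explicit null; it assumes that a field absent from the document is not affected.

```python
_MISSING = object()


def indexed_value(doc, field, null_value=None):
    """Return the value that would be indexed for a field, or None if nothing is."""
    value = doc.get(field, _MISSING)
    if value is _MISSING:
        return None          # field absent: nothing is indexed
    if value is None:
        return null_value    # explicit null: replaced by null_value when configured
    return value


explicit_null = indexed_value({"status_code": None}, "status_code", null_value=500)
absent_field = indexed_value({}, "status_code", null_value=500)
real_value = indexed_value({"status_code": 200}, "status_code", null_value=500)
print(explicit_null, absent_field, real_value)
```

With the mapping shown earlier, a search for status_code 500 would therefore also find documents whose status_code was written as null.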

Data types

Core data types

https://www.elastic.co/guide/en/elasticsearch/reference/6.0/mapping-types.html

String: text (analyzed), keyword (not analyzed)
Numeric: long, integer, short, byte, double, float, half_float, scaled_float
Date: date
Boolean: boolean
Binary: binary
Range: integer_range, float_range, long_range, double_range, date_range

Complex data types

Array: array; in practice the field simply holds several terms
Object: object, for single JSON objects (sub-fields)
Nested: nested, for arrays of JSON objects

Geo data types

geo_point
geo_shape

Specialised types

IP addresses: ip
Auto-completion: completion
Number of tokens in a string: token_count
Hash of a string: murmur3

….

Multi-fields

Multi-fields allow the same field to be indexed with different settings, for example with different analyzers.

For example, suppose people's names must be searchable both as written and by pinyin.

It is enough to add a pinyin sub-field under the name field and search on that sub-field; there is no need to add a separate top-level pinyin field.

{
    "test_index":{
        "mappings":{
            "doc":{
                "properties":{
                    "username":{
                        "type":"text",
                        "fields":{
                            "pinyin":{
                                "type":"text",
                                "analyzer":"pinyin"
                            }
                        }
                    }
                }
            }
        }
    }
}

Search:

GET /test_index/doc/_search
{
    "query":{
        "match":{
            "username.pinyin":"search  name"
        }
    }
}

Dynamic mapping

Dynamic field mapping

If the index does not exist yet, ES creates it on the first write and derives the mapping of each field from the document.

DELETE /my_index
PUT /my_index/doc/1
{
    "username":"maozhongyu",
    "age":1,
    "birth":"1988-10-01",
    "married":false,
    "year":"18",
    "tags":["boy","fashion"],
    "money":100.1,
    "fnum":"100.1"
}

GET /my_index/_mapping

Response:
{
    "my_index": {
        "mappings": {
            "doc": {
                "properties": {
                    "age": {
                        "type": "long"
                    },
                    "birth": {
                        "type": "date"
                    },
                    "fnum": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "married": {
                        "type": "boolean"
                    },
                    "money": {
                        "type": "float"
                    },
                    "tags": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "username": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "year": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    }
                }
            }
        }
    }
}

ES relies on the JSON type of each field in the document to detect the field type automatically.

For JSON strings, as the mapping above shows, date detection is enabled by default, while detection of floating-point and integer strings is not.
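
The detection rules visible in that mapping can be approximated in a few lines of Python. This is a sketch under stated assumptions: `detect_type` is a hypothetical function, and the date check is reduced to a regular expression for yyyy-MM-dd and yyyy/MM/dd, which is far narrower than the formats ES accepts.

```python
import re

# Simplified stand-in for the default date formats.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$|^\d{4}/\d{2}/\d{2}$")


def detect_type(value, date_detection=True):
    """Guess the mapped type of a JSON value; None means no field is added."""
    # bool must be tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # An array takes the type of its first non-null element.
        for item in value:
            if item is not None:
                return detect_type(item, date_detection)
        return None
    if isinstance(value, str):
        if date_detection and DATE_PATTERN.match(value):
            return "date"
        return "text"  # numeric strings stay text: numeric detection is off by default
    return None


doc = {
    "username": "maozhongyu", "age": 1, "birth": "1988-10-01", "married": False,
    "year": "18", "tags": ["boy", "fashion"], "money": 100.1, "fnum": "100.1",
}
detected = {name: detect_type(value) for name, value in doc.items()}
print(detected)
```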

Date detection

Date detection is enabled by default.

The default formats it recognizes are:

[ "strict_date_optional_time","yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"]

For the full strict_date_optional_time format, see

https://www.elastic.co/guide/en/elasticsearch/reference/6.0/mapping-date-format.html#strict-date-time

 date-opt-time     = date-element ['T' [time-element] [offset]]
 date-element      = std-date-element | ord-date-element | week-date-element
 std-date-element  = yyyy ['-' MM ['-' dd]]
 ord-date-element  = yyyy ['-' DDD]
 week-date-element = xxxx '-W' ww ['-' e]
 time-element      = HH [minute-element] | [fraction]
 minute-element    = ':' mm [second-element] | [fraction]
 second-element    = ':' ss [fraction]
 fraction          = ('.' | ',') digit+

Example:

PUT my_index/my_type/1
{
  "create_date": "2015/09/02"
}

GET my_index/_mapping

To disable date detection, set the date_detection parameter to false.

PUT my_index
{
  "mappings": {
    "my_type": {
      "date_detection": false
    }
  }
}

PUT my_index/my_type/1
{
  "create_date": "2015/09/02"
}

# The create_date field is added as a text field.

Custom date detection

Set the dynamic_date_formats parameter to specify the formats to recognize.

Example:

PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic_date_formats": ["MM/dd/yyyy"]
    }
  }
}

PUT my_index/my_type/1
{
  "create_date": "09/25/2015"
}

Numeric detection

Set the numeric_detection parameter to true; it is disabled by default.

PUT my_index
{
  "mappings": {
    "my_type": {
      "numeric_detection": true
    }
  }
}

PUT my_index/my_type/1
{
  "my_float":   "1.0",
  "my_integer": "1"
}

# The my_float field is added as a double field.
# The my_integer field is added as a long field.

Links

Download page

Elasticsearch reference manual

Elasticsearch-PHP