Elasticsearch Mapping Settings

2021-06-02 leiting

Tags: Elasticsearch

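A mapping is similar to a table schema in a relational database. Its main purposes are:

Defining the field names (Field Name) in an index
Defining each field's type, such as numeric, string, or boolean
Defining inverted-index settings, such as whether a field is indexed and whether term positions are recorded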

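Custom mapping:

Once a field's type is set in the mapping, it cannot be changed directly (an inverted index built by Lucene cannot be modified after it is created)
To change a field's type, create a new index with the new mapping and reindex the data into it
Adding new fields is allowed
The dynamic parameter controls whether new fields are added:

true: the default; new fields are added to the mapping automatically
false: new fields are not added to the mapping; the document is still written (the new fields are kept in _source), but those fields cannot be queried
strict: the document is rejected with an error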

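The following example demonstrates the dynamic parameter. The examples in this article use Elasticsearch 6.x syntax, where a mapping type (here doc) sits between mappings and properties. Elasticsearch 7.x deprecates mapping types: there, the typeless form (properties directly under mappings, documents indexed with PUT myindex/_doc/1) is the default. 8.0 removes mapping types entirely. Several examples also reuse the same index name. Creating an index that already exists fails, so delete it first (for example DELETE myindex1) before re-running a PUT that creates it.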

# Define the index with three fields (title, name, age); dynamic is false, so any other new fields are not added to the mapping
PUT  myindex
{
  "mappings": {
    "doc": {
      "dynamic": false, 
      "properties": {
        "title": {
          "type": "text"
        },
        "name": {
          "type": "keyword"
        },
        "age": {
          "type": "integer"
        }
      }
    }
  }
}

# View the custom mapping just defined
GET myindex/_mapping

# Index a document with the fields title and desc; desc is a new field
PUT myindex/doc/1
{
  "title": "hello world",
  "desc": "nothing"
}

# Querying on title works as expected
GET myindex/_search
{
  "query": {
    "match": {
      "title": "hello"
    }
  }
}

# Querying on desc returns no hits (0 results)
GET myindex/_search
{
  "query": {
    "match": {
      "desc": "nothing"
    }
  }
}
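The three dynamic settings can be made concrete with a small Python simulation. This is not Elasticsearch code. It models only top-level fields and naively guesses every new field as text. The names index_document and StrictMappingError are hypothetical.

```python
class StrictMappingError(Exception):
    """Hypothetical error raised when dynamic is "strict" and a field is unmapped."""


def index_document(properties: dict, doc: dict, dynamic=True) -> tuple:
    """Simulate indexing one document; return (new properties, queryable fields).

    Simplified model: only top-level fields, and every new field is guessed as
    text (real dynamic mapping derives the type from the JSON value).
    """
    new_fields = [name for name in doc if name not in properties]
    if new_fields and dynamic == "strict":
        # strict: the whole document is rejected
        raise StrictMappingError(f"unmapped fields not allowed: {new_fields}")
    updated = dict(properties)
    if dynamic is True:
        # true: new fields are added to the mapping and indexed
        for name in new_fields:
            updated[name] = {"type": "text"}
    # false: the document is stored (new fields stay in _source only), but
    # fields missing from the mapping are not indexed, so they are not queryable
    queryable = {name for name in doc if name in updated}
    return updated, queryable


props = {"title": {"type": "text"}, "name": {"type": "keyword"}, "age": {"type": "integer"}}
doc = {"title": "hello world", "desc": "nothing"}

for setting in (True, False, "strict"):
    try:
        updated, queryable = index_document(props, doc, dynamic=setting)
        print(setting, sorted(updated), sorted(queryable))
    except StrictMappingError as err:
        print(setting, "rejected:", err)
```

With dynamic set to false, desc ends up neither in the mapping nor among the queryable fields. This matches the empty result of the desc query above.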

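Parameters
(1) index: controls whether the field is indexed. The default is true (the field is indexed). With false, the field is not indexed and therefore cannot be searched.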

PUT  myindex
{
  "mappings": {
    "doc": {
      "properties": {
        "cookie": {
          "type": "text",
          "index":false 
        }
      }
    }
  }
}
# Querying on the cookie field returns an error
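The difference between a missing field and a field with index: false can be sketched in Python. This is a toy model, not how Elasticsearch executes queries. The tokenization is a plain lowercase whitespace split, and FieldNotIndexedError and term_search are hypothetical names.

```python
class FieldNotIndexedError(Exception):
    """Hypothetical stand-in for the error returned when searching an unindexed field."""


def term_search(properties: dict, docs: dict, field: str, term: str) -> list:
    """Return ids of docs whose field contains term (lowercased whitespace tokens)."""
    spec = properties.get(field)
    if spec is None:
        return []  # field not in the mapping: nothing was indexed, so no hits
    if spec.get("index", True) is False:
        # index: false keeps the value in _source but builds no inverted index
        raise FieldNotIndexedError(f"field [{field}] is not indexed")
    return [doc_id for doc_id, doc in docs.items()
            if term.lower() in str(doc.get(field, "")).lower().split()]


props = {"cookie": {"type": "text", "index": False}, "title": {"type": "text"}}
docs = {1: {"cookie": "session=abc", "title": "Hello world"}}
print(term_search(props, docs, "title", "hello"))
```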

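(2) index_options: controls what the inverted index records. There are four settings:

docs: records only the doc id
freqs: records the doc id and term frequencies
positions: records the doc id, term frequencies, and term positions
offsets: records the doc id, term frequencies, term positions, and character offsets

text fields default to positions; other field types default to docs.
The more that is recorded, the more disk space the index uses.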

PUT  myindex1
{
  "mappings": {
    "doc": {
      "properties": {
        "cookie": {
          "type": "text",
          "index_options": "offsets" 
        }
      }
    }
  }
}
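The four levels can be illustrated with a toy inverted index in Python. Each level adds one more kind of detail per term and document. This is only an illustration, not Lucene's on-disk format. The tokenizer is a simple regex, and build_postings is a hypothetical name.

```python
import re
from collections import defaultdict

LEVELS = ("docs", "freqs", "positions", "offsets")


def build_postings(docs: dict, index_options: str = "positions") -> dict:
    """Toy inverted index: term -> {doc_id: recorded details}."""
    level = LEVELS.index(index_options)  # each level records everything below it
    postings = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, match in enumerate(re.finditer(r"\w+", text)):
            entry = postings[match.group().lower()].setdefault(doc_id, {})
            if level >= 1:  # freqs: how often the term occurs in the doc
                entry["freq"] = entry.get("freq", 0) + 1
            if level >= 2:  # positions: token index within the doc
                entry.setdefault("positions", []).append(position)
            if level >= 3:  # offsets: start/end character positions
                entry.setdefault("offsets", []).append((match.start(), match.end()))
    return dict(postings)


for option in LEVELS:
    print(option, build_postings({1: "hello world hello"}, option)["hello"])
```

Positions are what phrase queries such as match_phrase rely on, and offsets can help highlighting. This is why text fields default to positions rather than docs.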

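(3) null_value: how the field handles null values. The default is null, meaning an explicit null is treated as missing and is not indexed. If null_value is set, Elasticsearch indexes that value in place of an explicit null, so it acts as the field's default value. The value must have the same type as the field, and _source still contains the original null.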

PUT  myindex1
{
  "mappings": {
    "doc": {
      "properties": {
        "status_code": {
          "type": "keyword",
          "null_value": "NULL" 
        }
      }
    }
  }
}
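A minimal Python sketch of the null_value rule follows; values_to_index is a hypothetical helper. Explicit nulls are replaced only when null_value is configured. An empty array contains no explicit null, so nothing is substituted for it.

```python
def values_to_index(value, null_value=None) -> list:
    """Return the values a field would index for one JSON value (simplified)."""
    values = value if isinstance(value, list) else [value]
    indexed = []
    for v in values:
        if v is None:
            # explicit null: skipped unless null_value is configured
            if null_value is not None:
                indexed.append(null_value)
        else:
            indexed.append(v)
    return indexed


print(values_to_index(None), values_to_index(None, "NULL"), values_to_index([], "NULL"))
```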

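Data types

Core data types:

String: text, keyword
Numeric: long, integer, short, byte, double, float, half_float, scaled_float
Date: date
Boolean: boolean
Binary: binary
Range: integer_range, float_range, long_range, double_range, date_range

Complex data types:

Array: array (there is no dedicated array type; any field can hold an array of values of the same type)
Object: object
Nested: nested

Geo data types:

geo_point
geo_shape

Specialized data types:

IP addresses: ip
Autocompletion: completion
Token counts: token_count
String hash values: murmur3 (provided by the mapper-murmur3 plugin)
percolator
join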

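multi-fields
Multi-fields let you index the same field in several ways, for example with different analyzers. A common case is pinyin search on person names: add a sub-field named pinyin to the name field. The pinyin analyzer used below is not built into Elasticsearch; the example assumes a third-party pinyin analysis plugin is installed.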

PUT  myindex1
{
  "mappings": {
    "doc": {
      "properties": {
        "username": {
          "type":"text",
          "fields": {
            "pinyin": {
              "type": "text",
              "analyzer": "pinyin"
            }
          }
        }
      }
    }
  }
}       

GET myindex1/_search
{
  "query": {
    "match": {
      "username.pinyin": "hanhan"
    }
  }
}
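The idea of one source value feeding several sub-fields can be sketched in Python. Both analyzers here are toys: standard_like only roughly imitates the standard analyzer. TOY_PINYIN is a hypothetical two-entry table, not the pinyin plugin's behavior. The point is only that username and username.pinyin receive different terms from the same value.

```python
import re

# Hypothetical toy lookup table; a real pinyin analyzer uses a full dictionary.
TOY_PINYIN = {"韩": "han", "寒": "han"}


def standard_like(text: str) -> list:
    """Rough stand-in for the standard analyzer: one token per CJK character."""
    return re.findall(r"[\u4e00-\u9fff]|[a-z0-9]+", text.lower())


def toy_pinyin(text: str) -> list:
    """Toy pinyin analyzer: one syllable per known character plus the joined form."""
    syllables = [TOY_PINYIN[ch] for ch in text if ch in TOY_PINYIN]
    return (syllables + ["".join(syllables)]) if syllables else []


def index_with_sub_fields(name: str, value: str, analyzer, sub_fields: dict) -> dict:
    """Index one value into the main field and each sub-field with its own analyzer."""
    terms = {name: set(analyzer(value))}
    for sub_name, sub_analyzer in sub_fields.items():
        terms[f"{name}.{sub_name}"] = set(sub_analyzer(value))
    return terms


terms = index_with_sub_fields("username", "韩寒", standard_like, {"pinyin": toy_pinyin})
print(terms)
```

The document's _source still holds the value only once. The sub-fields exist only in the index.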

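Dynamic Mapping

Dynamic field mapping

Elasticsearch can detect field types automatically, which lowers the cost of getting started. For example, for a document containing "age": 14 and "username": "user1", it maps age as long and username as text.
Detection is based on the JSON type of each value:

JSON null: no field is added
true or false: boolean
floating-point number: float
integer: long
object: object
array: depends on the first non-null value in the array
string: date if it passes date detection, long or float if it passes numeric detection (off by default), otherwise text with a keyword sub-field

Verify the automatic detection with the following document: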

PUT my_index2/doc/1
{
  "username": "user1",
  "age":14,
  "birth":"1988-10-10",
  "married": false,
  "year": "18",
  "tags": ["boy","fashion"],
  "money":100.1
}
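The detection rules above can be written as a small Python function, sketched under stated assumptions. Numeric detection is left off, as in the default. Date detection is approximated by an ISO yyyy-MM-dd pattern with an optional time part, rather than the full default format list. detect_type is a hypothetical name.

```python
import re
from datetime import datetime

# Approximation of strict_date_optional_time: yyyy-MM-dd with an optional time part.
_ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:?\d{2})?)?")


def _looks_like_date(text: str) -> bool:
    if not _ISO_DATE.fullmatch(text):
        return False
    try:
        datetime.strptime(text[:10], "%Y-%m-%d")  # reject e.g. 2021-13-40
        return True
    except ValueError:
        return False


def detect_type(value, date_detection: bool = True):
    """Guess the dynamic mapping type for a JSON value (numeric detection off)."""
    if value is None:
        return None  # null adds no field
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # arrays take the type of their first non-null element
        for item in value:
            detected = detect_type(item, date_detection)
            if detected is not None:
                return detected
        return None
    if date_detection and _looks_like_date(value):
        return "date"
    return "text"  # plus a keyword sub-field


doc = {"username": "user1", "age": 14, "birth": "1988-10-10", "married": False,
       "year": "18", "tags": ["boy", "fashion"], "money": 100.1}
print({field: detect_type(v) for field, v in doc.items()})
```

Note that year stays text: "18" is a string, and numeric detection is off by default.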

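Dynamic date and numeric detection

Date detection:

The default dynamic_date_formats is [ "strict_date_optional_time","yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"].
strict_date_optional_time is the ISO datetime format; the full form looks like YYYY-MM-DDThh:mm:ssTZD (e.g. 1997-07-16T19:20:30+01:00).

dynamic_date_formats sets custom formats for date detection, replacing the defaults
"date_detection": false turns off automatic date detection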

PUT my_index
{
  "mappings": {
    "my_type":{
      "dynamic_date_formats": ["yyyy-MM-dd"]
    }
  }
}

PUT my_index/my_type/1
{
  "create_date": "2015-09-02"
}

GET my_index/_mapping
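A Python sketch of checking a string against dynamic_date_formats follows. It assumes only a handful of Java date-pattern letters (yyyy, MM, dd, HH, mm, ss) and translates them to strptime codes. The translation table and function names are hypothetical. Real Elasticsearch date parsing supports far more than this.

```python
import re
from datetime import datetime

# Hypothetical mapping of a few Java date-pattern letters to strptime codes.
_TOKENS = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H", "mm": "%M", "ss": "%S"}
_TOKEN_RE = re.compile("|".join(sorted(_TOKENS, key=len, reverse=True)))


def to_strptime(pattern: str) -> str:
    """Translate e.g. "yyyy-MM-dd" into "%Y-%m-%d" (only the letters above)."""
    return _TOKEN_RE.sub(lambda m: _TOKENS[m.group()], pattern)


def is_dynamic_date(value: str, dynamic_date_formats: list, date_detection: bool = True) -> bool:
    """True if value matches any configured format; "||" separates alternatives."""
    if not date_detection:
        return False
    for fmt in dynamic_date_formats:
        for alternative in fmt.split("||"):
            try:
                datetime.strptime(value, to_strptime(alternative))
                return True
            except ValueError:
                continue
    return False


print(is_dynamic_date("2015-09-02", ["yyyy-MM-dd"]))
```

Under this model, create_date from the example above is detected as a date. A value such as "2015/09/02" would no longer match, because the custom list replaces the defaults.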

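Numeric detection:
By default, a string that contains a number is not detected as a numeric type, because it is perfectly reasonable for a string to contain digits.
numeric_detection turns on detection of numbers inside strings, as shown below: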

PUT my_index
{
  "mappings": {
    "my_type":{
      "numeric_detection": true
    }
  }
}

PUT my_index/my_type/1
{
  "my_float": "1.0",
  "my_integer": "1"
}

GET my_index/_mapping
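A minimal Python sketch of numeric detection for string values follows. Date detection is ignored here, and the regexes only approximate which strings count as integers or decimals. detect_string_type is a hypothetical name. It reproduces the result of the example above: "1.0" becomes float and "1" becomes long.

```python
import re

_LONG = re.compile(r"[+-]?\d+")
_FLOAT = re.compile(r"[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?")


def detect_string_type(text: str, numeric_detection: bool = False) -> str:
    """Dynamic type for a JSON string value (date detection ignored here)."""
    if numeric_detection:
        if _LONG.fullmatch(text):  # integer-looking strings become long
            return "long"
        if _FLOAT.fullmatch(text):  # other numeric strings become float
            return "float"
    return "text"  # default: strings stay text (+ keyword sub-field)


print(detect_string_type("1.0", True), detect_string_type("1", True), detect_string_type("1"))
```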

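Dynamic templates

Dynamic templates set field mappings based on the type Elasticsearch detects, the field name, and similar criteria. They make it possible, for example, to:

map all strings to keyword, i.e. not analyzed by default
map all fields whose names start with message to text, i.e. analyzed
map all fields whose names start with long_ to long
map all fields detected as double to float, to save space

Matching usually uses the following parameters:

match_mapping_type: matches the type Elasticsearch detected, such as boolean, long, or string
match, unmatch: include or exclude fields by name
path_match, path_unmatch: include or exclude fields by their full dotted path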

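Dynamic Template API
(1) Map strings to keyword by default

By default, Elasticsearch maps a string to text and adds a keyword sub-field. The template below maps every dynamically detected string to keyword instead: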

PUT test_index
{
  "mappings": {
    "doc": {
      "dynamic_templates": [
          {
            "strings_as_keywords": {
              "match_mapping_type": "string",
              "mapping":{
                "type": "keyword"
              }
            }
          }
        ]
    }
  }
}

PUT test_index/doc/1
{
  "name": "alfred"
}

GET test_index/_mapping

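(2) Map fields whose names start with message to text. Templates are evaluated from top to bottom and the first matching template wins, so message_as_text must come before strings_as_keywords.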

PUT test_index
{
  "mappings": {
    "doc": {
      "dynamic_templates": [
          {
            "message_as_text": {
              "match_mapping_type": "string",
              "match": "message*",
              "mapping":{
                "type": "text"
              }
            }
          },
          {
            "strings_as_keywords": {
              "match_mapping_type": "string",
              "mapping":{
                "type": "keyword"
              }
            }
          }          
        ]
    }
  }
}
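The matching rules and the first-match-wins order can be sketched in Python. This assumes detected type names like those accepted by match_mapping_type ("string", "long", "double", "boolean", "date", "object") and supports only the "*" wildcard. The helper names are hypothetical, not Elasticsearch internals.

```python
import re


def _wildcard(pattern: str, text: str) -> bool:
    """Simple "*" wildcard match over the whole string."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, text) is not None


def _matches(cond: dict, path: str, detected: str) -> bool:
    name = path.rsplit(".", 1)[-1]  # match/unmatch look at the leaf name only
    wanted = cond.get("match_mapping_type", "*")
    if wanted != "*" and wanted != detected:
        return False
    if "match" in cond and not _wildcard(cond["match"], name):
        return False
    if "unmatch" in cond and _wildcard(cond["unmatch"], name):
        return False
    if "path_match" in cond and not _wildcard(cond["path_match"], path):
        return False
    if "path_unmatch" in cond and _wildcard(cond["path_unmatch"], path):
        return False
    return True


def resolve(templates: list, path: str, detected: str, default: dict):
    """Return (template name, mapping); the first matching template wins."""
    for entry in templates:
        name, cond = next(iter(entry.items()))
        if _matches(cond, path, detected):
            return name, cond["mapping"]
    return None, default


templates = [
    {"message_as_text": {"match_mapping_type": "string", "match": "message*",
                         "mapping": {"type": "text"}}},
    {"strings_as_keywords": {"match_mapping_type": "string",
                             "mapping": {"type": "keyword"}}},
]
print(resolve(templates, "message_body", "string", {}))
print(resolve(list(reversed(templates)), "message_body", "string", {}))
```

Reversing the list sends message_body to strings_as_keywords. This shows why the more specific template has to be listed first.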

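Recommendations for custom mappings

The steps for defining a custom mapping are:

Index a document into a temporary index and retrieve the mapping Elasticsearch generated
Modify the mapping from step 1 with your custom settings
Create the real index with the mapping from step 2

(1) Index a document into the temporary index and view the default mapping: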

PUT test_index/doc/1
{
  "referer": "-",
  "response_code": "200",
  "remote_ip": "192.168.20.200",
  "method": "POST",
  "user_name": "-",
  "http_version": "1.1",
  "body_sent": {
    "bytes": "0"
  },
  "url": "/analyzevideo"
}

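By default, every string field is mapped as text with a keyword sub-field. This includes response_code and bytes, because their values are sent as strings.
(2) Customize the default mapping (delete the temporary index first, or use a different name for the real index):
set bytes to long, url to text, and all other fields to keyword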

PUT test_index
{
    "mappings": {
      "doc": {
        "properties": {
          "body_sent": {
            "properties": {
              "bytes": {
                "type": "long"
              }
            }
          },
          "http_version": {
            "type": "keyword"
          },
          "method": {
            "type": "keyword"
          },
          "referer": {
            "type": "keyword"
          },
          "remote_ip": {
            "type": "keyword"
          },
          "response_code": {
            "type": "keyword"
          },
          "url": {
            "type": "text"
          },
          "user_name": {
            "type": "keyword"
          }
        }
      }
    }
}
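Step (2) can be partly automated. The Python sketch below turns every generated text-plus-keyword field into keyword, except the fields listed explicitly. The shape of the generated mapping (text with a keyword sub-field, ignore_above 256) is assumed for illustration. auto_string and customize are hypothetical helpers.

```python
import copy
import json


def auto_string() -> dict:
    """Assumed dynamic mapping for a string: text plus a keyword sub-field."""
    return {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}


def customize(auto_props: dict, overrides: dict, prefix: str = "") -> dict:
    """Turn generated text+keyword fields into keyword, except explicit overrides.

    overrides maps dotted field paths (e.g. "body_sent.bytes") to mappings.
    """
    result = {}
    for name, spec in auto_props.items():
        path = prefix + name
        if path in overrides:
            result[name] = copy.deepcopy(overrides[path])
        elif "properties" in spec:  # object field: recurse into its sub-fields
            result[name] = {"properties": customize(spec["properties"], overrides, path + ".")}
        elif spec.get("type") == "text":
            result[name] = {"type": "keyword"}
        else:
            result[name] = copy.deepcopy(spec)
    return result


# Assumed generated mapping for the sample log document: every string is text + keyword
generated = {name: auto_string() for name in
             ("referer", "response_code", "remote_ip", "method",
              "user_name", "http_version", "url")}
generated["body_sent"] = {"properties": {"bytes": auto_string()}}

custom = customize(generated, {"body_sent.bytes": {"type": "long"}, "url": {"type": "text"}})
print(json.dumps({"mappings": {"doc": {"properties": custom}}}, indent=2, sort_keys=True))
```

The printed body has the same properties as the hand-written mapping above.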

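(3) Use a dynamic template to simplify the custom mapping above: strings default to keyword, and only the exceptions (bytes and url) are listed explicitly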

PUT test_index
{
    "mappings": {
      "doc": {
       "dynamic_templates": [
          {
            "strings_as_keywords": {
              "match_mapping_type": "string",
              "mapping":{
                "type": "keyword"
              }
            }
          }
        ],
        "properties": {
          "body_sent": {
            "properties": {
              "bytes": {
                "type": "long"
              }
            }
          },
          "url": {
            "type": "text"
          }
        }
      }
    }
}

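Index templates

An index template (Index Template) automatically applies preset configuration when a new index is created, which simplifies index creation.

It can set both the index settings and the mapping
Several templates can apply to the same index; they are applied according to order, and a template with a higher order overrides settings from one with a lower order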
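How several matching templates combine by order can be sketched in Python. This assumes 6.x-style legacy templates with index_patterns and an order that defaults to 0. Merging is modeled as a recursive dictionary merge, with lower orders applied first. The helper names and the nginx-* patterns are hypothetical.

```python
import copy
import re


def _wildcard(pattern: str, text: str) -> bool:
    """Simple "*" wildcard match over the whole index name."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, text) is not None


def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into a copy of base; override wins on conflicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def effective_config(index_name: str, templates: list) -> dict:
    matching = [t for t in templates
                if any(_wildcard(p, index_name) for p in t["index_patterns"])]
    config = {}
    # apply lower order first so that higher order values override them
    for template in sorted(matching, key=lambda t: t.get("order", 0)):
        layer = {k: template[k] for k in ("settings", "mappings") if k in template}
        config = deep_merge(config, layer)
    return config


templates = [
    {"index_patterns": ["nginx-*"], "order": 0,
     "settings": {"number_of_shards": 1, "number_of_replicas": 1}},
    {"index_patterns": ["nginx-access-*"], "order": 1,
     "settings": {"number_of_shards": 3},
     "mappings": {"doc": {"properties": {"url": {"type": "text"}}}}},
]
print(effective_config("nginx-access-2021.06.02", templates))
```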

Index template API:
The index template API uses the _template endpoint.

https://blog.csdn.net/wfs1994/article/details/80766935
