Elasticsearch: installing and using a Chinese analyzer


17245

Published 2020-12-19 16:39:00

Analyze API

Method: POST

URL: http://192.168.18.129:9200/_analyze (substitute your own node's address)

Header: Content-Type: application/json

Request (standard is Elasticsearch's built-in standard analyzer; JSON bodies cannot contain comments):

{
    "analyzer":"standard",
    "text":"hello world"
}

Response

{
    "tokens": [
        {
            "token": "hello",
            "start_offset": 0,
            "end_offset": 5,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "world",
            "start_offset": 6,
            "end_offset": 11,
            "type": "<ALPHANUM>",
            "position": 1
        }
    ]
}
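For scripting, the same call can be made from Python with only the standard library. This is a minimal sketch, not part of the original post: it assumes Elasticsearch is reachable at the address used above, and the helper names build_analyze_request, extract_tokens and analyze are hypothetical.

```python
import json
import urllib.request

# Assumption: the node address used in this post; change it for your cluster.
ES_URL = "http://192.168.18.129:9200"


def build_analyze_request(analyzer: str, text: str) -> bytes:
    """Encode the JSON body for POST /_analyze."""
    return json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")


def extract_tokens(response: dict) -> list:
    """Keep only the token strings from an _analyze response, in response order."""
    return [entry["token"] for entry in response["tokens"]]


def analyze(analyzer: str, text: str, base_url: str = ES_URL) -> list:
    """POST to /_analyze and return the tokens (requires a running node)."""
    request = urllib.request.Request(
        base_url + "/_analyze",
        data=build_analyze_request(analyzer, text),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return extract_tokens(json.load(response))
```

Given the response shown above, analyze("standard", "hello world") should return ['hello', 'world'].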

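Chinese analysis setup

Elasticsearch's built-in analyzers are designed for English and other space-delimited languages. Chinese has no spaces between words, so the standard analyzer simply emits every Chinese character as a separate token, which gives poor search results. A dedicated Chinese analyzer is therefore needed for indexing and searching. Common choices include IK and jieba.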
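To see why a dictionary matters, the sketch below contrasts one-token-per-character splitting (roughly what the standard analyzer does with Chinese) with forward maximum matching against a tiny word list. It is a toy illustration only: IK uses its own segmentation logic and much larger dictionaries, and TOY_DICT and both function names are made up for this example.

```python
# Hypothetical three-word dictionary, for illustration only.
TOY_DICT = {"中文", "分词", "分词器"}


def per_character(text: str) -> list:
    """One token per non-space character, similar to how the standard analyzer treats Chinese."""
    return [ch for ch in text if not ch.isspace()]


def forward_max_match(text: str, dictionary: set) -> list:
    """Greedy left-to-right segmentation: take the longest dictionary word at each position."""
    max_len = max(len(word) for word in dictionary)
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no dictionary word starts here.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens
```

For example, per_character("中文分词") gives ['中', '文', '分', '词'], while forward_max_match("中文分词", TOY_DICT) gives ['中文', '分词'].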

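Installing the IK analyzer

Local installation

1. Download the IK plugin release from GitHub. Its version must match your Elasticsearch version exactly. If the download is slow, a browser extension that accelerates GitHub downloads can help.

https://github.com/medcl/elasticsearch-analysis-ik/releases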

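2. Install

Upload the archive and unzip it into its own subdirectory (for example ik) inside the plugins directory of the Elasticsearch installation.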
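If you prefer to script the local install, here is a minimal Python sketch. It assumes the downloaded zip keeps plugin-descriptor.properties and the other plugin files at the archive root (worth checking with unzip -l first); the function name and the ik directory name are illustrative choices, not requirements of the plugin.

```python
import zipfile
from pathlib import Path


def install_plugin_zip(zip_path, es_home, plugin_dir="ik"):
    """Unpack a plugin archive into <es_home>/plugins/<plugin_dir> and return that path."""
    target = Path(es_home) / "plugins" / plugin_dir
    with zipfile.ZipFile(zip_path) as archive:
        # Sanity check: every Elasticsearch plugin needs this descriptor at its root.
        if "plugin-descriptor.properties" not in archive.namelist():
            raise ValueError("plugin-descriptor.properties not found at the archive root")
        # exist_ok=False: fail instead of mixing files into an existing install.
        target.mkdir(parents=True, exist_ok=False)
        archive.extractall(target)
    return target
```

Elasticsearch still has to be restarted afterwards to pick up the plugin.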

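Remote installation

Alternatively, let Elasticsearch download and install the plugin. Run the following from the Elasticsearch installation directory, replacing both occurrences of 7.0.0 in the URL with your Elasticsearch version:

   bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.0.0/elasticsearch-analysis-ik-7.0.0.zip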

With either method, restart Elasticsearch afterwards; the startup log should then show that the IK plugin was loaded.
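Besides reading the log, you can ask the cluster which plugins each node has loaded through the _cat/plugins endpoint. A minimal sketch, assuming the node address used above and that IK reports its component name as analysis-ik (check the name field in the plugin's plugin-descriptor.properties if it differs); the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: same node address as in the rest of this post.
ES_URL = "http://192.168.18.129:9200"


def fetch_cat_plugins(base_url: str = ES_URL) -> list:
    """GET /_cat/plugins as JSON: one row per (node, plugin) pair."""
    with urllib.request.urlopen(base_url + "/_cat/plugins?format=json") as response:
        return json.load(response)


def nodes_with_plugin(rows: list, component: str = "analysis-ik") -> list:
    """Names of the nodes whose plugin list includes the given component."""
    return sorted(row["name"] for row in rows if row.get("component") == component)
```

An empty result from nodes_with_plugin(fetch_cat_plugins()) means no node loaded IK. In a multi-node cluster, every node needs the plugin installed.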

Verification

Send a POST request to http://192.168.18.129:9200/_analyze with this body:

{
    "analyzer":"ik_max_word",
    "text":"中文分词"
}

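Response

Instead of one token per character, the tokens array should now contain Chinese words such as 中文 and 分词. Each entry has the same fields (token, start_offset, end_offset, type, position) as in the standard-analyzer response above.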
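To check the result side by side, the sketch below runs the same text through several analyzers and flags whether any token is longer than one character. It assumes the node address used above; ik_smart and ik_max_word are the two analyzers the IK plugin provides, while the function names are hypothetical. The analyze_fn parameter exists so the logic can be exercised without a cluster.

```python
import json
import urllib.request
from typing import Callable, Dict, List

# Assumption: same node address as above.
ES_URL = "http://192.168.18.129:9200"


def es_analyze(analyzer: str, text: str, base_url: str = ES_URL) -> List[str]:
    """POST /_analyze and return the token strings."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return [entry["token"] for entry in json.load(response)["tokens"]]


def compare_analyzers(
    text: str,
    analyzers: List[str],
    analyze_fn: Callable[[str, str], List[str]] = es_analyze,
) -> Dict[str, List[str]]:
    """Map each analyzer name to the tokens it produces for the same text."""
    return {name: analyze_fn(name, text) for name in analyzers}


def produces_words(tokens: List[str]) -> bool:
    """True if at least one token spans more than one character (i.e. not char-by-char)."""
    return any(len(token) > 1 for token in tokens)
```

Against a node with IK installed, compare_analyzers("中文分词", ["standard", "ik_max_word"]) should give tokens for which produces_words is False under standard and True under ik_max_word.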

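Example

Create an index named accounts whose documents have three fields: user, title and desc. All three hold Chinese text of type text, so each must be given the Chinese analyzer explicitly instead of the default standard analyzer. Analyzers are set per field: analyzer is applied when documents are indexed, search_analyzer when queries are analyzed. Older versions of this example wrapped the fields in a mapping type named person; mapping types are deprecated in Elasticsearch 7 and removed in 8, so the request below places properties directly under mappings and sends the Content-Type header that Elasticsearch 6 and later require.

$ curl -X PUT '192.168.18.129:9200/accounts' -H 'Content-Type: application/json' -d '
{
  "mappings": {
    "properties": {
      "user": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_max_word"
      },
      "title": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_max_word"
      },
      "desc": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_max_word"
      }
    }
  }
}'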
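When many fields need the same analyzer, generating the body avoids repetition. This Python sketch builds a typeless (Elasticsearch 7 style) mapping in which every listed field is text analyzed with ik_max_word, and sends it with PUT. The function names are hypothetical and the node address is the one assumed throughout this post.

```python
import json
import urllib.request

# Assumption: same node address as above.
ES_URL = "http://192.168.18.129:9200"


def chinese_text_mapping(fields, analyzer="ik_max_word", search_analyzer="ik_max_word"):
    """Build a typeless index body where every listed field is text analyzed by IK."""
    return {
        "mappings": {
            "properties": {
                field: {"type": "text", "analyzer": analyzer, "search_analyzer": search_analyzer}
                for field in fields
            }
        }
    }


def create_index(name, body, base_url=ES_URL):
    """PUT /<name> with the given body; returns the parsed JSON reply."""
    request = urllib.request.Request(
        base_url + "/" + name,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Usage: create_index("accounts", chinese_text_mapping(["user", "title", "desc"])). A common variation is passing search_analyzer="ik_smart" so queries are split more coarsely than indexed text.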