Elasticsearch Analyzers in Depth

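Overview

An analyzer is the core component of full-text search in Elasticsearch. It splits text into a sequence of individual terms and also handles preprocessing such as lowercasing, special-character filtering, synonym replacement, and stop-word removal. The analyzer you choose directly affects both search accuracy and performance.

A complete analyzer is made up of three parts, applied in this order:

Character Filter: preprocesses the raw text, for example by stripping HTML tags or replacing special characters
Tokenizer: splits the text into terms
Token Filter: post-processes the terms, for example by lowercasing them, removing stop words, or adding synonyms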
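
To make the three stages concrete, here is a minimal Python sketch of the same pipeline. It is only an illustration of the idea, not how Elasticsearch implements it; the function names and the tiny stop-word set are assumptions made up for this example.

```python
import re

def strip_html(text: str) -> str:
    """Character filter: drop anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", "", text)

def whitespace_tokenize(text: str) -> list[str]:
    """Tokenizer: split the text on runs of whitespace."""
    return text.split()

def lowercase(tokens: list[str]) -> list[str]:
    """Token filter: lowercase every term."""
    return [t.lower() for t in tokens]

def remove_stopwords(tokens: list[str]) -> list[str]:
    """Token filter: drop a few stop words (illustrative list only)."""
    stopwords = {"the", "a", "an"}
    return [t for t in tokens if t not in stopwords]

def analyze(text: str) -> list[str]:
    """Run character filter -> tokenizer -> token filters, in that order."""
    tokens = whitespace_tokenize(strip_html(text))
    for token_filter in (lowercase, remove_stopwords):
        tokens = token_filter(tokens)
    return tokens

print(analyze("<p>The Quick Brown Fox</p>"))  # ['quick', 'brown', 'fox']
```

The order matters: the stop-word filter only works here because lowercasing has already turned "The" into "the".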

Built-in Analyzers

Elasticsearch ships with quite a few built-in analyzers. Here is a quick look at the common ones.

standard

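This is the default analyzer in Elasticsearch, typically used for English and other general-purpose text. It splits text on word boundaries and lowercases each term. Stop-word removal is supported but disabled by default. It is a poor fit for Chinese, because Chinese text is split into individual characters.

Examples:

Chinese

As the response shows, standard turns every Chinese character into its own term.

# Request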
POST /_analyze
{
  "analyzer": "standard",
  "text": "我是中国人"
}

# Response
{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "中",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "<IDEOGRAPHIC>",
      "position" : 2
    },
    {
      "token" : "国",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "<IDEOGRAPHIC>",
      "position" : 3
    },
    {
      "token" : "人",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "<IDEOGRAPHIC>",
      "position" : 4
    }
  ]
}

English

As the response shows, standard splits the text on word boundaries (here, the spaces) and emits each word in lowercase.

# Request
POST /_analyze
{
  "analyzer": "standard",
  "text": "I Love You Very Much"
}

# Response
{
  "tokens" : [
    {
      "token" : "i",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "love",
      "start_offset" : 2,
      "end_offset" : 6,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "you",
      "start_offset" : 7,
      "end_offset" : 10,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "very",
      "start_offset" : 11,
      "end_offset" : 15,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "much",
      "start_offset" : 16,
      "end_offset" : 20,
      "type" : "<ALPHANUM>",
      "position" : 4
    }
  ]
}

simple

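Suited to simple English text. It splits on any non-letter character and lowercases automatically; digits and special characters are discarded entirely.

Examples:

Chinese only

# Request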
POST /_analyze
{
  "analyzer": "simple",
  "text": "我是中国人"
}

# Response
{
  "tokens" : [
    {
      "token" : "我是中国人",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "word",
      "position" : 0
    }
  ]
}

Chinese + digits

As the response shows, the digits act as separators and are dropped from the output.

# Request
POST /_analyze
{
  "analyzer": "simple",
  "text": "我1是2中国3人"
}

# Response
{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "中国",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "人",
      "start_offset" : 7,
      "end_offset" : 8,
      "type" : "word",
      "position" : 3
    }
  ]
}

English + digits

# Request
POST /_analyze
{
  "analyzer": "simple",
  "text": "Lo1ve Y2ou"
}

# Response
{
  "tokens" : [
    {
      "token" : "lo",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "ve",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "y",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "ou",
      "start_offset" : 8,
      "end_offset" : 10,
      "type" : "word",
      "position" : 3
    }
  ]
}
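
The behavior in the three examples above can be approximated in a few lines of Python. This is a rough sketch of the rule "split on non-letters, then lowercase", not the actual Lucene implementation; the function name is made up for illustration.

```python
import re

def simple_analyze(text: str) -> list[str]:
    """Approximate the simple analyzer: keep runs of letters, lowercase them.

    The pattern [^\\W\\d_]+ matches one or more Unicode letters, so digits,
    punctuation, and whitespace all act as separators and are dropped.
    """
    return [match.lower() for match in re.findall(r"[^\W\d_]+", text)]

print(simple_analyze("我是中国人"))      # ['我是中国人']
print(simple_analyze("我1是2中国3人"))   # ['我', '是', '中国', '人']
print(simple_analyze("Lo1ve Y2ou"))      # ['lo', 've', 'y', 'ou']
```

Because Chinese characters count as letters, an unbroken run of Chinese text comes out as one long term, which is why this analyzer is of little use for Chinese search.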

keyword

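Suited to fields that need exact matching (phone numbers, ID numbers, enum values). keyword does not split the text at all: the entire input becomes a single term. Fuzzy, word-level full-text matching is therefore not available, which is exactly what you want for exact-match fields.

Examples:

English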

POST /_analyze
{
  "analyzer": "keyword",
  "text": "I Love You"
}

# Response
{
  "tokens" : [
    {
      "token" : "I Love You",
      "start_offset" : 0,
      "end_offset" : 10,
      "type" : "word",
      "position" : 0
    }
  ]
}

Chinese

# Request
POST /_analyze
{
  "analyzer": "keyword",
  "text": "我是中国人"
}

# Response
{
  "tokens" : [
    {
      "token" : "我是中国人",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "word",
      "position" : 0
    }
  ]
}

whitespace

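whitespace suits text whose tokens are already separated by whitespace. It splits on whitespace only and does no other processing (no lowercasing), which makes it a good fit for pre-tokenized text.

Examples:

Chinese

# Request (note the space inside the text)
POST /_analyze
{
  "analyzer": "whitespace",
  "text": "我是 中国人"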
}

# Response
{
  "tokens" : [
    {
      "token" : "我是",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "中国人",
      "start_offset" : 3,
      "end_offset" : 6,
      "type" : "word",
      "position" : 1
    }
  ]
}

English

POST /_analyze
{
  "analyzer": "whitespace",
  "text": "I Love You"
}

# Response
{
  "tokens" : [
    {
      "token" : "I",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "Love",
      "start_offset" : 2,
      "end_offset" : 6,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "You",
      "start_offset" : 7,
      "end_offset" : 10,
      "type" : "word",
      "position" : 2
    }
  ]
}

stop

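stop suits plain English text. It behaves like the simple analyzer and additionally removes English stop words (the, a, an, and so on). The stop-word list can be customized. It does not help with Chinese stop words, because the underlying tokenization does not segment Chinese text into words.

Example:

# Omitted: the stop analyzer is rarely used in Chinese-language deployments, so there is not much to show here. Feel free to try it yourself.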
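
For readers who want a feel for it without a cluster, here is a rough Python sketch of what stop does: tokenize like simple, then drop stop words. The stop-word set below is a small illustrative subset chosen for this example, not the full list Elasticsearch uses.

```python
import re

# Illustrative subset only; the real default English list is longer.
ENGLISH_STOPWORDS = frozenset({"a", "an", "and", "is", "of", "the", "to"})

def stop_analyze(text: str, stopwords=ENGLISH_STOPWORDS) -> list[str]:
    """Approximate the stop analyzer: letter runs, lowercased, minus stop words."""
    tokens = [match.lower() for match in re.findall(r"[^\W\d_]+", text)]
    return [token for token in tokens if token not in stopwords]

print(stop_analyze("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."))
# ['quick', 'brown', 'foxes', 'jumped', 'over', 'lazy', 'dog', 's', 'bone']
```

Passing a different `stopwords` set corresponds to customizing the stop-word list.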

pattern

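pattern suits text with a fixed format. It splits text with a regular expression (the default pattern is \W+, that is, runs of non-word characters). Regular expressions can be slow, so avoid it on large text fields.

Example:

# Omitted: the pattern analyzer is rarely used in production, so there is not much to show here. Feel free to try it yourself.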
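
As a rough offline illustration, the splitting rule can be sketched in Python. The function name and parameters are made up for this example; `re.ASCII` is used on the assumption that it best mirrors the ASCII-only meaning of \W in Java regular expressions.

```python
import re

def pattern_analyze(text: str, pattern: str = r"\W+", lowercase: bool = True) -> list[str]:
    """Approximate the pattern analyzer: split on a regex, optionally lowercase."""
    tokens = [token for token in re.split(pattern, text, flags=re.ASCII) if token]
    return [token.lower() for token in tokens] if lowercase else tokens

print(pattern_analyze("The foo_bar-Size's default is 5."))
# ['the', 'foo_bar', 'size', 's', 'default', 'is', '5']

# A fixed-format field, split on commas instead of the default pattern
print(pattern_analyze("red,GREEN,blue", pattern=","))
# ['red', 'green', 'blue']
```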

fingerprint

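fingerprint suits deduplication and clustering. It normalizes the text (lowercasing, removing duplicate terms, sorting) and joins what is left into a single token that serves as a fingerprint. This is useful for detecting duplicate content such as news articles or documents.

Example:

# Request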
POST /_analyze
{
  "analyzer": "fingerprint",
  "text": "我是中国人，你也是中国人"
}

# Response
{
  "tokens" : [
    {
      "token" : "中 也 人 你 国 我 是",
      "start_offset" : 0,
      "end_offset" : 12,
      "type" : "fingerprint",
      "position" : 0
    }
  ]
}
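
The response above can be reproduced with a short Python sketch: tokenize, lowercase, de-duplicate, sort, and join. The tokenization rule here (each CJK ideograph is its own token, other letters and digits group into words) is a simplifying assumption, and the ASCII-folding step of the real analyzer is left out.

```python
import re

def fingerprint(text: str, separator: str = " ") -> str:
    """Approximate the fingerprint analyzer on lowercased text."""
    # One token per CJK ideograph, or one token per run of other letters/digits
    tokens = re.findall(r"[\u4e00-\u9fff]|[^\W_\u4e00-\u9fff]+", text.lower())
    # De-duplicate, sort by code point, and join into a single term
    return separator.join(sorted(set(tokens)))

print(fingerprint("我是中国人，你也是中国人"))  # 中 也 人 你 国 我 是
print(fingerprint("b a B a"))                  # a b
```

Two texts that contain the same set of terms in a different order or with repetitions produce the same fingerprint, which is what makes the token usable for deduplication.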

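Third-Party Analyzers Commonly Used in Production

As the examples above show, most of the built-in analyzers only really handle English and offer very little support for Chinese.

No surprise there: this software was built abroad with English in mind. As for Chinese, well, you know how it goes.

IK Analyzer (the most widely used Chinese analyzer, the first choice in production)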

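The IK analyzer is a dictionary-based segmenter. It is commonly described in terms of matching strategies such as forward maximum matching (FMM) and backward maximum matching (BMM): it scans the text against its dictionaries to decide where words begin and end, which lets it find Chinese word boundaries far more accurately than the built-in analyzers.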
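
To show what forward maximum matching means, here is a minimal textbook-style Python sketch. It illustrates the general technique only and is not IK's actual code; the three-word dictionary is made up for the example.

```python
def forward_max_match(text: str, dictionary: set[str]) -> list[str]:
    """Greedy left-to-right segmentation: always take the longest dictionary word."""
    max_len = max((len(word) for word in dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens

words = {"中国", "中国人", "国人"}  # hypothetical mini dictionary
print(forward_max_match("我是中国人", words))  # ['我', '是', '中国人']
```

Backward maximum matching is the same idea run from the end of the text toward the beginning; comparing the two results is a classic way to spot ambiguous segmentations.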

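Key features

Supports two analysis modes:
- ik_max_word: finest-grained segmentation that emits as many dictionary matches as possible; suited to indexing
- ik_smart: coarsest-grained segmentation that avoids overlapping terms; suited to querying

Supports custom extension dictionaries and stop-word dictionaries
Supports hot reloading of dictionaries (no Elasticsearch restart required)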

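Installing the IK Analyzer

Project page: https://github.com/infinilabs/analysis-ik

Online installation

Run on every node in the cluster:

# Note: the plugin version must match the version of your Elasticsearch cluster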
root@master:~# elasticsearch-plugin install https://get.infini.cloud/elasticsearch/analysis-ik/7.17.26
-> Installing https://get.infini.cloud/elasticsearch/analysis-ik/7.17.26
-> Downloading https://get.infini.cloud/elasticsearch/analysis-ik/7.17.26
[=================================================] 100%   
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@     WARNING: plugin requires additional permissions     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
* java.net.SocketPermission * connect,resolve
See https://docs.oracle.com/javase/8/docs/technotes/guides/security/permissions.html
for descriptions of what these permissions allow and the associated risks.

Continue with installation? [y/N]y # enter y here
-> Installed analysis-ik
-> Please restart Elasticsearch to activate any plugins installed
# Change the owner
root@master:~# chown elasticsearch:elasticsearch -R /data00/software/elasticsearch-7.17.26
# Check the result
root@master:~# ll /data00/software/elasticsearch-7.17.26/plugins/
total 8
# The newly installed IK analyzer
drwxr-xr-x 2 elasticsearch elasticsearch 4096 Apr 20 11:26 analysis-ik
drwxr-xr-x 2 elasticsearch elasticsearch 4096 Apr 16 15:45 repository-s3

# Finally, restart Elasticsearch node by node (rolling restart) so the service stays available
root@master:~# systemctl restart elasticsearch.service

Offline installation

## Download IK and upload the package to the cluster nodes
https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.17.26/elasticsearch-analysis-ik-7.17.26.zip

# Create the plugin directory
root@master:~# mkdir /data00/software/elasticsearch-7.17.26/plugins/ik

# Unpack the archive
root@master:~# unzip elasticsearch-analysis-ik-7.17.26.zip -d /data00/software/elasticsearch-7.17.26/plugins/ik/

# Change the owner
root@master:~# chown -R elasticsearch:elasticsearch /data00/software/elasticsearch-7.17.26/plugins/ik/

# Finally, restart all ES nodes one at a time (rolling restart) so the service stays available
root@master:~# systemctl restart elasticsearch.service

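Testing the IK Analyzer

ik_smart

Coarsest-grained segmentation that avoids overlapping terms; suited to querying. Compared with ik_max_word, ik_smart usually produces the more natural segmentation, with higher precision.

# Request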
POST /_analyze
{
  "analyzer": "ik_smart",
  "text": "我是中国人"
}

# Response
{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "中国人",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    }
  ]
}

ik_max_word

Finest-grained segmentation: it emits every dictionary word it can find in the text, which gives high recall. Suited to indexing.

Example:

# Request
POST /_analyze
{
  "analyzer": "ik_max_word",
  "text": "我是中国人"
}

# Response
{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "中国人",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "中国",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "国人",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 4
    }
  ]
}
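
The difference from ik_smart can be sketched in Python: instead of picking one best segmentation, emit every dictionary word found at every position, and fall back to single characters only where no word covers them. This is a simplified illustration of "finest-grained" output under a made-up three-word dictionary, not IK's actual algorithm.

```python
def fine_grained_segment(text: str, dictionary: set[str]) -> list[str]:
    """Emit all dictionary matches, plus single characters not covered by any match."""
    matches = [
        (start, end)
        for start in range(len(text))
        for end in range(start + 1, len(text) + 1)
        if text[start:end] in dictionary
    ]
    covered = {i for start, end in matches for i in range(start, end)}
    # Characters outside every match are emitted on their own
    matches += [(i, i + 1) for i in range(len(text)) if i not in covered]
    # Order by start offset, longer matches first
    matches.sort(key=lambda span: (span[0], -(span[1] - span[0])))
    return [text[start:end] for start, end in matches]

words = {"中国", "中国人", "国人"}  # hypothetical mini dictionary
print(fine_grained_segment("我是中国人", words))
# ['我', '是', '中国人', '中国', '国人']
```

The overlapping terms are the point: a query for 中国 or 国人 can still hit a document that was indexed from 中国人.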

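Extension: Configuring a Local IK Dictionary

Motivation: when the terms the analyzer produces do not match what you need, you can add your own words through a custom dictionary.

Steps (run on every node):

# Edit the IK configuration file
root@master:~# vim /data00/software/elasticsearch-7.17.26/config/analysis-ik/IKAnalyzer.cfg.xml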
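# File contents
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
        <comment>IK Analyzer 扩展配置</comment>
        <!-- custom extension dictionary: this is the line to change -->
        <entry key="ext_dict">ik_diy.dic</entry>
         <!-- custom stop-word dictionary -->
        <entry key="ext_stopwords"></entry>
        <!-- remote extension dictionary -->
        <!-- <entry key="remote_ext_dict">words_location</entry> -->
        <!-- remote stop-word dictionary -->
        <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>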

# Edit the dictionary contents (one word per line)
root@master:~# vim /data00/software/elasticsearch-7.17.26/config/analysis-ik/ik_diy.dic
我是中国人
你也是中国人
中国
中华人民共和国

# Fix ownership
root@master:~# chown elasticsearch:elasticsearch -R /data00/software/elasticsearch-7.17.26

# Rolling restart of Elasticsearch
root@master:~# systemctl restart elasticsearch.service

Verifying that the dictionary took effect

ik_smart

POST /_analyze
{
  "analyzer": "ik_smart",
  "text": "我是中国人"
}

# Response
{
  "tokens" : [
    {
      "token" : "我是中国人",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 0
    }
  ]
}

ik_max_word

POST /_analyze
{
  "analyzer": "ik_max_word",
  "text": "我是中国人"
}

# Response
{
  "tokens" : [
    {
      "token" : "我是中国人",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "中国人",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "中国",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "国人",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 3
    }
  ]
}

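Extension: Configuring a Remote IK Dictionary (Recommended for Production)

A remote dictionary is the preferred way to manage dictionaries in production. Dictionary updates are picked up without restarting any Elasticsearch node, which suits workloads that frequently add business terms, trending words, or stop words (e-commerce, content platforms, social products, and so on).

- Supports two remote dictionary types: extension words and stop words
- Automatic change detection: by default the plugin checks once a minute whether a dictionary has changed
- Multiple remote dictionaries can be loaded at the same time
- Every node picks up the same dictionary automatically, so segmentation results stay consistent across nodes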

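Server-side requirements for an IK remote dictionary

Any static file service that speaks HTTP/HTTPS can serve a remote dictionary (Nginx, Apache, object storage, or an in-house endpoint). It must meet the following requirements:

- Response body: plain text, UTF-8 without a BOM, one word or stop word per line, with \n as the line separator (not the Windows-style \r\n)
- Response headers: must include both Last-Modified and ETag (the IK plugin uses these two headers to decide whether the dictionary needs to be reloaded)
- Response Content-Type: text/plain; charset=utf-8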
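
The role of the two headers can be illustrated with a small Python sketch of the change check: remember the last seen Last-Modified and ETag values and reload when either one differs. This mirrors the behavior described above rather than the plugin's source code; `DictState` and `needs_reload` are names invented for this example.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

@dataclass
class DictState:
    """Validator values remembered from the previous poll."""
    last_modified: Optional[str] = None
    etag: Optional[str] = None

def needs_reload(state: DictState, headers: Mapping[str, str]) -> bool:
    """Return True (and remember the new values) when either header changed."""
    last_modified = headers.get("Last-Modified")
    etag = headers.get("ETag")
    changed = last_modified != state.last_modified or etag != state.etag
    if changed:
        state.last_modified, state.etag = last_modified, etag
    return changed

state = DictState()
headers = {"Last-Modified": "Mon, 20 Apr 2026 07:47:04 GMT", "ETag": '"69e5d9f8-27"'}
print(needs_reload(state, headers))  # True: first time these values are seen
print(needs_reload(state, headers))  # False: nothing changed since the last poll
```

This is also why a server that omits both headers is a problem: the values never change, so an edited dictionary would never be detected.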

Configuration steps

Using Nginx as an example:

Configure Nginx

# Install nginx
root@master:~# apt install nginx
# Start nginx
root@master:~# systemctl start nginx

# Create the dictionary directory and file
root@master:~# mkdir -p /data00/data/nginx/es-dict
root@master:~# cat /data00/data/nginx/es-dict/ext_dict.txt
chatGPT
GPT4
文心一言
通义千问

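# Create the nginx configuration file
root@master:~# cat /etc/nginx/conf.d/es-dict.conf 
server {
    listen 81;
    server_name es-dict.example.com;
    root /data00/data/nginx/es-dict;
    location / {
        # Note: add_header adds a second Content-Type header next to the one nginx
        # already sends (both show up in the curl -I output further down);
        # the "charset utf-8;" directive is a cleaner way to declare the encoding
        add_header Content-Type "text/plain; charset=utf-8";
        # Allow only the ES node IPs; access control is recommended in production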
        allow 10.37.0.0/16;
        deny all;
    }
}

# Check that the configuration is valid
root@master:~# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

# Restart nginx
root@master:~# systemctl restart nginx.service

Configure the ES cluster nodes (on every node)

# Update the hosts file
root@master:~# echo '10.37.97.56   es-dict.example.com' >> /etc/hosts

# Test that the dictionary is reachable
root@master:~# curl http://es-dict.example.com:81/ext_dict.txt
chatGPT
GPT4
文心一言
通义千问

root@node01:~# curl -I http://es-dict.example.com:81/ext_dict.txt
HTTP/1.1 200 OK
Server: nginx/1.14.2
Date: Mon, 20 Apr 2026 08:05:37 GMT
Content-Type: text/plain
Content-Length: 39
Last-Modified: Mon, 20 Apr 2026 07:47:04 GMT
Connection: keep-alive
ETag: "69e5d9f8-27"
Content-Type: text/plain; charset=utf-8
Accept-Ranges: bytes


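# Edit IKAnalyzer.cfg.xml in the ES config directory:
root@master:~# cat /data00/software/elasticsearch-7.17.26/config/analysis-ik/IKAnalyzer.cfg.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
        <comment>IK Analyzer 扩展配置</comment>
        <!-- custom extension dictionary -->
        <entry key="ext_dict">ik_diy.dic</entry>
         <!-- custom stop-word dictionary -->
        <entry key="ext_stopwords"></entry>
        <!-- remote extension dictionary: this is the line to change -->
        <entry key="remote_ext_dict">http://es-dict.example.com:81/ext_dict.txt</entry>
        <!-- remote stop-word dictionary -->
        <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>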

# Rolling restart of the ES nodes
root@master:~# systemctl restart elasticsearch.service

Test

# ik_smart request
POST /_analyze
{
  "analyzer": "ik_smart",
  "text": "通义千问"
}

# Response
{
  "tokens" : [
    {
      "token" : "通义千问",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 0
    }
  ]
}


# ik_max_word request
POST /_analyze
{
  "analyzer": "ik_max_word",
  "text": "通义千问"
}

# Response
{
  "tokens" : [
    {
      "token" : "通义千问",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "通义",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "千",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "TYPE_CNUM",
      "position" : 2
    },
    {
      "token" : "问",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "CN_CHAR",
      "position" : 3
    }
  ]
}

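Pinyin Analyzer

The pinyin analyzer converts Chinese characters to pinyin and pairs naturally with the IK analyzer. Typical use cases are product search, name search, fuzzy search, and search by full pinyin or by initials. Its main capabilities are:

- Converts Chinese characters to pinyin
- Supports full pinyin, abbreviated pinyin (first letters), initials, and finals
- Supports polyphonic characters
- Can be combined with IK (segment first, then convert to pinyin)

Typical scenarios:

- E-commerce search: typing shouji finds 手机
- Name search: typing zs finds 张三
- Mixed pinyin and Chinese search
- Search suggestions and typo-tolerant search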

Installation

Project page: https://github.com/infinilabs/analysis-pinyin

Online installation

Run on every node:

# Install
root@master:~# elasticsearch-plugin install https://get.infini.cloud/elasticsearch/analysis-pinyin/7.17.26
-> Installing https://get.infini.cloud/elasticsearch/analysis-pinyin/7.17.26
-> Downloading https://get.infini.cloud/elasticsearch/analysis-pinyin/7.17.26
[=================================================] 100%   
-> Installed analysis-pinyin
-> Please restart Elasticsearch to activate any plugins installed
# Change the owner
root@master:~# chown elasticsearch:elasticsearch -R /data00/software/elasticsearch-7.17.26
root@master:~# ll /data00/software/elasticsearch-7.17.26/plugins/
total 12
drwxr-xr-x 2 elasticsearch elasticsearch 4096 Apr 20 11:26 analysis-ik
drwxr-xr-x 2 elasticsearch elasticsearch 4096 Apr 20 16:15 analysis-pinyin
drwxr-xr-x 2 elasticsearch elasticsearch 4096 Apr 16 15:45 repository-s3

# Rolling restart of all ES nodes
root@master:~# systemctl restart elasticsearch.service

Offline installation

## Download the pinyin analyzer and upload the package to the cluster nodes
https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v7.17.26/elasticsearch-analysis-pinyin-7.17.26.zip

# Create the plugin directory
root@master:~# mkdir /data00/software/elasticsearch-7.17.26/plugins/pinyin

# Unpack the archive
root@master:~# unzip elasticsearch-analysis-pinyin-7.17.26.zip -d /data00/software/elasticsearch-7.17.26/plugins/pinyin/

# Change the owner
root@master:~# chown -R elasticsearch:elasticsearch /data00/software/elasticsearch-7.17.26/plugins/pinyin/

# Finally, restart all ES nodes one at a time (rolling restart) so the service stays available
root@master:~# systemctl restart elasticsearch.service

Test

POST /_analyze
{
  "analyzer": "pinyin",
  "text": "我爱中国"
}

# Response
{
  "tokens" : [
    {
      "token" : "wo",
      "start_offset" : 0,
      "end_offset" : 0,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "wazg",
      "start_offset" : 0,
      "end_offset" : 0,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "ai",
      "start_offset" : 0,
      "end_offset" : 0,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "zhong",
      "start_offset" : 0,
      "end_offset" : 0,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "guo",
      "start_offset" : 0,
      "end_offset" : 0,
      "type" : "word",
      "position" : 3
    }
  ]
}
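
The response contains two kinds of terms: the full pinyin of each character (wo, ai, zhong, guo) and one term made of the first letters (wazg). A toy Python sketch shows how both are derived. The four-entry lookup table is a stand-in invented for this example; the real plugin ships a complete dictionary and also handles polyphonic characters.

```python
# Hypothetical mini table; a real converter needs a full character-to-pinyin dictionary.
PINYIN = {"我": "wo", "爱": "ai", "中": "zhong", "国": "guo"}

def pinyin_terms(text: str) -> dict:
    """Return the per-character full pinyin and the joined first letters."""
    full = [PINYIN[ch] for ch in text if ch in PINYIN]
    first_letters = "".join(syllable[0] for syllable in full)
    return {"full_pinyin": full, "first_letters": first_letters}

print(pinyin_terms("我爱中国"))
# {'full_pinyin': ['wo', 'ai', 'zhong', 'guo'], 'first_letters': 'wazg'}
```

Indexing both forms is what lets a user type either zhong guo or an abbreviation such as wazg and still find the original Chinese text.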

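Extension: Dictionaries for the pinyin Analyzer

The pinyin analyzer also supports local and remote dictionaries. The configuration steps are the same as for IK; see the section above.

Extension: Specifying the IK Analyzer When Creating an Index

Example:

Suitable for production use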

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
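        "ik_max_word_custom": {
          "type": "ik_max_word",          // required: ik_max_word / ik_smart
          "use_smart": false,             // whether to use smart segmentation
          "enable_lowercase": true,       // lowercase English text
          "enable_remote_dict": true,     // enable remote dictionaries
          "remote_dict_interval": 60,     // remote dictionary refresh interval (seconds)
          "use_single_word": false,       // emit single characters when no word matches
          "convert_chinese_num": false,   // convert Chinese numerals to Arabic digits
          "use_stop_word": true           // enable stop-word filtering
        },
        "ik_smart_custom": {
          "type": "ik_smart",             // required: ik_max_word / ik_smart
          "use_smart": true,              // whether to use smart segmentation
          "enable_lowercase": true,       // lowercase English text
          "enable_remote_dict": true,     // enable remote dictionaries
          "remote_dict_interval": 60,     // remote dictionary refresh interval (seconds)
          "use_single_word": false,       // emit single characters when no word matches
          "convert_chinese_num": false,   // convert Chinese numerals to Arabic digits
          "use_stop_word": true           // enable stop-word filtering
        }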
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "ik_max_word_custom",
        "search_analyzer": "ik_smart_custom"
      }
    }
  }
}
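
JSON itself has no comment syntax, so a strict JSON client will reject the // annotations in the request above. One way around that is to build the body in code and serialize it. The sketch below does this in Python for a trimmed-down version of the same index definition; the function name is made up, and only a few of the options from the example are carried over.

```python
import json

def build_ik_index_body(field: str = "content") -> dict:
    """Index body: ik_max_word at index time, ik_smart at search time."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # Fine-grained analyzer used when documents are indexed
                    "ik_max_word_custom": {"type": "ik_max_word", "enable_lowercase": True},
                    # Coarse-grained analyzer used when queries are analyzed
                    "ik_smart_custom": {"type": "ik_smart", "enable_lowercase": True},
                }
            }
        },
        "mappings": {
            "properties": {
                field: {
                    "type": "text",
                    "analyzer": "ik_max_word_custom",
                    "search_analyzer": "ik_smart_custom",
                }
            }
        },
    }

# Comment-free JSON, ready to send as the body of PUT /my_index
print(json.dumps(build_ik_index_body(), ensure_ascii=False, indent=2))
```

Keeping the explanations as Python comments means the payload that reaches Elasticsearch is always valid JSON.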

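Parameter reference

type: required; one of:
- ik_max_word
- ik_smart
use_smart
- Whether to use smart segmentation
- true = ik_smart behavior
- false = ik_max_word behavior
- Normally keep it consistent with type
enable_lowercase (the most commonly used option in production)
- Whether to lowercase English text automatically
- true: Hello → hello
- false: keep the original case
- Production recommendation: true
use_stop_word
- Whether to enable stop-word filtering
- true: filters words such as 的, 了, 是, 吗
- Production recommendation: true
enable_remote_dict
- Whether to enable hot reloading from remote dictionaries
- true: enabled
remote_dict_interval
- Refresh interval for remote dictionaries, in seconds
- Default: 60 seconds
convert_chinese_num
- Whether to convert Chinese numerals to Arabic digits
- 一百 → 100
- Not needed in most scenarios

Option support can differ between plugin versions, so check the README of the IK release you install before relying on any of these options.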

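Extension: Using the pinyin Analyzer Together with IK

In production the pinyin analyzer is rarely used on its own; it is almost always combined with IK.
Here is an example: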

PUT /pinyin_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_pinyin_filter": {
          "type": "pinyin",
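          "keep_full_pinyin": true,          // keep per-character full pinyin: 中国 → zhong, guo
          "keep_first_letter": true,        // keep the first letters: zg
          "keep_original": true,            // keep the original term: 中国
          "keep_separate_first_letter": false, // emit first letters separately: z, g
          "limit_first_letter_length": 16,   // maximum length of the first-letter term
          "lowercase": true,                 // lowercase everything
          "ignore_pinyin_modifier": true,    // ignore pinyin tones (check that your plugin version supports this option)
          "remove_duplicated_term": true,    // remove duplicate terms
          "keep_joined_full_pinyin": false,   // joined full pinyin: 中国 → zhongguo
          "keep_none_chinese": true,         // keep non-Chinese characters
          "none_chinese_pinyin_tokenize": true // also split non-Chinese text into pinyin terms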
        }
      },
      "analyzer": {
        "my_pinyin_analyzer": {
          "type": "custom",
          "tokenizer": "ik_max_word",        // segment with IK first
          "filter": [
            "my_pinyin_filter"               // then convert to pinyin
          ]
        },
        "my_pinyin_smart_analyzer": {
          "type": "custom",
          "tokenizer": "ik_smart",
          "filter": [
            "my_pinyin_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "my_pinyin_analyzer",
        "search_analyzer": "my_pinyin_smart_analyzer"
      }
    }
  }
}