Python Elasticsearch Scroll API: Retrieving Large Result Sets Beyond the size Limit

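In Python, the way to retrieve Elasticsearch results without being capped by the size limit (by default, from + size cannot exceed 10,000) is the Scroll API. It keeps a cursor, the scroll context, open on the server while a large result set is read, and each subsequent request passes that cursor back to fetch the next batch.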

The following Python example uses the Scroll API:

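from elasticsearch import Elasticsearch

# Create the Elasticsearch client.
# Assumption: a local node at this address; recent client versions require an explicit host.
es = Elasticsearch("http://localhost:9200")

# Define the search query
query = {
    "query": {
        "match_all": {}
    }
}

# Start the scroll search
scroll_size = 1000  # number of documents returned per batch
results = es.search(index="my_index", body=query, scroll="2m", size=scroll_size)

# Loop over the scroll results until a batch comes back empty
while len(results["hits"]["hits"]) > 0:
    # Process the current batch
    for hit in results["hits"]["hits"]:
        # Process one document
        print(hit["_source"])

    # Use the scroll_id to fetch the next batch
    scroll_id = results["_scroll_id"]
    results = es.scroll(scroll_id=scroll_id, scroll="2m")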

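In the code above, we first define a query and then call search() to start the scroll. scroll_size, passed as the size parameter, sets how many documents each batch returns, and the scroll parameter sets how long the cursor stays alive; every scroll() call renews that window. Inside the loop we process the current batch and then pass the latest scroll_id to scroll() to fetch the next one. When no more data is available, scroll() returns an empty hits list and the loop exits.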
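The loop described above can also be wrapped in a generator that releases the scroll context when it is done. The following is only a minimal sketch: the function name iter_scroll_hits and its parameters page_size and keep_alive are illustrative choices, not part of the client library, and it assumes the client exposes search(), scroll(), and clear_scroll() with the keyword arguments used in the example above.

```python
from typing import Any, Dict, Iterator


def iter_scroll_hits(
    es: Any,
    index: str,
    query: Dict[str, Any],
    page_size: int = 1000,
    keep_alive: str = "2m",
) -> Iterator[Dict[str, Any]]:
    """Yield every hit for `query`, fetching one scroll page at a time.

    `es` is assumed to be an Elasticsearch client (or any object with
    compatible search/scroll/clear_scroll methods).
    """
    results = es.search(index=index, body=query, scroll=keep_alive, size=page_size)
    scroll_id = results["_scroll_id"]
    try:
        # An empty hits list means the result set is exhausted.
        while results["hits"]["hits"]:
            yield from results["hits"]["hits"]
            results = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            # The scroll ID may change between pages, so keep the latest one.
            scroll_id = results["_scroll_id"]
    finally:
        # Release the server-side search context, even if the caller stops early.
        es.clear_scroll(scroll_id=scroll_id)
```

With the client from the example above, this could be used as `for hit in iter_scroll_hits(es, "my_index", query): print(hit["_source"])`. Clearing the scroll explicitly frees server resources instead of waiting for the keep-alive to expire. The official client also ships elasticsearch.helpers.scan, which wraps the same pattern.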
