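Efficient way to retrieve all _ids in ElasticSearch

What is the fastest way to get all _ids of a certain index from ElasticSearch? Is it possible with a simple query? One of my indices has around 20,000 documents.

Edit: please read @Aleck Landgraf's answer

Do you just need the elasticsearch-internal _id field? Or an id field from within your documents?

For the former, try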

 curl http://localhost:9200/index/type/_search?pretty=true -d '
 {
   "query" : { "match_all" : {} },
   "fields": []
 }
 '

The result will contain only the "metadata" of your documents

 {
   "took" : 7,
   "timed_out" : false,
   "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
   "hits" : {
     "total" : 4,
     "max_score" : 1.0,
     "hits" : [
       { "_index" : "index", "_type" : "type", "_id" : "36", "_score" : 1.0 },
       { "_index" : "index", "_type" : "type", "_id" : "38", "_score" : 1.0 },
       { "_index" : "index", "_type" : "type", "_id" : "39", "_score" : 1.0 },
       { "_index" : "index", "_type" : "type", "_id" : "34", "_score" : 1.0 }
     ]
   }
 }
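To pull just the ids out of a response shaped like the sample output, something along these lines should work. This is a minimal sketch: the `response` dict is a trimmed, hand-copied version of the sample rather than the result of a live request, and `extract_ids` is a hypothetical helper name.

```python
def extract_ids(response):
    """Return the _id of every hit in a search response dict."""
    return [hit["_id"] for hit in response["hits"]["hits"]]


# Trimmed copy of the sample response (assumption: a real response has the same shape).
response = {
    "hits": {
        "total": 4,
        "max_score": 1.0,
        "hits": [
            {"_index": "index", "_type": "type", "_id": "36", "_score": 1.0},
            {"_index": "index", "_type": "type", "_id": "38", "_score": 1.0},
            {"_index": "index", "_type": "type", "_id": "39", "_score": 1.0},
            {"_index": "index", "_type": "type", "_id": "34", "_score": 1.0},
        ],
    }
}

ids = extract_ids(response)
print(ids)
```

Keep in mind that a plain search returns only one page of hits, so this covers the whole index only if the request's size is at least the number of documents.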

For the latter, if you want to include a field from your document, simply add it to the fields array

 curl http://localhost:9200/index/type/_search?pretty=true -d '
 {
   "query" : { "match_all" : {} },
   "fields": ["document_field_to_be_returned"]
 }
 '

It is better to use scroll and scan to get the result list, so that elasticsearch does not have to rank and sort the results.

With the elasticsearch-dsl python lib this can be accomplished by:

 from elasticsearch import Elasticsearch
 from elasticsearch_dsl import Search

 # ES_INDEX and DOC_TYPE are placeholders for your own index and type names
 es = Elasticsearch()
 s = Search(using=es, index=ES_INDEX, doc_type=DOC_TYPE)
 s = s.fields([])  # only get ids, otherwise `fields` takes a list of field names
 ids = [h.meta.id for h in s.scan()]

Console log:

 GET http://localhost:9200/my_index/my_doc/_search?search_type=scan&scroll=5m [status:200 request:0.003s]
 GET http://localhost:9200/_search/scroll?scroll=5m [status:200 request:0.005s]
 GET http://localhost:9200/_search/scroll?scroll=5m [status:200 request:0.005s]
 GET http://localhost:9200/_search/scroll?scroll=5m [status:200 request:0.003s]
 GET http://localhost:9200/_search/scroll?scroll=5m [status:200 request:0.005s]
 ...

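Note: scroll pulls batches of results from a query and keeps the cursor open for a given amount of time (1 minute, 2 minutes, which you can update); scan disables sorting. The scan helper function returns a python generator which can be safely iterated over.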
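To make the batching concrete, here is a rough sketch of what a scroll loop does under the hood. It assumes a plain scroll request (no `search_type=scan`), where the first response already carries hits, and it uses the older elasticsearch-py call style with `body=`; newer client versions may expect different keyword arguments. `iter_ids` and `FakeClient` are hypothetical names, and `FakeClient` is only an in-memory stand-in so the loop can run without a cluster.

```python
def iter_ids(es, index, batch_size=1000, keep_alive="1m"):
    """Yield every _id in `index`, one scroll batch at a time."""
    # The first request opens the scroll context and returns the first batch.
    page = es.search(
        index=index,
        scroll=keep_alive,
        size=batch_size,
        body={"query": {"match_all": {}}, "_source": False},
    )
    while page["hits"]["hits"]:
        for hit in page["hits"]["hits"]:
            yield hit["_id"]
        # Each follow-up call renews the keep-alive and fetches the next batch.
        page = es.scroll(scroll_id=page["_scroll_id"], scroll=keep_alive)


class FakeClient:
    """Hypothetical in-memory stand-in for a client, for demonstration only."""

    def __init__(self, ids):
        self._ids = list(ids)
        self._pos = 0
        self._size = 0

    def _page(self):
        batch = self._ids[self._pos:self._pos + self._size]
        self._pos += len(batch)
        return {"_scroll_id": "demo", "hits": {"hits": [{"_id": i} for i in batch]}}

    def search(self, index, scroll, size, body):
        self._pos, self._size = 0, size
        return self._page()

    def scroll(self, scroll_id, scroll):
        return self._page()


ids = list(iter_ids(FakeClient(["36", "38", "39", "34"]), "index", batch_size=3))
print(ids)
```

With a real client you would pass an `Elasticsearch()` instance instead of `FakeClient`. In practice the `scan` helper is the more convenient choice, since it wraps this kind of loop for you.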

Another option

 curl 'http://localhost:9200/index/type/_search?pretty=true&fields='

This will return _index, _type, _id and _score.

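For elasticsearch 5.x, you can use the "_source" field.

 GET /_search
 {
   "_source": false,
   "query" : { "term" : { "user" : "kimchy" } }
 }

"fields" has been deprecated. (Error: "The field [fields] is no longer supported, please use [stored_fields] to retrieve stored fields or _source filtering if the field is not stored")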

Inspired by @Aleck-Landgraf's answer, for me it worked by using the scan function directly in the standard elasticsearch python API:

 from elasticsearch import Elasticsearch
 from elasticsearch.helpers import scan

 es = Elasticsearch()
 for dobj in scan(es,
                  query={"query": {"match_all": {}}, "fields": []},
                  index="your-index-name",
                  doc_type="your-doc-type"):
     # print the ids on one line, separated by spaces
     print(dobj["_id"], end=" ")

You can also do it in python, which gives you a proper list:

 import elasticsearch

 es = elasticsearch.Elasticsearch()
 res = es.search(
     index=your_index,  # placeholder for your index name
     body={"query": {"match_all": {}}, "size": 30000, "fields": ["_id"]})
 ids = [d['_id'] for d in res['hits']['hits']]

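Elaborating on the 2 answers by @Robert-Lujo and @Aleck-Landgraf (someone with the permissions is welcome to move this to a comment): if you do not want to print but to get everything from the returned generator into a list, here is what I use: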

 from elasticsearch import Elasticsearch, helpers

 # YOUR_ES_HOST and INDEX_NAME are placeholders
 es = Elasticsearch(hosts=[YOUR_ES_HOST])
 a = helpers.scan(es, query={"query": {"match_all": {}}}, scroll='1m', index=INDEX_NAME)  # like the others so far
 IDs = [aa['_id'] for aa in a]

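 Url -> http://localhost:9200/<index>/<type>/_search
 http method -> GET
 Query -> {"query": {"match_all": {}}, "size": 30000, "fields": ["_id"]}

Send this to the _search endpoint with GET. On Elasticsearch 1.x, a DELETE request against _query is the delete-by-query API and would remove the matching documents instead of listing them.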
