Search DSL¶
The Search object¶
The Search object represents the entire search request:
- queries
- filters
- aggregations
- sort
- pagination
- additional parameters
- associated client
The API is designed to be chainable. With the exception of the aggregations functionality, this means that the Search object is immutable: all changes to the object will result in a copy being created which contains the changes. This means you can safely pass the Search object to foreign code without fear of it modifying your objects.
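The copy-on-change behaviour described above can be illustrated with a small stand-alone sketch. This is not the library's implementation: the class MiniSearch and its methods are hypothetical and only model the idea that every chained call returns a modified copy.

```python
import copy


class MiniSearch:
    """Hypothetical stand-in for a chainable, copy-on-change request object."""

    def __init__(self):
        self._params = {}

    def _clone(self):
        # Deep-copy so the clone shares no mutable state with the original.
        clone = MiniSearch()
        clone._params = copy.deepcopy(self._params)
        return clone

    def extra(self, **kwargs):
        # Apply the change to a copy and return the copy; self is untouched.
        clone = self._clone()
        clone._params.update(kwargs)
        return clone

    def to_dict(self):
        return copy.deepcopy(self._params)


base = MiniSearch()
derived = base.extra(explain=True).extra(size=5)

print(base.to_dict())     # {}
print(derived.to_dict())  # {'explain': True, 'size': 5}
```

Because each call works on a clone, code that receives base cannot change what it will send.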
You can pass an instance of the low-level elasticsearch client when instantiating the Search object:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
client = Elasticsearch()
s = Search(using=client)
You can also define the client at a later time (for more options see the connections chapter):
s = s.using(client)
Note
All methods return a copy of the object, making it safe to pass to outside code.
The API is chainable, allowing you to combine multiple method calls in one statement:
s = Search().using(client).query("match", title="python")
Note
In some cases this approach is not possible due to Python's restriction on identifiers, for example if your field is called @timestamp. In that case you have to fall back to unpacking a dictionary: s.query('range', **{'@timestamp': {'lt': 'now'}})
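The dictionary-unpacking fallback is plain Python and can be tried without the library. In the sketch below, build_query is a hypothetical helper (not part of elasticsearch_dsl) that just wraps its keyword arguments in a query body.

```python
def build_query(name, **params):
    """Hypothetical helper: wrap keyword arguments in a query body."""
    return {name: params}


# A regular field name works as a keyword argument.
print(build_query("match", title="python"))

# '@timestamp' is not a valid identifier, so it has to be supplied
# through dictionary unpacking instead.
print(build_query("range", **{"@timestamp": {"lt": "now"}}))
```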
To send the request to Elasticsearch:
response = s.execute()
If you just want to iterate over the hits returned by your search you can iterate over the Search object:
for hit in s:
    print(hit.title)
Search results will be cached. Subsequent calls to execute, or iterating over an already executed Search object, will not send additional requests to Elasticsearch. To force a new request, pass ignore_cache=True when calling execute.
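A rough model of this caching behaviour, assuming nothing about the library's internals: CachedRequest and fake_send are hypothetical names, and fake_send stands in for the network call so the number of requests can be counted.

```python
class CachedRequest:
    """Hypothetical model of a request object that caches its response."""

    def __init__(self, send):
        self._send = send      # callable that performs the real request
        self._response = None

    def execute(self, ignore_cache=False):
        # Only call the backend when nothing is cached or a refresh is forced.
        if ignore_cache or self._response is None:
            self._response = self._send()
        return self._response

    def __iter__(self):
        # Iterating reuses the cached response when there is one.
        return iter(self.execute())


calls = []


def fake_send():
    calls.append(1)
    return ["hit-1", "hit-2"]


req = CachedRequest(fake_send)
req.execute()
list(req)                       # served from the cache
req.execute(ignore_cache=True)  # forces a second request
print(len(calls))  # 2
```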
For debugging purposes you can serialize the Search object to a dict explicitly:
print(s.to_dict())
Queries¶
The library provides classes for all Elasticsearch query types. Pass all the parameters as keyword arguments:
from elasticsearch_dsl.query import MultiMatch
# {"multi_match": {"query": "python django", "fields": ["title", "body"]}}
MultiMatch(query='python django', fields=['title', 'body'])
You can use the Q shortcut to construct the instance using a name with parameters or the raw dict:
from elasticsearch_dsl import Q
Q("multi_match", query='python django', fields=['title', 'body'])
Q({"multi_match": {"query": "python django", "fields": ["title", "body"]}})
To add the query to the Search object, use the .query() method:
q = Q("multi_match", query='python django', fields=['title', 'body'])
s = s.query(q)
The method also accepts the same parameters as the Q shortcut:
s = s.query("multi_match", query='python django', fields=['title', 'body'])
If you already have a query object, or a dict representing one, you can just override the query used in the Search object:
s.query = Q('bool', must=[Q('match', title='python'), Q('match', body='best')])
Query combination¶
Query objects can be combined using logical operators:
Q("match", title='python') | Q("match", title='django')
# {"bool": {"should": [...]}}
Q("match", title='python') & Q("match", title='django')
# {"bool": {"must": [...]}}
~Q("match", title="python")
# {"bool": {"must_not": [...]}}
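To make the mapping from operators to bool clauses concrete, here is a minimal sketch using Python's operator overloading. MiniQuery is a hypothetical class written for illustration; it only reproduces the shapes shown in the comments above and is not how the library builds its queries.

```python
class MiniQuery:
    """Hypothetical query wrapper; only models the operator behaviour."""

    def __init__(self, body):
        self.body = body

    def __or__(self, other):
        # a | b -> either clause may match
        return MiniQuery({"bool": {"should": [self.body, other.body]}})

    def __and__(self, other):
        # a & b -> both clauses must match
        return MiniQuery({"bool": {"must": [self.body, other.body]}})

    def __invert__(self):
        # ~a -> the clause must not match
        return MiniQuery({"bool": {"must_not": [self.body]}})


python = MiniQuery({"match": {"title": "python"}})
django = MiniQuery({"match": {"title": "django"}})

print((python | django).body)
print((python & django).body)
print((~python).body)
```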
You can also use the + operator:
Q("match", title='python') + Q("match", title='django')
# {"bool": {"must": [...]}}
When using the + operator with Bool queries, it will merge them into a single Bool query:
Q("bool") + Q("bool")
# {"bool": {"..."}}
When you call the .query() method multiple times, the + operator will be used internally:
s = s.query().query()
print(s.to_dict())
# {"query": {"bool": {...}}}
If you want to have precise control over the query form, use the Q shortcut to directly construct the combined query:
q = Q('bool',
    must=[Q('match', title='python')],
    should=[Q(...), Q(...)],
    minimum_should_match=1
)
s = Search().query(q)
Filters¶
Filters behave similarly to queries: just use the F shortcut and the .filter() method. When you use the .filter() method, the query will be automatically wrapped in a filtered query.
If you want to use the post_filter element for faceted navigation, use the .post_filter() method.
Aggregations¶
To define an aggregation, you can use the A shortcut:
A('terms', field='tags')
# {"terms": {"field": "tags"}}
To nest aggregations, you can use the .bucket() and .metric() methods:
a = A('terms', field='category')
# {'terms': {'field': 'category'}}
a.metric('clicks_per_category', 'sum', field='clicks')\
    .bucket('tags_per_category', 'terms', field='tags')
# {
#   'terms': {'field': 'category'},
#   'aggs': {
#     'clicks_per_category': {'sum': {'field': 'clicks'}},
#     'tags_per_category': {'terms': {'field': 'tags'}}
#   }
# }
To add aggregations to the Search object, use the .aggs property, which acts as a top-level aggregation:
s = Search()
a = A('terms', field='category')
s.aggs.bucket('category_terms', a)
# {
#   'aggs': {
#     'category_terms': {
#       'terms': {
#         'field': 'category'
#       }
#     }
#   }
# }
or
s = Search()
s.aggs.bucket('per_category', 'terms', field='category')\
    .metric('clicks_per_category', 'sum', field='clicks')\
    .bucket('tags_per_category', 'terms', field='tags')
s.to_dict()
# {
#   'aggs': {
#     'per_category': {
#       'terms': {'field': 'category'},
#       'aggs': {
#         'clicks_per_category': {'sum': {'field': 'clicks'}},
#         'tags_per_category': {'terms': {'field': 'tags'}}
#       }
#     }
#   }
# }
You can access an existing bucket by its name:
s = Search()
s.aggs.bucket('per_category', 'terms', field='category')
s.aggs['per_category'].metric('clicks_per_category', 'sum', field='clicks')
s.aggs['per_category'].bucket('tags_per_category', 'terms', field='tags')
Note
When chaining multiple aggregations, there is a difference between what the .bucket() and .metric() methods return: .bucket() returns the newly defined bucket, while .metric() returns its parent bucket to allow further chaining.
As opposed to other methods on the Search objects, defining aggregations is done in-place (does not return a copy).
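The difference in return values is easiest to see in a toy model. The MiniAgg class below is hypothetical and heavily simplified; it assumes only what the note states: .bucket() returns the new child, .metric() returns the node it was called on, and both modify the tree in place.

```python
class MiniAgg:
    """Hypothetical aggregation node that mutates itself in place."""

    def __init__(self, agg_type=None, **params):
        self.agg_type = agg_type
        self.params = params
        self.children = {}

    def bucket(self, name, agg_type, **params):
        # Register the child and return it, so chaining descends one level.
        child = MiniAgg(agg_type, **params)
        self.children[name] = child
        return child

    def metric(self, name, agg_type, **params):
        # Register the child but return self, so chaining stays on this level.
        self.children[name] = MiniAgg(agg_type, **params)
        return self

    def to_dict(self):
        out = {self.agg_type: dict(self.params)} if self.agg_type else {}
        if self.children:
            out["aggs"] = {k: v.to_dict() for k, v in self.children.items()}
        return out


root = MiniAgg()  # plays the role of the top-level .aggs container
root.bucket("per_category", "terms", field="category") \
    .metric("clicks_per_category", "sum", field="clicks") \
    .bucket("tags_per_category", "terms", field="tags")

print(root.to_dict())
```

Because .metric() hands back per_category rather than the sum metric, the final .bucket() call nests tags_per_category next to clicks_per_category instead of under it.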
Sorting¶
To specify sorting order, use the .sort() method:
s = Search().sort(
    'category',
    '-title',
    {"lines" : {"order" : "asc", "mode" : "avg"}}
)
It accepts positional arguments which can be either strings or dictionaries. A string value is a field name, optionally prefixed by the - sign to specify a descending order.
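As an illustration of the - prefix convention, the following sketch expands such strings into explicit sort clauses. The function normalize_sort is a hypothetical helper written for this example; it is an assumption about the resulting request shape, not a copy of the library's code.

```python
def normalize_sort(*keys):
    """Hypothetical helper: expand '-field' strings into explicit sort clauses."""
    clauses = []
    for key in keys:
        if isinstance(key, str) and key.startswith("-"):
            # '-title' becomes a descending sort on 'title'.
            clauses.append({key[1:]: {"order": "desc"}})
        else:
            # Plain field names and dictionaries are passed through unchanged.
            clauses.append(key)
    return clauses


print(normalize_sort(
    "category",
    "-title",
    {"lines": {"order": "asc", "mode": "avg"}},
))
```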
To reset the sorting, just call the method with no arguments:
s = s.sort()
Pagination¶
To specify the from/size parameters, use the Python slicing API:
s = s[10:20]
# {"from": 10, "size": 10}
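The translation from a slice to from/size can be sketched with a class that implements __getitem__. Pager is a hypothetical name, and the handling of open-ended or invalid slices here is an assumption made for the example.

```python
class Pager:
    """Hypothetical object that records from/size when sliced."""

    def __getitem__(self, key):
        if not isinstance(key, slice):
            raise TypeError("this sketch only handles slices")
        if key.step is not None:
            raise ValueError("a step is not supported in this sketch")
        start = key.start or 0
        stop = key.stop
        if start < 0 or (stop is not None and stop < 0):
            raise ValueError("negative indices are not supported in this sketch")
        params = {"from": start}
        if stop is not None:
            # size is the width of the requested window
            params["size"] = max(stop - start, 0)
        return params


page = Pager()[10:20]
print(page)  # {'from': 10, 'size': 10}
```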
If you want to access all the documents matched by your query you can use the scan method, which uses the scan/scroll Elasticsearch API:
for hit in s.scan():
    print(hit.title)
Note that in this case the results won’t be sorted.
Highlighting¶
To set common attributes for highlighting use the highlight_options method:
s = s.highlight_options(order='score')
Enabling highlighting for individual fields is done using the highlight method:
s = s.highlight('title')
# or, including parameters:
s = s.highlight('title', fragment_size=50)
The fragments in the response will then be available on each Result object as .meta.highlight.FIELD, which will contain the list of fragments:
response = s.execute()
for hit in response:
    for fragment in hit.meta.highlight.title:
        print(fragment)
Suggestions¶
To specify a suggest request on your Search object use the suggest method:
s = s.suggest('my_suggestion', 'pyhton', term={'field': 'title'})
The first argument is the name of the suggestion (the name under which it will be returned), the second is the text you want the suggester to work on, and the keyword arguments are added to the suggest's JSON as-is.
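To show how the three kinds of argument fit together, here is a sketch of the request fragment such a call is expected to produce. build_suggest is a hypothetical helper, and the resulting layout follows the general shape of an Elasticsearch suggest request rather than output captured from the library.

```python
def build_suggest(name, text, **kwargs):
    """Hypothetical helper: assemble the suggest part of a request body."""
    body = {"text": text}
    # Keyword arguments are copied into the suggestion unchanged.
    body.update(kwargs)
    return {"suggest": {name: body}}


print(build_suggest("my_suggestion", "pyhton", term={"field": "title"}))
```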
If you only wish to run the suggestion part of the search (via the _suggest endpoint) you can do so via execute_suggest:
s = s.suggest('my_suggestion', 'pyhton', term={'field': 'title'})
suggestions = s.execute_suggest()
print(suggestions.my_suggestion)
Extra properties and parameters¶
To set extra properties of the search request, use the .extra() method:
s = s.extra(explain=True)
To set query parameters, use the .params() method:
s = s.params(search_type="count")
If you need to limit the fields being returned by Elasticsearch, use the fields() method:
# only return the selected fields
s = s.fields(['title', 'body'])
# reset the field selection
s = s.fields()
# don't return any fields, just the metadata
s = s.fields([])
Serialization and Deserialization¶
The search object can be serialized into a dictionary by using the .to_dict() method.
You can also create a Search object from a dict:
s = Search.from_dict({"query": {"match": {"title": "python"}}})
Response¶
You can execute your search by calling the .execute() method, which returns a Response object. The Response object gives you access to any key from the response dictionary via attribute access. It also provides some convenient helpers:
response = s.execute()
print(response.success())
# True
print(response.took)
# 12
print(response.hits.total)
print(response.suggest.my_suggestion)
If you want to inspect the contents of the response object, just use its to_dict method to get access to the raw data for pretty printing.
Hits¶
To access the hits returned by the search, access the hits property or just iterate over the Response object:
response = s.execute()
print('Total %d hits found.' % response.hits.total)
for h in response:
    print(h.title, h.body)
Result¶
Each individual hit is wrapped in a convenience class that allows attribute access to the keys in the returned dictionary. All the metadata for the results is accessible via meta (without the leading _):
response = s.execute()
h = response.hits[0]
print('/%s/%s/%s returned with score %f' % (
h.meta.index, h.meta.doc_type, h.meta.id, h.meta.score))
Note
If your document has a field called meta you have to access it using the get item syntax: hit['meta'].
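The split between document fields and metadata, including the meta name clash, can be modelled as follows. MiniHit is a hypothetical class for illustration only; it assumes a raw hit with the usual _index, _id, _score and _source keys.

```python
from types import SimpleNamespace


class MiniHit:
    """Hypothetical wrapper around one raw hit dictionary."""

    def __init__(self, raw):
        # Document fields live under _source in a raw hit.
        self._fields = dict(raw.get("_source", {}))
        # Metadata keys lose their leading underscore: _id -> id.
        self.meta = SimpleNamespace(
            **{k[1:]: v for k, v in raw.items()
               if k.startswith("_") and k != "_source"}
        )

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for document fields.
        fields = self.__dict__.get("_fields", {})
        if name in fields:
            return fields[name]
        raise AttributeError(name)

    def __getitem__(self, name):
        # Item access always reads document fields, even one named 'meta'.
        return self._fields[name]


hit = MiniHit({
    "_index": "blogs",
    "_id": "42",
    "_score": 1.5,
    "_source": {"title": "Hello", "meta": "a field named meta"},
})

print(hit.title)          # Hello
print(hit.meta.id)        # 42
print(hit["meta"])        # a field named meta
```

Attribute access to meta resolves to the metadata object first, which is why a document field with the same name is only reachable through item access.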
Aggregations¶
Aggregations are available through the aggregations property:
for tag in response.aggregations.per_tag.buckets:
    print(tag.key, tag.max_lines.value)
MultiSearch¶
If you need to execute multiple searches at the same time you can use the MultiSearch class, which will use the _msearch API:
from elasticsearch_dsl import MultiSearch, Search
ms = MultiSearch(index='blogs')
ms = ms.add(Search().filter('term', tags='python'))
ms = ms.add(Search().filter('term', tags='elasticsearch'))
responses = ms.execute()
for response in responses:
    print("Results for query %r." % response.search.filter)
    for hit in response:
        print(hit.title)
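For context, the _msearch API expects a newline-delimited payload in which each search contributes a header line followed by a body line. The sketch below builds such a payload by hand; build_msearch_payload is a hypothetical helper and the two query bodies are placeholders chosen for illustration, not the exact bodies the library would send.

```python
import json


def build_msearch_payload(bodies, index=None):
    """Hypothetical helper: serialize searches as header/body line pairs."""
    lines = []
    for body in bodies:
        # The header line carries per-search settings such as the index.
        header = {"index": index} if index is not None else {}
        lines.append(json.dumps(header))
        lines.append(json.dumps(body))
    # The payload has to end with a newline.
    return "\n".join(lines) + "\n"


payload = build_msearch_payload(
    [
        {"query": {"term": {"tags": "python"}}},
        {"query": {"term": {"tags": "elasticsearch"}}},
    ],
    index="blogs",
)
print(payload)
```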