Elasticsearch: The search engine database

What is Elasticsearch
Elasticsearch is a technology that many software engineers know, but each of them tends to define it differently. That is because Elasticsearch serves many purposes and can be used in different ways. So, what is Elasticsearch?
Elasticsearch is a distributed analytics and search engine built on Apache Lucene, a search framework. Elasticsearch is an open source NoSQL database with a focus on search functionalities. It allows you to store, search and analyze huge volumes of data quickly.
Elasticsearch is easy to install and has a gentle learning curve. The last part of this article will guide you through the installation.
How does Elasticsearch work
An Elasticsearch deployment consists of one or more clusters. A cluster manages multiple nodes, which store and operate on the documents (the data). The following sections describe the different components of the Elasticsearch stack.
Cluster
An Elasticsearch cluster is a group of one or more nodes interacting with each other. The cluster distributes indexing, search and storage tasks among its nodes.
Node
An Elasticsearch node is a single server used to store data and to provide search and indexing capabilities. A node can be configured with one of three main roles:
- Master node: Manages the cluster, for example by creating or deleting indices and by tracking which nodes are part of the cluster.
- Data node: Stores documents (data) and executes data-related operations such as searches and aggregations.
- Client node (called a coordinating node in recent versions): Forwards each request to the Master node or to the Data nodes, depending on the type of the request (cluster-related or data-related).
Index
An Elasticsearch index is a collection of documents having similar characteristics. It is similar to a table in a relational database. All the documents in an index are logically related.
Document
An Elasticsearch document is the basic unit of information that can be indexed. It is similar to a row in a relational database. A document is expressed in JSON format, and its fields can hold different types of values such as numbers, strings and dates.
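To make the row analogy concrete, here is a minimal sketch of what a document could look like. The field names and the idea of a "products" index are hypothetical and only serve as an illustration; the code builds the JSON text without sending it anywhere.

```python
import json

# Hypothetical document for a hypothetical "products" index.
# Each key is a field, comparable to a column in a relational table.
document = {
    "name": "Mechanical keyboard",   # string
    "price": 89.90,                  # number
    "in_stock": True,                # boolean
    "released_on": "2021-03-15",     # date, written as an ISO 8601 string
    "tags": ["hardware", "usb"],     # array of strings
}

# Serialize the document to the JSON text that would be sent to Elasticsearch.
body = json.dumps(document, indent=2)
print(body)
```

Unlike rows in a relational table, documents can contain nested objects and arrays, as the tags field shows.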
Shards
An Elasticsearch shard is one piece of an index that has been split into several parts. Each shard is an independent, fully functional index. Sharding lets you spread the data across different nodes so that operations can run in parallel.
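The idea of spreading documents over shards can be sketched as a hash of a routing value (by default the document ID) taken modulo the number of primary shards. The snippet below is only an illustration of that principle: zlib.crc32 is a stand-in for the hash function Elasticsearch uses internally, and the function names and document IDs are hypothetical.

```python
import zlib
from collections import defaultdict


def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Map a document ID to a shard number in [0, num_primary_shards)."""
    # zlib.crc32 is a stand-in hash: it is deterministic, which is all
    # this sketch needs. It is not the hash Elasticsearch itself uses.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def distribute(doc_ids, num_primary_shards):
    """Group document IDs by the shard they are routed to."""
    shards = defaultdict(list)
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, num_primary_shards)].append(doc_id)
    return dict(shards)


doc_ids = [f"doc-{i}" for i in range(12)]
layout = distribute(doc_ids, num_primary_shards=3)
for shard, ids in sorted(layout.items()):
    print(f"shard {shard}: {ids}")
```

Because the shard is derived from the number of primary shards, the same document ID always lands on the same shard as long as that number does not change.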
Replicas
An Elasticsearch replica is a copy of an original shard. It provides high availability and disaster recovery functionalities to the cluster.
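A replica is only useful if it lives on a different node than its primary shard, so Elasticsearch never places both copies on the same node. The sketch below illustrates that single rule with a deliberately simplified allocator. The allocate function and the node names are hypothetical, and the real allocator takes many more factors into account.

```python
def allocate(nodes, num_primaries, num_replicas):
    """Place each shard copy on a node that holds no other copy of that shard."""
    assigned = {node: [] for node in nodes}
    unassigned = []
    for shard in range(num_primaries):
        holders = set()  # nodes that already hold a copy of this shard
        for kind in ["primary"] + ["replica"] * num_replicas:
            # Only nodes without a copy of this shard are eligible.
            candidates = [n for n in nodes if n not in holders]
            if not candidates:
                unassigned.append((shard, kind))
                continue
            # Prefer the least loaded eligible node.
            target = min(candidates, key=lambda n: len(assigned[n]))
            assigned[target].append((shard, kind))
            holders.add(target)
    return assigned, unassigned


# One node: the replica has nowhere to go and stays unassigned.
single, pending = allocate(["node-1"], num_primaries=1, num_replicas=1)
print(single, pending)

# Two nodes: primary and replica end up on different nodes.
two, pending_two = allocate(["node-1", "node-2"], num_primaries=1, num_replicas=1)
print(two, pending_two)
```

This is why a single-node cluster with replicas configured reports unassigned shards: the replicas have no second node to be placed on.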
Elasticsearch use cases
Elasticsearch is versatile and can be fine-tuned for multiple use cases.
Website, Applications and Full-text Search
Search is the core capability of Elasticsearch, so it is naturally one of the main use cases. Elasticsearch can power applications that focus on accessing, retrieving and reporting on data. Another common use case is website search.
Logging and Log Analysis
Elasticsearch is also widely used to ingest and analyze log data in near real time. Combined with Kibana, it forms a pipeline for log analysis and visualization through dashboards.
Metrics and monitoring
Elasticsearch also handles time-series data well, such as metrics and application events. It offers various features and interfaces to retrieve your data and events, whatever technology produced them.
Data visualization
Elasticsearch can be used in combination with Kibana, which provides dashboards, analysis and data visualization. Kibana is a very powerful and easy-to-use visualization tool. It is developed by the same company that makes Elasticsearch.
Getting started with Elasticsearch
Elasticsearch can be installed easily on Linux, macOS or Windows. You can install it from an archive or with a package format or tool such as deb, rpm, msi, docker or brew.
Go to the official installation website https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html and select your preferred way to install it.
Once it is installed, you can add the directory containing the elasticsearch executable to your $PATH. Then you can start an Elasticsearch cluster by running the elasticsearch binary.
It will start a cluster and you will be able to access different information on it. By default, the cluster can be accessed on localhost, port 9200.
If you want to use cURL, you can retrieve information about the cluster:
curl -X GET "localhost:9200/?pretty"
If you want to get the health of the cluster you can execute this cURL command:
curl -X GET "localhost:9200/_cluster/health?pretty"
You should receive a response like this:
{
  "cluster_name" : "testcluster",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 1,
  "active_shards" : 1,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 1,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 50.0
}
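To read such a response programmatically, you could parse it and turn the status into a short message. The sketch below works on a copy of the sample response above instead of calling a live cluster, and the summarize helper is hypothetical. The status colors follow the usual convention: green when every shard copy is assigned, yellow when only some replicas are missing, red when at least one primary shard is missing.

```python
import json

# Sample response copied from the article; in practice this text would
# come from the _cluster/health HTTP call.
raw = """
{ "cluster_name" : "testcluster", "status" : "yellow", "timed_out" : false,
  "number_of_nodes" : 1, "number_of_data_nodes" : 1,
  "active_primary_shards" : 1, "active_shards" : 1,
  "relocating_shards" : 0, "initializing_shards" : 0,
  "unassigned_shards" : 1, "delayed_unassigned_shards": 0,
  "number_of_pending_tasks" : 0, "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 50.0 }
"""

STATUS_MEANING = {
    "green": "all primary and replica shards are assigned",
    "yellow": "all primary shards are assigned, but some replicas are not",
    "red": "at least one primary shard is not assigned",
}


def summarize(health: dict) -> str:
    """Build a one-line, human-readable summary of a cluster health response."""
    status = health["status"]
    meaning = STATUS_MEANING.get(status, "unknown status")
    return (
        f"{health['cluster_name']}: {status} ({meaning}); "
        f"{health['active_shards']} active, "
        f"{health['unassigned_shards']} unassigned"
    )


health = json.loads(raw)
summary = summarize(health)
print(summary)
```

In this sample the status is yellow because the cluster has a single node: the primary shard is active, but its replica cannot be placed on the same node and therefore stays unassigned.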