Architecture of Elasticsearch

Learn about Elasticsearch’s core concepts and how it differs from other databases.

We'll cover the following...

Elasticsearch architecture
Elasticsearch vs. RDBMS

Dish index

{
  "_id": "1",
  "name": "Pizza",
  "price": 12.99,
  "category": "Entree",
  "ingredients": [
    "dough",
    "sauce",
    "cheese"
  ]
}

{
  "_id": "2",
  "name": "Spaghetti",
  "price": 10.99,
  "category": "Entree",
  "ingredients": [
    "eggs",
    "bacon",
    "parmesan cheese",
    "pepper"
  ]
}

Actor index

{
    "_id": "1",
    "name": "Leonardo DiCaprio",
    "age": 48,
    "nationality": "American",
    "occupations": [
        "actor",
        "film producer"
    ],
    "height": 1.83
}

{
    "_id": "2",
    "name": "John Smith",
    "age": 54,
    "nationality": "American",
    "occupations": [
        "actor",
        "rapper",
        "film producer"
    ],
    "height": 1.88
}

A shard represents a partition of an index’s data. Each shard contains a subset of the index’s documents and is stored on a single node in the cluster. When we create an index, it is created with one shard by default, but we can configure it to have multiple shards that are distributed across different nodes.
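As a rough illustration of configuring the shard count, the sketch below builds the JSON body that could accompany an index-creation request such as `PUT /dish`. The keys `number_of_shards` and `number_of_replicas` are Elasticsearch index settings; the helper name `index_settings` and the chosen values are assumptions made for this example, and nothing is sent to a cluster.

```python
import json


def index_settings(primary_shards: int = 1, replicas_per_shard: int = 1) -> dict:
    """Build a create-index request body with explicit shard settings."""
    if primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas_per_shard < 0:
        raise ValueError("replica count cannot be negative")
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas_per_shard,
        }
    }


# Hypothetical choice: three primary shards, each with one replica.
body = index_settings(primary_shards=3, replicas_per_shard=1)
print(json.dumps(body, indent=2))
```

The number of primary shards is fixed when the index is created, whereas the replica count can be changed later, so the first value deserves more thought up front.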

Sharding is the process of dividing an index into smaller parts (shards) and distributing them across nodes in the cluster. This enables Elasticsearch to scale horizontally by adding more nodes to the cluster and allows for parallel processing of search requests across multiple shards.
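To make the distribution step concrete, here is a minimal sketch of how a document might be assigned to a shard. In simplified form, Elasticsearch hashes a routing value (the document `_id` by default) and takes the remainder modulo the number of primary shards. The function name `pick_shard` is hypothetical, and CRC32 is used only as a stand-in for the Murmur3 hash that Elasticsearch uses internally, so the shard numbers produced here will not match a real cluster.

```python
import zlib


def pick_shard(routing_value: str, num_primary_shards: int) -> int:
    """Map a routing value (by default the document _id) to a shard number."""
    # Stand-in hash: Elasticsearch uses Murmur3 internally, not CRC32.
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % num_primary_shards


# Group six hypothetical document IDs by the shard they would land on.
shards = {n: [] for n in range(3)}
for doc_id in ["1", "2", "3", "4", "5", "6"]:
    shards[pick_shard(doc_id, 3)].append(doc_id)
print(shards)
```

Because the result depends on the number of primary shards, changing that number would send existing documents to different shards, which is one reason the primary shard count is fixed at index creation.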

The visualization below demonstrates the distribution of the dish and actor indices across three shards that are spread across three different nodes.

The following are the benefits of using sharding in Elasticsearch:

Improved performance: By distributing data across multiple shards, search and indexing operations can be parallelized, resulting in improved performance.
Increased capacity: Sharding allows Elasticsearch to scale horizontally, allowing it to handle larger volumes of data.
Improved flexibility: Sharding allows Elasticsearch to distribute data across different hardware or infrastructure, providing flexibility in terms of resource allocation.

A replica is a copy of a shard. Replicas are used to provide high availability and improve fault tolerance. When a shard has one or more replicas, if the primary shard fails or becomes unavailable, one of the replicas can be promoted to primary to ensure that the index remains available for search and indexing operations.
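The failover behavior described above can be sketched as a small simulation. Elasticsearch does not place a replica on the same node as its primary; the round-robin placement used here, the node names, and the helpers `allocate` and `fail_node` are assumptions for illustration, not the real allocation algorithm.

```python
from dataclasses import dataclass, replace


@dataclass
class ShardCopy:
    shard: int
    node: str
    primary: bool


def allocate(num_shards: int, nodes: list[str]) -> list[ShardCopy]:
    """Place each primary on one node and its single replica on the next node."""
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to hold a replica elsewhere")
    copies = []
    for shard in range(num_shards):
        primary_node = nodes[shard % len(nodes)]
        replica_node = nodes[(shard + 1) % len(nodes)]
        copies.append(ShardCopy(shard, primary_node, True))
        copies.append(ShardCopy(shard, replica_node, False))
    return copies


def fail_node(copies: list[ShardCopy], dead: str) -> list[ShardCopy]:
    """Drop copies on the dead node; promote a replica where the primary was lost."""
    survivors = [replace(c) for c in copies if c.node != dead]
    for shard in {c.shard for c in survivors}:
        group = [c for c in survivors if c.shard == shard]
        if not any(c.primary for c in group):
            group[0].primary = True  # promote the surviving replica
    return survivors


cluster = allocate(3, ["node-1", "node-2", "node-3"])
after = fail_node(cluster, "node-1")
for copy in after:
    print(copy)
```

After `node-1` fails in this sketch, every shard still has exactly one primary, so the index stays available for both search and indexing.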

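Replicas can also be used to scale search performance. A search request needs only one copy of each shard, and that copy can be either the primary or a replica, so adding replicas lets the cluster spread concurrent searches across more nodes while the shards of a single request are still queried in parallel.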
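A search needs to visit only one copy of each shard, so replicas give the cluster more copies to choose from. The sketch below rotates through the copies in simple round-robin order to show how consecutive searches can land on different nodes. The shard layout, node names, and the function `plan_search` are hypothetical, and real clusters use a more adaptive selection strategy than plain rotation.

```python
from itertools import cycle

# Hypothetical layout: shard number -> nodes holding a copy (primary first).
copies = {
    0: ["node-1", "node-2"],
    1: ["node-2", "node-3"],
    2: ["node-3", "node-1"],
}
pickers = {shard: cycle(nodes) for shard, nodes in copies.items()}


def plan_search() -> dict:
    """Choose one copy of every shard to serve a single search request."""
    return {shard: next(picker) for shard, picker in pickers.items()}


first = plan_search()   # served by the primaries in this layout
second = plan_search()  # served by the replicas in this layout
print(first)
print(second)
```

Each request still touches all three shards, but two back-to-back requests are answered by different copies, which is how replicas help with search throughput.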

There are some similarities between Elasticsearch and traditional databases (RDBMS). The following points explain the similarities and differences between Elasticsearch and RDBMS terms:

Cluster vs. database: In an RDBMS, a database represents a set of schemas and their tables. In Elasticsearch, the cluster plays a similar role by grouping the set of available indices.
Index vs. table: A table is a collection of rows, whereas an index is a collection of documents.
Document vs. row: Documents and rows represent the data stored in an index and a table, respectively. A row must fit the table's fixed set of columns, while a document is a JSON object that can also hold arrays and nested fields.
Field vs. column: Both fields and columns represent individual data attributes within a structured data model.
Shard: The concept is the same in both Elasticsearch and RDBMS: a partition of the data that can be placed on a separate node.
Replica: The concept is the same in both Elasticsearch and RDBMS: a copy of the data kept for availability and read scaling.
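The row versus document contrast can be illustrated with the pizza example from the dish index. The sketch below stores the same dish once in relational form, using Python's built-in `sqlite3` module as a stand-in RDBMS, and once as a single JSON document. The table and column names are assumptions chosen for this example.

```python
import json
import sqlite3

dish = {"name": "Pizza", "price": 12.99, "ingredients": ["dough", "sauce", "cheese"]}

# Relational form: fixed columns, and the ingredient list needs its own table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE dish (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
db.execute("CREATE TABLE ingredient (dish_id INTEGER REFERENCES dish(id), name TEXT)")
db.execute("INSERT INTO dish VALUES (1, ?, ?)", (dish["name"], dish["price"]))
db.executemany(
    "INSERT INTO ingredient VALUES (1, ?)", [(i,) for i in dish["ingredients"]]
)

# Reading the dish back with its ingredients requires a join.
rows = db.execute(
    "SELECT d.name, i.name FROM dish d "
    "JOIN ingredient i ON i.dish_id = d.id ORDER BY i.rowid"
).fetchall()

# Document form: the whole dish, list included, is one JSON object.
document = json.dumps(dish)

print(rows)
print(document)
```

The relational version spreads one dish over two tables, while the document keeps everything together, which is the flexibility the comparison above refers to.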
