Amazon OpenSearch Service

Explore Amazon OpenSearch Service, a managed AWS tool for indexing, searching, and visualizing large datasets. Learn about its architecture, ingestion methods, querying capabilities, and security features. Understand how to integrate OpenSearch with AWS services for real-time streaming or batch analytics, optimize performance, and troubleshoot common issues.

We'll cover the following...

Indexing and queries
- Example: Monitoring solution for IoT devices
Data source integration
Security and encryption
Scaling and optimization
Troubleshooting
- Conclusion

OpenSearch runs as clusters of nodes, and the nodes store data in indexes. Each index is divided into shards that are distributed across nodes for scalability, while replica copies of those shards provide fault tolerance.
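To make the shard arithmetic concrete, here is a minimal sketch that counts the shard copies an index creates. The function name `shard_summary` and its parameters are hypothetical. It models only one allocation rule (two copies of the same shard are never placed on the same node) and ignores everything else, such as disk usage and zone awareness.

```python
def shard_summary(primaries: int, replicas: int, data_nodes: int) -> dict:
    """Count shard copies for one index and how many of them can be assigned.

    Assumption: the only allocation rule modeled here is that a primary and
    its replicas must each live on a different node.
    """
    if primaries < 1 or replicas < 0 or data_nodes < 1:
        raise ValueError("primaries and data_nodes must be >= 1, replicas >= 0")

    copies_per_shard = 1 + replicas  # one primary plus its replicas
    assignable_per_shard = min(copies_per_shard, data_nodes)

    return {
        "total_copies": primaries * copies_per_shard,
        "assignable_copies": primaries * assignable_per_shard,
        "unassigned_copies": primaries * (copies_per_shard - assignable_per_shard),
    }


# Three primaries with one replica each, first on three nodes, then on one.
print(shard_summary(primaries=3, replicas=1, data_nodes=3))
print(shard_summary(primaries=3, replicas=1, data_nodes=1))
```

On a single-node cluster the replicas have nowhere to go, which is why such an index typically stays in a degraded state until another node is added or the replica count is lowered.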

The core components of OpenSearch include:

OpenSearch domain: An OpenSearch domain is essentially an OpenSearch cluster (or an Elasticsearch cluster on legacy engine versions). It is a collection of nodes and configurations that define an OpenSearch service environment. EC2 instances act as the nodes in this cluster, which allows us to process large volumes of data, execute complex queries, and perform aggregations.
OpenSearch Ingestion: OpenSearch Ingestion is a fully managed, serverless data collector that ingests logs, metrics, and traces in real time without third-party tools like Logstash. It is powered by the open-source Data Prepper and supports real-time streaming, batch processing, and data transformation, with features like buffering, schema validation, and error handling. It also offers automatic scaling, cost control, enhanced security through VPC integration, and regular updates.

Indexing and queries

When we send JSON-formatted documents to OpenSearch, it automatically indexes the data upon ingestion. This means that OpenSearch analyzes and stores the data to make it searchable, filterable, and aggregatable, without requiring manual indexing steps.
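As an illustration of what "sending JSON-formatted documents" can look like, the sketch below builds a request body for the `_bulk` API, which accepts newline-delimited JSON: one action line followed by one document line per record. The index name `iot-telemetry`, the field names, and the helper `to_bulk_ndjson` are assumptions made for this example. The code only builds the payload and does not contact a domain.

```python
import json


def to_bulk_ndjson(index_name: str, documents: list) -> str:
    """Serialize documents into the newline-delimited body the _bulk API expects."""
    lines = []
    for doc in documents:
        # Action line: ask OpenSearch to index the next line into index_name.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


# Hypothetical telemetry records.
readings = [
    {"device_id": "sensor-1", "temperature": 21.5},
    {"device_id": "sensor-2", "temperature": 19.0},
]
print(to_bulk_ndjson("iot-telemetry", readings), end="")
```

The resulting string would be sent as the body of `POST /_bulk` with the `Content-Type: application/x-ndjson` header, using whichever signed HTTP client the domain's access policy requires.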

If we need more control over how the data is indexed, for instance, to specify how certain fields are analyzed or stored, we can define custom mappings and index settings before ingesting any data. Otherwise, OpenSearch infers the mappings dynamically, assigning each field a type based on the first document in which that field appears.
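A custom mapping might look like the following sketch, which builds the body of an index-creation request (`PUT /<index-name>`). The field names `device_id`, `temperature`, and `recorded_at` and the helper `build_index_body` are hypothetical; the `settings` and `mappings` keys and the `keyword`, `float`, and `date` field types are part of the standard index API.

```python
import json


def build_index_body(primary_shards: int = 3, replicas: int = 1) -> dict:
    """Return an index-creation body with explicit settings and mappings."""
    return {
        "settings": {
            "index": {
                "number_of_shards": primary_shards,
                "number_of_replicas": replicas,
            }
        },
        "mappings": {
            "properties": {
                # keyword: stored as-is for exact matches and aggregations.
                "device_id": {"type": "keyword"},
                # float: numeric, so range queries and averages work.
                "temperature": {"type": "float"},
                # date: enables time-range filters and date histograms.
                "recorded_at": {"type": "date"},
            }
        },
    }


print(json.dumps(build_index_body(), indent=2))
```

Declaring `device_id` as `keyword` up front avoids the dynamic default for strings, which maps them as analyzed `text` with a `keyword` subfield and is often more than an identifier field needs.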

An index in OpenSearch is similar to a database table, and each document is comparable to a row: a single unit of searchable data, such as a log entry, a product, or a telemetry record.