Secondary Indexes

Learn how to use secondary indexes on non-primary key fields to improve data access in partitioned systems. Understand the difference between document-based and term-based partitioning of secondary indexes, including their advantages and trade-offs in distributed environments.

We'll cover the following...

Document-based partitioning of secondary indexes
Term-based partitioning of secondary indexes

So far, we have covered how data is partitioned. If the access patterns for the stored data also use columns/fields other than the primary key, we may want to create secondary indexes on those columns/fields. Let’s first understand the difference between a primary and a secondary index.

A primary index is based on the primary key of a table. The primary key consists of a set of fields in the table that together represent a unique value for each record. Furthermore, the records in a table can be thought of as being laid out (or sorted) in the order of the primary key, so any search for a record using the primary index can be done using binary search. In contrast, a secondary index can be based on any fields of a table, and the values of those fields may or may not be unique across all the records.
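The distinction can be illustrated with a minimal in-memory sketch. The record layout below (`song_id`, `artist`, `title`), the sample data, and the helper names `find_by_id` and `find_by_artist` are assumptions made purely for illustration; they are not taken from the lesson's songs example. The primary lookup binary searches a list sorted by primary key, while the secondary index maps a non-unique field value to the primary keys of all matching records.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical records: (song_id, artist, title). song_id is the primary key.
songs = [
    (317, "Adele", "Skyfall"),
    (101, "Adele", "Hello"),
    (422, "Queen", "Under Pressure"),
    (205, "Queen", "Bohemian Rhapsody"),
]

# Primary index: keep records sorted by primary key so lookups can binary search.
songs.sort(key=lambda record: record[0])
primary_keys = [record[0] for record in songs]


def find_by_id(song_id):
    """Return the record with the given primary key, or None if absent."""
    pos = bisect_left(primary_keys, song_id)
    if pos < len(primary_keys) and primary_keys[pos] == song_id:
        return songs[pos]
    return None


# Secondary index on the non-unique 'artist' field:
# each field value maps to the list of primary keys that carry it.
artist_index = defaultdict(list)
for song_id, artist, _title in songs:
    artist_index[artist].append(song_id)


def find_by_artist(artist):
    """Return all records for an artist by going through the secondary index."""
    return [find_by_id(song_id) for song_id in artist_index.get(artist, [])]
```

In this sketch, `find_by_id(205)` resolves to at most one record, whereas `find_by_artist("Queen")` can return several, because the indexed field is not unique. Storing primary keys (rather than full records) in the secondary index is one possible design choice; it keeps the index small at the cost of a second lookup.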

For instance, in our songs example if the records are ...
