A primary key is normally declared as clustered, which means the data is physically sorted and stored depending on the primary key column(s). This can help to increase the query performance when querying the table using the clustered primary key.
In a relational database, indexes are used to improve the performance of data retrieval operations. Two main types of indexes are commonly used:
Clustered index: A clustered index determines the physical order of data rows in a table. Each table can have only one clustered index, which organizes the data. In other words, the data rows are physically sorted on disk based on the clustered index key.
Nonclustered index: A nonclustered index is a separate data structure that contains a copy of the indexed columns and a reference to the location of the corresponding data row. It does not determine the physical order of data rows on disk.
A primary key uniquely identifies each row in a table and enforces the entity integrity constraint. It is a candidate for a clustered index due to its uniqueness and importance for data retrieval. However, there are scenarios where declaring a primary key as nonclustered might be beneficial:
Wide key columns: If the primary key contains wide columns (e.g., large text or binary data), using it as a clustered index might lead to inefficient use of disk space and slower data retrieval. In such cases, a nonclustered primary key might be more appropriate.
Frequent updates: Tables with a clustered index experience overhead when performing updates because the data may need to be physically relocated. If a table with a primary key sees frequent updates, especially on the key columns, it might be better to use a nonclustered primary key to reduce the update overhead.
Let’s consider two examples to illustrate when a primary key should be declared as nonclustered.
Suppose we have a table Documents
with the following schema:
CREATE TABLE Documents (DocumentID INT PRIMARY KEY NONCLUSTERED,DocumentContent NVARCHAR(MAX),CreatedBy INT,CreatedDate DATETIME);
In this scenario, we’re storing large document content in the DocumentContent
column. We’ve declared the primary key DocumentID
as nonclustered because the primary focus is on efficient storage and retrieval of metadata (CreatedBy
and CreatedDate
), while the document content may be too large to efficiently store in a clustered index.
Consider a table UserActivity
with the following schema:
CREATE TABLE UserActivity (ActivityID INT PRIMARY KEY NONCLUSTERED,UserID INT,ActivityDate DATETIME,ActivityType VARCHAR(50));
Here, the UserActivity
table records various user activities. We’ve declared the primary key ActivityID
as nonclustered because this table experiences frequent updates, particularly on the ActivityDate
column. Using a nonclustered primary key helps avoid the overhead of physically rearranging data rows during updates.
Let’s perform some sample queries on the two tables to demonstrate how the nonclustered primary key choice affects data retrieval:
CREATE TABLE Documents ( DocumentID INT PRIMARY KEY NONCLUSTERED, DocumentContent NVARCHAR(MAX), CreatedBy INT, CreatedDate DATETIME ); CREATE TABLE UserActivity ( ActivityID INT PRIMARY KEY NONCLUSTERED, UserID INT, ActivityDate DATETIME, ActivityType VARCHAR(50) ); -- Insert a new document INSERT INTO Documents (DocumentID, DocumentContent, CreatedBy, CreatedDate) VALUES (1, 'Large document content...', 222, GETDATE()); -- Retrieve document metadata for a specific DocumentID SELECT DocumentID, CreatedBy, CreatedDate FROM Documents WHERE DocumentID = 1; -- Insert a new user activity INSERT INTO UserActivity (ActivityID, UserID, ActivityDate, ActivityType) VALUES (1, 333, GETDATE(), 'Login'); -- Update the ActivityType for a specific activity UPDATE UserActivity SET ActivityType = 'Logout' WHERE ActivityID = 1; SELECT * FROM UserActivity; DROP TABLE Documents, UserActivity;
In the above code:
Lines 1–6: Creating a table with named Documents
for wide key columns scenario.
Lines 7–12: Creating a table with named UserActivity
for frequent updates scenario.
Lines 15–18: The query retrieves metadata from the Documents
table without involving the potentially large DocumentContent
column. Since the primary key is declared as nonclustered, the query can efficiently use the index to retrieve the required information.
Lines 21–27: The UserActivity
table experiences frequent updates on the ActivityDate
column. By using a nonclustered primary key, the updates are less likely to cause significant performance degradation compared to using a clustered primary key.
Free Resources