How can we detect data modification?

Hashing algorithms: We can use a cryptographic hash function (e.g., SHA-256) to compute a fixed-size hash value (digest) of the original data and store that value securely. Periodically recompute the hash of the current data and compare it with the stored value. If they differ, the data has been modified. Avoid MD5 for this purpose: practical collision attacks against it exist, so it is no longer considered secure.
Error-checking codes: We can apply checksums or error-detecting codes such as CRC-32 to the data. These values are calculated from the data content, so when the data changes, the recomputed value will almost always stop matching. They are designed to catch accidental corruption, such as transmission or storage errors, rather than deliberate tampering, because anyone who modifies the data can simply recompute a matching checksum.
Public key infrastructure (PKI): We can use digital signatures to verify both the authenticity and the integrity of the data. The sender signs the data with their private key, and the recipient verifies the signature using the sender's public key; PKI certificates let the recipient trust that this public key really belongs to the sender. If the data is modified after signing, signature verification fails.
Timestamping: Record timestamps for data creation and modification, and regularly check them to identify unexpected changes. This won't prevent modification, but it can help identify when a change occurred. Ordinary file timestamps can be changed by anyone with sufficient access, so treat them as a hint rather than proof.
Versioning systems: We can use version control systems to track changes to data over time. They let us compare differences between versions and roll back to a previous version.
Database audit trails: Enable the database's auditing features to log data changes, and regularly review these audit trails for unauthorized modifications.
Network and system monitoring: Employ monitoring systems and intrusion detection tools to detect unusual or unauthorized access to data. Unusual patterns or access from unfamiliar locations may indicate data modification attempts.
File integrity monitoring (FIM) software: We can use FIM tools to monitor file attributes such as size, checksum, and permissions, continuously or on a schedule, and compare them with a known-good baseline. Any discrepancies can be flagged for further investigation.
Restrict access: We can implement strict access controls to limit who can modify data. Regularly review and update access permissions.
Encrypt data in transit and at rest: Make sure data is encrypted during transmission and storage. Encryption on its own mainly protects confidentiality; to also detect modification, use authenticated encryption (for example, AES-GCM, or TLS for data in transit), which rejects data that was altered after encryption.
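The hashing approach above depends on storing the hash securely: if someone can rewrite both the data and its stored hash, a plain SHA-256 comparison still passes. One common remedy, not part of the list above and shown here only as an illustration, is a keyed hash (HMAC) from Python's standard library. The sketch assumes a secret key kept somewhere an attacker cannot read; `SECRET_KEY`, `calculate_tag`, and `verify_tag` are hypothetical names.

```python
import hashlib
import hmac
import json
import secrets

# Hypothetical key for illustration only; in practice, load it from a secrets
# store that is separate from where the data and its tag are kept.
SECRET_KEY = secrets.token_bytes(32)

def calculate_tag(data, key=SECRET_KEY):
    # Canonical JSON (sorted keys), then HMAC-SHA256 with the secret key
    message = json.dumps(data, sort_keys=True).encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify_tag(data, tag, key=SECRET_KEY):
    # compare_digest compares in constant time to avoid leaking timing information
    return hmac.compare_digest(calculate_tag(data, key), tag)

record = {"name": "John Doe", "age": 25, "city": "Example City"}
tag = calculate_tag(record)

tampered = dict(record, city="Modified City")
# Without the key, an attacker can compute a plain SHA-256, but not a valid tag
forged = hashlib.sha256(json.dumps(tampered, sort_keys=True).encode()).hexdigest()

print(verify_tag(record, tag))       # True
print(verify_tag(tampered, tag))     # False
print(verify_tag(tampered, forged))  # False
```

Because the tag depends on the key, recomputing a plain hash of the tampered data does not produce a valid tag. Key management (where the key lives and how it is rotated) is outside the scope of this sketch.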
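To illustrate how an error-checking code differs from a cryptographic hash, here is a minimal sketch using CRC-32 from Python's `zlib` module. The record format and the helper names `crc32_checksum` and `checksum_matches` are assumptions made for the example.

```python
import zlib

def crc32_checksum(data: bytes) -> int:
    # zlib.crc32 returns an unsigned 32-bit integer in Python 3
    return zlib.crc32(data)

def checksum_matches(data: bytes, stored_checksum: int) -> bool:
    # Recompute the checksum and compare it with the stored value
    return crc32_checksum(data) == stored_checksum

original = b"name=John Doe;age=25;city=Example City"
stored = crc32_checksum(original)

corrupted = original.replace(b"age=25", b"age=26")  # simulate a one-byte error
print(checksum_matches(original, stored))   # True
print(checksum_matches(corrupted, stored))  # False

# Not tamper-resistant: whoever edits the data can also store
# crc32_checksum(corrupted), and the check above would then pass.
```

CRC-32 reliably catches small accidental errors like the one above, but since it uses no secret, it says nothing about whether a change was authorized.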
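Database audit trails can be approximated even in a small database. The sketch below uses SQLite triggers through Python's built-in `sqlite3` module to record every update and delete on a hypothetical `customers` table in a `customers_audit` table. It is an illustration only: server databases usually provide dedicated auditing features, and a trigger-based log is only as trustworthy as the permissions protecting the audit table itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    city TEXT NOT NULL
);

-- Audit table: one row per change, written by the triggers below
CREATE TABLE customers_audit (
    audit_id INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id INTEGER,
    action TEXT,
    old_city TEXT,
    new_city TEXT,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);

CREATE TRIGGER customers_update AFTER UPDATE ON customers
BEGIN
    INSERT INTO customers_audit (customer_id, action, old_city, new_city)
    VALUES (OLD.id, 'UPDATE', OLD.city, NEW.city);
END;

CREATE TRIGGER customers_delete AFTER DELETE ON customers
BEGIN
    INSERT INTO customers_audit (customer_id, action, old_city, new_city)
    VALUES (OLD.id, 'DELETE', OLD.city, NULL);
END;
""")

conn.execute("INSERT INTO customers (id, name, city) VALUES (1, 'John Doe', 'Example City')")
conn.execute("UPDATE customers SET city = 'Modified City' WHERE id = 1")
conn.execute("DELETE FROM customers WHERE id = 1")
conn.commit()

# Review the audit trail in the order the changes happened
rows = conn.execute(
    "SELECT customer_id, action, old_city, new_city FROM customers_audit ORDER BY audit_id"
).fetchall()
for row in rows:
    print(row)
# (1, 'UPDATE', 'Example City', 'Modified City')
# (1, 'DELETE', 'Modified City', None)
```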
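The core loop of a file integrity monitor (take a known-good baseline, then recompute and compare) can be sketched with the standard library alone. The function names below are hypothetical, and the attribute set (size, permission string, SHA-256) is a simplified assumption; dedicated FIM tools typically track more attributes and also protect the baseline itself.

```python
import hashlib
import os
import stat
import tempfile

def file_fingerprint(path, chunk_size=65536):
    """Return the size, permission string, and SHA-256 digest of a file."""
    sha256 = hashlib.sha256()
    # Read in chunks so large files never have to fit in memory at once
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha256.update(chunk)
    info = os.stat(path)
    return {
        "size": info.st_size,
        "mode": stat.filemode(info.st_mode),  # e.g. "-rw-r--r--"
        "sha256": sha256.hexdigest(),
    }

def take_baseline(paths):
    # Known-good fingerprint for every monitored file
    return {path: file_fingerprint(path) for path in paths}

def find_changes(baseline):
    """Return {path: [changed attribute names]} for files that no longer match."""
    changes = {}
    for path, expected in baseline.items():
        if not os.path.exists(path):
            changes[path] = ["deleted"]
            continue
        current = file_fingerprint(path)
        changed = [key for key in expected if current[key] != expected[key]]
        if changed:
            changes[path] = changed
    return changes

# Usage: baseline a file, modify it, then check again
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.txt")
    with open(path, "w") as f:
        f.write("debug=false\n")
    baseline = take_baseline([path])
    unchanged = find_changes(baseline)  # {}
    with open(path, "w") as f:
        f.write("debug=true\n")
    detected = find_changes(baseline)
    print(detected[path])  # ['size', 'sha256']
```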

import hashlib
import json

def calculate_hash(data):
    # Convert the data to a JSON string for hashing
    data_string = json.dumps(data, sort_keys=True)
    # Calculate the SHA-256 hash
    sha256_hash = hashlib.sha256(data_string.encode()).hexdigest()
    return sha256_hash

def save_data_and_hash(data, file_path):
    # Calculate the hash of the data
    data_hash = calculate_hash(data)
    # Save the data and hash to a file
    with open(file_path, 'w') as file:
        json.dump({"data": data, "hash": data_hash}, file)

def check_data_integrity(file_path, use_modified_data=False):
    # Load the stored data and its reference hash from the file
    with open(file_path, 'r') as file:
        record = json.load(file)
        stored_hash = record["hash"]
        stored_data = record["data"]
    if use_modified_data:
        # Simulate tampering: substitute different data, keep the stored hash
        modified_data = {"name": "Jane Doe", "age": 30, "city": "Modified City"}
        # Do not re-save the hash here: updating it along with the data
        # would make the modification undetectable
        data_to_check = modified_data
    else:
        data_to_check = stored_data
    # Recalculate the hash of the data being checked
    recalculated_hash = calculate_hash(data_to_check)
    # Compare the stored hash with the recalculated hash
    if recalculated_hash == stored_hash:
        print("Data integrity is intact. No modifications detected.")
        print("Data:")
        print(json.dumps(data_to_check, indent=2))
    else:
        print("Warning: Data has been modified.")
        print("Original Data:")
        print(json.dumps(stored_data, indent=2))
        print("Modified Data:")
        print(json.dumps(data_to_check, indent=2))

# Example usage:
file_path = "protected_data.json"
# Save the original data and its hash to a file
data_to_protect = {"name": "John Doe", "age": 25, "city": "Example City"}
save_data_and_hash(data_to_protect, file_path)

# Ask the user if they want to check with modified data
user_input = input("Do you want to check with modified data? (yes/no): ").strip().lower()
if user_input == "yes":
    # Check data integrity with modified data
    check_data_integrity(file_path, use_modified_data=True)
else:
    # Check data integrity with original data
    check_data_integrity(file_path)

Using hashing (SHA-256) to ensure data integrity

Code explanation

In the above code:

Line 1: Import the hashlib module, which provides implementations of secure hash algorithms; it is used to calculate the SHA-256 hash in this code.
Line 2: Import the json module, which is used for handling JSON data.
Lines 4–9: The calculate_hash function takes a data parameter, converts it to a JSON string (with keys sorted so that the same content always produces the same string), calculates its SHA-256 hash, and returns the hash as a hexadecimal string.
Lines 11–16: The save_data_and_hash function takes data and file_path parameters, calculates the hash of the data using the calculate_hash function, and saves both the data and its hash to the JSON file specified by file_path.
Lines 18–44: The check_data_integrity function checks the integrity of the data stored in a file. It loads the data and hash from the file, optionally substitutes modified data to simulate tampering (if use_modified_data is True), recalculates the hash of the data being checked, and compares it with the stored hash to determine whether the data has been modified.
Lines 46–50: This section demonstrates how to use the functions. It sets the file path, creates the original data to protect, and saves the data along with its hash to a file.
Lines 53–59: The code prompts the user to decide whether they want to check data integrity with modified data. Based on the user's input, it calls the check_data_integrity function with or without modified data.
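The sort_keys=True argument in calculate_hash matters because two dictionaries with the same content can serialize to different JSON strings if their keys were inserted in a different order. The short check below copies calculate_hash from the code above to show that key order does not affect the hash, while a changed value does; the sample dictionaries are made up for the example.

```python
import hashlib
import json

def calculate_hash(data):
    # Serialize with sorted keys so logically equal dicts give identical strings
    data_string = json.dumps(data, sort_keys=True)
    return hashlib.sha256(data_string.encode()).hexdigest()

a = {"name": "John Doe", "age": 25, "city": "Example City"}
b = {"city": "Example City", "age": 25, "name": "John Doe"}  # same content, different key order
c = {"name": "John Doe", "age": 26, "city": "Example City"}  # one value changed

print(calculate_hash(a) == calculate_hash(b))  # True
print(calculate_hash(a) == calculate_hash(c))  # False
print(json.dumps(a) == json.dumps(b))          # False: without sort_keys the strings differ
```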

Output

The code implements a data integrity verification system using hashing. Initially, it saves the original data consisting of the name John Doe, age 25, and city Example City, along with its corresponding hash, to a JSON file named protected_data.json. When prompted, if the user chooses not to check with modified data (input no), the script confirms that the data integrity is intact as no modifications were detected, and it prints the original data. Conversely, if the user opts to check with modified data (input yes), the script detects that the data has been modified and issues a warning. It then displays both the original data and the modified data, which now contains the name Jane Doe, age 30, and city Modified City. This demonstrates the successful detection of data modification and highlights the importance of data integrity verification in maintaining data security and reliability.
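The interactive demo substitutes the modified record in memory. A non-interactive variant that edits protected_data.json on disk, changing the data but leaving the stored hash alone, may be closer to real tampering. This is a sketch under that assumption: is_intact is a hypothetical helper, and the file is written to a temporary directory so the example cleans up after itself.

```python
import hashlib
import json
import os
import tempfile

def calculate_hash(data):
    # Same canonicalization as the main example: sorted keys, then SHA-256
    return hashlib.sha256(json.dumps(data, sort_keys=True).encode()).hexdigest()

def save_data_and_hash(data, file_path):
    with open(file_path, "w") as file:
        json.dump({"data": data, "hash": calculate_hash(data)}, file)

def is_intact(file_path):
    # Recompute the hash of whatever data is on disk right now
    with open(file_path) as file:
        record = json.load(file)
    return calculate_hash(record["data"]) == record["hash"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "protected_data.json")
    save_data_and_hash({"name": "John Doe", "age": 25, "city": "Example City"}, path)
    before = is_intact(path)

    # Simulate an attacker who edits the data but not the stored hash
    with open(path) as file:
        record = json.load(file)
    record["data"]["city"] = "Modified City"
    with open(path, "w") as file:
        json.dump(record, file)
    after = is_intact(path)

print(before, after)  # True False
```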

Conclusion

In conclusion, ensuring data integrity is essential for safeguarding sensitive information from unauthorized modifications. By combining techniques such as cryptographic hashing, error-checking codes, digital signatures, and access controls with diligent monitoring and auditing, organizations can detect and respond to data modifications effectively. The code example demonstrates how SHA-256 hashing can verify data integrity. Keep in mind that a hash stored next to the data only detects changes made by someone who cannot also update that hash, so in practice the reference hash should be kept in a protected location or replaced with a keyed hash or digital signature.
