Trusted answers to developer questions
Trusted Answers to Developer Questions

Related Tags

python

How to get the line count of a file using mmap in Python

Yasir Latif

Grokking Modern System Design Interview for Engineers & Managers

Ace your System Design Interview and take your career to the next level. Learn to handle the design of applications like Netflix, Quora, Facebook, Uber, and many more in a 45-min interview. Learn the RESHADED framework for architecting web-scale applications by determining requirements, constraints, and assumptions before diving into a step-by-step design process.

In Python, the modification to the file object can be done in two different ways. The first way is to modify the file from its physical storage location by making system calls such as open, read, write, and more. The second modification way is to modify directly in the RAM without system calls using Memory mapping, which allows us to map file data into memory.

The file is read at once and kept as a file object for modifications. The mmap module in Python uses the mmap method to map bytes of the file to memory and returns the file object. The returned object is then used to count the number of lines.

Description of the mmap function

The mmap function needs different arguments and is defined as given below.

mm_object = mmap.mmap(file_obj.fileno(), length=0,access=mmap.ACCESS_READ,offset=0)
How to use the mmap function in Python
  • file_obj.fileno() is the file descriptor
  • length=0 is the size of memory. It is defined 0 so that system automatically selects sufficient memory required for the file data
  • access=mmap.ACCESS_READ defines to the system how the program is going to access the memory. Other options are ACCESS_WRITE and ACCESS_COPY
  • offset=0 defines the system to map the file to the memory after the bytes defined by the offset.

Code

main.py
test.txt
import mmap
with open('test.txt', "r+") as file_obj:
    mm_object = mmap.mmap(file_obj.fileno(), length=0,access=mmap.ACCESS_READ,offset=0)
    lines_count = 0
    while mm_object.readline():
        lines_count += 1
print(lines_count)
Count number of lines

Explanation

  • Line 1: Import the mmap module from Python libraries.
  • Line 3: Map the file to the memory that returns an object mm_object to access the mapped file.
  • Line 5–6: Loops through the object to access each line using mm_object.readline() until the end of the file. It increments the lines_count variable to get total counts at the end.

RELATED TAGS

python

Grokking Modern System Design Interview for Engineers & Managers

Ace your System Design Interview and take your career to the next level. Learn to handle the design of applications like Netflix, Quora, Facebook, Uber, and many more in a 45-min interview. Learn the RESHADED framework for architecting web-scale applications by determining requirements, constraints, and assumptions before diving into a step-by-step design process.

Keep Exploring