How to use strings in Assembly Language
To process strings in Assembly, we need to be able to process blocks of data in one go.
The 8088 microprocessor provides a set of instructions, called block processing or string instructions, that let us do just that.
The five string instructions are:
- STOS: Store string
- LODS: Load string
- CMPS: Compare string
- SCAS: Scan string
- MOVS: Move string
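Each of these instructions has two variants, selected by a B or W at the end of the instruction. For example, MOVSB or MOVSW.
The variant determines the operand size (byte or word), and therefore how far the offsets (DI and SI) move after each operation: 1 for B and 2 for W. Whether they are incremented or decremented depends on the direction flag (DF).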
The DI and SI registers
Before moving on to the instructions themselves, we must first familiarize ourselves with the DI and SI registers.
DI and SI are index registers used to access memory. They are called source index (SI) and destination index (DI) because of the roles they play in string instructions.
Whenever a string instruction needs a memory source, DS:SI holds a pointer to it.
For the memory destination, the pointer is placed in ES:DI.
DS refers to the Data Segment of our memory. ES refers to the Extra Segment of our memory.
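A minimal sketch of setting up both pointers before a string instruction, assuming NASM syntax and a DOS .COM program. The labels src and dst are hypothetical and exist only for illustration.

```nasm
[org 0x0100]            ; assumption: DOS .COM program

        push ds
        pop  es         ; copy DS into ES (one segment register cannot be moved directly into another)
        mov  si, src    ; DS:SI -> source block (the label's offset, not its contents)
        mov  di, dst    ; ES:DI -> destination block

        mov  ax, 0x4c00 ; terminate via DOS
        int  0x21

src:    db 'abc'        ; hypothetical source data
dst:    times 3 db 0    ; hypothetical destination buffer
```

Writing mov si, [src] instead would load the first two bytes of the data into SI rather than its address.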
REPE and REPNE
REPE repeats the following string instruction while the zero flag is set and CX is not zero.
REPNE repeats the following string instruction while the zero flag is clear and CX is not zero.
They are used with the SCAS and CMPS instructions. CX is decremented on every repetition.
STOS
STOS transfers a byte or word from the register AL or AX to the memory element addressed by ES:DI and updates DI to the next location. STOS is often used to clear a block of memory or fill it with a constant.
The source will always be AL or AX. If the direction flag (DF) is clear, DI is incremented. If DF is set, DI is decremented.
cld: clear direction flag
std: set direction flag
The size of the increment/decrement of DI depends on the STOS variant. With STOSB, it is 1, and with STOSW, it is 2.
If REP is used with STOS, the process is repeated n times, where n is stored in CX.
; assumes es:di already points to the start of the destination block
mov ax, 0x0720 ; space char (0x20) in al, 0x07 in ah
mov cx, 2000 ; iteration count
cld ; auto-increment
rep stosw ; store ax at es:di, cx times
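To illustrate the effect of the direction flag, the following sketch fills a buffer backwards. It assumes NASM syntax and a DOS .COM program. The names COUNT and buffer are hypothetical.

```nasm
[org 0x0100]                          ; assumption: DOS .COM program
COUNT   equ 8                         ; hypothetical buffer size in words

        push ds
        pop  es                       ; ES = DS so ES:DI addresses our buffer
        mov  di, buffer + 2*(COUNT-1) ; start at the LAST word of the buffer
        xor  ax, ax                   ; fill value: 0
        mov  cx, COUNT                ; number of words to store
        std                           ; DF = 1: DI decreases by 2 after each STOSW
        rep  stosw
        cld                           ; restore the usual forward direction

        mov  ax, 0x4c00               ; terminate via DOS
        int  0x21

buffer: times COUNT dw 0xffff
```

With DF set, the starting offset has to be the last element rather than the first, since DI moves towards lower addresses.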
LODS
LODS transfers a byte or word from the source location DS:SI to AL or AX and updates SI to the next location. In essence, it works in reverse to STOS.
LODS is normally used in a loop rather than with REP, because with REP each repetition would overwrite the value just loaded into the register before it could be used.
msg: db 'hello world'
len: dw 11
mov si, msg ; point si to the string (its address, not its contents)
mov cx, [len] ; load length in cx
cld ; auto-increment
char_loop:
lodsb ; load next char in al
; process the character in al here
loop char_loop
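As one possible way to use each loaded character, the sketch below prints the string one character at a time. It assumes NASM syntax, a DOS .COM program, and the DOS character-output service (INT 21h, AH = 02h, character in DL). The label next is hypothetical.

```nasm
[org 0x0100]                ; assumption: DOS .COM program

        mov  si, msg        ; DS:SI -> string
        mov  cx, [len]      ; number of characters
        cld                 ; auto-increment SI
next:   lodsb               ; AL = byte at DS:SI, SI = SI + 1
        mov  dl, al         ; DOS expects the character in DL
        mov  ah, 0x02       ; DOS service: write character
        int  0x21
        loop next           ; decrement CX, repeat while CX != 0

        mov  ax, 0x4c00     ; terminate via DOS
        int  0x21

msg:    db 'hello world'
len:    dw 11
```

The work done between lodsb and loop is the reason a plain REP prefix is not useful here.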
SCAS
SCAS compares a source byte or word in register AL or AX with the destination string element addressed by ES:DI and updates the flags.
DI is updated to the next location. SCAS is used to check equality/inequality in a string using REPE or REPNE.
It can be used to find a 0 in a null-terminated string to calculate its length.
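msg: db 'hello world', 0 ; null-terminated string
mov di, msg ; point es:di to the string (assumes es = ds)
mov cx, 0xffff ; load MAX in cx
xor al, al ; load 0 in al
cld ; auto-increment
repne scasb ; scan until the 0 is found
mov ax, 0xffff
sub ax, cx ; ax = bytes scanned, including the terminator
dec ax ; ax now contains length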
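The same pattern can locate any character, not just the terminator. The following is a sketch under the assumption of NASM syntax and a DOS .COM program. The labels notfound and done are hypothetical.

```nasm
[org 0x0100]                ; assumption: DOS .COM program

        push ds
        pop  es             ; SCAS always reads through ES:DI
        mov  di, msg        ; ES:DI -> string
        mov  cx, 11         ; number of bytes to scan
        mov  al, 'w'        ; character to search for
        cld                 ; scan forwards
        repne scasb         ; stop on a match or when CX reaches 0
        jne  notfound       ; zero flag clear: no byte matched
        dec  di             ; DI has already moved one past the match
        sub  di, msg        ; DI = index of the match within the string
        jmp  done
notfound:
        mov  di, 0xffff     ; hypothetical "not found" marker
done:
        mov  ax, 0x4c00     ; terminate via DOS
        int  0x21

msg:    db 'hello world'
```

Checking the zero flag after the scan matters because REPNE also stops when CX runs out, in which case DI does not point at a match.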
CMPS
CMPS subtracts the destination element at ES:DI from the source element at DS:SI and sets the flags accordingly, without affecting the source and destination themselves. SI and DI are updated accordingly.
If used with REPE or REPNE, it can be used to check equality/inequality. CMPS can also be used to find a substring within a string.
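msg1: db 'Edpresso'
msg2: db 'EdpressO'
mov si, msg1 ; ds:si points to the first string
mov di, msg2 ; es:di points to the second string (assumes es = ds)
mov cx, 8 ; number of bytes to compare
cld ; auto-increment
mov ax, 1 ; true: strings are equal
repe cmpsb ; compare while bytes match and cx is not 0
je exit ; zero flag still set: every byte matched
mov ax, 0 ; false: strings are unequal
exit:
mov ah, 0x4c ; terminate; the result in al becomes the exit code
int 0x21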
MOVS
MOVS transfers a byte or word from DS:SI to ES:DI and updates SI and DI accordingly. It is used to move a block of memory.
REP allows the instruction to be repeated n times where n is the value stored in CX.
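A minimal sketch of a block copy with REP MOVSB, assuming NASM syntax and a DOS .COM program. The names src, dst, and LEN are hypothetical.

```nasm
[org 0x0100]                ; assumption: DOS .COM program

        push ds
        pop  es             ; ES = DS so both blocks share one segment
        mov  si, src        ; DS:SI -> source block
        mov  di, dst        ; ES:DI -> destination block
        mov  cx, LEN        ; number of bytes to copy
        cld                 ; copy forwards
        rep  movsb          ; copy CX bytes from DS:SI to ES:DI

        mov  ax, 0x4c00     ; terminate via DOS
        int  0x21

src:    db 'hello world'
LEN     equ $ - src         ; length of the source block in bytes
dst:    times LEN db 0      ; destination buffer of the same size
```

For an even byte count, the same copy could be done with rep movsw and half the count in CX.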