String Operations
In C, we have strcmp and memcmp. Here, we have scas and cmps. Similarly we have load, store and move strings
Pre: cld->instruction to clear direction flag. If it is cleared, then when the addresses are copied from one location to the other, addresses are incremented. Else they are decremented.
DF=0 (Clear): When the Direction Flag is clear, string operations process memory from lower addresses to higher addresses. This is known as "forward direction." Instructions like
MOVS
,LODS
,STOS
,CMPS
, andSCAS
will increment the index or pointer registers (such asSI
,DI
,ESI
, orEDI
) after each operation.DF=1 (Set): When the Direction Flag is set, string operations process memory from higher addresses to lower addresses. This is known as "backward direction." In this case, the index or pointer registers are decremented after each operation.
For example,
Initial Setup
Source String: Located at memory address
0x100
, contents:"ABCDEFGH"
Destination String: Located at memory address
0x110
, initial contents can be anything, let's say"ZZZZZZZZ"
We want to copy the source string to the destination string using the MOVS
instruction.
Expected Behavior (with DF=0)
Let's first examine the expected behavior when the Direction Flag is cleared (DF=0), which means we're copying from lower to higher memory addresses.
Before Operation: Source (0x100):
ABCDEFGH
Destination (0x110):ZZZZZZZZ
Operation: Execute
MOVS
instructions to copy each byte from source to destination.After Operation: Source (0x100):
ABCDEFGH
(unchanged) Destination (0x110):ABCDEFGH
(copied as expected)
Actual Behavior (with DF=1)
Now, let's see what happens if the Direction Flag is set (DF=1), and we accidentally copy from higher to lower addresses without realizing it.
Before Operation (same as before): Source (0x100):
ABCDEFGH
Destination (0x110):ZZZZZZZZ
Operation (DF=1):
The
MOVS
instruction starts copying from the end of the source string to the end of the destination string because the Direction Flag indicates backward copying.This backward operation is not what we intended for a straightforward copy from one buffer to another.
After Operation:
The operation proceeds in reverse, starting with copying 'H' to the last position of the destination, then 'G' to the second last, and so on.
However, since we're expecting a forward copy, our code logic might not correctly handle this reverse operation, leading to confusion or bugs, especially in scenarios where buffer boundaries, overlapping regions, or specific processing logic are critically important.
In short, to ensure "abcdefgh" is copied to destination as "abcdefgh" and not something unintended (like example below) use cld before string operations.
Comparing Strings
Input and comparisons of strings is done with "SCASX" and "CMPSX" where X is->B/W/D/Q respectively for 1,2,4,8 byte strings
SCASX-> Compares register al/ax/eax/rax (Depending if it is scasb,scasw,scasd,scasq) with memory referenced by RDI
CMPSX-> Compares memory referenced by RSI with RDI
So, SCAS-> memory to register comparison (source in RAX, to be compared with RDI)
CMPS-> memory-to-memory comparison (Source in RSI, to be compared with RDI)
; File for scasb, cmpsb etc.
global _start
section .text
_start:
; scasb/w/d/q
; Compare memory to register
cld ; clear direction flag
mov rax, 0x1234567890abcdef ; moves an absolute value in rax
lea rdi, [var1] ; loads address of var1 in rdi
scasq ; compares rax's absolute value with the value pointed to by rdi
lea rdi, [var2]
scasq
; cmpsb/w/d/q
; Compare memory to memory
cld
lea rsi, [var1]
lea rdi, [var3]
cmpsq
; exit the program
mov rax, 0x3c
mov rdi, 0
syscall
section .data
var1: dq 0x1234567890abcdef
var2: dq 0xfedcba1234567890
var3: dq 0x1234567890abcdef
Notice in the program that instruction 1 has moved an absolute value in rax. "lea" has loaded the address of variable 1

Also notice how "scasq" has now become "scas" and is comparing rax with QWORD pointed to by RDI. This is an internal operating trick where "scasq" or "scasw" etc are interpreted and typecasted.
Upon stepping this instruction, we can see that the comparison has now been done and Zero Flags is set.
Zero Flag is set because the comparison was successful.

Now, in the next instruction, the value to be compared is different. When SCAS is run, ZF is not set, indicating the comparison was false.

Similarly, in CMPS, two memory addresses are compared. Comparison happens between RSI and RDI. Here, var1 and var3 are same to ZF is set.

Loading, Storing, Moving Strings
LODSX -> Stores from memory to register
MOVSX -> Moves strings from memory to memory
STOSX -> Stores from register to memory
Where X could bd b,w,d,q for 1,2,4,8 bytes respectively
Program:
global _start
section .text
_start:
; Movsb/w/d/q
; Memory to Memory
cld
lea rsi, [HelloWorld]
lea rdi, [Copy]
movsq
; for second demo
cld
lea rsi, [HelloWorld]
xor rax, rax
mov qword [Copy], rax
; till this point we are just zeroing out rax and Copy variable
lea rdi, [Copy] ; copying destination memory address in rdi
mov rcx, len ; we would like to loop through x times where x is the length of the string which is len (equated in .text)
rep movsb ; loop though and copy byte after byte
; second demo ends
; stosb/w/d/q
; Register to Memory
mov rax, 0x0123456789abcdef
lea rdi, [var1]
stosq ; Moves rax's value to the memory address pointed by RDI
; lodsb/w/d/q
; Memory to Register
xor rax, rax
lea rsi, [var1]
lodsq ; Moves var1 to RAX
; exit the program gracefully
mov rax, 0x3c
mov rdi, 0
syscall
section .data
HelloWorld: db "Hello World"
len: equ $-HelloWorld
section .bss
Copy: resb len
var1: resb 8
Upon running the first block (MOVS) in GDB we observe something.
The source is our RSI string which is the "Hello World" string that is to be copied.
The destination RDI is loaded with the address of "Copy" variable is in our bss section. We have some unreserved memory location of the variable "Copy"
These registers are respectively loaded with memory addresses.

When we inspect the destination address initially, we see nothing there. After the movs instruction "Hello Wo" appears (8 bytes only since movsq is used)

To copy the entire string, programmers often have to loop through the source string until every byte is copied. We can use rep for this and put a counter in rcx. Here is where cld comes in useful. We want to copy byte after byte in an incremental order!!

Now, we loop len times and see the destination variable's value now. len is the length of "Hello World"

Let's actually see what gets copied when we don't use cld by tweaking our program. I removed cld here and rather used std (to set the direction flag). Now cld should be equal to 1 which is decremental copying. As we see, the program just keeps copying the first byte again and again.

This makes the program decrement addresses as it proceeds making it only copy "H" which is the first byte of the message.

This is happening because Direction Flag is forcing the decrement and the next byte after "H" is not reached at all.
Next, STOS is used to copy from register RAX to memory address stored in RDI (0x402017)

And the opposite with LODS. LODS stores value of data pointed by RSI in RAX
Last updated
Was this helpful?