String Operations
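In C we have strcmp and memcmp; in x86 assembly the corresponding instructions are SCAS (scan string) and CMPS (compare strings). Similarly, there are instructions to load, store and move strings (LODS, STOS, MOVS).
Prerequisite: cld, the instruction that clears the Direction Flag (DF). String instructions advance their pointer registers automatically after each element, and DF decides the direction: if DF is cleared, the addresses are incremented; if it is set (std), they are decremented.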
DF=0 (Clear): When the Direction Flag is clear, string operations process memory from lower addresses to higher addresses. This is known as the "forward direction." Instructions like MOVS, LODS, STOS, CMPS, and SCAS increment the index or pointer registers (such as SI, DI, ESI, or EDI; RSI and RDI in 64-bit code) by the element size after each operation.
DF=1 (Set): When the Direction Flag is set, string operations process memory from higher addresses to lower addresses. This is known as the "backward direction." In this case, the index or pointer registers are decremented after each operation.
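A small Python model can make the rule concrete. It is a sketch, not real hardware access: it only computes how far RSI/RDI move after one string instruction, assuming the step equals the element size (1, 2, 4 or 8 bytes for the b/w/d/q forms) and that its sign comes from DF. The helper names `step` and `pointer_trace` are hypothetical.

```python
# Toy model (assumption): after each string instruction the CPU adds +size
# to RSI/RDI when DF=0 and -size when DF=1 (size = 1, 2, 4 or 8 bytes).
SIZES = {"b": 1, "w": 2, "d": 4, "q": 8}

def step(df: int, suffix: str) -> int:
    """Signed amount added to RSI/RDI after one b/w/d/q string operation."""
    size = SIZES[suffix]
    return -size if df else size

def pointer_trace(start: int, df: int, suffix: str, count: int) -> list[int]:
    """Address held by an index register before each of `count` operations."""
    delta = step(df, suffix)
    return [start + i * delta for i in range(count)]

print([hex(a) for a in pointer_trace(0x100, df=0, suffix="b", count=4)])
# ['0x100', '0x101', '0x102', '0x103']  (cld: forward)
print([hex(a) for a in pointer_trace(0x100, df=1, suffix="b", count=4)])
# ['0x100', '0xff', '0xfe', '0xfd']     (std: backward)
print([hex(a) for a in pointer_trace(0x100, df=0, suffix="q", count=3)])
# ['0x100', '0x108', '0x110']           (8-byte elements)
```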
For example:
Initial Setup
Source string: located at memory address 0x100, contents "ABCDEFGH"
Destination string: located at memory address 0x110; the initial contents can be anything, let's say "ZZZZZZZZ"
We want to copy the source string to the destination string using the MOVS instruction.
Expected Behavior (with DF=0)
Let's first examine the expected behavior when the Direction Flag is cleared (DF=0), which means we're copying from lower to higher memory addresses.
Before operation:
Source (0x100): ABCDEFGH
Destination (0x110): ZZZZZZZZ
Operation: execute MOVS instructions to copy each byte from source to destination.
After operation:
Source (0x100): ABCDEFGH (unchanged)
Destination (0x110): ABCDEFGH (copied as expected)
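The forward copy can be simulated in a few lines of Python, assuming a flat `bytearray` stands in for memory (addresses are just indices) and that the copy is done with `rep movsb`, i.e. MOVSB repeated RCX = 8 times. `rep_movsb` is a hypothetical helper, not part of any library.

```python
# Toy model (assumption): a bytearray stands in for memory; indices are addresses.
def rep_movsb(mem: bytearray, rsi: int, rdi: int, rcx: int, df: int):
    """Model of REP MOVSB: copy one byte [RSI] -> [RDI] RCX times,
    stepping both pointers by +1 (DF=0) or -1 (DF=1)."""
    delta = -1 if df else 1
    while rcx:
        mem[rdi] = mem[rsi]
        rsi += delta
        rdi += delta
        rcx -= 1
    return rsi, rdi, rcx

mem = bytearray(0x200)
mem[0x100:0x108] = b"ABCDEFGH"   # source
mem[0x110:0x118] = b"ZZZZZZZZ"   # destination

rep_movsb(mem, rsi=0x100, rdi=0x110, rcx=8, df=0)   # cld; rep movsb
print(bytes(mem[0x100:0x108]))   # b'ABCDEFGH' (source unchanged)
print(bytes(mem[0x110:0x118]))   # b'ABCDEFGH' (copied forward)
```

After the loop RSI and RDI point one byte past each buffer (0x108 and 0x118) and RCX is 0.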
Actual Behavior (with DF=1)
Now, let's see what happens if the Direction Flag is set (DF=1) and we don't realize it.
Before operation (same as before):
Source (0x100): ABCDEFGH
Destination (0x110): ZZZZZZZZ
Operation (DF=1):
MOVS still copies from wherever RSI and RDI currently point, but it decrements both registers afterwards instead of incrementing them. This backward operation is not what we intended for a straightforward copy from one buffer to another.
After operation:
If RSI and RDI still point at the start of the buffers (0x100 and 0x110), only 'A' lands in the destination. Every later step moves downward, reading the bytes below 0x100 and overwriting the bytes below 0x110, so the destination ends up as "AZZZZZZZ" and unrelated memory gets clobbered.
A backward copy only works if RSI and RDI are deliberately pointed at the last byte of each buffer (0x107 and 0x117): the copy then starts with 'H' going to the last position of the destination, then 'G' to the second last, and so on.
Since we're expecting a forward copy, our code logic will not handle this reverse operation correctly, leading to confusion or bugs, especially where buffer boundaries, overlapping regions, or specific processing logic are critically important.
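Both DF=1 cases can be checked with the same kind of toy model (a `bytearray` as memory, a hypothetical `rep_movsb` helper). The eight `x` bytes placed just below the source are made up so that the stray reads become visible.

```python
# Toy model (assumption): a bytearray stands in for memory; indices are addresses.
def rep_movsb(mem, rsi, rdi, rcx, df):
    """Model of REP MOVSB: copy [RSI] -> [RDI] RCX times, stepping by -1 when DF=1."""
    delta = -1 if df else 1
    while rcx:
        mem[rdi] = mem[rsi]
        rsi, rdi, rcx = rsi + delta, rdi + delta, rcx - 1

def fresh():
    mem = bytearray(0x200)
    mem[0xF8:0x100] = b"xxxxxxxx"    # hypothetical data sitting below the source
    mem[0x100:0x108] = b"ABCDEFGH"   # source
    mem[0x110:0x118] = b"ZZZZZZZZ"   # destination
    return mem

# Case 1: std, but RSI/RDI still point at the START of each buffer.
wrong = fresh()
rep_movsb(wrong, rsi=0x100, rdi=0x110, rcx=8, df=1)
print(bytes(wrong[0x109:0x118]))  # b'xxxxxxxAZZZZZZZ': only 'A' lands, 7 stray bytes below 0x110

# Case 2: std with RSI/RDI pointing at the LAST byte of each buffer.
right = fresh()
rep_movsb(right, rsi=0x107, rdi=0x117, rcx=8, df=1)
print(bytes(right[0x110:0x118]))  # b'ABCDEFGH': 'H' copied first, then 'G', ...
```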
In short, to make sure "ABCDEFGH" is copied to the destination as "ABCDEFGH" and not as something unintended (like the std example further below), execute cld before string operations.
Comparing Strings
Strings are scanned and compared with "SCASX" and "CMPSX", where X is B/W/D/Q for 1-, 2-, 4- or 8-byte elements respectively.
SCASX -> Compares AL/AX/EAX/RAX (for scasb/scasw/scasd/scasq respectively) with the memory referenced by RDI
CMPSX -> Compares the memory referenced by RSI with the memory referenced by RDI
So, SCAS is a register-to-memory comparison (value in RAX, compared with [RDI])
CMPS is a memory-to-memory comparison ([RSI] compared with [RDI])
Both set the flags as if subtracting the second operand from the first, then advance RDI (and, for CMPS, RSI) according to DF.
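The flag behavior of SCAS and CMPS can be sketched in Python. This model is a simplification under stated assumptions: memory is a little-endian `bytearray`, only ZF is tracked (the real instructions also set CF, OF, SF, AF and PF from the subtraction), and the values stored as "var1" and "var3" are made up.

```python
import struct

# Toy model (assumption): little-endian bytearray memory, only ZF tracked.
# SCAS compares the accumulator with [RDI]; CMPS compares [RSI] with [RDI].
FMT = {1: "<B", 2: "<H", 4: "<I", 8: "<Q"}

def read(mem, addr, size):
    return struct.unpack_from(FMT[size], mem, addr)[0]

def scas(mem, rax, rdi, size, df=0):
    """SCASB/W/D/Q model: ZF=1 if AL/AX/EAX/RAX equals [RDI]; RDI then steps."""
    acc = rax & ((1 << (8 * size)) - 1)      # low `size` bytes of RAX
    zf = int(acc == read(mem, rdi, size))
    return zf, rdi + (-size if df else size)

def cmps(mem, rsi, rdi, size, df=0):
    """CMPSB/W/D/Q model: ZF=1 if [RSI] equals [RDI]; both pointers then step."""
    zf = int(read(mem, rsi, size) == read(mem, rdi, size))
    delta = -size if df else size
    return zf, rsi + delta, rdi + delta

mem = bytearray(16)
struct.pack_into("<Q", mem, 0, 0x1122334455667788)  # hypothetical var1
struct.pack_into("<Q", mem, 8, 0x1122334455667788)  # hypothetical var3 (same value)

print(scas(mem, rax=0x1122334455667788, rdi=0, size=8))  # (1, 8): equal, ZF set
print(scas(mem, rax=0x1234, rdi=0, size=8))              # (0, 8): different, ZF clear
print(cmps(mem, rsi=0, rdi=8, size=8))                   # (1, 8, 16): var1 == var3
```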
Notice in the program that the first instruction moves an immediate value into RAX, and "lea" loads the address of var1 into RDI.
Also notice that "scasq" is displayed as "scas", comparing RAX with the QWORD pointed to by RDI. This is just how the disassembler presents it: scasb, scasw, scasd and scasq are the same SCAS instruction with different operand sizes, so GDB shows the generic mnemonic with explicit operands (e.g. scas rax,QWORD PTR es:[rdi]).
Upon stepping over this instruction, we can see that the comparison has been performed and the Zero Flag (ZF) is set.
ZF is set because the comparison succeeded: the two values are equal, so RAX minus [RDI] is zero.
In the next comparison the values differ. When SCAS runs, ZF is not set, indicating that the values are not equal.
Similarly, CMPS compares two memory operands: the values at the addresses in RSI and RDI. Here, var1 and var3 hold the same value, so ZF is set.
Loading, Storing, Moving Strings
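LODSX -> Loads from memory ([RSI]) into a register (AL/AX/EAX/RAX)
MOVSX -> Moves strings from memory to memory ([RSI] to [RDI])
STOSX -> Stores from a register (AL/AX/EAX/RAX) to memory ([RDI])
Where X can be b, w, d or q for 1, 2, 4 or 8 bytes respectively. (X is only a placeholder here; the real mnemonic movsx is an unrelated instruction, a sign-extending move.)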
Program:
Stepping through the first block (MOVS) in GDB shows the following.
The source, RSI, holds the address of the "Hello World" string that is to be copied.
The destination, RDI, holds the address of the "Copy" variable in our .bss section, which reserves uninitialized (zero-filled) space for "Copy".
When we inspect the destination initially, there is nothing there (only zero bytes). After the movs instruction, "Hello Wo" appears (only 8 bytes, since movsq is used).
To copy the entire string, we would otherwise have to loop over the source until every byte is copied. The rep prefix does this for us: it repeats the instruction, decrementing RCX each time, until RCX reaches zero, so we put the count in RCX. This is where cld comes in useful: we want to copy byte after byte in increasing address order.
Now we repeat the copy len times, where len is the length of "Hello World", and look at the destination variable's value again.
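A toy model of this GDB session, assuming hypothetical addresses for the message and for the zero-filled Copy buffer (the real ones depend on how the program is linked): one `movsq` moves 8 bytes, while `rep movsb` with RCX = len moves the whole string. `rep_movs` is a hypothetical helper.

```python
# Toy model (assumption): bytearray memory; MSG and COPY are hypothetical addresses.
MSG, COPY = 0x00, 0x40

def rep_movs(mem, rsi, rdi, rcx, size, df=0):
    """Model of REP MOVSB/W/D/Q: move `size` bytes [RSI] -> [RDI], RCX times."""
    delta = -size if df else size
    while rcx:
        mem[rdi:rdi + size] = mem[rsi:rsi + size]
        rsi, rdi, rcx = rsi + delta, rdi + delta, rcx - 1
    return rsi, rdi

def fresh():
    mem = bytearray(0x80)                 # .bss-like: starts out as zero bytes
    mem[MSG:MSG + 11] = b"Hello World"
    return mem

one = fresh()
rep_movs(one, MSG, COPY, rcx=1, size=8)            # a single movsq
print(bytes(one[COPY:COPY + 11]))                  # b'Hello Wo\x00\x00\x00'

full = fresh()
length = len(b"Hello World")                       # len = 11
rep_movs(full, MSG, COPY, rcx=length, size=1)      # cld; rep movsb
print(bytes(full[COPY:COPY + 11]))                 # b'Hello World'
```

Note that RCX counts elements, not bytes: `rep movsq` with RCX = len would move 8 * len bytes, which is why the byte-sized form pairs naturally with a byte length.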
Let's see what gets copied when we don't use cld by tweaking our program: I removed cld and used std instead, which sets the Direction Flag. DF is now 1, which means decrementing addresses. As we see, only the first byte of the message ends up in Copy.
After copying "H", the first byte of the message, RSI and RDI are both decremented, so every remaining iteration reads the byte below the previous source address and writes it below the previous destination address.
In other words, the Direction Flag forces the decrement, and the byte after "H" is never reached.
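The std case can be replayed in the same kind of toy model. The addresses and the four bytes placed just below the message (`abcd`) are hypothetical, chosen so that the stray copies are visible; `rep_movsb` is a hypothetical helper.

```python
# Toy model (assumption): bytearray memory; MSG and COPY are hypothetical addresses.
MSG, COPY = 0x20, 0x60

def rep_movsb(mem, rsi, rdi, rcx, df):
    """Model of REP MOVSB: copy [RSI] -> [RDI] RCX times; DF=1 steps downward."""
    delta = -1 if df else 1
    while rcx:
        mem[rdi] = mem[rsi]
        rsi, rdi, rcx = rsi + delta, rdi + delta, rcx - 1

mem = bytearray(0x80)                 # zero-filled, like .bss
mem[MSG - 4:MSG] = b"abcd"            # hypothetical bytes just below the message
mem[MSG:MSG + 11] = b"Hello World"

rep_movsb(mem, rsi=MSG, rdi=COPY, rcx=11, df=1)   # std; rep movsb
print(bytes(mem[COPY:COPY + 11]))     # 'H' followed by ten zero bytes
print(bytes(mem[COPY - 4:COPY]))      # b'abcd': bytes from below MSG, copied below COPY
```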
Next, STOS copies the value in register RAX to the memory address stored in RDI (0x402017).
LODS does the opposite: it loads the value pointed to by RSI into RAX.
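STOS and LODS can be modeled the same way. This sketch assumes little-endian `bytearray` memory and uses hypothetical offsets instead of the 0x402017 seen in GDB; it also follows the x86-64 rule that writing EAX zero-extends into RAX, while writing AL or AX leaves the upper bits of RAX untouched. `stos` and `lods` are hypothetical helper names.

```python
import struct

# Toy model (assumption): little-endian bytearray memory; offsets are hypothetical.
FMT = {1: "<B", 2: "<H", 4: "<I", 8: "<Q"}
MASK = {1: 0xFF, 2: 0xFFFF, 4: 0xFFFFFFFF, 8: 0xFFFFFFFFFFFFFFFF}

def stos(mem, rax, rdi, size, df=0):
    """STOSB/W/D/Q model: store AL/AX/EAX/RAX at [RDI], then step RDI."""
    struct.pack_into(FMT[size], mem, rdi, rax & MASK[size])
    return rdi + (-size if df else size)

def lods(mem, rax, rsi, size, df=0):
    """LODSB/W/D/Q model: load [RSI] into AL/AX/EAX/RAX, then step RSI."""
    value = struct.unpack_from(FMT[size], mem, rsi)[0]
    if size == 4:
        rax = value                                   # writing EAX zero-extends into RAX
    else:
        rax = (rax & (MASK[8] ^ MASK[size])) | value  # AL/AX keep upper bits; RAX is replaced
    return rax, rsi + (-size if df else size)

mem = bytearray(16)
rdi = stos(mem, rax=0x0102030405060708, rdi=0, size=8)   # stosq
print(mem[:8].hex(), hex(rdi))     # 0807060504030201 0x8
rax, rsi = lods(mem, rax=0, rsi=0, size=8)                # lodsq
print(hex(rax), hex(rsi))          # 0x102030405060708 0x8
```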