<!DOCTYPE html><html lang=en><head><meta charset=utf-8><meta name=description content="Official Lonami's website"><meta name=viewport content="width=device-width, initial-scale=1.0, user-scalable=yes"><title> Writing our own Cheat Engine: Code finder | Lonami's Blog </title><link rel=stylesheet href=/style.css><body><article><nav class=sections><ul class=left><li><a href=/>lonami's site</a><li><a href=/blog class=selected>blog</a><li><a href=/golb>golb</a></ul><div class=right><a href=https://github.com/LonamiWebs><img src=/img/github.svg alt=github></a><a href=/blog/atom.xml><img src=/img/rss.svg alt=rss></a></div></nav><main><h1 class=title>Writing our own Cheat Engine: Code finder</h1><div class=time><p>2021-03-06</div><p>This is part 5 of the <em>Writing our own Cheat Engine</em> series:<ul><li><a href=/blog/woce-1>Part 1: Introduction</a> (start here if you're new to the series!)<li><a href=/blog/woce-2>Part 2: Exact Value scanning</a><li><a href=/blog/woce-3>Part 3: Unknown initial value</a><li><a href=/blog/woce-4>Part 4: Floating points</a><li>Part 5: Code finder</ul><p>In part 4 we spent a good deal of time trying to make our scans generic, and now we have something that works<sup class=footnote-reference><a href=#1>1</a></sup>! Now that the scanning is fairly powerful and all covered, the Cheat Engine tutorial shifts focus to slightly more advanced techniques that you will most certainly need in anything bigger than a toy program.<p>It's time to write our very own <strong>debugger</strong> in Rust!<h2 id=code-finder>Code finder</h2><details open><summary>Cheat Engine Tutorial: Step 5</summary> <blockquote><p>Sometimes the location something is stored at changes when you restart the game, or even while you're playing… In that case you can use 2 things to still make a table that works. 
In this step I'll try to describe how to use the Code Finder function.<p>The value down here will be at a different location each time you start the tutorial, so a normal entry in the address list wouldn't work. First try to find the address. (You've got to this point so I assume you know how to.)<p>When you've found the address, right-click the address in Cheat Engine and choose "Find out what writes to this address". A window will pop up with an empty list.<p>Then click on the Change value button in this tutorial, and go back to Cheat Engine. If everything went right there should be an address with assembler code there now.<p>Click it and choose the replace option to replace it with code that does nothing. That will also add the code address to the code list in the advanced options window. (Which gets saved if you save your table.)<p>Click on stop, so the game will start running normal again, and close to close the window. Now, click on Change value, and if everything went right the Next button should become enabled.<p>Note: When you're freezing the address with a high enough speed it may happen that next becomes visible anyhow</blockquote></details><h2 id=baby-steps-to-debugging>Baby steps to debugging</h2><p>Although I have used debuggers before, I have never had a need to write one myself, so it's time for some research.<p>Searching on DuckDuckGo, I can find an entire series on <a href=http://system.joekain.com/debugger/>Writing a Debugger</a>. We would be done by now if only that series wasn't written for Linux. The Windows documentation contains a section called <a href=https://docs.microsoft.com/en-us/windows/win32/debug/creating-a-basic-debugger>Creating a Basic Debugger</a>, but as far as I can tell, it only teaches you the <a href=https://docs.microsoft.com/en-us/windows/win32/debug/debugging-functions>functions</a> needed to configure the debugging loop. 
Which mind you, we will need, but in due time.<p>According to <a href=https://www.gironsec.com/blog/2013/12/writing-your-own-debugger-windows-in-c/>Writing your own windows debugger in C</a>, the steps needed to write a debugger are:<ul><li><a href=https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-suspendthread><code>SuspendThread(proc)</code></a>. It makes sense that we need to pause all the threads<sup class=footnote-reference><a href=#2>2</a></sup> before messing around with the code the program is executing, or things are very prone to go wrong.<li><a href=https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-getthreadcontext><code>GetThreadContext(proc)</code></a>. This function retrieves the appropriate context of the specified thread and is highly processor specific. It basically takes a snapshot of all the registers. Think of registers like extremely fast, but also extremely limited, memory the processor uses.<li><a href=https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-debugbreakprocess><code>DebugBreakProcess</code></a>. Essentially <a href=https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/x86-instructions#miscellaneous>writes out the 0xCC opcode</a>, <code>int 3</code> in assembly, also known as software breakpoint. It's written wherever the Register Instruction Pointer (RIP<sup class=footnote-reference><a href=#3>3</a></sup>) currently points to, so in essence, when the thread resumes, it will immediately <a href=https://stackoverflow.com/q/3915511/>trigger the breakpoint</a>.<li><a href=https://docs.microsoft.com/en-us/windows/win32/api/debugapi/nf-debugapi-continuedebugevent><code>ContinueDebugEvent</code></a>. Presumably continues debugging.</ul><p>There are pages documenting <a href=https://docs.microsoft.com/en-us/windows/win32/debug/debugging-events>all of the debug events</a> that our debugger will be able to handle.<p>Okay, nice! 
Software breakpoints seem to be done by writing out memory to the region where the program is reading instructions from. We know how to write memory, as that's what all the previous posts have been doing to complete the corresponding tutorial steps. After the breakpoint is executed, all we need to do is <a href=https://stackoverflow.com/q/3747852/>restore the original memory</a> so that the next time the program executes the code it sees no difference.<p>But a software breakpoint will halt execution when the code executes the interrupt instruction. This step of the tutorial wants us to find <em>what writes to a memory location</em>. Where should we place the breakpoint to detect such a location? Writing the interrupt instruction to the memory we want to watch won't do; that memory holds data, not instructions.<p>The name may have given it away. If we're talking about software breakpoints, it makes sense that there would exist such a thing as <a href=https://en.wikipedia.org/wiki/Breakpoint#Hardware><em>hardware</em> breakpoints</a>. Because they're tied to the hardware, they're highly processor-specific, but luckily for us, the processor in your usual desktop computer probably has them! Even the <a href=https://interrupt.memfault.com/blog/cortex-m-breakpoints>Cortex-M</a> does. The Wikipedia page also tells us the name of the thing we're looking for, watchpoints:<blockquote><p>Other kinds of conditions can also be used, such as the reading, writing, or modification of a specific location in an area of memory. This is often referred to as a conditional breakpoint, a data breakpoint, or a watchpoint.</blockquote><p>A breakpoint that triggers when a specific memory location is written to is exactly what we need, and <a href=https://stackoverflow.com/a/19109153/>x86 has debug registers DR0 to DR3 to track memory addresses</a>. As far as I can tell, there is no specific API to mess with the registers. But we don't need any of that! 
We can just go ahead and <a href=https://doc.rust-lang.org/stable/unstable-book/library-features/asm.html>write some assembly by hand</a> to access these registers. At the time of writing, inline assembly is unstable, so we need a nightly compiler. Run <code>rustup toolchain install nightly</code> if you haven't yet, and execute the following code with <code>cargo +nightly run</code>:<pre><code class=language-rust data-lang=rust>#![feature(asm)] // top of the file

fn main() {
    let x: u64 = 123;
    unsafe {
        asm!("mov dr7, {}", in(reg) x);
    }
}

</code></pre><p><code>dr7</code> is the <a href=https://en.wikipedia.org/wiki/X86_debug_register>debug control register</a>, and running this we get…<pre><code>>cargo +nightly run
   Compiling memo v0.1.0
    Finished dev [unoptimized + debuginfo] target(s) in 0.74s
     Running `target\debug\memo.exe`
error: process didn't exit successfully: `target\debug\memo.exe` (exit code: 0xc0000096, STATUS_PRIVILEGED_INSTRUCTION)
</code></pre><p>…an exception! In all fairness, I have no idea what that code would have done. So maybe the <code>STATUS_PRIVILEGED_INSTRUCTION</code> is just trying to protect us. Can we read from the register instead, and see its default value?<pre><code class=language-rust data-lang=rust>let x: u64;
unsafe {
    asm!("mov {}, dr7", out(reg) x);
}
assert_eq!(x, 5);
</code></pre><pre><code>>cargo +nightly run
...
error: process didn't exit successfully: `target\debug\memo.exe` (exit code: 0xc0000096, STATUS_PRIVILEGED_INSTRUCTION)
</code></pre><p>Nope. Okay, it seems directly reading from or writing to the debug register is a ring-0 thing. Surely there's a way around this. But first we should figure out how to enumerate and pause all the threads.<h2 id=pausing-all-the-threads>Pausing all the threads</h2><p>It seems there is no straightforward way to enumerate the threads. One has to <a href=https://stackoverflow.com/a/1206915/>create a "toolhelp"</a> and poll the entries. I won't bore you with the details. Let's add <code>tlhelp32</code> to the crate features of <code>winapi</code> and try it out:<pre><code class=language-rust data-lang=rust>
use std::{io, mem};
use winapi::shared::minwindef::{DWORD, FALSE};

#[derive(Debug)]
pub struct Toolhelp {
    handle: winapi::um::winnt::HANDLE,
}

impl Drop for Toolhelp {
    fn drop(&mut self) {
        // SAFETY: the handle is valid and owned by this wrapper.
        unsafe { winapi::um::handleapi::CloseHandle(self.handle) };
    }
}

pub fn enum_threads(pid: u32) -> io::Result&lt;Vec&lt;u32>> {
    const ENTRY_SIZE: u32 = mem::size_of::&lt;winapi::um::tlhelp32::THREADENTRY32>() as u32;

    // size_of(dwSize + cntUsage + th32ThreadID + th32OwnerProcessID)
    const NEEDED_ENTRY_SIZE: u32 = 4 * mem::size_of::&lt;DWORD>() as u32;

    // SAFETY: it is always safe to attempt to call this function.
    let handle = unsafe {
        winapi::um::tlhelp32::CreateToolhelp32Snapshot(winapi::um::tlhelp32::TH32CS_SNAPTHREAD, 0)
    };
    if handle == winapi::um::handleapi::INVALID_HANDLE_VALUE {
        return Err(io::Error::last_os_error());
    }
    // From here on, `Drop` closes the snapshot handle on every return path.
    let toolhelp = Toolhelp { handle };

    let mut result = Vec::new();
    let mut entry = winapi::um::tlhelp32::THREADENTRY32 {
        dwSize: ENTRY_SIZE,
        cntUsage: 0,
        th32ThreadID: 0,
        th32OwnerProcessID: 0,
        tpBasePri: 0,
        tpDeltaPri: 0,
        dwFlags: 0,
    };

    // SAFETY: we have a valid handle, and point to memory we own with the right size.
    if unsafe { winapi::um::tlhelp32::Thread32First(toolhelp.handle, &mut entry) } != FALSE {
        loop {
            // The snapshot covers every thread in the system, so filter by owner.
            if entry.dwSize >= NEEDED_ENTRY_SIZE && entry.th32OwnerProcessID == pid {
                result.push(entry.th32ThreadID);
            }

            entry.dwSize = ENTRY_SIZE;
            // SAFETY: we have a valid handle, and point to memory we own with the right size.
            if unsafe { winapi::um::tlhelp32::Thread32Next(toolhelp.handle, &mut entry) } == FALSE {
                break;
            }
        }
    }

    Ok(result)
}
</code></pre><p>Annoyingly, invalid handles returned by <a href=https://docs.microsoft.com/en-us/windows/win32/api/tlhelp32/nf-tlhelp32-createtoolhelp32snapshot><code>CreateToolhelp32Snapshot</code></a> are <code>INVALID_HANDLE_VALUE</code> (which is -1), not null. But that's not a big deal; we simply can't use <code>NonNull</code> here. The function ignores the process identifier when using <code>TH32CS_SNAPTHREAD</code>, which includes all threads in the system, so we need to compare the process identifier ourselves.<p>In summary, we create a "toolhelp" (wrapped in a helper <code>struct</code> so that whatever happens, <code>Drop</code> will clean it up), initialize a thread entry (with everything but the structure size set to zero) and call <code>Thread32First</code> the first time, <code>Thread32Next</code> subsequent times. It all seems to work fine!<pre><code class=language-rust data-lang=rust>dbg!(process::enum_threads(pid));
</code></pre><pre><code>[src\main.rs:46] process::enum_threads(pid) = Ok(
    [
        10560,
    ],
)
</code></pre><p>According to this, the Cheat Engine tutorial is only using one thread. Good to know. Much like processes, threads need to be opened before we can use them, with <a href=https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-openthread><code>OpenThread</code></a>:<pre><code class=language-rust data-lang=rust>pub struct Thread {
    tid: u32,
    handle: NonNull&lt;c_void>,
}

impl Thread {
    pub fn open(tid: u32) -> io::Result&lt;Self> {
        // SAFETY: the call doesn't have dangerous side-effects
        NonNull::new(unsafe {
            winapi::um::processthreadsapi::OpenThread(
                winapi::um::winnt::THREAD_SUSPEND_RESUME,
                FALSE,
                tid,
            )
        })
        .map(|handle| Self { tid, handle })
        .ok_or_else(io::Error::last_os_error)
    }

    pub fn tid(&self) -> u32 {
        self.tid
    }
}

impl Drop for Thread {
    fn drop(&mut self) {
        // SAFETY: the handle is valid and not used after this point.
        unsafe { winapi::um::handleapi::CloseHandle(self.handle.as_ptr()) };
    }
}
</code></pre><p>Just your usual RAII pattern. The thread is opened with permission to suspend and resume it. Let's try to pause the threads with <a href=https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-suspendthread><code>SuspendThread</code></a> to make sure that this thread is actually the one we're looking for:<pre><code class=language-rust data-lang=rust>pub fn suspend(&mut self) -> io::Result&lt;usize> {
    // SAFETY: the handle is valid.
    let ret = unsafe {
        winapi::um::processthreadsapi::SuspendThread(self.handle.as_ptr())
    };
    // The call signals failure by returning (DWORD) -1.
    if ret == -1i32 as u32 {
        Err(io::Error::last_os_error())
    } else {
        Ok(ret as usize)
    }
}

pub fn resume(&mut self) -> io::Result&lt;usize> {
    // SAFETY: the handle is valid.
    let ret = unsafe {
        winapi::um::processthreadsapi::ResumeThread(self.handle.as_ptr())
    };
    // The call signals failure by returning (DWORD) -1.
    if ret == -1i32 as u32 {
        Err(io::Error::last_os_error())
    } else {
        Ok(ret as usize)
    }
}
</code></pre><p>Both suspend and resume return the previous "suspend count". It's kind of like a barrier or semaphore where the thread only runs if the suspend count is zero. Trying it out:<pre><code class=language-rust data-lang=rust>let mut threads = thread::enum_threads(pid)
    .unwrap()
    .into_iter()
    .map(Thread::open)
    .collect::&lt;Result&lt;Vec<_>, _>>()
    .unwrap();

threads
    .iter_mut()
    .for_each(|thread| {
        println!("Pausing thread {} for 10 seconds…", thread.tid());
        thread.suspend().unwrap();

        std::thread::sleep(std::time::Duration::from_secs(10));

        println!("Wake up, {}!", thread.tid());
        thread.resume().unwrap();
    });
</code></pre><p>If you run this code with the process ID of the Cheat Engine tutorial, you will see that the tutorial window freezes for ten seconds! Because the main and only thread is paused, it cannot process any window events, so it becomes unresponsive. It is now "safe" to mess around with the thread context.<h2 id=setting-hardware-breakpoints>Setting hardware breakpoints</h2><p>I'm definitely not the first person to wonder <a href=https://social.msdn.microsoft.com/Forums/en-US/0cb3360d-3747-42a7-bc0e-668c5d9ee1ee/how-to-set-a-hardware-breakpoint>How to set a hardware breakpoint?</a>. This is great, because it means I don't need to ask that question myself. It appears we need to change the debug register <em>via the thread context</em>.<p>One has to be careful to use the right context structure. Confusingly enough, <a href=https://stackoverflow.com/q/17504174/><code>WOW64_CONTEXT</code></a> is 32 bits, not 64. <code>CONTEXT</code> alone seems to be the right one:<pre><code class=language-rust data-lang=rust>pub fn get_context(&self) -> io::Result&lt;winapi::um::winnt::CONTEXT> {
    let context = MaybeUninit::&lt;winapi::um::winnt::CONTEXT>::zeroed();
    // SAFETY: it's a C struct, and all-zero is a valid bit-pattern for the type.
    let mut context = unsafe { context.assume_init() };
    context.ContextFlags = winapi::um::winnt::CONTEXT_ALL;

    // SAFETY: the handle is valid and structure points to valid memory.
    if unsafe {
        winapi::um::processthreadsapi::GetThreadContext(self.handle.as_ptr(), &mut context)
    } == FALSE
    {
        Err(io::Error::last_os_error())
    } else {
        Ok(context)
    }
}
</code></pre><p>Trying it out:<pre><code class=language-rust data-lang=rust>thread.suspend().unwrap();

let context = thread.get_context().unwrap();
println!("Dr0: {:016x}", context.Dr0);
println!("Dr7: {:016x}", context.Dr7);
println!("Dr6: {:016x}", context.Dr6);
println!("Rax: {:016x}", context.Rax);
println!("Rbx: {:016x}", context.Rbx);
println!("Rcx: {:016x}", context.Rcx);
println!("Rip: {:016x}", context.Rip);
</code></pre><pre><code>Dr0: 0000000000000000
Dr7: 0000000000000000
Dr6: 0000000000000000
Rax: 0000000000001446
Rbx: 0000000000000000
Rcx: 0000000000000000
Rip: 00007ffda4259904
</code></pre><p>Looks about right! Hm, I wonder what happens if I use Cheat Engine to add the watchpoint on the memory location we care about?<pre><code>Dr0: 000000000157e650
Dr7: 00000000000d0001
</code></pre><p>Look at that! The debug registers changed! DR0 contains the location we want to watch for writes, and the debug control register DR7 changed. Cheat Engine sets the same values on all threads (for some reason I now see more than one thread printed for the tutorial, not sure what's up with that; maybe the single-thread is the weird one out).<p>Hmm, what happens if I watch for access instead of write?<pre><code>Dr0: 000000000157e650
Dr7: 00000000000f0001
</code></pre><p>What if I set both?<pre><code>Dr0: 000000000157e650
Dr7: 0000000000fd0005
</code></pre><p>Most intriguing! This was done by telling Cheat Engine to find "what writes" to the address, then "what accesses" the address. I wonder if the order matters?<pre><code>Dr0: 000000000157e650
Dr7: 0000000000df0005
</code></pre><p>"What accesses" and then "what writes" does change it. Very well! We're only concerned with a single breakpoint, so we won't worry about this, but it's good to know that we can inspect what Cheat Engine is doing. It's also interesting to see how Cheat Engine is using hardware breakpoints and not software breakpoints.<p>For simplicity, our code is going to assume that we're the only ones messing around with the debug registers, and that there will only be a single debug register in use. Make sure to add <code>THREAD_SET_CONTEXT</code> (next to the <code>THREAD_GET_CONTEXT</code> that <code>GetThreadContext</code> requires) to the permissions when opening the thread handle:<pre><code class=language-rust data-lang=rust>pub fn set_context(&self, context: &winapi::um::winnt::CONTEXT) -> io::Result<()> {
    // SAFETY: the handle is valid and structure points to valid memory.
    if unsafe {
        winapi::um::processthreadsapi::SetThreadContext(self.handle.as_ptr(), context)
    } == FALSE
    {
        Err(io::Error::last_os_error())
    } else {
        Ok(())
    }
}

pub fn watch_memory_write(&self, addr: usize) -> io::Result<()> {
    let mut context = self.get_context()?;
    // DR0 holds the address to watch.
    context.Dr0 = addr as u64;
    // The same value Cheat Engine used: enable DR0 and break on 4-byte writes.
    context.Dr7 = 0x00000000000d0001;
    self.set_context(&context)?;
    todo!()
}
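
// Sketch (not from the original code): where the magic value for DR7 comes from,
// assuming the bit layout given in the x86 debug register documentation.
// The names `RW_WRITE`, `RW_ACCESS`, `LEN_4` and `dr7_bits` are made up for this example.
pub const RW_WRITE: u64 = 0b01; // R/W bits: break on data writes
pub const RW_ACCESS: u64 = 0b11; // R/W bits: break on data reads or writes
pub const LEN_4: u64 = 0b11; // LEN bits: watch a 4-byte region

// Bits to OR into DR7 to enable the breakpoint held in debug register `slot` (0 to 3).
pub fn dr7_bits(slot: u32, rw: u64, len: u64) -> u64 {
    let local_enable = 1u64 << (slot * 2); // L0, L1, L2, L3 live in bits 0, 2, 4, 6
    let condition = ((len << 2) | rw) << (16 + slot * 4); // LENn and R/Wn start at bit 16
    local_enable | condition
}

// dr7_bits(0, RW_WRITE, LEN_4) yields 0xd0001 and dr7_bits(0, RW_ACCESS, LEN_4) yields 0xf0001,
// the two values Cheat Engine was seen writing. With two watchpoints,
// dr7_bits(0, RW_WRITE, LEN_4) | dr7_bits(1, RW_ACCESS, LEN_4) yields 0xfd0005.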
</code></pre><p>If we do this (and temporarily get rid of the <code>todo!()</code>), trying to change the value in the Cheat Engine tutorial will greet us with a warm message:<blockquote><p><strong>Tutorial-x86_64</strong><p>External exception 80000004.<p>Press OK to ignore and risk data corruption.<br> Press Abort to kill the program.<p><kbd>OK</kbd> <kbd>Abort</kbd></blockquote><p>There is no debugger attached yet that could possibly handle this exception, so the exception just propagates. Let's fix that.<h2 id=handling-debug-events>Handling debug events</h2><p>Now that we've succeeded in setting breakpoints, we can actually follow the steps described in <a href=https://docs.microsoft.com/en-us/windows/win32/debug/creating-a-basic-debugger>Creating a Basic Debugger</a>. It starts by saying that we should use <a href=https://docs.microsoft.com/en-us/windows/win32/api/debugapi/nf-debugapi-debugactiveprocess><code>DebugActiveProcess</code></a> to attach our process, the debugger, to the process we want to debug, the debuggee. This function lives under the <code>debugapi</code> header, so add it to <code>winapi</code> features:<pre><code class=language-rust data-lang=rust>pub struct DebugToken {
    pid: u32,
}

pub fn debug(pid: u32) -> io::Result&lt;DebugToken> {
    // SAFETY: the call doesn't have dangerous side-effects.
    if unsafe { winapi::um::debugapi::DebugActiveProcess(pid) } == FALSE {
        return Err(io::Error::last_os_error());
    };
    // Create the token first, so that `Drop` detaches us if the next call fails.
    let token = DebugToken { pid };
    if unsafe { winapi::um::winbase::DebugSetProcessKillOnExit(FALSE) } == FALSE {
        return Err(io::Error::last_os_error());
    };
    Ok(token)
}

impl Drop for DebugToken {
    fn drop(&mut self) {
        unsafe { winapi::um::debugapi::DebugActiveProcessStop(self.pid) };
    }
}
</code></pre><p>Once again, we create a wrapper <code>struct</code> with <code>Drop</code> to stop debugging the process once the token is dropped. The call to <code>DebugSetProcessKillOnExit</code> in our <code>debug</code> function ensures that, if our process (the debugger) dies, the process we're debugging (the debuggee) stays alive. We don't want to be restarting the entire Cheat Engine tutorial every time our Rust code crashes!<p>With the debugger attached, we can wait for debug events. We will put this method inside of <code>impl DebugToken</code>, so that the only way you can call it is if you successfully attached to another process:<pre><code class=language-rust data-lang=rust>impl DebugToken {
    pub fn wait_event(
        &self,
        timeout: Option&lt;Duration>,
    ) -> io::Result&lt;winapi::um::minwinbase::DEBUG_EVENT> {
        let mut result = MaybeUninit::uninit();
        // No timeout, or one that doesn't fit in a `u32`, means waiting forever.
        let timeout = timeout
            .and_then(|d| d.as_millis().try_into().ok())
            .unwrap_or(winapi::um::winbase::INFINITE);

        // SAFETY: can only wait for events with a token, so the debugger is active.
        if unsafe { winapi::um::debugapi::WaitForDebugEvent(result.as_mut_ptr(), timeout) } == FALSE
        {
            Err(io::Error::last_os_error())
        } else {
            // SAFETY: the call returned non-zero, so the structure is initialized.
            Ok(unsafe { result.assume_init() })
        }
    }
}
</code></pre><p><code>WaitForDebugEvent</code> wants a timeout in milliseconds, so our function lets the user pass the more Rusty <code>Duration</code> type. <code>None</code> will indicate "there is no timeout", i.e., it's infinite. If the duration is too large to fit in the <code>u32</code> (<code>try_into</code> fails), it will also be infinite.<p>If we attach the debugger, set the hardware watchpoint, and modify the memory location from the tutorial, an event with <code>dwDebugEventCode = 3</code> will be returned! Now, back to the page with the <a href=https://docs.microsoft.com/en-us/windows/win32/debug/debugging-events>Debugging Events</a>… Gah! It only has the names of the constants, not the values. Well, good thing <a href=https://docs.rs/>docs.rs</a> has a source view! We can just check the values in the <a href=https://docs.rs/winapi/0.3.9/src/winapi/um/minwinbase.rs.html#203-211>source code for <code>winapi</code></a>:<pre><code class=language-rust data-lang=rust>pub const EXCEPTION_DEBUG_EVENT: DWORD = 1;
pub const CREATE_THREAD_DEBUG_EVENT: DWORD = 2;
pub const CREATE_PROCESS_DEBUG_EVENT: DWORD = 3;
pub const EXIT_THREAD_DEBUG_EVENT: DWORD = 4;
pub const EXIT_PROCESS_DEBUG_EVENT: DWORD = 5;
pub const LOAD_DLL_DEBUG_EVENT: DWORD = 6;
pub const UNLOAD_DLL_DEBUG_EVENT: DWORD = 7;
pub const OUTPUT_DEBUG_STRING_EVENT: DWORD = 8;
pub const RIP_EVENT: DWORD = 9;
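
// Not part of winapi: a made-up helper, sketched here to turn a
// `dwDebugEventCode` into the names listed above while experimenting.
pub fn event_name(code: u32) -> &'static str {
    match code {
        1 => "EXCEPTION_DEBUG_EVENT",
        2 => "CREATE_THREAD_DEBUG_EVENT",
        3 => "CREATE_PROCESS_DEBUG_EVENT",
        4 => "EXIT_THREAD_DEBUG_EVENT",
        5 => "EXIT_PROCESS_DEBUG_EVENT",
        6 => "LOAD_DLL_DEBUG_EVENT",
        7 => "UNLOAD_DLL_DEBUG_EVENT",
        8 => "OUTPUT_DEBUG_STRING_EVENT",
        9 => "RIP_EVENT",
        _ => "unknown debug event",
    }
}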
</code></pre><p>So, we've got a <code>CREATE_PROCESS_DEBUG_EVENT</code>:<blockquote><p>Generated whenever a new process is created in a process being debugged or whenever the debugger begins debugging an already active process. The system generates this debugging event before the process begins to execute in user mode and before the system generates any other debugging events for the new process.</blockquote><p>It makes sense that this is our first event. By the way, if you were trying this out with a <code>sleep</code> lying around in your code, you may have noticed that the window froze until the debugger terminated. That's because:<blockquote><p>When the system notifies the debugger of a debugging event, it also suspends all threads in the affected process. The threads do not resume execution until the debugger continues the debugging event by using <a href=https://docs.microsoft.com/en-us/windows/win32/api/debugapi/nf-debugapi-continuedebugevent><code>ContinueDebugEvent</code></a>.</blockquote><p>Let's call <code>ContinueDebugEvent</code>, but also wait on more than one event, and see what happens:<pre><code class=language-rust data-lang=rust>for _ in 0..10 {
    let event = debugger.wait_event(None).unwrap();
    println!("Got {}", event.dwDebugEventCode);
    debugger.cont(event, true).unwrap();
}
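
// `cont` is not shown above. Presumably it wraps `ContinueDebugEvent`, whose last
// argument is a continue status. A sketch of how the boolean could map to that
// status (`continue_status` is a made-up name; the two values are the ones the
// Windows headers give for these constants):
const DBG_CONTINUE: u32 = 0x0001_0002; // the debugger handled the event
const DBG_EXCEPTION_NOT_HANDLED: u32 = 0x8001_0001; // leave the exception to the debuggee

fn continue_status(handled: bool) -> u32 {
    if handled {
        DBG_CONTINUE
    } else {
        DBG_EXCEPTION_NOT_HANDLED
    }
}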
</code></pre><pre><code>Got 3
Got 6
Got 6
Got 6
Got 6
Got 6
Got 6
Got 6
Got 6
Got 6
</code></pre><p>That's a lot of <code>LOAD_DLL_DEBUG_EVENT</code>. Pumping it up to one hundred and also showing the index we get the following:<pre><code>0. Got 3
1. Got 6
...
40. Got 6
41. Got 2
42. Got 1
43. Got 4
</code></pre><p>In order, we got:<ul><li>One <code>CREATE_PROCESS_DEBUG_EVENT</code>.<li>Forty <code>LOAD_DLL_DEBUG_EVENT</code>.<li>One <code>CREATE_THREAD_DEBUG_EVENT</code>.<li>One <code>EXCEPTION_DEBUG_EVENT</code>.<li>One <code>EXIT_THREAD_DEBUG_EVENT</code>.</ul><p>And if, after all this, you change the value in the Cheat Engine tutorial (thus triggering our watchpoint), we get <code>EXCEPTION_DEBUG_EVENT</code>!<blockquote><p>Generated whenever an exception occurs in the process being debugged. Possible exceptions include attempting to access inaccessible memory, executing breakpoint instructions, attempting to divide by zero, or any other exception noted in Structured Exception Handling.</blockquote><p>If we print out all the fields in the <a href=https://docs.microsoft.com/en-us/windows/win32/api/minwinbase/ns-minwinbase-exception_debug_info><code>EXCEPTION_DEBUG_INFO</code></a> structure:<pre><code>Watching writes to 10e3a0 for 10s
First chance: 1
ExceptionCode: 2147483652
ExceptionFlags: 0
ExceptionRecord: 0x0
ExceptionAddress: 0x10002c5ba
NumberParameters: 0
ExceptionInformation: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
</code></pre><p>The <code>ExceptionCode</code>, which is <code>0x80000004</code>, corresponds to <code>EXCEPTION_SINGLE_STEP</code>:<blockquote><p>A trace trap or other single-instruction mechanism signaled that one instruction has been executed.</blockquote><p>The <code>ExceptionAddress</code> is supposed to be "the address where the exception occurred". Very well! I have already completed this step of the tutorial, and I know the instruction is <code>mov [rax],edx</code> (or, as Cheat Engine shows, the bytes <code>89 10</code> in hexadecimal). The opcode for the <code>nop</code> instruction is <code>90</code> in hexadecimal, so if we replace two bytes at this address, we should be able to complete the tutorial.<p>Note that we also need to flush the instruction cache, as noted in the Windows documentation:<blockquote><p>Debuggers frequently read the memory of the process being debugged and write the memory that contains instructions to the instruction cache. After the instructions are written, the debugger calls the <a href=https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-flushinstructioncache><code>FlushInstructionCache</code></a> function to execute the cached instructions.</blockquote><p>So we add a new method to <code>impl Process</code>:<pre><code class=language-rust data-lang=rust>/// Flushes the instruction cache.
///
/// Should be called when writing to memory regions that contain code.
pub fn flush_instruction_cache(&self) -> io::Result<()> {
    // SAFETY: the call doesn't have dangerous side-effects.
    if unsafe {
        winapi::um::processthreadsapi::FlushInstructionCache(
            self.handle.as_ptr(),
            ptr::null(),
            0,
        )
    } == FALSE
    {
        Err(io::Error::last_os_error())
    } else {
        Ok(())
    }
}
</code></pre><p>And write some quick and dirty code to get this done:<pre><code class=language-rust data-lang=rust>let addr = ...;
println!("Watching writes to {:x} for 10s", addr);
threads.iter_mut().for_each(|thread| {
    thread.watch_memory_write(addr).unwrap();
});
loop {
    let event = debugger.wait_event(None).unwrap();
    // 1 is EXCEPTION_DEBUG_EVENT.
    if event.dwDebugEventCode == 1 {
        // SAFETY: the event code tells us this is the active union member.
        let exc = unsafe { event.u.Exception() };
        // 0x80000004 (2147483652 in decimal) is EXCEPTION_SINGLE_STEP.
        if exc.ExceptionRecord.ExceptionCode == 0x80000004 {
            let addr = exc.ExceptionRecord.ExceptionAddress as usize;
            match process.write_memory(addr, &[0x90, 0x90]) {
                Ok(_) => eprintln!("Patched [{:x}] with NOP", addr),
                Err(e) => eprintln!("Failed to patch [{:x}] with NOP: {}", addr, e),
            };
            process.flush_instruction_cache().unwrap();
            debugger.cont(event, true).unwrap();
            break;
        }
    }
    debugger.cont(event, true).unwrap();
}
</code></pre><p>Although it seems to work:<pre><code>Watching writes to 15103f0 for 10s
Patched [10002c5ba] with NOP
</code></pre><p>It really doesn't:<blockquote><p><strong>Tutorial-x86_64</strong><p>Access violation.<p>Press OK to ignore and risk data corruption.<br> Press Abort to kill the program.<p><kbd>OK</kbd> <kbd>Abort</kbd></blockquote><p>Did we write memory somewhere we shouldn't? The documentation does mention "segment-relative" and "linear virtual addresses":<blockquote><p><code>GetThreadSelectorEntry</code> returns the descriptor table entry for a specified selector and thread. Debuggers use the descriptor table entry to convert a segment-relative address to a linear virtual address. The <code>ReadProcessMemory</code> and <code>WriteProcessMemory</code> functions require linear virtual addresses.</blockquote><p>But nope! This isn't the problem. The problem is that the <code>ExceptionRecord.ExceptionAddress</code> points <em>after</em> the instruction that executed, so it's already 2 bytes beyond where it should be. We were accidentally overwriting the first half of the next instruction, which, yeah, could not end well.<p>So does it work if I do this instead?<pre><code class=language-rust data-lang=rust>process.write_memory(addr - 2, &[0x90, 0x90])
//                        ^^^ new
</code></pre><p>This totally does work. Step 5: complete 🎉<h2 id=properly-patching-instructions>Properly patching instructions</h2><p>You may not be satisfied at all with our solution. Not only are we hardcoding some magic constants to set hardware watchpoints, we're also relying on knowledge specific to the Cheat Engine tutorial (insofar as we're replacing two bytes' worth of instruction with NOPs).<p>Properly supporting more than one hardware breakpoint, along with supporting different types of breakpoints, is definitely doable. The meaning of the bits for the debug registers is well defined, and you can definitely study that to come up with <a href=https://github.com/mmorearty/hardware-breakpoints>something more sophisticated</a> and support multiple different breakpoints. But for now, that's out of the scope of this series. The tutorial only wants us to use an on-write watchpoint, and our solution is fine and portable for that use case.<p>However, relying on the size of the instructions is pretty bad. The instructions x86 executes are of variable length, so we can't simply scan backwards from an address to find where the previous instruction starts, or naively determine its length. Many unrelated byte sequences also happen to decode as valid instructions. We need a disassembler. No, we're not writing our own<sup class=footnote-reference><a href=#4>4</a></sup>.<p>Searching on <a href=https://crates.io>crates.io</a> for "disassembler" yields a few results, and the first one I've found is <a href=https://crates.io/crates/iced-x86>iced-x86</a>. I like the name, it has a decent number of GitHub stars, and it was last updated less than a month ago. I don't know about you, but I think we've just hit the jackpot!<p>It's quite heavy though, so I will add it behind a feature gate, and users that want it may opt into it:<pre><code class=language-toml data-lang=toml>[features]
patch-nops = ["iced-x86"]

[dependencies]
iced-x86 = { version = "1.10.3", optional = true }
</code></pre><p>You can make use of it with <code>cargo run --features=patch-nops</code>. I don't want to turn this blog post into a tutorial for <code>iced-x86</code>, but in essence, we need to make use of its <code>Decoder</code>. Here's the plan:<ol><li>Find the memory region corresponding to the address we want to patch.<li>Read the entire region.<li>Decode the read bytes until the instruction pointer reaches our address.<li>Because we just parsed the previous instruction, we know its length, and can replace it with NOPs.</ol><pre><code class=language-rust data-lang=rust>#[cfg(feature = "patch-nops")]
pub fn nop_last_instruction(&self, addr: usize) -> io::Result<()> {
    use iced_x86::{Decoder, DecoderOptions, Instruction};

    // Find the memory region that contains `addr`.
    let region = self
        .memory_regions()
        .into_iter()
        .find(|region| {
            let base = region.BaseAddress as usize;
            base <= addr && addr < base + region.RegionSize
        })
        .ok_or_else(|| io::Error::new(io::ErrorKind::Other, "no matching region found"))?;

    let bytes = self.read_memory(region.BaseAddress as usize, region.RegionSize)?;

    // Make the decoder's instruction pointer match the real addresses.
    let mut decoder = Decoder::new(64, &bytes, DecoderOptions::NONE);
    decoder.set_ip(region.BaseAddress as _);

    let mut instruction = Instruction::default();
    while decoder.can_decode() {
        decoder.decode_out(&mut instruction);
        // The instruction ending right at `addr` is the one to patch out.
        if instruction.next_ip() as usize == addr {
            return self
                .write_memory(instruction.ip() as usize, &vec![0x90; instruction.len()])
                .map(drop);
        }
    }

    Err(io::Error::new(
        io::ErrorKind::Other,
        "no matching instruction found",
    ))
}
</code></pre><p>Pretty straightforward! We can set the "instruction pointer" of the decoder so that it matches the address we're reading from. The <code>next_ip</code> method comes in really handy. Overall, it's a bit inefficient, because we could reuse the regions retrieved previously, but other than that, there is not much room for improvement.<p>With this, we are no longer hardcoding the instruction size or guessing which instruction is doing what. You may wonder, what if the region does not start with valid executable code? It is possible that the instructions are in some memory region with garbage except for a very specific location with real code. I don't know how Cheat Engine handles this, but I think it's reasonable to assume that the region starts with valid code.<p>As far as I can tell (after having asked around a bit), the encoding is usually self-synchronizing (similar to UTF-8), so eventually we should end up with correct instructions. But someone can still intentionally write real code between garbage data which we would then disassemble incorrectly. This is a problem on all variable-length ISAs. Half a solution is to <a href=https://stackoverflow.com/q/3983735/>start at the entry point</a>, decode all instructions, and follow the jumps. The other half would be correctly identifying jumps created just to trip a disassembler up, and jumps pointing to dynamically calculated addresses!<h2 id=finale>Finale</h2><p>That was quite a deep dive! We have learnt about the existence of the various breakpoint types (software, hardware, and even behaviour, such as watchpoints), how to debug a separate process, and how to correctly update the code another process is running on the fly. The <a href=https://github.com/lonami/memo>code for this post</a> is available over at my GitHub. 
You can run <code>git checkout step5</code> after cloning the repository to get the right version of the code.<p>Although we've only talked about <em>setting</em> breakpoints, there are of course <a href=https://reverseengineering.stackexchange.com/a/16547>ways of detecting them</a>. There are <a href=https://www.codeproject.com/Articles/30815/An-Anti-Reverse-Engineering-Guide>entire guides about it</a>. Again, we currently hardcode the fact we want to add a single watchpoint using the first debug register. A proper solution here would be to actually calculate the bits that need to be set, as well as keep track of how many breakpoints have been added so far.<p>Hardware breakpoints are also limited, since they're simply a bunch of registers, and our machine does not have infinite registers. How are other debuggers like <code>gdb</code> able to create a seemingly unlimited number of breakpoints? Well, the GDB wiki actually has a page on <a href=https://sourceware.org/gdb/wiki/Internals%20Watchpoints>Internals Watchpoints</a>, and it's really interesting! <code>gdb</code> essentially single-steps through the entire program and tests the expressions after every instruction:<blockquote><p>Software watchpoints are very slow, since GDB needs to single-step the program being debugged and test the value of the watched expression(s) after each instruction.</blockquote><p>However, that's not the only way. One could <a href=https://stackoverflow.com/a/7805842/>change the protection level</a> of the region of interest (for example, remove the write permission), and when the program tries to write there, the write will fail with an access violation that the debugger can catch! In any case, the GDB wiki is actually a pretty nice resource. 
It also has a section on <a href=https://sourceware.org/gdb/wiki/Internals/Breakpoint%20Handling>Breakpoint Handling</a>, which contains some additional insight.<p>With regard to code improvements, <code>DebugToken::wait_event</code> could definitely be both nicer and safer to use, with a custom <code>enum</code>, so the user does not need to rely on magic constants or resort to <code>unsafe</code> access to get the right <code>union</code> variant.<p>In the next post, we'll tackle the sixth step of the tutorial: Pointers. It reuses the debugging techniques presented here to backtrack where the pointer for our desired value is coming from, so there we will need to actually <em>understand</em> what the instructions are doing, not just patch them out!<h3 id=footnotes>Footnotes</h3><div class=footnote-definition id=1><sup class=footnote-definition-label>1</sup><p>I'm not super happy about the design of it all, but we won't actually need anything beyond scanning for integers for the rest of the steps so it doesn't really matter.</div><div class=footnote-definition id=2><sup class=footnote-definition-label>2</sup><p>There seems to be a way to pause the entire process in one go, with the <a href=https://stackoverflow.com/a/4062698/>undocumented <code>NtSuspendProcess</code></a> function!</div><div class=footnote-definition id=3><sup class=footnote-definition-label>3</sup><p>It really is called that. The naming went from "IP" (instruction pointer, 16 bits), to "EIP" (extended instruction pointer, 32 bits) and currently "RIP" (64 bits). The naming convention for upgraded registers is the same (RAX, RBX, RCX, and so on). The <a href=https://wiki.osdev.org/CPU_Registers_x86_64>OS Dev wiki</a> is a great resource for this kind of stuff.</div><div class=footnote-definition id=4><sup class=footnote-definition-label>4</sup><p>Well, we don't need an entire disassembler. 
Knowing the length of each instruction is enough, but that on its own is also a lot of work.</div></main><footer><div><p>Share your thoughts, or simply come hang with me <a href=https://t.me/LonamiWebs><img src=/img/telegram.svg alt=Telegram></a> <a href=mailto:totufals@hotmail.com><img src=/img/mail.svg alt=Mail></a></div></footer></article><p class=abyss>Glaze into the abyss… Oh hi there!