all repos — gemini-redirect @ 0a8895be079f671db2aedbd86026adb956160a91

content/blog/woce-3.md (view raw)

  1+++
  2title = "Writing our own Cheat Engine: Unknown initial value"
  3date = 2021-02-19
  4updated = 2021-02-19
  5[taxonomies]
  6category = ["sw"]
  7tags = ["windows", "rust", "hacking"]
  8+++
  9
 10This is part 3 on the *Writing our own Cheat Engine* series:
 11
 12* [Part 1: Introduction](/blog/woce-1) (start here if you're new to the series!)
 13* [Part 2: Exact Value scanning](/blog/woce-2)
 14* Part 3: Unknown initial value
 15* [Part 4: Floating points](/blog/woce-4)
 16* [Part 5: Code finder](/blog/woce-5)
 17
 18In part 2 we left off with a bit of a cliff-hanger. Our little program is now able to scan for an exact value, remember the couple hundred addresses pointing to said value, and perform subsequent scans to narrow the list of addresses down until we're left with a handful of them.
 19
 20However, it is not always the case that you have an exact value to work with. The best you can do in these cases is guess what the software might be storing. For example, it could be a floating point for your current movement speed in a game, or an integer for your current health.
 21
 22The problem with this is that there are far too many possible locations storing our desired value. If you count misaligned locations, this means there is a different location to address every single byte in memory. A program with one megabyte of memory already has a *million* of addresses. Clearly, we need to do better than performing one million memory reads[^1].
 23
 24This post will shift focus a bit from using `winapi` to possible techniques to perform the various scans.
 25
 26## Unknown initial value
 27
 28<details open><summary>Cheat Engine Tutorial: Step 3</summary>
 29
 30> Ok, seeing that you've figured out how to find a value using exact value let's move on to the next step.
 31>
 32> First things first though. Since you are doing a new scan, you have to click on New Scan first, to start a new scan. (You may think this is straighforward, but you'd be surprised how many people get stuck on that step) I won't be explaining this step again, so keep this in mind
 33> Now that you've started a new scan, let's continue
 34>
 35> In the previous test we knew the initial value so we could do a exact value, but now we have a status bar where we don't know the starting value.
 36> We only know that the value is between 0 and 500. And each time you click 'hit me' you lose some health. The amount you lose each time is shown above the status bar.
 37>
 38> Again there are several different ways to find the value. (like doing a decreased value by... scan), but I'll only explain the easiest. "Unknown initial value", and decreased value.
 39> Because you don't know the value it is right now, a exact value wont do any good, so choose as scantype 'Unknown initial value', again, the value type is 4-bytes. (most windows apps use 4-bytes)click first scan and wait till it's done.
 40>
 41> When it is done click 'hit me'. You'll lose some of your health. (the amount you lost shows for a few seconds and then disappears, but you don't need that)
 42> Now go to Cheat Engine, and choose 'Decreased Value' and click 'Next Scan'
 43> When that scan is done, click hit me again, and repeat the above till you only find a few.
 44>
 45> We know the value is between 0 and 500, so pick the one that is most likely the address we need, and add it to the list.
 46> Now change the health to 5000, to proceed to the next step.
 47
 48</details>
 49
 50## Dense memory locations
 51
 52The key thing to notice here is that, when we read memory from another process, we do so over *entire regions*. A memory region is represented by a starting offset, a size, and a bunch of other things like protection level.
 53
 54When running the first scan for an unknown value, all we need to remember is the starting offset and size for every single region. All the candidate locations that could point to our value fall within this range, so it is enough for us to store the range definition, and not every location within it.
 55
 56To gain a better understanding of what this means, let's come up with a more specific scenario. With our current approach of doing things, we store an address (`usize`) for every location pointing to our desired value. In the case of unknown values, all locations are equally valid, since we don't know what value they should point to yet, and any value they point to is good. With this representation, we would end up with a very large vector:
 57
 58```rust
 59let locations = vec![0x2000, 0x2001, ..., 0x20ff, 0x2100];
 60```
 61
 62This representation is dense. Every single number in the range `0x2000..=0x2100` is present. So why bother storing the values individually when the range is enough?:
 63
 64```rust
 65let locations = EntireRegion { range: 0x2000..=0x2100 };
 66```
 67
 68Much better! With two `usize`, one for the starting location and another for the end, we can indicate that we care about all the locations falling in that range.
 69
 70In fact, some accessible memory regions immediately follow eachother, so we could even compact this further and merge regions which are together. But due to their potential differences with regards to protection levels, we will not attempt to merge regions.
 71
 72We don't want to get rid of the old way of storing locations, because once we start narrowing them down, we will want to go back to storing just a few candidates. To keep things tidy, let's introduce a new `enum` representing either possibility:
 73
 74```rust
 75use std::ops::Range;
 76
 77pub enum CandidateLocations {
 78    Discrete {
 79        locations: Vec<usize>,
 80    },
 81    Dense {
 82        range: Range<usize>,
 83    }
 84}
 85```
 86
 87Let's also introduce another `enum` to perform the different scan types. For the time being, we will only worry about looking for `i32` in memory:
 88
 89```rust
 90pub enum Scan {
 91    Exact(i32),
 92    Unknown,
 93}
 94```
 95
 96## Storing scanned values
 97
 98When scanning for exact values, it's not necessary to store the value found. We already know they're all the same, for example, value `42`. However, if the value is unknown, we do need to store it so that we can compare it in a subsequent scan to see if the value is the same or it changed. This means the value can be "any within" the read memory chunk:
 99
100```rust
101pub enum Value {
102    Exact(i32),
103    AnyWithin(Vec<u8>),
104}
105```
106
107For every region in memory, there will be some candidate locations and a value (or value range) we need to compare against in subsequent scans:
108
109```rust
110pub struct Region {
111    pub info: winapi::um::winnt::MEMORY_BASIC_INFORMATION,
112    pub locations: CandidateLocations,
113    pub value: Value,
114}
115```
116
117With all the data structures needed setup, we can finally refactor our old scanning code into a new method capable of dealing with all these cases. For brevity, I will omit the exact scan, as it remains mostly unchanged:
118
119```rust
120use winapi::um::winnt::MEMORY_BASIC_INFORMATION;
121
122...
123
124// inside `impl Process`
125pub fn scan_regions(&self, regions: &[MEMORY_BASIC_INFORMATION], scan: Scan) -> Vec<Region> {
126    regions
127        .iter()
128        .flat_map(|region| match scan {
129            Scan::Exact(n) => todo!("old scan implementation"),
130            Scan::Unknown => {
131                let base = region.BaseAddress as usize;
132                match self.read_memory(region.BaseAddress as _, region.RegionSize) {
133                    Ok(memory) => Some(Region {
134                        info: region.clone(),
135                        locations: CandidateLocations::Dense {
136                            range: base..base + region.RegionSize,
137                        },
138                        value: Value::AnyWithin(memory),
139                    }),
140                    Err(_) => None,
141                }
142            }
143        })
144        .collect()
145}
146```
147
148Time to try it out!
149
150```rust
151impl CandidateLocations {
152    pub fn len(&self) -> usize {
153        match self {
154            CandidateLocations::Discrete { locations } => locations.len(),
155            CandidateLocations::Dense { range } => range.len(),
156        }
157    }
158}
159
160...
161
162fn main() {
163    // -snip-
164
165    println!("Scanning {} memory regions", regions.len());
166    let last_scan = process.scan_regions(&regions, Scan::Unknown);
167    println!(
168        "Found {} locations",
169        last_scan.iter().map(|r| r.locations.len()).sum::<usize>()
170    );
171}
172```
173
174```
175Scanning 88 memory regions
176Found 3014656 locations
177```
178
179If we consider misaligned locations, there is a lot of potential addresses where we could look for. Running the same scan on Cheat Engine yields `2,449,408` addresses, which is pretty close. It's probably skipping some additional regions that we are considering. Emulating Cheat Engine to perfection is not a concern for us at the moment, so I'm not going to investigate what regions it actually uses.
180
181## Comparing scanned values
182
183Now that we have performed the initial scan and have stored all the `CandidateLocations` and `Value`, we can re-implement the "next scan" step to handle any variant of our `Scan` enum. This enables us to mix-and-match any `Scan` mode in any order. For example, one could perform an exact scan, then one for decreased values, or start with unknown scan and scan for unchanged values.
184
185The tutorial suggests using "decreased value" scan, so let's start with that:
186
187```rust
188pub enum Scan {
189    Exact(i32),
190    Unknown,
191    Decreased, // new!
192}
193```
194
195Other scanning modes, such as decreased by a known amount rather than any decrease, increased, unchanged, changed and so on, are not very different from the "decreased" scan, so I won't bore you with the details.
196
197I will use a different method to perform a "rescan", since the first one is a bit more special in that it doesn't start with any previous values:
198
199```rust
200pub fn rescan_regions(&self, regions: &[Region], scan: Scan) -> Vec<Region> {
201    regions
202        .iter()
203        .flat_map(|region| match scan {
204            Scan::Decreased => {
205                let mut locations = Vec::new();
206                match region.locations {
207                    CandidateLocations::Dense { range } => {
208                        match self.read_memory(range.start, range.end - range.start) {
209                            Ok(memory) => match region.value {
210                                Value::AnyWithin(previous) => {
211                                    memory
212                                        .windows(4)
213                                        .zip(previous.windows(4))
214                                        .enumerate()
215                                        .step_by(4)
216                                        .for_each(|(offset, (new, old))| {
217                                            let new = i32::from_ne_bytes([
218                                                new[0], new[1], new[2], new[3],
219                                            ]);
220                                            let old = i32::from_ne_bytes([
221                                                old[0], old[1], old[2], old[3],
222                                            ]);
223                                            if new < old {
224                                                locations.push(range.start + offset);
225                                            }
226                                        });
227
228                                    Some(Region {
229                                        info: region.info.clone(),
230                                        locations: CandidateLocations::Discrete { locations },
231                                        value: Value::AnyWithin(memory),
232                                    })
233                                }
234                                _ => todo!(),
235                            },
236                            _ => todo!(),
237                        }
238                    }
239                    _ => todo!(),
240                }
241            }
242            _ => todo!(),
243        })
244        .collect()
245}
246```
247
248If you've skimmed over that, I do not blame you. Here's the summary: for every existing region, when executing the scan mode "decreased", if the previous locations were dense, read the entire memory region. On success, if the previous values were a chunk of memory, iterate over the current and old memory at the same time, and for every aligned `i32`, if the new value is less, store it.
249
250It's also making me ill. Before I leave a mess on the floor, does it work?
251
252```rust
253std::thread::sleep(std::time::Duration::from_secs(10));
254let last_scan = process.rescan_regions(&last_scan, Scan::Decreased);
255println!(
256    "Found {} locations",
257    last_scan.iter().map(|r| r.locations.len()).sum::<usize>()
258);
259```
260
261```rust
262Found 3014656 locations
263Found 177 locations
264```
265
266Okay, great, let's clean up this mess…
267
268## Refactoring
269
270Does it also make you uncomfortable to be writing something that you know will end up *huge* unless you begin refactoring other parts right now? I definitely feel that way. But I think it's good discipline to push through with something that works first, even if it's nasty, before going on a tangent. Now that we have the basic implementation working, let's take on this monster before it eats us alive.
271
272First things first, that method is inside an `impl` block. The deepest nesting level is 13. I almost have to turn around my chair to read the entire thing out!
273
274Second, we're nesting four matches. Three of them we care about: scan, candidate location, and value. If each of these `enum` has `S`, `C` and `V` variants respectively, writing each of these by hand will require `S * C * V` different implementations! Cheat Engine offers 10 different scans, I can think of at least 3 different ways to store candidate locations, and another 3 ways to store the values found. That's `10 * 3 * 3 = 90` different combinations. I am not willing to write out all these[^2], so we need to start introducing some abstractions. Just imagine what a monster function you would end with! The horror!
275
276Third, why is the scan being executed in the process? This is something that should be done in the `impl Scan` instead!
277
278Let's begin the cleanup:
279
280```rust
281pub fn rescan_regions(&self, regions: &[Region], scan: Scan) -> Vec<Region> {
282    todo!()
283}
284```
285
286I already feel ten times better.
287
288Now, this method will unconditionally read the entire memory region, even if the scan or the previous candidate locations don't need it[^3]. In the worst case with a single discrete candidate location, we will be reading a very large chunk of memory when we could have read just the 4 bytes needed for the `i32`. On the bright side, if there *are* more locations in this memory region, we will get read of them at the same time[^4]. So even if we're moving more memory around all the time, it isn't *too* bad.
289
290```rust
291regions
292    .iter()
293    .flat_map(
294        |region| match self.read_memory(region.info.BaseAddress as _, region.info.RegionSize) {
295            Ok(memory) => todo!(),
296            Err(err) => {
297                eprintln!(
298                    "Failed to read {} bytes at {:?}: {}",
299                    region.info.RegionSize, region.info.BaseAddress, err,
300                );
301                None
302            }
303        },
304    )
305    .collect()
306```
307
308Great! If reading memory succeeds, we want to rerun the scan:
309
310```rust
311Ok(memory) => Some(scan.rerun(region, memory)),
312```
313
314The rerun will live inside `impl Scan`:
315
316```rust
317pub fn rerun(&self, region: &Region, memory: Vec<u8>) -> Region {
318    match self {
319        Scan::Exact(_) => self.run(region.info.clone(), memory),
320        Scan::Unknown => region.clone(),
321        Scan::Decreased => todo!(),
322    }
323}
324```
325
326An exact scan doesn't care about any previous values, so it behaves like a first scan. The first scan is done by the `run` function (it contains the implementation factored out of the `Process::scan_regions` method), which only needs the region information and the current memory chunk we just read.
327
328The unknown scan leaves the region unchanged: any value stored is still valid, because it is unknown what we're looking for.
329
330The decreased scan will have to iterate over all the candidate locations, and compare them with the current memory chunk. But this time, we'll abstract this iteration too:
331
332```rust
333impl Region {
334    fn iter_locations<'a>(
335        &'a self,
336        new_memory: &'a [u8],
337    ) -> impl Iterator<Item = (usize, i32, i32)> + 'a {
338        match &self.locations {
339            CandidateLocations::Dense { range } => range.clone().step_by(4).map(move |addr| {
340                let old = self.value_at(addr);
341                let new = i32::from_ne_bytes([
342                    new_memory[0],
343                    new_memory[1],
344                    new_memory[2],
345                    new_memory[3],
346                ]);
347                (addr, old, new)
348            }),
349            _ => todo!(),
350        }
351    }
352}
353```
354
355For a dense candidate location, we iterate over all the 4-aligned addresses (fast scan for `i32` values), and yield `(current address, old value, new value)`. This way, the `Scan` can do anything it wants with the old and new values, and if it finds a match, it can use the address.
356
357The `value_at` method will deal with all the `Value` variants:
358
359```rust
360fn value_at(&self, addr: usize) -> i32 {
361    match &self.value {
362        Value::AnyWithin(chunk) => {
363            let base = addr - self.info.BaseAddress as usize;
364            let bytes = &chunk[base..base + 4];
365            i32::from_ne_bytes([bytes[0], bytes[1], bytes[2], bytes[3]])
366        }
367        _ => todo!(),
368    }
369}
370```
371
372This way, `iter_locations` can easily use any value type. With this, we have all `enum` covered: `Scan` in `rerun`, `CandidateLocation` in `iter_locations`, and `Value` in `value_at`. Now we can add as many variants as we want, and we will only need to update a single `match` arm for each of them. Let's implement `Scan::Decreased` and try it out:
373
374```rust
375pub fn rerun(&self, region: &Region, memory: Vec<u8>) -> Region {
376    match self {
377        Scan::Decreased => Region {
378            info: region.info.clone(),
379            locations: CandidateLocations::Discrete {
380                locations: region
381                    .iter_locations(&memory)
382                    .flat_map(|(addr, old, new)| if new < old { Some(addr) } else { None })
383                    .collect(),
384            },
385            value: Value::AnyWithin(memory),
386        },,
387    }
388}
389```
390
391```
392Found 3014656 locations
393Found 223791 locations
394```
395
396Hmm… before we went down from `3014656` to `177` locations, and now we went down to `223791`. Where did we go wrong?
397
398After spending several hours on this, I can tell you where we went wrong. `iter_locations` is always accessing the memory range `0..4`, and not the right address. Here's the fix:
399
400```rust
401CandidateLocations::Dense { range } => range.clone().step_by(4).map(move |addr| {
402    let old = self.value_at(addr);
403    let base = addr - self.info.BaseAddress as usize;
404    let bytes = &new_memory[base..base + 4];
405    let new = i32::from_ne_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
406    (addr, old, new)
407}),
408```
409
410## Going beyond
411
412Let's take a look at other possible `Scan` types. Cheat Engine supports the following initial scan types:
413
414* Exact Value
415* Bigger than…
416* Smaller than…
417* Value between…
418* Unknown initial value
419
420"Bigger than" and "Smaller than" can both be represented by "Value between", so it's pretty much just three.
421
422For subsequent scans, in addition to the scan types described above, we find:
423
424* Increased value
425* Increased value by…
426* Decreased value
427* Decreased value by…
428* Changed value
429* Unchanged value
430
431Not only does Cheat Engine provide all of these scans, but all of them can also be negated. For example, "find values that were not increased by 7". One could imagine to also support things like "increased value by range". For the increased and decreased scans, Cheat Engine also supports "at least xx%", so that if the value changed within the specified percentage interval, it will be considered.
432
433What about `CandidateLocations`? I can't tell you how Cheat Engine stores these, but I can tell you that `CandidateLocations::Discrete` can still be quite inefficient. Imagine you've started with a scan for unknown values and then ran a scan for unchanged valueus. Most values in memory will have been unchanged, but with our current implementation, we are now storing an entire `usize` address for each of these. One option would be to introduce `CandidateLocations::Sparse`, which would be a middle ground. You could implement it like `Dense` and include a vector of booleans telling you which values to consider, or go smaller and use a bitstring or bit vector. You could use a sparse vector data structure.
434
435`Value` is very much like `CandidateLocations`, except that it stores a value to compare against and not an address. Here we can either have an exact value, or an older copy of the memory. Again, keeping a copy of the entire memory chunk when all we need is a handful of values is inefficient. You could keep a mapping from addresses to values if you don't have too many. Or you could shrink and fragment the copied memory in a more optimal way. There's a lot of room for improvement!
436
437What if, despite all of the efforts above, we still don't have enough RAM to store all this information? The Cheat Engine Tutorial doesn't use a lot of memory, but as soon as you try scanning bigger programs, like games, you may find yourself needing several gigabytes worth of memory to remember all the found values in order to compare them in subsequent scans. You may even need to consider dumping all the regions to a file and read from it to run the comparisons. For example, running a scan for "unknown value" in Cheat Engine brings its memory up by the same amount of memory used by the process scanned (which makes sense), but as soon as I ran a scan for "unchanged value" over the misaligned values, Cheat Engine's disk usage skyrocketed to 1GB/s (!) for several seconds on my SSD. After it finished, memory usage went down to normal. It was very likely writing out all candidate locations to disk.
438
439## Finale
440
441There is a lot of things to learn from Cheat Engine just by observing its behaviour, and we're only scratching its surface.
442
443In the [next post](/blog/woce-4), we'll tackle the fourth step of the tutorial: Floating points. So far, we have only been working with `i32` for simplicity. We will need to update our code to be able to account for different data types, which will make it easy to support other types like `i16`, `i64`, or even strings, represented as an arbitrary sequence of bytes.
444
445As usual, you can [obtain the code for this post][code] over at my GitHub. You can run `git checkout step3` after cloning the repository to get the right version of the code. This version is a bit cleaner than the one presented in the blog, and contains some of the things described in the [Going beyond](#going-beyond) section. Until next time!
446
447### Footnotes
448
449[^1]: Well, technically, we will perform a million memory reads[^5]. The issue here is the million calls to `ReadProcessMemory`, not reading memory per se.
450
451[^2]: Not currently. After a basic implementation works, writing each implementation by hand and fine-tuning them by treating each of them as a special case could yield significant speed improvements. So although it would be a lot of work, this option shouldn't be ruled out completely.
452
453[^3]: You could ask the candidate locations where one should read, which would still keep the code reasonably simple.
454
455[^4]: You could also optimize for this case by determining both the smallest and largest address, and reading enough to cover them both. Or apply additional heuristics to only do so if the ratio of the size you're reading compared to the size you need isn't too large and abort the joint read otherwise. There is a lot of room for optimization here.
456
457[^5]: (A footnote in a footnote?) The machine registers, memory cache and compiler will all help lower this cost, so the generated executable might not actually need that many reads from RAM. But that's getting way too deep into the details now.
458
459[code]: https://github.com/lonami/memo