content/blog/woce-3.md (view raw)
1+++
2title = "Writing our own Cheat Engine: Unknown initial value"
3date = 2021-02-19
4updated = 2021-02-19
5[taxonomies]
6category = ["sw"]
7tags = ["windows", "rust", "hacking"]
8+++
9
10This is part 3 on the *Writing our own Cheat Engine* series:
11
12* [Part 1: Introduction](/blog/woce-1) (start here if you're new to the series!)
13* [Part 2: Exact Value scanning](/blog/woce-2)
14* Part 3: Unknown initial value
15* [Part 4: Floating points](/blog/woce-4)
16
17In part 2 we left off with a bit of a cliff-hanger. Our little program is now able to scan for an exact value, remember the couple hundred addresses pointing to said value, and perform subsequent scans to narrow the list of addresses down until we're left with a handful of them.
18
19However, it is not always the case that you have an exact value to work with. The best you can do in these cases is guess what the software might be storing. For example, it could be a floating point for your current movement speed in a game, or an integer for your current health.
20
21The problem with this is that there are far too many possible locations storing our desired value. If you count misaligned locations, this means there is a different location to address every single byte in memory. A program with one megabyte of memory already has a *million* of addresses. Clearly, we need to do better than performing one million memory reads[^1].
22
23This post will shift focus a bit from using `winapi` to possible techniques to perform the various scans.
24
25## Unknown initial value
26
27<details open><summary>Cheat Engine Tutorial: Step 3</summary>
28
29> Ok, seeing that you've figured out how to find a value using exact value let's move on to the next step.
30>
31> First things first though. Since you are doing a new scan, you have to click on New Scan first, to start a new scan. (You may think this is straighforward, but you'd be surprised how many people get stuck on that step) I won't be explaining this step again, so keep this in mind
32> Now that you've started a new scan, let's continue
33>
34> In the previous test we knew the initial value so we could do a exact value, but now we have a status bar where we don't know the starting value.
35> We only know that the value is between 0 and 500. And each time you click 'hit me' you lose some health. The amount you lose each time is shown above the status bar.
36>
37> Again there are several different ways to find the value. (like doing a decreased value by... scan), but I'll only explain the easiest. "Unknown initial value", and decreased value.
38> Because you don't know the value it is right now, a exact value wont do any good, so choose as scantype 'Unknown initial value', again, the value type is 4-bytes. (most windows apps use 4-bytes)click first scan and wait till it's done.
39>
40> When it is done click 'hit me'. You'll lose some of your health. (the amount you lost shows for a few seconds and then disappears, but you don't need that)
41> Now go to Cheat Engine, and choose 'Decreased Value' and click 'Next Scan'
42> When that scan is done, click hit me again, and repeat the above till you only find a few.
43>
44> We know the value is between 0 and 500, so pick the one that is most likely the address we need, and add it to the list.
45> Now change the health to 5000, to proceed to the next step.
46
47</details>
48
49## Dense memory locations
50
51The key thing to notice here is that, when we read memory from another process, we do so over *entire regions*. A memory region is represented by a starting offset, a size, and a bunch of other things like protection level.
52
53When running the first scan for an unknown value, all we need to remember is the starting offset and size for every single region. All the candidate locations that could point to our value fall within this range, so it is enough for us to store the range definition, and not every location within it.
54
55To gain a better understanding of what this means, let's come up with a more specific scenario. With our current approach of doing things, we store an address (`usize`) for every location pointing to our desired value. In the case of unknown values, all locations are equally valid, since we don't know what value they should point to yet, and any value they point to is good. With this representation, we would end up with a very large vector:
56
57```rust
58let locations = vec![0x2000, 0x2001, ..., 0x20ff, 0x2100];
59```
60
61This representation is dense. Every single number in the range `0x2000..=0x2100` is present. So why bother storing the values individually when the range is enough?:
62
63```rust
64let locations = EntireRegion { range: 0x2000..=0x2100 };
65```
66
67Much better! With two `usize`, one for the starting location and another for the end, we can indicate that we care about all the locations falling in that range.
68
69In fact, some accessible memory regions immediately follow eachother, so we could even compact this further and merge regions which are together. But due to their potential differences with regards to protection levels, we will not attempt to merge regions.
70
71We don't want to get rid of the old way of storing locations, because once we start narrowing them down, we will want to go back to storing just a few candidates. To keep things tidy, let's introduce a new `enum` representing either possibility:
72
73```rust
74use std::ops::Range;
75
76pub enum CandidateLocations {
77 Discrete {
78 locations: Vec<usize>,
79 },
80 Dense {
81 range: Range<usize>,
82 }
83}
84```
85
86Let's also introduce another `enum` to perform the different scan types. For the time being, we will only worry about looking for `i32` in memory:
87
88```rust
89pub enum Scan {
90 Exact(i32),
91 Unknown,
92}
93```
94
95## Storing scanned values
96
97When scanning for exact values, it's not necessary to store the value found. We already know they're all the same, for example, value `42`. However, if the value is unknown, we do need to store it so that we can compare it in a subsequent scan to see if the value is the same or it changed. This means the value can be "any within" the read memory chunk:
98
99```rust
100pub enum Value {
101 Exact(i32),
102 AnyWithin(Vec<u8>),
103}
104```
105
106For every region in memory, there will be some candidate locations and a value (or value range) we need to compare against in subsequent scans:
107
108```rust
109pub struct Region {
110 pub info: winapi::um::winnt::MEMORY_BASIC_INFORMATION,
111 pub locations: CandidateLocations,
112 pub value: Value,
113}
114```
115
116With all the data structures needed setup, we can finally refactor our old scanning code into a new method capable of dealing with all these cases. For brevity, I will omit the exact scan, as it remains mostly unchanged:
117
118```rust
119use winapi::um::winnt::MEMORY_BASIC_INFORMATION;
120
121...
122
123// inside `impl Process`
124pub fn scan_regions(&self, regions: &[MEMORY_BASIC_INFORMATION], scan: Scan) -> Vec<Region> {
125 regions
126 .iter()
127 .flat_map(|region| match scan {
128 Scan::Exact(n) => todo!("old scan implementation"),
129 Scan::Unknown => {
130 let base = region.BaseAddress as usize;
131 match self.read_memory(region.BaseAddress as _, region.RegionSize) {
132 Ok(memory) => Some(Region {
133 info: region.clone(),
134 locations: CandidateLocations::Dense {
135 range: base..base + region.RegionSize,
136 },
137 value: Value::AnyWithin(memory),
138 }),
139 Err(_) => None,
140 }
141 }
142 })
143 .collect()
144}
145```
146
147Time to try it out!
148
149```rust
150impl CandidateLocations {
151 pub fn len(&self) -> usize {
152 match self {
153 CandidateLocations::Discrete { locations } => locations.len(),
154 CandidateLocations::Dense { range } => range.len(),
155 }
156 }
157}
158
159...
160
161fn main() {
162 // -snip-
163
164 println!("Scanning {} memory regions", regions.len());
165 let last_scan = process.scan_regions(®ions, Scan::Unknown);
166 println!(
167 "Found {} locations",
168 last_scan.iter().map(|r| r.locations.len()).sum::<usize>()
169 );
170}
171```
172
173```
174Scanning 88 memory regions
175Found 3014656 locations
176```
177
178If we consider misaligned locations, there is a lot of potential addresses where we could look for. Running the same scan on Cheat Engine yields `2,449,408` addresses, which is pretty close. It's probably skipping some additional regions that we are considering. Emulating Cheat Engine to perfection is not a concern for us at the moment, so I'm not going to investigate what regions it actually uses.
179
180## Comparing scanned values
181
182Now that we have performed the initial scan and have stored all the `CandidateLocations` and `Value`, we can re-implement the "next scan" step to handle any variant of our `Scan` enum. This enables us to mix-and-match any `Scan` mode in any order. For example, one could perform an exact scan, then one for decreased values, or start with unknown scan and scan for unchanged values.
183
184The tutorial suggests using "decreased value" scan, so let's start with that:
185
186```rust
187pub enum Scan {
188 Exact(i32),
189 Unknown,
190 Decreased, // new!
191}
192```
193
194Other scanning modes, such as decreased by a known amount rather than any decrease, increased, unchanged, changed and so on, are not very different from the "decreased" scan, so I won't bore you with the details.
195
196I will use a different method to perform a "rescan", since the first one is a bit more special in that it doesn't start with any previous values:
197
198```rust
199pub fn rescan_regions(&self, regions: &[Region], scan: Scan) -> Vec<Region> {
200 regions
201 .iter()
202 .flat_map(|region| match scan {
203 Scan::Decreased => {
204 let mut locations = Vec::new();
205 match region.locations {
206 CandidateLocations::Dense { range } => {
207 match self.read_memory(range.start, range.end - range.start) {
208 Ok(memory) => match region.value {
209 Value::AnyWithin(previous) => {
210 memory
211 .windows(4)
212 .zip(previous.windows(4))
213 .enumerate()
214 .step_by(4)
215 .for_each(|(offset, (new, old))| {
216 let new = i32::from_ne_bytes([
217 new[0], new[1], new[2], new[3],
218 ]);
219 let old = i32::from_ne_bytes([
220 old[0], old[1], old[2], old[3],
221 ]);
222 if new < old {
223 locations.push(range.start + offset);
224 }
225 });
226
227 Some(Region {
228 info: region.info.clone(),
229 locations: CandidateLocations::Discrete { locations },
230 value: Value::AnyWithin(memory),
231 })
232 }
233 _ => todo!(),
234 },
235 _ => todo!(),
236 }
237 }
238 _ => todo!(),
239 }
240 }
241 _ => todo!(),
242 })
243 .collect()
244}
245```
246
247If you've skimmed over that, I do not blame you. Here's the summary: for every existing region, when executing the scan mode "decreased", if the previous locations were dense, read the entire memory region. On success, if the previous values were a chunk of memory, iterate over the current and old memory at the same time, and for every aligned `i32`, if the new value is less, store it.
248
249It's also making me ill. Before I leave a mess on the floor, does it work?
250
251```rust
252std::thread::sleep(std::time::Duration::from_secs(10));
253let last_scan = process.rescan_regions(&last_scan, Scan::Decreased);
254println!(
255 "Found {} locations",
256 last_scan.iter().map(|r| r.locations.len()).sum::<usize>()
257);
258```
259
260```rust
261Found 3014656 locations
262Found 177 locations
263```
264
265Okay, great, let's clean up this mess…
266
267## Refactoring
268
269Does it also make you uncomfortable to be writing something that you know will end up *huge* unless you begin refactoring other parts right now? I definitely feel that way. But I think it's good discipline to push through with something that works first, even if it's nasty, before going on a tangent. Now that we have the basic implementation working, let's take on this monster before it eats us alive.
270
271First things first, that method is inside an `impl` block. The deepest nesting level is 13. I almost have to turn around my chair to read the entire thing out!
272
273Second, we're nesting four matches. Three of them we care about: scan, candidate location, and value. If each of these `enum` has `S`, `C` and `V` variants respectively, writing each of these by hand will require `S * C * V` different implementations! Cheat Engine offers 10 different scans, I can think of at least 3 different ways to store candidate locations, and another 3 ways to store the values found. That's `10 * 3 * 3 = 90` different combinations. I am not willing to write out all these[^2], so we need to start introducing some abstractions. Just imagine what a monster function you would end with! The horror!
274
275Third, why is the scan being executed in the process? This is something that should be done in the `impl Scan` instead!
276
277Let's begin the cleanup:
278
279```rust
280pub fn rescan_regions(&self, regions: &[Region], scan: Scan) -> Vec<Region> {
281 todo!()
282}
283```
284
285I already feel ten times better.
286
287Now, this method will unconditionally read the entire memory region, even if the scan or the previous candidate locations don't need it[^3]. In the worst case with a single discrete candidate location, we will be reading a very large chunk of memory when we could have read just the 4 bytes needed for the `i32`. On the bright side, if there *are* more locations in this memory region, we will get read of them at the same time[^4]. So even if we're moving more memory around all the time, it isn't *too* bad.
288
289```rust
290regions
291 .iter()
292 .flat_map(
293 |region| match self.read_memory(region.info.BaseAddress as _, region.info.RegionSize) {
294 Ok(memory) => todo!(),
295 Err(err) => {
296 eprintln!(
297 "Failed to read {} bytes at {:?}: {}",
298 region.info.RegionSize, region.info.BaseAddress, err,
299 );
300 None
301 }
302 },
303 )
304 .collect()
305```
306
307Great! If reading memory succeeds, we want to rerun the scan:
308
309```rust
310Ok(memory) => Some(scan.rerun(region, memory)),
311```
312
313The rerun will live inside `impl Scan`:
314
315```rust
316pub fn rerun(&self, region: &Region, memory: Vec<u8>) -> Region {
317 match self {
318 Scan::Exact(_) => self.run(region.info.clone(), memory),
319 Scan::Unknown => region.clone(),
320 Scan::Decreased => todo!(),
321 }
322}
323```
324
325An exact scan doesn't care about any previous values, so it behaves like a first scan. The first scan is done by the `run` function (it contains the implementation factored out of the `Process::scan_regions` method), which only needs the region information and the current memory chunk we just read.
326
327The unknown scan leaves the region unchanged: any value stored is still valid, because it is unknown what we're looking for.
328
329The decreased scan will have to iterate over all the candidate locations, and compare them with the current memory chunk. But this time, we'll abstract this iteration too:
330
331```rust
332impl Region {
333 fn iter_locations<'a>(
334 &'a self,
335 new_memory: &'a [u8],
336 ) -> impl Iterator<Item = (usize, i32, i32)> + 'a {
337 match &self.locations {
338 CandidateLocations::Dense { range } => range.clone().step_by(4).map(move |addr| {
339 let old = self.value_at(addr);
340 let new = i32::from_ne_bytes([
341 new_memory[0],
342 new_memory[1],
343 new_memory[2],
344 new_memory[3],
345 ]);
346 (addr, old, new)
347 }),
348 _ => todo!(),
349 }
350 }
351}
352```
353
354For a dense candidate location, we iterate over all the 4-aligned addresses (fast scan for `i32` values), and yield `(current address, old value, new value)`. This way, the `Scan` can do anything it wants with the old and new values, and if it finds a match, it can use the address.
355
356The `value_at` method will deal with all the `Value` variants:
357
358```rust
359fn value_at(&self, addr: usize) -> i32 {
360 match &self.value {
361 Value::AnyWithin(chunk) => {
362 let base = addr - self.info.BaseAddress as usize;
363 let bytes = &chunk[base..base + 4];
364 i32::from_ne_bytes([bytes[0], bytes[1], bytes[2], bytes[3]])
365 }
366 _ => todo!(),
367 }
368}
369```
370
371This way, `iter_locations` can easily use any value type. With this, we have all `enum` covered: `Scan` in `rerun`, `CandidateLocation` in `iter_locations`, and `Value` in `value_at`. Now we can add as many variants as we want, and we will only need to update a single `match` arm for each of them. Let's implement `Scan::Decreased` and try it out:
372
373```rust
374pub fn rerun(&self, region: &Region, memory: Vec<u8>) -> Region {
375 match self {
376 Scan::Decreased => Region {
377 info: region.info.clone(),
378 locations: CandidateLocations::Discrete {
379 locations: region
380 .iter_locations(&memory)
381 .flat_map(|(addr, old, new)| if new < old { Some(addr) } else { None })
382 .collect(),
383 },
384 value: Value::AnyWithin(memory),
385 },,
386 }
387}
388```
389
390```
391Found 3014656 locations
392Found 223791 locations
393```
394
395Hmm… before we went down from `3014656` to `177` locations, and now we went down to `223791`. Where did we go wrong?
396
397After spending several hours on this, I can tell you where we went wrong. `iter_locations` is always accessing the memory range `0..4`, and not the right address. Here's the fix:
398
399```rust
400CandidateLocations::Dense { range } => range.clone().step_by(4).map(move |addr| {
401 let old = self.value_at(addr);
402 let base = addr - self.info.BaseAddress as usize;
403 let bytes = &new_memory[base..base + 4];
404 let new = i32::from_ne_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
405 (addr, old, new)
406}),
407```
408
409## Going beyond
410
411Let's take a look at other possible `Scan` types. Cheat Engine supports the following initial scan types:
412
413* Exact Value
414* Bigger than…
415* Smaller than…
416* Value between…
417* Unknown initial value
418
419"Bigger than" and "Smaller than" can both be represented by "Value between", so it's pretty much just three.
420
421For subsequent scans, in addition to the scan types described above, we find:
422
423* Increased value
424* Increased value by…
425* Decreased value
426* Decreased value by…
427* Changed value
428* Unchanged value
429
430Not only does Cheat Engine provide all of these scans, but all of them can also be negated. For example, "find values that were not increased by 7". One could imagine to also support things like "increased value by range". For the increased and decreased scans, Cheat Engine also supports "at least xx%", so that if the value changed within the specified percentage interval, it will be considered.
431
432What about `CandidateLocations`? I can't tell you how Cheat Engine stores these, but I can tell you that `CandidateLocations::Discrete` can still be quite inefficient. Imagine you've started with a scan for unknown values and then ran a scan for unchanged valueus. Most values in memory will have been unchanged, but with our current implementation, we are now storing an entire `usize` address for each of these. One option would be to introduce `CandidateLocations::Sparse`, which would be a middle ground. You could implement it like `Dense` and include a vector of booleans telling you which values to consider, or go smaller and use a bitstring or bit vector. You could use a sparse vector data structure.
433
434`Value` is very much like `CandidateLocations`, except that it stores a value to compare against and not an address. Here we can either have an exact value, or an older copy of the memory. Again, keeping a copy of the entire memory chunk when all we need is a handful of values is inefficient. You could keep a mapping from addresses to values if you don't have too many. Or you could shrink and fragment the copied memory in a more optimal way. There's a lot of room for improvement!
435
436What if, despite all of the efforts above, we still don't have enough RAM to store all this information? The Cheat Engine Tutorial doesn't use a lot of memory, but as soon as you try scanning bigger programs, like games, you may find yourself needing several gigabytes worth of memory to remember all the found values in order to compare them in subsequent scans. You may even need to consider dumping all the regions to a file and read from it to run the comparisons. For example, running a scan for "unknown value" in Cheat Engine brings its memory up by the same amount of memory used by the process scanned (which makes sense), but as soon as I ran a scan for "unchanged value" over the misaligned values, Cheat Engine's disk usage skyrocketed to 1GB/s (!) for several seconds on my SSD. After it finished, memory usage went down to normal. It was very likely writing out all candidate locations to disk.
437
438## Finale
439
440There is a lot of things to learn from Cheat Engine just by observing its behaviour, and we're only scratching its surface.
441
442In the [next post](/blog/woce-4), we'll tackle the fourth step of the tutorial: Floating points. So far, we have only been working with `i32` for simplicity. We will need to update our code to be able to account for different data types, which will make it easy to support other types like `i16`, `i64`, or even strings, represented as an arbitrary sequence of bytes.
443
444As usual, you can [obtain the code for this post][code] over at my GitHub. You can run `git checkout step3` after cloning the repository to get the right version of the code. This version is a bit cleaner than the one presented in the blog, and contains some of the things described in the [Going beyond](#going-beyond) section. Until next time!
445
446### Footnotes
447
448[^1]: Well, technically, we will perform a million memory reads[^5]. The issue here is the million calls to `ReadProcessMemory`, not reading memory per se.
449
450[^2]: Not currently. After a basic implementation works, writing each implementation by hand and fine-tuning them by treating each of them as a special case could yield significant speed improvements. So although it would be a lot of work, this option shouldn't be ruled out completely.
451
452[^3]: You could ask the candidate locations where one should read, which would still keep the code reasonably simple.
453
454[^4]: You could also optimize for this case by determining both the smallest and largest address, and reading enough to cover them both. Or apply additional heuristics to only do so if the ratio of the size you're reading compared to the size you need isn't too large and abort the joint read otherwise. There is a lot of room for optimization here.
455
456[^5]: (A footnote in a footnote?) The machine registers, memory cache and compiler will all help lower this cost, so the generated executable might not actually need that many reads from RAM. But that's getting way too deep into the details now.
457
458[code]: https://github.com/lonami/memo