content/blog/woce-3.md (view raw)
1+++
2title = "Writing our own Cheat Engine: Unknown initial value"
3date = 2021-02-19
4updated = 2021-02-19
5[taxonomies]
6category = ["sw"]
7tags = ["windows", "rust", "hacking"]
8+++
9
10This is part 3 on the *Writing our own Cheat Engine* series:
11
12* [Part 1: Introduction](/blog/woce-1) (start here if you're new to the series!)
13* [Part 2: Exact Value scanning](/blog/woce-2)
14* Part 3: Unknown initial value
15
16In part 2 we left off with a bit of a cliff-hanger. Our little program is now able to scan for an exact value, remember the couple hundred addresses pointing to said value, and perform subsequent scans to narrow the list of addresses down until we're left with a handful of them.
17
18However, it is not always the case that you have an exact value to work with. The best you can do in these cases is guess what the software might be storing. For example, it could be a floating point for your current movement speed in a game, or an integer for your current health.
19
20The problem with this is that there are far too many possible locations storing our desired value. If you count misaligned locations, this means there is a different location to address every single byte in memory. A program with one megabyte of memory already has a *million* of addresses. Clearly, we need to do better than performing one million memory reads[^1].
21
22This post will shift focus a bit from using `winapi` to possible techniques to perform the various scans.
23
24## Unknown initial value
25
26<details open><summary>Cheat Engine Tutorial: Step 3</summary>
27
28> Ok, seeing that you've figured out how to find a value using exact value let's move on to the next step.
29>
30> First things first though. Since you are doing a new scan, you have to click on New Scan first, to start a new scan. (You may think this is straighforward, but you'd be surprised how many people get stuck on that step) I won't be explaining this step again, so keep this in mind
31> Now that you've started a new scan, let's continue
32>
33> In the previous test we knew the initial value so we could do a exact value, but now we have a status bar where we don't know the starting value.
34> We only know that the value is between 0 and 500. And each time you click 'hit me' you lose some health. The amount you lose each time is shown above the status bar.
35>
36> Again there are several different ways to find the value. (like doing a decreased value by... scan), but I'll only explain the easiest. "Unknown initial value", and decreased value.
37> Because you don't know the value it is right now, a exact value wont do any good, so choose as scantype 'Unknown initial value', again, the value type is 4-bytes. (most windows apps use 4-bytes)click first scan and wait till it's done.
38>
39> When it is done click 'hit me'. You'll lose some of your health. (the amount you lost shows for a few seconds and then disappears, but you don't need that)
40> Now go to Cheat Engine, and choose 'Decreased Value' and click 'Next Scan'
41> When that scan is done, click hit me again, and repeat the above till you only find a few.
42>
43> We know the value is between 0 and 500, so pick the one that is most likely the address we need, and add it to the list.
44> Now change the health to 5000, to proceed to the next step.
45
46</details>
47
48## Dense memory locations
49
50The key thing to notice here is that, when we read memory from another process, we do so over *entire regions*. A memory region is represented by a starting offset, a size, and a bunch of other things like protection level.
51
52When running the first scan for an unknown value, all we need to remember is the starting offset and size for every single region. All the candidate locations that could point to our value fall within this range, so it is enough for us to store the range definition, and not every location within it.
53
54To gain a better understanding of what this means, let's come up with a more specific scenario. With our current approach of doing things, we store an address (`usize`) for every location pointing to our desired value. In the case of unknown values, all locations are equally valid, since we don't know what value they should point to yet, and any value they point to is good. With this representation, we would end up with a very large vector:
55
56```rust
57let locations = vec![0x2000, 0x2001, ..., 0x20ff, 0x2100];
58```
59
60This representation is dense. Every single number in the range `0x2000..=0x2100` is present. So why bother storing the values individually when the range is enough?:
61
62```rust
63let locations = EntireRegion { range: 0x2000..=0x2100 };
64```
65
66Much better! With two `usize`, one for the starting location and another for the end, we can indicate that we care about all the locations falling in that range.
67
68In fact, some accessible memory regions immediately follow eachother, so we could even compact this further and merge regions which are together. But due to their potential differences with regards to protection levels, we will not attempt to merge regions.
69
70We don't want to get rid of the old way of storing locations, because once we start narrowing them down, we will want to go back to storing just a few candidates. To keep things tidy, let's introduce a new `enum` representing either possibility:
71
72```rust
73use std::ops::Range;
74
75pub enum CandidateLocations {
76 Discrete {
77 locations: Vec<usize>,
78 },
79 Dense {
80 range: Range<usize>,
81 }
82}
83```
84
85Let's also introduce another `enum` to perform the different scan types. For the time being, we will only worry about looking for `i32` in memory:
86
87```rust
88pub enum Scan {
89 Exact(i32),
90 Unknown,
91}
92```
93
94## Storing scanned values
95
96When scanning for exact values, it's not necessary to store the value found. We already know they're all the same, for example, value `42`. However, if the value is unknown, we do need to store it so that we can compare it in a subsequent scan to see if the value is the same or it changed. This means the value can be "any within" the read memory chunk:
97
98```rust
99pub enum Value {
100 Exact(i32),
101 AnyWithin(Vec<u8>),
102}
103```
104
105For every region in memory, there will be some candidate locations and a value (or value range) we need to compare against in subsequent scans:
106
107```rust
108pub struct Region {
109 pub info: winapi::um::winnt::MEMORY_BASIC_INFORMATION,
110 pub locations: CandidateLocations,
111 pub value: Value,
112}
113```
114
115With all the data structures needed setup, we can finally refactor our old scanning code into a new method capable of dealing with all these cases. For brevity, I will omit the exact scan, as it remains mostly unchanged:
116
117```rust
118use winapi::um::winnt::MEMORY_BASIC_INFORMATION;
119
120...
121
122// inside `impl Process`
123pub fn scan_regions(&self, regions: &[MEMORY_BASIC_INFORMATION], scan: Scan) -> Vec<Region> {
124 regions
125 .iter()
126 .flat_map(|region| match scan {
127 Scan::Exact(n) => todo!("old scan implementation"),
128 Scan::Unknown => {
129 let base = region.BaseAddress as usize;
130 match self.read_memory(region.BaseAddress as _, region.RegionSize) {
131 Ok(memory) => Some(Region {
132 info: region.clone(),
133 locations: CandidateLocations::Dense {
134 range: base..base + region.RegionSize,
135 },
136 value: Value::AnyWithin(memory),
137 }),
138 Err(_) => None,
139 }
140 }
141 })
142 .collect()
143}
144```
145
146Time to try it out!
147
148```rust
149impl CandidateLocations {
150 pub fn len(&self) -> usize {
151 match self {
152 CandidateLocations::Discrete { locations } => locations.len(),
153 CandidateLocations::Dense { range } => range.len(),
154 }
155 }
156}
157
158...
159
160fn main() {
161 // -snip-
162
163 println!("Scanning {} memory regions", regions.len());
164 let last_scan = process.scan_regions(®ions, Scan::Unknown);
165 println!(
166 "Found {} locations",
167 last_scan.iter().map(|r| r.locations.len()).sum::<usize>()
168 );
169}
170```
171
172```
173Scanning 88 memory regions
174Found 3014656 locations
175```
176
177If we consider misaligned locations, there is a lot of potential addresses where we could look for. Running the same scan on Cheat Engine yields `2,449,408` addresses, which is pretty close. It's probably skipping some additional regions that we are considering. Emulating Cheat Engine to perfection is not a concern for us at the moment, so I'm not going to investigate what regions it actually uses.
178
179## Comparing scanned values
180
181Now that we have performed the initial scan and have stored all the `CandidateLocations` and `Value`, we can re-implement the "next scan" step to handle any variant of our `Scan` enum. This enables us to mix-and-match any `Scan` mode in any order. For example, one could perform an exact scan, then one for decreased values, or start with unknown scan and scan for unchanged values.
182
183The tutorial suggests using "decreased value" scan, so let's start with that:
184
185```rust
186pub enum Scan {
187 Exact(i32),
188 Unknown,
189 Decreased, // new!
190}
191```
192
193Other scanning modes, such as decreased by a known amount rather than any decrease, increased, unchanged, changed and so on, are not very different from the "decreased" scan, so I won't bore you with the details.
194
195I will use a different method to perform a "rescan", since the first one is a bit more special in that it doesn't start with any previous values:
196
197```rust
198pub fn rescan_regions(&self, regions: &[Region], scan: Scan) -> Vec<Region> {
199 regions
200 .iter()
201 .flat_map(|region| match scan {
202 Scan::Decreased => {
203 let mut locations = Vec::new();
204 match region.locations {
205 CandidateLocations::Dense { range } => {
206 match self.read_memory(range.start, range.end - range.start) {
207 Ok(memory) => match region.value {
208 Value::AnyWithin(previous) => {
209 memory
210 .windows(4)
211 .zip(previous.windows(4))
212 .enumerate()
213 .step_by(4)
214 .for_each(|(offset, (new, old))| {
215 let new = i32::from_ne_bytes([
216 new[0], new[1], new[2], new[3],
217 ]);
218 let old = i32::from_ne_bytes([
219 old[0], old[1], old[2], old[3],
220 ]);
221 if new < old {
222 locations.push(range.start + offset);
223 }
224 });
225
226 Some(Region {
227 info: region.info.clone(),
228 locations: CandidateLocations::Discrete { locations },
229 value: Value::AnyWithin(memory),
230 })
231 }
232 _ => todo!(),
233 },
234 _ => todo!(),
235 }
236 }
237 _ => todo!(),
238 }
239 }
240 _ => todo!(),
241 })
242 .collect()
243}
244```
245
246If you've skimmed over that, I do not blame you. Here's the summary: for every existing region, when executing the scan mode "decreased", if the previous locations were dense, read the entire memory region. On success, if the previous values were a chunk of memory, iterate over the current and old memory at the same time, and for every aligned `i32`, if the new value is less, store it.
247
248It's also making me ill. Before I leave a mess on the floor, does it work?
249
250```rust
251std::thread::sleep(std::time::Duration::from_secs(10));
252let last_scan = process.rescan_regions(&last_scan, Scan::Decreased);
253println!(
254 "Found {} locations",
255 last_scan.iter().map(|r| r.locations.len()).sum::<usize>()
256);
257```
258
259```rust
260Found 3014656 locations
261Found 177 locations
262```
263
264Okay, great, let's clean up this mess…
265
266## Refactoring
267
268Does it also make you uncomfortable to be writing something that you know will end up *huge* unless you begin refactoring other parts right now? I definitely feel that way. But I think it's good discipline to push through with something that works first, even if it's nasty, before going on a tangent. Now that we have the basic implementation working, let's take on this monster before it eats us alive.
269
270First things first, that method is inside an `impl` block. The deepest nesting level is 13. I almost have to turn around my chair to read the entire thing out!
271
272Second, we're nesting four matches. Three of them we care about: scan, candidate location, and value. If each of these `enum` has `S`, `C` and `V` variants respectively, writing each of these by hand will require `S * C * V` different implementations! Cheat Engine offers 10 different scans, I can think of at least 3 different ways to store candidate locations, and another 3 ways to store the values found. That's `10 * 3 * 3 = 90` different combinations. I am not willing to write out all these[^2], so we need to start introducing some abstractions. Just imagine what a monster function you would end with! The horror!
273
274Third, why is the scan being executed in the process? This is something that should be done in the `impl Scan` instead!
275
276Let's begin the cleanup:
277
278```rust
279pub fn rescan_regions(&self, regions: &[Region], scan: Scan) -> Vec<Region> {
280 todo!()
281}
282```
283
284I already feel ten times better.
285
286Now, this method will unconditionally read the entire memory region, even if the scan or the previous candidate locations don't need it[^3]. In the worst case with a single discrete candidate location, we will be reading a very large chunk of memory when we could have read just the 4 bytes needed for the `i32`. On the bright side, if there *are* more locations in this memory region, we will get read of them at the same time[^4]. So even if we're moving more memory around all the time, it isn't *too* bad.
287
288```rust
289regions
290 .iter()
291 .flat_map(
292 |region| match self.read_memory(region.info.BaseAddress as _, region.info.RegionSize) {
293 Ok(memory) => todo!(),
294 Err(err) => {
295 eprintln!(
296 "Failed to read {} bytes at {:?}: {}",
297 region.info.RegionSize, region.info.BaseAddress, err,
298 );
299 None
300 }
301 },
302 )
303 .collect()
304```
305
306Great! If reading memory succeeds, we want to rerun the scan:
307
308```rust
309Ok(memory) => Some(scan.rerun(region, memory)),
310```
311
312The rerun will live inside `impl Scan`:
313
314```rust
315pub fn rerun(&self, region: &Region, memory: Vec<u8>) -> Region {
316 match self {
317 Scan::Exact(_) => self.run(region.info.clone(), memory),
318 Scan::Unknown => region.clone(),
319 Scan::Decreased => todo!(),
320 }
321}
322```
323
324An exact scan doesn't care about any previous values, so it behaves like a first scan. The first scan is done by the `run` function (it contains the implementation factored out of the `Process::scan_regions` method), which only needs the region information and the current memory chunk we just read.
325
326The unknown scan leaves the region unchanged: any value stored is still valid, because it is unknown what we're looking for.
327
328The decreased scan will have to iterate over all the candidate locations, and compare them with the current memory chunk. But this time, we'll abstract this iteration too:
329
330```rust
331impl Region {
332 fn iter_locations<'a>(
333 &'a self,
334 new_memory: &'a [u8],
335 ) -> impl Iterator<Item = (usize, i32, i32)> + 'a {
336 match &self.locations {
337 CandidateLocations::Dense { range } => range.clone().step_by(4).map(move |addr| {
338 let old = self.value_at(addr);
339 let new = i32::from_ne_bytes([
340 new_memory[0],
341 new_memory[1],
342 new_memory[2],
343 new_memory[3],
344 ]);
345 (addr, old, new)
346 }),
347 _ => todo!(),
348 }
349 }
350}
351```
352
353For a dense candidate location, we iterate over all the 4-aligned addresses (fast scan for `i32` values), and yield `(current address, old value, new value)`. This way, the `Scan` can do anything it wants with the old and new values, and if it finds a match, it can use the address.
354
355The `value_at` method will deal with all the `Value` variants:
356
357```rust
358fn value_at(&self, addr: usize) -> i32 {
359 match &self.value {
360 Value::AnyWithin(chunk) => {
361 let base = addr - self.info.BaseAddress as usize;
362 let bytes = &chunk[base..base + 4];
363 i32::from_ne_bytes([bytes[0], bytes[1], bytes[2], bytes[3]])
364 }
365 _ => todo!(),
366 }
367}
368```
369
370This way, `iter_locations` can easily use any value type. With this, we have all `enum` covered: `Scan` in `rerun`, `CandidateLocation` in `iter_locations`, and `Value` in `value_at`. Now we can add as many variants as we want, and we will only need to update a single `match` arm for each of them. Let's implement `Scan::Decreased` and try it out:
371
372```rust
373pub fn rerun(&self, region: &Region, memory: Vec<u8>) -> Region {
374 match self {
375 Scan::Decreased => Region {
376 info: region.info.clone(),
377 locations: CandidateLocations::Discrete {
378 locations: region
379 .iter_locations(&memory)
380 .flat_map(|(addr, old, new)| if new < old { Some(addr) } else { None })
381 .collect(),
382 },
383 value: Value::AnyWithin(memory),
384 },,
385 }
386}
387```
388
389```
390Found 3014656 locations
391Found 223791 locations
392```
393
394Hmm… before we went down from `3014656` to `177` locations, and now we went down to `223791`. Where did we go wrong?
395
396After spending several hours on this, I can tell you where we went wrong. `iter_locations` is always accessing the memory range `0..4`, and not the right address. Here's the fix:
397
398```rust
399CandidateLocations::Dense { range } => range.clone().step_by(4).map(move |addr| {
400 let old = self.value_at(addr);
401 let base = addr - self.info.BaseAddress as usize;
402 let bytes = &new_memory[base..base + 4];
403 let new = i32::from_ne_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
404 (addr, old, new)
405}),
406```
407
408## Going beyond
409
410Let's take a look at other possible `Scan` types. Cheat Engine supports the following initial scan types:
411
412* Exact Value
413* Bigger than…
414* Smaller than…
415* Value between…
416* Unknown initial value
417
418"Bigger than" and "Smaller than" can both be represented by "Value between", so it's pretty much just three.
419
420For subsequent scans, in addition to the scan types described above, we find:
421
422* Increased value
423* Increased value by…
424* Decreased value
425* Decreased value by…
426* Changed value
427* Unchanged value
428
429Not only does Cheat Engine provide all of these scans, but all of them can also be negated. For example, "find values that were not increased by 7". One could imagine to also support things like "increased value by range". For the increased and decreased scans, Cheat Engine also supports "at least xx%", so that if the value changed within the specified percentage interval, it will be considered.
430
431What about `CandidateLocations`? I can't tell you how Cheat Engine stores these, but I can tell you that `CandidateLocations::Discrete` can still be quite inefficient. Imagine you've started with a scan for unknown values and then ran a scan for unchanged valueus. Most values in memory will have been unchanged, but with our current implementation, we are now storing an entire `usize` address for each of these. One option would be to introduce `CandidateLocations::Sparse`, which would be a middle ground. You could implement it like `Dense` and include a vector of booleans telling you which values to consider, or go smaller and use a bitstring or bit vector. You could use a sparse vector data structure.
432
433`Value` is very much like `CandidateLocations`, except that it stores a value to compare against and not an address. Here we can either have an exact value, or an older copy of the memory. Again, keeping a copy of the entire memory chunk when all we need is a handful of values is inefficient. You could keep a mapping from addresses to values if you don't have too many. Or you could shrink and fragment the copied memory in a more optimal way. There's a lot of room for improvement!
434
435What if, despite all of the efforts above, we still don't have enough RAM to store all this information? The Cheat Engine Tutorial doesn't use a lot of memory, but as soon as you try scanning bigger programs, like games, you may find yourself needing several gigabytes worth of memory to remember all the found values in order to compare them in subsequent scans. You may even need to consider dumping all the regions to a file and read from it to run the comparisons. For example, running a scan for "unknown value" in Cheat Engine brings its memory up by the same amount of memory used by the process scanned (which makes sense), but as soon as I ran a scan for "unchanged value" over the misaligned values, Cheat Engine's disk usage skyrocketed to 1GB/s (!) for several seconds on my SSD. After it finished, memory usage went down to normal. It was very likely writing out all candidate locations to disk.
436
437## Finale
438
439There is a lot of things to learn from Cheat Engine just by observing its behaviour, and we're only scratching its surface.
440
441In the next post, we'll tackle the fourth step of the tutorial: Floating points. So far, we have only been working with `i32` for simplicity. We will need to update our code to be able to account for different data types, which will make it easy to support other types like `i16`, `i64`, or even strings, represented as an arbitrary sequence of bytes.
442
443As usual, you can [obtain the code for this post][code] over at my GitHub. You can run `git checkout step3` after cloning the repository to get the right version of the code. This version is a bit cleaner than the one presented in the blog, and contains some of the things described in the [Going beyond](#going-beyond) section. Until next time!
444
445### Footnotes
446
447[^1]: Well, technically, we will perform a million memory reads[^5]. The issue here is the million calls to `ReadProcessMemory`, not reading memory per se.
448
449[^2]: Not currently. After a basic implementation works, writing each implementation by hand and fine-tuning them by treating each of them as a special case could yield significant speed improvements. So although it would be a lot of work, this option shouldn't be ruled out completely.
450
451[^3]: You could ask the candidate locations where one should read, which would still keep the code reasonably simple.
452
453[^4]: You could also optimize for this case by determining both the smallest and largest address, and reading enough to cover them both. Or apply additional heuristics to only do so if the ratio of the size you're reading compared to the size you need isn't too large and abort the joint read otherwise. There is a lot of room for optimization here.
454
455[^5]: (A footnote in a footnote?) The machine registers, memory cache and compiler will all help lower this cost, so the generated executable might not actually need that many reads from RAM. But that's getting way too deep into the details now.
456
457[script-kid]: https://www.urbandictionary.com/define.php?term=script%20kiddie
458[code]: https://github.com/lonami/memo