content/blog/woce-3.md (view raw)
1+++
2title = "Writing our own Cheat Engine: Unknown initial value"
3date = 2021-02-19
4updated = 2021-02-19
5[taxonomies]
6category = ["sw"]
7tags = ["windows", "rust", "hacking"]
8+++
9
10This is part 3 on the *Writing our own Cheat Engine* series:
11
12* [Part 1: Introduction](/blog/woce-1) (start here if you're new to the series!)
13* [Part 2: Exact Value scanning](/blog/woce-2)
14* Part 3: Unknown initial value
15* [Part 4: Floating points](/blog/woce-4)
16* [Part 5: Code finder](/blog/woce-5)
17* [Part 6: Pointers](/blog/woce-6)
18
19In part 2 we left off with a bit of a cliff-hanger. Our little program is now able to scan for an exact value, remember the couple hundred addresses pointing to said value, and perform subsequent scans to narrow the list of addresses down until we're left with a handful of them.
20
21However, it is not always the case that you have an exact value to work with. The best you can do in these cases is guess what the software might be storing. For example, it could be a floating point for your current movement speed in a game, or an integer for your current health.
22
23The problem with this is that there are far too many possible locations storing our desired value. If you count misaligned locations, this means there is a different location to address every single byte in memory. A program with one megabyte of memory already has a *million* of addresses. Clearly, we need to do better than performing one million memory reads[^1].
24
25This post will shift focus a bit from using `winapi` to possible techniques to perform the various scans.
26
27## Unknown initial value
28
29<details open><summary>Cheat Engine Tutorial: Step 3</summary>
30
31> Ok, seeing that you've figured out how to find a value using exact value let's move on to the next step.
32>
33> First things first though. Since you are doing a new scan, you have to click on New Scan first, to start a new scan. (You may think this is straighforward, but you'd be surprised how many people get stuck on that step) I won't be explaining this step again, so keep this in mind
34> Now that you've started a new scan, let's continue
35>
36> In the previous test we knew the initial value so we could do a exact value, but now we have a status bar where we don't know the starting value.
37> We only know that the value is between 0 and 500. And each time you click 'hit me' you lose some health. The amount you lose each time is shown above the status bar.
38>
39> Again there are several different ways to find the value. (like doing a decreased value by... scan), but I'll only explain the easiest. "Unknown initial value", and decreased value.
40> Because you don't know the value it is right now, a exact value wont do any good, so choose as scantype 'Unknown initial value', again, the value type is 4-bytes. (most windows apps use 4-bytes)click first scan and wait till it's done.
41>
42> When it is done click 'hit me'. You'll lose some of your health. (the amount you lost shows for a few seconds and then disappears, but you don't need that)
43> Now go to Cheat Engine, and choose 'Decreased Value' and click 'Next Scan'
44> When that scan is done, click hit me again, and repeat the above till you only find a few.
45>
46> We know the value is between 0 and 500, so pick the one that is most likely the address we need, and add it to the list.
47> Now change the health to 5000, to proceed to the next step.
48
49</details>
50
51## Dense memory locations
52
53The key thing to notice here is that, when we read memory from another process, we do so over *entire regions*. A memory region is represented by a starting offset, a size, and a bunch of other things like protection level.
54
55When running the first scan for an unknown value, all we need to remember is the starting offset and size for every single region. All the candidate locations that could point to our value fall within this range, so it is enough for us to store the range definition, and not every location within it.
56
57To gain a better understanding of what this means, let's come up with a more specific scenario. With our current approach of doing things, we store an address (`usize`) for every location pointing to our desired value. In the case of unknown values, all locations are equally valid, since we don't know what value they should point to yet, and any value they point to is good. With this representation, we would end up with a very large vector:
58
59```rust
60let locations = vec![0x2000, 0x2001, ..., 0x20ff, 0x2100];
61```
62
63This representation is dense. Every single number in the range `0x2000..=0x2100` is present. So why bother storing the values individually when the range is enough?:
64
65```rust
66let locations = EntireRegion { range: 0x2000..=0x2100 };
67```
68
69Much better! With two `usize`, one for the starting location and another for the end, we can indicate that we care about all the locations falling in that range.
70
71In fact, some accessible memory regions immediately follow eachother, so we could even compact this further and merge regions which are together. But due to their potential differences with regards to protection levels, we will not attempt to merge regions.
72
73We don't want to get rid of the old way of storing locations, because once we start narrowing them down, we will want to go back to storing just a few candidates. To keep things tidy, let's introduce a new `enum` representing either possibility:
74
75```rust
76use std::ops::Range;
77
78pub enum CandidateLocations {
79 Discrete {
80 locations: Vec<usize>,
81 },
82 Dense {
83 range: Range<usize>,
84 }
85}
86```
87
88Let's also introduce another `enum` to perform the different scan types. For the time being, we will only worry about looking for `i32` in memory:
89
90```rust
91pub enum Scan {
92 Exact(i32),
93 Unknown,
94}
95```
96
97## Storing scanned values
98
99When scanning for exact values, it's not necessary to store the value found. We already know they're all the same, for example, value `42`. However, if the value is unknown, we do need to store it so that we can compare it in a subsequent scan to see if the value is the same or it changed. This means the value can be "any within" the read memory chunk:
100
101```rust
102pub enum Value {
103 Exact(i32),
104 AnyWithin(Vec<u8>),
105}
106```
107
108For every region in memory, there will be some candidate locations and a value (or value range) we need to compare against in subsequent scans:
109
110```rust
111pub struct Region {
112 pub info: winapi::um::winnt::MEMORY_BASIC_INFORMATION,
113 pub locations: CandidateLocations,
114 pub value: Value,
115}
116```
117
118With all the data structures needed setup, we can finally refactor our old scanning code into a new method capable of dealing with all these cases. For brevity, I will omit the exact scan, as it remains mostly unchanged:
119
120```rust
121use winapi::um::winnt::MEMORY_BASIC_INFORMATION;
122
123...
124
125// inside `impl Process`
126pub fn scan_regions(&self, regions: &[MEMORY_BASIC_INFORMATION], scan: Scan) -> Vec<Region> {
127 regions
128 .iter()
129 .flat_map(|region| match scan {
130 Scan::Exact(n) => todo!("old scan implementation"),
131 Scan::Unknown => {
132 let base = region.BaseAddress as usize;
133 match self.read_memory(region.BaseAddress as _, region.RegionSize) {
134 Ok(memory) => Some(Region {
135 info: region.clone(),
136 locations: CandidateLocations::Dense {
137 range: base..base + region.RegionSize,
138 },
139 value: Value::AnyWithin(memory),
140 }),
141 Err(_) => None,
142 }
143 }
144 })
145 .collect()
146}
147```
148
149Time to try it out!
150
151```rust
152impl CandidateLocations {
153 pub fn len(&self) -> usize {
154 match self {
155 CandidateLocations::Discrete { locations } => locations.len(),
156 CandidateLocations::Dense { range } => range.len(),
157 }
158 }
159}
160
161...
162
163fn main() {
164 // -snip-
165
166 println!("Scanning {} memory regions", regions.len());
167 let last_scan = process.scan_regions(®ions, Scan::Unknown);
168 println!(
169 "Found {} locations",
170 last_scan.iter().map(|r| r.locations.len()).sum::<usize>()
171 );
172}
173```
174
175```
176Scanning 88 memory regions
177Found 3014656 locations
178```
179
180If we consider misaligned locations, there is a lot of potential addresses where we could look for. Running the same scan on Cheat Engine yields `2,449,408` addresses, which is pretty close. It's probably skipping some additional regions that we are considering. Emulating Cheat Engine to perfection is not a concern for us at the moment, so I'm not going to investigate what regions it actually uses.
181
182## Comparing scanned values
183
184Now that we have performed the initial scan and have stored all the `CandidateLocations` and `Value`, we can re-implement the "next scan" step to handle any variant of our `Scan` enum. This enables us to mix-and-match any `Scan` mode in any order. For example, one could perform an exact scan, then one for decreased values, or start with unknown scan and scan for unchanged values.
185
186The tutorial suggests using "decreased value" scan, so let's start with that:
187
188```rust
189pub enum Scan {
190 Exact(i32),
191 Unknown,
192 Decreased, // new!
193}
194```
195
196Other scanning modes, such as decreased by a known amount rather than any decrease, increased, unchanged, changed and so on, are not very different from the "decreased" scan, so I won't bore you with the details.
197
198I will use a different method to perform a "rescan", since the first one is a bit more special in that it doesn't start with any previous values:
199
200```rust
201pub fn rescan_regions(&self, regions: &[Region], scan: Scan) -> Vec<Region> {
202 regions
203 .iter()
204 .flat_map(|region| match scan {
205 Scan::Decreased => {
206 let mut locations = Vec::new();
207 match region.locations {
208 CandidateLocations::Dense { range } => {
209 match self.read_memory(range.start, range.end - range.start) {
210 Ok(memory) => match region.value {
211 Value::AnyWithin(previous) => {
212 memory
213 .windows(4)
214 .zip(previous.windows(4))
215 .enumerate()
216 .step_by(4)
217 .for_each(|(offset, (new, old))| {
218 let new = i32::from_ne_bytes([
219 new[0], new[1], new[2], new[3],
220 ]);
221 let old = i32::from_ne_bytes([
222 old[0], old[1], old[2], old[3],
223 ]);
224 if new < old {
225 locations.push(range.start + offset);
226 }
227 });
228
229 Some(Region {
230 info: region.info.clone(),
231 locations: CandidateLocations::Discrete { locations },
232 value: Value::AnyWithin(memory),
233 })
234 }
235 _ => todo!(),
236 },
237 _ => todo!(),
238 }
239 }
240 _ => todo!(),
241 }
242 }
243 _ => todo!(),
244 })
245 .collect()
246}
247```
248
249If you've skimmed over that, I do not blame you. Here's the summary: for every existing region, when executing the scan mode "decreased", if the previous locations were dense, read the entire memory region. On success, if the previous values were a chunk of memory, iterate over the current and old memory at the same time, and for every aligned `i32`, if the new value is less, store it.
250
251It's also making me ill. Before I leave a mess on the floor, does it work?
252
253```rust
254std::thread::sleep(std::time::Duration::from_secs(10));
255let last_scan = process.rescan_regions(&last_scan, Scan::Decreased);
256println!(
257 "Found {} locations",
258 last_scan.iter().map(|r| r.locations.len()).sum::<usize>()
259);
260```
261
262```rust
263Found 3014656 locations
264Found 177 locations
265```
266
267Okay, great, let's clean up this mess…
268
269## Refactoring
270
271Does it also make you uncomfortable to be writing something that you know will end up *huge* unless you begin refactoring other parts right now? I definitely feel that way. But I think it's good discipline to push through with something that works first, even if it's nasty, before going on a tangent. Now that we have the basic implementation working, let's take on this monster before it eats us alive.
272
273First things first, that method is inside an `impl` block. The deepest nesting level is 13. I almost have to turn around my chair to read the entire thing out!
274
275Second, we're nesting four matches. Three of them we care about: scan, candidate location, and value. If each of these `enum` has `S`, `C` and `V` variants respectively, writing each of these by hand will require `S * C * V` different implementations! Cheat Engine offers 10 different scans, I can think of at least 3 different ways to store candidate locations, and another 3 ways to store the values found. That's `10 * 3 * 3 = 90` different combinations. I am not willing to write out all these[^2], so we need to start introducing some abstractions. Just imagine what a monster function you would end with! The horror!
276
277Third, why is the scan being executed in the process? This is something that should be done in the `impl Scan` instead!
278
279Let's begin the cleanup:
280
281```rust
282pub fn rescan_regions(&self, regions: &[Region], scan: Scan) -> Vec<Region> {
283 todo!()
284}
285```
286
287I already feel ten times better.
288
289Now, this method will unconditionally read the entire memory region, even if the scan or the previous candidate locations don't need it[^3]. In the worst case with a single discrete candidate location, we will be reading a very large chunk of memory when we could have read just the 4 bytes needed for the `i32`. On the bright side, if there *are* more locations in this memory region, we will get read of them at the same time[^4]. So even if we're moving more memory around all the time, it isn't *too* bad.
290
291```rust
292regions
293 .iter()
294 .flat_map(
295 |region| match self.read_memory(region.info.BaseAddress as _, region.info.RegionSize) {
296 Ok(memory) => todo!(),
297 Err(err) => {
298 eprintln!(
299 "Failed to read {} bytes at {:?}: {}",
300 region.info.RegionSize, region.info.BaseAddress, err,
301 );
302 None
303 }
304 },
305 )
306 .collect()
307```
308
309Great! If reading memory succeeds, we want to rerun the scan:
310
311```rust
312Ok(memory) => Some(scan.rerun(region, memory)),
313```
314
315The rerun will live inside `impl Scan`:
316
317```rust
318pub fn rerun(&self, region: &Region, memory: Vec<u8>) -> Region {
319 match self {
320 Scan::Exact(_) => self.run(region.info.clone(), memory),
321 Scan::Unknown => region.clone(),
322 Scan::Decreased => todo!(),
323 }
324}
325```
326
327An exact scan doesn't care about any previous values, so it behaves like a first scan. The first scan is done by the `run` function (it contains the implementation factored out of the `Process::scan_regions` method), which only needs the region information and the current memory chunk we just read.
328
329The unknown scan leaves the region unchanged: any value stored is still valid, because it is unknown what we're looking for.
330
331The decreased scan will have to iterate over all the candidate locations, and compare them with the current memory chunk. But this time, we'll abstract this iteration too:
332
333```rust
334impl Region {
335 fn iter_locations<'a>(
336 &'a self,
337 new_memory: &'a [u8],
338 ) -> impl Iterator<Item = (usize, i32, i32)> + 'a {
339 match &self.locations {
340 CandidateLocations::Dense { range } => range.clone().step_by(4).map(move |addr| {
341 let old = self.value_at(addr);
342 let new = i32::from_ne_bytes([
343 new_memory[0],
344 new_memory[1],
345 new_memory[2],
346 new_memory[3],
347 ]);
348 (addr, old, new)
349 }),
350 _ => todo!(),
351 }
352 }
353}
354```
355
356For a dense candidate location, we iterate over all the 4-aligned addresses (fast scan for `i32` values), and yield `(current address, old value, new value)`. This way, the `Scan` can do anything it wants with the old and new values, and if it finds a match, it can use the address.
357
358The `value_at` method will deal with all the `Value` variants:
359
360```rust
361fn value_at(&self, addr: usize) -> i32 {
362 match &self.value {
363 Value::AnyWithin(chunk) => {
364 let base = addr - self.info.BaseAddress as usize;
365 let bytes = &chunk[base..base + 4];
366 i32::from_ne_bytes([bytes[0], bytes[1], bytes[2], bytes[3]])
367 }
368 _ => todo!(),
369 }
370}
371```
372
373This way, `iter_locations` can easily use any value type. With this, we have all `enum` covered: `Scan` in `rerun`, `CandidateLocation` in `iter_locations`, and `Value` in `value_at`. Now we can add as many variants as we want, and we will only need to update a single `match` arm for each of them. Let's implement `Scan::Decreased` and try it out:
374
375```rust
376pub fn rerun(&self, region: &Region, memory: Vec<u8>) -> Region {
377 match self {
378 Scan::Decreased => Region {
379 info: region.info.clone(),
380 locations: CandidateLocations::Discrete {
381 locations: region
382 .iter_locations(&memory)
383 .flat_map(|(addr, old, new)| if new < old { Some(addr) } else { None })
384 .collect(),
385 },
386 value: Value::AnyWithin(memory),
387 },,
388 }
389}
390```
391
392```
393Found 3014656 locations
394Found 223791 locations
395```
396
397Hmm… before we went down from `3014656` to `177` locations, and now we went down to `223791`. Where did we go wrong?
398
399After spending several hours on this, I can tell you where we went wrong. `iter_locations` is always accessing the memory range `0..4`, and not the right address. Here's the fix:
400
401```rust
402CandidateLocations::Dense { range } => range.clone().step_by(4).map(move |addr| {
403 let old = self.value_at(addr);
404 let base = addr - self.info.BaseAddress as usize;
405 let bytes = &new_memory[base..base + 4];
406 let new = i32::from_ne_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
407 (addr, old, new)
408}),
409```
410
411## Going beyond
412
413Let's take a look at other possible `Scan` types. Cheat Engine supports the following initial scan types:
414
415* Exact Value
416* Bigger than…
417* Smaller than…
418* Value between…
419* Unknown initial value
420
421"Bigger than" and "Smaller than" can both be represented by "Value between", so it's pretty much just three.
422
423For subsequent scans, in addition to the scan types described above, we find:
424
425* Increased value
426* Increased value by…
427* Decreased value
428* Decreased value by…
429* Changed value
430* Unchanged value
431
432Not only does Cheat Engine provide all of these scans, but all of them can also be negated. For example, "find values that were not increased by 7". One could imagine to also support things like "increased value by range". For the increased and decreased scans, Cheat Engine also supports "at least xx%", so that if the value changed within the specified percentage interval, it will be considered.
433
434What about `CandidateLocations`? I can't tell you how Cheat Engine stores these, but I can tell you that `CandidateLocations::Discrete` can still be quite inefficient. Imagine you've started with a scan for unknown values and then ran a scan for unchanged valueus. Most values in memory will have been unchanged, but with our current implementation, we are now storing an entire `usize` address for each of these. One option would be to introduce `CandidateLocations::Sparse`, which would be a middle ground. You could implement it like `Dense` and include a vector of booleans telling you which values to consider, or go smaller and use a bitstring or bit vector. You could use a sparse vector data structure.
435
436`Value` is very much like `CandidateLocations`, except that it stores a value to compare against and not an address. Here we can either have an exact value, or an older copy of the memory. Again, keeping a copy of the entire memory chunk when all we need is a handful of values is inefficient. You could keep a mapping from addresses to values if you don't have too many. Or you could shrink and fragment the copied memory in a more optimal way. There's a lot of room for improvement!
437
438What if, despite all of the efforts above, we still don't have enough RAM to store all this information? The Cheat Engine Tutorial doesn't use a lot of memory, but as soon as you try scanning bigger programs, like games, you may find yourself needing several gigabytes worth of memory to remember all the found values in order to compare them in subsequent scans. You may even need to consider dumping all the regions to a file and read from it to run the comparisons. For example, running a scan for "unknown value" in Cheat Engine brings its memory up by the same amount of memory used by the process scanned (which makes sense), but as soon as I ran a scan for "unchanged value" over the misaligned values, Cheat Engine's disk usage skyrocketed to 1GB/s (!) for several seconds on my SSD. After it finished, memory usage went down to normal. It was very likely writing out all candidate locations to disk.
439
440## Finale
441
442There is a lot of things to learn from Cheat Engine just by observing its behaviour, and we're only scratching its surface.
443
444In the [next post](/blog/woce-4), we'll tackle the fourth step of the tutorial: Floating points. So far, we have only been working with `i32` for simplicity. We will need to update our code to be able to account for different data types, which will make it easy to support other types like `i16`, `i64`, or even strings, represented as an arbitrary sequence of bytes.
445
446As usual, you can [obtain the code for this post][code] over at my GitHub. You can run `git checkout step3` after cloning the repository to get the right version of the code. This version is a bit cleaner than the one presented in the blog, and contains some of the things described in the [Going beyond](#going-beyond) section. Until next time!
447
448### Footnotes
449
450[^1]: Well, technically, we will perform a million memory reads[^5]. The issue here is the million calls to `ReadProcessMemory`, not reading memory per se.
451
452[^2]: Not currently. After a basic implementation works, writing each implementation by hand and fine-tuning them by treating each of them as a special case could yield significant speed improvements. So although it would be a lot of work, this option shouldn't be ruled out completely.
453
454[^3]: You could ask the candidate locations where one should read, which would still keep the code reasonably simple.
455
456[^4]: You could also optimize for this case by determining both the smallest and largest address, and reading enough to cover them both. Or apply additional heuristics to only do so if the ratio of the size you're reading compared to the size you need isn't too large and abort the joint read otherwise. There is a lot of room for optimization here.
457
458[^5]: (A footnote in a footnote?) The machine registers, memory cache and compiler will all help lower this cost, so the generated executable might not actually need that many reads from RAM. But that's getting way too deep into the details now.
459
460[code]: https://github.com/lonami/memo