Rust x Ethereum Day - Heimdall, An Advanced Bytecode Analysis Toolkit

5 minutes 48 seconds

S1

Speaker 1

00:02

Hello, I'm Jon Becker and I'm going to talk about Heimdall. It's basically an advanced bytecode analysis toolkit. So what can it do? Well, it can decompile bytecode,

S2

Speaker 2

00:18

generate control flow graphs for bytecode, decode arbitrary call data, dump contract storage and generate bytecode snapshots. I'll talk about what all those mean in a sec. But first I'm going to talk about symbolic execution.

S2

Speaker 2

00:34

Symbolic execution is basically a complex process of finding all the possible paths that a program can take. It's pretty blurry, but up there, that's just bytecode. Symbolic execution will run this bytecode with some calldata, and every time it encounters a JUMPI instruction, it will take both paths, and that allows us to trace what the actual EVM is doing. You probably can't see that either, but it's a bunch of assembly.
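[Editor's sketch] The branch-forking idea described here can be illustrated with a toy symbolic executor. This is a minimal sketch under stated assumptions, not Heimdall's implementation: it supports only a handful of real EVM opcodes, treats calldata as an opaque symbol, assumes jump destinations are concrete, and the function name `explore` and the demo bytecode are made up for illustration.

```python
# Toy symbolic executor over a tiny subset of real EVM opcodes.
# Not Heimdall's code: names and structure are illustrative assumptions.
STOP, ISZERO, CALLDATALOAD, JUMP, JUMPI, JUMPDEST, PUSH1, REVERT = (
    0x00, 0x15, 0x35, 0x56, 0x57, 0x5B, 0x60, 0xFD)


def explore(code, pc=0, stack=(), constraints=(), trace=(), budget=64):
    """Yield (constraints, visited pcs, exit reason) for every explored path."""
    while budget > 0 and pc < len(code):
        budget -= 1
        op = code[pc]
        trace += (pc,)
        if op == PUSH1:
            stack += (code[pc + 1],)
            pc += 2
        elif op == CALLDATALOAD:
            # Replace the offset on the stack with an opaque symbol.
            stack = stack[:-1] + (f"calldata[{stack[-1]}]",)
            pc += 1
        elif op == ISZERO:
            top = stack[-1]
            value = int(top == 0) if isinstance(top, int) else f"iszero({top})"
            stack = stack[:-1] + (value,)
            pc += 1
        elif op == JUMPDEST:
            pc += 1
        elif op == JUMP:
            pc, stack = stack[-1], stack[:-1]
        elif op == JUMPI:
            dest, cond = stack[-1], stack[-2]  # destination is on top
            stack = stack[:-2]
            if isinstance(cond, int):
                pc = dest if cond else pc + 1
            else:
                # Symbolic condition: take BOTH paths, recording the
                # assumption that makes each one reachable.
                yield from explore(code, dest, stack,
                                   constraints + (f"{cond} != 0",),
                                   trace, budget)
                constraints += (f"{cond} == 0",)
                pc += 1
        elif op in (STOP, REVERT):
            # The toy ignores REVERT's offset/size operands.
            yield constraints, trace, "STOP" if op == STOP else "REVERT"
            return
        else:
            raise ValueError(f"opcode {op:#x} not supported by this sketch")
    # Running off the end of the code is an implicit STOP in the EVM.
    yield constraints, trace, "STOP" if pc >= len(code) else "BUDGET"


# Demo: PUSH1 0; CALLDATALOAD; PUSH1 7; JUMPI; STOP; JUMPDEST; PUSH1 0; PUSH1 0; REVERT
DEMO = bytes.fromhex("600035600757005b60006000fd")

if __name__ == "__main__":
    for path in explore(DEMO):
        print(path)
```

The step budget is a crude stand-in for loop handling: without it, a bytecode loop with a symbolic condition would fork forever.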

S2

Speaker 2

01:08

And then in between the assembly are lines and arrows to other blocks of assembly. And this is called the control flow graph. Basically, what that means is if you start at that first block, it will follow the EVM and just continue executing things. And this basically represents all the possible steps that the EVM could take.
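[Editor's sketch] The blocks and arrows of a control flow graph can be approximated statically. The following is a rough sketch, not how Heimdall builds its graph (the talk says Heimdall derives it from symbolic execution): it splits bytecode into basic blocks and only adds a jump edge when the target is pushed immediately before the `JUMP`/`JUMPI`. The function names `disassemble` and `build_cfg` and the demo bytecode are assumptions for illustration.

```python
# Static basic-block splitter for EVM bytecode (illustrative sketch only).
JUMP, JUMPI, JUMPDEST = 0x56, 0x57, 0x5B
# STOP, RETURN, REVERT, INVALID, SELFDESTRUCT end a block with no successor.
TERMINATORS = {0x00, 0xF3, 0xFD, 0xFE, 0xFF}


def disassemble(code):
    """Return [(pc, opcode, immediate or None)], skipping PUSH data bytes."""
    out, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:  # PUSH1..PUSH32 carry 1..32 immediate bytes
            n = op - 0x5F
            data = code[pc + 1:pc + 1 + n].ljust(n, b"\0")
            out.append((pc, op, int.from_bytes(data, "big")))
            pc += 1 + n
        else:
            out.append((pc, op, None))
            pc += 1
    return out


def build_cfg(code):
    """Return {block start pc: sorted list of successor block start pcs}."""
    ins = disassemble(code)
    jumpdests = {pc for pc, op, _ in ins if op == JUMPDEST}
    # A block starts at pc 0, at every JUMPDEST, and after any jump/terminator.
    leaders = {0} | jumpdests
    for i, (_, op, _) in enumerate(ins[:-1]):
        if op in (JUMP, JUMPI) or op in TERMINATORS:
            leaders.add(ins[i + 1][0])

    cfg = {start: set() for start in leaders}
    current = 0
    for i, (pc, op, _) in enumerate(ins):
        if pc in leaders:
            current = pc
        next_pc = ins[i + 1][0] if i + 1 < len(ins) else None
        if op in (JUMP, JUMPI):
            # Only resolve targets pushed right before the jump.
            if i > 0 and ins[i - 1][2] in jumpdests:
                cfg[current].add(ins[i - 1][2])
            if op == JUMPI and next_pc is not None:
                cfg[current].add(next_pc)  # fall-through branch
        elif op not in TERMINATORS and next_pc in leaders:
            cfg[current].add(next_pc)  # block runs into the next JUMPDEST
    return {start: sorted(succ) for start, succ in cfg.items()}


# Demo: PUSH1 0; CALLDATALOAD; PUSH1 7; JUMPI; STOP; JUMPDEST; PUSH1 0; PUSH1 0; REVERT
DEMO = bytes.fromhex("600035600757005b60006000fd")

if __name__ == "__main__":
    print(build_cfg(DEMO))
```

Jump targets computed at runtime are invisible to this purely static pass, which is one reason a tool would follow actual execution instead.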

S2

Speaker 2

01:33

And Heimdall is able to convert that symbolic execution into Solidity code, which you also can't read. Yeah. So now let's talk about what this powers. So this symbolic execution powers the decompile module, the CFG module, and the snapshot module.

S2

Speaker 2

01:52

Let's talk about the decompile module. So decompilation is the process of converting machine code or bytecode into a readable format like Solidity. And we're able to do this from symbolic execution like I said before, but basically you run Heimdall, put in a contract address, which doesn't need to be verified, and it will execute that contract symbolically and then generate an ABI and some representation of the source. It's very useful for pen testing or finding out what a contract does if it's not verified.

S2

Speaker 2

02:27

Taking it a step back, control flow graph. You've already seen this a few slides ago, but basically it shows every possible path that the contract could take. The snapshot module is new. I added it a few weeks ago.

S2

Speaker 2

02:42

It's similar to decompile, but takes another step back and will only show you relevant information. It will run things and display gas consumption, events emitted, custom errors, storage accesses, modifiers, access control, et cetera. And it can actually resolve signatures from the 4byte directory, which is pretty cool, and which is how Heimdall is able to give you the exact ABI of any contract you put in. And then another module that does not rely on symbolic execution is the dump module, which basically fetches all the transactions a contract has made or been interacted with and replays them.

S2

Speaker 2

03:26

And then it gives you a nice TUI format and CSV format of the storage slots within the contract, including things like mappings. And if you could read that, which you probably can't, this is the wrapped ether contract. And you can see the first 3 slots are wrapped ether, and then the decimals and stuff, and symbol. And then the last module I'm going to talk about is decode, where basically you can put in arbitrary calldata or a transaction and it will decode the calldata using the 4byte directory.

S2

Speaker 2

04:04

So the first 4 bytes of the actual call data will be looked up. And then it'll take all those possibilities and attempt to decode them. And whatever succeeds will be the output. So in this example, which, oh, you can read that.
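[Editor's sketch] The lookup-then-try-everything flow described here might look roughly like the following. This is a sketch under stated assumptions, not Heimdall's decoder: the `SIGNATURES` table is a hypothetical local stand-in for a 4byte directory lookup, the second candidate for the `withdraw` selector is a deliberately made-up collision so there is something to reject, and only the static types `uint256` and `address` are handled.

```python
# Selector lookup plus trial decoding (illustrative sketch only).
# Hypothetical offline stand-in for the 4byte directory.
SIGNATURES = {
    "2e1a7d4d": [
        "withdraw(uint256)",
        "made_up_collision(address,uint256)",  # FAKE entry, for illustration
    ],
    "a9059cbb": ["transfer(address,uint256)"],
}


def decode_static(types, data):
    """Decode 32-byte ABI words; return None if the data does not fit."""
    if len(data) != 32 * len(types):
        return None
    values = []
    for i, typ in enumerate(types):
        word = data[32 * i:32 * (i + 1)]
        if typ == "uint256":
            values.append(int.from_bytes(word, "big"))
        elif typ == "address":
            if any(word[:12]):  # upper 12 bytes must be zero padding
                return None
            values.append("0x" + word[12:].hex())
        else:
            return None  # dynamic and other types are out of scope here
    return values


def decode_calldata(calldata_hex):
    """Return [(signature, decoded values)] for every candidate that fits."""
    if calldata_hex.startswith("0x"):
        calldata_hex = calldata_hex[2:]
    raw = bytes.fromhex(calldata_hex)
    selector, args = raw[:4].hex(), raw[4:]  # first 4 bytes pick the function
    matches = []
    for signature in SIGNATURES.get(selector, []):
        _, _, rest = signature.partition("(")
        types = [t for t in rest.rstrip(")").split(",") if t]
        values = decode_static(types, args)
        if values is not None:
            matches.append((signature, values))
    return matches


# Demo: a withdraw of 10**18 wei, built by hand.
DEMO_CALLDATA = "0x2e1a7d4d" + (10**18).to_bytes(32, "big").hex()

if __name__ == "__main__":
    print(decode_calldata(DEMO_CALLDATA))
```

Here the made-up two-argument candidate is rejected because the calldata carries only one 32-byte word, so only `withdraw(uint256)` survives.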

S2

Speaker 2

04:19

This example is just a withdraw. I think it's also from wrapped ether. But put in the transaction hash, you don't need the ABI, it will just tell you it's a withdraw of whatever the hell that uint is, and you can also explain what the call does roughly using GPT-4. So what's next?

S2

Speaker 2

04:38

I'm going to be working on a monitor module which basically will just watch the mempool for some patterns and then call cron or whatever you want it to do whenever something happens in the mempool. Then I have some improvements, like improving symbolic execution, because it's rough: loop detection is very hard, breaking out of loops is hard. Improving the decompilation output so that you can actually recompile it, or just making it better. Recursive calldata decoding, like multicall.

S2

Speaker 2

05:11

Vyper support is also going to be rough because it's different.

S1

Speaker 1

05:19

And then

S2

Speaker 2

05:21

GPT-4 powered code cleanup is kind of on the back burner because AI is not deterministic, and I don't like that. And then raw trace decoding will also be added eventually. You can get in touch.

S2

Speaker 2

05:37

This is very short, but it's technical, so if you wanna talk to me afterwards about anything, feel free. I also like otters, so there's an otter. Thank you.
