| jmcph4 |

Decoding Solidity Metadata

[2025-03-13 08:00:00 +1000] A monumental tomb for sovereigns of a great empire, Pierre-François-Léonard Fontaine

A little-known feature of Solidity is that information about how your smart contract was built is appended to the end of the (runtime) bytecode. Not only is this a feature of the compiler, but it happens by default. One consequence is that your favourite smart contract very likely has this metadata appended to its deployed bytecode, which means that it's there on-chain for you to look at with your friends.

The idea behind this feature is that you (a responsible and diligent smart contract developer who cares deeply about decentralisation) will take the canonical metadata file that's also produced by Solidity and somehow distribute it via your distributed filesystem of choice; by default, the venerable IPFS. This metadata file contains a JSON-encoded description of various aspects of your smart contract: its ABI, the names of the other source files in your project (if any), and, when the compiler's useLiteralContent setting is enabled, the entirety of its Solidity source code, to name a few things.

The information contained in this file is potentially very valuable depending on who you are. If you're a security researcher or auditor, you likely want to know if the Solidity version used to compile the contract has any known vulnerabilities. If you're a proprietary trader performing due diligence on a new DEX integration, you likely want to ensure that the Solidity source code on your favourite block explorer really does match the actual runtime bytecode deployed on-chain. Whether anyone actually performs this (fairly critical!) distribution step is a question for future work.

Decoding Metadata in Practice

So how is this metadata actually encoded? A format called CBOR is used. CBOR stands for Concise Binary Object Representation and is standardised in RFC 8949. It's like JSON but in binary and thus far more concise, making it a reasonable choice for this application. The next question is whereabouts in the bytecode it's stored. Earlier we established that the compiler appends it to the bytecode, but this was a little white lie. The final two octets of the bytecode actually encode the length of the CBOR data (in a big-endian manner), and the CBOR bytes sit immediately before this two-byte length field. Supposing that these two bytes represent the number $m$, the CBOR data will be the $m$ bytes preceding the length field.
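To make the layout concrete, here's a minimal Python sketch of the extraction step. The function name split_metadata and the toy input are my own inventions for illustration (the input is not real contract bytecode), and the sketch assumes the bytecode really does end in a metadata section.

```python
from typing import Tuple


def split_metadata(bytecode: bytes) -> Tuple[bytes, bytes]:
    """Split runtime bytecode into (executable code, CBOR metadata bytes)."""
    if len(bytecode) < 2:
        raise ValueError("bytecode too short to hold a length field")
    # The final two bytes hold m, the CBOR length, in big-endian order.
    m = int.from_bytes(bytecode[-2:], "big")
    if m > len(bytecode) - 2:
        raise ValueError("length field exceeds bytecode size; no metadata?")
    cbor_start = len(bytecode) - 2 - m
    return bytecode[:cbor_start], bytecode[cbor_start:-2]


# Toy input: 3 bytes of "code", 4 bytes of "CBOR", then the length 0x0004.
code, cbor = split_metadata(bytes.fromhex("600160" "a1616100" "0004"))
print(code.hex(), cbor.hex())
```

Note that a contract compiled without metadata will still have two trailing bytes that decode to some number, so a sanity check like the one above only catches the obviously impossible cases.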

Now that we understand how to extract the CBOR bytes, the next question is what schema they represent. CBOR, being like JSON, encodes arbitrary, typed key-value mappings - but what keys are we expecting? From the documentation,

{
  "ipfs": "<metadata hash>",
  // If "bytecodeHash" was "bzzr1" in compiler settings not "ipfs" but "bzzr1"
  "bzzr1": "<metadata hash>",
  // Previous versions were using "bzzr0" instead of "bzzr1"
  "bzzr0": "<metadata hash>",
  // If any experimental features that affect code generation are used
  "experimental": true,
  "solc": "<compiler version>"
}
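Since the schema only uses a handful of CBOR types, decoding it by hand is quite approachable. The following is a rough sketch of a decoder in Python covering just the subset needed here (unsigned integers, byte strings, text strings, maps, and the booleans); it is not a general RFC 8949 implementation, and the name decode_item and the sample bytes are mine rather than anything from the Solidity toolchain.

```python
def decode_item(data: bytes, pos: int = 0):
    """Decode one CBOR item starting at data[pos]; return (value, next_pos)."""
    initial = data[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1

    if major == 7:  # simple values: only false (20) and true (21) here
        if info == 20:
            return False, pos
        if info == 21:
            return True, pos
        raise ValueError(f"unsupported simple value {info}")

    # The argument is either inline (0..23) or in the next 1/2/4/8 bytes.
    if info < 24:
        arg = info
    elif info <= 27:
        width = 1 << (info - 24)
        arg = int.from_bytes(data[pos:pos + width], "big")
        pos += width
    else:
        raise ValueError("indefinite lengths are not handled in this sketch")

    if major == 0:  # unsigned integer
        return arg, pos
    if major == 2:  # byte string of length arg
        return data[pos:pos + arg], pos + arg
    if major == 3:  # UTF-8 text string of length arg
        return data[pos:pos + arg].decode("utf-8"), pos + arg
    if major == 5:  # map with arg key/value pairs
        result = {}
        for _ in range(arg):
            key, pos = decode_item(data, pos)
            value, pos = decode_item(data, pos)
            result[key] = value
        return result, pos
    raise ValueError(f"unsupported major type {major}")


# Hand-built sample: {"solc": h'00060b', "experimental": true}
sample = b"\xa2" + b"\x64solc" + b"\x43\x00\x06\x0b" + b"\x6cexperimental" + b"\xf5"
decoded, end = decode_item(sample)
print(decoded)
```

For anything beyond poking around, a proper CBOR library is the safer choice, since it will handle the types and edge cases this sketch rejects.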

There's an extremely useful tool, called the CBOR playground, that we can use to decode and visualise an arbitrary sequence of CBOR bytes. Similarly, there's a tool specifically for Solidity metadata called the metadata playground. Together, these tools can get us pretty far. For instance, consider the ETH2 deposit contract at 0x00000000219ab540356cBB839Cbe05303d7705Fa:

Figure 1: Output of Sourcify's metadata tool. The contract depicted is the mainnet ETH2 deposit contract.
Figure 2: Output of the CBOR playground tool.

In Figure 1, we can see the entire CBOR mapping with the relevant byte subsequences highlighted accordingly as well as the fully decoded, human-readable values that it contains. The final bytes of the bytecode are highlighted red and represent the length of the CBOR data (in this case, 51 bytes). The green byte subsequence is the actual CBOR data. Figure 2 illustrates the low-level decoding of this data. The mapping contains two keys: an IPFS hash and the Solidity version used to compile the contract. The IPFS hash is QmdD3hpMj6mEFVy9DP4QqjHaoeYbhKsYvApX1YZNfjTVWp and Solidity 0.6.11 was used to compile the code. Fortunately, the EF did in fact publish the associated metadata file for the deposit contract and it's available via IPFS as you might expect.

$ ipfs cat QmdD3hpMj6mEFVy9DP4QqjHaoeYbhKsYvApX1YZNfjTVWp | jq
{
  "compiler": {
    "version": "0.6.11+commit.5ef660b1"
  },
  "language": "Solidity",
  "output": {
    "abi": [
      {
        "inputs": [],
        "stateMutability": "nonpayable",
        "type": "constructor"
      },
      {
        "anonymous": false,
        "inputs": [
          {
            "indexed": false,
            "internalType": "bytes",
            "name": "pubkey",
            "type": "bytes"
          },
          {
            "indexed": false,
            "internalType": "bytes",
            "name": "withdrawal_credentials",
            "type": "bytes"
          },
          {
            "indexed": false,
            "internalType": "bytes",
            "name": "amount",
            "type": "bytes"
          },
          {
            "indexed": false,
            "internalType": "bytes",
            "name": "signature",
            "type": "bytes"
          },
          {
            "indexed": false,
            "internalType": "bytes",
            "name": "index",
            "type": "bytes"
          }
        ],
        "name": "DepositEvent",
        "type": "event"
      },
      {
        "inputs": [
          {
            "internalType": "bytes",
            "name": "pubkey",
            "type": "bytes"
          },
          {
            "internalType": "bytes",
            "name": "withdrawal_credentials",
            "type": "bytes"
          },
          {
            "internalType": "bytes",
            "name": "signature",
            "type": "bytes"
          },
          {
            "internalType": "bytes32",
            "name": "deposit_data_root",
            "type": "bytes32"
          }
        ],
        "name": "deposit",
        "outputs": [],
        "stateMutability": "payable",
        "type": "function"
      },
      {
        "inputs": [],
        "name": "get_deposit_count",
        "outputs": [
          {
            "internalType": "bytes",
            "name": "",
            "type": "bytes"
          }
        ],
        "stateMutability": "view",
        "type": "function"
      },
      {
        "inputs": [],
        "name": "get_deposit_root",
        "outputs": [
          {
            "internalType": "bytes32",
            "name": "",
            "type": "bytes32"
          }
        ],
        "stateMutability": "view",
        "type": "function"
      },
      {
        "inputs": [
          {
            "internalType": "bytes4",
            "name": "interfaceId",
            "type": "bytes4"
          }
        ],
        "name": "supportsInterface",
        "outputs": [
          {
            "internalType": "bool",
            "name": "",
            "type": "bool"
          }
        ],
        "stateMutability": "pure",
        "type": "function"
      }
    ],
    "devdoc": {
      "kind": "dev",
      "methods": {
        "deposit(bytes,bytes,bytes,bytes32)": {
          "params": {
            "deposit_data_root": "The SHA-256 hash of the SSZ-encoded DepositData object. Used as a protection against malformed input.",
            "pubkey": "A BLS12-381 public key.",
            "signature": "A BLS12-381 signature.",
            "withdrawal_credentials": "Commitment to a public key for withdrawals."
          }
        },
        "get_deposit_count()": {
          "returns": {
            "_0": "The deposit count encoded as a little endian 64-bit number."
          }
        },
        "get_deposit_root()": {
          "returns": {
            "_0": "The deposit root hash."
          }
        },
        "supportsInterface(bytes4)": {
          "details": "Interface identification is specified in ERC-165. This function  uses less than 30,000 gas.",
          "params": {
            "interfaceId": "The interface identifier, as specified in ERC-165"
          },
          "returns": {
            "_0": "`true` if the contract implements `interfaceId` and  `interfaceId` is not 0xffffffff, `false` otherwise"
          }
        }
      },
      "version": 1
    },
    "userdoc": {
      "events": {
        "DepositEvent(bytes,bytes,bytes,bytes,bytes)": {
          "notice": "A processed deposit event."
        }
      },
      "kind": "user",
      "methods": {
        "deposit(bytes,bytes,bytes,bytes32)": {
          "notice": "Submit a Phase 0 DepositData object."
        },
        "get_deposit_count()": {
          "notice": "Query the current deposit count."
        },
        "get_deposit_root()": {
          "notice": "Query the current deposit root hash."
        },
        "supportsInterface(bytes4)": {
          "notice": "Query if a contract implements an interface"
        }
      },
      "notice": "This is the Ethereum 2.0 deposit contract interface. For more information see the Phase 0 specification under https://github.com/ethereum/eth2.0-specs",
      "version": 1
    }
  },
  "settings": {
    "compilationTarget": {
      "deposit_contract.sol": "DepositContract"
    },
    "evmVersion": "istanbul",
    "libraries": {},
    "metadata": {
      "bytecodeHash": "ipfs",
      "useLiteralContent": true
    },
    "optimizer": {
      "enabled": true,
      "runs": 5000000
    },
    "remappings": []
  },
  "sources": {
    "deposit_contract.sol": {
      "content": "// ┏━━━┓━┏┓━┏┓━━┏━━━┓━━┏━━━┓━━━━┏━━━┓━━━━━━━━━━━━━━━━━━━┏┓━━━━━┏━━━┓━━━━━━━━━┏┓━━━━━━━━━━━━━━┏┓━\n// ┃┏━━┛┏┛┗┓┃┃━━┃┏━┓┃━━┃┏━┓┃━━━━┗┓┏┓┃━━━━━━━━━━━━━━━━━━┏┛┗┓━━━━┃┏━┓┃━━━━━━━━┏┛┗┓━━━━━━━━━━━━┏┛┗┓\n// ┃┗━━┓┗┓┏┛┃┗━┓┗┛┏┛┃━━┃┃━┃┃━━━━━┃┃┃┃┏━━┓┏━━┓┏━━┓┏━━┓┏┓┗┓┏┛━━━━┃┃━┗┛┏━━┓┏━┓━┗┓┏┛┏━┓┏━━┓━┏━━┓┗┓┏┛\n// ┃┏━━┛━┃┃━┃┏┓┃┏━┛┏┛━━┃┃━┃┃━━━━━┃┃┃┃┃┏┓┃┃┏┓┃┃┏┓┃┃━━┫┣┫━┃┃━━━━━┃┃━┏┓┃┏┓┃┃┏┓┓━┃┃━┃┏┛┗━┓┃━┃┏━┛━┃┃━\n// ┃┗━━┓━┃┗┓┃┃┃┃┃┃┗━┓┏┓┃┗━┛┃━━━━┏┛┗┛┃┃┃━┫┃┗┛┃┃┗┛┃┣━━┃┃┃━┃┗┓━━━━┃┗━┛┃┃┗┛┃┃┃┃┃━┃┗┓┃┃━┃┗┛┗┓┃┗━┓━┃┗┓\n// ┗━━━┛━┗━┛┗┛┗┛┗━━━┛┗┛┗━━━┛━━━━┗━━━┛┗━━┛┃┏━┛┗━━┛┗━━┛┗┛━┗━┛━━━━┗━━━┛┗━━┛┗┛┗┛━┗━┛┗┛━┗━━━┛┗━━┛━┗━┛\n// ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┃┃━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n// ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┗┛━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n\n// SPDX-License-Identifier: CC0-1.0\n\npragma solidity 0.6.11;\n\n// This interface is designed to be compatible with the Vyper version.\n/// @notice This is the Ethereum 2.0 deposit contract interface.\n/// For more information see the Phase 0 specification under https://github.com/ethereum/eth2.0-specs\ninterface IDepositContract {\n    /// @notice A processed deposit event.\n    event DepositEvent(\n        bytes pubkey,\n        bytes withdrawal_credentials,\n        bytes amount,\n        bytes signature,\n        bytes index\n    );\n\n    /// @notice Submit a Phase 0 DepositData object.\n    /// @param pubkey A BLS12-381 public key.\n    /// @param withdrawal_credentials Commitment to a public key for withdrawals.\n    /// @param signature A BLS12-381 signature.\n    /// @param deposit_data_root The SHA-256 hash of the SSZ-encoded DepositData object.\n    /// Used as a protection against malformed input.\n    function deposit(\n        bytes calldata pubkey,\n        bytes calldata withdrawal_credentials,\n        bytes calldata signature,\n        bytes32 deposit_data_root\n    ) external payable;\n\n    /// @notice Query the current deposit root hash.\n    /// @return 
The deposit root hash.\n    function get_deposit_root() external view returns (bytes32);\n\n    /// @notice Query the current deposit count.\n    /// @return The deposit count encoded as a little endian 64-bit number.\n    function get_deposit_count() external view returns (bytes memory);\n}\n\n// Based on official specification in https://eips.ethereum.org/EIPS/eip-165\ninterface ERC165 {\n    /// @notice Query if a contract implements an interface\n    /// @param interfaceId The interface identifier, as specified in ERC-165\n    /// @dev Interface identification is specified in ERC-165. This function\n    ///  uses less than 30,000 gas.\n    /// @return `true` if the contract implements `interfaceId` and\n    ///  `interfaceId` is not 0xffffffff, `false` otherwise\n    function supportsInterface(bytes4 interfaceId) external pure returns (bool);\n}\n\n// This is a rewrite of the Vyper Eth2.0 deposit contract in Solidity.\n// It tries to stay as close as possible to the original source code.\n/// @notice This is the Ethereum 2.0 deposit contract interface.\n/// For more information see the Phase 0 specification under https://github.com/ethereum/eth2.0-specs\ncontract DepositContract is IDepositContract, ERC165 {\n    uint constant DEPOSIT_CONTRACT_TREE_DEPTH = 32;\n    // NOTE: this also ensures `deposit_count` will fit into 64-bits\n    uint constant MAX_DEPOSIT_COUNT = 2**DEPOSIT_CONTRACT_TREE_DEPTH - 1;\n\n    bytes32[DEPOSIT_CONTRACT_TREE_DEPTH] branch;\n    uint256 deposit_count;\n\n    bytes32[DEPOSIT_CONTRACT_TREE_DEPTH] zero_hashes;\n\n    constructor() public {\n        // Compute hashes in empty sparse Merkle tree\n        for (uint height = 0; height < DEPOSIT_CONTRACT_TREE_DEPTH - 1; height++)\n            zero_hashes[height + 1] = sha256(abi.encodePacked(zero_hashes[height], zero_hashes[height]));\n    }\n\n    function get_deposit_root() override external view returns (bytes32) {\n        bytes32 node;\n        uint size = deposit_count;\n        for 
(uint height = 0; height < DEPOSIT_CONTRACT_TREE_DEPTH; height++) {\n            if ((size & 1) == 1)\n                node = sha256(abi.encodePacked(branch[height], node));\n            else\n                node = sha256(abi.encodePacked(node, zero_hashes[height]));\n            size /= 2;\n        }\n        return sha256(abi.encodePacked(\n            node,\n            to_little_endian_64(uint64(deposit_count)),\n            bytes24(0)\n        ));\n    }\n\n    function get_deposit_count() override external view returns (bytes memory) {\n        return to_little_endian_64(uint64(deposit_count));\n    }\n\n    function deposit(\n        bytes calldata pubkey,\n        bytes calldata withdrawal_credentials,\n        bytes calldata signature,\n        bytes32 deposit_data_root\n    ) override external payable {\n        // Extended ABI length checks since dynamic types are used.\n        require(pubkey.length == 48, \"DepositContract: invalid pubkey length\");\n        require(withdrawal_credentials.length == 32, \"DepositContract: invalid withdrawal_credentials length\");\n        require(signature.length == 96, \"DepositContract: invalid signature length\");\n\n        // Check deposit amount\n        require(msg.value >= 1 ether, \"DepositContract: deposit value too low\");\n        require(msg.value % 1 gwei == 0, \"DepositContract: deposit value not multiple of gwei\");\n        uint deposit_amount = msg.value / 1 gwei;\n        require(deposit_amount <= type(uint64).max, \"DepositContract: deposit value too high\");\n\n        // Emit `DepositEvent` log\n        bytes memory amount = to_little_endian_64(uint64(deposit_amount));\n        emit DepositEvent(\n            pubkey,\n            withdrawal_credentials,\n            amount,\n            signature,\n            to_little_endian_64(uint64(deposit_count))\n        );\n\n        // Compute deposit data root (`DepositData` hash tree root)\n        bytes32 pubkey_root = sha256(abi.encodePacked(pubkey, 
bytes16(0)));\n        bytes32 signature_root = sha256(abi.encodePacked(\n            sha256(abi.encodePacked(signature[:64])),\n            sha256(abi.encodePacked(signature[64:], bytes32(0)))\n        ));\n        bytes32 node = sha256(abi.encodePacked(\n            sha256(abi.encodePacked(pubkey_root, withdrawal_credentials)),\n            sha256(abi.encodePacked(amount, bytes24(0), signature_root))\n        ));\n\n        // Verify computed and expected deposit data roots match\n        require(node == deposit_data_root, \"DepositContract: reconstructed DepositData does not match supplied deposit_data_root\");\n\n        // Avoid overflowing the Merkle tree (and prevent edge case in computing `branch`)\n        require(deposit_count < MAX_DEPOSIT_COUNT, \"DepositContract: merkle tree full\");\n\n        // Add deposit data root to Merkle tree (update a single `branch` node)\n        deposit_count += 1;\n        uint size = deposit_count;\n        for (uint height = 0; height < DEPOSIT_CONTRACT_TREE_DEPTH; height++) {\n            if ((size & 1) == 1) {\n                branch[height] = node;\n                return;\n            }\n            node = sha256(abi.encodePacked(branch[height], node));\n            size /= 2;\n        }\n        // As the loop should always end prematurely with the `return` statement,\n        // this code should be unreachable. 
We assert `false` just to be safe.\n        assert(false);\n    }\n\n    function supportsInterface(bytes4 interfaceId) override external pure returns (bool) {\n        return interfaceId == type(ERC165).interfaceId || interfaceId == type(IDepositContract).interfaceId;\n    }\n\n    function to_little_endian_64(uint64 value) internal pure returns (bytes memory ret) {\n        ret = new bytes(8);\n        bytes8 bytesValue = bytes8(value);\n        // Byteswapping during copying to bytes.\n        ret[0] = bytesValue[7];\n        ret[1] = bytesValue[6];\n        ret[2] = bytesValue[5];\n        ret[3] = bytesValue[4];\n        ret[4] = bytesValue[3];\n        ret[5] = bytesValue[2];\n        ret[6] = bytesValue[1];\n        ret[7] = bytesValue[0];\n    }\n}\n",
      "keccak256": "0xeb4884395e470268e1ff14dca32e7a030425557a23cb16013c4d25914fd1e4a1",
      "license": "CC0-1.0"
    }
  },
  "version": 1
}
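Once you have the metadata file in hand, pulling out the bits needed to reproduce a build is just JSON wrangling. Here's a small, hedged Python sketch; summarise_metadata is a hypothetical helper of my own, and the input is a heavily trimmed copy of the output above with the source content replaced by a placeholder string.

```python
import json


def summarise_metadata(metadata_json: str) -> dict:
    """Pick out the fields most relevant to rebuilding a contract."""
    meta = json.loads(metadata_json)
    settings = meta["settings"]
    return {
        "compiler": meta["compiler"]["version"],
        "evm_version": settings.get("evmVersion"),
        "optimizer": settings.get("optimizer"),
        "target": settings["compilationTarget"],
        # Sources carry a "content" key only when useLiteralContent was set.
        "sources_inline": [
            name for name, src in meta["sources"].items() if "content" in src
        ],
    }


trimmed = """
{
  "compiler": {"version": "0.6.11+commit.5ef660b1"},
  "language": "Solidity",
  "settings": {
    "compilationTarget": {"deposit_contract.sol": "DepositContract"},
    "evmVersion": "istanbul",
    "optimizer": {"enabled": true, "runs": 5000000}
  },
  "sources": {
    "deposit_contract.sol": {"content": "(source elided)", "license": "CC0-1.0"}
  },
  "version": 1
}
"""
summary = summarise_metadata(trimmed)
print(summary)
```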

I've implemented a decoder in around 160 lines of Rust called sme (the Solidity Metadata Extractor) as part of the broader Jetkit project. The output from this tool, when aimed at the ETH2 deposit contract above, is as follows:

$ cast code --flashbots 0x00000000219ab540356cBB839Cbe05303d7705Fa | sme -m
Metadata {
    digest: Some(
        Ipfs(
            "QmdD3hpMj6mEFVy9DP4QqjHaoeYbhKsYvApX1YZNfjTVWp",
        ),
    ),
    experimental: false,
    solidity_version: Some(
        SolidityVersion {
            major: 0,
            minor: 6,
            patch: 11,
        },
    ),
}
ipfs://QmdD3hpMj6mEFVy9DP4QqjHaoeYbhKsYvApX1YZNfjTVWp
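The human-readable values in this output have to be derived from raw bytes in the CBOR map. As I understand the format, the ipfs value is a 34-byte multihash that is conventionally displayed in Base58, and for release builds the solc value is three bytes: major, minor, and patch. The Python sketch below illustrates both conversions; it is not how sme does it, the helper names are mine, and the inputs are test values rather than the deposit contract's actual digest.

```python
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(raw: bytes) -> str:
    """Encode bytes using the Bitcoin/IPFS Base58 alphabet."""
    n = int.from_bytes(raw, "big")
    digits = ""
    while n > 0:
        n, rem = divmod(n, 58)
        digits = BASE58_ALPHABET[rem] + digits
    # Each leading zero byte is represented by a leading '1'.
    leading_zeros = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * leading_zeros + digits


def solc_version(raw: bytes) -> str:
    """Render the release-build version bytes as major.minor.patch."""
    return ".".join(str(b) for b in raw)


print(solc_version(bytes.fromhex("00060b")))
# A multihash is 0x12 (sha2-256), 0x20 (32-byte length), then the digest.
# The all-zero digest here is a placeholder, not a real hash.
placeholder_cid = base58_encode(bytes.fromhex("1220") + bytes(32))
print(placeholder_cid[:2], len(placeholder_cid))
```

The 0x12 0x20 prefix is why these identifiers always begin with "Qm" and come out 46 characters long.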

I've also written an Execution Extension (or ExEx) for Reth that lets you stream transactions live as they land on the canonical chain and extract the IPFS hashes for any newly deployed contracts.

Buzzing Bees of the Swarm

One of the interesting aspects of this entire metadata situation—and the reason that I wrote this article—is that it's relatively undiscussed. What's even more undiscussed is that the initially intended system for distributing metadata files wasn't IPFS, but a much more obscure (and now largely forgotten, sadly) system called Swarm.

At the risk of offending certain veteran developers or researchers, Swarm is essentially the Betamax to IPFS' VHS: it lost. According to Swarmscan, there are approximately 10,900 active nodes on the Swarm network and the latest academic literature claims that IPFS has over 280,000. Most critically, IPFS gateways, on average, actually respond to requests for content. I'm only aware of a single Swarm gateway, the official one. The data availability situation for Swarm must be somewhat dire as the Sourcify playground doesn't even attempt retrieval if the metadata points to a Swarm resource and essentially wishes the user good luck (see Figure 3).

Despite this seemingly low adoption by the modern Ethereum application layer, development of novel Swarm technology appears to remain active. At some stage, the main client implementation, Bee, rebranded under the Ethersphere GitHub organisation. Version 2.5.0 was even released as I wrote parts of this article and weighs in at a whopping 127,800 lines of Go.

Figure 3: Output of Sourcify's metadata tool when presented with a Swarm hash (either bzzr0 or bzzr1).

Conclusion

The metadata included by Solidity contains several important pieces of information like the ABI, Solidity version, and even the whole source code. These can be used for a variety of different tasks like source code verification, indexing, and even automated vulnerability detection (either through static analysis or symbolically). Unfortunately, these all rely on the vital step of publishing the metadata file in an available and trustless way. This places the onus on application developers and also presents the usual challenges of decentralised file hosting like cost and censorship resistance. Overall, my intent for this piece was to bring more attention to a chronically underdiscussed aspect of smart contract development with Solidity.

If you've made it this far through the article then you've hopefully found it engaging and informative. To get more content like this, follow me on X, the everything app or even consider sponsoring me on GitHub.

Bibliography

  1. C. Bormann and P. Hoffman, "Concise Binary Object Representation (CBOR)," Internet Engineering Task Force (IETF), RFC 8949, Dec. 2020. [Online]. Available: https://datatracker.ietf.org/doc/html/rfc8949. Accessed: Mar. 7, 2025.
  2. Solidity Team, "Contract Metadata," Solidity Documentation, Version 0.8.28, Oct. 9, 2024. [Online]. Available: https://docs.soliditylang.org/en/v0.8.28/metadata.html. Accessed: Mar. 7, 2025.
  3. J. Benet, "IPFS - Content Addressed, Versioned, P2P File System," 2014. [Online]. Available: https://raw.githubusercontent.com/ipfs/papers/master/ipfs-cap2pfs/ipfs-p2p-file-system.pdf. Accessed: Mar. 7, 2025.
  4. Swarm Foundation, Swarm: The Decentralized Storage and Communication System for a Sovereign Digital Society, June 13, 2021. [Online]. Available: https://www.ethswarm.org/swarm-whitepaper.pdf. Accessed: Mar. 7, 2025.
  5. V. Trón, The Book of Swarm, Swarm Foundation, 2020. [Online]. Available: https://www.ethswarm.org/The-Book-of-Swarm.pdf. Accessed: Mar. 7, 2025.
  6. L. Rennert et al., "Swarm Network as Decentralized Economies of Scale," arXiv preprint arXiv:2205.14927, May 2022. [Online]. Available: https://arxiv.org/pdf/2205.14927. Accessed: Mar. 7, 2025.
  7. IPFS Community, "Nodes: Number of IPFS Nodes and the Most Popular Content," Reddit, Aug. 2023. [Online]. Available: https://old.reddit.com/r/ipfs/comments/15y9orp/nodes_number_of_ipfs_nodes_and_the_most_popular. Accessed: Mar. 7, 2025.