Extract Enum Variants in Rust

When working with large Rust enums, sometimes we need to extract just the variant names without their associated data. In this tutorial, we'll explore two approaches to extract enum variants: using Rust with the syn crate and using Python with regex.

The Problem

Let's say we have a large enum like TokenInstruction that looks like this:

pub enum TokenInstruction<'a> {
    InitializeMint {
        decimals: u8,
        mint_authority: Pubkey,
        freeze_authority: COption<Pubkey>,
    },
    InitializeAccount,
    // ... many more variants
}

And we want to extract just the variant names to get something like:

pub enum TokenInstruction<'a> {
    InitializeMint {},
    InitializeAccount {},
    // ... other variants
}

Solution 1: Using Rust with syn

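The first approach uses the syn crate to parse the source file into an AST (abstract syntax tree) and then walks the top-level items looking for enums. The program depends on syn (with its full feature enabled, which parse_file requires) and quote. Here's how we can do it: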

use quote::quote;
use std::env;
use std::fs;
use syn::Item;

fn main() {
    let args: Vec<String> = env::args().collect();
    if args.len() != 2 {
        eprintln!("Usage: {} <rust_file>", args[0]);
        std::process::exit(1);
    }

    let file_path = &args[1];
    let content = match fs::read_to_string(file_path) {
        Ok(content) => content,
        Err(e) => {
            eprintln!("Error reading file: {}", e);
            std::process::exit(1);
        }
    };

    let ast = match syn::parse_file(&content) {
        Ok(ast) => ast,
        Err(e) => {
            eprintln!("Error parsing Rust file: {}", e);
            std::process::exit(1);
        }
    };

    // Iterate AST to search for enum
    for item in ast.items {
        if let Item::Enum(item_enum) = item {
            let enum_name = &item_enum.ident;
            let generics = &item_enum.generics;

            // print enum header part
            println!("pub enum {}{} {{", enum_name, quote!(#generics));

            // print each variant
            for variant in &item_enum.variants {
                let variant_name = &variant.ident;
                println!("    {} {{}},", variant_name);
            }

            // print end
            println!("}}");
        }
    }
}

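Because syn parses the file into a real syntax tree, this approach is more robust than text matching: comments, attributes, and nested braces cannot be mistaken for variants, and the enum's generic parameters are preserved. Two caveats: the loop only inspects top-level items, so enums declared inside modules are skipped, and quote! prints tokens separated by spaces, so the generics may appear as < 'a > rather than <'a>, which is still valid Rust.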

Solution 2: Using Python with Regex

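For a simpler but less robust approach, we can use Python with regular expressions. The script below handles only the first pub enum in the file and recognizes unit and struct-like variants; tuple variants such as Transfer(u64) are not matched: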

import sys
from pathlib import Path
import re

def clean_rust_code(content):
    # remove block comments (/* ... */); nested block comments are not handled
    content = re.sub(r'/\*.*?\*/', '', content, flags=re.DOTALL)
    # remove line comments, including /// doc comments
    content = re.sub(r'//.*?\n', '\n', content)
    return content

def extract_enum_variants(file_path):
    with open(file_path, 'r') as f:
        content = f.read()

    # clean comments
    content = clean_rust_code(content)

    # extract enum name and generic parameter
    enum_pattern = re.compile(r'pub\s+enum\s+(\w+)(<.*?>)?')
    enum_match = enum_pattern.search(content)
    if not enum_match:
        print("No enum found")
        return

    enum_name = enum_match.group(1)
    enum_generic = enum_match.group(2) or ''

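    # extract variants, starting after the enum header so that the enum name
    # and any code before it are not mistaken for variants
    variant_pattern = re.compile(r'\s+(\w+)(?:\s*{[^}]*}|\s*,)')
    variants = variant_pattern.findall(content, enum_match.end())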

    # build the output
    output = [f"pub enum {enum_name}{enum_generic} {{"]
    for variant in variants:
        output.append(f"    {variant} {{}},")
    output.append("}")

    print("\n".join(output))

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python extract_enum.py <file>")
        sys.exit(1)

    file_path = sys.argv[1]
    if not Path(file_path).exists():
        print(f"File {file_path} does not exist")
        sys.exit(1)

    extract_enum_variants(file_path)
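
To make the limits of the variant pattern concrete, here is a minimal sketch that runs the same regex over a small enum body. The Shape-style variants are made up for illustration and are not part of TokenInstruction.

```python
import re

# Same variant pattern as in the script above.
variant_pattern = re.compile(r'\s+(\w+)(?:\s*{[^}]*}|\s*,)')

# Hypothetical enum body (only the text between the outer braces).
body = """
    Circle { radius: f64 },
    Rect(f64, f64),
    Triple(u8, u16, u32),
    Empty,
"""

found = variant_pattern.findall(body)
print(found)
```

The struct-like variant Circle and the unit variant Empty are found, but Rect and Triple are missed because the pattern has no case for parentheses. Worse, the middle field type u16 is reported as if it were a variant, because it is preceded by whitespace and followed by a comma.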

Comparing the Approaches

Rust with syn
  Pros:
    • Properly handles Rust syntax
    • Maintains generic parameters
    • More reliable for complex enums
  Cons:
    • Requires additional dependencies
    • More complex implementation

Python with regex
  Pros:
    • Simpler implementation
    • No Rust-specific dependencies
    • Faster to implement
  Cons:
    • Less robust
    • May break with complex Rust syntax
    • Regex patterns might need adjustment for edge cases
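
Some of the regex fragility listed above can be reduced, without going as far as a full parser, by splitting the enum body on top-level commas instead of pattern-matching each variant. The sketch below is one possible way to do that, not part of the original script: split_top_level and variant_name are hypothetical helper names, and it assumes comments have already been stripped and that no string literal or attribute contains unbalanced brackets.

```python
import re

def split_top_level(body):
    """Split an enum body on commas that are not nested in (), [] or {}."""
    chunks, depth, start = [], 0, 0
    for i, ch in enumerate(body):
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        elif ch == "," and depth == 0:
            chunks.append(body[start:i])
            start = i + 1
    chunks.append(body[start:])  # last variant may lack a trailing comma
    return [c.strip() for c in chunks if c.strip()]

def variant_name(chunk):
    """Return the leading identifier of a variant, skipping #[...] attributes."""
    # Assumption: attributes contain no nested square brackets.
    chunk = re.sub(r'#\[[^\]]*\]', '', chunk).strip()
    match = re.match(r'\w+', chunk)
    return match.group(0) if match else None

# Hypothetical enum body (only the text between the outer braces).
body = """
    Circle { radius: f64 },
    Rect(f64, f64),
    Triple(u8, u16, u32),
    #[allow(dead_code)]
    Empty,
"""

names = [variant_name(c) for c in split_top_level(body)]
print(names)
```

Tracking bracket depth means commas inside field lists and attributes no longer split a variant, so tuple variants are picked up alongside unit and struct-like ones. It is still text processing rather than parsing, so the syn approach remains the safer choice for anything beyond a quick script.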

Conclusion

Both approaches extract enum variants; which one to use depends on your needs:

  • Use the Rust approach for production code or when dealing with complex Rust syntax
  • Use the Python approach for quick scripts or simple enums

Remember that the Rust approach using syn is generally more reliable as it properly parses the Rust syntax tree, while the Python regex approach is more suitable for quick, one-off tasks.