Parsing Text with the Parser Package

Learn how to parse structured text in Go using the text/scanner package

Go provides a robust way to parse structured text through the text/scanner package. This is particularly useful when you need a lightweight way to process text or create simple interpreters.

Basic Text Scanning

Here's a simple example demonstrating how to use the text/scanner package to tokenize an input string.

package main

import (
	"fmt"
	"text/scanner"
	"strings"
)

func main() {
	var s scanner.Scanner
	s.Init(strings.NewReader("42 + (1337 / foo)"))
	
	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		fmt.Printf("%s: %q\n", s.Position, s.TokenText())
	}
}

Handling Tokens with Scanning

You can further manipulate how tokens are handled using the text/scanner package:

package main

import (
	"fmt"
	"text/scanner"
	"strings"
)

func main() {
	var s scanner.Scanner
	s.Init(strings.NewReader("var x = 23; if x > 10 { x = x + 1 }"))
	s.Mode ^= scanner.SkipComments // include comments when scanning

	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		switch tok {
		case scanner.Ident:
			fmt.Printf("Identifier: %s\n", s.TokenText())
		case scanner.Int:
			fmt.Printf("Integer: %s\n", s.TokenText())
		default:
			fmt.Printf("Other: %s\n", s.TokenText())
		}
	}
}

Configuring Scanner Parameters

The scanner.Scanner type offers several configuration options:

  • Mode: Defines what types of tokens to recurse for.
  • Position: Allows detailed information about token positions.

Best Practices

  • Utilize s.Init() with different types of readers based on your input source.
  • Configure Mode to customize tokenization strategies to best suit your text parsing needs.
  • Make use of s.Error to handle syntax errors gracefully rather than crashing the application.

Common Pitfalls

  • Ignoring scanner errors can lead to silent parsing failures.
  • Not considering whitespace or comments that might affect parsing.
  • Assuming the same token type for all similar tokens (e.g., identifiers vs reserved words).

Performance Tips

  • Reuse scanner.Scanner to avoid repeated allocations, especially in loops or large-scale text processing.
  • Adjust Mode to skip undesired tokens to reduce overhead during scanning.
  • Profile your parsing logic to identify bottlenecks if dealing with large texts or complex parsing logic.