Lexical Analysis

Overview

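Lexical analysis (also called scanning) is the first phase of compilation. It reads the source text as a stream of characters and groups them into tokens, which the parser consumes in the next phase.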

Concepts

  • Tokens: Smallest meaningful units (keywords, identifiers, operators)
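  • Lexemes: The actual character sequences in the source that match a token's pattern (for example, the lexeme "count" for an identifier token)
  • Regular Expressions: Patterns that specify which lexemes belong to each token class
  • Finite Automata: Deterministic (DFA) and nondeterministic (NFA) machines that recognize the languages regular expressions describe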
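
To make the link between patterns and automata concrete, here is a minimal sketch of a table-driven DFA that decides whether a whole lexeme is an identifier or an integer. The state names, the character classes, and the classify function are illustrative assumptions, not part of any particular compiler.

```python
import string

def char_class(ch):
    """Map a character to the input class used by the transition table."""
    if ch in string.ascii_letters or ch == "_":
        return "letter"
    if ch in string.digits:
        return "digit"
    return "other"

# Transition table: (state, character class) -> next state.
# A missing entry means the DFA has no move and rejects the input.
TRANSITIONS = {
    ("start", "letter"): "ident",
    ("start", "digit"): "int",
    ("ident", "letter"): "ident",
    ("ident", "digit"): "ident",
    ("int", "digit"): "int",
}

# Accepting states and the token kind each one yields.
ACCEPTING = {"ident": "IDENTIFIER", "int": "INTEGER"}

def classify(lexeme):
    """Run the DFA over the whole lexeme; return a token kind or None."""
    state = "start"
    for ch in lexeme:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return None  # no transition for this character: reject
    return ACCEPTING.get(state)  # None if we stopped in a non-accepting state
```

The same two token classes correspond to the regular expressions [A-Za-z_][A-Za-z0-9_]* and [0-9]+. An input such as "4x" is rejected because the "int" state has no transition on a letter.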

Tools and Generators

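  • Lex/Flex: Generators that turn a specification of patterns and actions into a scanner
  • Regular expression patterns: The notation used to write the rules in a generator specification
  • Symbol tables: Store identifiers and their attributes; the lexer often enters names as it encounters them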
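
The following sketch imitates the shape of a generator specification (an ordered list of pattern rules) and shows one possible way a lexer can populate a symbol table. It uses Python's re module rather than Lex or Flex, and the rule names, the tokenize function, and the dictionary-based symbol table are assumptions made for illustration. One difference to keep in mind: Flex chooses the longest match and uses rule order only to break ties, whereas Python's alternation takes the first alternative that matches, so the order of alternatives matters more here.

```python
import re

# Ordered (token kind, pattern) rules, in the spirit of a Lex/Flex rule section.
RULES = [
    ("NUMBER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OPERATOR", r"==|[-+*/=]"),  # "==" is listed before "=" on purpose
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in RULES))

def tokenize(source, symbols):
    """Return a token list; identifiers are replaced by symbol-table indices."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue  # whitespace produces no token
        if kind == "IDENTIFIER":
            # First occurrence gets the next free index; later ones reuse it.
            index = symbols.setdefault(lexeme, len(symbols))
            tokens.append((kind, index))
        else:
            tokens.append((kind, lexeme))
    return tokens
```

With an empty dictionary as the symbol table, tokenizing "x = x + y1" yields two IDENTIFIER tokens with index 0 for x and one with index 1 for y1, and the table ends up as {"x": 0, "y1": 1}.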

Common Tasks

  • Keyword recognition
  • Identifier validation
  • String and character literal handling
  • Comment removal
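
The tasks above can be sketched in a single small scanner. This is one possible approach, assuming C-style comments and literals and a hypothetical keyword set. Keywords are recognized by first matching a generic word and then looking it up, which avoids mistaking a name like "iffy" for the keyword "if".

```python
import re

KEYWORDS = {"if", "else", "while", "return"}  # hypothetical keyword set

RULES = [
    ("COMMENT", r"//[^\n]*|/\*[\s\S]*?\*/"),    # line comment or block comment
    ("STRING", r'"(?:\\.|[^"\\\n])*"'),         # double-quoted, with escapes
    ("CHAR", r"'(?:\\.|[^'\\\n])'"),            # one character or one escape
    ("WORD", r"[A-Za-z_][A-Za-z0-9_]*"),        # keyword or identifier
    ("NUMBER", r"\d+"),
    ("PUNCT", r"[-+*/=<>!;(){}]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in RULES))

def tokenize(source):
    """Return (kind, lexeme) pairs with comments and whitespace dropped."""
    tokens = []
    # Note: finditer silently skips characters that no rule matches;
    # reporting them is left out of this sketch.
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind in ("SKIP", "COMMENT"):
            continue  # comment removal: nothing is passed to the parser
        if kind == "WORD":
            # Keyword recognition by table lookup after matching the word.
            kind = "KEYWORD" if lexeme in KEYWORDS else "IDENTIFIER"
        tokens.append((kind, lexeme))
    return tokens
```

The COMMENT rule is listed before PUNCT so that "//" and "/*" are not split into two division operators. The STRING rule treats a backslash followed by any character as a unit, so an escaped quote does not end the literal.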

Error Handling

  • Unexpected character detection
  • Error recovery strategies (for example, panic mode: skip characters until a valid token can be formed)
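
As a rough illustration of both points, the sketch below reports each unexpected character with its line and column, then recovers by skipping that character and resuming the scan, a simple form of panic-mode recovery. The token set and the tokenize function are assumptions chosen to keep the example short.

```python
import re

TOKEN_RE = re.compile(
    r"(?P<NUMBER>\d+)"
    r"|(?P<IDENTIFIER>[A-Za-z_][A-Za-z0-9_]*)"
    r"|(?P<OPERATOR>[-+*/=])"
    r"|(?P<SKIP>[ \t]+)"
    r"|(?P<NEWLINE>\n)"
)

def tokenize(source):
    """Return (tokens, errors); errors are (line, column, character) triples."""
    tokens, errors = [], []
    pos, line, line_start = 0, 1, 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            # Unexpected character: record where it is, skip it, keep going.
            errors.append((line, pos - line_start + 1, source[pos]))
            pos += 1
            continue
        kind = match.lastgroup
        if kind == "NEWLINE":
            line += 1
            line_start = match.end()  # columns restart after the newline
        elif kind != "SKIP":
            tokens.append((kind, match.group()))
        pos = match.end()
    return tokens, errors
```

Collecting errors instead of stopping at the first one lets a single run report every bad character. For the input "x = 1 $ 2" followed by a newline and "y @ 3", the error list is [(1, 7, "$"), (2, 3, "@")] and the six valid tokens are still returned.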