Q: Lexing and include directive with ocamllex

Stack Overflow user

Asked 2016-04-26 10:39:48

Answers: 1 · Views: 513 · Followers: 0 · Votes: 1

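I'm writing a compiler for a C-like language that has to support the #include directive (only at the beginning of a file).

A simple but inelegant approach would be to write a subprogram that finds every occurrence of the directive and substitutes the contents of the corresponding file into a new temporary file.

That is not nice at all, so I tried the following: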

rule lexer = parse
    | "#include \""   ( [^'"' '\n']* as filename) '"'
    { lexer (Lexing.from_channel (open_in filename)) ; lexer lexbuf }

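The idea was this: whenever an include is found, open a new channel on the given filename and recursively call the "lexer" rule on that channel. After that, resume from the current state of the original lexing buffer and carry on lexing.

The problem is that it never worked.

I have also seen that it is possible to supply a refill function that is called when the buffer lexbuf reaches eof, but I could not find more information about it. That gave me the idea of changing the code above to the following: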

rule lexer = parse
    | "#include \""   ( [^'"' '\n']* as filename) '"'
    { addCurrentLexBufToAStack lexbuf ; lexer (Lexing.from_channel (open_in filename)); }

Then, in the refill function, you would continue from the top of the stack,

but that seems too ambitious to actually work.
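A minimal sketch of the stack idea, written at the token level rather than inside a refill function. It assumes a hypothetical `token` type and two hypothetical parameters: `raw`, standing in for an ocamllex-generated rule that returns one token per call and reports `INCLUDE name` instead of acting on it, and `open_buf`, standing in for something like `fun f -> Lexing.from_channel (open_in f)`. None of these names come from the original post. To keep the example runnable without ocamllex, the demo uses in-memory token lists as buffers.

```ocaml
(* Hypothetical token type, loosely modelled on the tokens used in this thread. *)
type token = INCLUDE of string | INTEGER of int | SC | MAIN | EOF

(* [make_token ~open_buf ~raw root] returns a "next token" function that
   follows includes by keeping a stack of suspended buffers. *)
let make_token ~open_buf ~raw root =
  let stack = ref [ root ] in
  let rec next () =
    match !stack with
    | [] -> EOF
    | buf :: rest ->
      (match raw buf with
       | INCLUDE name ->
         (* Suspend the current buffer and start reading the included one. *)
         stack := open_buf name :: !stack;
         next ()
       | EOF ->
         (match rest with
          | [] -> EOF (* the root buffer is exhausted: real end of input *)
          | _ ->
            (* An included buffer is exhausted: resume the one below it. *)
            stack := rest;
            next ())
       | tok -> tok)
  in
  next

(* In-memory demo: a "buffer" is just a mutable list of tokens. *)
let demo () =
  let files = [ ("b", [ INTEGER 1; SC; INTEGER 2; SC ]) ] in
  let open_buf name = ref (List.assoc name files) in
  let raw buf = match !buf with [] -> EOF | t :: ts -> buf := ts; t in
  let next = make_token ~open_buf ~raw (ref [ INCLUDE "b"; INTEGER 3 ]) in
  let rec drain acc =
    match next () with EOF -> List.rev acc | t -> drain (t :: acc)
  in
  drain []
```

With a generated parser, the wrapper would presumably be passed as `fun _lexbuf -> next ()`. One caveat under that assumption: the parser would read positions from the lexbuf it was given, not from the included one, so error locations would need separate handling.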

Any ideas?

P.S. The lexer (and the parser too) is called from another module (let's call it Main.ml).

Tags: ocaml, lex, ocamllex

1 Answer

Stack Overflow user

Accepted answer

Answered 2016-04-26 12:44:00

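Well, aren't you a bit confused about lexing and parsing?

What I read is this:

If my lexeme is #include ident, I want to parse what is in the file pointed to by ident and add it.

That mixes up parsing and lexing.

You could write something like this (it's a small program, but it works ;-)):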

ast.mli

type operation = 
 | Plus of operation * operation 
 | Minus of operation * operation
 | Int of int

type prog = string list * operation list

lexer.mll

{
  open Parser
  open Lexing
  open Ast

  let current_pos b =
    lexeme_start_p b,
    lexeme_end_p b

}

let newline = '\n'
let space = [' ' '\t' '\r']

let digit = ['0' - '9']
let integer = digit+

rule token = parse
| newline { token lexbuf}
| space+ { token lexbuf}
| "#include \""   ( [^'"' '\n']* as filename) '"' { INCLUDE filename } 
| integer as i { INTEGER (int_of_string i) }
| "+" { PLUSI }
| "-" { MINUSI }
| ";" { SC }
| "main" { MAIN }
| eof
    { EOF }

parser.mly

%{

  open Ast

%}

%token <string> INCLUDE
%token EOF SC
%token PLUSI 
%token MINUSI
%token MAIN
%token <int> INTEGER

%left PLUSI MINUSI

%start <Ast.prog> prog

%%

prog:
include_list MAIN operations EOF { ($1, $3) }

include_list:
| { [] }
| INCLUDE include_list { $1 :: $2 }

operation:
| operation PLUSI operation { Plus ($1, $3) }
| operation MINUSI operation { Minus ($1, $3) }
| INTEGER { Int $1 }

operations:
| operation { [$1] }
| operation SC operations { $1 :: $3 }

So, as you can see, while parsing I just remember the names of the files that I still have to parse.

main.ml

open Lexing
open Ast

let rec print_op fmt op =
  match op with
    | Plus (op1, op2) ->
      Format.fprintf fmt "(%a + %a)"
        print_op op1 print_op op2
    | Minus (op1, op2) ->
      Format.fprintf fmt "(%a - %a)"
        print_op op1 print_op op2
    | Int i -> Format.fprintf fmt "%d" i

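(* Parse each file in [fl], then recursively parse the files it includes.
   Operations from included files come before the including file's own. *)
let rec read_includes fl =
  List.fold_left (fun acc f ->
    let c = open_in f in
    let lb = Lexing.from_channel c in
    let includes, p = Parser.prog Lexer.token lb in
    close_in c;
    let included = read_includes includes in
    (* append to [acc] so that results from earlier files in [fl] are kept *)
    acc @ included @ p
  ) [] fl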

let () =
  try
    let p = read_includes [Sys.argv.(1)] in
    List.iter (Format.eprintf "%a@." print_op) p
  with _ -> Format.eprintf "Bad Boy !@."

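This means that once I have finished parsing the first file, I parse the files it includes.

The most important point is your confusion about lexing. The lexer is the dumbest part of a compiler: you just ask it "What is the next token you see?" and it answers "I see #include "filename"". The parser is not that dumb; it says "Hey, the lexer saw #include "filename", so I will remember this filename because I may need it, and I will keep going."

And if I have these three files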

file1

#include "file2"
main 
6; 7

file2

#include "file3"
main 
4; 5

file3

main 
1; 2; 3

If I call ./compile file1, I get the output 1 2 3 4 5 6 7 (one operation per line), which is what I want. ;-)
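The order of that output can be traced with a small self-contained sketch. It assumes a hypothetical `parse_file` that stands in for opening, lexing and parsing one file, and returns the same (includes, operations) pair that `Parser.prog` produces in this version. The operations are reduced to plain integers, and the three files above are encoded as an association list.

```ocaml
(* Hypothetical stand-in for "lex and parse one file": each name maps to
   the (includes, operations) pair that the parser would return. *)
let files =
  [ ("file1", ([ "file2" ], [ 6; 7 ]));
    ("file2", ([ "file3" ], [ 4; 5 ]));
    ("file3", ([], [ 1; 2; 3 ])) ]

let parse_file name = List.assoc name files

(* For each file, the operations of its includes come first,
   followed by the file's own operations. *)
let rec read_includes names =
  List.concat
    (List.map
       (fun name ->
          let includes, ops = parse_file name in
          read_includes includes @ ops)
       names)

let result = read_includes [ "file1" ]
(* result = [1; 2; 3; 4; 5; 6; 7] *)
```

Because every file is fully parsed before its includes are visited, this scheme only suits languages where an included file does not affect how the including file is parsed.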

EDIT

New version, with the lexer handling includes:

ast.mli

type operation = 
  | Plus of operation * operation 
  | Minus of operation * operation
  | Int of int

type prog = operation list

lexer.mll

{
  open Parser
  let fset = Hashtbl.create 17
  (* set keeping all the filenames *)
}

let newline = '\n'
let space = [' ' '\t' '\r']

let digit = ['0' - '9']
let integer = digit+

rule token = parse
| newline { token lexbuf}
| space+ { token lexbuf}
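| "#include \""   ( [^'"' '\n']* as filename) '"' 
    { if Hashtbl.mem fset filename then
        (* this file was already included: stop rather than loop forever *)
        raise Exit
      else 
        let c = open_in filename in
        Hashtbl.add fset filename ();
        let lb = Lexing.from_channel c in
        (* parse the included file with a fresh lexbuf, then release the channel *)
        let p = Parser.prog token lb in
        close_in c;
        INCLUDE p
    }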
| integer as i { INTEGER (int_of_string i) }
| "+" { PLUSI }
| "-" { MINUSI }
| ";" { SC }
| "main" { MAIN }
| eof
    { EOF }

parser.mly

%{

  open Ast

%}

%token <Ast.prog> INCLUDE
%token EOF SC
%token PLUSI 
%token MINUSI
%token MAIN
%token <int> INTEGER

%left PLUSI MINUSI

%start <Ast.prog> prog

%%

prog:
include_list MAIN operations EOF { List.rev_append (List.rev $1) $3  }

include_list:
| { [] }
| INCLUDE include_list { List.rev_append (List.rev $1) $2 }

operation:
| operation PLUSI operation { Plus ($1, $3) }
| operation MINUSI operation { Minus ($1, $3) }
| INTEGER { Int $1 }

operations:
| operation { [$1] }
| operation SC operations { $1 :: $3 }

main.ml

open Lexing
open Ast

let rec print_op fmt op =
  match op with
    | Plus (op1, op2) ->
      Format.fprintf fmt "(%a + %a)"
        print_op op1 print_op op2
    | Minus (op1, op2) ->
      Format.fprintf fmt "(%a - %a)"
        print_op op1 print_op op2
    | Int i -> Format.fprintf fmt "%d" i

let () =
  try
    let c = open_in Sys.argv.(1) in
    let lb = Lexing.from_channel c in
    let p = Parser.prog Lexer.token lb in
    close_in c;
    List.iter (Format.eprintf "%a@." print_op) p
  with _ -> Format.eprintf "Bad Boy !@."

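So, in the lexer, when I see #include filename, I immediately call the parser on the file named by filename and return the parsed Ast.prog to the parsing call that was already in progress.

I hope this is all clear to you ;-)

SECOND EDIT

I couldn't leave the code like that, so I edited it to avoid include loops (in lexer.mll) ;-)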
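The guard in lexer.mll rejects any file that has been seen before, which also rules out including the same file twice from two different places. A possible variation, sketched here under stated assumptions, tracks only the files that are currently being processed, so true cycles are rejected while repeated includes are still allowed. The names `expand`, `Include_cycle` and the in-memory `files` table are hypothetical and not part of the answer's code; each file is modelled as an (includes, operations) pair with integer operations.

```ocaml
exception Include_cycle of string

(* [expand files root] flattens [root] and everything it includes.
   [files] maps a file name to its (includes, operations) pair. *)
let expand files root =
  (* Names of the files on the current include chain. *)
  let in_progress = Hashtbl.create 17 in
  let rec go name =
    if Hashtbl.mem in_progress name then raise (Include_cycle name);
    Hashtbl.add in_progress name ();
    let includes, ops = List.assoc name files in
    let result = List.concat (List.map go includes) @ ops in
    (* Done with this file: it may legitimately be included again later. *)
    Hashtbl.remove in_progress name;
    result
  in
  go root

(* "a" includes "b" and "c", which both include "d": allowed. *)
let diamond =
  [ ("a", ([ "b"; "c" ], [ 1 ])); ("b", ([ "d" ], [ 2 ]));
    ("c", ([ "d" ], [ 3 ])); ("d", ([], [ 4 ])) ]

(* "a" includes "b", which includes "a" again: rejected. *)
let cyclic = [ ("a", ([ "b" ], [ 1 ])); ("b", ([ "a" ], [ 2 ])) ]

let cycle_detected =
  try ignore (expand cyclic "a"); false with Include_cycle _ -> true
```

In this sketch a file included twice contributes its operations twice, as a textual include would; whether that is desirable depends on the language being compiled.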

Votes: 4

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/36862521
