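I'm writing a compiler for a C-like language that has to support the #include directive (only at the beginning of a file).
A simple but inelegant approach would be to write a subprogram that finds every occurrence of the directive and substitutes the contents of the corresponding file into a new temporary file.
That is not nice at all, so I tried the following: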
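lexer = parse
  | "#include \"" ( [^'"' '\n']* as filename) '"'
    { lexer (Lexing.from_channel (open_in filename)) ; lexer lexbuf }
The idea was: whenever an include is found, open a new channel on the given filename and recursively call the "lexer" rule on that channel. After that, resume lexing with the current state of the original lexing buffer.
The problem is that it never worked.
I have also seen that it is possible to supply a refill function that is called when the buffer lexbuf reaches eof, but I could not find more information about it. That gave me the idea of changing the code above to the following: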
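lexer = parse
  | "#include \"" ( [^'"' '\n']* as filename) '"'
    { addCurrentLexBufToAStack lexbuf ; lexer (Lexing.from_channel (open_in filename)) }
and in the refill function you would continue from the top of the stack.
But that seems too ambitious to work.
Any ideas?
P.S. The lexer (and the parser too) is called from another module (let's call it Main.ml).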
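For reference, the stack idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the token type, the source type (a mutable token list standing in for a Lexing.lexbuf), and the names raw_token, make_token, open_source and files are all hypothetical and not part of ocamllex.

```ocaml
(* Hypothetical token type; a real one would come from the generated Parser. *)
type token = INCLUDE of string | INTEGER of int | EOF

(* Assumption: a "source" stands in for a Lexing.lexbuf. It is just a
   mutable list of the tokens that are still pending. *)
type source = token list ref

(* Stand-in for the ocamllex-generated rule: return the next raw token. *)
let raw_token (src : source) : token =
  match !src with
  | [] -> EOF
  | t :: rest -> src := rest; t

(* Build a token function that follows includes with a stack of sources.
   [open_source] is an assumed helper mapping a file name to a fresh source. *)
let make_token (open_source : string -> source) (main : source) : unit -> token =
  let stack = Stack.create () in
  Stack.push main stack;
  let rec next () =
    match raw_token (Stack.top stack) with
    | INCLUDE f ->
      Stack.push (open_source f) stack;  (* descend into the included file *)
      next ()
    | EOF when Stack.length stack > 1 ->
      ignore (Stack.pop stack);          (* included file done: resume the includer *)
      next ()
    | tok -> tok                         (* ordinary token, or EOF of the main file *)
  in
  next

(* In-memory "files", used only for this demonstration. *)
let files = [
  "a", [INCLUDE "b"; INTEGER 3];
  "b", [INTEGER 1; INTEGER 2];
]
let open_source name = ref (List.assoc name files)

(* Drain the token stream of file "a", including the final EOF. *)
let all_tokens () =
  let next = make_token open_source (open_source "a") in
  let rec loop acc =
    match next () with
    | EOF -> List.rev (EOF :: acc)
    | t -> loop (t :: acc)
  in
  loop []
```

With a real ocamllex rule, raw_token would be the generated rule and each source a lexbuf. The parser would then receive a function that ignores its lexbuf argument and calls next (), so the positions it reports would likely not be reliable.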
Posted on 2016-04-26 12:44:00
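Well, aren't you a bit confused about lexing and parsing?
What I see is this:
If my lexeme is #include ident, I want to parse what is in the file pointed to by ident and add it.
So you are confusing parsing and lexing.
You could write something like this (it's a small program, but it works ;-)):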
ast.mli
type operation =
| Plus of operation * operation
| Minus of operation * operation
| Int of int
type prog = string list * operation list
lexer.mll
{
open Parser
open Lexing
open Ast
let current_pos b =
lexeme_start_p b,
lexeme_end_p b
}
let newline = '\n'
let space = [' ' '\t' '\r']
let digit = ['0' - '9']
let integer = digit+
rule token = parse
| newline { token lexbuf}
| space+ { token lexbuf}
| "#include \"" ( [^'"' '\n']* as filename) '"' { INCLUDE filename }
| integer as i { INTEGER (int_of_string i) }
| "+" { PLUSI }
| "-" { MINUSI }
| ";" { SC }
| "main" { MAIN }
| eof { EOF }
parser.mly
%{
open Ast
%}
%token <string> INCLUDE
%token EOF SC
%token PLUSI
%token MINUSI
%token MAIN
%token <int> INTEGER
%left PLUSI MINUSI
%start <Ast.prog> prog
%%
prog:
include_list MAIN operations EOF { ($1, $3) }
include_list:
| { [] }
| INCLUDE include_list { $1 :: $2 }
operation:
| operation PLUSI operation { Plus ($1, $3) }
| operation MINUSI operation { Minus ($1, $3) }
| INTEGER { Int $1 }
operations:
| operation { [$1] }
| operation SC operations { $1 :: $3 }
So, as you can see, while parsing I just remember the names of the files I still have to parse.
main.ml
open Lexing
open Ast
let rec print_op fmt op =
  match op with
  | Plus (op1, op2) ->
    Format.fprintf fmt "(%a + %a)"
      print_op op1 print_op op2
  | Minus (op1, op2) ->
    Format.fprintf fmt "(%a - %a)"
      print_op op1 print_op op2
  | Int i -> Format.fprintf fmt "%d" i
(* Parse each file, then the files it includes (depth first), so that the
   operations of an included file come before those of the including file. *)
let rec read_includes fl =
  List.fold_left (fun acc f ->
      let c = open_in f in
      let lb = Lexing.from_channel c in
      let fl, p = Parser.prog Lexer.token lb in
      close_in c;
      let acc' = read_includes fl in
      (* keep the results of the files handled earlier in this list too *)
      acc @ acc' @ p
    ) [] fl
let () =
  try
    let p = read_includes [Sys.argv.(1)] in
    List.iter (Format.eprintf "%a@." print_op) p
  with _ -> Format.eprintf "Bad Boy !@."
This means that once I have finished parsing the first file, I parse the files it includes.
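The important thing is the confusion you had about lexing. The lexer is the dumbest part of a compiler: you just ask it "What is the next token you see?" and it answers "I see #include "filename"". The parser is the less dumb one, and says: "Hey, the lexer saw #include "filename", so I will keep this filename because I may need it, and I will keep going."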
If I have these three files
file1
#include "file2"
main
6; 7
file2
#include "file3"
main
4; 5
file3
main
1; 2; 3
and I call ./compile file1, I get 1 2 3 4 5 6 7 as output (one number per line), which is what I want. ;-)
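The order of that output can be checked without any files on disk. The sketch below is only a stand-in built on assumptions: parse_file replaces Parser.prog with a lookup in a hypothetical in-memory table named parsed, and operations are plain ints rather than Ast.operation values.

```ocaml
(* Assumed stand-in for the parser's result: for each file name,
   the list of included files and the list of operations. *)
let parsed = [
  "file1", (["file2"], [6; 7]);
  "file2", (["file3"], [4; 5]);
  "file3", ([], [1; 2; 3]);
]

(* Hypothetical replacement for "open the file and run Parser.prog on it". *)
let parse_file name = List.assoc name parsed

(* Depth-first expansion: everything a file includes comes before
   the operations of the file itself. *)
let rec read_includes files =
  List.concat
    (List.map
       (fun f ->
          let includes, ops = parse_file f in
          read_includes includes @ ops)
       files)

let result = read_includes ["file1"]
```

Here result is [1; 2; 3; 4; 5; 6; 7]: file3 is expanded first because file2 includes it, and file1's own operations come last.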
EDIT
New version, with the includes handled by the lexer:
ast.mli
type operation =
| Plus of operation * operation
| Minus of operation * operation
| Int of int
type prog = operation list
lexer.mll
{
open Parser
let fset = Hashtbl.create 17
(* set keeping all the filenames *)
}
let newline = '\n'
let space = [' ' '\t' '\r']
let digit = ['0' - '9']
let integer = digit+
rule token = parse
| newline { token lexbuf}
| space+ { token lexbuf}
| "#include \"" ( [^'"' '\n']* as filename) '"'
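  { if Hashtbl.mem fset filename then
      (* this file was already included: refuse, to avoid include loops *)
      raise Exit
    else
      let c = open_in filename in
      Hashtbl.add fset filename ();
      let lb = Lexing.from_channel c in
      (* parse the included file right away, with its own lexbuf *)
      let p = Parser.prog token lb in
      close_in c;
      INCLUDE p
  }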
| integer as i { INTEGER (int_of_string i) }
| "+" { PLUSI }
| "-" { MINUSI }
| ";" { SC }
| "main" { MAIN }
| eof { EOF }
parser.mly
%{
open Ast
%}
%token <Ast.prog> INCLUDE
%token EOF SC
%token PLUSI
%token MINUSI
%token MAIN
%token <int> INTEGER
%left PLUSI MINUSI
%start <Ast.prog> prog
%%
prog:
include_list MAIN operations EOF { List.rev_append (List.rev $1) $3 }
include_list:
| { [] }
| INCLUDE include_list { List.rev_append (List.rev $1) $2 }
operation:
| operation PLUSI operation { Plus ($1, $3) }
| operation MINUSI operation { Minus ($1, $3) }
| INTEGER { Int $1 }
operations:
| operation { [$1] }
| operation SC operations { $1 :: $3 }
main.ml
open Lexing
open Ast
let rec print_op fmt op =
  match op with
  | Plus (op1, op2) ->
    Format.fprintf fmt "(%a + %a)"
      print_op op1 print_op op2
  | Minus (op1, op2) ->
    Format.fprintf fmt "(%a - %a)"
      print_op op1 print_op op2
  | Int i -> Format.fprintf fmt "%d" i
let () =
  try
    let c = open_in Sys.argv.(1) in
    let lb = Lexing.from_channel c in
    let p = Parser.prog Lexer.token lb in
    close_in c;
    List.iter (Format.eprintf "%a@." print_op) p
  with _ -> Format.eprintf "Bad Boy !@."
So, in the lexer, when I see a #include "filename", I immediately call the parser on the file designated by filename and return the resulting Ast.prog to the previous parsing call.
I hope all of this is clear to you ;-)
SECOND EDIT
I couldn't leave the code like that, so I edited it to avoid include loops (in lexer.mll) ;-)
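As a side note on include loops: the fset table in lexer.mll never forgets a file name, so it also rejects a file that is legitimately included twice from different places. A possible variation (not the code above, just a sketch under assumptions) tracks only the files currently being expanded. The names Include_cycle, expand and lookup are hypothetical, and lookup stands in for parsing a file.

```ocaml
(* Hypothetical exception carrying the name of the file that closes the loop. *)
exception Include_cycle of string

(* [lookup] is an assumed stand-in for parsing: it returns the included
   file names and the operations (plain ints here) of a file. *)
let expand (lookup : string -> string list * int list) (root : string) : int list =
  (* files whose expansion has started but not finished yet *)
  let in_progress = Hashtbl.create 17 in
  let rec go name =
    if Hashtbl.mem in_progress name then raise (Include_cycle name);
    Hashtbl.add in_progress name ();
    let includes, ops = lookup name in
    let result = List.concat (List.map go includes) @ ops in
    (* finished: the same file may be included again from elsewhere *)
    Hashtbl.remove in_progress name;
    result
  in
  go root

(* Demonstration data: "c" is included twice, which is not a loop. *)
let diamond = ["a", (["b"; "c"], [3]); "b", (["c"], [2]); "c", ([], [1])]

(* Demonstration data: "x" and "y" include each other. *)
let loop = ["x", (["y"], []); "y", (["x"], [])]

let ok = expand (fun n -> List.assoc n diamond) "a"
let cyclic =
  try ignore (expand (fun n -> List.assoc n loop) "x"); false
  with Include_cycle _ -> true
```

In this sketch ok is [1; 2; 1; 3] and cyclic is true. Whether a repeated include should be expanded twice or skipped is a language design choice that the original answer does not address.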
https://stackoverflow.com/questions/36862521