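I am parsing a file with pyparsing. It works well, but I think the processing time could be reduced by using the OnlyOnce class instead of OneOrMore in the line "parse_file = pp.OneOrMore(dbuPerMicron | diearea | components) + pp.StringEnd()". After the COMPONENTS section of the DEF file there are more sections that are useless to me, and the parser takes a long time to get through those lines. When I use OnlyOnce in parse_file, I get: "AttributeError: 'NoneType' object has no attribute 'searchString'".
I would appreciate your advice.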
import pyparsing as pp

def parse_def(self):
    with open("path_to.def", 'r') as ifile:
        def_string = ifile.read()
    EOL = pp.LineEnd().suppress()
    linebreak = pp.Suppress(";" + pp.LineEnd())
    identifier = pp.Word(pp.alphanums+'_!<>/')
    number = pp.Word(pp.nums + ".")
    word = pp.Word(pp.alphas)
    # UNITS DISTANCE MICRONS
    dbuPerMicron_id = pp.Keyword('UNITS DISTANCE MICRONS')
    dbuPerMicron = pp.Group(dbuPerMicron_id + number('UnitsPerMicron')).setResultsName('dbuPerMicron')
    # DIEAREA
    diearea_id = pp.Keyword('DIEAREA')
    diearea = pp.Group(pp.Suppress(diearea_id) + pp.OneOrMore(pp.Suppress('(') + number + number + pp.Suppress(')')) + pp.Suppress(linebreak)).setResultsName('DIEAREA')
    # COMPONENTS
    components_id = pp.Keyword('COMPONENTS')
    end_components = pp.Keyword("END COMPONENTS").suppress()
    begin_comp = pp.Keyword('-')
    ws_comp = pp.Keyword('+')  # parameter division in components
    comment = pp.Keyword('#')
    comp_name = identifier
    compName = (comp_name('comp_name') + identifier('cell')).setResultsName('compName')
    EEQMASTER = (pp.Suppress(ws_comp) + identifier('EEQMASTER') + identifier('macroName')).setResultsName('EEQMASTER')
    SOURCE = (pp.Suppress(ws_comp) + identifier('SOURCE') + identifier('source_type')).setResultsName('SOURCE')
    PLACEMENT_ids = pp.Keyword('FIXED') | pp.Keyword('COVER') | pp.Keyword('PLACED') | pp.Keyword('UNPLACED')
    PLACEMENT_coord = pp.Suppress('(') + number('placement_x') + number('placement_y') + pp.Suppress(')')
    PLACEMENT_orient = word('orientation')
    PLACEMENT = PLACEMENT_ids + pp.ZeroOrMore(PLACEMENT_coord + PLACEMENT_orient)
    PLACEMENT = (pp.Suppress(ws_comp) + PLACEMENT).setResultsName('PLACEMENT')
    HALO = (pp.Suppress(ws_comp) + pp.Keyword('HALO') + pp.ZeroOrMore(pp.Keyword('SOFT')) + number('haloL') + number('haloB') + number('haloR') + number('haloT')).setResultsName('HALO')
    ROUTEHALO = (pp.Suppress(ws_comp) + pp.Keyword('ROUTEHALO') + number('rhaloDist') + identifier('rhaloMinLayer') + identifier('rhaloMaxLayer')).setResultsName('ROUTEHALO')
    WEIGHT = (pp.Suppress(ws_comp) + pp.Keyword('WEIGHT') + number('weight')).setResultsName('WEIGHT')
    REGION = (pp.Suppress(ws_comp) + pp.Keyword('REGION') + identifier('region')).setResultsName('REGION')
    PROPERTY = (pp.Suppress(ws_comp) + pp.Keyword('PROPERTY') + identifier('propName') + identifier('propVal')).setResultsName('PROPERTY')
    subcomponent = pp.Group(pp.Suppress(begin_comp)
                            + pp.OneOrMore(compName)
                            + pp.ZeroOrMore(EEQMASTER)
                            + pp.ZeroOrMore(SOURCE)
                            + pp.OneOrMore(PLACEMENT)
                            + pp.ZeroOrMore(HALO)
                            + pp.ZeroOrMore(ROUTEHALO)
                            + pp.ZeroOrMore(WEIGHT)
                            + pp.ZeroOrMore(REGION)
                            + pp.ZeroOrMore(PROPERTY)
                            + pp.Suppress(linebreak)).setResultsName('subcomponents', listAllMatches=True)
    components = pp.Group(pp.Suppress(components_id) + number('numComps') + pp.Suppress(linebreak)
                          + pp.OneOrMore(subcomponent)
                          + pp.Suppress(end_components)).setResultsName('components')
    dbuPerMicron.setParseAction(self.handle_dbuPerMicron)
    diearea.setParseAction(self.handle_diearea)
    components.setParseAction(self.handle_components)
    parse_file = pp.OneOrMore(dbuPerMicron | diearea | components) + pp.StringEnd()
    # parse_file = pp.OnlyOnce(dbuPerMicron | diearea | components) + pp.StringEnd()  # It doesn't work
    return parse_file.searchString(def_string)

DEF file syntax example:
Grammar:
[UNITS DISTANCE MICRONS dbuPerMicron ;]
[DIEAREA pt pt [pt] ... ;]
COMPONENTS numComps ;
    [- compName modelName
        [+ EEQMASTER macroName]
        [+ SOURCE {NETLIST | DIST | USER | TIMING}]
        [+ {FIXED pt orient | COVER pt orient | PLACED pt orient | UNPLACED} ]
        [+ HALO [SOFT] left bottom right top]
        [+ ROUTEHALO haloDist minLayer maxLayer]
        [+ WEIGHT weight]
        [+ REGION regionName]
        [+ PROPERTY {propName propVal} ...] ... ;] ...
END COMPONENTS

Example DEF file:
VERSION 5.7 ;
DIVIDERCHAR "/" ;
BUSBITCHARS "[]" ;
DESIGN c1908 ;
UNITS DISTANCE MICRONS 2000 ;
PROPERTYDEFINITIONS
COMPONENTPIN designRuleWidth REAL ;
DESIGN FE_CORE_BOX_LL_X REAL 0.000 ;
DESIGN FE_CORE_BOX_UR_X REAL 23.425 ;
DESIGN FE_CORE_BOX_LL_Y REAL 0.000 ;
DESIGN FE_CORE_BOX_UR_Y REAL 19.600 ;
END PROPERTYDEFINITIONS
DIEAREA ( 0 0 ) ( 46850 39200 ) ;
COMPONENTS 248 ;
- U293 NOR2_X1 + PLACED ( 6080 0 ) N
;
- U294 FA_X1 + PLACED ( 0 0 ) N
;
- U295 NAND2_X1 + PLACED ( 4560 5600 ) N
;
- U296 FA_X1 + PLACED ( 20520 2800 ) N
;
- U297 NAND2_X1 + PLACED ( 26600 2800 ) N
;
- U298 NAND2_X1 + PLACED ( 27740 2800 ) N
;
- U299 NAND2_X1 + PLACED ( 22800 8400 ) N
;
- U300 NOR2_X1 + PLACED ( 25460 5600 ) N
;
- U301 HA_X1 + PLACED ( 33440 5600 ) N
;
- U540 INV_X1 + PLACED ( 760 28000 ) N
;
END COMPONENTS
PINS 58 ;
- N1 + NET N1 + DIRECTION INPUT + USE SIGNAL
+ LAYER metal3 ( -70 0 ) ( 70 140 )
And thousands more lines that are useless to me.

Posted on 2019-10-25 10:57:53
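When I create a parser in its own method, I try to have that method only define the parser and return it, leaving the caller responsible for applying the parser to the input string. This simplifies the calling interface of the parser() method and makes it much easier to test in isolation.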
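To illustrate that separation, here is a minimal sketch. The class name DefGrammar and the helper units_per_micron are hypothetical names made up for this example, and the grammar is cut down to the UNITS line only; it is not the full grammar from the question.

```python
import pyparsing as pp


class DefGrammar:
    """Hypothetical stand-in for the class that owns the parser() method."""

    def parser(self):
        # Only define the grammar here: no file I/O and no parsing.
        number = pp.Word(pp.nums + ".")
        units = pp.Group(
            pp.Keyword("UNITS") + pp.Keyword("DISTANCE") + pp.Keyword("MICRONS")
            + number("UnitsPerMicron") + pp.Suppress(";")
        ).setResultsName("dbuPerMicron")
        return units


def units_per_micron(def_text):
    """Caller-side helper (hypothetical): apply the parser to a string."""
    parser = DefGrammar().parser()
    for tokens, start, end in parser.scanString(def_text):
        # Return on the first match; the rest of the text is never scanned.
        return tokens["dbuPerMicron"]["UnitsPerMicron"]
    return None


# The grammar can be exercised on a short string, with no DEF file on disk.
snippet = "DESIGN c1908 ;\nUNITS DISTANCE MICRONS 2000 ;\n"
value = units_per_micron(snippet)
print(value)
```

Because parser() does no I/O, the caller is free to choose between parseString, searchString, or scanString depending on how much of the input it needs.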
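I changed your parse_def() method to parser() and wrapped it in a dummy class X, but otherwise left it almost intact, changing only the final parse statement to:
return dbuPerMicron | diearea | components

I then ran the parser against a sample of arbitrary length (the sample you posted plus 10,000,000 random characters, including spaces and newlines) with the following code: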
parser = X().parser()

# accumulate results using scanString
results = []
for t, s, e in parser.scanString(sample):
    results.append(t)
    # stop once all three sections have been collected
    # (an earlier version wrongly tested len(t) == 3 here)
    if len(results) == 3:
        break

# use builtin sum() function to merge all the parsed results into one
results = sum(results)

# or here is the same code as the above loop using islice to do the
# range checking for us
from itertools import islice
results = sum(t for t, s, e in islice(parser.scanString(sample), 0, 3))

# what did we get?
print(results.dump())

Creating the 10-million-character sample was the most time-consuming task, but parsing was able to stop as soon as the 3 relevant sections had been parsed. I wrote out the explicit loop using scanString, but with itertools.islice you can condense it to a single line.
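As a self-contained illustration of why this stops early, the sketch below uses a toy two-rule grammar (an assumption for this example, not the DEF grammar from the question) and a sample with a long useless tail. scanString is a generator, so islice simply stops asking for matches after the second one.

```python
from itertools import islice

import pyparsing as pp

# Toy grammar, assumed for illustration only.
integer = pp.Word(pp.nums)
units = pp.Group(pp.Keyword("UNITS") + integer + pp.Suppress(";"))
area = pp.Group(pp.Keyword("AREA") + integer + integer + pp.Suppress(";"))
parser = units | area

# Two interesting lines followed by a long tail we do not care about.
header = "VERSION 5.7 ;\nUNITS 2000 ;\nAREA 46850 39200 ;\n"
sample = header + "PINS junk ;\n" * 100000

found = []
last_end = 0
# islice pulls only two matches from the lazy scanString generator.
for tokens, start, end in islice(parser.scanString(sample), 2):
    found.append(tokens.asList())
    last_end = end

print(found)
# The scan never moved past the header, so the tail was not examined.
print(last_end <= len(header))
```

The count passed to islice has to match the number of sections you expect; if a section is missing from the file, the scan runs to the end of the input while looking for it.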
For the DEF sample from the question, results.dump() prints the following (longer list lines truncated for brevity):
[['UNITS DISTANCE MICRONS', '2000'], ['0', '0', ...
- DIEAREA: ['0', '0', '46850', '39200']
- components: ['248', ['U293', 'NOR2_X1', 'PLACED', '6080', ...
  - numComps: '248'
  - subcomponents: [['U293', 'NOR2_X1', 'PLACED', '6080', ...
    [0]:
      ['U293', 'NOR2_X1', 'PLACED', '6080', '0', 'N']
      - PLACEMENT: ['PLACED', '6080', '0', 'N']
      - cell: 'NOR2_X1'
      - compName: ['U293', 'NOR2_X1']
      - comp_name: 'U293'
      - orientation: 'N'
      - placement_x: '6080'
      - placement_y: '0'
    [1]:
      ['U294', 'FA_X1', 'PLACED', '0', '0', 'N']
      - PLACEMENT: ['PLACED', '0', '0', 'N']
      - cell: 'FA_X1'
      - compName: ['U294', 'FA_X1']
      - comp_name: 'U294'
      - orientation: 'N'
      - placement_x: '0'
      - placement_y: '0'
    [2]:
      ['U295', 'NAND2_X1', 'PLACED', '4560', '5600', 'N']
      - PLACEMENT: ['PLACED', '4560', '5600', 'N']
      - cell: 'NAND2_X1'
      - compName: ['U295', 'NAND2_X1']
      - comp_name: 'U295'
      - orientation: 'N'
      - placement_x: '4560'
      - placement_y: '5600'
    [3]:
      ['U296', 'FA_X1', 'PLACED', '20520', '2800', 'N']
      - PLACEMENT: ['PLACED', '20520', '2800', 'N']
      - cell: 'FA_X1'
      - compName: ['U296', 'FA_X1']
      - comp_name: 'U296'
      - orientation: 'N'
      - placement_x: '20520'
      - placement_y: '2800'
    [4]:
      ['U297', 'NAND2_X1', 'PLACED', '26600', '2800', 'N']
      - PLACEMENT: ['PLACED', '26600', '2800', 'N']
      - cell: 'NAND2_X1'
      - compName: ['U297', 'NAND2_X1']
      - comp_name: 'U297'
      - orientation: 'N'
      - placement_x: '26600'
      - placement_y: '2800'
    [5]:
      ['U298', 'NAND2_X1', 'PLACED', '27740', '2800', 'N']
      - PLACEMENT: ['PLACED', '27740', '2800', 'N']
      - cell: 'NAND2_X1'
      - compName: ['U298', 'NAND2_X1']
      - comp_name: 'U298'
      - orientation: 'N'
      - placement_x: '27740'
      - placement_y: '2800'
    [6]:
      ['U299', 'NAND2_X1', 'PLACED', '22800', '8400', 'N']
      - PLACEMENT: ['PLACED', '22800', '8400', 'N']
      - cell: 'NAND2_X1'
      - compName: ['U299', 'NAND2_X1']
      - comp_name: 'U299'
      - orientation: 'N'
      - placement_x: '22800'
      - placement_y: '8400'
    [7]:
      ['U300', 'NOR2_X1', 'PLACED', '25460', '5600', 'N']
      - PLACEMENT: ['PLACED', '25460', '5600', 'N']
      - cell: 'NOR2_X1'
      - compName: ['U300', 'NOR2_X1']
      - comp_name: 'U300'
      - orientation: 'N'
      - placement_x: '25460'
      - placement_y: '5600'
    [8]:
      ['U301', 'HA_X1', 'PLACED', '33440', '5600', 'N']
      - PLACEMENT: ['PLACED', '33440', '5600', 'N']
      - cell: 'HA_X1'
      - compName: ['U301', 'HA_X1']
      - comp_name: 'U301'
      - orientation: 'N'
      - placement_x: '33440'
      - placement_y: '5600'
    [9]:
      ['U540', 'INV_X1', 'PLACED', '760', '28000', 'N']
      - PLACEMENT: ['PLACED', '760', '28000', 'N']
      - cell: 'INV_X1'
      - compName: ['U540', 'INV_X1']
      - comp_name: 'U540'
      - orientation: 'N'
      - placement_x: '760'
      - placement_y: '28000'
- dbuPerMicron: ['UNITS DISTANCE MICRONS', '2000']
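  - UnitsPerMicron: '2000'

For items that you know will be integers or reals, you can use the integer and real expressions defined in pyparsing.pyparsing_common (or just number, which matches all numeric forms). These expressions parse using fast regular expressions and convert the results to the proper Python type at parse time, so you do not have to do that conversion later.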
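A small sketch of what that could look like for two of the DEF statements. The variable names (ppc, point, diearea, units) are chosen for this example, and the statement grammars are simplified compared with the ones in the question.

```python
import pyparsing as pp

ppc = pp.pyparsing_common

# number accepts integers and reals and converts them at parse time.
point = pp.Group(pp.Suppress("(") + ppc.number + ppc.number + pp.Suppress(")"))
diearea = (pp.Suppress(pp.Keyword("DIEAREA"))
           + pp.OneOrMore(point)
           + pp.Suppress(";"))

# integer yields a Python int instead of the string '2000'.
units = (pp.Suppress(pp.Keyword("UNITS") + pp.Keyword("DISTANCE") + pp.Keyword("MICRONS"))
         + ppc.integer("UnitsPerMicron")
         + pp.Suppress(";"))

points = diearea.parseString("DIEAREA ( 0 0 ) ( 46850 39200 ) ;").asList()
dbu = units.parseString("UNITS DISTANCE MICRONS 2000 ;")["UnitsPerMicron"]
core_x = ppc.real.parseString("23.425")[0]

print(points, dbu, core_x)
```

With the values already converted, downstream code such as dividing a coordinate by the units-per-micron value works without any int() or float() calls.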
https://stackoverflow.com/questions/58485435