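I want to parse Wikipedia lists of power plants that contain {{Location }} templates. In my case I'm working with the German version of the template, but that shouldn't change the basic process.
How can I extract the label=, lat=, long= and region= parameters from code like this? This is probably not a job for an HTML parser like BeautifulSoup, but rather for something like awk?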
{{ Positionskarte+
| Tadschikistan
| maptype = relief
| width = 600
| float = right
| caption =
| places =
{{ Positionskarte~
| Tadschikistan
| label = <small>[[Talsperre Baipasa|Baipasa]]</small>
| marktarget =
| mark = Blue pog.svg
| position = right
| lat = 38.267584
| long = 69.123906
| region = TJ
| background = #FEFEE9
}}
{{ Positionskarte~
| Tadschikistan
| label = <small>[[Kraftwerk Duschanbe|Duschanbe]]</small>
| marktarget =
| mark = Red pog.svg
| position = left
| lat = 38.5565
| long = 68.776
| region = TJ
| background = #FEFEE9
}}
...
}}
Thanks in advance!
Posted on 2018-03-10 08:44:17
Just extract the information with regular expressions. For example, like this (PHP):
$k = "{{ Positionskarte+
| Tadschikistan
| maptype = relief
| width = 600
| float = right
| caption =
| places =
{{ Positionskarte~
| Tadschikistan
| label = <small>[[Talsperre Baipasa|Baipasa]]</small>
| marktarget =
| mark = Blue pog.svg
| position = right
| lat = 38.267584
| long = 69.123906
| region = TJ
| background = #FEFEE9
}}
{{ Positionskarte~
| Tadschikistan
| label = <small>[[Kraftwerk Duschanbe|Duschanbe]]</small>
| marktarget =
| mark = Red pog.svg
| position = left
| lat = 38.5565
| long = 68.776
| region = TJ
| background = #FEFEE9
}}
}}";
// Split the wikitext into one chunk per {{ Positionskarte~ ... }} entry.
$items = explode("Positionskarte~", $k);
$result = [];
foreach ($items as $item) {
    $info = [];
    foreach (['label', 'lat', 'long', 'region'] as $key) {
        // Match "| key = value" on a single line. Using [ \t]* instead of \s+
        // keeps an empty value from swallowing the following line.
        $pattern = '/^[ \t]*\|[ \t]*' . $key . '[ \t]*=[ \t]*(.+)$/m';
        if (preg_match($pattern, $item, $matches)) {
            $info[$key] = trim($matches[1]);
        }
    }
    // The chunk before the first Positionskarte~ has none of these keys and is skipped.
    if (!empty($info)) {
        $result[] = $info;
    }
}
var_dump($result);
https://stackoverflow.com/questions/49200279
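Since the question mentions BeautifulSoup, the same split-then-match idea can also be sketched in Python using only the standard library. This is a minimal sketch, not a full wikitext parser: the names WIKITEXT, FIELDS and extract_entries are made up for illustration, the sample is a shortened copy of the template from the question, and it assumes each parameter sits on its own "| key = value" line.

```python
import re

# Shortened copy of the template from the question (assumed layout: one parameter per line).
WIKITEXT = """{{ Positionskarte+
| Tadschikistan
| maptype = relief
| float = right
| places =
{{ Positionskarte~
| Tadschikistan
| label = <small>[[Talsperre Baipasa|Baipasa]]</small>
| marktarget =
| lat = 38.267584
| long = 69.123906
| region = TJ
}}
{{ Positionskarte~
| Tadschikistan
| label = <small>[[Kraftwerk Duschanbe|Duschanbe]]</small>
| marktarget =
| lat = 38.5565
| long = 68.776
| region = TJ
}}
}}"""

FIELDS = ("label", "lat", "long", "region")


def extract_entries(wikitext, fields=FIELDS):
    """Return one dict per {{ Positionskarte~ ... }} block (hypothetical helper)."""
    entries = []
    # Everything before the first "Positionskarte~" is the outer map header, so skip it.
    for chunk in wikitext.split("Positionskarte~")[1:]:
        info = {}
        for field in fields:
            # "| field = value" on a single line; [ \t]* never crosses a newline,
            # so an empty parameter does not pick up the next line.
            pattern = r"^[ \t]*\|[ \t]*" + re.escape(field) + r"[ \t]*=[ \t]*(.+)$"
            match = re.search(pattern, chunk, re.MULTILINE)
            if match:
                info[field] = match.group(1).strip()
        if info:
            entries.append(info)
    return entries


entries = extract_entries(WIKITEXT)
for entry in entries:
    print(entry)
```

Splitting on the template name is fragile if a label itself contains a nested template or a parameter spans several lines. For pages like that, a dedicated wikitext parser is likely a safer choice than regular expressions.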