Q: Tcl regexp: parts of this string

Stack Overflow user

Asked 2013-06-23 08:09:25

Answers: 1 · Views: 259 · Followers: 0 · Votes: 3

[ 22.06.2013 23:23:41 UTC ]--[&nbsp;&nbsp;&nbsp;PRE&nbsp;&nbsp;&nbsp;]--[&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="?section=0DAY" title="SHOW ONLY 0DAY">0DAY</a>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;]--[ <a href="?search=Aminsamples+Sexy+Melody+Vol+2+MIDI+6581">Aminsamples.Sexy.Melody.Vol.2.MIDI-6581</a> ]--<b>[ 2.30MB ]</b>--<b>[ 1F ]</b>--<span style="font-weight: bold;">[ <a href="download/Aminsamples.Sexy.Melody.Vol.2.MIDI-6581.rar" title="Aminsamples.Sexy.Melody.Vol.2.MIDI-6581.rar">DOWNLOAD</a> ]

I want to extract data that looks like this:

[ 22.06.2013 23:23:41 UTC ]--[   PRE   ]--[      0DAY       ]--[ Aminsamples.Sexy.Melody.Vol.2.MIDI-6581 ]--[ 2.30MB ]--[ 1F ]--[ DOWNLOAD ]

But I'm not sure how to do this; all I can extract is Aminsamples.Sexy.Melody.Vol.2.MIDI-6581.rar.

I want to do this in Tcl.

This is what I have so far:

catch {set http [::http::geturl http://www.prelist.ws -timeout 15000]} error
if {[string match "*error*" $error]} { puts "connect error!" ; return 0 }
if {[string match "*timeout*" $error]} { puts "timeout!"; return 0 }
set html [::http::data [split $http "\n"]]
regsub -all "&amp;" $html {\&} html
regsub -all "&times;" $html {*} html
regsub -all "&nbsp;" $html { } html
regsub -all -nocase "&#215;" $html "x" html
regsub -all -nocase "&lt;" $html "<" html
regsub -all -nocase "&gt;" $html ">" html
regsub -all ">" $html "" html
regsub -all "<tt" $html "" html
foreach line $html {
    if {[string match "*SHOW*" $line]} { continue }
    if {[string match "*title*" $line]} {
        regexp -nocase -- {title="(.*?)>} $line -> all line
        regsub -all -nocase "title=" $line {} line
        regsub -all -nocase "DOWNLOAD" $line {} line
        regsub -all -nocase "\"</a" $line {} line
        regsub -all -nocase "\"free" $line {} line
        regsub -all -nocase "\"" $line {} line
        regsub -all -nocase "\\\[" $line {} line
        regsub -all -nocase "<title" $line {} line
        regsub -all -nocase "\\\]</title" $line {} line
        puts "$line"
    }
}

regex

tcl

1 Answer

Stack Overflow user

Accepted answer

Posted 2013-06-23 10:14:04

This can be done easily with XPath:

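#! /usr/bin/tclsh

# tDOM provides the HTML parser and the XPath support used below.
package require tdom

# Read the page source from input.txt.
set fp [open "input.txt" r]
set html [read $fp]
close $fp

# Parse the HTML, then select every <small> inside a <tt> under <div id="list">.
set doc [ dom parse -html $html ] 
set root [$doc documentElement]
set itemNodes [$doc selectNodes {//div[@id="list"]/tt/small}]

# asText returns the text content of a node with the markup removed.
foreach itemNode $itemNodes {
    puts "[$itemNode asText]"
}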
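
If tDOM is not installed, a rougher fallback is to strip the tags with core Tcl commands only. The following is a minimal sketch, not part of the original answer: the helper name htmlToText is hypothetical, the sample line is shortened from the question, and only the few entities listed are decoded. A regular expression is not a real HTML parser, so this is only reasonable for simple, predictable markup like the line in the question.

```tcl
# Hypothetical helper (not from the original answer): flatten one HTML
# fragment to plain text using only core Tcl commands.
proc htmlToText {fragment} {
    # Drop every tag, e.g. <a href="...">, </a>, <b>, <span ...>.
    regsub -all {<[^>]*>} $fragment {} text
    # Decode a few common entities. string map works in a single pass,
    # so text produced by one replacement is never rescanned.
    return [string map {&nbsp; " " &lt; < &gt; > &amp; &} $text]
}

# Shortened version of the line shown in the question.
set sample {[ 22.06.2013 23:23:41 UTC ]--[&nbsp;&nbsp;&nbsp;PRE&nbsp;&nbsp;&nbsp;]--[ <a href="?search=Aminsamples+Sexy+Melody+Vol+2+MIDI+6581">Aminsamples.Sexy.Melody.Vol.2.MIDI-6581</a> ]--<b>[ 2.30MB ]</b>}
puts [htmlToText $sample]
```

For the shortened sample this prints the bracketed fields with the anchor and bold tags removed and each &nbsp; replaced by a space.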

Note that you can split out each field with the following pattern:

foreach itemNode $itemNodes {
    set line "[string trim [$itemNode asText] \[\]\ ]"
    set fields [regexp -inline -all -- {[^[\s][^][]*?\S(?=\s*(?:]|$))} $line]
    puts [lindex $fields 2]
}
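
As an alternative to the regular expression above, the fields can also be separated by treating "]--[" as a delimiter. This is a sketch of a different technique, not the answerer's method: the helper name splitFields is hypothetical, and it assumes that the delimiter never occurs inside a field and that the control character U+001F never occurs in the text.

```tcl
# Hypothetical helper (not from the original answer): split a flattened
# "[ a ]--[ b ]--[ c ]" line into a list of trimmed fields.
proc splitFields {line} {
    # Remove the outer "[", "]" and surrounding spaces first.
    set line [string trim $line "\[\] "]
    # split works on single characters, so map the four-character
    # delimiter "]--[" to one character that is assumed not to occur.
    set sep \u001f
    set parts [split [string map [list "\]--\[" $sep] $line] $sep]
    set fields {}
    foreach part $parts {
        lappend fields [string trim $part]
    }
    return $fields
}

# The target line from the question.
set line {[ 22.06.2013 23:23:41 UTC ]--[   PRE   ]--[      0DAY       ]--[ Aminsamples.Sexy.Melody.Vol.2.MIDI-6581 ]--[ 2.30MB ]--[ 1F ]--[ DOWNLOAD ]}
set fields [splitFields $line]
puts [lindex $fields 3]
```

With the question's line, index 3 is the release name and index 2 is the section (0DAY), which matches the field that the pattern-based loop prints.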

Votes: 2

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/17256549
