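I wrote a scraper for Amazon product titles, but Amazon detects it. I ran go run main.go 10 times: 8 times I was blocked, and only 2 times did I get the product title.
I have researched this problem but could not find a solution for Go (only for Python). Is there a solution for my case?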
package main

import (
	"fmt"
	"strings"

	"github.com/gocolly/colly"
)

func main() {
	// Create a Collector restricted to Amazon
	c := colly.NewCollector(
		colly.AllowedDomains("www.amazon.com", "amazon.com"),
	)
	// Runs for every <div>: print the product title found inside it,
	// followed by the div's full text (which shows the CAPTCHA message when blocked)
	c.OnHTML("div", func(h *colly.HTMLElement) {
		captcha := h.Text
		title := h.ChildText("span#productTitle")
		fmt.Println(strings.TrimSpace(title))
		fmt.Println(strings.TrimSpace(captcha))
	})
	// Start the collector
	c.Visit("https://www.amazon.com/Bluetooth-Over-Ear-Headphones-Foldable-Prolonged/dp/B07K5214NZ")
}

Output:
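Enter the characters you see below. Sorry, we just need to make sure you're not a robot. For best results, please make sure your browser is accepting cookies.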
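Since the block shows up as an ordinary page rather than an error, it can help to detect it explicitly before trying to read the title. Below is a minimal sketch of that idea using only the standard library. The helper name looksLikeCaptcha and the marker list are hypothetical, and the markers assume the English wording of the robot-check message quoted above, so they may need adjusting for other locales or page versions.

```go
package main

import (
	"fmt"
	"strings"
)

// captchaMarkers holds phrases taken from the robot-check message.
// Assumption: the page is served in English with this wording.
var captchaMarkers = []string{
	"Enter the characters you see below",
	"make sure you're not a robot",
}

// looksLikeCaptcha (hypothetical helper) reports whether body contains
// any of the known robot-check phrases.
func looksLikeCaptcha(body string) bool {
	for _, marker := range captchaMarkers {
		if strings.Contains(body, marker) {
			return true
		}
	}
	return false
}

func main() {
	blocked := "Enter the characters you see below. Sorry, we just need to make sure you're not a robot."
	product := `<span id="productTitle">Bluetooth Headphones</span>`

	fmt.Println(looksLikeCaptcha(blocked)) // true
	fmt.Println(looksLikeCaptcha(product)) // false
}
```

A check like this only tells you that a request was blocked; it does not prevent the block.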
Posted on 2021-06-25 13:43:35
If you don't mind using a different package, I wrote a package for searching HTML (essentially a thin wrapper around github.com/tdewolff/parse):
package main

import (
	"net/http"
	"os"

	"github.com/89z/parse/html"
)

func main() {
	req, err := http.NewRequest(
		"GET", "https://www.amazon.com/dp/B07K5214NZ", nil,
	)
	if err != nil {
		panic(err)
	}
	// Send a browser-like User-Agent instead of Go's default
	req.Header = http.Header{
		"User-Agent": {"Mozilla"},
	}
	res, err := new(http.Transport).RoundTrip(req)
	if err != nil {
		panic(err)
	}
	defer res.Body.Close()
	// Advance to the element with id="productTitle" and print its content
	lex := html.NewLexer(res.Body)
	lex.NextAttr("id", "productTitle")
	os.Stdout.Write(lex.Bytes())
}

Result:
Bluetooth Headphones Over-Ear, Zihnic Foldable Wireless and Wired Stereo
Headset Micro SD/TF, FM for Cell Phone,PC,Soft Earmuffs &Light Weight for
Prolonged Waring(Rose Gold)
https://stackoverflow.com/questions/68131475
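The part of the answer that appears to matter for the block is the User-Agent header. Here is a minimal sketch of just that step, using only the standard library and a local httptest server so it can be run without contacting Amazon. The names fetch and newEchoServer are hypothetical, and it uses http.DefaultClient rather than the answer's bare http.Transport, so unlike the answer it follows redirects. Whether a given User-Agent value is enough to avoid the robot check is not guaranteed.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetch (hypothetical helper) sends a GET request with the given
// User-Agent header and returns the response body as a string.
func fetch(url, userAgent string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("User-Agent", userAgent)

	res, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer res.Body.Close()

	body, err := io.ReadAll(res.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

// newEchoServer (hypothetical helper) starts a local test server that
// replies with the User-Agent header it received.
func newEchoServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprint(w, r.UserAgent())
		},
	))
}

func main() {
	srv := newEchoServer()
	defer srv.Close()

	got, err := fetch(srv.URL, "Mozilla")
	if err != nil {
		panic(err)
	}
	fmt.Println(got) // prints the User-Agent the server saw: Mozilla
}
```

If you prefer to stay with colly, the collector can be given the same header through its colly.UserAgent option when calling colly.NewCollector.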