Colly max depth and encoding/json - empty response
Stack Overflow user
Asked 2021-02-11 01:48:00
Answers: 1 | Views: 502 | Followers: 0 | Votes: 0

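I have gone through the Tour of Go, and now I'm working through some Colly tutorials. I understand max depth and have been trying to implement it in a Go program like this: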

package main

import (
    "encoding/json"
    "log"
    "net/http"

    "github.com/gocolly/colly"
)

func ping(w http.ResponseWriter, r *http.Request) {
    log.Println("Ping")
    w.Write([]byte("ping"))
}

func getData(w http.ResponseWriter, r *http.Request) {
    //Verify the param "URL" exists
    URL := r.URL.Query().Get("url")
    if URL == "" {
        log.Println("missing URL argument")
        return
    }
    log.Println("visiting", URL)

    // Create a new collector, which will be in charge of collecting the data from the HTML
    c := colly.NewCollector(
        // MaxDepth is 2, so only the links on the scraped page
        // and links on those pages are visited
        colly.MaxDepth(2),
        colly.Async(true),
    )

    // Limit the maximum parallelism to 2
    // This is necessary if the goroutines are dynamically
    // created to control the limit of simultaneous requests.
    //
    // Parallelism can be controlled also by spawning fixed
    // number of go routines.
    c.Limit(&colly.LimitRule{DomainGlob: "*", Parallelism: 2})

    // Slice to store the data
    var response []string

    // OnHTML registers a callback that runs whenever the collector reaches a matching HTML element.
    // In this case, whenever the collector finds an anchor tag with an href, it calls the
    // anonymous function below, which reads the href and appends it to our slice.
    c.OnHTML("a[href]", func(e *colly.HTMLElement) {
        link := e.Request.AbsoluteURL(e.Attr("href"))
        if link != "" {
            response = append(response, link)
        }
    })

    //Command to visit the website
    c.Visit(URL)

    // parse our response slice into JSON format
    b, err := json.Marshal(response)
    if err != nil {
        log.Println("failed to serialize response:", err)
        return
    }
    // Add some header and write the body for our endpoint
    w.Header().Add("Content-Type", "application/json")
    w.Write(b)
}

func main() {
    addr := ":7171"

    http.HandleFunc("/links", getData)
    http.HandleFunc("/ping", ping)

    log.Println("listening on", addr)
    log.Fatal(http.ListenAndServe(addr, nil))
}

When I do this, the response is empty. If I remove the MaxDepth and Async lines, I get the expected response (only the top-level links).

Any help is much appreciated!

1 Answer

Stack Overflow user

Accepted answer

Posted 2021-02-11 02:50:39

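When running in async mode, c.Visit returns before the request is actually made (see here); the correct procedure is shown in the parallel demo. In your case, that means: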

c.Visit(URL)
c.Wait()

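Using async isn't very useful when you are only making a single request. Take a look at the reddit example to see how it can be used to visit multiple URLs in one operation.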

Note: you really should check the error values these functions return; adding an error handler is also good practice.

Votes: 0
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/66147675
