
Boost Beast: reading content in portions

Asked by a Stack Overflow user on 2022-06-06 14:45:12
1 answer · 314 views · 0 followers · score 1

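I am trying to understand how to limit the amount of data that is read from the internet by calling the 'read_some' function in Boost Beast.

My starting point is the incremental read example in the Beast documentation. From the documentation I understood that the data actually read is stored in the flat_buffer. I ran the following experiment:

  1. Set the maximum size of the flat buffer to 1024
  2. Connect to a relatively large (several KB) HTML page
  3. Call read_some once
  4. Turn the internet off
  5. Try to read the page to the end

Since the buffer's capacity is not enough to store the whole page, my experiment should have failed: I should not have been able to read the entire page. Nevertheless, it finished successfully. That means there is some additional buffer where the read data is stored. What is it, and how can I limit its size?

UPD: here is my source code:
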
#include <boost/beast/core.hpp>
#include <boost/beast/http.hpp>
#include <boost/beast/version.hpp>
#include <boost/asio/strand.hpp>
#include <cstdlib>
#include <cstring>   // std::strcmp
#include <functional>
#include <iostream>
#include <memory>
#include <string>

namespace beast = boost::beast;         // from <boost/beast.hpp>
namespace http = beast::http;           // from <boost/beast/http.hpp>
namespace net = boost::asio;            // from <boost/asio.hpp>

using namespace http;

template<
        bool isRequest,
        class SyncReadStream,
        class DynamicBuffer>
void
read_and_print_body(
        std::ostream& os,
        SyncReadStream& stream,
        DynamicBuffer& buffer,
        boost::beast::error_code& ec ) {
    parser<isRequest, buffer_body> p;
    read_header( stream, buffer, p, ec );
    if ( ec )
        return;
    while ( !p.is_done()) {
        char buf[512];
        p.get().body().data = buf;
        p.get().body().size = sizeof( buf );
        read_some( stream, buffer, p, ec );
        if ( ec == error::need_buffer )
            ec = {};
        if ( ec )
            return;
        os.write( buf, sizeof( buf ) - p.get().body().size );
    }
}

int main(int argc, char** argv)
{
    try
    {
        // Check command line arguments.
        if(argc != 4 && argc != 5)
        {
            std::cerr <<
            "Usage: http-client-sync <host> <port> <target> [<HTTP version: 1.0 or 1.1(default)>]\n" <<
            "Example:\n" <<
            "    http-client-sync www.example.com 80 /\n" <<
            "    http-client-sync www.example.com 80 / 1.0\n";
            return EXIT_FAILURE;
        }
        auto const host = argv[1];
        auto const port = argv[2];
        auto const target = argv[3];
        int version = argc == 5 && !std::strcmp("1.0", argv[4]) ? 10 : 11;

        // The io_context is required for all I/O
        net::io_context ioc;

        // These objects perform our I/O
        boost::asio::ip::tcp::resolver resolver(ioc);
        beast::tcp_stream stream(ioc);

        // Look up the domain name
        auto const results = resolver.resolve(host, port);

        // Make the connection on the IP address we get from a lookup
        stream.connect(results);

        // Set up an HTTP GET request message
        http::request<http::string_body> req{http::verb::get, target, version};
        req.set(http::field::host, host);
        req.set(http::field::user_agent, BOOST_BEAST_VERSION_STRING);

        // Send the HTTP request to the remote host
        http::write(stream, req);

        // This buffer is used for reading and must be persisted
        beast::flat_buffer buffer;

        boost::beast::error_code ec;
        read_and_print_body<false>(std::cout, stream, buffer, ec);
    }
    catch(std::exception const& e)
    {
        std::cerr << "Error: " << e.what() << std::endl;
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

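1 Answer

Stack Overflow user

Accepted answer

Posted on 2022-06-06 20:03:41

The operating system's TCP stack obviously has to buffer incoming data, so that is most likely where the rest of the page was buffered.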

Here is a way to test the scenario you want (Live On Coliru):

#include <boost/beast.hpp>
#include <iostream>
#include <chrono>
#include <thread>
namespace net = boost::asio;
namespace beast = boost::beast;
namespace http = beast::http;
using net::ip::tcp;

void server()
{
    net::io_context ioc;
    tcp::acceptor acc{ioc, {{}, 8989}};
    acc.listen();

    auto conn = acc.accept();

    http::request<http::string_body> msg(
        http::verb::get, "/", 11, std::string(20ull << 10, '*'));
    msg.prepare_payload();

    http::request_serializer<http::string_body> ser(msg);

    size_t hbytes = write_header(conn, ser);
    // size_t bbytes = write_some(conn, ser);
    size_t bbytes = write(conn, net::buffer(msg.body(), 1024));

    std::cout << "sent " << hbytes << " header and " << bbytes << "/"
              << msg.body().length() << " of body" << std::endl;
    // closes connection
}

namespace {
    template<bool isRequest, class SyncReadStream, class DynamicBuffer>
        auto
        read_and_print_body(
                std::ostream& /*os*/,
                SyncReadStream& stream,
                DynamicBuffer& buffer,
                boost::beast::error_code& ec)
        {
            struct { size_t hbytes = 0, bbytes = 0; } ret;

            http::parser<isRequest, http::buffer_body> p;
            //p.header_limit(8192);
            //p.body_limit(1024);

            ret.hbytes = read_header(stream, buffer, p, ec);
            if(ec)
                return ret;
            while(! p.is_done())
            {
                char buf[512];
                p.get().body().data = buf;
                p.get().body().size = sizeof(buf);
                ret.bbytes += http::read_some(stream, buffer, p, ec);
                if(ec == http::error::need_buffer)
                    ec = {};
                if(ec)
                    break;
                //os.write(buf, sizeof(buf) - p.get().body().size);
            }
            return ret;
        }
}

void client()
{
    net::io_context ioc;
    tcp::socket conn{ioc};
    conn.connect({{}, 8989});

    beast::error_code ec;
    beast::flat_buffer buf;
    auto [hbytes, bbytes] = read_and_print_body<true>(std::cout, conn, buf, ec);

    std::cout << "received hbytes:" << hbytes << " bbytes:" << bbytes
              << " (" << ec.message() << ")" << std::endl;
}

int main()
{
    std::jthread s(server);

    std::this_thread::sleep_for(std::chrono::seconds(1));
    std::jthread c(client);
}

Prints:

sent 41 header and 1024/20480 of body
received 1065 bytes of message (partial message)

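Side note

You start your question with:

"I am trying to understand how one can limit the amount of data that is read from the internet"

That is built into Beast.

"by calling the 'read_some' function in Boost Beast."

To limit just the total amount of data read, you do not have to call read_some in a loop (http::read already does exactly that by definition).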

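For example, if you replace 20ull << 10 (20 KiB) with 20ull << 20 (20 MiB) in the server() function above, the body exceeds the parser's default size limit (1 MiB for requests, 8 MiB for responses):
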
http::request<http::string_body> msg(http::verb::get, "/", 11,
                                     std::string(20ull << 20, '*'));

Prints (Live On Coliru):

sent 44 header and 1024/20971520 of body
received hbytes:44 bbytes:0 (body limit exceeded)

You can also set your own parser limits:

http::parser<isRequest, http::buffer_body> p;
p.header_limit(8192);
p.body_limit(1024);

Prints (Live On Coliru):

sent 41 header and 1024/20480 of body
received hbytes:41 bbytes:0 (body limit exceeded)

As you can see, it even knows to reject the request right after reading the headers, using the Content-Length information from the headers.

Score: 1
Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/72519383
