I have an application that collects crypto prices from decentralized exchanges (it is not commercial; the main goal is to learn some database techniques with real data). I have it working with both MongoDB and PostgreSQL. On writes, however, I am seeing a huge difference in favor of MongoDB; I am talking about several orders of magnitude. I know MongoDB is a NoSQL database and is advertised as more efficient for this kind of workload, but I wonder whether my Postgres implementation is missing something. Below I describe how the logic is implemented and try to provide comparable metrics, as far as that is possible.
To talk to the databases, I use psycopg3 for the Postgres implementation and pymongo for the MongoDB one.
This is the data structure I write to both databases:
from __future__ import annotations

from typing import NamedTuple

import pendulum


class PricingInfo(NamedTuple):
    blockchain_name: str
    dex_name: str
    pair_address: str
    token0_symbol: str
    token1_symbol: str
    token0_address: str
    token1_address: str
    raw_reserve0: int
    raw_reserve1: int
    reserve0: float
    reserve1: float
    mid_price: float
    datetime: pendulum.DateTime
    inverted: bool = False

    @classmethod
    def from_dict(cls, doc: dict) -> PricingInfo:
        # Drop MongoDB's auto-generated _id so the fields match the tuple
        doc.pop("_id", None)
        return cls(**doc)

    def to_dict(self) -> dict:
        return self._asdict()

Both implementations use this class unchanged. I perform a few hundred writes to the database every second. This is what I do for Postgres:
    def register_prices(self, prices: list[PricingInfo]) -> None:
        query = """
            insert into prices (
                blockchain_name,
                dex_name,
                pair_address,
                token0_address,
                token1_address,
                raw_reserve0,
                raw_reserve1,
                reserve0,
                reserve1,
                mid_price,
                datetime
            )
            values (
                %(blockchain_name)s,
                %(dex_name)s,
                %(pair_address)s,
                %(token0_address)s,
                %(token1_address)s,
                %(raw_reserve0)s,
                %(raw_reserve1)s,
                %(reserve0)s,
                %(reserve1)s,
                %(mid_price)s,
                %(datetime)s
            )
        """
        keys_to_keep = {
            "blockchain_name",
            "dex_name",
            "pair_address",
            "token0_address",
            "token1_address",
            "raw_reserve0",
            "raw_reserve1",
            "reserve0",
            "reserve1",
            "mid_price",
            "datetime",
        }
        with psycopg.connect(self.db_uri) as conn:
            with conn.cursor() as cur:
                start = time.perf_counter()
                if len(prices) == 1:
                    cur.execute(
                        query,
                        {
                            k: v
                            for k, v in prices[0].to_dict().items()
                            if k in keys_to_keep
                        },
                    )
                elif len(prices) > 1:
                    cur.executemany(
                        query,
                        [
                            {k: v for k, v in p.to_dict().items() if k in keys_to_keep}
                            for p in prices
                        ],
                    )
                conn.commit()
                delta = time.perf_counter() - start
                if self.dex_name in {"pangolin", "trader_joe"}:
                    logger.warning(f"Inserting {len(prices)}")
                    logger.warning(f"Inserting prices took {delta} seconds")

This is my table definition:
create table prices (
    id serial primary key,
    blockchain_name varchar(100) not null,
    dex_name varchar(100) not null,
    raw_reserve0 decimal not null,
    raw_reserve1 decimal not null,
    reserve0 decimal not null,
    reserve1 decimal not null,
    mid_price decimal not null,
    datetime timestamp with time zone not null,
    pair_address varchar(50) not null,
    token0_address varchar(50) not null,
    token1_address varchar(50) not null,
    foreign key (blockchain_name, dex_name, pair_address) references pairs (blockchain_name, dex_name, pair_address),
    foreign key (blockchain_name, dex_name, token0_address) references tokens (blockchain_name, dex_name, address),
    foreign key (blockchain_name, dex_name, token1_address) references tokens (blockchain_name, dex_name, address)
);

And for MongoDB:
    def register_prices(self, prices: list[PricingInfo]) -> None:
        start = time.perf_counter()
        prices_table = self._db["prices"]
        prices_table.insert_many(price.to_dict() for price in prices)
        delta = time.perf_counter() - start
        if self.dex_name in {"pangolin", "trader_joe"}:
            logger.warning(f"Inserting {len(prices)}")
            logger.warning(f"Inserting prices took {delta} seconds")

The application runs in exactly the same way against both databases. There is one small difference when writing with Postgres: in that case the dicts need some reshaping to fit the schema (I am doing some normalization), but since there are only about 600 dicts to modify per write, I do not believe this is the bottleneck. In both cases I have 8 processes writing to the DB concurrently.
For Postgres, I get the following metrics:
Inserting 587
Inserting prices took 1.175270811014343 seconds
Inserting 611
Inserting prices took 0.3126116280036513 seconds

And for Mongo:
Inserting 588
Inserting prices took 0.03671051503624767 seconds
Inserting 612
Inserting prices took 0.032324473024345934 seconds

These times are fairly stable: roughly 1 s and 300 ms for Postgres, and around 30 ms for Mongo. It is quite strange that Postgres shows two distinct write times for roughly the same amount of data. Still, even in its best case, Postgres is about 10x slower than Mongo.
Additional note: running alter user postgres set synchronous_commit to off; had no noticeable effect on the timings. Am I doing something wrong with Postgres?
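As an aside (my addition, not part of the original question): a setting changed with alter user ... set only takes effect for sessions opened after the change, so it is worth confirming that the writer connections actually see it:

```sql
-- Run on a connection opened by the application user;
-- it should report 'off' if the ALTER took effect.
show synchronous_commit;
```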
Posted on 2022-02-20 18:18:36
For bulk loading data into Postgres, the fastest method is usually the Postgres COPY command. In psycopg3 this is done through the procedure shown in the psycopg3 COPY documentation. Note that COPY is all or nothing: either all of the data is imported, or an error means none of it is.
https://stackoverflow.com/questions/71195034
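As a sketch of what the COPY route could look like for the register_prices batch above (the column list is taken from the question's schema; conn is assumed to be an already-open psycopg connection, and register_prices_copy / to_copy_row are hypothetical names, not the asker's code):

```python
# Column order must match between the COPY statement and the rows we stream.
COLUMNS = (
    "blockchain_name",
    "dex_name",
    "pair_address",
    "token0_address",
    "token1_address",
    "raw_reserve0",
    "raw_reserve1",
    "reserve0",
    "reserve1",
    "mid_price",
    "datetime",
)


def to_copy_row(price_dict: dict) -> tuple:
    # Project a PricingInfo dict onto the column order COPY expects,
    # dropping extra fields such as token0_symbol / token1_symbol / inverted.
    return tuple(price_dict[col] for col in COLUMNS)


def register_prices_copy(conn, prices) -> None:
    # COPY ... FROM STDIN streams the whole batch in one statement
    # instead of executing the INSERT once per row.
    stmt = f"COPY prices ({', '.join(COLUMNS)}) FROM STDIN"
    with conn.cursor() as cur:
        with cur.copy(stmt) as copy:
            for price in prices:
                copy.write_row(to_copy_row(price.to_dict()))
    conn.commit()
```

Since COPY aborts the whole batch on the first bad row, any row-level validation has to happen before the write rather than per row.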