I have an application that collects crypto prices from decentralized exchanges (it is not commercial; the main goal is to learn some database techniques with real data). I have it working with both MongoDB and PostgreSQL. On writes, however, I am seeing a huge difference in favor of MongoDB; I am talking about several orders of magnitude. I know MongoDB is a NoSQL database and is advertised as more efficient for this kind of workload, but I wonder whether my Postgres implementation is missing something. Below I describe how the logic is implemented and try to provide comparable metrics, as far as that is possible.
To talk to the databases, I use psycopg3 for the Postgres implementation and pymongo for the MongoDB one.
This is the data structure I write to both databases:
from __future__ import annotations

from typing import NamedTuple

import pendulum


class PricingInfo(NamedTuple):
    blockchain_name: str
    dex_name: str
    pair_address: str
    token0_symbol: str
    token1_symbol: str
    token0_address: str
    token1_address: str
    raw_reserve0: int
    raw_reserve1: int
    reserve0: float
    reserve1: float
    mid_price: float
    datetime: pendulum.DateTime
    inverted: bool = False

    @classmethod
    def from_dict(cls, doc: dict) -> PricingInfo:
        # Drop MongoDB's auto-generated _id so the fields match the tuple
        doc.pop("_id", None)
        return cls(**doc)

    def to_dict(self) -> dict:
        return self._asdict()

Both implementations use this class unchanged. I perform a few hundred writes to the database every second. This is what I do for Postgres:
    def register_prices(self, prices: list[PricingInfo]) -> None:
        query = """
            insert into prices (
                blockchain_name,
                dex_name,
                pair_address,
                token0_address,
                token1_address,
                raw_reserve0,
                raw_reserve1,
                reserve0,
                reserve1,
                mid_price,
                datetime
            )
            values (
                %(blockchain_name)s,
                %(dex_name)s,
                %(pair_address)s,
                %(token0_address)s,
                %(token1_address)s,
                %(raw_reserve0)s,
                %(raw_reserve1)s,
                %(reserve0)s,
                %(reserve1)s,
                %(mid_price)s,
                %(datetime)s
            )
        """
        keys_to_keep = {
            "blockchain_name",
            "dex_name",
            "pair_address",
            "token0_address",
            "token1_address",
            "raw_reserve0",
            "raw_reserve1",
            "reserve0",
            "reserve1",
            "mid_price",
            "datetime",
        }
        with psycopg.connect(self.db_uri) as conn:
            with conn.cursor() as cur:
                start = time.perf_counter()
                if len(prices) == 1:
                    cur.execute(
                        query,
                        {
                            k: v
                            for k, v in prices[0].to_dict().items()
                            if k in keys_to_keep
                        },
                    )
                elif len(prices) > 1:
                    cur.executemany(
                        query,
                        [
                            {k: v for k, v in p.to_dict().items() if k in keys_to_keep}
                            for p in prices
                        ],
                    )
                conn.commit()
                delta = time.perf_counter() - start
                if self.dex_name in {"pangolin", "trader_joe"}:
                    logger.warning(f"Inserting {len(prices)}")
                    logger.warning(f"Inserting prices took {delta} seconds")

This is my table definition:
create table prices (
    id serial primary key,
    blockchain_name varchar(100) not null,
    dex_name varchar(100) not null,
    raw_reserve0 decimal not null,
    raw_reserve1 decimal not null,
    reserve0 decimal not null,
    reserve1 decimal not null,
    mid_price decimal not null,
    datetime timestamp with time zone not null,
    pair_address varchar(50) not null,
    token0_address varchar(50) not null,
    token1_address varchar(50) not null,
    foreign key (blockchain_name, dex_name, pair_address) references pairs (blockchain_name, dex_name, pair_address),
    foreign key (blockchain_name, dex_name, token0_address) references tokens (blockchain_name, dex_name, address),
    foreign key (blockchain_name, dex_name, token1_address) references tokens (blockchain_name, dex_name, address)
);

And for MongoDB:
    def register_prices(self, prices: list[PricingInfo]) -> None:
        start = time.perf_counter()
        prices_table = self._db["prices"]
        prices_table.insert_many(price.to_dict() for price in prices)
        delta = time.perf_counter() - start
        if self.dex_name in {"pangolin", "trader_joe"}:
            logger.warning(f"Inserting {len(prices)}")
            logger.warning(f"Inserting prices took {delta} seconds")

The application runs in exactly the same way against both databases. There is one small difference when writing with Postgres: in that case the dicts need some reshaping to fit the schema (I am doing some normalization), but since there are only about 600 dicts to modify per write, I do not believe this is the bottleneck. In both cases I have 8 processes writing to the DB concurrently.
For Postgres, I get the following metrics:
Inserting 587
Inserting prices took 1.175270811014343 seconds
Inserting 611
Inserting prices took 0.3126116280036513 seconds

And for Mongo:
Inserting 588
Inserting prices took 0.03671051503624767 seconds
Inserting 612
Inserting prices took 0.032324473024345934 seconds

These times are fairly stable: roughly 1 s and 300 ms for Postgres, and around 30 ms for Mongo. It is quite strange that Postgres shows two distinct write times for roughly the same amount of data. Still, even in its best case, Postgres is about 10x slower than Mongo.
Additional note: running alter user postgres set synchronous_commit to off; had no noticeable effect on the timings. Am I doing something wrong with Postgres?
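As an aside (my addition, not part of the original question): a setting changed with alter user ... set only takes effect for sessions opened after the change, so it is worth confirming that the writer connections actually see it:

```sql
-- Run on a connection opened by the application user;
-- it should report 'off' if the ALTER took effect.
show synchronous_commit;
```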
Posted on 2022-02-20 18:18:36
For bulk loading data into Postgres, the fastest method is usually the Postgres COPY command. In psycopg3 this is done through the procedure shown in the psycopg3 COPY documentation. Note that COPY is all or nothing: either all of the data is imported, or an error means none of it is.
https://stackoverflow.com/questions/71195034
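As a sketch of what the COPY route could look like for the register_prices batch above (the column list is taken from the question's schema; conn is assumed to be an already-open psycopg connection, and register_prices_copy / to_copy_row are hypothetical names, not the asker's code):

```python
# Column order must match between the COPY statement and the rows we stream.
COLUMNS = (
    "blockchain_name",
    "dex_name",
    "pair_address",
    "token0_address",
    "token1_address",
    "raw_reserve0",
    "raw_reserve1",
    "reserve0",
    "reserve1",
    "mid_price",
    "datetime",
)


def to_copy_row(price_dict: dict) -> tuple:
    # Project a PricingInfo dict onto the column order COPY expects,
    # dropping extra fields such as token0_symbol / token1_symbol / inverted.
    return tuple(price_dict[col] for col in COLUMNS)


def register_prices_copy(conn, prices) -> None:
    # COPY ... FROM STDIN streams the whole batch in one statement
    # instead of executing the INSERT once per row.
    stmt = f"COPY prices ({', '.join(COLUMNS)}) FROM STDIN"
    with conn.cursor() as cur:
        with cur.copy(stmt) as copy:
            for price in prices:
                copy.write_row(to_copy_row(price.to_dict()))
    conn.commit()
```

Since COPY aborts the whole batch on the first bad row, any row-level validation has to happen before the write rather than per row.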