我正在使用python与vertica进行通信。有没有一种优雅的方法来创建一个带有pandas数据帧的新的vertica表?我使用的是vertica-python 0.6.14。我所知道的唯一方法是使用for循环将数据帧的每一行写入vertica。此外,在vertica中创建表是非常痛苦的,因为您需要知道每列的数据类型。在提交for循环之前,我想知道是否有一个简单的解决方案可以解决所有问题。
我尝试使用以下代码:
from sqlalchemy import create_engine
engine = create_engine('vertica+vertica_python://user:pass@host:5433/MYDB')
df.to_sql('mytable', engine)它创建了一个表,但没有填充它,并且我收到了一条错误消息。
我还尝试通过DSN安装驱动程序和配置DSN。然后我使用了这行代码:
engine = create_engine('vertica+pyodbc://username:password@mydsn')我可以和vertica沟通,但是熊猫-> vertica仍然不能工作。有什么建议吗?
谢谢
发布于 2019-09-05 12:03:36
您可以使用copy语句将数据从Pandas data frame插入到Vertica:
import vertica_python
conn_info = {'host': host,
'port': port,
'user': user,
'password': password,
'database': database,
# 10 minutes timeout on queries
'read_timeout': 600,
# default throw error on invalid UTF-8 results
'unicode_error': 'strict',
# SSL is disabled by default
'ssl': False,
'connection_timeout': 30
# connection timeout is not enabled by default
}
df_csv = df.to_csv(sep=',',index=False)
cols = tuple(df.columns)
with vertica_python.connect(**conn_info) as connection:
cur = connection.cursor('dict')
cur.copy("""COPY yourtable {}
from stdin DELIMITER ',' """.format(cols),
df_csv)发布于 2018-04-17 21:35:41
为了让它正常工作,我不得不添加和安装Condaforge的vertica-python模块。
其中as Redsift、MySQL和MSSQL使用简单的连接字符串。
def _get_generic_connection(self):
"""
Creates a connection that can be used directly by the sqlalchemy library.
Returns: A sqlalchemy database connection
"""
return create_engine(<<your connection string>>)您必须执行类似以下的操作
from sqlalchemy import create_engine
import vertica_python
def _get_vertica_connection(self):
"""
Creates a connection appropriate for HP Vertica based on the vertica_python library.
Returns: A vertica_python database connection
"""
conn_info = {'host': <<your host>>,
'port': << Vertica port>>,
'user': << appropriate user >>,
'password': << appropriate password >>,
'database': << your db name >>,
# 10 minutes timeout on queries
'read_timeout': 600,
# default throw error on invalid UTF-8 results
'unicode_error': 'strict',
# SSL is disabled by default
'ssl': False,
'connection_timeout': 300
# connection timeout is not enabled by default
}
return vertica_python.connect(**conn_info)我有一个类,它具有这两个函数以及
def __init__(self, app_config):
"""
Args:
app_config( ApplicationConfiguration): Object to handle the configuration of the system
"""
self._app_config = app_config
self._platform = app_config.db_server.db_platform
self._connection_function_dict = {
"vertica": self._get_vertica_connection,
"redshift": self._get_generic_connection,
"mssql": self._get_generic_connection
}
def get_db_connection(self):
"""
Acts as the public method to retrieve a database connection for use by Pandas.
Returns: A database connection of a type dictated by the database platform
"""
db_connection = self._connection_function_dict[self._platform]()
if db_connection is None:
raise NameError("Database platform \"{}\" not known".format(self._platform))
return db_connection这足以产生到我们使用的任何平台的工作连接,SQLAlchemy对这些连接感到满意。这意味着您可以执行以下操作
pandas.read_sql(<<your SQL query>>, << your connection>>)发布于 2020-10-09 18:33:07
我有时做一个df.to_csv,然后
copy_cmd = /opt/vertica/bin/vsql -U <user> -d <db> --password='password' -h <host> -c "COPY schema.table(col1,col2) FROM LOCAL 'path/to/your.csv' PARSER fcsvparser(header='true')"
os.system(copy_cmd)这对我来说很有效,如果你想捕获异常或被拒绝的行,你也可以像这样使用copy cmd
copy_cmd = /opt/vertica/bin/vsql -U <user> -d <db> --password='password' -h <host>
-c "COPY schema.table(col1,col2) FROM LOCAL 'path/to/your.csv' PARSER
fcsvparser(header='true') DIRECT REJECTMAX 1000 EXCEPTIONS 'path/to/exception.csv'
REJECTED DATA 'path/to/rejected.csv'" https://stackoverflow.com/questions/43885486
复制相似问题