Q: Reading a large amount of data from an Access database

Stack Overflow user

Asked 2016-02-11 15:11:37

1 answer, 1.9K views, 0 followers, 0 votes

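I'm looking for advice on how to solve my specific problem (a MemoryError caused by storing too much information in a single variable), as well as general advice on different ways to approach this kind of task.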

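I have an Access 1997 database that I am trying to extract data from. Because I have Access 2013 installed, I can't open the database without downloading Access 2003. No problem: I can do the extraction in Python using pyodbc and Jet.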

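I opened a pyodbc cursor on the database and wrote this function, which first queries all the table names and then all the columns associated with each table: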

def get_schema(cursor):
    """
    :param cursor: Cursor object to database
    :return: Dictionary with table name as key and list of columns as value
    """
    db_schema = dict()
    # tables() yields metadata rows; keep only the table name from each one
    tbls = [row.table_name for row in cursor.tables().fetchall()]

    for tbl in tbls:
        if tbl not in db_schema:
            db_schema[tbl] = list()
        column_names = list()
        for col in cursor.columns(table=tbl):
            column_names.append(col[3])  # index 3 is the column name
        db_schema[tbl].append(tuple(column_names))

    return db_schema

The variable I get back looks like this:

{'Table 1': [('Column 1-1', 'Column 1-2', 'Column 1-3')],
 'Table 2': [('Column 2-1', 'Column 2-2')]}

I then pass this schema variable to another function, which dumps the data from each table into a list of tuples:

def get_table_data(cursor, schema):

    for tbl, cols in schema.items():

        sql = "SELECT * from %s" % tbl  # Dump data
        cursor.execute(sql)  
        col_data = cursor.fetchall()

        for row in col_data:
            cols.append(row)

    return schema

However, when I try to read the returned variable, I get the following:

>>> schema2 = get_table_data(cursor, schema)
>>> schema2
Traceback (most recent call last):
  File "<input>", line 1, in <module>
MemoryError

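TL;DR: Is there a way to start storing data in another variable once the data becomes too large to read? Or a way to increase the memory allocation? Ultimately I want to dump all of this to a CSV file or something similar. Is there a more direct way to do that?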

python

ms-access

pyodbc

data-extraction

1 Answer

Stack Overflow user

Accepted answer

Answered 2016-02-11 15:32:58

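You'll probably want to stream the data out of the database rather than loading it all in one go. That way you can write the data straight back out again without ever holding too much of it in memory at once.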

The best way to do this is with a generator.

So, instead of modifying the schema variable as you do now, yield the rows as you read them from each database table:

def get_single_table_data(cursor, tbl):
    '''
    Generator to get all data from one table.
    Does this one row at a time, so we don't load
    too much data in at once
    '''
    sql = "SELECT * from %s" % tbl
    cursor.execute(sql)
    while True:
        row = cursor.fetchone()
        if row is None:
            break
        yield row

def print_all_table_data(cursor, schema):
    for tbl, cols in schema.items():
        print(cols)
        rows = get_single_table_data(cursor, tbl)
        for row in rows:
            print(row)

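This is obviously just an example, but it would (in theory) print every row of every table without ever holding more than one row of data in memory at a time.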
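Since the stated end goal is a CSV file, the same generator idea can feed Python's standard csv module directly. The following is only a minimal sketch, assuming Python 3 and a DB-API cursor such as the pyodbc one from the question. The names iter_rows, dump_table_to_csv, csv_path and batch_size are illustrative and do not come from the original post. It uses fetchmany instead of fetchone, so only one small batch of rows is held in memory at a time.

```python
import csv


def iter_rows(cursor, batch_size=1000):
    """Yield rows from an already-executed cursor, one batch at a time."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:  # an empty list means the result set is exhausted
            break
        for row in batch:
            yield row


def dump_table_to_csv(cursor, tbl, csv_path, batch_size=1000):
    """Stream one table into a CSV file; return the number of data rows written."""
    # Square brackets quote Access table names that contain spaces.
    cursor.execute("SELECT * FROM [%s]" % tbl)
    # cursor.description holds one entry per column; item 0 is the column name.
    header = [col[0] for col in cursor.description]
    count = 0
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for row in iter_rows(cursor, batch_size):
            writer.writerow(row)
            count += 1
    return count
```

One possible way to use it is to loop over the table names in the schema dictionary and call dump_table_to_csv(cursor, tbl, tbl + ".csv") for each one, which writes one file per table.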

Votes: 4

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/35343003
