我一直在使用WAMP来摄取一些csv日志,并希望通过编写一些我需要执行的常规操作的脚本来实现更自动化的过程。
我使用PHPmyadmin中的直接CSV导入功能来处理CSV的方言和细节。
我用Python语言写了一个上传器,使用MySQLdb解析日志文件,但是由于日志包含一些无用的字符,我发现我需要在我可能不想做的地方做很多清理输入的工作……
例如,日志是来自目录扫描器的一些数据,我无法控制人们使用的文件夹命名约定。我有这个文件夹:
"C:\user\NZ Business Roundtable_Download_13Feb2013, 400 Access"并且,字符作为新的字段标记被读取(它毕竟是csv )。我真正想让它做的是忽略引号内的所有文本:- "......"
我在'字符中看到了类似的问题,我相信还会有更多的问题。
我找到了这个:- http://www.tech-recipes.com/rx/2345/import_csv_file_directly_into_mysql/,它展示了如何编写Python脚本,使其功能类似于PHPmyadmin加载例程。主要使用以下代码片段:
load data local infile 'uniq.csv' into table tblUniq fields terminated by ','
enclosed by '"'
lines terminated by '\n'
(uniqName, uniqCity, uniqComments)但是,我想要保护的表有一些深入的处理和更改,我已经编写了脚本,所以我想知道是否有一种方法可以“告诉”MySQL我想使用""作为文本封装。我想要保护的主要进程是,在创建新表时为其指定一个特定的表名,并在其余的进程中使用该名称。
我的表格生成器脚本示例:-
def make_table(self):
query ="DROP TABLE IF EXISTS `atl`.`{}`".format(self.table)
self.cur.execute(query)
query = "CREATE TABLE IF NOT EXISTS `atl`.`{}` (`PK` INT NOT NULL AUTO_INCREMENT PRIMARY KEY, `ID` varchar(10), `PARENT_ID` varchar(10), `URI` varchar(284), \
`FILE_PATH` varchar(230), `NAME` varchar(125), `METHOD` varchar(9), `STATUS` varchar(4), `SIZE` varchar(9), \
`TYPE` varchar(9), `EXT` varchar(11), `LAST_MODIFIED` varchar(19), `EXTENSION_MISMATCH` varchar(20), `MD5_HASH` varchar(32), \
`FORMAT_COUNT` varchar(2), `PUID` varchar(9), `MIME_TYPE` varchar(71), `FORMAT_NAME` varchar(59), `FORMAT_VERSION` varchar(7), \
`delete_flag` tinyint, `delete_reason` VARCHAR(80), `move_flag` TINYINT, `move_reason` VARCHAR(80), \
`ext_change_flag` TINYINT, `ext_change_reason` VARCHAR(80), `ext_change_value` VARCHAR(4), `fname_change_flag` TINYINT, `fname_change_reason` VARCHAR(80),\
`fname_change_value` VARCHAR(80))".format(self.table)
self.cur.execute(query)
self.mydb.commit()我的摄取脚本示例:
def ingest_row(self, row):
query = "insert"
# Prepare SQL query to INSERT a record into the database.
query = "INSERT INTO `atl`.`{0}` (`ID`, `PARENT_ID`, `URI`, `FILE_PATH`, `NAME`, `METHOD`, `STATUS`, `SIZE`, `TYPE`, `EXT`, \
`EXTENSION_MISMATCH`, `LAST_MODIFIED`, `MD5_HASH`, `FORMAT_COUNT`, `PUID`, `MIME_TYPE`, `FORMAT_NAME`, `FORMAT_VERSION`) \
VALUES ('{1}','{2}','{3}','{4}','{5}','{6}','{7}','{8}','{9}','{10}','{11}','{12}','{13}','{14}','{15}','{16}','{17}','{18}')".format(self.table, row[0], row[1], row[2], row[3], row[4], \
row[5], row[6], row[7], row[8], row[9], row[10], row[11], row[12], row[13], row[14], row[15], row[16], row[17])
try:
self.cur.execute(query)
self.mydb.commit()
except:
print query
quit()日志示例:-
"ID","PARENT_ID","URI","FILE_PATH","NAME","METHOD","STATUS","SIZE","TYPE","EXT","LAST_MODIFIED","EXTENSION_MISMATCH","MD5_HASH","FORMAT_COUNT","PUID","MIME_TYPE","FORMAT_NAME","FORMAT_VERSION"
"1","","file:/C:/jay/NZ%20Business%20Roundtable_Download_13Feb2013,%20400%20Access/","C:\jay\NZ Business Roundtable_Download_13Feb2013, 400 Access","NZ Business Roundtable_Download_13Feb2013, 400 Access",,"Done","","Folder",,"2013-06-28T11:31:36","false",,"",,"","",""
"2","1","file:/C:/jay/NZ%20Business%20Roundtable_Download_13Feb2013,%20400%20Access/1993/","C:\jay\NZ Business Roundtable_Download_13Feb2013, 400 Access\1993","1993",,"Done","","Folder",,"2013-06-28T11:31:36","false",,"",,"","",""发布于 2013-07-15 05:19:31
您应该使用SQL prepared statements。在format中混合使用数据和sql代码为SQL injection (几乎总是1st in the top 25 software flaw / security issue)打开了大门。
例如,以下是您的数据:
>>> log = """\
... "ID","PARENT_ID","URI","FILE_PATH","NAME","METHOD","STATUS","SIZE","TYPE","EXT","LAST_MODIFIED","EXTENSION_MISMATCH","MD5_HASH","FORMAT_COUNT","PUID","MIME_TYPE","FORMAT_NAME","FORMAT_VERSION"
... "1","","file:/C:/jay/NZ%20Business%20Roundtable_Download_13Feb2013,%20400%20Access/","C:\jay\NZ Business Roundtable_Download_13Feb2013, 400 Access","NZ Business Roundtable_Download_13Feb2013, 400 Access",,"Done","","Folder",,"2013-06-28T11:31:36","false",,"",,"","",""
... "2","1","file:/C:/jay/NZ%20Business%20Roundtable_Download_13Feb2013,%20400%20Access/1993/","C:\jay\NZ Business Roundtable_Download_13Feb2013, 400 Access\1993","1993",,"Done","","Folder",,"2013-06-28T11:31:36","false",,"",,"","",""
... """我没有这个文件,所以让我们假设我有:
>>> import StringIO
>>> logfile = StringIO.StringIO(log)然后让我们构建查询:
>>> import csv
>>> csvreader = csv.reader(logfile)
>>> fields = csvreader.next()
>>>
>>> table = 'mytable'
>>>
>>> fields_fmt = ', '.join([ '`%s`' % f for f in fields ])
>>> values_fmt = ', '.join(['%s'] * len(fields))
>>> query = "INSERT INTO `atl`.`{0}` ({1}) VALUES ({2})".format(
... # self.table, fields_fmt, values_fmt)
... table, fields_fmt, values_fmt)
>>> query
'INSERT INTO `atl`.`mytable` (`ID`, `PARENT_ID`, `URI`, `FILE_PATH`, `NAME`, `METHOD`, `STATUS`, `SIZE`, `TYPE`, `EXT`, `LAST_MODIFIED`, `EXTENSION_MISMATCH`, `MD5_HASH`, `FORMAT_COUNT`, `PUID`, `MIME_TYPE`, `FORMAT_NAME`, `FORMAT_VERSION`) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)'如果你给ingest_row按摩
def ingest_row(self, row):
try:
self.cur.execute(query, row)
self.mydb.commit()
except:
print query
quit()然后,您可以使用以下命令导入数据:
for row in csvreader:
ingest_row(row)发布于 2013-07-15 04:19:29
永远不要使用字符串格式化、连接等来构建查询!
dbapi要求所有驱动程序都支持参数化查询,应将参数提供给游标的execute方法。对于支持格式样式参数化的MySQLdb,它将如下所示:
cursor.execute('insert into sometable values (%s, %s)', ('spam', 'eggs'))库可以正确地转义所提供的参数,因此字符串中是否包含必须转义的字符并不重要。
在特殊情况下,唯一的例外是表名,因为转义会产生非法的sql。
https://stackoverflow.com/questions/17643188
复制相似问题