我有一个包含9994条记录和21个字段的CSV文件,其中4个字段是数字,存储在Google Drive中。
我在Google Colab VM中安装了MySQL。安装程序似乎出现了一些小错误,但MySQL启动且SQL命令工作正常
Error: Unable to shut down server with process id 1276
dpkg: error processing package mysql-server-5.7 (--configure):
installed mysql-server-5.7 package post-installation script subprocess returned error exit status 1
dpkg: dependency problems prevent configuration of mysql-server:
mysql-server depends on mysql-server-5.7; however:
Package mysql-server-5.7 is not configured yet.
dpkg: error processing package mysql-server (--configure):
dependency problems - leaving unconfigured
Processing triggers for systemd (237-3ubuntu10.47) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
Processing triggers for libc-bin (2.27-3ubuntu1.2) ...
/sbin/ldconfig.real: /usr/local/lib/python3.7/dist-packages/ideep4py/lib/libmkldnn.so.0 is not a symbolic link
Errors were encountered while processing:
mysql-server-5.7
mysql-server
E: Sub-process /usr/bin/dpkg returned an error code (1)我将CSV文件下载到Colab VM中,然后创建了一个表。
CREATE TABLE IF NOT EXISTS ss_order (\
RowID int(4),\
OrderID char(14),OrderDate date,\
ShipDate date,ShipMode varchar(16),\
CustomerID char(8),CustomerName varchar(30),Segment varchar(20),\
Country varchar(30),City varchar(30),State varchar(30),PostalCode char(5),Region varchar(15) ,\
ProductID varchar(20), Category varchar(40), SubCategory varchar(40), ProductName varchar(200), \
Sales decimal(8,2), Quantity int(4), Discount decimal(4,2), Profit decimal(8,2) \
); \然后我加载表
LOAD DATA LOCAL INFILE 'SS_Orders.csv' INTO TABLE ss_order \
FIELDS TERMINATED BY ',' IGNORE 1 LINES; \无错误
我执行一个非常简单的SQL命令
select count(*),sum(Sales),Sum(Quantity), Sum(Profit) from ss_order;并得到以下答案
+----------+------------+---------------+-------------+
| count(*) | sum(Sales) | Sum(Quantity) | Sum(Profit) |
+----------+------------+---------------+-------------+
| 9994 | 2026591.44 | 382386 | 276351.97 |
+----------+------------+---------------+-------------+随后,我将CSV文件读取到Pandas Dataframe中,并生成相同的和,如下所示
dfSS['Quantity'].sum()
37873
dfSS['Sales'].sum()
2297200.8603
dfSS['Profit'].sum()
286397.0217显然,这是MySQL和Python Pandas生成的总和之间的巨大差异。我已经用Google Sheets和MS Excel检查了Python Pandas的答案是正确的,或者至少与Excel和Sheets sum函数一致。
带有代码的Colab Notebook --包括MySQL和Pandas --可以在this URL上找到。数据CSV文件的URL位于Colab Notebook中。
我的错误在哪里?还是错误?
发布于 2021-07-07 14:03:30
必须按如下方式更改LOAD命令
LOAD DATA LOCAL INFILE 'SS_Orders.csv' INTO TABLE ss_order \
FIELDS TERMINATED BY ',' enclosed by '\"' IGNORE 1 LINES; \以反映字段数据包含在“
这完全解决了问题。
https://stackoverflow.com/questions/68278576
复制相似问题