
Reading a .csv with fread works on OSX but not on Windows

Stack Overflow user
Asked 2018-06-08 12:11:17
1 answer, 320 views, 0 followers, 0 votes

I have an annoying CSV (> 10 10) that opens on a Mac but not on Windows 10.

The code I use:

data_in <- fread("my_data.csv")

sessionInfo() on Windows:

R version 3.4.4 (2018-03-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] data.table_1.10.4-3 forcats_0.3.0       stringr_1.3.0       dplyr_0.7.4         purrr_0.2.4         readr_1.1.1         tidyr_0.8.0         tibble_1.4.2       
 [9] ggplot2_2.2.1       tidyverse_1.2.1     RMySQL_0.10.14      DBI_0.8            

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.16     cellranger_1.1.0 pillar_1.2.1     compiler_3.4.4   plyr_1.8.4       bindr_0.1.1      tools_3.4.4      lubridate_1.7.2  jsonlite_1.5    
[10] nlme_3.1-131.1   gtable_0.2.0     lattice_0.20-35  pkgconfig_2.0.1  rlang_0.2.0      psych_1.8.3.3    cli_1.0.0        rstudioapi_0.7   yaml_2.1.18     
[19] parallel_3.4.4   haven_1.1.1      bindrcpp_0.2.2   xml2_1.2.0       httr_1.3.1       hms_0.4.2        grid_3.4.4       glue_1.2.0       R6_2.2.2        
[28] readxl_1.0.0     foreign_0.8-69   modelr_0.1.1     reshape2_1.4.3   magrittr_1.5     scales_0.5.0     rvest_0.3.2      assertthat_0.2.0 mnormt_1.5-5    
[37] colorspace_1.3-2 stringi_1.1.7    lazyeval_0.2.1   munsell_0.4.3    broom_0.4.4      crayon_1.3.4 

sessionInfo() on OSX:

R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] sv_SE.UTF-8/sv_SE.UTF-8/sv_SE.UTF-8/C/sv_SE.UTF-8/sv_SE.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.11.2

loaded via a namespace (and not attached):
[1] compiler_3.5.0 tools_3.5.0    yaml_2.1.19 

This is the error I get on Windows. I have tried all of the suggested solutions without any luck.

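Expecting 10 cols, but line 1346596 contains text after processing all cols. Try again with fill=TRUE. Another reason could be that fread's logic in distinguishing one or more fields having embedded sep=',' and/or (unescaped) '\n' characters within unbalanced unescaped quotes has failed. If quote='' doesn't help, please file an issue to figure out if the logic could be improved. In addition: Warning message: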

Additional output with verbose = TRUE on Windows (I tried a smaller file and got the same problem):

Input contains no \n. Taking this to be a filename to open
File opened, filesize is 0.004474 GB.
Memory mapping ... ok
Detected eol as \r\n (CRLF) in that order, the Windows standard.
Positioned on line 1 after skip or autostart
This line is the autostart and not blank so searching up for the last non-blank ... line 1
Detecting sep ... ','
Detected 10 columns. Longest stretch was from line 1 to line 30
Starting data input on line 1 (either column names or first row of data). First 10 characters: ,asin,sale
All the fields on line 1 are character fields. Treating as the column names.
Count of eol: 3657 (including 0 at the end)
Count of sep: 138915
nrow = MIN( nsep [138915] / (ncol [10] -1), neol [3657] - endblanks [0] ) = 3657
Type codes (point  0): 1444444340
Type codes (point  1): 1444444340
Type codes (point  2): 1444444340
Type codes (point  3): 1444444340
Type codes (point  4): 1444444344
Type codes (point  5): 1444444344
Type codes (point  6): 1444444344
Type codes (point  7): 1444444344
Type codes (point  8): 1444444344
Type codes (point  9): 1444444344
Type codes (point 10): 1444444444
Type codes: 1444444444 (after applying colClasses and integer64)
Type codes: 1444444444 (after applying drop or select (if supplied)
Allocating 10 column slots (10 - 0 dropped)
Error in fread("md2.csv", verbose = T) : 
  Expecting 10 cols, but line 3312 contains text after processing all cols. Try again with fill=TRUE. Another reason could be that fread's logic in distinguishing one or more fields having embedded sep=',' and/or (unescaped) '\n' characters within unbalanced unescaped quotes has failed. If quote='' doesn't help, please file an issue to figure out if the logic could be improved.
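
The error above names a specific line whose field count does not match the header. One way to locate such lines independently of fread is base R's count.fields(). The following is only a minimal sketch on a hypothetical three-column temporary file (the file contents and variable names are illustrative and not from the original post); for the real data, the path to md2.csv would be passed instead.

```r
# Hypothetical sample file: the last line has one field too many.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3", "4,5,6", "7,8,9,10"), tmp)

# Count the fields on each line, treating double quotes as the quote
# character and disabling comment handling so '#' in data is not special.
n_fields <- count.fields(tmp, sep = ",", quote = "\"", comment.char = "")

# Lines whose field count differs from the header line.
bad_lines <- which(n_fields != n_fields[1])
print(bad_lines)
```

Note that count.fields() reports NA for lines that only continue a quoted multi-line field, so a real file with embedded newlines may need those entries filtered out before comparing.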

Output with verbose = TRUE on OSX:

Input contains no \n. Taking this to be a filename to open
[01] Check arguments
  Using 4 threads (omp_get_max_threads()=4, nth=4)
  NAstrings = [<<NA>>]
  None of the NAstrings look like numbers.
  show progress = 1
  0/1 column will be read as integer
[02] Opening the file
  Opening file md2.csv
  File opened, size = 4.581MB (4803885 bytes).
  Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
  \n has been found in the input and different lines can end with different line endings (e.g. mixed \n and \r\n in one file). This is common and ideal.
  File ends abruptly with ','. Final end-of-line is missing. Using cow page to write 0 to the last byte.
[05] Skipping initial rows if needed
  Positioned on line 1 starting: <<,asin,salesRank,imUrl,categori>>
[06] Detect separator, quoting rule, and ncolumns
  Detecting sep automatically ...
  sep=','  with 100 lines of 10 fields using quote rule 0
  Detected 10 columns on line 1. This line is either column names or first data row. Line starts as: <<,asin,salesRank,imUrl,categori>>
  Quote rule picked = 0
  fill=false and the most number of columns found is 10
[07] Detect column types, good nrow estimate and whether first row is column names
  Number of sampling jump points = 10 because (4803885 bytes from row 1 to eof) / (2 * 127664 jump0size) == 18
  Type codes (jump 000)    : 5AAAAAA7A2  Quote rule 0
  Type codes (jump 004)    : 5AAAAAA7AA  Quote rule 0
  Type codes (jump 010)    : 5AAAAAA7AA  Quote rule 0
  'header' determined to be true due to column 8 containing a string on row 1 and a lower type (float64) in the rest of the 1041 sample rows
  =====
  Sampled 1041 rows (handled \n inside quoted fields) at 11 jump points
  Bytes from first data row on line 2 to the end of last row: 4803813
  Line length: mean=2028.07 sd=3025.66 min=28 max=29901
  Estimated number of rows: 4803813 / 2028.07 = 2369
  Initial alloc = 4738 rows (2369 + 100%) using bytes/max(mean-2*sd,min) clamped between [1.1*estn, 2.0*estn]
  =====
[08] Assign column names
[09] Apply user overrides on column types
  After 0 type and 0 drop user overrides : 5AAAAAA7AA
[10] Allocate memory for the datatable
  Allocating 10 column slots (10 - 0 dropped) with 4738 rows
[11] Read the data
  jumps=[0..2), chunk_size=2401906, total_size=4803813
Read 3311 rows x 10 columns from 4.581MB (4803885 bytes) file in 00:00.025 wall clock time
[12] Finalizing the datatable
  Type counts:
         1 : int32     '5'
         1 : float64   '7'
         8 : string    'A'
=============================
   0.001s (  2%) Memory map 0.004GB file
   0.005s ( 19%) sep=',' ncol=10 and header detection
   0.000s (  0%) Column type detection using 1041 sample rows
   0.000s (  0%) Allocation of 4738 rows x 10 cols (0.000GB) of which 3311 ( 70%) rows used
   0.019s ( 78%) Reading 2 chunks (0 swept) of 2.291MB (each chunk 1655 rows) using 2 threads
   +    0.004s ( 15%) Parse to row-major thread buffers (grown 0 times)
   +    0.012s ( 48%) Transpose
   +    0.004s ( 15%) Waiting
   0.000s (  0%) Rereading 0 columns due to out-of-sample type exceptions
   0.025s        Total

1 Answer

Stack Overflow user

Answered 2018-06-08 15:04:53

It works fine with the latest version of data.table.
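
The two sessionInfo() listings in the question show data.table 1.10.4-3 on Windows and 1.11.2 on OSX, which is consistent with this answer. As a rough sketch (the variable names are illustrative and not from the original post), the versions can be compared like this before deciding whether to update:

```r
# Versions copied from the two sessionInfo() listings in the question.
windows_version <- package_version("1.10.4-3")
osx_version <- package_version("1.11.2")

# TRUE when the Windows install is older than the one that read the file.
needs_update <- windows_version < osx_version
print(needs_update)

# Show the locally installed version, if data.table is present.
if (requireNamespace("data.table", quietly = TRUE)) {
  print(packageVersion("data.table"))
}

# To update (needs network access), run interactively:
# install.packages("data.table")
```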

Votes: 1
Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/50760388
