For a large data file, the best option is to filter out unnecessary rows before they are imported into R. The simplest way to do this is with OS commands such as sed, awk, or grep. The following code reads every 4th line from the file:

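<pre><code>write.csv(1:1000, file='test.csv')

# awk counts every line, including the CSV header, and prints lines 4, 8, 12, ...
file.pipe &lt;- pipe("awk 'BEGIN{i=0}{i++;if (i%4==0) print $1}' &lt; test.csv ")
# read.csv treats the first line it receives (file line 4) as the header,
# which is why the columns below are named X3 and X3.1
res &lt;- read.csv(file.pipe)
res

&gt; res
     X3 X3.1
1     7    7
2    11   11
3    15   15
4    19   19
5    23   23
6    27   27
7    31   31
8    35   35
</code></pre>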

<pre><code>read.csv("filename.csv")[c(FALSE, TRUE, FALSE, FALSE), ]
</code></pre>

will do the trick.

This works since the logical vector is recycled until it matches the number of rows of the data frame returned by <code>read.csv</code>.
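The recycling can be seen on a small data frame. The following is a minimal sketch in which `df` is a made-up stand-in for the result of `read.csv`; it also shows the equivalent index built with `seq`, which the question asked about.

```r
# Stand-in for the data frame that read.csv("filename.csv") would return.
df <- data.frame(id = 1:12, value = letters[1:12])

# The 4-element logical vector is recycled across all 12 rows,
# so rows 2, 6 and 10 are kept.
keep <- c(FALSE, TRUE, FALSE, FALSE)
picked <- df[keep, ]
picked$id
# [1]  2  6 10

# The same rows selected with an explicit index built by seq().
idx <- seq(2, nrow(df), by = 4)
all(picked$id == df[idx, "id"])
# [1] TRUE
```

Either way, the whole file is still read into memory before the rows are dropped.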

As @df239 suggested, it's much better to filter the rows beforehand using a command-line tool.

Here's a simpler version using <code>sed</code>:

<pre><code>df &lt;- read.csv(pipe("sed -n '2~4p' test.csv")) 
</code></pre>

The <code>2~4p</code> tells <code>sed</code> to print every 4th line, starting at line 2. Note that the <code>first~step</code> address form is a GNU <code>sed</code> extension.

Sven gave a great answer for moderately sized files. But if you are doing this because the entire file does not fit into memory, then you need to take a different approach.

It may be simplest to use an external tool like Perl or AWK to preprocess the file so that it contains only the lines that you want. You can use <code>pipe</code> to read from the output of another program, so you do not have to create an intermediate file.

Another approach would be to transfer the file to a database, then select just the rows that you want from the database.

You can also loop through the file. If you explicitly open the file, then you can read a few rows at a time, keep just the ones that you want, then read the next chunk starting where you left off. The options to <code>read.csv</code> to skip lines and limit the number of lines to read would be helpful here.
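The loop described above could look roughly like the sketch below. The function name `read_every_nth` and its parameters are hypothetical, chosen for illustration. It also swaps in `readLines` on an open connection instead of the `skip` and `nrows` options of `read.csv`, which keeps the bookkeeping simple, and it assumes that no quoted field contains an embedded newline.

```r
# Hypothetical helper: keep every `nth` data row, starting at data row `start`,
# while holding only `chunk_size` raw lines in memory at a time.
# Assumes one CSV record per line (no embedded newlines in quoted fields).
read_every_nth <- function(path, nth = 4, start = 2, chunk_size = 10000) {
  con <- file(path, open = "r")
  on.exit(close(con))

  header <- readLines(con, n = 1)  # keep the header line for parsing later
  kept <- character(0)
  offset <- 0                      # number of data rows consumed so far

  repeat {
    lines <- readLines(con, n = chunk_size)  # continues where the last read stopped
    if (length(lines) == 0) break
    row_ids <- offset + seq_along(lines)     # global data-row numbers of this chunk
    keep <- row_ids >= start & (row_ids - start) %% nth == 0
    kept <- c(kept, lines[keep])
    offset <- offset + length(lines)
  }

  # Parse only the selected lines, with the original header on top.
  read.csv(text = c(header, kept))
}

# Usage with the same kind of test file as in the other answers.
tmp <- tempfile(fileext = ".csv")
write.csv(1:1000, file = tmp)
res <- read_every_nth(tmp, nth = 4, start = 2)
nrow(res)
# [1] 250
```

Because the row numbers are tracked globally through `offset`, the chunk size does not need to be a multiple of `nth`.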

While the <code>sed</code> and <code>awk</code> solutions are great, it might be nice to do this within <code>R</code> itself (say on Windows machines or to avoid GNU <code>sed</code> vs BSD <code>sed</code> differences). Using <a href="https://readr.tidyverse.org/reference/read_delim_chunked.html" rel="nofollow noreferrer"><code>readr::read_*_chunked</code></a> from the <code>tidyverse</code> with a <a href="https://readr.tidyverse.org/reference/callback.html" rel="nofollow noreferrer">callback</a> that samples every <code>nth</code> row works rather well:

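<pre><code>library(readr)  # read_tsv_chunked, DataFrameCallback, cols
library(dplyr)  # %&gt;%, bind_rows

read_tsv_sample &lt;- function(fn, nth, ...) {
  sample_df_cb &lt;- function(df, idx) {
    # idx is the position of the chunk's first row, so using global row
    # numbers keeps the sampling aligned even when chunk_size is not a
    # multiple of nth
    rows &lt;- idx + seq_len(nrow(df)) - 1
    df[(rows - 1) %% nth == 0, ]
  }

  read_tsv_chunked(fn,
                   ...,
                   chunk_size = 10000,
                   callback = DataFrameCallback$new(sample_df_cb)
  ) %&gt;%
    bind_rows()
}
</code></pre>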

For example...

<pre><code>iris %&gt;% write_tsv("iris.tsv")

iris %&gt;% dim
#&gt; [1] 150   5

"iris.tsv" %&gt;%
    read_tsv_sample(10,
                    col_types = cols(.default = col_double())
                    ) %&gt;%
    dim
#&gt; [1] 15  5
</code></pre>

Just a quick question: is there a way to use read.csv to import every Nth row from a large file?

For example, a 50-60 million line file where you only need every 4th row, starting at row 2.

I thought about maybe incorporating the 'seq' function, but I am not sure if that is possible.

Any suggestions?

Importing only every Nth row from a .csv file in R
