
<ul>
<li>Try to reduce the amount of data.</li>
<li>Try to modify the algorithm so that it extracts the relevant data at an early stage.</li>
<li>Try to divide and/or parallelize the problem, and execute it on several clients in a cluster of computing nodes.</li>
</ul>
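
As a rough illustration of extracting the relevant data early, the sketch below streams a text file line by line and keeps only the lines that pass a filter, so the full file never has to sit in memory. The class name `EarlyFilter`, the method `relevantLines`, and the line-oriented text format are assumptions made for this example, not something stated in the question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: the name and the text-file format are assumptions.
final class EarlyFilter {

    /** Streams the file lazily and returns only the lines accepted by the predicate. */
    static List<String> relevantLines(Path file, Predicate<String> relevant) throws IOException {
        // Files.lines reads lazily; try-with-resources closes the underlying file.
        try (Stream<String> lines = Files.lines(file)) {
            return lines.filter(relevant).collect(Collectors.toList());
        }
    }
}
```

A call such as `EarlyFilter.relevantLines(path, s -> s.startsWith("a,"))` would keep only the records whose first field is `a`; only the filtered result is held in memory.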


File-based access should be enough for your task. A couple of examples:

<ol>
<li><a href="https://stackoverflow.com/questions/10102703/read-text-file-from-position-in-java">Use the BufferedReader skip() method</a></li>
<li><a href="https://stackoverflow.com/questions/9671126/how-to-read-a-file-from-a-certain-offset">RandomAccessFile</a></li>
</ol>

Read these two, and the problem with duplicated chunks should go away.
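
To make the second link concrete, here is a minimal sketch of reading one chunk from a given byte offset with `RandomAccessFile`. The class name `ChunkReader`, the method `readChunk`, and the demo file contents are assumptions for illustration only.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper: names and the demo data are assumptions.
final class ChunkReader {

    /** Reads up to length bytes starting at offset; returns fewer bytes near the end of the file. */
    static byte[] readChunk(Path file, long offset, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            if (offset >= raf.length()) {
                return new byte[0]; // nothing left to read
            }
            raf.seek(offset); // jump straight to the chunk, skipping everything before it
            int toRead = (int) Math.min(length, raf.length() - offset);
            byte[] buf = new byte[toRead];
            raf.readFully(buf);
            return buf;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("chunks", ".bin");
        try {
            Files.write(tmp, new byte[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9});
            // prints [4, 5, 6]
            System.out.println(Arrays.toString(readChunk(tmp, 4, 3)));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Seeking avoids reading the bytes before the requested offset, but each pass over the data still reads every chunk from disk once.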


You should definitely try to reduce the amount of data and use multiple threads to process it.

FutureTask could help you:

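<pre><code>import java.math.BigDecimal;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;

ExecutorService exec = Executors.newFixedThreadPool(5);
FutureTask&lt;BigDecimal&gt; task1 = new FutureTask&lt;&gt;(new Callable&lt;BigDecimal&gt;() {

    @Override
    public BigDecimal call() throws Exception {
        // doBigProcessing() stands for your own long-running computation
        return doBigProcessing();
    }

});

// start the future task asynchronously
exec.execute(task1);

// do other stuff

// block until processing is over (may throw InterruptedException or ExecutionException)
BigDecimal result = task1.get();

// release the pool's threads once no more tasks will be submitted
exec.shutdown();
</code></pre>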

In the same way, you could consider caching the future tasks to speed up your application, if possible.

If that is not enough, you could use the Apache Spark framework to process large datasets.
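
For the multi-threaded approach, a minimal sketch of handing several chunks to a fixed thread pool and combining the partial results might look like this. The class `ParallelChunks`, the per-chunk summing in `processChunk`, and the in-memory chunk lists are all assumptions standing in for the real processing.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example: names and the summing workload are assumptions.
final class ParallelChunks {

    /** Placeholder for the real per-chunk work: here it just sums the values. */
    static BigDecimal processChunk(List<BigDecimal> chunk) {
        BigDecimal sum = BigDecimal.ZERO;
        for (BigDecimal v : chunk) {
            sum = sum.add(v);
        }
        return sum;
    }

    /** Processes every chunk on a fixed-size pool and adds up the partial results. */
    static BigDecimal processAll(List<List<BigDecimal>> chunks, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService exec = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<BigDecimal>> tasks = new ArrayList<>();
            for (List<BigDecimal> chunk : chunks) {
                tasks.add(() -> processChunk(chunk));
            }
            BigDecimal total = BigDecimal.ZERO;
            // invokeAll blocks until every task has finished
            for (Future<BigDecimal> f : exec.invokeAll(tasks)) {
                total = total.add(f.get());
            }
            return total;
        } finally {
            exec.shutdown(); // always release the worker threads
        }
    }
}
```

In a real program each task would load its own chunk from disk rather than receive it as a list, so that only `threads` chunks are in memory at any time.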


Before you think about performance, you must consider the following:

<ul>
<li>Find a good data structure for the data.</li>
<li>Find good algorithms to process the data.</li>
</ul>

If you do not have enough memory:

<ul>
<li>Use a memory-mapped file to work on the data.</li>
</ul>
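
A minimal sketch of the memory-mapped approach, assuming the data is stored as raw 8-byte big-endian `long` values: the file is mapped window by window, so the operating system pages data in and out instead of the program loading everything. The class name `MappedSum`, the storage format, and the window size are assumptions for this example.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical example: the file layout (raw big-endian longs) is an assumption.
final class MappedSum {

    /** Sums all 8-byte longs in the file, mapping at most windowBytes at a time. */
    static long sumLongs(Path file, long windowBytes) throws IOException {
        // A single mapping cannot exceed Integer.MAX_VALUE bytes.
        if (windowBytes <= 0 || windowBytes % Long.BYTES != 0 || windowBytes > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("windowBytes must be a positive multiple of 8 that fits in an int");
        }
        long total = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            for (long pos = 0; pos < size; pos += windowBytes) {
                long len = Math.min(windowBytes, size - pos);
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
                while (buf.remaining() >= Long.BYTES) {
                    total += buf.getLong(); // ByteBuffer defaults to big-endian
                }
            }
        }
        return total;
    }
}
```

Because the mapping is backed by the page cache, repeated passes over the same file can reuse pages that are still resident, although that depends on how much free memory the system has.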

If you can process the data without loading all of it:

<ul>
<li>Divide and conquer.</li>
</ul>
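
One way to sketch divide and conquer in Java is a `RecursiveTask` that splits a range in half until the pieces are small enough to process directly, then combines the partial results. The class `RangeSum`, the threshold of 1,000 elements, and the summing workload are assumptions; the array stands in for a single chunk or mapped region of the real data.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example: the threshold and the summing workload are assumptions.
final class RangeSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // below this size, sum sequentially

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    RangeSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = from + (to - from) / 2;
        RangeSum left = new RangeSum(data, from, mid);
        RangeSum right = new RangeSum(data, mid, to);
        left.fork();                          // run the left half asynchronously
        return right.compute() + left.join(); // compute the right half here, then combine
    }

    /** Convenience entry point that uses the common fork/join pool. */
    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new RangeSum(data, 0, data.length));
    }
}
```

The same split-then-combine shape applies to any per-chunk result that can be merged, such as counts, minima, or histograms.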

And please give us more details.

I have a program that, at startup, generates a large amount of data (several GB, possibly more than 10 GB) and then repeatedly processes all the data, does something, processes all the data, does something, and so on. That much data doesn't fit into my RAM, and when it starts paging, it's really painful. What is the optimal way to store my data and, in general, how do I solve this problem?

Should I use a DB even though I don't need to keep the data after my program ends?
Should I split my data somehow, save it into files, and load them when I need them? Or should I just keep using RAM and put up with the paging?

With a DB and with files there is a problem. I have to process the data in pieces. So I load a chunk of data (let's say 500 MB), calculate, load the next chunk, and only after I have loaded and calculated everything can I do something and repeat the cycle. That means I would read from the HDD the same chunks of data I read in the previous cycle.

How to store a large amount of data
