3 million records isn't really that much from a volume-of-data point of view (depending on record size, obviously), so I'd suggest that the easiest thing to try is parallelising the processing across multiple threads (using the java.util.concurrent.Executor framework). As long as you have multiple CPU cores available, you should be able to get near-linear performance increases.
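As a rough illustration of that suggestion, here is a minimal sketch that splits the records into one chunk per thread and buckets each chunk on a fixed-size pool. The `assignBucket` rule and the use of plain `int` records are placeholders I made up for the example; the real rule set (5-15 attributes with precedence) would go in its place.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelBucketing {

    // Hypothetical rule: stands in for the real attribute-based rule set.
    static int assignBucket(int recordValue) {
        return Math.floorMod(recordValue, 4);
    }

    // Splits the records into roughly one chunk per thread and buckets the chunks in parallel.
    static int[] assignAll(int[] records, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int[] buckets = new int[records.length];
        try {
            int chunk = (records.length + threads - 1) / threads; // ceiling division
            List<Future<?>> pending = new ArrayList<>();
            for (int start = 0; start < records.length; start += chunk) {
                final int from = start;
                final int to = Math.min(start + chunk, records.length);
                // Each task writes only to its own index range, so no locking is needed.
                pending.add(pool.submit(() -> {
                    for (int i = from; i < to; i++) {
                        buckets[i] = assignBucket(records[i]);
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for the chunk and surface any worker failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return buckets;
    }
}
```

A typical call would be `ParallelBucketing.assignAll(records, Runtime.getRuntime().availableProcessors())`. Whether the speed-up is actually near-linear depends on how expensive the rule evaluation is compared with fetching the data.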

It depends on the data source. If it is a single database, you will spend most of the time retrieving the data anyway. If it is in a local file, then you can partition the data into smaller files or you can pad the records to have equal size - this allows random access to a batch of records. 

If you have a multi-core machine, the partitioned data can be processed in parallel. Once you have determined the record-to-bucket assignment, you can write the information back into the database using PreparedStatement's batch capability.
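A minimal sketch of the batched write-back, using the standard JDBC `addBatch` and `executeBatch` calls. The table and column names in `UPDATE_SQL` are invented for illustration, and the method wraps `SQLException` in an unchecked exception only to keep the example short.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

class BucketWriter {

    // Hypothetical table and column names; adjust to the real schema.
    static final String UPDATE_SQL = "UPDATE records SET bucket_id = ? WHERE id = ?";

    // Queues one UPDATE per record and sends them to the database in groups of batchSize.
    // Returns the number of executeBatch calls, i.e. round trips to the database.
    static int writeBack(PreparedStatement ps, Map<Long, Integer> bucketByRecordId, int batchSize) {
        int roundTrips = 0;
        int queued = 0;
        try {
            for (Map.Entry<Long, Integer> e : bucketByRecordId.entrySet()) {
                ps.setInt(1, e.getValue());
                ps.setLong(2, e.getKey());
                ps.addBatch();
                if (++queued == batchSize) {
                    ps.executeBatch();
                    roundTrips++;
                    queued = 0;
                }
            }
            if (queued > 0) { // flush the final partial batch
                ps.executeBatch();
                roundTrips++;
            }
        } catch (SQLException ex) {
            throw new IllegalStateException("batch write failed", ex);
        }
        return roundTrips;
    }
}
```

The statement would normally come from `connection.prepareStatement(BucketWriter.UPDATE_SQL)` inside a try-with-resources block, ideally with auto-commit turned off so that each batch is committed as a unit.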

If you have only a single-core machine, you can still achieve some performance improvement by separating the work into data retrieval, data processing, and batch write-back stages, so that processing can proceed during the pauses of the I/O operations.
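One way to picture that separation is a small producer/consumer pipeline: one thread fetches chunks while another processes whatever has already arrived. The sketch below is only an assumption about how it could look; the in-memory `chunks` list stands in for a blocking database or file read, and `assignBucket` is a made-up rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class Pipeline {

    private static final int[] END = new int[0]; // sentinel marking the end of input

    // Hypothetical rule standing in for the real bucket logic.
    static int assignBucket(int value) {
        return Math.floorMod(value, 4);
    }

    // Runs retrieval on its own thread while the calling thread processes chunks.
    static List<Integer> run(List<int[]> chunks) {
        BlockingQueue<int[]> fetched = new ArrayBlockingQueue<>(2); // small buffer bounds memory use
        Thread reader = new Thread(() -> {
            try {
                for (int[] chunk : chunks) {
                    fetched.put(chunk); // in a real system: a blocking read from DB or file
                }
                fetched.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.start();

        List<Integer> results = new ArrayList<>();
        try {
            for (int[] chunk = fetched.take(); chunk != END; chunk = fetched.take()) {
                for (int v : chunk) {
                    results.add(assignBucket(v));
                }
            }
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results; // would then be handed to a batched write-back step
    }
}
```

The bounded queue keeps the reader at most a couple of chunks ahead of the processor, so memory use stays flat even when retrieval is faster than processing.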

I'm not quite sure what you're after <a href="http://open.blogs.nytimes.com/2007/11/01/self-service-prorated-super-computing-fun/" rel="nofollow noreferrer">but here's a blog post about how the New York Times used Apache Hadoop Project to process a large volume of data</a>.

As a meaningless benchmark, we have a system that has an internal cache. We're currently loading 500K rows. For each row we generate statistics, place keys in different caches, etc. Currently this takes &lt; 20s for us to process.

It's a meaningless benchmark, but it is an example showing that, depending on the circumstances, 3M rows is not a lot of rows on today's hardware.

That said.

As others have suggested, break the job up into pieces and parallelize the runs, 1-2 threads per core. Each thread maintains its own local data structures and state, and at the end, the master process consolidates the results. This is a crude "map/reduce" algorithm. The key here is to ensure that the threads aren't fighting over global resources like global counters, etc. Let the final processing of the thread results deal with those serially.
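A minimal sketch of this crude map/reduce, assuming the statistic of interest is simply a count of records per bucket. Each task fills a private `HashMap`, and only the calling thread touches the combined result. The `assignBucket` rule is a placeholder.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class LocalStateMerge {

    // Hypothetical rule standing in for the real bucket logic.
    static int assignBucket(int value) {
        return Math.floorMod(value, 4);
    }

    static Map<Integer, Long> countPerBucket(List<int[]> partitions, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // "Map" phase: every task counts into its own map, so there is no shared counter.
            List<Future<Map<Integer, Long>>> partials = new ArrayList<>();
            for (int[] partition : partitions) {
                partials.add(pool.submit(() -> {
                    Map<Integer, Long> local = new HashMap<>();
                    for (int v : partition) {
                        local.merge(assignBucket(v), 1L, Long::sum);
                    }
                    return local;
                }));
            }
            // "Reduce" phase: the calling thread merges the partial results serially.
            Map<Integer, Long> total = new HashMap<>();
            for (Future<Map<Integer, Long>> f : partials) {
                f.get().forEach((bucket, n) -> total.merge(bucket, n, Long::sum));
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the merge happens on one thread after the workers finish, no synchronized or atomic counters are needed anywhere in the hot loop.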

You can use more than one thread per core if each thread is doing DB IO, since no single thread will be purely CPU bound. Simply run the process several times with different thread counts until it comes out fastest.
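The "try several thread counts" advice can be automated with a small sweep. The sketch below is an assumption about how one might do it: each task just sleeps for 2 ms to imitate a thread waiting on DB IO, and the real work would replace that call.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ThreadCountSweep {

    // Runs the same number of simulated IO-bound tasks with each pool size
    // and returns the elapsed wall-clock milliseconds per pool size.
    static Map<Integer, Long> sweep(int tasks, int[] threadCounts) {
        Map<Integer, Long> elapsedMillis = new LinkedHashMap<>();
        for (int threads : threadCounts) {
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            long start = System.nanoTime();
            try {
                List<Callable<Void>> work = new ArrayList<>();
                for (int i = 0; i < tasks; i++) {
                    work.add(() -> {
                        Thread.sleep(2); // placeholder for a blocking DB call
                        return null;
                    });
                }
                for (Future<Void> f : pool.invokeAll(work)) {
                    f.get(); // surface any task failure
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            } finally {
                pool.shutdown();
            }
            elapsedMillis.put(threads, (System.nanoTime() - start) / 1_000_000);
        }
        return elapsedMillis;
    }
}
```

Timings from a single run are noisy, so it is worth repeating the sweep a few times and against the real workload before settling on a pool size.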

We've seen 50% speed-ups even when we run batches through a persistent queueing system like JMS to distribute the work, compared with linear processing, and I've seen these gains on 2-core laptops, so there is definite room for progress here.

Another thing, if possible: don't do ANY disk IO (apart from reading the data from the DB) until the very end. At that point you have far more opportunity to batch any updates that need to be made, so you can at least cut down on network round trips. Even if you had to update every single row, large batches of SQL will still show net gains in performance. Obviously this can be memory intensive. Thankfully, most modern systems have a lot of memory.

Is there a reason that you have to use Java to process the data? Couldn't you use SQL queries to write to intermediate fields? You could build upon each field -- attributes -- until you have everything in the bucket you need. 

Or you could use a hybrid of SQL and Java: use different procedures to get different "buckets" of information, send one set of data down one thread path for more detailed processing, then run another query to get another set of data and send that down a different thread path.

The same goes for most projects where you need to process large amounts of information. I am going to assume that each record is handled the same way, i.e. you process it the same way each time; that processing step is the point at which you can spawn a separate thread.

The second obvious point is where you fetch your information. In this case you mentioned a database, but really that is pretty irrelevant. You want to separate the I/O and processing elements of your code into separate threads (or, more likely, a pool of executors for the processing).

Try to make each as independent as possible, and remember to use locking when necessary. Here are some links that you may want to read up on. 

<a href="http://www.ibm.com/developerworks/library/j-thread.html" rel="nofollow noreferrer">http://www.ibm.com/developerworks/library/j-thread.html</a>
<a href="http://www.ibm.com/developerworks/java/library/j-threads1.html" rel="nofollow noreferrer">http://www.ibm.com/developerworks/java/library/j-threads1.html</a>
<a href="http://www.devarticles.com/c/a/Java/Multithreading-in-Java/" rel="nofollow noreferrer">http://www.devarticles.com/c/a/Java/Multithreading-in-Java/</a>

Effective design steps for this scenario consist of first, determining any and all places where you can partition the records to be processed to allow full-engine parallelization (i.e., four units running against 750k records each is comparatively cheap). 
Then, depending upon the cost of the rules that summarize your record (I am viewing assignment of a bucket as a summarization operation), determine if your operation is going to be CPU bound or record retrieval bound. 

If you're CPU bound, increasing the partitioning is your best performance gain. If you're IO bound, a better-performing design is to retrieve the data in chunks and have rule-processing worker threads work on those chunks in parallel as they arrive.

All of this assumes that your rules will not result in state which needs to be tracked between records. Such a scenario deeply threatens the parallelization approach. If parallelization is not a tractable solution because of cumulative state being a component of the rule set, then your best solution may in fact be sequential processing of individual records.

<blockquote>
 Sequential processing of such a big
 number is clearly out of scope.
</blockquote>

I don't think you know that. How long does it take to process 1,000 records in this way? 10,000? 100,000? 1,000,000? If the answer is really "too long," then fine: start to look for optimizations. But you might find the answer is "insignificant," and then you're done.

Other answers have alluded to this, but it's my entire answer. Prove that you have a problem before you start optimizing. Then you've at least got a simple, correct system to profile and against which to compare optimized answers.

Based on the revised description, I think I'd try and look at sorting the data.

Sorting can be an n log(n) process, and if most of the comparisons are for direct equality on sortable fields, this should yield a total complexity of ~O(n log(n)). Theoretically. If an item is no longer needed after it has been assigned to a bucket, just remove it from the list of data.

Even if the data needed to be re-sorted a few times for various steps in the logic, it should still be a bit faster than the n^2 approach.

Basically, this would involve preprocessing the data to make it easier for actual processing.

This makes certain assumptions about the bucket-assignment logic (namely that it's not too far from the pseudocode provided), and would be invalid if you needed to extract data from every pair of A, B.

Hope this helps.

Edit: I would comment if I could; but, alas, I am too new. Preprocessing applies as much to the data as it does to the individual categories. Ultimately, all you need to do to go from a 15-minute compute time to a 5-minute compute time is to be able to programmatically determine 2/3s+ of the categories that cannot and will never match... in less than O(n) amortized time. Which might not be applicable to your specific situation, I admit.

I would push back on the specification author to focus more on what needs to be done rather than how. I can't imagine why a specification would push Java for a data-intensive operation. If it has to do with data, do it with SQL. If you're using Oracle, there is a function called NTILE, so creating a fixed set of buckets is as trivial as:

<pre><code>select ntile(4)over(order by empno) grp,
 empno,
 ename
from emp
</code></pre>

Which results in:

<pre><code>GRP EMPNO ENAME
--- ----- ---------
1 7369 SMITH
1 7499 ALLEN
1 7521 WARD
1 7566 JONES
2 7654 MARTIN
2 7698 BLAKE
2 7782 CLARK
2 7788 SCOTT
3 7839 KING
3 7844 TURNER
3 7876 ADAMS
4 7900 JAMES
4 7902 FORD
4 7934 MILLER
</code></pre>

At a minimum, you could establish your 'buckets' in SQL; your Java code would then just need to process a given bucket.

<pre><code>Worker worker = new Worker(bucketID);
worker.doWork();
</code></pre>

If you don't care about the number of buckets (the example above asked for 4 buckets) but rather want a fixed size for each bucket (5 records per bucket), then the SQL is:

<pre><code>select ceil(row_number()over(order by empno)/5.0) grp,
 empno,
 ename
from emp
</code></pre>

Output:

<pre><code>GRP EMPNO ENAME
 --- ---------- -------
1 7369 SMITH
1 7499 ALLEN
1 7521 WARD
1 7566 JONES
1 7654 MARTIN
2 7698 BLAKE
2 7782 CLARK
2 7788 SCOTT
2 7839 KING
2 7844 TURNER
3 7876 ADAMS
3 7900 JAMES
3 7902 FORD
3 7934 MILLER
</code></pre>

Both examples above come from the terrific book:
SQL Cookbook, 1st Edition by Anthony Molinaro

As part of the requirement we need to process nearly 3 million records and associate each of them with a bucket. The association is decided by a set of rules (comprising 5-15 attributes, with single values or ranges of values, and a precedence) which derive the bucket for a record.
Sequential processing of such a big number is clearly out of scope.
Can someone guide us on the approach to effectively design a solution?

Process huge volume of data using Java
