Q: Java parallel file processing

Stack Overflow user

Asked on 2012-08-01 00:24:20

Answers: 2 | Views: 1.7K | Followers: 0 | Votes: 6

I have the following code:

import java.io.*;
import java.util.concurrent.* ;
public class Example{
public static void main(String args[]) {
    try {
        FileOutputStream fos = new FileOutputStream("1.dat");
        DataOutputStream dos = new DataOutputStream(fos);

        for (int i = 0; i < 200000; i++) {
            dos.writeInt(i);
        }
        dos.close(); // First sample file written

        FileOutputStream fos1 = new FileOutputStream("2.dat");
        DataOutputStream dos1 = new DataOutputStream(fos1);

        for (int i = 200000; i < 400000; i++) {
            dos1.writeInt(i);
        }
        dos1.close();

        Exampless.createArray(200000); // Fill the shared array
        Exampless ex1 = new Exampless("1.dat");
        Exampless ex2 = new Exampless("2.dat");
        ExecutorService executor = Executors.newFixedThreadPool(2); // Run both tasks in parallel to count the matches in the two files
        long startTime = System.nanoTime();
        long endTime;
        Future<Integer> future1 = executor.submit(ex1);
        Future<Integer> future2 = executor.submit(ex2);
        int count1 = future1.get();
        int count2 = future2.get();
        endTime = System.nanoTime();
        long duration = endTime - startTime;
        System.out.println("duration with threads:"+duration);
        executor.shutdown();
        System.out.println("Matches: " + (count1 + count2));

        startTime = System.nanoTime();
        ex1.call();
        ex2.call();
        endTime = System.nanoTime();
        duration = endTime - startTime;
        System.out.println("duration without threads:"+duration);

    } catch (Exception e) {
        System.err.println("Error: " + e.getMessage());
    }
}
}

class Exampless implements Callable<Integer> {

public static int[] arr = new int[20000];
public String _name;

public Exampless(String name) {
    this._name = name;
}

static void createArray(int z) {
    for (int i = z; i < z + 20000; i++) { // shared array
        arr[i - z] = i;
    }
}

public Integer call() {
    int cnt = 0;
    // try-with-resources closes the stream even if reading fails
    try (DataInputStream din = new DataInputStream(new FileInputStream(_name))) {
        // read the file and count the matches
        for (int i = 0; i < 20000; i++) {
            int c = din.readInt();
            if (c == arr[i]) {
                cnt++;
            }
        }
        return cnt;
    } catch (Exception e) {
        System.err.println("Error: " + e.getMessage());
    }
    return -1;
}

}

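I am trying to count how many values in two files match the values in an array. Even though I run it on two threads, the code does not perform well, because:

(single-threaded run: file 1 + file 2 read time) < (multithreaded run: file 1 || file 2 read time).

Can anyone help me solve this? (I have a 2-core CPU, and the file size is approx. 1.5 GB.)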

java

multithreading

file-handling

2 Answers

Stack Overflow user

Accepted answer

Posted on 2012-08-01 00:32:59

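In the first case you are reading one file sequentially, byte after byte, block after block. This is as fast as disk I/O can be, provided the file is not heavily fragmented. When you are done with the first file, the disk/OS finds the beginning of the second file and continues with a very efficient, linear read of the disk.

In the second case you are constantly switching between the first and the second file, forcing the disk to seek from one place to another. This extra seek time (approximately 10 ms) is the root of your confusion.

Oh, and you do know that disk access is single-threaded and that your task is I/O-bound, so as long as you are reading from the same physical disk, splitting this task across multiple threads cannot help? Your approach would only be justified if:

each thread, apart from reading from a file, also performed some CPU-intensive or blocking operations that are slower than the I/O by an order of magnitude;
the files are on different physical drives (different partitions are not enough), on some RAID configurations, or on a solid-state drive.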

Votes: 7

Stack Overflow user

Posted on 2012-08-01 00:54:21

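As Tomasz pointed out, reading from disk gains nothing from multithreading. You might get some speedup if you multithread the checking instead: load the data from the files into arrays sequentially, then have the threads perform the checks in parallel. But given that your files are small (about 80 KB) and you are only comparing ints, I doubt the performance gain would be worth the effort.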
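The "load sequentially, check in parallel" idea could be sketched roughly as follows. This is only an illustration, not the asker's or answerer's code: the class name `SequentialLoadParallelCheck` and the methods `load`, `countMatches` and `countAll` are hypothetical, the lambda assumes Java 8 or later, and each file is assumed to contain at least as many ints as the reference array.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: all disk reads happen on one thread, only the
// comparison step is handed to a thread pool.
class SequentialLoadParallelCheck {

    // Reads `count` ints from the file in a single sequential pass.
    // Assumes the file holds at least `count` ints (else EOFException).
    static int[] load(File file, int count) throws IOException {
        int[] data = new int[count];
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            for (int i = 0; i < count; i++) {
                data[i] = in.readInt();
            }
        }
        return data;
    }

    // Counts the positions at which the two arrays hold the same value.
    static int countMatches(int[] data, int[] reference) {
        int n = Math.min(data.length, reference.length);
        int cnt = 0;
        for (int i = 0; i < n; i++) {
            if (data[i] == reference[i]) {
                cnt++;
            }
        }
        return cnt;
    }

    static int countAll(List<File> files, int[] reference) throws Exception {
        // Step 1: load the files one after another on the calling thread,
        // so the disk is never asked to alternate between files.
        List<int[]> loaded = new ArrayList<>();
        for (File f : files) {
            loaded.add(load(f, reference.length));
        }

        // Step 2: compare the in-memory arrays in parallel.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int[] data : loaded) {
                Callable<Integer> task = () -> countMatches(data, reference);
                futures.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] reference = {10, 20, 30, 40};
        int[][] contents = {{10, 0, 30, 0}, {10, 20, 30, 40}};
        List<File> files = new ArrayList<>();
        for (int[] content : contents) {
            // Small temporary sample files, written the same way as in the question.
            File f = File.createTempFile("sample", ".dat");
            f.deleteOnExit();
            try (DataOutputStream out = new DataOutputStream(new FileOutputStream(f))) {
                for (int v : content) {
                    out.writeInt(v);
                }
            }
            files.add(f);
        }
        System.out.println("Matches: " + countAll(files, reference));
    }
}
```

With a comparison as cheap as this one, the parallel step is unlikely to pay for the thread overhead. The structure only becomes interesting when the per-element work is substantially more expensive than the read.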

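One thing that will definitely improve execution speed is not using readInt(). Since you know you are comparing 20000 ints, you should read all 20000 ints of each file into an array at once (or at least in blocks), rather than calling readInt() 20000 times.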
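One possible way to do such a bulk read is to fetch the raw bytes in one call and decode them with a `ByteBuffer`. The sketch below is an assumption-laden illustration: the class `BulkIntRead` and its methods `toInts` and `readAllInts` are hypothetical names, and it relies on the file having been written with `DataOutputStream.writeInt`, which stores each int as 4 big-endian bytes (also the default byte order of `ByteBuffer`).

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical sketch of reading all ints of a file in one bulk operation.
class BulkIntRead {

    // Decodes consecutive 4-byte big-endian values into an int array.
    // Trailing bytes that do not form a complete int are ignored.
    static int[] toInts(byte[] bytes) {
        int[] ints = new int[bytes.length / 4];
        ByteBuffer.wrap(bytes).asIntBuffer().get(ints);
        return ints;
    }

    // Reads the whole file with a single call instead of one readInt() per value.
    // Only suitable when the file fits comfortably in memory.
    static int[] readAllInts(Path path) throws IOException {
        return toInts(Files.readAllBytes(path));
    }

    public static void main(String[] args) throws IOException {
        // Write a small temporary sample file the same way the question does.
        Path path = Files.createTempFile("sample", ".dat");
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(path))) {
            for (int i = 0; i < 5; i++) {
                out.writeInt(i * 100);
            }
        }
        System.out.println(Arrays.toString(readAllInts(path)));
        Files.delete(path);
    }
}
```

For files too large to hold in memory at once, the same decoding step could be applied block by block. A smaller change with a similar effect is to wrap the `FileInputStream` in a `BufferedInputStream`, so that each `readInt()` is served from an in-memory buffer.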

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/11744730
