Data analysis program for a job interview

Code Review user
Asked 2020-01-04 19:23:39
Answers: 1, views: 261, followers: 0, score: 6

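Before an interview for a junior Java developer position (I hope :)), I was asked to complete a test task. Please review my code. Right now the program parameters are hardcoded instead of coming from args[]; I will add argument handling tomorrow.

You can also see my code on GitHub. I would appreciate any feedback. Thanks!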

Project structure: [image not included]

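Task description:

Write a Java program that will:

1. Generate a file with random numbers (integers in the range 1 to 2^64 − 1). The file size is limited by a command-line option. The default file size limit is 64 MB. Each random number is separated by a space (ASCII code 32). The program requires one argument, the name of the file to be generated.
2. Read the file generated in step 1, analyze it, and output the result to the console. The output should include:
   1. The 10 most frequently appearing numbers in bar chart form.
   2. The count of prime numbers.
   3. The count of Armstrong numbers.
   4. Separately, the time taken to read and to analyze the file.

Take note:

1. Check error handling.
2. Keep the code clean and formatted; follow the basic Java naming conventions.
3. Program speed matters; you may use parallel processing.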

Main class:

package ee.raintree.test.numbers;

import java.io.File;
import java.io.IOException;
import java.math.BigInteger;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class Main {
    private final static String SPACE = " ";
    private static int fileSize = 67108864;
    private static String fileName;

    public static void main(String args[]) throws InterruptedException, ExecutionException, IOException {
        fileName = "result";
        File result = new File(fileName);
        int coreCount = Runtime.getRuntime().availableProcessors();
        ExecutorService service = Executors.newFixedThreadPool(coreCount);

        // Part 1: Generate numbers and write them to file
        List<File> tmpFiles = new ArrayList<>();
        List<Future<File>> futureTmpFiles = new ArrayList<>();
        for (int i = 0; i < coreCount; i++) {
            Future<File> futureTmpFile = service.submit(new TmpNumbersFileCreator(fileSize / coreCount));
            futureTmpFiles.add(futureTmpFile);
        }
        for (int i = 0; i < coreCount; i++) {
            Future<File> futureTmpFile = futureTmpFiles.get(i);
            tmpFiles.add(futureTmpFile.get());
        }

        IOCopier.joinFiles(result, tmpFiles);

        // Part 2: Read numbers from file and analyze them
        long readAndAnalyzeStart = System.currentTimeMillis();

        List<BigInteger> numbers = new ArrayList<>();
        for (String line : Files.readAllLines(result.toPath())) {
            for (String part : line.split(SPACE)) {
                numbers.add(new BigInteger(part));
            }
        }

        int listSize = numbers.size();
        int chunkListSize = listSize / coreCount + 1;
        List<List<BigInteger>> lists = ListSplitter.ofSize(numbers, chunkListSize);

        int countOfPrimeNumbers = 0;
        int countOfArmstrongNumbers = 0;

        List<Future<Integer>> futurePrimeCounts = new ArrayList<>();
        for(int i = 0; i < coreCount; i++) {
            final int j = i;
            Future<Integer> futurePrimeCount = service.submit(new Callable<Integer>() {
                @Override
                public Integer call() throws Exception {
                    int primeCount = 0;
                    for(BigInteger number : lists.get(j)) {
                        if(number.isProbablePrime(128)) {
                            primeCount++;
                        }
                    }
                    return primeCount;
                }
            });
            futurePrimeCounts.add(futurePrimeCount);
        }

        for (int i = 0; i < coreCount; i++) {
            Future<Integer> futurePrimeCount = futurePrimeCounts.get(i);
            countOfPrimeNumbers = countOfPrimeNumbers + futurePrimeCount.get();
        }

        List<Future<Integer>> futureArmstrongCounts = new ArrayList<>();
        for(int i = 0; i < coreCount; i++) {
            final int j = i;
            Future<Integer> futureArmstrongCount = service.submit(new Callable<Integer>() {
                @Override
                public Integer call() throws Exception {
                    int armstrongCount = 0;
                    for(BigInteger number : lists.get(j)) {
                        if(MathUtils.isArmstrongNumber(number)) {
                            armstrongCount++;
                        }
                    }
                    return armstrongCount;
                }
            });
            futureArmstrongCounts.add(futureArmstrongCount);
        }

        for (int i = 0; i < coreCount; i++) {
            Future<Integer> futureArmstrongCount = futureArmstrongCounts.get(i);
            countOfArmstrongNumbers = countOfArmstrongNumbers + futureArmstrongCount.get();
        }

        service.shutdown();
        long readAndAnalyzeEnd = System.currentTimeMillis();

        // Part 3: Printing result
        System.out.println("Read and analysis done. That took " + (readAndAnalyzeEnd - readAndAnalyzeStart) + " milliseconds.");
        System.out.println("Prime numbers count: " + countOfPrimeNumbers);
        System.out.println("Armstrong numbers count: " + countOfArmstrongNumbers);
        System.out.println("10 most frequently appeared numbers in bar chart form:");
        Map<BigInteger, Integer> numbersFreqMap = MapUtils.getSortedFreqMapFromList(numbers);
        BarChartPrinter<BigInteger> printer = new BarChartPrinter<>(numbersFreqMap);
        printer.print();

    }
}    

BarChartPrinter class:

package ee.raintree.test.numbers;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

public class BarChartPrinter<T> {
    private final static String BAR = "|";
    private final static String SPACE = " ";
    List<Entry<T, Integer>> listOfEntries;
    private int chartsCount = 10;
    private int longestEntrySize;
    private int barChartStep;

    public BarChartPrinter(Map<T, Integer> map) {
        listOfEntries = new ArrayList<Entry<T, Integer>>(map.entrySet());
        if (listOfEntries.size() < chartsCount) {
            chartsCount = listOfEntries.size();
        }
        barChartStep = listOfEntries.get(chartsCount - 1).getValue();
    }

    public void print() {
        setLongestEntrySize();
        printBarChart();
    }

    private void printBarChart() {
        for (int i = 0; i < chartsCount; i++) {
            Entry<T, Integer> entry = listOfEntries.get(i);
            int barsCount = entry.getValue() / barChartStep;
            System.out.print(entry.getKey() + getAdditionalSpaces(entry.getKey().toString())  + SPACE);
            for (int bars = 0; bars < barsCount; bars++) {
                System.out.print(BAR);
            }
            System.out.println();
        }
    }

    private void setLongestEntrySize() {
        int longest = 0;
        for(int i = 0; i < chartsCount; i++) {
            if(listOfEntries.get(i).getKey().toString().length() > longest) {
                longest = listOfEntries.get(i).getKey().toString().length();
            }
        }

        longestEntrySize = longest;
    }

    private String getAdditionalSpaces(String string) {
        StringBuilder sb = new StringBuilder();
        int needSpaces = longestEntrySize - string.length();
        for(int i = 0; i < needSpaces; i++) {
            sb.append(SPACE);
        }
        return sb.toString();
    }
}

IOCopier class, copied wholesale from a semi-official source:

package ee.raintree.test.numbers;

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;

import org.apache.commons.io.IOUtils;

class IOCopier {
    public static void joinFiles(File destination, List<File> sources) {
        try (OutputStream output = createAppendableStream(destination)) {
            for (File source : sources) {
                appendFile(output, source);
            }
        } catch (IOException e) {
            System.out.println("Error joining files");
        }
    }

    private static BufferedOutputStream createAppendableStream(File destination) throws FileNotFoundException {
        return new BufferedOutputStream(new FileOutputStream(destination, true));
    }

    private static void appendFile(OutputStream output, File source) {
        try (InputStream input = new BufferedInputStream(new FileInputStream(source))) {
            IOUtils.copy(input, output);
        } catch (IOException e) {
            System.out.println("Error appending file");
        }
    }
}

ListSplitter class, also copied wholesale from a semi-official source:

package ee.raintree.test.numbers;

import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

public class ListSplitter<T> extends AbstractList<List<T>> {

    private final List<T> list;
    private final int chunkSize;

    public ListSplitter(List<T> list, int chunkSize) {
        this.list = new ArrayList<>(list);
        this.chunkSize = chunkSize;
    }

    public static <T> ListSplitter<T> ofSize(List<T> list, int chunkSize) {
        return new ListSplitter<>(list, chunkSize);
    }

    @Override
    public List<T> get(int index) {
        int start = index * chunkSize;
        int end = Math.min(start + chunkSize, list.size());

        if (start > end) {
            throw new IndexOutOfBoundsException("Index " + index + " is out of the list range <0," + (size() - 1) + ">");
        }

        return new ArrayList<>(list.subList(start, end));
    }

    @Override
    public int size() {
        return (int) Math.ceil((double) list.size() / (double) chunkSize);
    }
}

MapUtils class:

package ee.raintree.test.numbers;

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.Map.Entry;

public class MapUtils {

    public static <T> Map<T, Integer> getSortedFreqMapFromList(List<T> list) {
        Map<T, Integer> map = getFreqMapFromList(list);
        Set<Entry<T, Integer>> entries = map.entrySet();
        List<Entry<T, Integer>> listOfEntries = new ArrayList<Entry<T, Integer>>(entries);
        Collections.sort(listOfEntries, getValueDescComparator());
        Map<T, Integer> sortedByValue = new LinkedHashMap<T, Integer>(listOfEntries.size());
        for (Entry<T, Integer> entry : listOfEntries) {
            sortedByValue.put(entry.getKey(), entry.getValue());
        }
        return sortedByValue;
    }

    private static <T> Map<T, Integer> getFreqMapFromList(List<T> list) {
        Map<T, Integer> result = new HashMap<>();
        for (T item : list) {
            if (result.get(item) == null) {
                result.put(item, 1);
            } else {
                result.put(item, result.get(item) + 1);
            }
        }
        return result;
    }

    private static <T> Comparator<Entry<T, Integer>> getValueDescComparator() {
        Comparator<Entry<T, Integer>> valueComparator = new Comparator<Entry<T, Integer>>() {
            @Override
            public int compare(Entry<T, Integer> e1, Entry<T, Integer> e2) {
                Integer v1 = e1.getValue();
                Integer v2 = e2.getValue();
                return v2.compareTo(v1);
            }
        };
        return valueComparator;
    }
}

MathUtils class:

package ee.raintree.test.numbers;

import java.math.BigInteger;

public class MathUtils {
    public static boolean isArmstrongNumber(BigInteger number) {
        String numberInString = number.toString();
        int digitsCount = number.toString().length();
        int power = digitsCount;
        BigInteger sum = BigInteger.ZERO;

        for (int i = 0; i < digitsCount; i++) {
            int digit = Character.getNumericValue(numberInString.charAt(i));
            BigInteger digitInPower = BigInteger.valueOf(digit).pow(power);
            sum = sum.add(digitInPower);
        }

        return sum.compareTo(number) == 0;
    }
}

TmpNumbersFileCreator class:

package ee.raintree.test.numbers;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.PrintWriter;
import java.math.BigInteger;
import java.util.Random;
import java.util.concurrent.Callable;

public class TmpNumbersFileCreator implements Callable<File> {
    private File file;
    private PrintWriter printWriter;
    private static final String SEPARATOR = " ";
    private int size;

    public TmpNumbersFileCreator(int size) {
        this.size = size;
    }

    @Override
    public File call() throws Exception {
        return getTempFile();
    }

    public File getTempFile() {
        createTempFile();
        writeNumbersToFile();
        return file;
    }

    private void createTempFile() {
        try {
            file = File.createTempFile("numbers-", ".txt");
            file.deleteOnExit();
        } catch (IOException e) {
            System.out.println("Temporary file creation failed");
        }
    }

    private void writeNumbersToFile() {
        try {
            printWriter = new PrintWriter(file);
        } catch (FileNotFoundException e) {
            System.out.println("Temporary file not found");
        }
        while (!isFileSizeMax()) {
            printWriter.write(getRandomNumber().toString() + SEPARATOR);
        }
        printWriter.flush();
        printWriter.close();
    }

    private BigInteger getRandomNumber() {
        Random random = new Random();
        BigInteger number;
        do {
            number = new BigInteger(64, random);
        } while (number.equals(BigInteger.ZERO));
        return number;
    }

    private boolean isFileSizeMax() {
        if (file.length() <= size) {
            return false;
        }
        return true;
    }
}

1 Answer

Code Review user

Answered 2020-01-06 20:09:18

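Did you measure the running time before implementing a multithreaded random number generator and analyzer? I bet that merging the files costs more time than you gain from concurrency (IO is slow). That would be premature optimization and a red flag.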

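The main method should not contain any logic other than parsing the arguments into a format the business logic understands. You should make the number generator, the number analyzer, and the number printer self-contained classes and have the main method pass data between them. Study the single responsibility principle.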
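
As a rough illustration of the only job that would remain in main, here is a minimal sketch of argument parsing. The class name Arguments, the positional layout (file name first, optional size limit in bytes second), and the error messages are assumptions made for this example; the task only says that the file name is required and that the size limit defaults to 64 MB.

```java
// Hypothetical value class: holds the parsed command-line options.
final class Arguments {
    // Default limit from the task description: 64 MB.
    static final long DEFAULT_SIZE_LIMIT = 64L * 1024 * 1024;

    final String fileName;
    final long sizeLimit;

    private Arguments(String fileName, long sizeLimit) {
        this.fileName = fileName;
        this.sizeLimit = sizeLimit;
    }

    // Assumed layout: <fileName> [sizeLimitInBytes].
    static Arguments parse(String[] args) {
        if (args.length < 1 || args.length > 2) {
            throw new IllegalArgumentException("Usage: <fileName> [sizeLimitInBytes]");
        }
        long sizeLimit = DEFAULT_SIZE_LIMIT;
        if (args.length == 2) {
            // Long.parseLong throws NumberFormatException (an IllegalArgumentException) on bad input.
            sizeLimit = Long.parseLong(args[1]);
            if (sizeLimit <= 0) {
                throw new IllegalArgumentException("Size limit must be positive: " + args[1]);
            }
        }
        return new Arguments(args[0], sizeLimit);
    }
}
```

With something like this in place, main could catch IllegalArgumentException, print the message, and otherwise hand fileName and sizeLimit to the generator and analyzer classes.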

I think you were supposed to print two times: the read time and the analysis time.

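You read the numbers into memory and then loop over them three times (so four loops in total). You should be able to do the analysis while reading the numbers from the file (one loop). Again, did you measure the effect of multithreaded analysis versus single-threaded analysis? The task does not specify an upper limit for the file size, so by reading all the data into memory you have created an unnecessary, artificial limit imposed by the JVM memory.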
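
A minimal single-threaded sketch of the one-loop idea: each number is counted and tested as soon as it is read, so the full list is never held in memory. The class name StreamingAnalyzer and its fields are assumptions for this example, and Scanner is used only for brevity (a hand-written tokenizer would likely be faster). The frequency map still grows with the number of distinct values, so memory use is reduced rather than made constant.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

// Hypothetical analyzer: reads whitespace-separated numbers and analyzes them in one pass.
final class StreamingAnalyzer {
    final Map<BigInteger, Integer> frequencies = new HashMap<>();
    long primeCount;
    long armstrongCount;

    void analyze(Reader source) throws IOException {
        try (Scanner scanner = new Scanner(new BufferedReader(source))) {
            while (scanner.hasNext()) {
                BigInteger number = new BigInteger(scanner.next());
                frequencies.merge(number, 1, Integer::sum);
                // Same probabilistic primality test as in the question's code.
                if (number.isProbablePrime(128)) {
                    primeCount++;
                }
                if (isArmstrong(number)) {
                    armstrongCount++;
                }
            }
            // Scanner swallows IOExceptions; surface the last one, if any.
            IOException failure = scanner.ioException();
            if (failure != null) {
                throw failure;
            }
        }
    }

    // A number is an Armstrong number if the sum of its digits, each raised
    // to the power of the digit count, equals the number itself.
    static boolean isArmstrong(BigInteger number) {
        String digits = number.toString();
        BigInteger sum = BigInteger.ZERO;
        for (int i = 0; i < digits.length(); i++) {
            int digit = digits.charAt(i) - '0';
            sum = sum.add(BigInteger.valueOf(digit).pow(digits.length()));
        }
        return sum.equals(number);
    }
}
```

The reader could come from Files.newBufferedReader(path, StandardCharsets.US_ASCII); whether this beats the multithreaded version is something only a measurement can tell.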

I was expecting some comments explaining why you chose to code it the way you did.

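ListSplitter does a lot of unnecessary copying. It should not extend AbstractList, because a simple utility method would suffice. If you submit copied code, always try to copy good code. :)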
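
One way such a utility method could look, as a sketch: it returns subList views, so no elements are copied. The names ListPartitioner and partition are assumptions for this example. Because the chunks are views, the source list must not be structurally modified while they are in use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical utility: splits a list into consecutive chunks without copying elements.
final class ListPartitioner {
    private ListPartitioner() {
    }

    static <T> List<List<T>> partition(List<T> list, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive: " + chunkSize);
        }
        List<List<T>> chunks = new ArrayList<>();
        int start = 0;
        while (start < list.size()) {
            // The last chunk may be shorter than chunkSize.
            int length = Math.min(chunkSize, list.size() - start);
            chunks.add(list.subList(start, start + length)); // a view, not a copy
            start += length;
        }
        return chunks;
    }
}
```

Iterating over the returned chunks, rather than indexing by core count, also avoids asking for a chunk that does not exist when there are fewer chunks than threads.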

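You create a new Random instance every time you create a random number. That is unnecessary and a complete waste of time. The Random should be an instance variable.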
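
A small sketch of that change, assuming a hypothetical class RandomNumberSource that owns a single Random for its lifetime. Giving each generator task its own instance (or passing ThreadLocalRandom.current() from inside the task) would avoid sharing one Random between threads.

```java
import java.math.BigInteger;
import java.util.Random;

// Hypothetical number source: one Random instance is reused for every number.
final class RandomNumberSource {
    private final Random random;

    RandomNumberSource(Random random) {
        this.random = random;
    }

    // Returns a uniformly distributed integer in the range 1 to 2^64 - 1.
    BigInteger next() {
        BigInteger number;
        do {
            // new BigInteger(64, random) yields a value in 0 to 2^64 - 1; zero is rejected.
            number = new BigInteger(64, random);
        } while (number.signum() == 0);
        return number;
    }
}
```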

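Concatenating the separator to the number before writing is an unnecessary waste of time, because it creates a new String object that is discarded immediately. Write the number first and then the separator (as a character, not a String).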

The if statement in the file size check that returns true or false only adds unnecessary cognitive load. Just write:

return file.length() > size;

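Checking the number of bytes written by calling file.length() is quite expensive, because it goes all the way to the file system to get the result. It also does not take into account any buffering that may happen during writing, which can lead to errors. Simply keeping an integer counter of the bytes written would be more efficient.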

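You are using PrintWriter to write the numbers, but you are not using any of its special features. It gives the impression that you are not familiar with the IO classes. You should use BufferedWriter instead to get the speed benefit of buffered writing (you will then need to count the written bytes manually).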

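Don't forget to specify the character encoding of the file! Even though you are only writing digits and spaces, and the resulting file will most likely always be ASCII-compatible, specifying the encoding explicitly tells the reader that you are not one of those people who cause character encoding problems in production by always relying on the system default encoding.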
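
The writing-related points above (separator written as a character, a manual byte counter instead of file.length(), BufferedWriter, explicit encoding) could be combined roughly as follows. This is a sketch under assumptions: the class and method names are invented for the example, and unlike the original loop it stops before the limit is exceeded instead of just after.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

// Hypothetical writer: fills a file with random numbers up to a byte limit.
final class NumberFileWriter {
    private NumberFileWriter() {
    }

    // Writes space-separated random numbers and returns the number of bytes written.
    static long write(Path target, long maxBytes, Random random) throws IOException {
        long written = 0;
        // Explicit encoding; in US-ASCII every digit and space is exactly one byte.
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.US_ASCII)) {
            while (true) {
                String digits = nextNumber(random).toString();
                long needed = digits.length() + 1L; // digits plus one separator
                if (written + needed > maxBytes) {
                    break; // stay within the limit
                }
                writer.write(digits);
                writer.write(' '); // separator as a char, no string concatenation
                written += needed;
            }
        }
        return written;
    }

    // Random integer in the range 1 to 2^64 - 1, using the shared Random.
    private static BigInteger nextNumber(Random random) {
        BigInteger number;
        do {
            number = new BigInteger(64, random);
        } while (number.signum() == 0);
        return number;
    }
}
```

Because the counter is updated in memory, the loop never touches the file system to decide when to stop, and buffering no longer affects the size check.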

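The following, from BarChartPrinter.setLongestEntrySize, is a particularly bad piece of copy-pasting, because it is hard to read and very inefficient. You should first store the value in a variable and then use that variable in both the if statement and the assignment.
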
if(listOfEntries.get(i).getKey().toString().length() > longest) {
    longest = listOfEntries.get(i).getKey().toString().length();
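
A sketch of the suggested fix: the key length is computed once per entry and stored in a local variable. The class name KeyWidths, the method name, and the limit parameter (standing in for chartsCount) are assumptions for this example.

```java
import java.util.List;
import java.util.Map.Entry;

// Hypothetical helper: finds the longest key among the first `limit` entries.
final class KeyWidths {
    private KeyWidths() {
    }

    static <T> int longestKeyLength(List<Entry<T, Integer>> entries, int limit) {
        int longest = 0;
        int count = Math.min(limit, entries.size());
        for (int i = 0; i < count; i++) {
            // Compute the length once and reuse it in the comparison and the assignment.
            int length = entries.get(i).getKey().toString().length();
            if (length > longest) {
                longest = length;
            }
        }
        return longest;
    }
}
```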

Score: 3
Original page content provided by Code Review.
Original link:

https://codereview.stackexchange.com/questions/235086
