
Counting the frequency of words in a sorted list

Stack Overflow user
Asked on 2014-10-15 18:07:03
Answers: 2, Views: 98, Followers: 0, Votes: 0
public static void frequencyFinder() throws FileNotFoundException, IOException {
    String foldername = ".../Meta_Oct/separate";
    File folder = new File(foldername);
    File[] listOfFiles = folder.listFiles();


    String line;
    for (int x = 0; x < listOfFiles.length; x++) {
        BufferedReader in = new BufferedReader(new FileReader(listOfFiles[x]));
        String filename = listOfFiles[x].getName();
        String language = filename.split("@")[0];
        String target = filename.split("@")[1];
        String source = filename.split("@")[2];
        int frequency = 0;

        while ((line = in.readLine()) != null) {
            lemma_match = line.split(";")[3];
            frequency = 1;
            while((in.readLine().split(";")[3]).equals(lemma_match)){                 
                frequency++;
                line = in.readLine();                    
            }

            System.out.println(target + ":" + source +":"+lemma_match + ":" + frequency);
            frequency = 0;                
            lemma_match = null;
        }


    }
}

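I need to count the frequency of the words in the last column. The problem is that the while loop skips some lines and ends in a NullPointerException, so not all frequencies are computed before that point. The sample file and the stack trace are below.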

EN;GOVERNMENT;DISEASE;bristle at 
EN;GOVERNMENT;DISEASE;contract 
EN;GOVERNMENT;DISEASE;detect in 
EN;GOVERNMENT;DISEASE;detect in 
EN;GOVERNMENT;DISEASE;immunize against 
EN;GOVERNMENT;DISEASE;inherit from 
EN;GOVERNMENT;DISEASE;spread 
EN;GOVERNMENT;DISEASE;spread 
EN;GOVERNMENT;DISEASE;spread 
EN;GOVERNMENT;DISEASE;stave off 
EN;GOVERNMENT;DISEASE;stave off 
EN;GOVERNMENT;DISEASE;transmit 
EN;GOVERNMENT;DISEASE;treat 
EN;GOVERNMENT;DISEASE;treat 
EN;GOVERNMENT;DISEASE;treat as 
EN;GOVERNMENT;DISEASE;treat by 
EN;GOVERNMENT;DISEASE;ward off 

Stack trace (interleaved with the program's output):

GOVERNMENT:DISEASE:bristle at :1
GOVERNMENT:DISEASE:detect in :2
GOVERNMENT:DISEASE:spread :2
GOVERNMENT:DISEASE:stave off :1
Exception in thread "main" java.lang.NullPointerException
GOVERNMENT:DISEASE:treat :2
    at javaapplication6.FrequencyFinder.frequencyFinder(FrequencyFinder.java:53)
    at javaapplication6.FrequencyFinder.main(FrequencyFinder.java:26)
Java Result: 1

2 answers

Stack Overflow user

Accepted answer

Posted on 2014-10-15 18:10:56

The problem is in this code:

    while ((line = in.readLine()) != null) { // here you read a line
        lemma_match = line.split(";")[3];
        frequency = 1;
        while((in.readLine().split(";")[3]).equals(lemma_match)){ // here you read
                                                                  // another line
            frequency++;
            line = in.readLine(); // here you read another line                   
        }

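Since you read a new line in 3 places in this code, you don't increment the frequency for all of these reads. For example, in each iteration of the inner loop you read two lines but increment frequency only once. Even if you fix the inner loop, you would still miss lines: when the inner while loop ends, the line that broke the run is discarded because the outer while loop reads yet another line.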

Also, the inner while loop will throw a NullPointerException, because you don't check that in.readLine() != null before calling split on the line it returns.
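Both problems described above (lines consumed without being counted, and split called on a null line) can also be avoided without giving up the nested-loop shape of the question: keep the line that ended a run in `line` instead of reading past it, and test for `null` before splitting. This is a minimal sketch, assuming every line has at least four `;`-separated fields and the input is sorted by the fourth one. `LookaheadCounter` and `countRuns` are hypothetical names, and the sketch returns `lemma:count` strings rather than printing the `target:source` prefix.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class LookaheadCounter {
    // Hypothetical helper: counts runs of equal values in the 4th field of
    // semicolon-separated lines. Assumes the input is sorted by that field.
    public static List<String> countRuns(BufferedReader in) throws IOException {
        List<String> results = new ArrayList<>();
        String line = in.readLine();                 // read the first line once
        while (line != null) {
            String lemma = line.split(";")[3];
            int frequency = 0;
            // Consume every consecutive line with the same lemma. The null check
            // comes first, so split is never called on null. The line that breaks
            // the run stays in `line` and starts the next outer iteration.
            while (line != null && line.split(";")[3].equals(lemma)) {
                frequency++;
                line = in.readLine();
            }
            results.add(lemma + ":" + frequency);
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        String sample = "EN;GOVERNMENT;DISEASE;detect in\n"
                + "EN;GOVERNMENT;DISEASE;detect in\n"
                + "EN;GOVERNMENT;DISEASE;spread\n"
                + "EN;GOVERNMENT;DISEASE;spread\n"
                + "EN;GOVERNMENT;DISEASE;spread\n"
                + "EN;GOVERNMENT;DISEASE;ward off\n";
        try (BufferedReader in = new BufferedReader(new StringReader(sample))) {
            for (String run : countRuns(in)) {
                System.out.println(run);
            }
        }
    }
}
```

On the sample in `main` this prints `detect in:2`, `spread:3` and `ward off:1`. Every line is read exactly once, and only the inner loop advances the reader.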

Now let's see how to do this with a single loop:

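    String lemma_match = "";
    while ((line = in.readLine()) != null) {
        String new_lemma_match = line.split(";")[3];
        if (!lemma_match.equals(new_lemma_match)) { // start count for a new lemma
            if (!lemma_match.equals("")) { // print the count of the previous lemma
                System.out.println(target + ":" + source + ":" + lemma_match + ":" + frequency);
            }
            lemma_match = new_lemma_match;
            frequency = 1; // initialize frequency for new lemma
        } else {
            frequency++; // increase frequency for current lemma
        }
    }
    // a lemma is only printed when the next one starts, so print the last one here
    if (!lemma_match.equals("")) {
        System.out.println(target + ":" + source + ":" + lemma_match + ":" + frequency);
    }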
Votes: 1

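Stack Overflow user

Posted on 2014-10-15 18:41:44

Keep adding entries to a HashMap: for each unique entry (the key), increment its value (the count). At the end you will have your result.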
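A minimal sketch of that HashMap approach, assuming Java 8 or later (for `Map.merge`) and the same four-field line format. `HashMapCounter` and `countLemmas` are hypothetical names. A `LinkedHashMap` is used instead of a plain `HashMap` only so that the output follows first-seen order. Because every line is counted independently, this version does not need the input to be sorted.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

public class HashMapCounter {
    // Hypothetical helper: counts occurrences of the 4th semicolon-separated
    // field. Works on unsorted input; LinkedHashMap keeps first-seen order.
    public static Map<String, Integer> countLemmas(BufferedReader in) throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        String line;
        while ((line = in.readLine()) != null) {      // exactly one read per iteration
            String[] fields = line.split(";");
            if (fields.length < 4) {
                continue;                             // skip blank or malformed lines (assumption)
            }
            counts.merge(fields[3], 1, Integer::sum); // put 1, or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        String sample = "EN;GOVERNMENT;DISEASE;spread\n"
                + "EN;GOVERNMENT;DISEASE;treat\n"
                + "EN;GOVERNMENT;DISEASE;spread\n";
        try (BufferedReader in = new BufferedReader(new StringReader(sample))) {
            for (Map.Entry<String, Integer> e : countLemmas(in).entrySet()) {
                System.out.println(e.getKey() + ":" + e.getValue());
            }
        }
    }
}
```

For the three-line sample in `main`, the output is `spread:2` followed by `treat:1`, even though the two `spread` lines are not adjacent. The trade-off compared with the single-loop answer is memory: the map holds one entry per distinct lemma, while the run-based loop keeps only the current one.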

Votes: 0
Original page content provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/26389195
