A set of functions to support a spell checker

Asked by a Code Review user on 2016-09-18 14:37:35
1 answer, 70 views, 0 followers, 3 votes

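This is a program that loads a dictionary into a hash table and spell-checks a text file supplied as a command-line argument. The dictionary is formatted as follows (in alphabetical order):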

python
bar
foo
code
review

When the program finishes, I get output like the following:

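This is part of a problem set in which I was given a foundation to build on (speller.c). I provide it below, but I do not need feedback on it. Although my goal is to write code that is as efficient and fast as possible, I would like feedback to focus mainly on my code style, such as readability and comments. Again, if you decide to read speller.c, please do not focus your criticism on it, because I did not write it.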

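Some of the requirements are:

  • You may assume that any dictionary passed to your program will be structured exactly like ours: lexicographically sorted from top to bottom, one word per line, each line ending with \n. You may also assume that the dictionary will contain at least one word, that no word will be longer than LENGTH (a constant defined in dictionary.h) characters, that no word will appear more than once, and that each word will contain only lowercase alphabetical characters and possibly apostrophes.
  • You may assume that check will only be passed strings with alphabetical characters and/or apostrophes.
  • Your implementation of check must be case-insensitive.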
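
Because the dictionary holds only lowercase letters and apostrophes, the case-insensitivity requirement can be met by lowercasing each incoming word before it is looked up. A minimal sketch of that step follows, assuming a hypothetical helper named `to_lower_copy` (it is not part of the problem set) and a caller-supplied destination buffer:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the problem set): copies `src` into `dst`
 * in lowercase. Returns false when `src` plus its terminator does not fit
 * in `dst_size` bytes; the contents of `dst` are unspecified in that case.
 */
bool to_lower_copy(char *dst, size_t dst_size, const char *src)
{
    size_t i = 0;

    for (; src[i] != '\0'; i++) {
        if (i + 1 >= dst_size) {
            return false; /* no room for this character and the '\0' */
        }
        /* The cast avoids undefined behavior for negative char values. */
        dst[i] = (char) tolower((unsigned char) src[i]);
    }

    if (dst_size == 0) {
        return false; /* not even room for the terminator */
    }
    dst[i] = '\0'; /* terminate once, after the loop */
    return true;
}
```

With a buffer of LENGTH + 1 bytes, a call such as `to_lower_copy(buffer, sizeof buffer, word)` would leave a lowercase copy ready for hashing and comparison.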

speller.c

#include <ctype.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>

#include "dictionary.h"
#undef calculate
#undef getrusage

// default dictionary
#define DICTIONARY "dictionaries/large"

// prototype
double calculate(const struct rusage* b, const struct rusage* a);

int main(int argc, char* argv[])
{
    // check for correct number of args
    if (argc != 2 && argc != 3)
    {
        printf("Usage: speller [dictionary] text\n");
        return 1;
    }

    // structs for timing data
    struct rusage before, after;

    // benchmarks
    double time_load = 0.0, time_check = 0.0, time_size = 0.0, time_unload = 0.0;

    // determine dictionary to use
    char* dictionary = (argc == 3) ? argv[1] : DICTIONARY;

    // load dictionary
    getrusage(RUSAGE_SELF, &before);
    bool loaded = load(dictionary); // THIS IS LOAD
    getrusage(RUSAGE_SELF, &after);

    // abort if dictionary not loaded
    if (!loaded)
    {
        return 1;
    }

    // calculate time to load dictionary
    time_load = calculate(&before, &after);

    // try to open text
    char* text = (argc == 3) ? argv[2] : argv[1];
    FILE* fp = fopen(text, "r");
    if (fp == NULL)
    {
        printf("Could not open %s.\n", text);
        unload();
        return 1;
    }

    // prepare to report misspellings
    printf("\nMISSPELLED WORDS\n\n");

    // prepare to spell-check
    int index = 0, misspellings = 0, words = 0;
    char word[LENGTH+1];

    // spell-check each word in text
    for (int c = fgetc(fp); c != EOF; c = fgetc(fp))
    {
        // allow only alphabetical characters and apostrophes
        if (isalpha(c) || (c == '\'' && index > 0))
        {
            // append character to word
            word[index] = c;
            index++;

            // ignore alphabetical strings too long to be words
            if (index > LENGTH)
            {
                // consume remainder of alphabetical string
                while ((c = fgetc(fp)) != EOF && isalpha(c));

                // prepare for new word
                index = 0;
            }
        }

        // ignore words with numbers (like MS Word can)
        else if (isdigit(c))
        {
            // consume remainder of alphanumeric string
            while ((c = fgetc(fp)) != EOF && isalnum(c));

            // prepare for new word
            index = 0;
        }

        // we must have found a whole word
        else if (index > 0)
        {
            // terminate current word
            word[index] = '\0';

            // update counter
            words++;

            // check word's spelling
            getrusage(RUSAGE_SELF, &before);
            bool misspelled = !check(word);
            getrusage(RUSAGE_SELF, &after);

            // update benchmark
            time_check += calculate(&before, &after);

            // print word if misspelled
            if (misspelled)
            {
                printf("%s\n", word);
                misspellings++;
            }

            // prepare for next word
            index = 0;
        }
    }

    // check whether there was an error
    if (ferror(fp))
    {
        fclose(fp);
        printf("Error reading %s.\n", text);
        unload();
        return 1;
    }

    // close text
    fclose(fp);

    // determine dictionary's size
    getrusage(RUSAGE_SELF, &before);
    unsigned int n = size();
    getrusage(RUSAGE_SELF, &after);

    // calculate time to determine dictionary's size
    time_size = calculate(&before, &after);

    // unload dictionary
    getrusage(RUSAGE_SELF, &before);
    bool unloaded = unload();
    getrusage(RUSAGE_SELF, &after);

    // abort if dictionary not unloaded
    if (!unloaded)
    {
        printf("Could not unload %s.\n", dictionary);
        return 1;
    }

    // calculate time to unload dictionary
    time_unload = calculate(&before, &after);

    // report benchmarks
    printf("\nWORDS MISSPELLED:     %d\n", misspellings);
    printf("WORDS IN DICTIONARY:  %d\n", n);
    printf("WORDS IN TEXT:        %d\n", words);
    printf("TIME IN load:         %.2f\n", time_load);
    printf("TIME IN check:        %.2f\n", time_check);
    printf("TIME IN size:         %.2f\n", time_size);
    printf("TIME IN unload:       %.2f\n", time_unload);
    printf("TIME IN TOTAL:        %.2f\n\n",
     time_load + time_check + time_size + time_unload);

    // that's all folks
    return 0;
}

/**
 * Returns number of seconds between b and a.
 */
double calculate(const struct rusage* b, const struct rusage* a)
{
    if (b == NULL || a == NULL)
    {
        return 0.0;
    }
    else
    {
        return ((((a->ru_utime.tv_sec * 1000000 + a->ru_utime.tv_usec) -
                 (b->ru_utime.tv_sec * 1000000 + b->ru_utime.tv_usec)) +
                ((a->ru_stime.tv_sec * 1000000 + a->ru_stime.tv_usec) -
                 (b->ru_stime.tv_sec * 1000000 + b->ru_stime.tv_usec)))
                / 1000000.0);
    }
}

dictionary.c

/*
* dictionary.c
* Implements speller.c's functionality
*/

#include <string.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#include "dictionary.h"


// Global variables and arrays for simple access
char buffer[LENGTH+1] = {'\0'}; // Declares the buffer to store one line or one word in total
unsigned int wordcount = 0;
int hash_index;

node* hashtable[HASHTABLE_SIZE] = {NULL}; // Stores pointer to node datatype
node* current_head_node;  // Will always be the current node, or first in a linked list
FILE* dictionary_ref;


/**
 * Returns true if word is in dictionary else false.
 */
bool check(const char* word)
{
    // Load word into buffer, and convert it to lower
    for (int i = 0, wordlen = strlen(word); i < wordlen; i++) {
        buffer[i] = tolower(word[i]);
        buffer[i+1] = '\0'; // Make sure the word gets terminated
    }

    // Figure out where word in buffer belongs, and make head node to point there
    hash_index = hash(buffer);
    current_head_node = hashtable[hash_index];

    // Iterate through a linked list
    while (current_head_node != NULL) {

        // Check if word in buffer is also in dictionary
        if (strcmp(buffer, current_head_node->word) == 0) {
            swap(&current_head_node, &hashtable[hash_index]);
            return true;
        }

        // Go to next node
        current_head_node = current_head_node->next;
    }

    return false;
}

 /**
 * Loads dictionary into memory. Returns true if successful else false.
 */
bool load(const char* dictionary)
{
    // Open dictionary file, and error-check
    dictionary_ref = fopen(dictionary, "r");
    if (dictionary_ref == NULL)
        return false;


    while (fscanf(dictionary_ref, "%s", buffer) > 0) {

        // Figure which index to store word in buffer
        hash_index = hash(buffer);

        // Create new node
        current_head_node = malloc(sizeof(node));
        if (current_head_node == NULL) {
            printf("malloc returned NULL");
            return false;
        }

        strcpy(current_head_node->word, buffer);

        // Figure out whether or not the i'th element of array has been used,
        // and act accordingly
        if (hashtable[hash_index] == NULL)
            current_head_node->next = NULL;

        else
            current_head_node->next = hashtable[hash_index];

        // Point hashtable to the most recent addition
        hashtable[hash_index] = current_head_node;

        wordcount++;
    }

    return true;
}


/**
 * Unloads dictionary from memory. Returns true if successful else false.
 */
bool unload(void)
{
    // Close dictionary reference
    fclose(dictionary_ref);

    node* previous_head = NULL;

    for (int i = 0; i < HASHTABLE_SIZE; i++) { // Iterate through hashtable
        current_head_node = hashtable[i];

        while (current_head_node != NULL) { // Iterate the list

            // Be sure we don't free hashtable[0 - 1]
            if (previous_head != NULL)
                free(previous_head);

            // Advance previous_head to current_head,
            // and then iterate current_head to the next
            previous_head = current_head_node;
            current_head_node = current_head_node->next;

        }
        free(previous_head); // Free the last element in linked list
        previous_head = NULL; // Reset so that we can repeat the above
    }

    return true;
}
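
The post omits dictionary.h as well as the definitions of hash() and swap(), so the code above cannot be compiled as shown. Purely as an illustration, here is one shape the missing declarations could take, inferred only from how the names are used above. The values of `LENGTH` and `HASHTABLE_SIZE`, the `node` layout, and the hash function (a djb2-style multiply-by-33) are all assumptions, not the asker's actual code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed values; the real ones live in the omitted dictionary.h. */
#define LENGTH 45
#define HASHTABLE_SIZE 65536

/* Layout inferred from current_head_node->word and ->next. */
typedef struct node {
    char word[LENGTH + 1];
    struct node *next;
} node;

/* Prototypes matching the calls made from speller.c. */
bool check(const char *word);
bool load(const char *dictionary);
unsigned int size(void);
bool unload(void);

/*
 * Hypothetical stand-in for the hash() that the post calls but does not
 * show. Expects an already lowercased word.
 */
unsigned int hash(const char *word)
{
    unsigned long h = 5381;

    for (size_t i = 0; word[i] != '\0'; i++) {
        /* Unsigned arithmetic wraps on overflow, which is well defined. */
        h = h * 33 + (unsigned char) word[i];
    }
    return (unsigned int) (h % HASHTABLE_SIZE);
}
```

In a real project the macro, type, and prototypes would sit in the header, while the body of hash() would go in dictionary.c.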

1 Answer

Code Review user

Accepted answer

Posted on 2016-09-18 22:19:24

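  1. Recommend eliminating all global variables except the few that are needed, and making those static. Apart from hashtable[], the rest look like candidates for local variables.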
  2. "dictionary.h" is missing. It certainly should have been included in the post.
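  3. dictionary.c should make sure that dictionary.h is self-contained, meaning that it includes all the <*.h> files it needs. This is easy to test by having dictionary.c include dictionary.h first.
      #include "dictionary.h"
      #include <string.h>
      ...
      // #include "dictionary.h"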
  4. LENGTH is not defined. Hopefully such a generically named define is not in "dictionary.h". If it is, recommend a name like DICTIONARY_BUFFER_LENGTH. The same goes for HASHTABLE_SIZE.
  5. node is not declared. As above, node is a very generic type name. Suggest dictionary_node or similar.
  6. Use type size_t for array indexes and sizes. Type int may be insufficient.
      // for (int i = 0, wordlen = strlen(word); i < wordlen; i++) {
      for (size_t i = 0, wordlen = strlen(word); i < wordlen; i++) {
  7. The null character termination of buffer[] only needs to happen once, after the loop.
      // for (int i = 0, wordlen = strlen(word); i < wordlen; i++) {
      //     buffer[i] = tolower(word[i]);
      //     buffer[i+1] = '\0'; // Make sure the word gets terminated
      // }
      size_t wordlen = strlen(word);
      for (size_t i = 0; i < wordlen; i++) {
          buffer[i] = tolower(word[i]);
      }
      buffer[wordlen] = '\0';
  8. The members word and next used in current_head_node->word and current_head_node->next are not declared anywhere in the post.
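  9. Because the declaration node* current_head_node; is far from the malloc() call, a reviewer needs extra time to determine whether the type is correct. For code that is easier to write correctly, review, and maintain:
      // current_head_node = malloc(sizeof(node));
      current_head_node = malloc(sizeof *current_head_node);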
  10. The code lacks overflow protection.
      // fscanf(dictionary_ref, "%s", buffer)
      fscanf(dictionary_ref, "%(some_value)s", buffer)
  11. The test is not needed. free(NULL) is well defined and is effectively a no-op.
      // if (previous_head != NULL) free(previous_head);
      free(previous_head);
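  12. Goal: "Your implementation of check must be case-insensitive." If input is limited to A-Za-z, converting to lower or to upper case makes little difference. With extended (8-bit) character sets, robust code would use a round trip, because extended alphabets do not always have a one-to-one mapping between cases.
      ch = toupper(tolower((unsigned char) ch));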
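
Several of the points above (file-scope state made static, size_t for indexes, sizeof applied to the dereferenced pointer, and a bounded fscanf width) can be combined in one sketch. This is not the asker's code: `LENGTH`, `HASHTABLE_SIZE`, the `node` layout, and the placeholder `hash` are assumptions standing in for the missing dictionary.h, and the `STR` macros are a hypothetical way to fill in the `%(some_value)s` placeholder. The sketch also frees each node after saving its `next` pointer, a different loop shape from the posted `previous_head` bookkeeping.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LENGTH 45           /* assumed; normally from dictionary.h */
#define HASHTABLE_SIZE 1024 /* assumed */

/* Two-step stringification turns LENGTH into the literal "45". */
#define STR_(x) #x
#define STR(x) STR_(x)

typedef struct node {       /* assumed layout */
    char word[LENGTH + 1];
    struct node *next;
} node;

/* Only state that must outlive a call stays at file scope. */
static node *hashtable[HASHTABLE_SIZE];
static unsigned int wordcount;

static size_t hash(const char *word) /* placeholder hash, an assumption */
{
    size_t h = 5381;
    while (*word != '\0') {
        h = h * 33 + (unsigned char) *word++;
    }
    return h % HASHTABLE_SIZE;
}

unsigned int size(void)
{
    return wordcount;
}

bool unload(void)
{
    for (size_t i = 0; i < HASHTABLE_SIZE; i++) {
        node *current = hashtable[i];
        while (current != NULL) {
            node *next = current->next; /* save before freeing */
            free(current);
            current = next;
        }
        hashtable[i] = NULL;
    }
    wordcount = 0;
    return true;
}

bool load(const char *dictionary)
{
    FILE *fp = fopen(dictionary, "r");
    if (fp == NULL) {
        return false;
    }

    char buffer[LENGTH + 1];
    /* "%45s" stores at most LENGTH characters plus the terminator. */
    while (fscanf(fp, "%" STR(LENGTH) "s", buffer) == 1) {
        node *n = malloc(sizeof *n); /* size follows the pointer's type */
        if (n == NULL) {
            fclose(fp);
            unload(); /* release whatever was loaded so far */
            return false;
        }
        strcpy(n->word, buffer);

        size_t index = hash(buffer);
        n->next = hashtable[index]; /* NULL when the bucket is empty */
        hashtable[index] = n;
        wordcount++;
    }

    fclose(fp);
    return true;
}
```

Closing the file inside load() means unload() no longer needs a file handle, which removes one more global. The stringified width only works while `LENGTH` is defined as a plain integer literal.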
Votes: 2
Original page content provided by Code Review.
Original link:

https://codereview.stackexchange.com/questions/141711