Question: A set of functions supporting a spell checker

Code Review user

Asked 2016-09-18 14:37:35

1 answer · 70 views · 0 followers · 3 votes

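This is a program that loads a dictionary into a hash table and spell-checks a text file supplied as a command-line argument. The dictionary has the following format (in alphabetical order):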

python
bar
foo
code
review

When the program finishes, I get output like the following:

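This is part of a problem set in which I was given a foundation to build on (speller.c). I provide it below, but it needs no feedback. Although my goal is to write the fastest, most efficient code I can, I would like feedback to focus mainly on my code style, such as readability and comments. Again, if you decide to read speller.c, please don't direct your critique at it, because I did not write it.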

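Some of the requirements are:

You may assume that any dictionary passed to your program will be structured exactly like ours: lexicographically sorted from top to bottom, one word per line, each line ending with \n. You may also assume that the dictionary will contain at least one word, that no word will be longer than LENGTH (a constant defined in dictionary.h) characters, that no word will appear more than once, and that each word will contain only lowercase alphabetical characters and possibly apostrophes.
You may assume that check will only be passed strings with alphabetical characters and/or apostrophes.
Your implementation of check must be case-insensitive.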

speller.c

#include <ctype.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>

#include "dictionary.h"
#undef calculate
#undef getrusage

// default dictionary
#define DICTIONARY "dictionaries/large"

// prototype
double calculate(const struct rusage* b, const struct rusage* a);

int main(int argc, char* argv[])
{
    // check for correct number of args
    if (argc != 2 && argc != 3)
    {
        printf("Usage: speller [dictionary] text\n");
        return 1;
    }

    // structs for timing data
    struct rusage before, after;

    // benchmarks
    double time_load = 0.0, time_check = 0.0, time_size = 0.0, time_unload = 0.0;

    // determine dictionary to use
    char* dictionary = (argc == 3) ? argv[1] : DICTIONARY;

    // load dictionary
    getrusage(RUSAGE_SELF, &before);
    bool loaded = load(dictionary); // THIS IS LOAD
    getrusage(RUSAGE_SELF, &after);

    // abort if dictionary not loaded
    if (!loaded)
    {
        return 1;
    }

    // calculate time to load dictionary
    time_load = calculate(&before, &after);

    // try to open text
    char* text = (argc == 3) ? argv[2] : argv[1];
    FILE* fp = fopen(text, "r");
    if (fp == NULL)
    {
        printf("Could not open %s.\n", text);
        unload();
        return 1;
    }

    // prepare to report misspellings
    printf("\nMISSPELLED WORDS\n\n");

    // prepare to spell-check
    int index = 0, misspellings = 0, words = 0;
    char word[LENGTH+1];

    // spell-check each word in text
    for (int c = fgetc(fp); c != EOF; c = fgetc(fp))
    {
        // allow only alphabetical characters and apostrophes
        if (isalpha(c) || (c == '\'' && index > 0))
        {
            // append character to word
            word[index] = c;
            index++;

            // ignore alphabetical strings too long to be words
            if (index > LENGTH)
            {
                // consume remainder of alphabetical string
                while ((c = fgetc(fp)) != EOF && isalpha(c));

                // prepare for new word
                index = 0;
            }
        }

        // ignore words with numbers (like MS Word can)
        else if (isdigit(c))
        {
            // consume remainder of alphanumeric string
            while ((c = fgetc(fp)) != EOF && isalnum(c));

            // prepare for new word
            index = 0;
        }

        // we must have found a whole word
        else if (index > 0)
        {
            // terminate current word
            word[index] = '\0';

            // update counter
            words++;

            // check word's spelling
            getrusage(RUSAGE_SELF, &before);
            bool misspelled = !check(word);
            getrusage(RUSAGE_SELF, &after);

            // update benchmark
            time_check += calculate(&before, &after);

            // print word if misspelled
            if (misspelled)
            {
                printf("%s\n", word);
                misspellings++;
            }

            // prepare for next word
            index = 0;
        }
    }

    // check whether there was an error
    if (ferror(fp))
    {
        fclose(fp);
        printf("Error reading %s.\n", text);
        unload();
        return 1;
    }

    // close text
    fclose(fp);

    // determine dictionary's size
    getrusage(RUSAGE_SELF, &before);
    unsigned int n = size();
    getrusage(RUSAGE_SELF, &after);

    // calculate time to determine dictionary's size
    time_size = calculate(&before, &after);

    // unload dictionary
    getrusage(RUSAGE_SELF, &before);
    bool unloaded = unload();
    getrusage(RUSAGE_SELF, &after);

    // abort if dictionary not unloaded
    if (!unloaded)
    {
        printf("Could not unload %s.\n", dictionary);
        return 1;
    }

    // calculate time to unload dictionary
    time_unload = calculate(&before, &after);

    // report benchmarks
    printf("\nWORDS MISSPELLED:     %d\n", misspellings);
    printf("WORDS IN DICTIONARY:  %d\n", n);
    printf("WORDS IN TEXT:        %d\n", words);
    printf("TIME IN load:         %.2f\n", time_load);
    printf("TIME IN check:        %.2f\n", time_check);
    printf("TIME IN size:         %.2f\n", time_size);
    printf("TIME IN unload:       %.2f\n", time_unload);
    printf("TIME IN TOTAL:        %.2f\n\n",
     time_load + time_check + time_size + time_unload);

    // that's all folks
    return 0;
}

/**
 * Returns number of seconds between b and a.
 */
double calculate(const struct rusage* b, const struct rusage* a)
{
    if (b == NULL || a == NULL)
    {
        return 0.0;
    }
    else
    {
        return ((((a->ru_utime.tv_sec * 1000000 + a->ru_utime.tv_usec) -
                 (b->ru_utime.tv_sec * 1000000 + b->ru_utime.tv_usec)) +
                ((a->ru_stime.tv_sec * 1000000 + a->ru_stime.tv_usec) -
                 (b->ru_stime.tv_sec * 1000000 + b->ru_stime.tv_usec)))
                / 1000000.0);
    }
}

dictionary.c

/*
* dictionary.c
* Implements speller.c's functionality
*/

#include <string.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#include "dictionary.h"


// Global variables and arrays for simple access
char buffer[LENGTH+1] = {'\0'}; // Declares the buffer to store one line or one word in total
unsigned int wordcount = 0;
int hash_index;

node* hashtable[HASHTABLE_SIZE] = {NULL}; // Stores pointer to node datatype
node* current_head_node;  // Will always be the current node, or first in a linked list
FILE* dictionary_ref;


/**
 * Returns true if word is in dictionary else false.
 */
bool check(const char* word)
{
    // Load word into buffer, and convert it to lower
    for (int i = 0, wordlen = strlen(word); i < wordlen; i++) {
        buffer[i] = tolower(word[i]);
        buffer[i+1] = '\0'; // Make sure the word gets terminated
    }

    // Figure out where word in buffer belongs, and make head node to point there
    hash_index = hash(buffer);
    current_head_node = hashtable[hash_index];

    // Iterate through a linked list
    while (current_head_node != NULL) {

        // Check if word in buffer is also in dictionary
        if (strcmp(buffer, current_head_node->word) == 0) {
            swap(&current_head_node, &hashtable[hash_index]);
            return true;
        }

        // Go to next node
        current_head_node = current_head_node->next;
    }

    return false;
}

 /**
 * Loads dictionary into memory. Returns true if successful else false.
 */
bool load(const char* dictionary)
{
    // Open dictionary file, and error-check
    dictionary_ref = fopen(dictionary, "r");
    if (dictionary_ref == NULL)
        return false;


    while (fscanf(dictionary_ref, "%s", buffer) > 0) {

        // Figure which index to store word in buffer
        hash_index = hash(buffer);

        // Create new node
        current_head_node = malloc(sizeof(node));
        if (current_head_node == NULL) {
            printf("malloc returned NULL");
            return false;
        }

        strcpy(current_head_node->word, buffer);

        // Figure out whether or not the i'th element of array has been used,
        // and act accordingly
        if (hashtable[hash_index] == NULL)
            current_head_node->next = NULL;

        else
            current_head_node->next = hashtable[hash_index];

        // Point hashtable to the most recent addition
        hashtable[hash_index] = current_head_node;

        wordcount++;
    }

    return true;
}


/**
 * Unloads dictionary from memory. Returns true if successful else false.
 */
bool unload(void)
{
    // Close dictionary reference
    fclose(dictionary_ref);

    node* previous_head = NULL;

    for (int i = 0; i < HASHTABLE_SIZE; i++) { // Iterate through hashtable
        current_head_node = hashtable[i];

        while (current_head_node != NULL) { // Iterate the list

            // Be sure we don't free hashtable[0 - 1]
            if (previous_head != NULL)
                free(previous_head);

            // Advance previous_head to current_head,
            // and then iterate current_head to the next
            previous_head = current_head_node;
            current_head_node = current_head_node->next;

        }
        free(previous_head); // Free the last element in linked list
        previous_head = NULL; // Reset so that we can repeat the above
    }

    return true;
}

hash-map

1 Answer

Code Review user

Accepted answer

Posted 2016-09-18 22:19:24

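Recommend eliminating all global variables except the few that are needed, and making those static. Apart from hashtable[], the rest can be local variables.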
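
One way to picture this advice is the sketch below: only the table stays at file scope, marked static, and the buffer, index, and list cursor become locals. The node layout, the values of LENGTH and HASHTABLE_SIZE, the hash_word function, and the dict_insert/dict_check names are all assumptions made for illustration, since neither dictionary.h nor hash() appears in the post.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define LENGTH 45            /* assumed maximum word length */
#define HASHTABLE_SIZE 1024  /* assumed bucket count */

/* Assumed node layout; the real one lives in dictionary.h. */
typedef struct node {
    char word[LENGTH + 1];
    struct node *next;
} node;

/* The only state that must outlive a call; static keeps it file-private. */
static node *hashtable[HASHTABLE_SIZE];

/* Hypothetical djb2-style hash; the post does not show the real hash(). */
static size_t hash_word(const char *word)
{
    size_t h = 5381;
    for (size_t i = 0; word[i] != '\0'; i++) {
        h = h * 33 + (unsigned char) word[i];
    }
    return h % HASHTABLE_SIZE;
}

/* Adds a lowercase word of at most LENGTH characters; false on failure. */
bool dict_insert(const char *word)
{
    if (strlen(word) > LENGTH) {
        return false;
    }
    node *n = malloc(sizeof *n);
    if (n == NULL) {
        return false;
    }
    strcpy(n->word, word);
    size_t index = hash_word(word);
    n->next = hashtable[index];  /* NULL when the bucket is still empty */
    hashtable[index] = n;
    return true;
}

/* Case-insensitive lookup; buffer, length, and cursor are all local. */
bool dict_check(const char *word)
{
    char buffer[LENGTH + 1];
    size_t len = strlen(word);
    if (len > LENGTH) {
        return false;
    }
    for (size_t i = 0; i < len; i++) {
        buffer[i] = (char) tolower((unsigned char) word[i]);
    }
    buffer[len] = '\0';  /* terminate once, after the loop */

    const node *cur = hashtable[hash_word(buffer)];
    for (; cur != NULL; cur = cur->next) {
        if (strcmp(buffer, cur->word) == 0) {
            return true;
        }
    }
    return false;
}
```

With this shape, a call such as dict_check("FOO") after dict_insert("foo") finds the word, and nothing outside the file can touch the table. The sketch leaves out the swap step that the posted check() performs on a hit.
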
The file behind #include "dictionary.h" is missing; it certainly should have been included in the post.
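dictionary.c should verify that dictionary.h is self-sufficient, that is, that it includes every <*.h> file it needs. This is easy to test by having dictionary.c include dictionary.h first.
    #include "dictionary.h"
    #include <...>
    // #include "dictionary.h"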
LENGTH is not defined; hopefully such a generically named define does not come from "dictionary.h". If it does, recommend a name like DICTIONARY_BUFFER_LENGTH. The same goes for HASHTABLE_SIZE.
node is not declared. As above, node is a very generic type name. Suggest dictionary_node or similar.
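
To make the header and naming points above concrete, here is a hypothetical dictionary.h that includes what it needs and uses prefixed names. None of it comes from the post: the numeric values, the DICTIONARY_HASHTABLE_SIZE name, and the struct layout are assumptions, and only the four prototypes are inferred from how speller.c calls the functions.

```c
/* dictionary.h: hypothetical contents, not taken from the post */
#ifndef DICTIONARY_H
#define DICTIONARY_H

#include <stdbool.h>  /* bool is used in the prototypes below */

/* Prefixed names are less likely to collide than LENGTH or node. */
#define DICTIONARY_BUFFER_LENGTH 45      /* assumed value */
#define DICTIONARY_HASHTABLE_SIZE 65536  /* assumed value */

/* Assumed layout: one word plus a link to the next node in the bucket. */
typedef struct dictionary_node {
    char word[DICTIONARY_BUFFER_LENGTH + 1];
    struct dictionary_node *next;
} dictionary_node;

/* Prototypes matching the calls made in speller.c. */
bool check(const char *word);
bool load(const char *dictionary);
unsigned int size(void);
bool unload(void);

#endif /* DICTIONARY_H */
```

If a source file whose first line is #include "dictionary.h" compiles cleanly, the header is not depending on includes that happen to come before it.
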
Use type size_t for array indexes and sizes. Type int may be too narrow.
    // for (int i = 0, wordlen = strlen(word); i < wordlen; i++) {
    for (size_t i = 0, wordlen = strlen(word); i < wordlen; i++) {
The null character termination of buffer[] only needs to happen once, after the loop.
    // for (int i = 0, wordlen = strlen(word); i < wordlen; i++) {
    //     buffer[i] = tolower(word[i]);
    //     buffer[i+1] = '\0'; // Make sure the word gets terminated
    // }
    size_t wordlen = strlen(word);
    for (size_t i = 0; i < wordlen; i++) {
        buffer[i] = tolower(word[i]);
    }
    buffer[wordlen] = '\0';
word and next, as used in current_head_node->word and current_head_node->next, are not declared.
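Because the declaration node* current_head_node; is far from the malloc() call, a reviewer needs extra time to confirm that the type is correct. Sizing the allocation from the pointer itself gives code that is easier to write correctly, review, and maintain.
    // current_head_node = malloc(sizeof(node));
    current_head_node = malloc(sizeof *current_head_node);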
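The code lacks overflow protection: "%s" places no limit on how many characters fscanf() writes into buffer. Give the conversion a maximum field width.
    // fscanf(dictionary_ref, "%s", buffer)
    fscanf(dictionary_ref, "%(some_value)s", buffer)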
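
One common way to supply that width without hard-coding it is to build the format string from the constant itself. The sketch below assumes LENGTH is a plain integer literal (45 is a guess, since dictionary.h is not shown); the STR/STR_ macros and the read_word name are hypothetical helpers.

```c
#include <stdio.h>

#define LENGTH 45  /* assumed; the real value comes from dictionary.h */

/* Two-step stringification so LENGTH expands to 45 before being quoted. */
#define STR_(x) #x
#define STR(x) STR_(x)

/*
 * Reads the next whitespace-delimited word into buffer, which must hold
 * at least LENGTH + 1 chars. Returns 1 on success, otherwise 0 or EOF.
 */
int read_word(FILE *fp, char buffer[LENGTH + 1])
{
    /* The format expands to "%45s": at most 45 chars plus the '\0'. */
    return fscanf(fp, "%" STR(LENGTH) "s", buffer);
}
```

Note that a width limit splits an overlong word across two reads rather than rejecting it. The stated requirements say that case cannot occur in the dictionary, so here the limit serves purely as a guard against a malformed file.
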
The test is not needed. free(NULL) is well defined and is effectively a no-op.
    // if (previous_head != NULL) free(previous_head);
    free(previous_head);
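
Going one step beyond dropping the test, the list walk can save the next pointer before freeing the current node, which removes the need for a previous_head variable altogether. This is an alternative arrangement rather than what the post does; the node layout and the free_list name are assumed for illustration.

```c
#include <stdlib.h>

/* Assumed node layout; the real one lives in dictionary.h. */
typedef struct node {
    char word[46];
    struct node *next;
} node;

/* Frees every node in one bucket's list and returns how many were freed. */
size_t free_list(node *head)
{
    size_t freed = 0;
    while (head != NULL) {
        node *next = head->next;  /* save the link before freeing */
        free(head);
        head = next;
        freed++;
    }
    return freed;
}
```

unload() would then call free_list(hashtable[i]) for each bucket and reset the slot to NULL; an empty bucket simply skips the loop.
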
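Goal: "Your implementation of check must be case-insensitive." If input is limited to A-Za-z, converting to lower or to upper case makes little difference. With extended (8-bit) character sets, robust code would use a round trip, because extended alphabets do not always have a 1-to-1 mapping.
    ch = toupper(tolower((unsigned char) ch));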
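
A minimal sketch of how the round trip could be used for comparison follows. The fold_case and equal_ignore_case names are hypothetical; the point is that both strings pass through the same folding, and that each char is converted to unsigned char before reaching the <ctype.h> functions, which is required when char is signed and the value is negative.

```c
#include <ctype.h>

/* Folds one character with the lower-then-upper round trip. */
int fold_case(int ch)
{
    return toupper(tolower((unsigned char) ch));
}

/* Returns 1 if a and b are equal ignoring case, otherwise 0. */
int equal_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (fold_case(*a) != fold_case(*b)) {
            return 0;
        }
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == *b;
}
```

In a hash table the same folding would also have to be applied before hashing, so that words differing only in case land in the same bucket.
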

2 votes

Original page content provided by Code Review. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://codereview.stackexchange.com/questions/141711
