Validating ISBN-10, with or without dashes, in C++11

Code Review user

Asked 2020-11-01 10:53:33

2 answers · 643 views · 0 followers · 6 votes

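Description

A regex-based approach to validating an ISBN-10 given as a string. Following the formats used by English-language publications (defined by the regular expressions in the code below), the ISBN may consist of digits only (optionally ending in 'X') or be separated by dashes.

The last digit is a check digit (unless it is 'X'), computed as follows. Starting from the leftmost digit, multiply each digit by a weight and sum the results. The check digit must make this sum divisible by 11. The weights start at 1 and increase by 1 for each digit. For example, consider ISBN 0-306-40615-2. The sum is computed as follows: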

sum = 0*1 + 3*2 + 0*3 + 6*4 + 4*5 + 0*6 + 6*7 + 1*8 + 5*9 + 2*10 = 165
165 mod 11 = 0 // the check digit, 2, is valid.
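To make the arithmetic above concrete, here is a minimal sketch of the weighted sum. The helper name isbn10_weighted_sum is made up for illustration, and it assumes the input is already dashless and consists of digits only.

```cpp
#include <string>

// Hypothetical helper (not part of the code under review):
// weighted sum of a dashless, all-digit ISBN-10 string.
// Weights run 1, 2, ..., 10 starting from the leftmost digit.
int isbn10_weighted_sum(const std::string& digits)
{
    int sum = 0;
    int weight = 1;
    for (char c : digits)
    {
        sum += (c - '0') * weight; // convert the character to its numeric value
        ++weight;
    }
    return sum;
}
```

For "0306406152" this yields 165, and 165 is divisible by 11, which matches the hand calculation.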

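I'm a .NET developer dabbling in C++ for fun. In this exercise I tried to hide some things by creating an anonymous namespace inside a namespace. Essentially, I'm trying to achieve, in C++ terms, what the private keyword does in C#. I'm curious whether I've done anything "un-C++" in this code. I envision that this code could be extended with a function that validates ISBN-13 numbers.

Code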

#include <iostream>
#include <string>
#include <vector>
#include <regex>
#include <algorithm>

namespace isbn_validation
{
    namespace // anonymous namespace
    {
        // regex expressions
        // dashless
        const std::regex isbn10_no_dashes(R"((\d{9})[\d|\X])");
        // with dashes
        const std::regex isbn10_dashes1(R"((\d{1})\-(\d{5})\-(\d{3})\-[\d|\X])");
        const std::regex isbn10_dashes2(R"((\d{1})\-(\d{3})\-(\d{5})\-[\d|\X])");
        const std::regex isbn10_dashes3(R"((\d{1})\-(\d{4})\-(\d{4})\-[\d|\X])");
        const std::regex isbn10_dashes4(R"((\d{1})\-(\d{5})\-(\d{3})\-[\d|\X])");
        const std::regex isbn10_dashes5(R"((\d{2})\-(\d{5})\-(\d{2})\-[\d|\X])");
        const std::regex isbn10_dashes6(R"((\d{1})\-(\d{6})\-(\d{2})\-[\d|\X])");
        const std::regex isbn10_dashes7(R"((\d{1})\-(\d{7})\-(\d{1})\-[\d|\X])");

        bool isbn10_check_digit_valid(std::string isbn10)
        {
            auto valid = false;

            // split it
            std::vector<char> split(isbn10.begin(), isbn10.end());

            // if the very last character is an 'X', don't bother with it
            if (split[9] == 'X')
            {
                return true;
            }

            // all digits
            // validate the last digit (check digit)
            int digit_sum = 0;
            int digit_index = 1;
            for (std::vector<char>::iterator it = split.begin(); it != split.end(); ++it)
            {
                digit_sum = digit_sum + ((*it - '0')*digit_index);

                digit_index++;
            }
            valid = !(digit_sum%11);

            return valid;
        }
    }

    bool valid_isbn10(std::string isbn)
    {
        // can take ISBN-10, with or without dashes
        auto valid = false;

        // check if it is a valid ISBN-10 without dashes
        if (std::regex_match(isbn, isbn10_no_dashes))
        {
            // validate the check digit
            valid = isbn10_check_digit_valid(isbn);
        }

        // check if it is a valid ISBN-10 with dashes
        if (std::regex_match(isbn, isbn10_dashes1) || std::regex_match(isbn, isbn10_dashes2) || std::regex_match(isbn, isbn10_dashes3) ||
            std::regex_match(isbn, isbn10_dashes4) || std::regex_match(isbn, isbn10_dashes5) || std::regex_match(isbn, isbn10_dashes6) || std::regex_match(isbn, isbn10_dashes7))
        {
            // remove the dashes
            isbn.erase(std::remove(isbn.begin(), isbn.end(), '-'), isbn.end());

            // validate the check digit
            valid = isbn10_check_digit_valid(isbn);
        }

        return valid;
    }
}

c++

c++11

regex

2 Answers

Code Review user

Accepted answer

Posted 2020-11-01 11:21:36

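When you compare against 8 different patterns and then simply remove the - for the 7 dashed ones, why not remove the - first and then validate against the single remaining pattern?

Another thing to note: at the end of each pattern you have a character set, [\d|\X]. This will actually match any one of the following:

a digit
a literal |
a literal X character (but you do not need the \X).

Instead, this should be: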

\d{9}[\dX]
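A quick way to see the difference is to try both character sets against a string that ends in |. This is only a sketch: the helper full_match is a made-up name, and the first pattern in the usage note is written as [\d|X] (without the unnecessary backslash) to isolate the effect of the stray |.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `text` matches `pattern` in its entirety
// (std::regex uses the ECMAScript grammar by default).
bool full_match(const std::string& text, const std::string& pattern)
{
    return std::regex_match(text, std::regex(pattern));
}
```

With this helper, full_match("030640615|", R"(\d{9}[\d|X])") succeeds because | is just another member of the set, whereas full_match("030640615|", R"(\d{9}[\dX])") fails as intended.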

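A general outline of how the code should work:

Remove all - from the given string
Check that the length of the string is exactly 10
Validate against the rewritten pattern above
Check whether the last character is X
Verify the divisibility of the weighted digit sum.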
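The outline above could be sketched roughly as follows. The function name valid_isbn10_outline is hypothetical, and the outline does not say what to do with a trailing X; this sketch assumes it counts as the value 10, which is the usual ISBN-10 convention.

```cpp
#include <algorithm>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical function following the five steps of the outline.
bool valid_isbn10_outline(std::string isbn)
{
    // 1. Remove every '-' from the input (erase-remove idiom).
    isbn.erase(std::remove(isbn.begin(), isbn.end(), '-'), isbn.end());

    // 2. Exactly 10 characters must remain.
    if (isbn.size() != 10)
        return false;

    // 3. Nine digits followed by a digit or 'X'.
    static const std::regex pattern(R"(\d{9}[\dX])");
    if (!std::regex_match(isbn, pattern))
        return false;

    // 4. and 5. Weighted sum; a trailing 'X' is treated as 10 (assumption).
    int sum = 0;
    for (std::size_t i = 0; i < isbn.size(); ++i)
    {
        const int value = (isbn[i] == 'X') ? 10 : isbn[i] - '0';
        sum += value * static_cast<int>(i + 1);
    }
    return sum % 11 == 0;
}
```

Taking the string by value keeps the caller's copy untouched while still allowing the dashes to be stripped in place.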

Votes: 6

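Code Review user

Posted 2020-11-01 22:51:58

hjpotter92 started with the right idea:

Unify and simplify.

You should go even further.

Also, if the last digit is an X, don't give up; as far as your code is concerned, it is a perfectly good digit.

You are not (or should not be) interested in any of the dashes, so ignore them.

Can you be sure you have covered all the patterns in use?

If you look closely at the Wikipedia article on ISBN-10, you will see that the scheme is perfectly regular. Every digit is multiplied by its position, even the last one (which may be 10, represented by X).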
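As a small illustration of that regularity, the sketch below counts X as 10 and weights it by its position like any other digit. The helper name isbn10_sum_with_x is made up, and the input is assumed to be dashless and already known to contain only digits plus an optional final X.

```cpp
#include <string>

// Hypothetical helper: weighted sum of a dashless ISBN-10,
// where 'X' stands for the value 10.
// Assumes the caller has already validated the characters.
int isbn10_sum_with_x(const std::string& isbn)
{
    int sum = 0;
    int position = 1;
    for (char c : isbn)
    {
        const int value = (c == 'X') ? 10 : c - '0';
        sum += value * position; // every character is weighted by its position
        ++position;
    }
    return sum;
}
```

For "080442957X" the sum is 330, which is divisible by 11, so that ISBN is valid. For "030640615X" the sum is 245, which is not, so accepting every ISBN that ends in X would let invalid ones through.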

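Since you don't need to modify the string, use std::string_view to give any caller maximum convenience and efficiency.

Yes, it only ships with C++17; if your library doesn't provide it, either use a free implementation or at least fall back to std::string const& to avoid copying.

Avoiding needless allocations (even though the small-string optimization might save your bacon for the string argument of your helper function) is a very C++ thing to do.

Not that it would fix the completely superfluous std::vector you create.

Now you can mark the function noexcept and constexpr, assuring the caller that it will always succeed and can even be evaluated at compile time.

By the way, the range-based for loop, available since C++11, is a nice thing; it beats fiddling with iterators by hand.

If you do the same thing multiple times with different data, consider storing the data in an array and looping over it, just as you would in C#.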
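To illustrate only that last point about looping over an array, here is a minimal sketch. The function name matches_any_dashed_pattern is hypothetical, and the three patterns are just a sample of the dashed layouts from the question, written with the corrected [\dX] character set.

```cpp
#include <algorithm>
#include <array>
#include <regex>
#include <string>

// Hypothetical helper: checks the input against a table of dashed layouts
// instead of spelling out one regex_match call per pattern.
bool matches_any_dashed_pattern(const std::string& isbn)
{
    static const std::array<std::regex, 3> patterns = {{
        std::regex(R"(\d-\d{5}-\d{3}-[\dX])"),
        std::regex(R"(\d-\d{3}-\d{5}-[\dX])"),
        std::regex(R"(\d-\d{4}-\d{4}-[\dX])"),
    }};
    return std::any_of(patterns.begin(), patterns.end(),
                       [&isbn](const std::regex& re) {
                           return std::regex_match(isbn, re);
                       });
}
```

Adding another layout then means adding one entry to the table rather than another condition.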

constexpr bool validate_isbn10(std::string_view s) noexcept {
    unsigned num = 0; // unsigned so overflow is harmless
    std::size_t found = 0; // std::size_t so cannot overflow at all
    for (auto c : s)
        if (c >= '0' && c <= '9')
            num += ++found * (c - '0');
        else if (c == 'X' && found == 9)
            num += ++found * 10;
        else if (c != '-')
            return false;
    return found == 10 && num % 11 == 0;
}
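If std::string_view is not available, a C++11 fallback along the lines suggested above might look like the sketch below. The name validate_isbn10_cxx11 is hypothetical; constexpr is dropped because C++11 does not allow loops in constexpr functions, and the parameter becomes std::string const& to avoid a copy.

```cpp
#include <cstddef>
#include <string>

// Hypothetical C++11 variant of the function above.
bool validate_isbn10_cxx11(std::string const& s) noexcept
{
    unsigned num = 0;      // running weighted sum
    std::size_t found = 0; // number of ISBN characters seen so far
    for (char c : s)
    {
        if (c >= '0' && c <= '9')
            num += ++found * (c - '0');
        else if (c == 'X' && found == 9)
            num += ++found * 10; // 'X' is only allowed as the tenth character
        else if (c != '-')
            return false;        // anything other than a dash is rejected
    }
    return found == 10 && num % 11 == 0;
}
```

It accepts "0-306-40615-2" and "080442957X", and rejects inputs with too few digits or an X anywhere but the last position.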

Votes: 5

Original content provided by Code Review.

Original link:

https://codereview.stackexchange.com/questions/251415
