Longest common substring constrained by a pattern

Stack Overflow user

Asked 2011-11-13 17:58:59

Answers: 4 | Views: 898 | Followers: 0 | Votes: 1

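Problem:

I have three strings s1, s2, s3. Each one has junk text on both sides and, in the middle, a defined pattern: text1+number1. number1 increases by 2 from one string to the next. I want to extract text1+number1.

I have already written the code that finds number1.

How can I extend the LCS function to also get text1?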

#include <cstddef>
#include <iostream>
#include <string>

std::string longestCommonSubstring(int, std::string const& s1, std::string const& s2, std::string const& s3);

int main(void) {
    std::string s1="hello 5", s2="bolo 7", s3="lo 9sdf";
    std::cout << "Trying to get \"lo 5\", actual result: \"" << longestCommonSubstring(5, s1, s2, s3) << '\"';
}

// Plain longest common substring of s1, s2 and s3.
std::string longestCommonSubstring(int must_include, std::string const& s1, std::string const& s2, std::string const& s3) {
    (void)must_include;  // unused: the pattern constraint is not implemented yet
    std::string longest;

    // Slide a window over s1: grow it while it also occurs in s2 and s3,
    // otherwise move its start right. Only windows longer than the best so far are tried.
    for(std::size_t start=0, length=1; start + length <= s1.size();) {
        std::string tmp = s1.substr(start, length);
        if (std::string::npos != s2.find(tmp) && std::string::npos != s3.find(tmp)) {
            tmp.swap(longest);
            ++length;
        } else ++start;
    }

    return longest;
}

Example:

From "hello 5", "bolo 7", "lo 9sdf" I want "lo 5".

Code:

I've been able to write a simple LCS function (test case), but I'm having trouble writing this modified one.

Tags: c++, algorithm, parsing, pattern-matching, longest-substring

4 answers

Stack Overflow user

Accepted answer

Posted 2011-11-14 10:56:53

I wrote my own solution:

#include <cstddef>
#include <iostream>
#include <string>
#include <utility>

// ((s1, number1), ((s2, number2), (s3, number3)))
typedef std::pair<std::pair<std::string, std::string>, std::pair<std::pair<std::string, std::string>, std::pair<std::string, std::string>>> pairStringTrio;
// (match in s1, (match in s2, match in s3))
typedef std::pair<std::string,std::pair<std::string,std::string>> stringPairString;

stringPairString longestCommonSubstring(const pairStringTrio&);
std::string strFindReplace(const std::string&, const std::string&, const std::string&);

int main(void) {
        std::string s1= "6 HUMAN ACTIONb", s2="8 HUMAN ACTIONd", s3="10 HUMAN ACTIONf";
        pairStringTrio result = std::make_pair(std::make_pair(s1, "6"), std::make_pair(std::make_pair(s2, "8"), std::make_pair(s3, "10")));

        stringPairString answer = longestCommonSubstring(result);
        std::cout << '\"' << answer.first << "\"\t\"" << answer.second.first << "\"\t\"" << answer.second.second << '\"';
}


stringPairString longestCommonSubstring(const pairStringTrio &foo) {
        std::string longest;

        // Slide a window over s1. Translate it into s2/s3 by swapping s1's number for
        // theirs; grow the window if both translations are found, else move it right.
        for(std::size_t start=0, length=1; start + length <= foo.first.first.size();) {
                std::string s1_tmp = foo.first.first.substr(start, length);
                std::string s2_tmp = strFindReplace(s1_tmp, foo.first.second, foo.second.first.second);
                std::string s3_tmp = strFindReplace(s1_tmp, foo.first.second, foo.second.second.second);

                if (std::string::npos != foo.second.first.first.find(s2_tmp) && std::string::npos != foo.second.second.first.find(s3_tmp)) {
                        s1_tmp.swap(longest);
                        ++length;
                } else ++start;
        }

        return std::make_pair(longest, std::make_pair(strFindReplace(longest, foo.first.second, foo.second.first.second), strFindReplace(longest, foo.first.second, foo.second.second.second)));
}

// Replace every occurrence of src in original with dest.
std::string strFindReplace(const std::string &original, const std::string& src, const std::string& dest) {
        std::string answer=original;
        if (src.empty()) return answer;  // an empty pattern would match forever
        // Resume the search after the inserted text so dest is never rescanned.
        for(std::size_t pos = 0; (pos = answer.find(src, pos)) != answer.npos; pos += dest.size())
                answer.replace(pos, src.size(), dest);
        return answer;
}

Votes: 0

Stack Overflow user

Posted 2011-11-13 22:17:13

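Suppose you are looking for the pattern *n, *n+2, *n+4, and so on, and you have the following strings: s1="hello 1,bye 2,ciao 1", s2="hello 3,bye 4,ciao 2" and s3="hello 5,bye 6,ciao 5". Then you can do the following: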

// find all pattern sequences
N[1] = findAllPatterns(s[1], number)
for i = 2 to n:
    for item in N[i-1]:
        for match in findAllPatterns(s[i], nextPattern(item)):
            N[i].add([item, (match, indexOf(match))])

// for every pattern sequence, find the longest common substring
maxCommonLength = 0
for sequence in N[n]:
    temp = findLCS(sequence)
    if length(temp[0]) > maxCommonLength:
        maxCommonLength = length(temp[0])
        result = temp

return result

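The first part of the algorithm identifies sequences such as (1, 6), (3, 6), (5, 6); (1, 19), (3, 6), (5, 6); and (2, 12), (4, 12), (6, 12), where each pair is (number, index of the number in its string).

The second part identifies "hello 1", "hello 3", "hello 5" as the longest substrings matching the pattern.

The algorithm could be optimized further by merging the two parts and discarding, early on, sequences that match the pattern but cannot be optimal; for clarity I preferred to keep it in two parts.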

Edit: fixed the code block.

Votes: 1

Stack Overflow user

Posted 2011-11-13 18:39:50

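If you already know number1, and you know that each number appears only once in its corresponding string, then the following should work:

I'll call your strings s[0], s[1], etc., and set longest = INT_MAX. For each string s[i] (i >= 0), just:

- Find the position of number1 + 2 * i in s[i]. Say it occurs at position j.
- If (i == 0), set j0 = j.
- for (k = 1; k <= j && k <= longest && s[i][j - k] == s[0][j0 - k]; ++k) {}
- longest = k - 1;

At the end, longest will be the length of the longest substring, ending just before the number, that all the strings have in common.

Basically, we just scan backwards from the point where the number was found, looking for the first character that does not match the corresponding character in your s1 (my s[0]), and keep track in longest of the longest matching substring so far; this can only stay the same or shrink with each new string we look at.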

Votes: 0

Page content originally from Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/8113508
