文章/答案/技术大牛

发布

社区首页 >问答首页 >查找数组中重复的子数组的个数

问查找数组中重复的子数组的个数
EN

Stack Overflow用户

提问于 2013-07-14 18:39:03

回答 3查看 2.5K关注 0票数 7

有一个从0-n (即size=n)索引的数组，其中包含来自0-m的元素，其中m < n (假设m小于n的100或1000倍，即m远远小于n)，因此许多元素或子数组必须重复，并且我们必须找到大小为1或大于1的这种重复的子数组的数量。例如，考虑一个数组

1 2 4 1 2 4 1 5 6 3 5，这里

1重复2次

2重复1次

4重复1次

5重复1次

1 2重复1次

2 4重复1次

4 1重复1次

1 2 4重复1次

2 4 1重复1次

1 2 4 1重复1次

所以最终的答案是这些的总和，即11

有人可以建议一些数据结构或有效的算法来解决这个问题吗？我在考虑对m进行散列，并记下它出现的索引值，但没有将其形式化。

假设n,m < 10^5

algorithm

arrays

回答 3

Stack Overflow用户

回答已采纳

发布于 2013-07-14 22:14:21

创建一个以整数为键的散列，以及整数的可扩展列表（例如链表）作为值。
通读您的输入列表。将每个被扫描的输入元素视为一个键；将以下元素的索引附加到关联的值列表中。
现在，您重复的 1 元素计数是 n - |keys|
接下来，检查你的钥匙。对于每个键，将原始列表的索引元素视为新的输入列表。创建一个新的散列并按照步骤 1 继续。请注意，当我们存储索引时，我们将继续使用原始列表的索引。

示例:1 2 4 1 2 4 1 5 6 3 5(假设我们是零索引的)。给你，n=11。

1 -> 1，4，7
2 -> 2，5
3 -> 10
4 -> 3，6
5 -> 8，nil

H1216 -> 9F223

重复的1-elts的数量是11 -6= 5。

现在，我们来看一下钥匙。我们从1开始，索引列表是1，4，7，对应于元素2，2，5。

2 -> 2，5
5 -> 8

这将拾取3-2=1个重复的2元素列表。

等等。

运行时间:输入+输出为线性

票数 4

Stack Overflow用户

发布于 2013-07-14 22:02:26

我使用了一种基于后缀树的方法，来自Skienna's book on the design of algorithms。后缀树是一种特殊的trie tree，可以将给定字符串的每个后缀插入到trie树中。Trie树是一种在单个树中存储多个单词的方法:根是空字符串，每个边都添加一个字符。下面我使用了一个非常简单的实现作为概念的证明。

#include <iostream>
#include <string>
using std::string;
using std::cout;

class Node
{
    Node* m_descendants[10];
    int m_count;
public:
    Node()
    {
        for(int i=0; i<10; ++i)  { m_descendants[i] = nullptr; }
        m_count = 0;
    }
    void Add(const char* str) {
        // Increment number of times we've reached this node
        m_count++;

        const int firstDigit = str[0] - '0';
        if (!(0 <= firstDigit && firstDigit <= 9)) return; // recursion base case 

        // Access descendant node correponding to current digit
        Node*& descendant = m_descendants[firstDigit];
        if (descendant == 0) { // create descendant if necessary
            descendant = new Node;
        }

        // Recurse on rest of string
        descendant->Add(str +1);
    }
    void Print(const string& str, int countAdj=0) const // debug function
    {
        for(int nextDigit=0; nextDigit<10; ++nextDigit) {
            const Node* node = m_descendants[nextDigit];
            if (node) {
                const int adjustedCount = node->Count() - countAdj;
                if (adjustedCount > 0) {
                    char c = '0' + nextDigit;
                    string strWithC = str;
                    strWithC += c;
                    cout << strWithC << ": " << adjustedCount << "\n";
                    node->Print(strWithC, countAdj);
                }
            }
        }
    }
    int SumRepeated() const
    {
        int sumRepeated = 0;
        for(int nextDigit=0; nextDigit<10; ++nextDigit) {
            const Node* node = m_descendants[nextDigit];
            if (node) {
                const int repeatCount = node->Count() -1;
                if (repeatCount > 0) {
                    sumRepeated += repeatCount;
                    sumRepeated += node->SumRepeated();
                }
            }
        }
        return sumRepeated;
    }
    int Count() const { return m_count; }
};

int main(int argc, const char* argv[])
{
    // Construct
    Node* const root = new Node;
    for(const char* str = "12412415635"; *str; ++str) {
        root->Add(str);
    }

    // Print
    root->Print(string(), 1);

    cout << "Sum repeated is " << root->SumRepeated() << "\n";

    return 0; // (nodes are leaked)
}

输出

1: 2
12: 1
124: 1
1241: 1
2: 1
24: 1
241: 1
4: 1
41: 1
5: 1
Sum repeated is 11

注意，这里还有一个重复的子字符串，即1241。

正如我所说的，这是概念验证，因此，例如，您可能希望使用字典而不是数组来存储具有更大m的子代。此外，即使就整体算法而言，这种实现也不是最优的:它在时间和空间上都是O(n^2)。您可以通过折叠没有分支的路径来优化，以获得O(n)空间，并使用巧妙的构造算法来获得O(n)时间。此外，正如在另一个答案中指出的那样，您不需要处理最大长度为原始字符串一半的子字符串。

票数 2

Stack Overflow用户

发布于 2013-07-15 10:33:59

下面是JavaScript中的一些东西：(输出或多或少是即时的，我认为JavaScript是最慢的之一)：

<script>

function f (s, m) {
  var i = 0, n = s.length, c = 0, H = {}, Hi = {}
  while (i < n-1) {
    for (var j=i+1; j<=Math.min (i + 1 + m,n - 1); j++) {
      if (s[i] == s[j]) {
        var i_= i, j_= j, tmp = ''
        while (s[i_] == s[j_] && i_ < j) {
          tmp += String (s[i_])
          i_++
          j_++
        }
        var k = tmp.length - 1
        while (k >= 0) {
          if (!H[tmp]) {
            H[tmp] = 1
            Hi[tmp] = i
            c++
          }
          else if (i == Hi[tmp]) {
            H[tmp]++
            c++
          }
          tmp = tmp.substr(0,k--)
        }
      }
    }
    i++
  }
  var t1 = ''
  for (var i in H) t1 += i + ", "
  return [t1,c]
}

var a1 = [1,2,4,1,2,4,1,5,6,3,5],
    a2 = []

for (var i=0;i<100000;i++) a2.push(Math.floor(Math.random()*10))

console.log(a1 +"\n"+ f(a1,10))
console.log(f(a2,10))

</script>

输出：

1,2,4,1,2,4,1,5,6,3,5
1, 2, 4, 5, 12, 24, 41, 124, 241, ,10

["0...6358, 3582, 038, 4304, 2068, 095, 
 9252, 37866, 3786, 7866, 7893, 8701, 0669, ", 771]

票数 0

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/17638627

复制

相似问题

问查找数组中重复的子数组的个数
EN

回答 3

Stack Overflow用户

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问查找数组中重复的子数组的个数EN

回答 3

Stack Overflow用户

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问查找数组中重复的子数组的个数
EN