文章/答案/技术大牛

发布

社区首页 >问答首页 >找到集合D中最长的单词，它是字符串S的子序列

问找到集合D中最长的单词，它是字符串S的子序列
EN

Code Review用户

提问于 2019-02-27 18:09:48

回答 2查看 826关注 0票数 6

给出一个字符串S和一组单词D，找出D中最长的单词，即S的子序列，如果一些字符(可能为零)可以在不重新排序剩下的字符的情况下，删除一些字符(可能为零)，则D中最长的单词是S的子序列。注意:d可以以任何格式出现(列表、哈希表、前缀树等)。例如，给定S= "abppplee“和D= {"able”、"ale“、"apple”、"bale“、”袋鼠“}的输入，正确的输出将是"apple”。

"ale“和”ale“这两个词都是S的后缀，但它们都比”苹果“短。
“包”这个词并不是S的后续词，因为尽管S的字母都是正确的，但它们的顺序并不是对的。
“袋鼠”是D中最长的单词，但它不是S的子序列。

我目前正在努力学习如何成为一个更好的程序员。我是采取谷歌技术发展指南推荐的路径，这是第一个问题。我想出了自己的解决方案，并阅读了Google的第一个推荐答案是使用蛮力，而一个小小的优化是至少确保字典由哈希表或前缀树表示，这样查找才是有效的。

这是我第一次使用字典，我对使用对象和类也没有信心。如果您可以查看并推荐更好的方法来改进我的代码，也许可以使用更多(或更好)对象，我会很感激的。

class Main:
    def create_dictionary(string):
        dictionary = {}
        index = 0

        for letter in string:
            if letter in dictionary:
                dictionary[letter].append(index)
            else:
                dictionary[letter] = [index]
            index += 1
        return(dictionary)

    def get_word_is_substring(word, dictionary):
        index_of_last_letter_found = None
        for letter in word:
            if letter in dictionary and (index_of_last_letter_found is None or dictionary[letter][-1] > index_of_last_letter_found):
                index = 0
                while index < len(dictionary[letter]):
                    if index_of_last_letter_found is None or index_of_last_letter_found < dictionary[letter][index]:
                        index_of_last_letter_found = dictionary[letter][index]
                        break
                    else:
                        index += 1

            else:
                return False
        return True

    def replace_word_if_necessary(word, longest_word):
        if (longest_word is None) or (len(word) > len(longest_word)):
            longest_word = word
        return longest_word

    def get_longest_word(s, d):
        dictionary = Main.create_dictionary(s)
        longest_word = None

        for word in d:
            word_is_substring = Main.get_word_is_substring(word, dictionary)
            if word_is_substring:
                longest_word = Main.replace_word_if_necessary(word, longest_word)
        print(longest_word)

Main.get_longest_word("abppplee", {"ale", "bale", "able", "apple", "kangaroo"})

python

python-3.x

回答 2

Code Review用户

发布于 2019-02-27 19:57:25

使用`defaultdict`和`enumerate`

Python提供了defaultdict类，它的工作方式类似于常规的二叉树，但如果请求的元素不存在，则返回默认值(此处为空列表)。如果需要循环中的索引，也可以使用enumerate。

from collections import defaultdict

def create_dictionary(string):
    dictionary = defaultdict(list)
    for index, letter in enumerate(string):
        dictionary[letter].append(index)
    return dictionary

在下一个方法中，使用索引列表来利用defaultdict。另外，使用列表理解来筛选有效的索引，因为这也默认为空列表，因此不需要任何特殊的案例处理。

def get_word_is_substring(word, dictionary):
    indices = [-1]
    for letter in word:
        indices = [index for index in dictionary[letter] if index > indices[0]]
        if not indices:
            return False
    return True

票数 3

Code Review用户

发布于 2019-02-27 22:58:27

我知道这是一个代码回顾，我应该改进代码，而不是编写一个新的代码，我想和大家分享一下我是如何做到的。

def longest_matching_word(s, d):
    def is_valid(word, s=s):
        if not word and not s:
            return True
        if not s and word:
            return False
        if word and word[0]==s[0]:
            return is_valid(word[1:], s[1:])
        return is_valid(word, s[1:])
    return max(filter(is_valid, d), key=len)

S = "abppplee" 
D = {"ale", "bale", "able", "apple", "kangaroo"}

longest_matching_word(S, D) # -> 'apple'

这样做，调试和维护就更容易了(没有状态或变量，所以可能出错的事情就少了)。

由于懒惰的filter对象，对于大型集，它将几乎不使用任何内存。

复杂性是O(n)。对于大型集来说，一个有趣的事情是可以应用multiprocessing.map来更有效地利用计算机的资源。

如果你想一想，就不难理解：

它也是非常可读的，没有神秘的变量，可疑的循环，每件事都有一个可以推断的目的。

我希望它给你一些想法..。

票数 2

页面原文内容由Code Review提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://codereview.stackexchange.com/questions/214408

复制

相似问题

问找到集合D中最长的单词，它是字符串S的子序列
EN

回答 2

Code Review用户

使用`defaultdict`和`enumerate`

Code Review用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问找到集合D中最长的单词，它是字符串S的子序列EN

回答 2

Code Review用户

使用defaultdict和enumerate

Code Review用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问找到集合D中最长的单词，它是字符串S的子序列
EN

使用`defaultdict`和`enumerate`