Q: Finding the longest word in a .txt file without punctuation marks

Stack Overflow user

Asked 2020-05-30 13:19:49

Answers: 3 | Views: 610 | Followers: 0 | Votes: 1

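I am working through Python file I/O exercises. I have made good progress on one that asks for the longest word in each line of a .txt file, but I cannot get rid of the punctuation.

Here is my code: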

with open("original-3.txt", 'r') as file1:
    lines = file1.readlines()
for line in lines:
    if not line == "\n":
        print(max(line.split(), key=len))

This is the output I get

This is the original-3.txt file I read the data from

'Twas brillig, and the slithy toves
Did gyre and gimble in the wabe;
All mimsy were the borogoves,
And the mome raths outgrabe.

"Beware the Jabberwock, my son!
The jaws that bite, the claws that catch!
Beware the Jubjub bird, and shun
The frumious Bandersnatch!"

He took his vorpal sword in hand:
Long time the manxome foe he sought,
So rested he by the Tumtum tree,
And stood a while in thought.

And, as in uffish thought he stood,
The Jabberwock, with eyes of flame,
Came whiffling through the tulgey wood,
And burbled as it came!

One two! One two! And through and through
The vorpal blade went snicker-snack!
He left it dead, and with its head
He went galumphing back.

"And hast thou slain the Jabberwock?
Come to my arms, my beamish boy!"
"Oh frabjous day! Callooh! Callay!"
He chortled in his joy.

'Twas brillig, and the slithy toves
Did gyre and gimble in the wabe:
All mimsy were the borogoves,
And the mome raths outgrabe.

As you can see, the results include punctuation marks such as "," ";" "?" and "!".

How can I get only the words themselves?

Thanks

python

parsing

text-processing

text-parsing

3 Answers

Stack Overflow user

Accepted answer

Answered 2020-05-30 13:23:30

You have to strip these characters from the words:

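with open("original-3.txt", 'r') as file1:
    lines = file1.readlines()
for line in lines:
    if not line == "\n":
        # strip() removes the listed characters from both ends of each word;
        # the generator is parenthesized so that key=len is passed to max()
        print(max((word.strip(",?;!\"") for word in line.split()), key=len))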
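The hand-written character set above leaves out characters such as "." and ":" that also occur in the poem. As a minimal sketch, assuming you want to strip every ASCII punctuation character rather than list them by hand, string.punctuation from the standard library could be used instead. The helper name longest_word_stripped is hypothetical and not part of the original answer.

```python
import string


def longest_word_stripped(line):
    """Return the longest whitespace-separated word in line with leading and
    trailing ASCII punctuation removed, or None if the line has no words."""
    # str.strip only removes characters from the ends of each word, so inner
    # punctuation such as the hyphen in "snicker-snack" is kept.
    words = [word.strip(string.punctuation) for word in line.split()]
    # Drop tokens that consisted of punctuation only.
    words = [word for word in words if word]
    return max(words, key=len) if words else None


print(longest_word_stripped("The vorpal blade went snicker-snack!"))
```

Returning None for a line without words avoids the ValueError that max() raises on an empty sequence, so the caller does not need a separate blank-line check.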

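Or use a regular expression to extract everything that looks like a word (that is, runs of word characters; \w matches letters, digits, and underscores):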

import re

# lines is the list read from the file above
for line in lines:
    words = re.findall(r"\w+", line)
    if words:
        print(max(words, key=len))
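One thing to be aware of is that \w+ also splits words at hyphens and apostrophes. The sketch below compares it with a stricter pattern, under the assumption that a hyphen or apostrophe between letters should count as part of the word. The name WORD_RE and the pattern itself are illustrative choices, not something from the original answer.

```python
import re

line = "The vorpal blade went snicker-snack!"

# \w+ matches runs of letters, digits and underscores, so the hyphen
# splits "snicker-snack" into two separate matches.
print(re.findall(r"\w+", line))

# Assumption: a hyphen or apostrophe between ASCII letters belongs to the word.
WORD_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")
print(max(WORD_RE.findall(line), key=len))
```

Which pattern is right depends on what the exercise counts as a word. The stricter pattern only covers ASCII letters, so accented characters would need a different character class.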

Votes: 1

Stack Overflow user

Answered 2020-05-30 13:48:53

With a regex it is easy to get the length of the longest word:

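import re

# lines is the list returned by file1.readlines() in the question
for line in lines:
    found_strings = re.findall(r'\w+', line)
    # Blank lines yield no matches, and max() of an empty sequence raises ValueError
    if found_strings:
        print(max(len(txt) for txt in found_strings))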
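If both the word and its length are wanted, a small variation could return them together. This is only a sketch: the function name longest_with_length is hypothetical, and it keeps the same \w+ definition of a word used above.

```python
import re


def longest_with_length(line):
    """Return (longest_word, length) for line, or None if it has no words."""
    words = re.findall(r"\w+", line)
    if not words:
        return None
    # max() with key=len returns the first word of maximal length.
    longest = max(words, key=len)
    return longest, len(longest)


for line in ["He went galumphing back.\n", "\n"]:
    result = longest_with_length(line)
    if result is not None:
        print(result)
```

When several words share the maximal length, max() keeps the first one it encounters, so ties resolve to the leftmost word in the line.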

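Votes: 2

Stack Overflow user

Answered 2020-05-30 14:48:28

This solution does not use regular expressions. It splits each line into words, then sanitizes every word so that it contains only alphabetic characters.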

with open("original-3.txt", 'r') as file1:
    lines = file1.readlines()
    for line in lines:
        if not line == "\n":
            words = line.split()
            for i, word in enumerate(words):
                words[i] = "".join([letter for letter in word if letter.isalpha()])
            print(max(words, key=len))
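A related regex-free variant, sketched here as an alternative rather than as the answer's own method, deletes punctuation from the whole line with str.translate before splitting. The names REMOVE_PUNCT and longest_word_translate are hypothetical. Unlike the isalpha() filter, this keeps digits and only removes ASCII punctuation.

```python
import string

# Translation table that deletes every ASCII punctuation character.
REMOVE_PUNCT = str.maketrans("", "", string.punctuation)


def longest_word_translate(line):
    """Return the longest word in line after deleting ASCII punctuation,
    or None if no words remain."""
    words = line.translate(REMOVE_PUNCT).split()
    return max(words, key=len) if words else None


print(longest_word_translate("The vorpal blade went snicker-snack!"))
```

As with the isalpha() approach, removing the hyphen joins "snicker-snack" into a single 12-letter token, which may or may not be what the exercise expects.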

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/62103029
