Q: Stripping email addresses out of arbitrary files

Stack Overflow user

Asked 2013-05-01 00:55:02

Answers: 2, Views: 92, Followers: 0, Votes: 0

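What is the best way to pull user@host.com combinations out of a large set of files?

I assume sed/awk can do this, but I'm not very familiar with regexps.

We have a file, Staff_data.txt, that contains more than just email addresses, and we would like to strip out the rest of the data and collect only the email addresses (i.e. h@south.com).

I think the simplest way is via sed/awk in a terminal, but given how complex regexps can get, I would appreciate some guidance.

Thanks.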

regex

sed

awk

2 Answers

Stack Overflow user

Posted 2013-05-01 00:56:48

You want grep here, not sed or awk. For example, to show all email addresses from the domain south.com:

grep -o '[^ ]*@south\.com' file
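
The command above is tied to one domain. As a rough sketch of how the same grep -o idea could cover any domain, something like the following might work. The function name extract_emails and the pattern are assumptions made for illustration, not part of the answer; the pattern is deliberately simplified and will miss some valid addresses, and the -o and -h options assume GNU or BSD grep.

```shell
# extract_emails (hypothetical helper): print every email-like token
# found in the given files, or on stdin when no file is named.
# -o prints only the matching part, -E enables extended regexps,
# -h suppresses file name prefixes when several files are given.
# The pattern is a simplification, not a full RFC 5322 matcher.
extract_emails() {
  grep -ohE '[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,}' "$@"
}
```

For the file in the question this could be run as extract_emails Staff_data.txt | sort -u to get each address once.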

Votes: 0

Stack Overflow user

Posted 2013-05-01 02:10:00

Here's a somewhat awkward but apparently working script I wrote a few years ago to do this job:

# Get rid of any Message-Id line like this:
#   Message-ID: <AANLkTinSDG_dySv_oy_7jWBD=QWiHUMpSEFtE-cxP6Y1@mail.gmail.com>
#
# Change any character that can't be in an email address to a space.
#
# Print just the character strings that look like email addresses.
#
# Drop anything with multiple "@"s and change any domain names (i.e.
# the part after the "@") to all lower case as those are not case-sensitive.
#
# If we have a local mail box part (i.e. the part before the "@") that's
# a mix of upper/lower and another that's all lower, keep them both. Ditto
# for multiple versions of mixed case since we don't know which is correct.
#
# Sort uniquely.

cat "$@" |
awk '!/^Message-ID:/' |
awk '{gsub(/[^-_.@[:alnum:]]+/," ")}1' |
awk '{for (i=1;i<=NF;i++) if ($i ~ /.+@.+[.][[:alpha:]]+$/) print $i}' |
awk '
  BEGIN   { FS=OFS="@" }
  NF != 2 { printf "Badly formatted %s skipped.\n",$0 | "cat>&2"; next }
  { $2=tolower($2); print }
' |
sort -u

It isn't pretty, but it seems robust.
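
The steps listed in the script's comments can also be folded into a single awk program. What follows is only a sketch under that reading of the comments, not the author's script: the function name extract_unique_emails is made up here, and it assumes an awk that understands POSIX character classes such as [:alnum:] (gawk and current BSD awk do).

```shell
# extract_unique_emails (hypothetical helper): one awk pass that follows
# the steps described in the comments of the script above, then sort -u.
extract_unique_emails() {
  awk '
    /^Message-ID:/ { next }                  # skip Message-ID header lines
    {
      # Turn every run of characters that cannot be in an address into a
      # space; changing $0 makes awk split the fields again.
      gsub(/[^-_.@[:alnum:]]+/, " ")
      for (i = 1; i <= NF; i++) {
        # Keep only fields that look like local@domain.tld.
        if ($i !~ /.+@.+[.][[:alpha:]]+$/) continue
        n = split($i, part, "@")
        if (n != 2) {                        # more than one "@"
          printf "Badly formatted %s skipped.\n", $i | "cat >&2"
          continue
        }
        # Lower-case the domain only; the local part keeps its case.
        print part[1] "@" tolower(part[2])
      }
    }
  ' "$@" | sort -u
}
```

It takes file names as arguments or reads stdin, for example extract_unique_emails Staff_data.txt. Skipped tokens are reported on stderr, so the list on stdout stays clean.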

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link: https://stackoverflow.com/questions/16305155
