
Compare two comma-separated lists to show the elements common to both and the elements unique to each

Stack Overflow user
Asked 2019-12-07 12:40:13
Answers: 2  Views: 60  Follows: 0  Votes: 1

I have the following two ingredient lists:

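  1. Calcium Carbonate, Aqua, Sorbitol, Aroma, Poloxamer 407, Sodium Monofluorophosphate (1450 ppm F), Cocamidopropyl Betaine, Zinc Oxide, Benzyl Alcohol, Cellulose Gum, Zinc Citrate, Sodium Bicarbonate, Tetrasodium Pyrophosphate, Xanthan Gum, Sodium Saccharin, Sucrose, Amber Sugar, Limonene, CI 77891.
  2. Calcium Carbonate, Aqua, Sorbitol, Sodium Lauryl Sulfate, Aroma, Sodium Monofluorophosphate (1450 ppm F), Cellulose Gum, Sodium Bicarbonate, Tetrasodium Pyrophosphate, Sodium Saccharin, Benzyl Alcohol, Xanthan Gum, Limonene, CI 77891.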

What I want to know is:

  1. which elements are common to both lists
  2. which elements are present in one list but not in the other
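The two questions map directly onto set operations: the common elements are the intersection, and the elements present in only one list are the two set differences. A minimal Python sketch, assuming each list is one line of comma-separated names and that whitespace around the commas is insignificant (the helper names `parse_ingredients` and `compare_lists` are hypothetical):

```python
def parse_ingredients(line):
    """Split one comma-separated line into a set of trimmed, non-empty names."""
    return {item.strip() for item in line.split(",") if item.strip()}


def compare_lists(line_a, line_b):
    """Return (common, only_in_a, only_in_b) as sorted lists."""
    a = parse_ingredients(line_a)
    b = parse_ingredients(line_b)
    common = sorted(a & b)   # intersection: present in both lists
    only_a = sorted(a - b)   # difference: present in the first list only
    only_b = sorted(b - a)   # difference: present in the second list only
    return common, only_a, only_b


if __name__ == "__main__":
    common, only_a, only_b = compare_lists(
        "Calcium Carbonate, Aqua, Sorbitol, Zinc Oxide",
        "Calcium Carbonate, Aqua , Sorbitol,Sodium Lauryl Sulfate",
    )
    print("Common:", common)
    print("Only in first:", only_a)
    print("Only in second:", only_b)
```

Because each name is stripped after splitting on the bare comma, `Aqua ,` and `Sorbitol,Sodium` are handled the same way as `Aqua, Sorbitol`.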

I have done some work in Python, but I would like a simpler bash implementation.

from __future__ import print_function  # print() behaves the same on Python 2 and 3

import os
import sys
from collections import OrderedDict
from itertools import combinations

# Map each file's base name to the set of ingredients on its first line.
my_ingredients_dict = OrderedDict()

for f in sys.argv[1:]:
    with open(f, 'r') as myfile:
        # Split on commas and trim the whitespace around each name.
        as_a_set = set(s.strip() for s in myfile.readline().split(','))
        my_ingredients_dict[os.path.basename(f)] = as_a_set

common_ingredients = OrderedDict()
divergent_ingredients = OrderedDict()

# Compare every pair of input files.
for agent1, agent2 in combinations(my_ingredients_dict, 2):
    agent_key = str(agent1) + "___AND___" + str(agent2)
    agent_common = my_ingredients_dict[agent1] & my_ingredients_dict[agent2]
    if agent_common:
        common_ingredients[agent_key] = agent_common

    # Ingredients in agent1 but not in agent2.
    agent_1_but_not_in_agent_2_key = "STUFF_IN__" + str(agent1) + "__BUT_NOT_IN__" + str(agent2)
    agent1_vs_agent2 = my_ingredients_dict[agent1] - my_ingredients_dict[agent2]
    if agent1_vs_agent2:
        divergent_ingredients[agent_1_but_not_in_agent_2_key] = agent1_vs_agent2

    # Ingredients in agent2 but not in agent1.
    agent_2_but_not_in_agent_1_key = "STUFF_IN__" + str(agent2) + "__BUT_NOT_IN__" + str(agent1)
    agent2_vs_agent1 = my_ingredients_dict[agent2] - my_ingredients_dict[agent1]
    if agent2_vs_agent1:
        divergent_ingredients[agent_2_but_not_in_agent_1_key] = agent2_vs_agent1

print("========= COMMON ==============\n")
for key, val in common_ingredients.items():
    print(key, val)
print("=========================================\n")

print("============== DIVERGENT =========== \n")
for key, val in divergent_ingredients.items():
    print(key, val)
print("======================================\n")

Regarding the gawk solution: if I give it the following lists, the code produces wrong results:

(a)

Arginine 8%,Calcium Carbonate, Aqua, Sorbitol, Bicarbonate, Sodium Lauryl Sulfate, Sodium Monofluorophosphate (1450 ppm F), Aroma, Cellulose Gum, Sodium Bicarbonate, Tetrasodium Pyrophosphate, Titanium Dioxide, Benzyl Alcohol, Sodium Saccharin, Xanthan Gum, Limonene

(b)

Arginine 8%,Aqua , Calcium Carbonate, Sorbitol, Hydrated Silica, Sodium Lauryl Sulfate, Aroma, Sodium Monofluorophosphate (1450 ppm F), Cellulose Gum, Tricalcium Phosphate, Sodium Bicarbonate, Tetrasodium Pyrophosphate, Sodium Saccharin, Benzyl Alcohol,Xanthan Gum, Limonene, Titanium Dioxide

gawk result:

Common:
Cellulose Gum
Sodium Bicarbonate
Sorbitol
Sodium Monofluorophosphate (1450 ppm F)
Sodium Saccharin
Calcium Carbonate, Aqua, Sorbitol, Aroma, Poloxamer 407, Sodium Monofluorophosphate (1450 ppm F), Cocamidopropyl Betaine, Zinc Oxide, Benzyl Alcohol, Cellulose Gum, Zinc Citrate, Sodium Bicarbonate, Tetrasodium Pyrophosphate, Xanthan Gum,Sodium Lauryl Sulfate
Aroma
Titanium Dioxide
Calcium Carbonate, Aqua, Sorbitol, Sodium Lauryl Sulfate, Aroma, Sodium Monofluorophosphate (1450 ppm F), Cellulose Gum, Sodium Bicarbonate, Tetrasodium Pyrophosphate, Sodium Saccharin, Benzyl Alcohol, Xanthan Gum, Limonene, CI 77891.
Tetrasodium Pyrophosphate
Limonene

a:
Xanthan Gum
Benzyl Alcohol
Aqua
Arginine 8%,Calcium Carbonate
Bicarbonate

b:
Tricalcium Phosphate
Hydrated Silica
Calcium Carbonate
Arginine 8%,Aqua
Benzyl Alcohol,Xanthan Gum
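Entries in this output such as `Arginine 8%,Calcium Carbonate` and `Benzyl Alcohol,Xanthan Gum` appear to be what you get when a list is split only on a comma followed by a space (as `sed 's/, /\n/g'` does): a comma with no following space never separates two names. A small Python sketch contrasting that split with one that treats a comma plus any surrounding whitespace as the separator (the sample string is taken from list (b)):

```python
import re

sample = "Arginine 8%,Aqua , Calcium Carbonate, Benzyl Alcohol,Xanthan Gum"

# Splitting only on ", " leaves names joined wherever a comma has no
# following space, and keeps a trailing space in "Arginine 8%,Aqua ".
naive = sample.split(", ")

# Treating a comma plus any surrounding whitespace as the separator
# yields one clean name per field; empty fields are dropped.
robust = [name for name in re.split(r"\s*,\s*", sample.strip()) if name]

print(naive)
print(robust)
```

The naive split yields three fields (two of them still containing a comma), while the regex split yields the five separate names.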

Result of my Python script:

========= COMMON ==============

a.txt___AND___b.txt set(['Sorbitol', 'Xanthan Gum', 'Tetrasodium Pyrophosphate', 'Sodium Saccharin', 'Aqua', 'Titanium Dioxide', 'Sodium Bicarbonate', 'Arginine 8%', 'Calcium Carbonate', 'Sodium Monofluorophosphate (1450 ppm F)', 'Sodium Lauryl Sulfate', 'Benzyl Alcohol', 'Limonene', 'Cellulose Gum', 'Aroma'])
=========================================

============== DIVERGENT ===========

STUFF_IN__a.txt__BUT_NOT_IN__b.txt set(['Bicarbonate'])
STUFF_IN__b.txt__BUT_NOT_IN__a.txt set(['Hydrated Silica', 'Tricalcium Phosphate'])
======================================

2 Answers

Stack Overflow user

Accepted answer

Posted 2019-12-08 16:19:20

Here is one way to do it in GNU awk:

$ gawk '
BEGIN {
    FS=","                              # comma separated
}
{
    for(i=1;i<=NF;i++) {
        gsub(/^ +| +$/,"",$i)           # trim off leading and trailing space
        file[ARGIND][$i]
    }
}
END {
    print "Common:"
    for(i in file[1])
        if(i in file[2])
            print i

    print ORS ARGV[1] ":"
    for(i in file[1])
        if(!(i in file[2]))
            print i

    print ORS ARGV[2] ":"
    for(i in file[2])
        if(!(i in file[1]))
            print i
}' file1 file2
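The script above stores every trimmed field of each file as a key of `file[ARGIND]` and then tests key membership between the two files. A rough Python equivalent of that idea is sketched below; it uses insertion-ordered dicts so results come out in the order names first appear, whereas awk's `for (i in array)` visits keys in an unspecified order. The function `report` and its string-based input are assumptions for illustration, not part of the answer.

```python
def report(text_a, text_b, name_a="file1", name_b="file2"):
    """Mimic the gawk script: per-file key sets, then membership tests."""
    files = []
    for text in (text_a, text_b):
        keys = {}  # dict preserves first-seen order (Python 3.7+)
        for line in text.splitlines():
            for field in line.split(","):
                # Similar to gsub(/^ +| +$/, "", $i), but strip() also
                # removes tabs and "\r"; empty fields are skipped here.
                field = field.strip()
                if field:
                    keys[field] = None
        files.append(keys)

    first, second = files
    lines = ["Common:"]
    lines += [k for k in first if k in second]
    lines += ["", name_a + ":"]              # blank line, like print ORS ...
    lines += [k for k in first if k not in second]
    lines += ["", name_b + ":"]
    lines += [k for k in second if k not in first]
    return "\n".join(lines)


if __name__ == "__main__":
    print(report("Aqua, Sorbitol, Zinc Citrate",
                 "Sorbitol,Aqua , Sodium Lauryl Sulfate"))
```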

Output:

Common:
Xanthan Gum
...

file1:
Zinc Citrate
...

file2:
Sodium Lauryl Sulfate
Votes: 0

Stack Overflow user

Posted 2019-12-07 12:50:05

Using sort, bash and uniq:

Which elements are common:

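# split on commas with any surrounding spaces, dedupe each file, keep lines seen twice
sort <(sed 's/ *, */\n/g' file1 | sort -u) <(sed 's/ *, */\n/g' file2 | sort -u) | uniq -d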
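This works because, once both lists are merged and sorted, a name present in both files occupies two adjacent lines, and `uniq -d` prints one copy of each repeated line (while `uniq -u` prints only the lines that are not repeated). The counting logic can be sketched in Python with `collections.Counter`; deduplicating each list first (an assumption here, mirroring `sort -u` on each input) keeps a name listed twice in the same file from being reported as common. The function name `uniq_d_u` is hypothetical.

```python
from collections import Counter


def uniq_d_u(list_a, list_b):
    """Emulate `sort a b | uniq -d` and `... | uniq -u` on two name lists."""
    # Deduplicate each list first (like `sort -u` on each input), so a
    # count of 2 can only mean "present in both lists".
    counts = Counter(set(list_a)) + Counter(set(list_b))
    repeated = sorted(name for name, n in counts.items() if n > 1)   # uniq -d
    unique = sorted(name for name, n in counts.items() if n == 1)    # uniq -u
    return repeated, unique


if __name__ == "__main__":
    a = ["Aqua", "Sorbitol", "Aqua", "Zinc Oxide"]  # "Aqua" listed twice
    b = ["Sorbitol", "Limonene"]
    print(uniq_d_u(a, b))
```

Without the per-list deduplication, the duplicated `Aqua` in the first list would reach a count of 2 and be reported as common.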

Which elements are present in one list but not in the other:

# same split and per-file dedupe, then keep lines seen only once
sort <(sed 's/ *, */\n/g' file1 | sort -u) <(sed 's/ *, */\n/g' file2 | sort -u) | uniq -u
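The output of `uniq -u` does not say which file each name came from. The standard tool for that is `comm`, which reads two sorted files and separates lines unique to the first, lines unique to the second, and lines in both (for example, `comm -23` keeps only the lines unique to the first file). The single merge pass it relies on can be sketched in Python; the function name `comm_columns` is hypothetical, and the inputs are assumed to be already sorted and free of duplicates.

```python
def comm_columns(sorted_a, sorted_b):
    """Walk two sorted, duplicate-free lists once, the way `comm` does.

    Returns (only_in_a, only_in_b, in_both).
    """
    only_a, only_b, both = [], [], []
    i = j = 0
    while i < len(sorted_a) and j < len(sorted_b):
        if sorted_a[i] == sorted_b[j]:
            both.append(sorted_a[i])
            i += 1
            j += 1
        elif sorted_a[i] < sorted_b[j]:
            # The smaller item cannot appear later in sorted_b.
            only_a.append(sorted_a[i])
            i += 1
        else:
            only_b.append(sorted_b[j])
            j += 1
    # Whatever remains exists in one list only.
    only_a.extend(sorted_a[i:])
    only_b.extend(sorted_b[j:])
    return only_a, only_b, both


if __name__ == "__main__":
    a = sorted({"Aqua", "Sorbitol", "Zinc Oxide"})
    b = sorted({"Aqua", "Limonene", "Sorbitol"})
    print(comm_columns(a, b))
```

Because both inputs are sorted, one pass over each list is enough, which is why `comm` insists on sorted input.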
Votes: 1
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/59226064
