I have the following two ingredient lists:

Calcium Carbonate, Aqua, Sorbitol, Aroma, Poloxamer 407, Sodium Monofluorophosphate (1450 ppm F), Cocamidopropyl Betaine, Zinc Oxide, Benzyl Alcohol, Cellulose Gum, Zinc Citrate, Sodium Bicarbonate, Tetrasodium Pyrophosphate, Xanthan Gum,Sodium Lauryl Sulfate

Calcium Carbonate, Aqua, Sorbitol, Sodium Lauryl Sulfate, Aroma, Sodium Monofluorophosphate (1450 ppm F), Cellulose Gum, Sodium Bicarbonate, Tetrasodium Pyrophosphate, Sodium Saccharin, Benzyl Alcohol, Xanthan Gum, Limonene, CI 77891.

What I want to know is which elements the two lists have in common, and which elements are present in one list but not in the other.

I did some work in Python, but I would like a simpler bash implementation:

```python
from __future__ import print_function  # lets the print() calls below also run on Python 2

import os
import sys
from collections import OrderedDict
from itertools import combinations

# Map each file's basename to the set of ingredients on its first line.
my_ingredients_dict = OrderedDict()
for f in sys.argv[1:]:
    with open(f, 'r') as myfile:
        # Split on every comma and strip surrounding whitespace from each item.
        as_a_set = set(s.strip() for s in myfile.readlines()[0].split(','))
    my_ingredients_dict[os.path.basename(f)] = as_a_set

common_ingredients = OrderedDict()
divergent_ingredients = OrderedDict()
# Compare every pair of files (a single pair when two files are given).
for agent1, agent2 in combinations(my_ingredients_dict, 2):
    agent_key = str(agent1) + "___AND___" + str(agent2)
    agent_common = my_ingredients_dict[agent1] & my_ingredients_dict[agent2]
    if agent_common:
        common_ingredients[agent_key] = agent_common

    # Set difference in each direction.
    agent_1_but_not_in_agent_2_key = "STUFF_IN__" + str(agent1) + "__BUT_NOT_IN__" + str(agent2)
    agent1_vs_agent2 = my_ingredients_dict[agent1] - my_ingredients_dict[agent2]
    if agent1_vs_agent2:
        divergent_ingredients[agent_1_but_not_in_agent_2_key] = agent1_vs_agent2

    agent_2_but_not_in_agent_1_key = "STUFF_IN__" + str(agent2) + "__BUT_NOT_IN__" + str(agent1)
    agent2_vs_agent1 = my_ingredients_dict[agent2] - my_ingredients_dict[agent1]
    if agent2_vs_agent1:
        divergent_ingredients[agent_2_but_not_in_agent_1_key] = agent2_vs_agent1

print("========= COMMON ==============\n")
for key, val in common_ingredients.items():
    print(key, val)
print("=========================================\n")
print("============== DIVERGENT =========== \n")
for key, val in divergent_ingredients.items():
    print(key, val)
print("======================================\n")
```

Regarding the gawk solution: if I give it the following lists, the code produces wrong results:

(a)
Arginine 8%,Calcium Carbonate, Aqua, Sorbitol, Bicarbonate, Sodium Lauryl Sulfate, Sodium Monofluorophosphate (1450 ppm F), Aroma, Cellulose Gum, Sodium Bicarbonate, Tetrasodium Pyrophosphate, Titanium Dioxide, Benzyl Alcohol, Sodium Saccharin, Xanthan Gum, Limonene
(b)
Arginine 8%,Aqua , Calcium Carbonate, Sorbitol, Hydrated Silica, Sodium Lauryl Sulfate, Aroma, Sodium Monofluorophosphate (1450 ppm F), Cellulose Gum, Tricalcium Phosphate, Sodium Bicarbonate, Tetrasodium Pyrophosphate, Sodium Saccharin, Benzyl Alcohol,Xanthan Gum, Limonene, Titanium Dioxide

gawk result:

```
Common:
Cellulose Gum
Sodium Bicarbonate
Sorbitol
Sodium Monofluorophosphate (1450 ppm F)
Sodium Saccharin
Aroma
Titanium Dioxide
Tetrasodium Pyrophosphate
Limonene
a:
Xanthan Gum
Benzyl Alcohol
Aqua
Arginine 8%,Calcium Carbonate
Bicarbonate
b:
Tricalcium Phosphate
Hydrated Silica
Calcium Carbonate
Arginine 8%,Aqua
Benzyl Alcohol,Xanthan Gum
```

Result of my Python script:
```
========= COMMON ==============
a.txt___AND___b.txt set(['Sorbitol', 'Xanthan Gum', 'Tetrasodium Pyrophosphate', 'Sodium Saccharin', 'Aqua', 'Titanium Dioxide', 'Sodium Bicarbonate', 'Arginine 8%', 'Calcium Carbonate', 'Sodium Monofluorophosphate (1450 ppm F)', 'Sodium Lauryl Sulfate', 'Benzyl Alcohol', 'Limonene', 'Cellulose Gum', 'Aroma'])
=========================================
============== DIVERGENT ===========
STUFF_IN__a.txt__BUT_NOT_IN__b.txt set(['Bicarbonate'])
STUFF_IN__b.txt__BUT_NOT_IN__a.txt set(['Hydrated Silica', 'Tricalcium Phosphate'])
======================================
```

Posted on 2019-12-08 16:19:20
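A likely explanation for the mismatch, judging only from the gawk output shown above: items such as "Arginine 8%,Calcium Carbonate" and "Benzyl Alcohol,Xanthan Gum" come out as single entries, and Aqua is reported as present only in (a). That is what you get when a line is split only on comma-plus-space, whereas the Python script splits on every comma and then strips whitespace. The sketch below is a hedged illustration, not the gawk code itself; the helper names `split_comma_space`, `split_comma_trim` and `compare` are hypothetical. It applies both tokenizations to lists (a) and (b):

```python
# Lists (a) and (b) from the question, each stored as one comma-separated line.
LIST_A = ("Arginine 8%,Calcium Carbonate, Aqua, Sorbitol, Bicarbonate, "
          "Sodium Lauryl Sulfate, Sodium Monofluorophosphate (1450 ppm F), "
          "Aroma, Cellulose Gum, Sodium Bicarbonate, Tetrasodium Pyrophosphate, "
          "Titanium Dioxide, Benzyl Alcohol, Sodium Saccharin, Xanthan Gum, Limonene")
LIST_B = ("Arginine 8%,Aqua , Calcium Carbonate, Sorbitol, Hydrated Silica, "
          "Sodium Lauryl Sulfate, Aroma, Sodium Monofluorophosphate (1450 ppm F), "
          "Cellulose Gum, Tricalcium Phosphate, Sodium Bicarbonate, "
          "Tetrasodium Pyrophosphate, Sodium Saccharin, Benzyl Alcohol,"
          "Xanthan Gum, Limonene, Titanium Dioxide")


def split_comma_space(line):
    """Hypothetical helper: split only on ", " (comma plus one space).

    Bare commas stay inside items, and "Aqua ," leaves a trailing space.
    """
    return set(line.split(", "))


def split_comma_trim(line):
    """Hypothetical helper: split on every comma, then strip whitespace.

    Same idea as split(',') + strip() in the Python script above.
    """
    return {item.strip() for item in line.split(",")}


def compare(a, b):
    """Return (common, only_in_a, only_in_b) as sets."""
    return a & b, a - b, b - a


if __name__ == "__main__":
    for name, split in (("comma+space", split_comma_space),
                        ("comma+trim", split_comma_trim)):
        common, only_a, only_b = compare(split(LIST_A), split(LIST_B))
        print(name, "common:", len(common))
        print("  only in a:", sorted(only_a))
        print("  only in b:", sorted(only_b))
```

With the comma+space split, five items end up only in (a) and five only in (b), the same entries as the a: and b: sections of the gawk result above. With the comma+trim split, only Bicarbonate is unique to (a) and only Hydrated Silica and Tricalcium Phosphate are unique to (b), which matches the Python script's output.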
Here's one in GNU awk:
```bash
$ gawk '
BEGIN {
    FS=","                         # comma separated
}
{
    for(i=1;i<=NF;i++) {
        gsub(/^ +| +$/,"",$i)      # trim off leading and trailing space
        file[ARGIND][$i]           # gawk array of arrays (gawk 4.0+), one item set per input file
    }
}
END {
    print "Common:"
    for(i in file[1])
        if(i in file[2])
            print i
    print ORS ARGV[1] ":"
    for(i in file[1])
        if(!(i in file[2]))
            print i
    print ORS ARGV[2] ":"
    for(i in file[2])
        if(!(i in file[1]))
            print i
}' file1 file2
```

Output:
```
Common:
Xanthan Gum
...

file1:
Zinc Citrate
...

file2:
Sodium Lauryl Sulfate
```

Posted on 2019-12-07 12:50:05
Using sort, bash and uniq:

Which elements are common:

```bash
sort <(sed 's/ *, */\n/g' file1) <(sed 's/ *, */\n/g' file2) | uniq -d
```

Which elements are present in one list but not in the other:

```bash
sort <(sed 's/ *, */\n/g' file1) <(sed 's/ *, */\n/g' file2) | uniq -u
```

https://stackoverflow.com/questions/59226064
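The sort | uniq approach works by counting rather than by set operations: once both lists are split one item per line and merged by sort, an item that appears twice is in both lists (uniq -d), and an item that appears once is in exactly one list (uniq -u). Two consequences follow: it assumes no item repeats within a single list, and uniq -u does not record which list an item came from. A minimal Python sketch of the same counting idea with `collections.Counter` (the function name `uniq_style` and the sample lists are made up for illustration):

```python
from collections import Counter


def uniq_style(list1, list2):
    """Hypothetical helper mimicking `sort ... | uniq -d` and `uniq -u`.

    Returns (duplicated, unique): items seen more than once across both
    lists, and items seen exactly once. Like the shell pipeline, this
    assumes no item repeats inside a single list.
    """
    counts = Counter(list1) + Counter(list2)
    duplicated = sorted(item for item, n in counts.items() if n > 1)
    unique = sorted(item for item, n in counts.items() if n == 1)
    return duplicated, unique


if __name__ == "__main__":
    a = ["Aqua", "Sorbitol", "Bicarbonate"]
    b = ["Aqua", "Sorbitol", "Hydrated Silica"]
    dup, uniq = uniq_style(a, b)
    print("uniq -d:", dup)   # items found in both lists
    print("uniq -u:", uniq)  # items found in exactly one list; origin not recorded
```

Results come back in Python's default string order, which can differ from the locale-dependent order produced by sort. If an item repeats inside one list, it is reported as "common" even when the other list lacks it, which is why the no-repeats assumption matters.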