Input:
('MechanicalKeyboards', 2, 'ForgetfulDoryFish')
('MechanicalKeyboards', 1, 'cheshire26')
('MechanicalKeyboards', 1, 'Sygaldry')
('scala', 5, 'hyperforce')
('xkcd', 3, 'brinjal66')
('MechanicalKeyboards', 1, 'Sygaldry')
('MechanicalKeyboards', 1, 'DzyDzyDino')
This is my RDD.
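My attempt (inside the lambda, `x` is one tuple, so its first element is `x[0]`; `With-e` is not a valid Python name):
with_e = lines.filter(lambda x: 'e' in x[0])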
Expected output:
('MechanicalKeyboards', 2, 'ForgetfulDoryFish')
('MechanicalKeyboards', 1, 'cheshire26')
('MechanicalKeyboards', 1, 'Sygaldry')
('MechanicalKeyboards', 1, 'Sygaldry')
('MechanicalKeyboards', 1, 'DzyDzyDino')
I am trying to filter out every element of the RDD whose first tuple element does not contain 'e'. Is this possible?
Posted 2018-09-30 09:21:47
I think you can do it like this:
>>> rdd = sc.parallelize([
... ('MechanicalKeyboards', 2, 'ForgetfulDoryFish'),
... ('MechanicalKeyboards', 1, 'cheshire26'),
... ('MechanicalKeyboards', 1, 'Sygaldry'),
... ('scala', 5, 'hyperforce'),
... ('xkcd', 3, 'brinjal66'),
... ('MechanicalKeyboards', 1, 'Sygaldry'),
... ('MechanicalKeyboards', 1, 'DzyDzyDino')
... ])
>>>
>>> rdd.filter(lambda x: 'e' in x[0]).collect()
[('MechanicalKeyboards', 2, 'ForgetfulDoryFish'), ('MechanicalKeyboards', 1, 'cheshire26'), ('MechanicalKeyboards', 1, 'Sygaldry'), ('MechanicalKeyboards', 1, 'Sygaldry'), ('MechanicalKeyboards', 1, 'DzyDzyDino')]
https://stackoverflow.com/questions/52576413
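`RDD.filter` keeps every element for which the function you pass returns `True`, so the condition can be written and checked as an ordinary Python function first. The sketch below is a minimal, Spark-free illustration under that assumption: `first_field_contains` is a hypothetical helper name (not part of PySpark), and Python's built-in `filter()` stands in for `rdd.filter` so it runs without a cluster.

```python
# Spark-free sketch of the predicate used with rdd.filter.
# Assumption: each record is a tuple and only its first field is inspected.
# first_field_contains is a hypothetical helper, not a PySpark API.

def first_field_contains(record, needle="e"):
    """Return True if the tuple's first element contains `needle` (case-sensitive)."""
    return needle in record[0]

records = [
    ('MechanicalKeyboards', 2, 'ForgetfulDoryFish'),
    ('MechanicalKeyboards', 1, 'cheshire26'),
    ('MechanicalKeyboards', 1, 'Sygaldry'),
    ('scala', 5, 'hyperforce'),
    ('xkcd', 3, 'brinjal66'),
    ('MechanicalKeyboards', 1, 'Sygaldry'),
    ('MechanicalKeyboards', 1, 'DzyDzyDino'),
]

# Built-in filter() applies the same boolean test that RDD.filter would apply per element.
kept = list(filter(first_field_contains, records))
for row in kept:
    print(row)
```

With Spark, the same function can be passed directly, e.g. `rdd.filter(first_field_contains).collect()`. Note the test is case-sensitive, so a first element containing only `'E'` would be dropped.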