
Rearranging strings in pandas

Stack Overflow user
Asked 2022-10-20 17:08:07
Answers: 1 · Views: 24 · Followers: 0 · Votes: 1

I have a column in a DataFrame whose dtype is object. The column contains both arrays and strings, and its unique values are as follows:

array(['[nan]', "['3. Medical and Health Sciences']",
       "['1. Natural Sciences']", "['2. Engineering and Technology']",
       "['1. Natural Sciences' '3. Medical and Health Sciences']",
       "['3. Medical and Health Sciences' '1. Natural Sciences']",
       "['5. Social Sciences']",
       "['1. Natural Sciences' '4. Agricultural Sciences'\n '3. Medical and Health Sciences']",
       "['1. Natural Sciences' '6. Humanities']",
       "['4. Agricultural Sciences']",
       "['1. Natural Sciences' '2. Engineering and Technology']",
       "['2. Engineering and Technology' '1. Natural Sciences']",
       "['6. Humanities']",
       "['3. Medical and Health Sciences' '6. Humanities']",
       "['4. Agricultural Sciences' '3. Medical and Health Sciences']",
       "['3. Medical and Health Sciences' '2. Engineering and Technology']",
       "['3. Medical and Health Sciences' '4. Agricultural Sciences']",
       "['2. Engineering and Technology' '3. Medical and Health Sciences']",
       "['1. Natural Sciences' '4. Agricultural Sciences']",
       "['3. Medical and Health Sciences' '5. Social Sciences']",
       "['4. Agricultural Sciences' '2. Engineering and Technology']",
       "['6. Humanities' '1. Natural Sciences']",
       "['6. Humanities' '2. Engineering and Technology']",
       "['2. Engineering and Technology' '5. Social Sciences']",
       "['1. Natural Sciences' '5. Social Sciences']",
       "['3. Medical and Health Sciences' '2. Engineering and Technology'\n '1. Natural Sciences']",
       "['2. Engineering and Technology' '3. Medical and Health Sciences'\n '1. Natural Sciences']",
       "['4. Agricultural Sciences' '1. Natural Sciences']",
       "['3. Medical and Health Sciences' '1. Natural Sciences'\n '2. Engineering and Technology']",
       "['5. Social Sciences' '1. Natural Sciences']",
       "['2. Engineering and Technology' '6. Humanities']",
       "['2. Engineering and Technology' '4. Agricultural Sciences']",
       "['5. Social Sciences' '4. Agricultural Sciences']",
       "['3. Medical and Health Sciences' '5. Social Sciences'\n '1. Natural Sciences']",
       "['6. Humanities' '3. Medical and Health Sciences']",
       "['6. Humanities' '5. Social Sciences']",
       "['5. Social Sciences' '3. Medical and Health Sciences']",
       "['5. Social Sciences' '2. Engineering and Technology']",
       "['5. Social Sciences' '6. Humanities']",
       "['1. Natural Sciences' '3. Medical and Health Sciences'\n '2. Engineering and Technology']",
       "['2. Engineering and Technology' '4. Agricultural Sciences'\n '1. Natural Sciences']",
       "['1. Natural Sciences' '2. Engineering and Technology' '6. Humanities']",
       "['3. Medical and Health Sciences' '1. Natural Sciences'\n '5. Social Sciences']",
       "['4. Agricultural Sciences' '1. Natural Sciences'\n '3. Medical and Health Sciences']",
       "['1. Natural Sciences' '2. Engineering and Technology'\n '3. Medical and Health Sciences']",
       "['2. Engineering and Technology' '1. Natural Sciences'\n '3. Medical and Health Sciences']",
       "['4. Agricultural Sciences' '5. Social Sciences']",
       "['2. Engineering and Technology' '1. Natural Sciences'\n '5. Social Sciences']",
       "['2. Engineering and Technology' '1. Natural Sciences'\n '4. Agricultural Sciences']",
       "['2. Engineering and Technology' '5. Social Sciences'\n '1. Natural Sciences']",
       "['1. Natural Sciences' '6. Humanities' '3. Medical and Health Sciences']",
       "['4. Agricultural Sciences' '3. Medical and Health Sciences'\n '1. Natural Sciences']",
       "['3. Medical and Health Sciences' '5. Social Sciences' '6. Humanities'\n '2. Engineering and Technology']",
       "['3. Medical and Health Sciences' '1. Natural Sciences'\n '4. Agricultural Sciences']",
       "['6. Humanities' '4. Agricultural Sciences' '1. Natural Sciences'\n '2. Engineering and Technology']",
       "['1. Natural Sciences' '3. Medical and Health Sciences'\n '5. Social Sciences' '2. Engineering and Technology']",
       "['1. Natural Sciences' '2. Engineering and Technology'\n '5. Social Sciences']",
       "['6. Humanities' '2. Engineering and Technology' '1. Natural Sciences']",
       "['1. Natural Sciences' '2. Engineering and Technology'\n '4. Agricultural Sciences']",
       "['1. Natural Sciences' '3. Medical and Health Sciences'\n '5. Social Sciences']",
       "['3. Medical and Health Sciences' '6. Humanities'\n '2. Engineering and Technology']",
       "['3. Medical and Health Sciences' '1. Natural Sciences' '6. Humanities']",
       "['2. Engineering and Technology' '6. Humanities' '1. Natural Sciences']",
       "['3. Medical and Health Sciences' '4. Agricultural Sciences'\n '1. Natural Sciences']",
       "['1. Natural Sciences' '5. Social Sciences'\n '2. Engineering and Technology']",
       "['1. Natural Sciences' '3. Medical and Health Sciences'\n '2. Engineering and Technology' '4. Agricultural Sciences']",
       "['1. Natural Sciences' '3. Medical and Health Sciences' '6. Humanities']",
       "['1. Natural Sciences' '5. Social Sciences'\n '3. Medical and Health Sciences']",
       "['6. Humanities' '3. Medical and Health Sciences' '1. Natural Sciences']",
       "['1. Natural Sciences' '3. Medical and Health Sciences'\n '4. Agricultural Sciences']",
       "['1. Natural Sciences' '6. Humanities' '2. Engineering and Technology']",
       "['4. Agricultural Sciences' '2. Engineering and Technology'\n '1. Natural Sciences']",
       "['5. Social Sciences' '3. Medical and Health Sciences'\n '1. Natural Sciences']",
       "['6. Humanities' '1. Natural Sciences' '2. Engineering and Technology']",
       "['3. Medical and Health Sciences' '6. Humanities' '1. Natural Sciences']",
       "['5. Social Sciences' '1. Natural Sciences'\n '2. Engineering and Technology']",
       "['2. Engineering and Technology' '4. Agricultural Sciences'\n '6. Humanities']",
       "['1. Natural Sciences' '5. Social Sciences'\n '2. Engineering and Technology' '3. Medical and Health Sciences']",
       "['6. Humanities' '3. Medical and Health Sciences'\n '2. Engineering and Technology']",
       "['4. Agricultural Sciences' '1. Natural Sciences'\n '2. Engineering and Technology']",
       "['6. Humanities' '5. Social Sciences' '1. Natural Sciences']",
       "['5. Social Sciences' '1. Natural Sciences'\n '3. Medical and Health Sciences']",
       "['6. Humanities' '1. Natural Sciences' '3. Medical and Health Sciences']",
       "['5. Social Sciences' '2. Engineering and Technology'\n '1. Natural Sciences']",
       "['6. Humanities' '1. Natural Sciences' '4. Agricultural Sciences']",
       "['1. Natural Sciences' '4. Agricultural Sciences'\n '2. Engineering and Technology']",
       "['4. Agricultural Sciences' '1. Natural Sciences'\n '3. Medical and Health Sciences' '2. Engineering and Technology']",
       "['6. Humanities' '3. Medical and Health Sciences' '1. Natural Sciences'\n '2. Engineering and Technology']",
       "['4. Agricultural Sciences' '5. Social Sciences' '1. Natural Sciences']",
       "['2. Engineering and Technology' '5. Social Sciences'\n '4. Agricultural Sciences']",
       "['4. Agricultural Sciences' '2. Engineering and Technology'\n '6. Humanities']",
       "['3. Medical and Health Sciences' '1. Natural Sciences'\n '5. Social Sciences' '2. Engineering and Technology']",
       "['2. Engineering and Technology' '1. Natural Sciences' '6. Humanities']",
       "['3. Medical and Health Sciences' '6. Humanities'\n '2. Engineering and Technology' '5. Social Sciences'\n '1. Natural Sciences']",
       "['4. Agricultural Sciences' '3. Medical and Health Sciences'\n '2. Engineering and Technology' '1. Natural Sciences']",
       "['3. Medical and Health Sciences' '2. Engineering and Technology'\n '4. Agricultural Sciences' '1. Natural Sciences']",
       "['2. Engineering and Technology' '5. Social Sciences'\n '3. Medical and Health Sciences']",
       "['3. Medical and Health Sciences' '4. Agricultural Sciences'\n '2. Engineering and Technology']",
       "['6. Humanities' '2. Engineering and Technology'\n '3. Medical and Health Sciences']",
       "['4. Agricultural Sciences' '6. Humanities']",
       "['1. Natural Sciences' '3. Medical and Health Sciences'\n '4. Agricultural Sciences' '2. Engineering and Technology']",
       "['3. Medical and Health Sciences' '2. Engineering and Technology'\n '6. Humanities']",
       "['2. Engineering and Technology' '1. Natural Sciences'\n '3. Medical and Health Sciences' '4. Agricultural Sciences']",
       "['1. Natural Sciences' '6. Humanities' '3. Medical and Health Sciences'\n '2. Engineering and Technology']",
       "['4. Agricultural Sciences' '1. Natural Sciences' '5. Social Sciences']",
       "['2. Engineering and Technology' '5. Social Sciences' '6. Humanities']",
       "['1. Natural Sciences' '2. Engineering and Technology'\n '3. Medical and Health Sciences' '6. Humanities']",
       "['1. Natural Sciences' '4. Agricultural Sciences' '5. Social Sciences']",
       "['3. Medical and Health Sciences' '5. Social Sciences'\n '2. Engineering and Technology']",
       "['5. Social Sciences' '1. Natural Sciences' '4. Agricultural Sciences']",
       "['2. Engineering and Technology' '3. Medical and Health Sciences'\n '1. Natural Sciences' '5. Social Sciences']",
       "['2. Engineering and Technology' '6. Humanities'\n '3. Medical and Health Sciences' '1. Natural Sciences']",
       "['5. Social Sciences' '6. Humanities' '3. Medical and Health Sciences']",
       "['2. Engineering and Technology' '6. Humanities'\n '3. Medical and Health Sciences']",
       "['5. Social Sciences' '6. Humanities' '1. Natural Sciences']"],
      dtype=object)

Perhaps an MWE to reproduce the data looks like this:

import pandas as pd

# initialize list of lists
data = [[1, [7,3], [1,1], "['5. Social Sciences' '6. Humanities' '1. Natural Sciences']"], [2, [1,5], [2,1], "['2. Engineering and Technology' '5. Social Sciences'\n '4. Agricultural Sciences']"], [3, [1,2,6], [2,0,2], '[nan]'],[5, [1,2], [2,0], "['1. Natural Sciences']"]]
  
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['docdb', 'cited_patents','dist_cited_patents','Fields'])

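Now, what I would like to obtain is two new variables: one that is a list of strings (new_var1) and another that is a plain string (new_var2, shown in the second result below), like this: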

docdb. cited_patents dist_cited_patents new_var1
1           [7, 3]  [1, 1]              ['Social Sciences','Humanities','Natural Sciences']
2           [1, 5]  [2, 1]              ['Engineering and Technology','Social Sciences', 'Agricultural Sciences']
3         [1, 2, 6] [2, 0, 2]           []
5          [1, 2]   [2, 0]              ['Natural Sciences']

And as far as new_var2 is concerned:

docdb. cited_patents dist_cited_patents new_var2
1           [7, 3]  [1, 1]              'Social Sciences, Humanities, Natural Sciences'
2           [1, 5]  [2, 1]              'Engineering and Technology,Social Sciences,Agricultural Sciences'
3         [1, 2, 6] [2, 0, 2]           ''
5          [1, 2]   [2, 0]              'Natural Sciences'

Thank you very much


1 Answer

Stack Overflow user

Accepted answer

Answered 2022-10-20 17:20:31

I would use a regex pattern to find all matching occurrences:

df['new_var1'] = df['Fields'].str.findall(r"'\d+\.\s*(.*?)'")
df['new_var2'] = df['new_var1'].str.join(', ')

   docdb cited_patents dist_cited_patents                                                                               Fields                                                              new_var1                                                            new_var2
0      1        [7, 3]             [1, 1]                         ['5. Social Sciences' '6. Humanities' '1. Natural Sciences']                       [Social Sciences, Humanities, Natural Sciences]                       Social Sciences, Humanities, Natural Sciences
1      2        [1, 5]             [2, 1]  ['2. Engineering and Technology' '5. Social Sciences'\n '4. Agricultural Sciences']  [Engineering and Technology, Social Sciences, Agricultural Sciences]  Engineering and Technology, Social Sciences, Agricultural Sciences
2      3     [1, 2, 6]          [2, 0, 2]                                                                                [nan]                                                                    []                                                                    
3      5        [1, 2]             [2, 0]                                                              ['1. Natural Sciences']                                                    [Natural Sciences]                                                    Natural Sciences
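
For readers who want to see why the pattern works, here is a minimal self-contained sketch. It rebuilds only the Fields column from the question's MWE (the other columns are left out for brevity, which is a simplification of mine) and annotates each piece of the regex. Because the pattern has a single capture group, `str.findall` returns just the captured field names, and the `'[nan]'` string yields an empty list since it contains no quoted, numbered item.

```python
import pandas as pd

# Sample values copied from the Fields column of the question's MWE
df = pd.DataFrame({
    "Fields": [
        "['5. Social Sciences' '6. Humanities' '1. Natural Sciences']",
        "['2. Engineering and Technology' '5. Social Sciences'\n '4. Agricultural Sciences']",
        "[nan]",
        "['1. Natural Sciences']",
    ]
})

# '       opening quote of one item
# \d+\.   the numeric prefix, e.g. "5."
# \s*     any whitespace after the prefix
# (.*?)   lazily capture the field name ...
# '       ... up to the closing quote
pattern = r"'\d+\.\s*(.*?)'"

# List of captured names per row
df["new_var1"] = df["Fields"].str.findall(pattern)
# Same names joined into a single string per row
df["new_var2"] = df["new_var1"].str.join(", ")

print(df["new_var1"].tolist())
print(df["new_var2"].tolist())
```

Note that this relies on the missing entries being the literal string `'[nan]'`, as in the question. If the column held real missing values instead, `str.findall` would propagate them as NaN rather than returning an empty list.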
Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/74143758
