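I need to translate some R code that uses dplyr (and stringr) into pandas in Python. This is simple in R, but I haven't been able to wrap my head around it with pandas. Basically, I need to group by one (or more) columns, then concatenate the remaining columns together row by row and collapse the results with a separator. R has a nice vectorized str_c function that does exactly what I want.
Here is the R code: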
library(tidyverse)
df <- as_tibble(structure(list(file = c(1, 1, 1, 2, 2, 2), marker = c("coi", "12s", "16s", "coi", "12s", "16s"), start = c(1, 22, 99, 12, 212, 199), end = c(15, 35, 102, 150, 350, 1102)), row.names = c(NA, -6L), class = "data.frame") )
df %>%
group_by(file) %>%
summarise(markers = str_c(marker,"[",start,":",end,"]",collapse="|"))
#> # A tibble: 2 × 2
#> file markers
#> <dbl> <chr>
#> 1 1 coi[1:15]|12s[22:35]|16s[99:102]
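#> 2 2 coi[12:150]|12s[212:350]|16s[199:1102]
Here is the start of my Python code. I suspect there is some trick with agg or transform, but I can't figure out how to combine and concatenate multiple columns: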
from io import StringIO
import pandas as pd
s = StringIO("""
file,marker,start,end
1.f,coi,1,15
1.f,12s,22,35
1.f,16s,99,102
2.f,coi,12,150
2.f,12s,212,350
2.f,16s,199,1102
""")
df = pd.read_csv(s)
# ... now what? ...
Answered on 2021-12-04 00:21:38
(df.astype(str)
.assign(markers = lambda df: df.marker + "[" + (df.start + ":"+df.end) + "]")
.groupby('file', as_index=False)
.markers
.agg("|".join)
)
file markers
0 1.f coi[1:15]|12s[22:35]|16s[99:102]
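1 2.f coi[12:150]|12s[212:350]|16s[199:1102]
The idea is to combine the columns into a single string column first, and only then group and aggregate with Python's str.join method.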
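The same two-step idea (build the per-row string, then join within each group) can be sketched as a small reusable function. This is only an illustration, not part of the answer above: the helper name collapse_markers and its by and sep parameters are made up here, and unlike df.astype(str) it casts only start and end to strings, so the grouping column keeps its original dtype.

```python
from io import StringIO

import pandas as pd

# Same sample data as in the question.
csv_text = """file,marker,start,end
1.f,coi,1,15
1.f,12s,22,35
1.f,16s,99,102
2.f,coi,12,150
2.f,12s,212,350
2.f,16s,199,1102
"""
df = pd.read_csv(StringIO(csv_text))


def collapse_markers(frame, by, sep="|"):
    """Hypothetical helper: build 'marker[start:end]' per row, then join per group."""
    # Step 1: row-wise concatenation. Only start/end are cast to str,
    # so the grouping column(s) keep their original dtype.
    labels = (
        frame["marker"]
        + "[" + frame["start"].astype(str)
        + ":" + frame["end"].astype(str) + "]"
    )
    # Step 2: collapse each group's labels with the separator.
    return (
        frame.assign(markers=labels)
        .groupby(by, as_index=False, sort=False)["markers"]
        .agg(sep.join)
    )


result = collapse_markers(df, by="file")
print(result)
```

Because by is passed straight to groupby, a list of column names such as ["file", "marker"] should work for grouping on more than one column.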
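Answered on 2021-12-04 00:41:45
First create a new column, markers, by concatenating marker with the last two columns (start and end).
Then group by file and join the new markers column: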
df['markers']=df['marker']+'['+(df.astype(str).iloc[:,2:].agg(list,1).str.join(':'))+']'
df.groupby('file')['markers'].apply(lambda x: x.str.cat(sep='|')).to_frame()
markers
file
1.f coi[1:15]|12s[22:35]|16s[99:102]
2.f coi[12:150]|12s[212:350]|16s[199:1102]
Answered on 2022-03-16 01:27:14
You can use datar to do this in much the same way as in R:
>>> from datar.all import f, tibble, group_by, summarise, paste0
>>>
>>> df = tibble(
... file=[1, 1, 1, 2, 2, 2],
... marker=["coi", "12s", "16s"] * 2,
... start=[1, 22, 99, 12, 212, 199],
... end=[15, 35, 102, 1150, 350, 1102],
... )
>>> (
... df
... >> group_by(f.file)
... >> summarise(
... markers=paste0(
... f.marker, "[", f.start, ":", f.end, "]",
... collapse="|",
... )
... )
... )
file markers
<int64> <object>
0 1 coi[1:15]|12s[22:35]|16s[99:102]
1 2 coi[12:1150]|12s[212:350]|16s[199:1102]
https://stackoverflow.com/questions/70221950