How to filter data with Python and Streamlit

Stack Overflow user
Asked on 2022-08-31 00:39:20

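The dataframe below has a field tech that contains string values separated by -. I want to use this field as a set of tags: once the user selects one of the values, the app should display the data grouped by project_title.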

Code:

import streamlit as st
import pandas as pd
import numpy as np

data = {
    'project_title': ['LSE', 'DCP', 'Job-detection', 'Task management & Organizer'],
    'tech': ['python-RegExp-PyQt5', 'python-RegExp', 'python-RegExp-BeautifulSoup-pandas', 'python-pandas-MS_SQL-CSS/HTML-Javascript'],
    'Role': ['Junior developer ', 'Python developer', 'Python developer', 'Tech lead']
}

df = pd.DataFrame(data)

# split the tech column into multiple columns
df[['tech1','tech2','tech3','tech4','tech5']]=df['tech'].str.split('-', expand=True)

# create a separate list for each tech column
tech1 = df["tech1"].unique().tolist()
tech2 = df["tech2"].unique().tolist()
tech3 = df["tech3"].unique().tolist()
tech4 = df["tech4"].unique().tolist()
tech5 = df["tech5"].unique().tolist()

# concatenate all the lists into one list
tech_all = tech1+tech2+tech3+tech4+tech5
tech_all = list(filter(None, tech_all))

#create multiselect widget that includes the created list of tech
regular_search_term =tech_all
choices = st.multiselect(" ",regular_search_term)

#return the dataframe based on the selected values from the multiselect widget.
df_result_search=df[df.loc[:,"tech1":"tech5"].isin(choices)]

st.write(df_result_search)

The code above does not return the result I want.

Based on @BeRT2me's answer:

regular_search_term =df.tech.unique().tolist()

choices = st.selectbox(" ",regular_search_term)
df.loc[df.tech.eq(choices), 'project_title']

#  the below code doesn't return the correct result
choices = st.multiselect(" ",regular_search_term)
df.loc[df.tech.isin(choices), 'project_title']

1 Answer

Stack Overflow user

Answered on 2022-08-31 01:02:10

You probably don't want expand=True; you want to explode the column instead.

import pandas as pd

data = {
        'project_title': ['LSE', 'DCP', 'Job-detection', 'Task management & Organizer'],
        'tech': ['python-RegExp-PyQt5', 'python-RegExp', 'python-RegExp-BeautifulSoup-pandas', 'python-pandas-MS_SQL-CSS/HTML-Javascript'],
        'Role': ['Junior developer', 'Python developer', 'Python developer', 'Tech lead']
}

df = pd.DataFrame(data)

df.tech = df.tech.str.split('-')
df = df.explode('tech', ignore_index=True)

print(df)

# Output:

                  project_title           tech              Role
0                           LSE         python  Junior developer
1                           LSE         RegExp  Junior developer
2                           LSE          PyQt5  Junior developer
3                           DCP         python  Python developer
4                           DCP         RegExp  Python developer
5                 Job-detection         python  Python developer
6                 Job-detection         RegExp  Python developer
7                 Job-detection  BeautifulSoup  Python developer
8                 Job-detection         pandas  Python developer
9   Task management & Organizer         python         Tech lead
10  Task management & Organizer         pandas         Tech lead
11  Task management & Organizer         MS_SQL         Tech lead
12  Task management & Organizer       CSS/HTML         Tech lead
13  Task management & Organizer     Javascript         Tech lead
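The exploded column can also supply the options for the multiselect widget in the question. This is only a minimal sketch on the same sample data; the name tech_options is made up for illustration.

```python
import pandas as pd

data = {
    'project_title': ['LSE', 'DCP', 'Job-detection', 'Task management & Organizer'],
    'tech': ['python-RegExp-PyQt5', 'python-RegExp',
             'python-RegExp-BeautifulSoup-pandas',
             'python-pandas-MS_SQL-CSS/HTML-Javascript'],
    'Role': ['Junior developer', 'Python developer', 'Python developer', 'Tech lead'],
}
df = pd.DataFrame(data)

# One row per (project, technology) pair
df['tech'] = df['tech'].str.split('-')
df = df.explode('tech', ignore_index=True)

# tech_options is a hypothetical name: one entry per distinct technology,
# sorted so the display order stays stable between reruns
tech_options = sorted(df['tech'].unique())
print(tech_options)
```

A list like this can be passed as the options argument of st.multiselect, which avoids building five separate tech1..tech5 lists and filtering out the None values.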

You can now search for project titles by a specific technology:

>>> df.loc[df.tech.eq('pandas'), 'project_title']
8                   Job-detection
10    Task management & Organizer
Name: project_title, dtype: object
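If several values are selected and a project should match when it uses any of them, isin on the exploded column is one option. A rough sketch, assuming that "any of" is the intended behaviour; the choices list here is a stand-in for whatever the widget returns.

```python
import pandas as pd

data = {
    'project_title': ['LSE', 'DCP', 'Job-detection', 'Task management & Organizer'],
    'tech': ['python-RegExp-PyQt5', 'python-RegExp',
             'python-RegExp-BeautifulSoup-pandas',
             'python-pandas-MS_SQL-CSS/HTML-Javascript'],
}
df = pd.DataFrame(data)
df['tech'] = df['tech'].str.split('-')
df = df.explode('tech', ignore_index=True)

# Stand-in for the list returned by a multiselect widget (assumption)
choices = ['PyQt5', 'MS_SQL']

# A project can match on several rows, so drop repeated titles
matches = df.loc[df['tech'].isin(choices), 'project_title'].unique().tolist()
print(matches)
```

Without the unique() call, a project appears once for every selected technology it uses, which may be why the isin attempt in the question looked wrong.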

There may be a more efficient approach, but this is the best I can think of right now.

Given an iterable of terms, we can find the projects that match all of them like this:

>>> terms = ('pandas', 'RegExp')
>>> df.groupby('project_title')['tech'].agg(lambda x: all(x.eq(term).any() for term in terms))[lambda x: x].index.to_list()
['Job-detection']
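The same "all terms" check can be wrapped in a small helper that also copes with an empty selection. This is just a sketch: projects_with_all is a hypothetical name, and it swaps the all(...) expression above for a set-subset test, which should give the same result on this data.

```python
import pandas as pd

data = {
    'project_title': ['LSE', 'DCP', 'Job-detection', 'Task management & Organizer'],
    'tech': ['python-RegExp-PyQt5', 'python-RegExp',
             'python-RegExp-BeautifulSoup-pandas',
             'python-pandas-MS_SQL-CSS/HTML-Javascript'],
}
df = pd.DataFrame(data)
df['tech'] = df['tech'].str.split('-')
df = df.explode('tech', ignore_index=True)


def projects_with_all(frame, terms):
    """Return the project titles whose tech values include every term.

    Hypothetical helper; expects the exploded frame (one tech per row).
    """
    wanted = set(terms)
    if not wanted:
        # Nothing selected: return no projects (a design choice, not a rule)
        return []
    # True for a project when every wanted term appears among its tech rows
    has_all = frame.groupby('project_title')['tech'].agg(
        lambda s: wanted.issubset(s)
    )
    return has_all[has_all].index.tolist()


print(projects_with_all(df, ['pandas', 'RegExp']))  # ['Job-detection']
```

In the Streamlit app, the terms would presumably come from the widget, for example choices = st.multiselect("Tech", options), followed by st.write(projects_with_all(df, choices)).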
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/73549772
