首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >基于三种相依性下降的大熊猫分类变量分析

基于三种相依性下降的大熊猫分类变量分析
EN

Stack Overflow用户
提问于 2022-11-09 20:22:30
回答 1查看 84关注 0票数 1

我有一个数据文件,它看起来像这样:

代码语言:javascript
复制
df = pd.DataFrame ({'id': {0: 84, 1: 84, 2: 84, 3: 84, 4: 124},
               'Version': { 0: 'SemVer4', 1: 'Timestamps', 2: 'Snapshots', 3: 'Names', 4: 'Numbered Versions'},
               'server_Version': {0: 'v1', 1: 'v2', 2: 'api/v1', 3: '1.1.0', 4: 'v4'},
               'owner': {0: 'vmware', 1:'microsoft', 2: 'nasa', 3: 'swagger-API', 4:'sqaas'},
               'repo_name': {0: 'container-service-extension', 1: 'azure-rest-api-specs', 2: 'api.nasa.gov', 3: 'swagger-ui', 4: 'sqaas'},
               'filepath':{0: 'openapi.yaml', 1: 'dapper.json', 2: 'dockstore-webservice/src/main/resources/openapi3/openapi.yaml', 3: 'api/cmd/kubermatic-api/swagger.json', 4: 'cmd/spec/openapi.jsonsqaas'}})

我想要创建一个可视化,它以三个依赖的下拉列表作为输入:所有者回购名称filepath。这三者之间的关系是,owner代表主API名称,reponames是所有者下存储库的名称,filepath是所选回购中不同操作的名称。

我想要的下拉列表的输出如下:

我在Elasticsearch中创建了这些下拉列表,但是在分析数据时出现了问题,因为有些字段在kibana中没有得到正确的解析。

我想要应用下拉过滤的可视化方法是:

代码语言:javascript
复制
fig = px.scatter(df.query("owner=='swagger-api'"), x="Year", y="Month", color="Version", text="server_version")
fig.update_traces(textposition="bottom right")
fig.show()

我在这里要做的是这些年和几个月,我知道哪些API版本和哪个服务器版本被使用过,但是正如您在代码中看到的那样,当我为一个所有者这样做时,我需要一些过滤来显示特定回购和filepath。

我目前正在使用Plotly作为我的可视化工具,而且我对这个库非常陌生。如果有任何其他的图书馆,可以帮助我实现这一点,请告诉我。任何关于如何进行这方面的帮助或建议都是非常感谢的!

EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2022-11-11 20:42:34

由于您提到希望下拉列表是依赖的,因此下拉必须知道其他下拉列表的状态--这在plotly中是不可能的,但是在plotly-dash中是可能的,因为支持回调

要做到这一点,我们可以编写一个更新函数,为您的三个下拉列表接受所有可能的选择,并根据您的df来更新所有其他下拉列表。我认为,在同样的功能中,相应地更新数字也是有意义的。

唯一棘手的部分是,当您从一个下拉列表中选择而不是从其他下拉列表中选择时,您将None作为输入,但是如果您清除下拉列表中的选择,则[]作为输入,因此您的更新函数需要考虑到这种情况。如果您清楚从所有下拉列表中选择的内容,您可能希望所有可能的下拉选项再次出现--这种可能性也会被考虑在内。

我还扩展了示例df以包括一些下拉列表的多行,以检查dash应用程序是否仍然适用于这种情况。我可能还没有考虑到一些边缘情况,但目前看来,这个解决方案似乎是可行的。

代码语言:javascript
复制
import pandas as pd
import plotly.express as px
from dash import Dash, dcc, html, Input, Output, ctx

df = pd.DataFrame ({'id': {0: 84, 1: 84, 2: 84, 3: 84, 4: 124, 5:1, 6:1},
               'Version': { 0: 'SemVer4', 1: 'Timestamps', 2: 'Snapshots', 3: 'Names', 4: 'Numbered Versions', 5: 'test', 6: 'test'},
               'server_Version': {0: 'v1', 1: 'v2', 2: 'api/v1', 3: '1.1.0', 4: 'v4', 5: 'v5', 6: 'v5'},
               'owner': {0: 'vmware', 1:'microsoft', 2: 'nasa', 3: 'swagger-API', 4:'sqaas',5:'vmware',6:'nasa'},
               'Year': {0: '2018', 1:'2020', 2:'2018', 3:'2019', 4:'2019', 5:'2021',6:'2021'},
               'Month': {0: 1, 1:6, 2:2, 3:4, 4:5, 5:5, 6:10},
               'repo_name': {0: 'container-service-extension', 1: 'azure-rest-api-specs', 2: 'api.nasa.gov', 3: 'swagger-ui', 4: 'sqaas', 5:'vmware-test',6:'nasa-test'},
               'filepath':{0: 'openapi.yaml', 1: 'dapper.json', 2: 'dockstore-webservice/src/main/resources/openapi3/openapi.yaml', 3: 'api/cmd/kubermatic-api/swagger.json', 4: 'cmd/spec/openapi.jsonsqaas', 5:'vmware-test-path',6:'nasa-test-path'}})

df = df.sort_values(by='Year')

## default is to show all data
fig = px.scatter(df, x="Year", y="Month", color="Version", text="server_Version")
fig.update_traces(textposition="bottom right")

app = Dash(__name__)

## three dependent dropdowns: owner, repo name and filepath
dropdown_selections = {
    category:df[category].unique().tolist() 
    for category in ['owner','repo_name','filepath']
}

dropdown_id_to_col_mapping = {
    'owner-dropdown':'owner',
    'repo-name-dropdown': 'repo_name',
    'filepath-dropdown': 'filepath'
}

app.layout = html.Div(
    [
        html.Div(
            children=[
                dcc.Dropdown(
                    dropdown_selections['owner'], 
                    id='owner-dropdown',
                    placeholder="Select owner",
                    style={"display": "inline-block", "width": "220px"},
                    multi=True,
                ),
                dcc.Dropdown(
                    dropdown_selections['repo_name'], 
                    id='repo-name-dropdown',
                    placeholder="Select repo name",
                    style={"display": "inline-block", "width": "220px", 'padding-left': '5px'}, 
                    multi=True
                ),
                dcc.Dropdown(
                    dropdown_selections['filepath'], 
                    id='filepath-dropdown', 
                    placeholder="Select filepath",
                    style={"display": "inline-block", "width": "220px", 'padding-left': '5px'},
                    multi=True
                )
            ],
            style={"padding": "10px", 'padding-left': '6%'},
        ),
        dcc.Graph(figure=fig, id='px-scatter-fig')
    ]
)

## callback so that a selection from one figure updates the others
@app.callback(
    Output('owner-dropdown', 'options'),
    Output('repo-name-dropdown', 'options'),
    Output('filepath-dropdown', 'options'),
    Output('px-scatter-fig', 'figure'),
    Input('owner-dropdown', 'value'),
    Input('repo-name-dropdown', 'value'),
    Input('filepath-dropdown', 'value'),
    prevent_initial_call=True
)
def update_dropdowns(owner_selection, repo_name_selection, filepath_selection):
    # print(f'you selected: {owner_selection}, {repo_name_selection}, {filepath_selection}')
    dropdown_selected = ctx.triggered_id
    col_selected = dropdown_id_to_col_mapping[dropdown_selected]

    change_dropdowns = ['owner-dropdown','repo-name-dropdown','filepath-dropdown']
    change_dropdowns.remove(dropdown_selected)

    ## if you clear ALL dropdown selections, then we reset all dropdowns
    ## (and this will skip all of the following other if statements)
    if (owner_selection == []) & (repo_name_selection == []) & (filepath_selection == []):
        owner_selection = dropdown_selections['owner']
        repo_name_selection = dropdown_selections['repo_name']
        filepath_selection = dropdown_selections['filepath']

    ## if any dropdowns are cleared or not selected, we want all possible selections to return
    ## this is because if we clear some but not all dropdowns, the conditions from other dropdowns remain in place
    ## and the new dropdown selections are calculated from the subset dataframe and should be correct
    if owner_selection == []:
        owner_selection = dropdown_selections['owner']
    if repo_name_selection == []:
        repo_name_selection = dropdown_selections['repo_name']
    if filepath_selection == []:
        filepath_selection = dropdown_selections['filepath']

    if owner_selection == None:
        owner_selection = dropdown_selections['owner']
    if repo_name_selection == None:
        repo_name_selection = dropdown_selections['repo_name']
    if filepath_selection == None:
        filepath_selection = dropdown_selections['filepath']
    
    ## subset the dataframe by dropdown conditions
    df_subset = df[
        df['owner'].isin(owner_selection)
        & df['repo_name'].isin(repo_name_selection)
        & df['filepath'].isin(filepath_selection)
    ]

    owner_selection = df_subset['owner'].unique().tolist()
    repo_name_selection = df_subset['repo_name'].unique().tolist()
    filepath_selection = df_subset['filepath'].unique().tolist()

    fig_update = px.scatter(df_subset, x="Year", y="Month", color="Version", text="server_Version")
    fig_update.update_traces(textposition="bottom right")

    return owner_selection, repo_name_selection, filepath_selection, fig_update

if __name__ == '__main__':
    app.run_server(debug=True)

票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/74380944

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档