Q: Pandas categorical variable analysis based on three dependent dropdowns

Stack Overflow user

Asked 2022-11-09 20:22:30

Answers: 1, Views: 84, Followers: 0, Votes: 1

I have a dataframe that looks like this:

df = pd.DataFrame ({'id': {0: 84, 1: 84, 2: 84, 3: 84, 4: 124},
               'Version': { 0: 'SemVer4', 1: 'Timestamps', 2: 'Snapshots', 3: 'Names', 4: 'Numbered Versions'},
               'server_Version': {0: 'v1', 1: 'v2', 2: 'api/v1', 3: '1.1.0', 4: 'v4'},
               'owner': {0: 'vmware', 1:'microsoft', 2: 'nasa', 3: 'swagger-API', 4:'sqaas'},
               'repo_name': {0: 'container-service-extension', 1: 'azure-rest-api-specs', 2: 'api.nasa.gov', 3: 'swagger-ui', 4: 'sqaas'},
               'filepath':{0: 'openapi.yaml', 1: 'dapper.json', 2: 'dockstore-webservice/src/main/resources/openapi3/openapi.yaml', 3: 'api/cmd/kubermatic-api/swagger.json', 4: 'cmd/spec/openapi.jsonsqaas'}})

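I want to create a visualization that takes three dependent dropdowns as input: owner, repo name, and filepath. They relate as follows: owner is the main API name, repo_name is the name of a repository under that owner, and filepath is the name of a specific operation within the selected repo.

The dropdown output I want looks like this:

[Image: desired dropdown output]

I created these dropdowns in Elasticsearch, but analyzing the data was a problem because some fields were not parsed correctly in Kibana.

The visualization I want to apply the dropdown filtering to is: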

fig = px.scatter(df.query("owner=='swagger-API'"), x="Year", y="Month", color="Version", text="server_Version")
fig.update_traces(textposition="bottom right")
fig.show()

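What I am trying to do here is see, across years and months, which API versions and which server versions were in use. As the code shows, I currently do this for a single owner, so I need some filtering to display a specific repo and filepath.

I am currently using Plotly as my visualization tool and am very new to the library. If any other library can help me achieve this, please let me know. Any help or suggestions on how to proceed would be greatly appreciated!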

Tags: python, pandas, plotly, plotly-dash, plotly-python

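1 Answer

Stack Overflow user

Accepted answer

Posted 2022-11-11 20:42:34

Since you want the dropdowns to be dependent, each dropdown has to know the state of the others. That is not possible in plain Plotly, but it is possible in plotly-dash because Dash supports callbacks.

To do this, we can write an update function that takes the current selections of all three dropdowns as input and updates the options of the other dropdowns based on your df. I think it also makes sense to update the figure in the same function.

The only tricky part is that a dropdown you have not selected from yet passes None as input, whereas a dropdown whose selection you cleared passes [], so the update function needs to handle both cases. If you clear the selections in all dropdowns, you probably want all possible options to reappear; this case is handled too.

I also extended the sample df with several rows for some of the dropdown values to check that the Dash app still works in that case. There may be edge cases I have not considered, but for now this solution appears to work.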
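Before wiring this into Dash, the filtering step can be sketched with pandas alone. This is only an illustration, not the callback itself: the helper name `remaining_options` and the trimmed four-row dataframe are assumptions made for this sketch (the values are taken from the example df), and the helper treats an empty or missing selection as "no restriction".

```python
import pandas as pd

# Trimmed stand-in for the example df (hypothetical subset, same values).
df = pd.DataFrame({
    'owner': ['vmware', 'vmware', 'nasa', 'nasa'],
    'repo_name': ['container-service-extension', 'vmware-test',
                  'api.nasa.gov', 'nasa-test'],
    'filepath': ['openapi.yaml', 'vmware-test-path',
                 'dockstore-webservice/src/main/resources/openapi3/openapi.yaml',
                 'nasa-test-path'],
})

COLUMNS = ['owner', 'repo_name', 'filepath']


def remaining_options(df, selections):
    """Hypothetical helper: options left in each dropdown after filtering.

    `selections` maps a column name to the list of chosen values; a missing,
    None, or empty entry places no restriction on that column.
    """
    mask = pd.Series(True, index=df.index)
    for col in COLUMNS:
        chosen = selections.get(col)
        if chosen:
            mask &= df[col].isin(chosen)
    subset = df[mask]
    return {col: subset[col].unique().tolist() for col in COLUMNS}


options = remaining_options(df, {'owner': ['nasa']})
print(options['repo_name'])  # ['api.nasa.gov', 'nasa-test']
```

Choosing the owner `nasa` leaves only that owner's repos and filepaths as options, which is the dependency between the dropdowns that the callback has to reproduce.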
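The None versus [] distinction can be folded into one small check. The helper below is a minimal sketch under the assumption that both values should mean "no restriction"; the name `resolve_selection` is made up for this illustration and is not part of Dash.

```python
from typing import Optional, Sequence


def resolve_selection(selection: Optional[Sequence[str]],
                      all_values: Sequence[str]) -> list:
    """Hypothetical helper: values to filter on for one dropdown.

    `selection` is None when the dropdown was never used and [] when its
    selection was cleared; both fall back to every possible value.
    """
    if not selection:  # true for both None and []
        return list(all_values)
    return list(selection)


owners = ['vmware', 'microsoft', 'nasa']
print(resolve_selection(None, owners))      # never used: all owners
print(resolve_selection([], owners))        # cleared: all owners
print(resolve_selection(['nasa'], owners))  # explicit choice: ['nasa']
```

Because an untouched and a cleared dropdown resolve to the same list, clearing every dropdown brings all options back without a separate reset branch.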

import pandas as pd
import plotly.express as px
from dash import Dash, dcc, html, Input, Output, ctx

df = pd.DataFrame ({'id': {0: 84, 1: 84, 2: 84, 3: 84, 4: 124, 5:1, 6:1},
               'Version': { 0: 'SemVer4', 1: 'Timestamps', 2: 'Snapshots', 3: 'Names', 4: 'Numbered Versions', 5: 'test', 6: 'test'},
               'server_Version': {0: 'v1', 1: 'v2', 2: 'api/v1', 3: '1.1.0', 4: 'v4', 5: 'v5', 6: 'v5'},
               'owner': {0: 'vmware', 1:'microsoft', 2: 'nasa', 3: 'swagger-API', 4:'sqaas',5:'vmware',6:'nasa'},
               'Year': {0: '2018', 1:'2020', 2:'2018', 3:'2019', 4:'2019', 5:'2021',6:'2021'},
               'Month': {0: 1, 1:6, 2:2, 3:4, 4:5, 5:5, 6:10},
               'repo_name': {0: 'container-service-extension', 1: 'azure-rest-api-specs', 2: 'api.nasa.gov', 3: 'swagger-ui', 4: 'sqaas', 5:'vmware-test',6:'nasa-test'},
               'filepath':{0: 'openapi.yaml', 1: 'dapper.json', 2: 'dockstore-webservice/src/main/resources/openapi3/openapi.yaml', 3: 'api/cmd/kubermatic-api/swagger.json', 4: 'cmd/spec/openapi.jsonsqaas', 5:'vmware-test-path',6:'nasa-test-path'}})

df = df.sort_values(by='Year')

## default is to show all data
fig = px.scatter(df, x="Year", y="Month", color="Version", text="server_Version")
fig.update_traces(textposition="bottom right")

app = Dash(__name__)

## three dependent dropdowns: owner, repo name and filepath
dropdown_selections = {
    category:df[category].unique().tolist() 
    for category in ['owner','repo_name','filepath']
}

dropdown_id_to_col_mapping = {
    'owner-dropdown':'owner',
    'repo-name-dropdown': 'repo_name',
    'filepath-dropdown': 'filepath'
}

app.layout = html.Div(
    [
        html.Div(
            children=[
                dcc.Dropdown(
                    dropdown_selections['owner'], 
                    id='owner-dropdown',
                    placeholder="Select owner",
                    style={"display": "inline-block", "width": "220px"},
                    multi=True,
                ),
                dcc.Dropdown(
                    dropdown_selections['repo_name'], 
                    id='repo-name-dropdown',
                    placeholder="Select repo name",
                    style={"display": "inline-block", "width": "220px", 'padding-left': '5px'}, 
                    multi=True
                ),
                dcc.Dropdown(
                    dropdown_selections['filepath'], 
                    id='filepath-dropdown', 
                    placeholder="Select filepath",
                    style={"display": "inline-block", "width": "220px", 'padding-left': '5px'},
                    multi=True
                )
            ],
            style={"padding": "10px", 'padding-left': '6%'},
        ),
        dcc.Graph(figure=fig, id='px-scatter-fig')
    ]
)

## callback so that a selection in one dropdown updates the other dropdowns and the figure
@app.callback(
    Output('owner-dropdown', 'options'),
    Output('repo-name-dropdown', 'options'),
    Output('filepath-dropdown', 'options'),
    Output('px-scatter-fig', 'figure'),
    Input('owner-dropdown', 'value'),
    Input('repo-name-dropdown', 'value'),
    Input('filepath-dropdown', 'value'),
    prevent_initial_call=True
)
def update_dropdowns(owner_selection, repo_name_selection, filepath_selection):
    # print(f'you selected: {owner_selection}, {repo_name_selection}, {filepath_selection}')
    dropdown_selected = ctx.triggered_id
    col_selected = dropdown_id_to_col_mapping[dropdown_selected]

    change_dropdowns = ['owner-dropdown','repo-name-dropdown','filepath-dropdown']
    change_dropdowns.remove(dropdown_selected)

    ## a dropdown passes None if it was never used and [] if its selection was cleared;
    ## in both cases fall back to every possible value for that column, so that only the
    ## conditions from the other dropdowns restrict the subset dataframe
    ## (clearing ALL dropdowns therefore resets all of them)
    if not owner_selection:
        owner_selection = dropdown_selections['owner']
    if not repo_name_selection:
        repo_name_selection = dropdown_selections['repo_name']
    if not filepath_selection:
        filepath_selection = dropdown_selections['filepath']
    
    ## subset the dataframe by dropdown conditions
    df_subset = df[
        df['owner'].isin(owner_selection)
        & df['repo_name'].isin(repo_name_selection)
        & df['filepath'].isin(filepath_selection)
    ]

    owner_selection = df_subset['owner'].unique().tolist()
    repo_name_selection = df_subset['repo_name'].unique().tolist()
    filepath_selection = df_subset['filepath'].unique().tolist()

    fig_update = px.scatter(df_subset, x="Year", y="Month", color="Version", text="server_Version")
    fig_update.update_traces(textposition="bottom right")

    return owner_selection, repo_name_selection, filepath_selection, fig_update

if __name__ == '__main__':
    app.run(debug=True)

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/74380944
