
I'm working on a Python problem: optimizing a script

Stack Overflow user
Asked on 2020-06-01 09:20:28
1 answer, 43 views, 0 followers, 0 votes
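What I want to do:
  1. Turn the row values of the option_labels column into column headers.
  2. If a given user_id has a particular option_labels value, put the corresponding option_values value in that newly created column; otherwise put 0.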
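The two rules above can be sketched in plain Python for a single record. This is only an illustration, assuming each row is a dict keyed by the CSV column names; `widen_row` is a hypothetical helper, not part of the question's code.

```python
def widen_row(row, labels):
    """Return a copy of row with one extra key per label.

    The key equal to row["option_labels"] gets row["option_values"];
    every other label key gets 0.
    """
    out = dict(row)
    for label in labels:
        out[label] = row["option_values"] if row["option_labels"] == label else 0
    return out


labels = ["intel", "nvidia", "SSD", "RAM"]
row = {"user_id": "abc456", "country": "Germany",
       "option_values": "256gb", "option_labels": "SSD"}
print(widen_row(row, labels))
# {'user_id': 'abc456', 'country': 'Germany', 'option_values': '256gb',
#  'option_labels': 'SSD', 'intel': 0, 'nvidia': 0, 'SSD': '256gb', 'RAM': 0}
```

This reproduces the first row of the sample output; applying it to every row gives the whole table, one output row per input row.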

The sample data (data.csv) is:

 user_id       country        option_values        option_labels

 abc456         Germany        256gb                  SSD
 abc123         Brazil         i5                    intel 
 xyz456         France         128gb                  SSD
 xyz123         Turkey         i7                    intel 
 abc123         Brazil         2gb                   nvidia
 abc456         Germany        32gb                   RAM
 xyz123         Turkey         4gb                   nvidia
 xyz456         France         16gb                   RAM

The sample output is:

 user_id       country        option_values     option_labels     intel         nvidia       SSD        RAM 

 abc456         Germany        256gb             SSD                0              0        256gb        0
 abc123         Brazil         i5                intel              i5             0          0          0
 xyz456         France         128gb             SSD                0              0        128gb        0
 xyz123         Turkey         i7                intel              i7             0          0          0
 abc123         Brazil         2gb               nvidia             0              2gb        0          0  
 abc456         Germany        32gb              RAM                0              0          0          32gb
 xyz123         Turkey         4gb               nvidia             0              4gb        0          0
 xyz456         France         16gb              RAM                0              0          0          16gb

I did this with the following code:

import pandas as pd

data_sample = pd.read_csv("data.csv")
# Distinct labels, which become the new column headers
feature_list = data_sample["option_labels"].unique().tolist()
user_list = data_sample["user_id"].unique().tolist()

def filterd_id(check_id):
    # Boolean filter over the whole table: one full scan per user
    single_id_data = data_sample[data_sample["user_id"] == check_id]
    return single_id_data

def finding_features(single_id_data):
    user_features = single_id_data["option_labels"].unique().tolist()
    return user_features

def check_feature(feature_list, user_features, single_id_data):
    feature_prs_not = []
    for i in feature_list:
        if i in user_features:
            # This user's option_values entry for label i
            result = single_id_data.loc[single_id_data["option_labels"] == i, "option_values"].iloc[0]
        else:
            result = 0
        feature_prs_not.append(result)
    return feature_prs_not

user_id = []
country = []
feature_frames = []

for i in user_list:
    check_id = i
    user_id.append(i)
    single_id_data = filterd_id(check_id)
    c = single_id_data["country"].unique().tolist()
    country.append(c)
    user_features = finding_features(single_id_data)
    feature_prst_not = check_feature(feature_list, user_features, single_id_data)
    df = pd.DataFrame([feature_prst_not], columns=feature_list)
    feature_frames.append(df)

# DataFrame.append was removed in pandas 2.0; concatenate once instead
df_feature = pd.concat(feature_frames, ignore_index=True)
df_user_id = pd.DataFrame(user_id, columns=["all_user_id"])
df_country = pd.DataFrame(country, columns=["country_name"])

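It takes a long time to run (about 8 to 9 hours) on my original data, which has nearly 100k IDs. I'm still learning Python, and I'm now trying to optimize the script to reduce its run time.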
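The loop above builds one row per user (feature columns in `df_feature`, plus `df_user_id` and `df_country`), filtering the whole table once for every user. If that per-user table is what you need, a single `DataFrame.pivot` call can build it in one pass. This is a sketch, not the asker's code: it assumes each user has exactly one country and at most one value per label (`pivot` raises a `ValueError` on duplicate index/column pairs), and it reads the sample rows inline instead of from data.csv.

```python
import io

import pandas as pd

csv_text = """user_id,country,option_values,option_labels
abc456,Germany,256gb,SSD
abc123,Brazil,i5,intel
xyz456,France,128gb,SSD
xyz123,Turkey,i7,intel
abc123,Brazil,2gb,nvidia
abc456,Germany,32gb,RAM
xyz123,Turkey,4gb,nvidia
xyz456,France,16gb,RAM
"""
data_sample = pd.read_csv(io.StringIO(csv_text))

# One row per (user_id, country), one column per distinct label.
# Labels a user does not have come out as NaN; cast to object so that
# strings and the integer 0 can share a column, then fill the gaps with 0.
per_user = (
    data_sample
    .pivot(index=["user_id", "country"], columns="option_labels", values="option_values")
    .astype(object)
    .fillna(0)
    .reset_index()
)
per_user.columns.name = None  # drop the leftover "option_labels" axis name
print(per_user)
```

Note that `pivot` sorts both the users and the label columns; the loop keeps them in order of first appearance.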


1 Answer

Stack Overflow user

Accepted answer

Posted on 2020-06-01 10:02:33

If you want it to be faster, you need to vectorize. I believe this code produces the same output as yours:

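import numpy as np

# df holds the sample data (built as in the next snippet).
# Loop over the distinct labels, not the rows: one new column per label,
# holding option_values where the row's label matches and 0 elsewhere.
for val in df['option_labels'].unique():
    df[val] = np.where(df['option_labels']==val, df['option_values'], 0)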
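Each pass of that loop handles one distinct label, so it runs once per label (four times for the sample data) instead of once per user, and `np.where` makes the per-row choice as a single vectorized operation. A small illustration of one pass, for the label `SSD`, on the first three sample rows (only the two relevant columns are kept here, for brevity):

```python
import numpy as np
import pandas as pd

# First three rows of the sample data, relevant columns only
df = pd.DataFrame({
    "option_values": ["256gb", "i5", "128gb"],
    "option_labels": ["SSD", "intel", "SSD"],
})

# One pass of the loop, for the label "SSD"
mask = df["option_labels"] == "SSD"  # [True, False, True]
df["SSD"] = np.where(mask, df["option_values"], 0)
print(df["SSD"].tolist())  # ['256gb', 0, '128gb']
```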

This is how I reproduced your data:

import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO('''"user_id","country","option_values","option_labels"
"abc456","Germany","256gb","SSD"
"abc123","Brazil","i5","intel"
"xyz456","France","128gb","SSD"
"xyz123","Turkey","i7","intel"
"abc123","Brazil","2gb","nvidia"
"abc456","Germany","32gb","RAM"
"xyz123","Turkey","4gb","nvidia"
"xyz456","France","16gb","RAM"'''))
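For comparison, the same per-row result can also be built without an explicit loop, by pivoting on the existing row index and joining the new columns back. This is an alternative sketch, not the answer's method; it assumes `df` has a unique row index (the default index from `read_csv` is), and the `reindex` call only keeps the label columns in first-appearance order, as the loop does, instead of `pivot`'s sorted order.

```python
import io

import pandas as pd

df = pd.read_csv(io.StringIO("""user_id,country,option_values,option_labels
abc456,Germany,256gb,SSD
abc123,Brazil,i5,intel
xyz456,France,128gb,SSD
xyz123,Turkey,i7,intel
abc123,Brazil,2gb,nvidia
abc456,Germany,32gb,RAM
xyz123,Turkey,4gb,nvidia
xyz456,France,16gb,RAM
"""))

# Same column order as the loop: labels in order of first appearance.
labels = df["option_labels"].unique()

# With no index argument, pivot keeps df's own row index, so every input row
# stays one output row: its value lands under its own label, NaN elsewhere.
wide = (
    df.pivot(columns="option_labels", values="option_values")
    .reindex(columns=labels)
    .astype(object)  # let strings and the integer 0 share a column
    .fillna(0)
)

# Attach the new label columns next to the original four.
result = df.join(wide)
print(result)
```

To get the question's column order (intel, nvidia, SSD, RAM), pass that list to `reindex` instead of `labels`.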
Votes: 1
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/62129137
