R GSRUB function

Stack Overflow user
Asked 2021-04-20 13:36:07
Answers 1, Views 22, Followers 0, Votes 0

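I am working with a public dataset in which the variable Classifications stores codes describing the license types associated with a LicenseNo. Any license can have from 1 to 19 different concurrent license types associated with a distinct LicenseNo. A function seems like the right strategy for splitting Classifications into 1:19 new classification columns, but I am not sure where to start. I also need to convert the codes into descriptions. I created a table to support this post because, from what I have read on the site, I think it can be pulled in as an rda file. Again, I am not sure where to start.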

Code language: R
LicenseNo <- c("1000002","1000003","1000012","1000015","1000026")
Classifications <- c("C57","C-6","B","C60| C51", "HAZ| C36| C10| A| B| C57| C-2| C-8| C12| C21| C27| C29| C35| C42| C45| C39| C50| C51| C31")
data <- data.frame(LicenseNo,Classifications)
View(data)

Descriptions <- c("Cabinet, Millwork and Finish Carpentry Contractor","General Building Contractor",
                  "Well Drilling Contractor", "Structural Steel Contractor","Welding Contractor",
                  "Hazardous Substance Removal Certification","Plumbing Contractor","Electrical Contractor",
                  "General Engineering Contractor", "Insulation and Acoustical Contractor")
Classifications <- c("C-6","B","C57","C51","C60","HAZ","C36","C10","A","C-2")
class_type <- data.frame(Descriptions,Classifications)
View(class_type)

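Ultimately, the goal is to create the following output ... only 4 of the classifications for observation 1000026 are listed, to keep it simple.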

1 Answer

Stack Overflow user

Accepted answer

Posted 2021-04-20 13:51:08

tidyverse

Code language: R
library(dplyr)
# library(tidyr) # unnest, pivot_*
out <- data %>%
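  # Split on runs of "|" and whitespace. [:space:] is used because "\\s" inside
  # [...] may not be treated as whitespace by R's default regex engine.
  mutate(Classifications = strsplit(Classifications, "[|[:space:]]+")) %>%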
  tidyr::unnest(Classifications) %>%
  mutate(Classifications = trimws(Classifications)) %>%
  left_join(class_type, by = "Classifications") %>%
  mutate(Classifications = coalesce(Descriptions, Classifications)) %>%
  select(-Descriptions)

out
# # A tibble: 24 x 2
#    LicenseNo Classifications                                  
#    <chr>     <chr>                                            
#  1 1000002   Well Drilling Contractor                         
#  2 1000003   Cabinet, Millwork and Finish Carpentry Contractor
#  3 1000012   General Building Contractor                      
#  4 1000015   Welding Contractor                               
#  5 1000015   Structural Steel Contractor                      
#  6 1000026   Hazardous Substance Removal Certification        
#  7 1000026   Plumbing Contractor                              
#  8 1000026   Electrical Contractor                            
#  9 1000026   General Engineering Contractor                   
# 10 1000026   General Building Contractor                      
# # ... with 14 more rows
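
To see what the split step does on its own, here is a minimal base-R sketch on one value from the question's data. It uses the POSIX class `[:space:]` in the pattern, which is a choice made for this illustration and not necessarily the pattern in the answer above; the variable names `x` and `parts` are only for this example.

```r
# One raw Classifications value from the question's data
x <- "C60| C51"

# Split on runs of "|" and/or whitespace. strsplit() returns a list with one
# character vector per input string, so [[1]] extracts the pieces for x.
parts <- strsplit(x, "[|[:space:]]+")[[1]]

parts
```

Because `strsplit()` returns a list, assigning its result inside `mutate()` creates a list column, and `tidyr::unnest()` then expands that column into one row per code.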

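Note: I coalesced the descriptions with the original classifications because some codes have no match in the description table. For example, without the coalesce we would see: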

Code language: R
out <- data %>%
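  # Split on runs of "|" and whitespace. [:space:] is used because "\\s" inside
  # [...] may not be treated as whitespace by R's default regex engine.
  mutate(Classifications = strsplit(Classifications, "[|[:space:]]+")) %>%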
  tidyr::unnest(Classifications) %>%
  mutate(Classifications = trimws(Classifications)) %>%
  left_join(class_type, by = "Classifications")
print(out,n=99)
# # A tibble: 24 x 3
#    LicenseNo Classifications Descriptions                                     
#    <chr>     <chr>           <chr>                                            
#  1 1000002   C57             Well Drilling Contractor                         
#  2 1000003   C-6             Cabinet, Millwork and Finish Carpentry Contractor
#  3 1000012   B               General Building Contractor                      
#  4 1000015   C60             Welding Contractor                               
#  5 1000015   C51             Structural Steel Contractor                      
#  6 1000026   HAZ             Hazardous Substance Removal Certification        
#  7 1000026   C36             Plumbing Contractor                              
#  8 1000026   C10             Electrical Contractor                            
#  9 1000026   A               General Engineering Contractor                   
# 10 1000026   B               General Building Contractor                      
# 11 1000026   C57             Well Drilling Contractor                         
# 12 1000026   C-2             Insulation and Acoustical Contractor             
# 13 1000026   C-8             <NA>                                             
# 14 1000026   C12             <NA>                                             
# 15 1000026   C21             <NA>                                             
# 16 1000026   C27             <NA>                                             
# 17 1000026   C29             <NA>                                             
# 18 1000026   C35             <NA>                                             
# 19 1000026   C42             <NA>                                             
# 20 1000026   C45             <NA>                                             
# 21 1000026   C39             <NA>                                             
# 22 1000026   C50             <NA>                                             
# 23 1000026   C51             Structural Steel Contractor                      
# 24 1000026   C31             <NA>                                             

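My guess is that you would rather keep *something* than an NA, so when a description is missing I fall back to the classification code in place of the NA. If your data has no such concern, you can skip that step (just rename Descriptions to Classifications).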
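
The fallback described here can be seen in isolation with a small sketch. The vectors `codes` and `descriptions` below are made up for illustration (the values are taken from the tables above); `dplyr::coalesce()` picks, position by position, the first value that is not NA.

```r
library(dplyr)

# Hypothetical vectors for illustration: the second code has no description
codes        <- c("C57", "C-8", "B")
descriptions <- c("Well Drilling Contractor", NA, "General Building Contractor")

# Use the description where it exists, otherwise fall back to the raw code
labels <- coalesce(descriptions, codes)

labels
```

The order of the arguments matters: `coalesce(codes, descriptions)` would always return the codes, since none of them is missing.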

Long format is good for many things (especially ggplot2 and similar "tidy" operations), but if you want it in wide format, then:

Code language: R
out %>%
  group_by(LicenseNo) %>%
  mutate(rn = paste0("Classification", row_number())) %>%
  ungroup() %>%
  # id_cols is named explicitly; newer tidyr versions deprecate passing it by position
  tidyr::pivot_wider(id_cols = LicenseNo, names_from = rn, values_from = Classifications)
# # A tibble: 5 x 20
#   LicenseNo Classification1 Classification2 Classification3 Classification4 Classification5 Classification6 Classification7
#   <chr>     <chr>           <chr>           <chr>           <chr>           <chr>           <chr>           <chr>          
# 1 1000002   Well Drilling ~ <NA>            <NA>            <NA>            <NA>            <NA>            <NA>           
# 2 1000003   Cabinet, Millw~ <NA>            <NA>            <NA>            <NA>            <NA>            <NA>           
# 3 1000012   General Buildi~ <NA>            <NA>            <NA>            <NA>            <NA>            <NA>           
# 4 1000015   Welding Contra~ Structural Ste~ <NA>            <NA>            <NA>            <NA>            <NA>           
# 5 1000026   Hazardous Subs~ Plumbing Contr~ Electrical Con~ General Engine~ General Buildi~ Well Drilling ~ Insulation and~
# # ... with 12 more variables: Classification8 <chr>, Classification9 <chr>, Classification10 <chr>, Classification11 <chr>,
# #   Classification12 <chr>, Classification13 <chr>, Classification14 <chr>, Classification15 <chr>, Classification16 <chr>,
# #   Classification17 <chr>, Classification18 <chr>, Classification19 <chr>
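
Since the question asked for a function, the steps above could be wrapped into one, roughly as sketched below. The helper name `widen_classifications()` and its arguments are hypothetical, the data is a two-license subset of the question's tables, and the split pattern uses `[:space:]` as an assumption of this sketch.

```r
library(dplyr)
library(tidyr)

# Small subset of the question's data, for illustration only
data <- data.frame(
  LicenseNo = c("1000002", "1000015"),
  Classifications = c("C57", "C60| C51"),
  stringsAsFactors = FALSE
)
class_type <- data.frame(
  Descriptions = c("Well Drilling Contractor", "Structural Steel Contractor",
                   "Welding Contractor"),
  Classifications = c("C57", "C51", "C60"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split codes, look up descriptions, reshape to wide
widen_classifications <- function(licenses, lookup) {
  licenses %>%
    # one list element per license, then one row per code
    mutate(Classifications = strsplit(Classifications, "[|[:space:]]+")) %>%
    unnest(Classifications) %>%
    # attach descriptions, falling back to the code when none is found
    left_join(lookup, by = "Classifications") %>%
    mutate(Classifications = coalesce(Descriptions, Classifications)) %>%
    select(-Descriptions) %>%
    # number the classifications within each license to build column names
    group_by(LicenseNo) %>%
    mutate(rn = paste0("Classification", row_number())) %>%
    ungroup() %>%
    pivot_wider(id_cols = LicenseNo, names_from = rn,
                values_from = Classifications)
}

wide <- widen_classifications(data, class_type)
wide
```

Licenses with fewer classifications than the widest one get NA in the trailing columns, which matches the wide output shown above.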
Votes 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/67180080
