首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >转换Stata中的数据变量名

转换Stata中的数据变量名
EN

Stack Overflow用户
提问于 2021-10-21 17:52:12
回答 1查看 274关注 0票数 1

你好,我想使用国家科学基金会数据集,但原始excel文件变量名不能正确转换数据。是否有人有关于如何转换数据集以使其符合stata格式以进行分析的示例代码。这里是原始excel文件,这样您就可以理解问题了。

NSF网站的原始Excel文件

EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2021-10-22 14:40:59

*编辑以添加导入视图

记住Nick关于SO本质的贴切点,这里我确实有一些示例代码可能会有所帮助。如果不知道"transpose“是什么意思,我就不能给出确切的答案,但是您可以根据您的目的调整下面的reshape命令。

导入和视图

代码语言:javascript
复制
// Import Excel File defining TL and BR of table
    import excel "nsb20197-tabs02-012.xlsx", cellrange(A6:K177) clear
    list in 1/15

     +----------------------------------------------------------------------------------------------------------------------------------------+
     |                                A       B       C       D      E                   F       G       H       I                   J      K |
     |----------------------------------------------------------------------------------------------------------------------------------------|
  1. |                Associate's level                                                                                                       |
  2. | American Indian or Alaska Native                                                                                                       |
  3. |                       All fields    6282    4131    2151   65.8                34.2    8935    5697    3238                63.8   36.2 |
  4. |                              S&E     608     386     222   63.5                36.5     942     495     447                52.5   47.5 |
  5. |                      Engineering      17       4      13   23.5                76.5      50       8      42                  16     84 |
     |----------------------------------------------------------------------------------------------------------------------------------------|
  6. |                 Natural sciences     384     220     164   57.3                42.7     453     174     279                38.4   61.6 |
  7. |   Social and behavioral sciences     207     162      45   78.3                21.7     439     313     126                71.3   28.7 |
  8. |                          Non-S&E    5674    3745    1929     66                  34    7993    5202    2791   65.09999999999999   34.9 |
  9. |        Asian or Pacific Islander                                                                                                       |
 10. |                       All fields   27313   15522   11791   56.8                43.2   54809   30916   23893                56.4   43.6 |
     |----------------------------------------------------------------------------------------------------------------------------------------|
 11. |                              S&E    2649    1284    1365   48.5                51.5    7862    3492    4370                44.4   55.6 |
 12. |                      Engineering     160      23     137   14.4   85.59999999999999     574     111     463                19.3   80.7 |
 13. |                 Natural sciences    2010     939    1071   46.7                53.3    4419    1562    2857                35.3   64.7 |
 14. |   Social and behavioral sciences     479     322     157   67.2                32.8    2869    1819    1050                63.4   36.6 |
 15. |                          Non-S&E   24664   14238   10426   57.7                42.3   46947   27424   19523                58.4   41.6 |

构造数据

代码语言:javascript
复制
// Create supercategories
    
    * level = Column A if column A contains the word level (ignoring case)
    gen level = word(A,1) if ustrregexm(A, "level", 1), before(A)
    
    * demographic = Column A if next ob in column A contains the word all fields (ignoring case)
    gen demographic = A if ustrregexm(A[_n+1], "all fields", 1), before(A)
    
    * fill down demographic and level, and drop blank rows
    foreach v of varlist level demographic {
        replace `v' = `v'[_n-1] if missing(`v')
    }
    drop if mi(demographic) | demographic == A | regexm(A, level)
    
// rename variables 

    * rename A
    rename A field
    
    * rename count columns
    local list "B C D E F" 
    local year = 2000
    rename (`list') (all_`year' female_`year' male_`year' perc_female_`year' perc_male_`year' )
    
    local list "G H I J K" 
    local year = 2017
    rename (`list') (all_`year' female_`year' male_`year' perc_female_`year' perc_male_`year' )
    
    * destring 
    destring *_2000 *_2017, replace

整形为长

代码语言:javascript
复制
    * reshape long
    drop perc*
    reshape long all_ male_ female_, i(level demographic field) j(year)
    
    rename *_ degrees*
    reshape long degrees,  i(level demographic field year) j(gender) string

--一个如何通过字段进行大范围整形的例子

代码语言:javascript
复制
    * test reshape wide by field + ensure the variable name is less than 32 characters post reshape 
    replace field = lower(strtoname(field))
    replace field = substr(field, 1, 32 - strlen("degrees") - 1) 
    reshape wide degrees, i(level demographic year gender) j(field) string
票数 2
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/69666648

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档