文章/答案/技术大牛

发布

社区首页 >问答首页 >转换Stata中的数据变量名

问转换Stata中的数据变量名
EN

Stack Overflow用户

提问于 2021-10-21 17:52:12

回答 1查看 274关注 0票数 1

你好，我想使用国家科学基金会数据集，但原始excel文件变量名不能正确转换数据。是否有人有关于如何转换数据集以使其符合stata格式以进行分析的示例代码。这里是原始excel文件，这样您就可以理解问题了。

NSF网站的原始Excel文件

format

stata

transpose

回答 1

Stack Overflow用户

回答已采纳

发布于 2021-10-22 14:40:59

*编辑以添加导入视图

记住Nick关于SO本质的贴切点，这里我确实有一些示例代码可能会有所帮助。如果不知道"transpose“是什么意思，我就不能给出确切的答案，但是您可以根据您的目的调整下面的reshape命令。

导入和视图

// Import Excel File defining TL and BR of table
    import excel "nsb20197-tabs02-012.xlsx", cellrange(A6:K177) clear
    list in 1/15

     +----------------------------------------------------------------------------------------------------------------------------------------+
     |                                A       B       C       D      E                   F       G       H       I                   J      K |
     |----------------------------------------------------------------------------------------------------------------------------------------|
  1. |                Associate's level                                                                                                       |
  2. | American Indian or Alaska Native                                                                                                       |
  3. |                       All fields    6282    4131    2151   65.8                34.2    8935    5697    3238                63.8   36.2 |
  4. |                              S&E     608     386     222   63.5                36.5     942     495     447                52.5   47.5 |
  5. |                      Engineering      17       4      13   23.5                76.5      50       8      42                  16     84 |
     |----------------------------------------------------------------------------------------------------------------------------------------|
  6. |                 Natural sciences     384     220     164   57.3                42.7     453     174     279                38.4   61.6 |
  7. |   Social and behavioral sciences     207     162      45   78.3                21.7     439     313     126                71.3   28.7 |
  8. |                          Non-S&E    5674    3745    1929     66                  34    7993    5202    2791   65.09999999999999   34.9 |
  9. |        Asian or Pacific Islander                                                                                                       |
 10. |                       All fields   27313   15522   11791   56.8                43.2   54809   30916   23893                56.4   43.6 |
     |----------------------------------------------------------------------------------------------------------------------------------------|
 11. |                              S&E    2649    1284    1365   48.5                51.5    7862    3492    4370                44.4   55.6 |
 12. |                      Engineering     160      23     137   14.4   85.59999999999999     574     111     463                19.3   80.7 |
 13. |                 Natural sciences    2010     939    1071   46.7                53.3    4419    1562    2857                35.3   64.7 |
 14. |   Social and behavioral sciences     479     322     157   67.2                32.8    2869    1819    1050                63.4   36.6 |
 15. |                          Non-S&E   24664   14238   10426   57.7                42.3   46947   27424   19523                58.4   41.6 |

构造数据

// Create supercategories
    
    * level = Column A if column A contains the word level (ignoring case)
    gen level = word(A,1) if ustrregexm(A, "level", 1), before(A)
    
    * demographic = Column A if next ob in column A contains the word all fields (ignoring case)
    gen demographic = A if ustrregexm(A[_n+1], "all fields", 1), before(A)
    
    * fill down demographic and level, and drop blank rows
    foreach v of varlist level demographic {
        replace `v' = `v'[_n-1] if missing(`v')
    }
    drop if mi(demographic) | demographic == A | regexm(A, level)
    
// rename variables 

    * rename A
    rename A field
    
    * rename count columns
    local list "B C D E F" 
    local year = 2000
    rename (`list') (all_`year' female_`year' male_`year' perc_female_`year' perc_male_`year' )
    
    local list "G H I J K" 
    local year = 2017
    rename (`list') (all_`year' female_`year' male_`year' perc_female_`year' perc_male_`year' )
    
    * destring 
    destring *_2000 *_2017, replace

整形为长

    * reshape long
    drop perc*
    reshape long all_ male_ female_, i(level demographic field) j(year)
    
    rename *_ degrees*
    reshape long degrees,  i(level demographic field year) j(gender) string

--一个如何通过字段进行大范围整形的例子

    * test reshape wide by field + ensure the variable name is less than 32 characters post reshape 
    replace field = lower(strtoname(field))
    replace field = substr(field, 1, 32 - strlen("degrees") - 1) 
    reshape wide degrees, i(level demographic year gender) j(field) string

票数 2

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/69666648

复制

相似问题

问转换Stata中的数据变量名
EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问转换Stata中的数据变量名EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问转换Stata中的数据变量名
EN