
How can I improve a GNUwin32 join command?

Stack Overflow user
Asked 2013-11-09 12:12:12

I cannot get the desired result using join.

I am running GNUwin32 on 64-bit Windows 7, with join version 5.3.0.1936 and gawk version 3.1.6.2962.

The input consists of the following two tables:

Table_1

UID_C   CID
C000002 31799
C000002 31800
C000386 14950
C000386 9807916
C000386 10255083
C008114 5318432
C008117 799
C008117 444150
C008117 46878464

Table_2

UID_C    CID       name
C000002  31799     bevonium
C000002  31800     bevonium
C002284  24832095  hypromellose
C008117  799       indoleglycerol phosphate
C008117  444150    indoleglycerol phosphate
C008117  46878464  indoleglycerol phosphate

The following command is used in a .bat file:

C:\gnuwin32\bin\join -t"|" -1 1 -2 1 -a1 -a2 -e "NULL" -o "0,1.2,2.2,2.3" C:\directory\Table_1.txt C:\directory\Table_2.txt > C:\directory\Table_3.txt

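In this Stack Overflow post the tables are formatted with tabs for readability, but the actual files use a pipe (|) as both the input and the output delimiter.

This is the output: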

Table_3

UID_C    CID       CID       name
C000002  31800     31799     bevonium
C000002  31800     31800     bevonium
C000002  31799     31799     bevonium
C000002  31799     31800     bevonium
C000386  10255083  NULL      NULL
C000386  9807916   NULL      NULL
C000386  14950     NULL      NULL
C002284  NULL      24832095  hypromellose
C008114  5318432   NULL      NULL
C008117  46878464  799       indoleglycerol phosphate
C008117  46878464  444150    indoleglycerol phosphate
C008117  46878464  46878464  indoleglycerol phosphate
C008117  444150    799       indoleglycerol phosphate
C008117  444150    444150    indoleglycerol phosphate
C008117  444150    46878464  indoleglycerol phosphate
C008117  799       799       indoleglycerol phosphate
C008117  799       444150    indoleglycerol phosphate
C008117  799       46878464  indoleglycerol phosphate

The desired output is:

Table_4

UID_C    CID       name
C000002  31799     bevonium
C000002  31800     bevonium
C000386  14950     NULL
C000386  9807916   NULL
C000386  10255083  NULL
C002284  24832095  hypromellose
C008114  5318432   NULL
C008117  799       indoleglycerol phosphate
C008117  444150    indoleglycerol phosphate
C008117  46878464  indoleglycerol phosphate

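How can I change the join command to produce the desired output?

Alternatively, how should I use awk to post-process Table_3 into Table_4?

Thanks in advance for any suggestions.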


1 Answer

Stack Overflow user

Accepted answer

Posted 2013-11-09 14:08:28

I think you need more logic than join provides:

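awk -F"|" -v "OFS=|" '
    # First file (Table_1): remember every UID_C|CID pair.
    NR==FNR {uid_cid[$1 OFS $2]=1; next}
    # Second file (Table_2): print each row and mark its pair as matched.
    {
        key = $1 OFS $2
        if (key in uid_cid) {
            delete uid_cid[key]
        }
        print
    }
    # Pairs that appear only in Table_1 get a NULL name.
    END {
        for (key in uid_cid) {
            print key, "NULL"
        }
    }
' Table_1 Table_2 | sort -k1,1 -k2,2n -t "|"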

Output:

C000002|31799|bevonium
C000002|31800|bevonium
C000386|14950|NULL
C000386|9807916|NULL
C000386|10255083|NULL
C002284|24832095|hypromellose
C008114|5318432|NULL
C008117|799|indoleglycerol phosphate
C008117|444150|indoleglycerol phosphate
C008117|46878464|indoleglycerol phosphate
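
For the first part of the question (changing the join command itself), one possible approach is to join on a composite key. join matches on a single field, so joining on UID_C alone pairs every Table_1 row with every Table_2 row that shares that UID_C, which is where the repeated rows in Table_3 come from. The following is a minimal sketch rather than a drop-in for GNUwin32: it assumes a POSIX shell (for example Cygwin or MSYS instead of cmd.exe), an available `mktemp`, pipe-delimited files without header rows, no `:` inside UID_C or CID, and each (UID_C, CID) pair appearing at most once per file. The function names `add_key` and `composite_join` are hypothetical.

```shell
#!/bin/sh
# Sketch: full outer join on the pair (UID_C, CID) instead of UID_C alone.
# Assumptions: pipe-delimited input, no header rows, no ":" in the key fields.

# Prepend "UID_C:CID" as a new first field and sort on it, as join requires.
add_key() {
    awk -F'|' -v OFS='|' '{ print $1 ":" $2, $0 }' "$1" |
        LC_ALL=C sort -t'|' -k1,1
}

# composite_join FILE1 FILE2
# FILE1 rows: UID_C|CID        FILE2 rows: UID_C|CID|name
composite_join() {
    tmp1=$(mktemp) && tmp2=$(mktemp) || return 1
    add_key "$1" > "$tmp1"
    add_key "$2" > "$tmp2"
    # In the keyed FILE2 the name is field 4; -e fills it in for unmatched rows.
    LC_ALL=C join -t'|' -a1 -a2 -e NULL -o 0,2.4 "$tmp1" "$tmp2" |
        # Split the composite key back into its two columns.
        awk -F'|' -v OFS='|' '{ split($1, k, ":"); print k[1], k[2], $2 }' |
        # Restore the UID_C order with CID sorted numerically.
        LC_ALL=C sort -t'|' -k1,1 -k2,2n
    rm -f "$tmp1" "$tmp2"
}
```

Under those assumptions, a call such as `composite_join Table_1.txt Table_2.txt > Table_3.txt` should give the same rows as the awk version above. The awk version is likely the simpler choice when both files fit in memory, since it needs no temporary files or pre-sorting.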
Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/19875796
