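I am having trouble reading a raw data file. Because of the delimiter, some of the input gets cut off. One of the titles has \" in front of the actual title, so Book_Title comes out as just \. Is there a way to ignore these characters?
Input: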
0195153448;"Classical Mythology";"Mark P. O. Morford";"2002";"Oxford University Press"
085409878X;"\"Pie-powder\"; being dust from the law courts;John Alderson Foote";"1973";"EP Publishing"
Code:
data rating.books;
infile "&path\BX-Books.csv" dlm=';' missover dsd firstobs=2;
input ISBN: $12.
Book_Title: $quote150.
Book_Author: $quote60.
Year_Of_Publication: $quote8.
Publisher: $quote60.;
run;
Output:
ISBN | Book-Title | Book-Author | Publisher | Publication-Year
0195153448 | Classical Mythology | Mark P. O. Morford | Oxford University Press | 2002
085409878X | \ | being dust from the law courts," | 1973 | Missing value
Expected output:
ISBN | Book-Title | Book-Author | Publisher | Publication-Year
0195153448 | Classical Mythology | Mark P. O. Morford | Oxford University Press | 2002
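085409878X | Pie-powder being dust from the law courts | John Alderson Foote | EP Publishing | 1973
Posted on 2017-12-04 17:40:11
It looks like your source data does not follow any known pattern.
If you read it without the DSD option, the second line is treated as having 6 fields.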
085409878X;"\"Pie-powder\"; being dust from the law courts;John Alderson Foote";"1973";"EP Publishing"
v1=085409878X
v2="\"Pie-powder\"
v3=being dust from the law courts
v4=John Alderson Foote"
v5="1973"
v6="EP Publishing"
If you try to "fix" the escaped quotes by doubling them
_infile_=tranwrd(_infile_,'\"','""');
then you end up with only 4 fields.
085409878X;"""Pie-powder""; being dust from the law courts;John Alderson Foote";"1973";"EP Publishing"
v1=085409878X
v2="Pie-powder"; being dust from the law courts;John Alderson Foote
v3=1973
v4=EP Publishing
v5=
v6=
To get your desired output, you could try removing the \"; and "\" strings.
_infile_=tranwrd(_infile_,'\";',' ');
_infile_=tranwrd(_infile_,'"\"','');
That does let you read the line the way you want.
085409878X; Pie-powder being dust from the law courts;John Alderson Foote";"1973";"EP Publishing"
v1=085409878X
v2=Pie-powder being dust from the law courts
v3=John Alderson Foote"
v4=1973
v5=EP Publishing
v6=
I am not sure whether this will generalize to extra quotes or extra semicolons in other lines.
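As a rough sketch, the two replacements above could be wired into the DATA step from the question like this: load each record with a bare `input @;`, edit the automatic `_infile_` buffer, then read the fields from the modified buffer. The data set name `work.books`, the use of TRUNCOVER instead of MISSOVER, the plain character informats, and the optional COMPRESS call are assumptions made for illustration, not part of the original post.

```sas
data work.books;
  /* DSD strips the quotes around normally quoted fields */
  infile "&path\BX-Books.csv" dlm=';' dsd truncover firstobs=2;

  /* Load the record into the input buffer and hold it */
  input @;

  /* Drop the stray \"; and "\" sequences before parsing the fields */
  _infile_ = tranwrd(_infile_, '\";', ' ');
  _infile_ = tranwrd(_infile_, '"\"', '');

  /* Read the fields from the modified buffer */
  input ISBN                :$12.
        Book_Title          :$150.
        Book_Author         :$60.
        Year_Of_Publication :$8.
        Publisher           :$60.;

  /* Optional assumption: remove the quote left at the end of the author */
  Book_Author = compress(Book_Author, '"');
run;
```

This only targets the specific pattern seen in the sample row; other malformed rows would need to be checked separately.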
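Posted on 2017-12-04 15:10:39
You have to modify the code slightly so that the column that was coming out wrong is read into a plain $150. character string.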
data work.books;
infile "h:\desktop\test.csv" dlm=';' missover dsd firstobs=1;
input ISBN: $12.
Book_Title: $150.
Book_Author: $quote60.
Year_Of_Publication: $quote8.
Publisher: $quote60.;
run;
Then you have to clean the special characters out of the column, using the following macro function:
%macro cleaningColumn(col);
compress(strip(&col),'\"',' ')
%mend cleaningColumn;
You can use the macro function inside a PROC SQL step, like this:
proc sql;
create table want as
select
ISBN,
%cleaningColumn(Book_Title) as Book_Title,
Book_Author,
Year_Of_Publication,
Publisher
from books;
quit;
The Book_Title column will then look like this:
Classical Mythology
Pie-powder
Regards,
https://stackoverflow.com/questions/47635709