I have a transcript in a .txt file that looks like this:
MICHAEL: blablablabla.
further talk by Michael.
more talk by Michael.
VALERIE: blublublublu.
Valerie talks more.
MICHAEL: blibliblibli.
Michael talks again.
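...
Anyway, this pattern continues for 4000 lines, and with not just two speakers but up to seven different ones, all with names written in uppercase letters (as in the example above). For some text mining, I need to rearrange this .txt file in the following way:
I know some basic vim commands, but not enough to solve this, especially the first step. I don't know what kind of pattern I could use in vim so that it joins only the lines that belong to each speaker.
Any help would be greatly appreciated!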
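Posted on 2016-01-23 13:54:39
OK, first the answer:
:g/^\u\+:/,/\n\u\+:\|\%$/join
Now the explanation:
g runs the command that follows on every line matching /^\u\+:/, that is, every line that starts with one or more uppercase letters followed by a colon (a speaker line).
,/\n\u\+:\|\%$/ extends the range from that line to the line whose line break is followed by another speaker name, or to the end of the file (\%$).
join, unsurprisingly, joins the lines in that range.
So, putting it together: for every speaker line, join up to the line before the next speaker or to the end of the file.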
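For checking the result outside Vim, the same joining rule can be sketched in Python. This is only an illustration of the logic described above, not part of the answer: the function name join_speaker_lines and the sample list are hypothetical, and the sketch assumes a speaker line always starts with an all-uppercase name followed by a colon.

```python
import re

# Assumption: a speaker line starts with an uppercase name followed by ':',
# which mirrors the Vim pattern ^\u\+: used in the command above.
SPEAKER = re.compile(r"^[A-Z]+:")

def join_speaker_lines(lines):
    """Merge each speaker line with the continuation lines that follow it."""
    joined = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if SPEAKER.match(line) or not joined:
            joined.append(line)        # a speaker line starts a new entry
        else:
            joined[-1] += " " + line   # continuation of the current speaker
    return joined

# Hypothetical sample, taken from the example in the question.
transcript = [
    "MICHAEL: blablablabla.",
    "further talk by Michael.",
    "more talk by Michael.",
    "VALERIE: blublublublu.",
    "Valerie talks more.",
    "MICHAEL: blibliblibli.",
    "Michael talks again.",
]
joined = join_speaker_lines(transcript)
```

Here joined holds three entries, one per turn, each starting with the speaker's name.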
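The closest I have for the sorting right now is
:sort /\u\+:/ r
It sorts by speaker name only and reverses the other lines, so it isn't quite what you want.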
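If the goal is to group the joined lines by speaker while keeping each speaker's lines in their original order, a stable sort keyed on the name alone would do it. The following Python sketch shows the idea; the function name sort_by_speaker and the sample data are hypothetical, and it assumes every line has already been joined so that it starts with "NAME:".

```python
def sort_by_speaker(joined_lines):
    """Sort lines by the speaker name before the first ':'.

    Python's sorted() is stable, so lines with the same speaker keep
    the relative order they had in the input.
    """
    return sorted(joined_lines, key=lambda line: line.split(":", 1)[0])

# Hypothetical, already-joined sample lines.
joined_lines = [
    "MICHAEL: first turn.",
    "VALERIE: second turn.",
    "MICHAEL: third turn.",
]
by_speaker = sort_by_speaker(joined_lines)
```

Only the name is used as the sort key, so the text after the colon never influences the order.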
Posted on 2016-01-23 13:41:42
Well, I don't know vim very well, but I had a go at matching the lines that belong to a particular speaker. Here is the regex.
Regex: /([A-Z]+:)([A-Za-z\s\.]+)(?!\1)$/gm
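Explanation:
([A-Z]+:) captures the speaker's name, which contains only uppercase letters.
([A-Za-z\s\.]+) captures the dialogue.
(?!\1)$ is a negative lookahead with a back-reference to the speaker's name; it checks whether the next speaker is the same as the previous one. If not, it matches until a new speaker is found.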
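As a quick check of what this regex captures, here is a small Python sketch. It assumes Python's re module with re.MULTILINE standing in for the m flag and finditer standing in for g; the helper name speaker_blocks and the sample text are hypothetical. On this sample the match appears to stop at each new speaker mainly because the character class excludes ':', since the lookahead is only tested at a line end, where the next character is a line break. Dialogue containing digits, commas or other punctuation would not be matched by this character class.

```python
import re

# The regex from the answer; re.MULTILINE lets $ match at every line end.
PATTERN = re.compile(r"([A-Z]+:)([A-Za-z\s\.]+)(?!\1)$", re.MULTILINE)

def speaker_blocks(text):
    """Return (name, speech) pairs, with the speech collapsed onto one line."""
    blocks = []
    for match in PATTERN.finditer(text):
        name = match.group(1)
        speech = " ".join(match.group(2).split())  # fold line breaks into spaces
        blocks.append((name, speech))
    return blocks

# Hypothetical sample using only letters, spaces and full stops.
sample = (
    "MICHAEL: blablablabla.\n"
    "further talk by Michael.\n"
    "VALERIE: blublublublu.\n"
    "MICHAEL: blibliblibli.\n"
)
blocks = speaker_blocks(sample)
```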
I hope this at least helps with the matching.
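Posted on 2016-01-23 16:25:02
Here is a script solution to your problem.
It isn't well tested, so I added some comments so that you can fix it easily.
To make it work, just:
1. fill the g:speakers variable with the uppercase names you need;
2. save and source the script (:sav /tmp/script.vim|so %);
3. run :call JoinAllSpeakLines() to join the lines;
4. run :call SortSpeakLines() to sort them.
You can adjust the different patterns to better fit your needs, for example by adding some tolerance for spaces (\u\{2,}\s*\ze:).
Here is the code: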
" Fill the following array with all the speakers names:
let g:speakers = [ 'MICHAEL', 'VALERIE', 'MATHIEU' ]
call sort(g:speakers)
function! JoinAllSpeakLines()
    " In the whole file, join all the lines between two uppercase speaker names
    " followed by ':', first inclusive:
    silent g/\u\{2,}:/call JoinSpeakLines__()
endf
function! SortSpeakLines()
    " Sort the whole file by speaker, keeping the order for
    " each speaker.
    " Must be called after JoinAllSpeakLines().

    " Create a new dict, with one key for each speaker:
    let speakerlines = {}
    for speaker in g:speakers
        let speakerlines[speaker] = []
    endfor

    " For each line in the file:
    for line in getline(1,'$')
        let speaker = GetSpeaker__(line)
        if speaker == ''
            continue
        endif
        " Add the line to the right speaker:
        call add(speakerlines[speaker], line)
    endfor

    " Delete everything in the current buffer:
    normal gg"_dG

    " Add the sorted lines, speaker by speaker:
    for speaker in g:speakers
        call append(line('$'), speakerlines[speaker])
    endfor

    " Delete the first (empty) line in the buffer:
    normal gg"_dd
endf
function! GetOtherSpeakerPattern__(speaker)
    " Returns a pattern which matches all speaker names, except the
    " one given as a parameter.

    " Create a new list with a:speaker removed:
    let others = copy(g:speakers)
    let idx = index(others, a:speaker)
    if idx != -1
        call remove(others, idx)
    endif

    " Create and return the pattern, which looks like
    " this: "\v<MICHAEL>:|<VALERIE>:..."
    call map(others, 'printf("<%s>:",v:val)')
    return '\v' . join(others, '|')
endf
function! GetSpeaker__(line)
    " Returns the uppercase name followed by a ':' in a line
    return matchstr(a:line, '\u\{2,}\ze:')
endf
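function! JoinSpeakLines__()
    " When cursor is on a line with an uppercase name, join all the
    " following lines until another uppercase name.
    let speaker = GetSpeaker__(getline('.'))
    if speaker == ''
        return
    endif
    let first = line('.')
    " Search for other names after the cursor line:
    let srch = search(GetOtherSpeakerPattern__(speaker), 'W')
    " The block ends on the line before the next speaker, or on the
    " last line of the buffer when no other speaker follows:
    let last = srch == 0 ? line('$') : srch - 1
    " Join by explicit range instead of Visual mode, so that a
    " one-line utterance is left alone:
    if last > first
        execute first . ',' . last . 'join'
    endif
endf
https://stackoverflow.com/questions/34963609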