I'm cleaning my data to get pairs of texts for machine translation from language X to language Y.
[['\ufeffMensahe di Pasco di Gobernador di Aruba 2019',
'Governor’s Christmas speech 2019'],
['Gobernador di Aruba Sr. Alfonso Boekhoudt a duna su mensahe di Pasco riba 24 december ultimo',
'On Christams eve, December 24, the Governor of Aruba Mr. Alfonso Boekhoudt gave his traditional Christmas speech'],
['Por a wak e discurso di Pasco di Gobernador via e canalnan di television local',
"The governor's Christmas speech was shown at the local television stations"],......
The data above is what the following code works on:
from unicodedata import normalize

def clean_pairs(lines):
    cleaned = list()
    for pair in lines:
        clean_pair = list()
        for line in pair:
            # normalize unicode characters
            line = normalize('NFD', line).encode('ascii', 'ignore')
            line = line.decode('UTF-8')
            # tokenize on white space
            line = line.split()
            # ... (remaining cleaning steps omitted)
            clean_pair.append(' '.join(line))
        cleaned.append(clean_pair)
    for i in range(10):
        # the failing line: tuple-style indexing [i,0] (see the error below)
        print('[%s]->[%s]' % (cleaned[i,0], cleaned[i,1]))

The output I should get looks like this:
[hi]->[hallo]
[hi]->[gru gott]
[run]->[lauf]
[wow]->[potzdonner]
[wow]->[donnerwetter]
However, I get the following error:
IndexError                                Traceback (most recent call last)
     49
     50 for i in range(10):
---> 51     print('[%s]->[%s]' % (clean_pairs[i,0], clean_pairs[i,1]))

IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
Can someone help me figure out what the problem is? Thanks!
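A side note on the normalization step in the question's code: NFD decomposition followed by `encode('ascii', 'ignore')` strips accents and silently drops every character that has no ASCII form, including the leading `\ufeff` byte-order mark in the first data line and the curly apostrophe in "Governor’s". Below is a minimal sketch of just that step. The helper name `to_ascii` is hypothetical, and the German sample "Grüß Gott" is only my assumption about where the expected `[gru gott]` comes from.

```python
from unicodedata import normalize


def to_ascii(text):
    # NFD splits accented letters into base letter + combining mark
    # (e.g. 'ü' -> 'u' + U+0308); encoding to ASCII with 'ignore' then
    # drops the combining marks and any other non-ASCII code point.
    return normalize('NFD', text).encode('ascii', 'ignore').decode('utf-8')


samples = [
    '\ufeffMensahe di Pasco di Gobernador di Aruba 2019',  # leading BOM
    'Governor’s Christmas speech 2019',                     # curly apostrophe
    'Grüß Gott',                                            # hypothetical German sample
]
for s in samples:
    print(to_ascii(s))
# Mensahe di Pasco di Gobernador di Aruba 2019
# Governors Christmas speech 2019
# Gru Gott
```

Note that 'ß' has no decomposition, so it is dropped rather than turned into "ss", which matches the "gru" in the expected output.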
Posted on 2020-12-06 16:02:36
Your structure is a list of lists. In Python, you index them like this:
clean[i][0] # not clean[i,0]
https://stackoverflow.com/questions/65170091
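To make the answer concrete, here is a minimal sketch with a two-pair list shaped like `cleaned` (the sample pairs are taken from the question's data; the helper `format_pairs` is hypothetical). It indexes one level at a time and shows that tuple-style indexing `[i, 0]`, which is NumPy array syntax, is rejected by a plain list.

```python
# Two [source, target] pairs, shaped like `cleaned` in the question.
pairs = [
    ['Mensahe di Pasco di Gobernador di Aruba 2019',
     'Governors Christmas speech 2019'],
    ['Por a wak e discurso di Pasco di Gobernador via e canalnan di television local',
     "The governor's Christmas speech was shown at the local television stations"],
]


def format_pairs(pairs, n):
    """Format the first n pairs as '[source]->[target]' strings."""
    lines = []
    # min() avoids an IndexError when the list holds fewer than n pairs
    for i in range(min(n, len(pairs))):
        # pairs[i] is one pair (a list); pairs[i][0] is its first element
        lines.append('[%s]->[%s]' % (pairs[i][0], pairs[i][1]))
    return lines


for line in format_pairs(pairs, 10):
    print(line)

# pairs[0, 0] passes the tuple (0, 0) as a single index, which lists reject
try:
    pairs[0, 0]
    tuple_index_error = None
except TypeError as err:
    tuple_index_error = str(err)
print('TypeError:', tuple_index_error)
```

The `IndexError` in the question comes from NumPy rather than from a list, which suggests `clean_pairs` had been converted to an array at some point. `[i][0]` still works there: on a 1-D array whose elements are lists, `arr[i]` returns the list and `[0]` indexes into it.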