Q: How many rows does csv.Sniffer().has_header() in Python's csv module need to be accurate?

Stack Overflow user

Asked 2013-12-06 23:33:53

Answers: 1, Views: 1.8K, Followers: 0, Votes: 1

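Python's csv module has a very handy csv.Sniffer().has_header() method.

I don't know how many rows it needs in order to determine accurately whether a file has a header.

Does it generally work on CSVs with two or three rows, or does it need more, say 5-10 rows, to be accurate?

For context, here is my function. As you can see, I have a check that says "if the file has fewer than X rows, don't sniff for a header". X is currently set to 3; I'm not sure whether it needs to be higher, or whether it could even be set to 2.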

import csv

# input_file_has_header can be True, False, or 'Auto' if unsure.
# With 'Auto', the header is only sniffed when the sample has at least
# MIN_ROWS_TO_SNIFF rows, because CSVs with two rows sometimes have a header
# and sometimes don't, and I don't understand the magic underlying the
# csv.Sniffer().has_header() method.
# Written for Python 3; on Python 2.7, open the file in 'rb' mode instead.
MIN_ROWS_TO_SNIFF = 3

def csv_to_object_dict(input_csv, input_file_has_header='Auto', object_id_column=0, header_keys=None):
    header_keys = list(header_keys or [])  # avoid a shared mutable default
    object_dict = {}
    with open(input_csv, newline='') as object_file:
        if input_file_has_header == 'Auto':
            sample = object_file.read(2048)
            object_file.seek(0)  # rewind so the reader starts at the first row
            row_count = len(sample.splitlines())
            input_file_has_header = (row_count >= MIN_ROWS_TO_SNIFF
                                     and csv.Sniffer().has_header(sample))
        object_reader = csv.reader(object_file)
        if input_file_has_header:
            header_keys = next(object_reader, [])  # first row becomes the keys
            assert header_keys, "File %s appears to have a header row, but there was a problem parsing it because header_keys remains empty" % input_csv
        for row in object_reader:
            if not row:
                continue  # skip blank lines
            # key each object by its ID column, mapping header key -> cell value
            object_dict.setdefault(row[object_id_column], {}).update(zip(header_keys, row))
    return object_dict

python-2.7

csv

1 Answer

Stack Overflow user

Accepted answer

Posted 2013-12-06 23:42:15

When in doubt, go to the source:

def has_header(self, sample):
    # Creates a dictionary of types of data in each column. If any
    # column is of a single type (say, integers), *except* for the first
    # row, then the first row is presumed to be labels. If the type
    # can't be determined, it is assumed to be a string in which case
    # the length of the string is the determining factor: if all of the
    # rows except for the first are the same length, it's a header.
    # Finally, a 'vote' is taken at the end for each column, adding or
    # subtracting from the likelihood of the first row being a header.
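
The voting rule described in that comment can be sketched in a few lines. This is a simplified illustration, not the standard library's code: the names column_kind and first_row_looks_like_header are hypothetical, and the sketch assumes the rows have already been parsed into lists of strings.

```python
def column_kind(cell):
    """Classify a cell: 'number' if it parses as a number, else its length."""
    try:
        complex(cell)
        return "number"
    except ValueError:
        return len(cell)


def first_row_looks_like_header(rows):
    """Apply the per-column vote described in the comment to parsed rows."""
    header, data = rows[0], rows[1:]
    score = 0
    for col, head_cell in enumerate(header):
        # Only rows with the same number of columns as the first row count.
        kinds = {column_kind(row[col]) for row in data if len(row) == len(header)}
        if len(kinds) != 1:
            continue  # inconsistent column (or no data rows): it gets no vote
        kind = kinds.pop()
        if kind == "number":
            fits = column_kind(head_cell) == "number"
        else:
            fits = len(head_cell) == kind
        # A first-row cell that breaks the column's pattern votes for "header".
        score += -1 if fits else 1
    return score > 0


print(first_row_looks_like_header([["name", "age"], ["alice", "30"]]))
print(first_row_looks_like_header([["bob", "25"], ["alice", "30"]]))
```

Note that nothing in the vote depends on how many data rows there are; a single data row is enough for every column to count as "consistent".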

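A quick glance through the method shows that it doesn't try to enforce a minimum number of non-header rows; so, according to the rules outlined above, it will function on a file with only two rows.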
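
To make the two-row case concrete, here is a small sketch that calls has_header() on a few two-row samples. The sample strings and labels are made up for illustration; only csv.Sniffer().has_header() comes from the standard library.

```python
import csv

sniffer = csv.Sniffer()

# Hypothetical two-row samples, invented for this illustration.
samples = {
    "header + numeric row": "name,age\nalice,30\n",
    "two data rows": "bob,25\nalice,30\n",
    "two text-only data rows": "bob,x\nalice,yy\n",
}

# has_header() sniffs the dialect itself, then votes column by column.
results = {label: sniffer.has_header(text) for label, text in samples.items()}
for label, result in results.items():
    print(label, "->", result)
```

Tracing the rules quoted above, the first sample should be reported as having a header and the second as not having one. The third should also be reported as having a header even though it has none, because both text columns in the first row differ in length from the only data row. So the method runs on two rows, but with so little evidence it can easily guess wrong, which is an argument for keeping an explicit override like input_file_has_header.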

Votes: 3

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/20435474
