Q: Switching to a multi-row index in a DataFrame created from a list of lists

Stack Overflow user

Asked 2020-08-26 23:55:03

Answers: 1 · Views: 36 · Followers: 0 · Votes: 0

I have a function that creates a DataFrame from a list of lists:

import getpass
import re
from pathlib import Path

import pandas as pd


def logs_reader():
    path = Path("C:\\Users\\" + getpass.getuser() + "\\DCBviz\\logs\\")

    cols1 = ['Station ID', 'Reciever type', 'Satellite system', 'Date installed', 'Date removed']
    cols2 = ['Station ID', 'Antenna type', 'Cable length', 'Date installed', 'Date removed']

    file_list = [f for f in path.glob('**/*.log') if f.is_file()]
    receivers_data = []
    antennas_data = []
    for file in file_list:
        with open(file, encoding='utf8') as f:
            contents = f.read()
            # Raw strings keep "\s" from being read as a string escape.
            station_id = re.findall(r"Four Character ID\s*:\s*(.*?)\s*$", contents, re.MULTILINE)

            receiver_types = re.findall(r"Receiver Type\s*:\s*(.*?)\s*$", contents, re.MULTILINE)
            satellite_sys = re.findall(r"Satellite System\s*:\s*(.*?)\s*$", contents, re.MULTILINE)
            date_installed = re.findall(r"Date Installed\s*:\s*(.*?)T.*$", contents, re.MULTILINE)
            date_removed = re.findall(r"Date Removed\s*:\s*(.*?)T.*$", contents, re.MULTILINE)

            antenna_types = re.findall(r"Antenna Type\s*:\s*(.*?)\s.*$", contents, re.MULTILINE)
            cable_lengths = re.findall(r"Antenna Cable Length\s*:\s*([0-9]+\.*[0-9]*)\s.*$", contents, re.MULTILINE)
            antenna_date_installed = re.findall(r"Date Installed\s*:\s*(.*?)T.*$", contents, re.MULTILINE)
            antenna_date_removed = re.findall(r"Date Removed\s*:\s*(.*?)T.*$", contents, re.MULTILINE)

            receivers_data.append([station_id, receiver_types, satellite_sys, date_installed, date_removed])
            antennas_data.append([station_id, antenna_types, cable_lengths, antenna_date_installed, antenna_date_removed])

            d = []

            for l in receivers_data:
                d.append({'Station ID': l[0]*len(l[1]),
                          'Reciever type': l[1],
                          'Satellite system': l[2],
                          'Date installed': l[3][0:len(l[1])],
                          'Date removed': l[4][0:len(l[1])]})
            df = pd.DataFrame(d)
    return df

df = logs_reader()

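In return, I get a DataFrame that looks like this:

I want to split the lists in columns 2-6 and use Station ID as a multi-row index, so that every entry is a single plain string. How can I do that?

Desired output: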

Tags: python, pandas, list

1 Answer

Stack Overflow user

Answered 2020-08-27 02:05:19

So your regex calls put the data into these lists:

receiver_types 
satellite_sys 
date_installed
date_removed
    
antenna_types
cable_lengths
antenna_date_installed
antenna_date_removed

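I'm assuming each file describes a single station. Even so, the line station_id = re.findall(... still returns a list.

You then have station_id, which is a list of length 1, and a bunch of other lists. If all the receiver lists have the same length, you can create a df and collect it in receivers_data using the code below (please copy it again, because I removed the brackets around station_id). Then do the same for antennas_data.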
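To illustrate why the brackets around station_id are unnecessary, here is a minimal sketch using a made-up log fragment (the sample text, station name, and receiver names are invented for illustration). It shows that re.findall returns a list even for a single match, and that multiplying that one-element list repeats the ID once per receiver.

```python
import re

# Hypothetical log fragment, invented for illustration only.
contents = """\
Four Character ID        : ABCD
Receiver Type            : RCV MODEL A
Receiver Type            : RCV MODEL B
"""

station_id = re.findall(r"Four Character ID\s*:\s*(.*?)\s*$", contents, re.MULTILINE)
receiver_types = re.findall(r"Receiver Type\s*:\s*(.*?)\s*$", contents, re.MULTILINE)

# findall always returns a list, even when there is only one match.
# Repeating the one-element list yields one station ID per receiver.
repeated_ids = station_id * len(receiver_types)
print(station_id, receiver_types, repeated_ids)
```

Wrapping station_id in another pair of brackets would instead repeat the whole list, producing a list of lists.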

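Note that your current code overwrites df on every iteration and returns only the one built in the last pass through the loop.

As I mentioned in the comments, if all the lists in the same row have the same length, your best option is to create a df from each file and concatenate them after the loop.

You can replace the line

receivers_data.append([station_id, receiver_types, satellite_sys, date_installed, date_removed])

with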

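receivers_data.append(
    pd.DataFrame(
        # zip pairs the lists element-wise, giving one row per receiver
        # (passing the lists directly would make each list a row instead)
        list(zip(station_id * len(receiver_types), receiver_types, satellite_sys, date_installed, date_removed)),
        columns=list_of_column_names
    )
)
# or instead of a list use a dict with file_name as keys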
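The dict variant mentioned in the comment above could look like the following sketch. The file names and values are invented placeholders, not data from the question. When pd.concat receives a dict, it uses the keys as an extra outer index level, so every row stays traceable to its source file.

```python
import pandas as pd

cols = ['Station ID', 'Receiver type']

# Hypothetical per-file frames keyed by file name (all values are made up).
frames_by_file = {
    'abcd.log': pd.DataFrame([('ABCD', 'RCV MODEL A'), ('ABCD', 'RCV MODEL B')], columns=cols),
    'efgh.log': pd.DataFrame([('EFGH', 'RCV MODEL C')], columns=cols),
}

# The dict keys become the outer level of a two-level row index.
df_by_file = pd.concat(frames_by_file)
print(df_by_file)

# Selecting on the outer level returns the rows that came from one file.
print(df_by_file.loc['efgh.log'])
```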

After all files have been read, you can concatenate the two lists with

df_receivers = pd.concat(receivers_data)
df_antennae = pd.concat(antennas_data)
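Putting the pieces together, a minimal end-to-end sketch might look like this. The parsed values are invented stand-ins for what the regexes would return for two files. The final set_index call is my assumption about how to get Station ID as the row index the question asks for; it is not part of the original answer.

```python
import pandas as pd

cols1 = ['Station ID', 'Receiver type', 'Satellite system', 'Date installed', 'Date removed']

# Hypothetical parse results for two files (all values are made up).
parsed_files = [
    (['ABCD'], ['RCV A', 'RCV B'], ['GPS', 'GPS+GLO'],
     ['2001-01-01', '2005-06-01'], ['2005-06-01', '2010-03-15']),
    (['EFGH'], ['RCV C'], ['GPS'], ['2012-02-02'], ['2015-09-30']),
]

receivers_data = []
for station_id, receiver_types, satellite_sys, date_installed, date_removed in parsed_files:
    # One row per receiver; the station ID is repeated to match.
    rows = list(zip(station_id * len(receiver_types), receiver_types,
                    satellite_sys, date_installed, date_removed))
    receivers_data.append(pd.DataFrame(rows, columns=cols1))

# Concatenate once after the loop, then move Station ID into the index.
df_receivers = pd.concat(receivers_data, ignore_index=True).set_index('Station ID')
print(df_receivers)
```

Because zip stops at the shortest list, any extra date entries (for example antenna dates caught by the same regex) are dropped, which mirrors the slicing in the question's code.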

Votes: 0

Original page content provided by Stack Overflow. Chinese translation supported by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/63601309
