我有以下几个数据帧,每列有两列,行数完全相同。如何将它们连接起来,以便从两个数据框架中得到一个具有两列和所有行的单一数据框架?
例如:
DataFrame-1
+--------------+-------------+
| colS | label |
+--------------+-------------+
| sample_0_URI | 0 |
| sample_0_URI | 0 |
+--------------+-------------+数据帧-2
+--------------+-------------+
| colS | label |
+--------------+-------------+
| sample_1_URI | 1 |
| sample_1_URI | 1 |
+--------------+-------------+DataFrame-3
+--------------+-------------+
| col1 | label |
+--------------+-------------+
| sample_2_URI | 2 |
| sample_2_URI | 2 |
+--------------+-------------+数据帧-4
+--------------+-------------+
| col1 | label |
+--------------+-------------+
| sample_3_URI | 3 |
| sample_3_URI | 3 |
+--------------+-------------+..。
我希望加入的结果是:
+--------------+-------------+
| col1 | label |
+--------------+-------------+
| sample_0_URI | 0 |
| sample_0_URI | 0 |
| sample_1_URI | 1 |
| sample_1_URI | 1 |
| sample_2_URI | 2 |
| sample_2_URI | 2 |
| sample_3_URI | 3 |
| sample_3_URI | 3 |
+--------------+-------------+现在,如果我想对label列执行一次热编码,应该是这样的:
oe = OneHotEncoder(inputCol="label",outputCol="one_hot_label")
df = oe.transform(df) # df is the joined dataframes <cols, label>发布于 2019-06-12 06:53:09
你在找union。
在本例中,我要做的是将数据存储在list中并使用reduce。
from functools import reduce
dataframes = [df_1, df_2, df_3, df_4]
result = reduce(lambda first, second: first.union(second), dataframes)https://stackoverflow.com/questions/56556177
复制相似问题