I have a DataFrame with a column that contains strings:

import pandas as pd

books = pd.DataFrame([[1,'In Search of Lost Time'],[2,'Don Quixote'],[3,'Ulysses'],[4,'The Great Gatsby'],[5,'Moby Dick']], columns = ['Book ID', 'Title'])

   Book ID                   Title
0        1  In Search of Lost Time
1        2             Don Quixote
2        3                 Ulysses
3        4        The Great Gatsby
4        5               Moby Dick

and an ordered list of boundaries:

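boundaries = ['AAAAAAA','The Great Gatsby', 'zzzzzzzz']

I want to use these boundaries to bin the values in the DataFrame into alphabetical classes, similar to how pd.cut() works for numeric data. My desired output looks like this:
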
   Book ID                   Title                          binning
0        1  In Search of Lost Time   ['AAAAAAA','The Great Gatsby')
1        2             Don Quixote   ['AAAAAAA','The Great Gatsby')
2        3                 Ulysses  ['The Great Gatsby','zzzzzzzz')
3        4        The Great Gatsby  ['The Great Gatsby','zzzzzzzz')
4        5               Moby Dick   ['AAAAAAA','The Great Gatsby')

Is this possible?
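
For comparison, the numeric behaviour of pd.cut() mentioned above can be sketched as follows. The page counts are made-up data used only to show the interval semantics; with right=False each interval is closed on the left, so a value equal to a boundary falls into the bin that starts at it.

```python
import pandas as pd

# Made-up numeric data, only to illustrate left-closed binning.
pages = pd.Series([120, 300, 450])

# right=False yields intervals [0, 300) and [300, 600).
binned = pd.cut(pages, bins=[0, 300, 600], right=False)
print(binned)
```

Here 300 lands in the second interval, which is the same treatment the desired output gives to 'The Great Gatsby'.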
Posted on 2019-05-16 16:54:24
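searchsorted

import numpy as np

boundaries = np.array(['The Great Gatsby'])
bins = np.array(['[A..The Great Gatsby)', '[The Great Gatsby..Z]'])
# side='right' puts a title equal to a boundary into the bin that starts there,
# which matches the half-open labels; the default side='left' would not
books.assign(binning=bins[boundaries.searchsorted(books.Title, side='right')])

   Book ID                   Title                binning
0        1  In Search of Lost Time  [A..The Great Gatsby)
1        2             Don Quixote  [A..The Great Gatsby)
2        3                 Ulysses  [The Great Gatsby..Z]
3        4        The Great Gatsby  [The Great Gatsby..Z]
4        5               Moby Dick  [A..The Great Gatsby)

Extending this to some other boundaries:
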
from string import ascii_uppercase as letters

# One boundary per letter B..Y, giving 25 bins from [A..B) to [Y..Z)
boundaries = np.array([*letters[1:-1]])
bins = np.array([f'[{a}..{b})' for a, b in zip(letters, letters[1:])])
books.assign(binning=bins[boundaries.searchsorted(books.Title, side='right')])

   Book ID                   Title binning
0        1  In Search of Lost Time  [I..J)
1        2             Don Quixote  [D..E)
2        3                 Ulysses  [U..V)
3        4        The Great Gatsby  [T..U)
4        5               Moby Dick  [M..N)

https://stackoverflow.com/questions/56173204
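
To get the exact labels asked for in the question, the same searchsorted idea can be wrapped in a small helper. This is only a sketch under assumptions: cut_strings is a hypothetical name, not a pandas or NumPy function, and it assumes the boundaries are already sorted and that every value lies between the first and last boundary.

```python
import numpy as np
import pandas as pd


def cut_strings(values, boundaries):
    """Bin strings into left-closed intervals [b[i], b[i+1]).

    Hypothetical helper (not part of pandas). `boundaries` must be sorted,
    and each value must satisfy boundaries[0] <= value < boundaries[-1].
    """
    edges = np.asarray(boundaries, dtype=str)
    vals = np.asarray(values, dtype=str)
    labels = np.array([f"['{a}','{b}')" for a, b in zip(edges, edges[1:])])
    # side='right' counts the edges <= value, so a value equal to an edge
    # lands in the interval that starts at that edge.
    idx = np.searchsorted(edges, vals, side='right') - 1
    if ((idx < 0) | (idx >= len(labels))).any():
        raise ValueError('value outside the outermost boundaries')
    return labels[idx]


books = pd.DataFrame(
    [[1, 'In Search of Lost Time'], [2, 'Don Quixote'], [3, 'Ulysses'],
     [4, 'The Great Gatsby'], [5, 'Moby Dick']],
    columns=['Book ID', 'Title'],
)
boundaries = ['AAAAAAA', 'The Great Gatsby', 'zzzzzzzz']

result = books.assign(binning=cut_strings(books['Title'], boundaries))
print(result)
```

The comparison is by Unicode code point, so it is case-sensitive: every uppercase letter sorts before every lowercase one. If that matters for the data, one option is to normalise both the titles and the boundaries (for example with str.lower()) before binning.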