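I am cleaning up the following list: a list of game genres.
I want to split the words that were joined together, but my attempt does not handle capitalized abbreviations and acronyms correctly (for example PVP, MMORPG, MOBA, DeFi).
Currently, my regex code is:
[re.sub(r"(\w)([A-Z])", r"\1 \2", ele) for ele in genre_list]
As you can see below, it sometimes works and sometimes does not: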
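'Collectible Open-World Virtual-World', 'Breeding Card PV P', 'Auto-Battler Breeding Strategy', 'Minigame Open-World Virtual-World', 'Action Simulation Sports', 'Adventure MM OStrategy', 'Adventure Casual Puzzle', 'Sports', 'Collectible Sci-Fi Virtual-World', 'Battle-Royalee Sports MO BA', 'Action PV PShooter', 'P VP Sci-Fi Tower-Defense', 'Action Battle-Royale', 'P VP Sci-Fi Shooter', 'Breeding Collectible Mining', 'Collectible De Fie Sports', 'Action Adventure Shooter', 'City-Building Collectible Simulation', 'Action Strategy', 'Adventure Open-World', 'Breeding Racing Sports', 'Open-World Virtual-World', 'Collectible Idle', 'Action Adventure', 'Card Collectible PV P', 'Battle-Royale Fantasy MO BA', 'City-Building', 'Building MM OStrategy', 'Adventure MM OR PG', 'Action Adventure Idle', 'M OB AR PG Strategy', 'M MO RP GStrategy', 'Card Collectible Idle', 'Open-World PV PR PG', 'De Fi MM OSpace', 'Collectible', 'Card Collectible PV P', 'Auto-Battler De Fi RP G', 'Adventure MM OOpen-World', 'Collectible Open-World Virtual-World', 'Collectible Idle RP G', 'Card Collectible PV P', 'Action Adventure PV P', 'Sci-Fi Shooter Survival', 'Action Strategy', 'Arcade Minigame', 'Breeding PV PRacing', 'M OB AP VP', 'Action Sports', 'P VP Space Turn-based', 'M MO Strategy Tower-Defense'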
Can you help me figure out which regex would work best here? Or is regex simply not suitable for this list? Thanks!
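Posted on 2022-04-26 13:34:05
This is difficult because you have ALLCAPS words that may be glued to one another. If you have a list of those words, the problem is solvable.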
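To see why glued ALLCAPS words are the hard part, here is a minimal sketch of what a single-uppercase-letter pattern does. The pattern is assumed to be the one from the question, and `naive_split` is a hypothetical helper name used only for illustration. Because matches cannot overlap, each match consumes two characters and the next one starts after them, so a run like PVP is cut in the middle.

```python
import re

# Assumed to be the question's pattern: one word character
# followed by exactly ONE uppercase letter.
single_cap = re.compile(r"(\w)([A-Z])")

def naive_split(s):
    # Insert a space between the two captured characters.
    return single_cap.sub(r"\1 \2", s)

for glued in ["BreedingCardPVP", "AdventureMMOStrategy"]:
    print(glued, "=>", naive_split(glued))
# BreedingCardPVP => Breeding Card PV P
# AdventureMMOStrategy => Adventure MM OStrategy
```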
Here is code that you can use and further enhance to get more accurate output:
import re
l = ['Collectible Open-World Virtual-World', 'Breeding Card PV P', 'Auto-Battler Breeding Strategy', 'Minigame Open-World Virtual-World', 'Action Simulation Sports', 'Adventure MM OStrategy', 'Adventure Casual Puzzle', 'Sports', 'Collectible Sci-Fi Virtual-World', 'Battle-Royalee Sports MO BA', 'Action PV PShooter', 'P VP Sci-Fi Tower-Defense', 'Action Battle-Royale', 'P VP Sci-Fi Shooter', 'Breeding Collectible Mining', 'Collectible De Fie Sports', 'Action Adventure Shooter', 'City-Building Collectible Simulation', 'Action Strategy', 'Adventure Open-World', 'Breeding Racing Sports', 'Open-World Virtual-World', 'Collectible Idle', 'Action Adventure', 'Card Collectible PV P', 'Battle-Royale Fantasy MO BA', 'City-Building', 'Building MM OStrategy', 'Adventure MM OR PG', 'Action Adventure Idle', 'M OB AR PG Strategy', 'M MO RP GStrategy', 'Card Collectible Idle', 'Open-World PV PR PG', 'De Fi MM OSpace', 'Collectible', 'Card Collectible PV P', 'Auto-Battler De Fi RP G', 'Adventure MM OOpen-World', 'Collectible Open-World Virtual-World', 'Collectible Idle RP G', 'Card Collectible PV P', 'Action Adventure PV P', 'Sci-Fi Shooter Survival', 'Action Strategy', 'Arcade Minigame', 'Breeding PV PRacing', 'M OB AP VP', 'Action Sports', 'P VP Space Turn-based', 'M MO Strategy Tower-Defense']
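# Remove the spaces to recover the original glued strings
l = [''.join(s.split()) for s in l]
# Known ALLCAPS terms; extend this list as needed
allcaps = ['RPG', 'MOBA', 'PVP', 'MMO']
# Pass 1: a lowercase letter before an uppercase one, or an uppercase letter before an uppercase + lowercase pair
rx_1 = re.compile(r'[a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z])')
# Pass 2: a known term at the start of a word that is directly followed by another letter
rx_2 = re.compile( fr"\b(?:{r'|'.join(allcaps)})(?=[A-Za-z])" )
# Pass 3: a known term at the end of a word that is directly preceded by another letter
rx_3 = re.compile( fr"(?<=[A-Za-z])(?:{r'|'.join(allcaps)})\b" )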
for s in l:
    print( r'{} => {}'.format(s, rx_3.sub(r" \g<0>", rx_2.sub(r"\g<0> ", rx_1.sub(r"\g<0> ", s)))) )
Output:
CollectibleOpen-WorldVirtual-World => Collectible Open-World Virtual-World
BreedingCardPVP => Breeding Card PVP
Auto-BattlerBreedingStrategy => Auto-Battler Breeding Strategy
MinigameOpen-WorldVirtual-World => Minigame Open-World Virtual-World
ActionSimulationSports => Action Simulation Sports
AdventureMMOStrategy => Adventure MMO Strategy
AdventureCasualPuzzle => Adventure Casual Puzzle
Sports => Sports
CollectibleSci-FiVirtual-World => Collectible Sci-Fi Virtual-World
Battle-RoyaleeSportsMOBA => Battle-Royalee Sports MOBA
ActionPVPShooter => Action PVP Shooter
PVPSci-FiTower-Defense => PVP Sci-Fi Tower-Defense
ActionBattle-Royale => Action Battle-Royale
PVPSci-FiShooter => PVP Sci-Fi Shooter
BreedingCollectibleMining => Breeding Collectible Mining
CollectibleDeFieSports => Collectible De Fie Sports
ActionAdventureShooter => Action Adventure Shooter
City-BuildingCollectibleSimulation => City-Building Collectible Simulation
ActionStrategy => Action Strategy
AdventureOpen-World => Adventure Open-World
BreedingRacingSports => Breeding Racing Sports
Open-WorldVirtual-World => Open-World Virtual-World
CollectibleIdle => Collectible Idle
ActionAdventure => Action Adventure
CardCollectiblePVP => Card Collectible PVP
Battle-RoyaleFantasyMOBA => Battle-Royale Fantasy MOBA
City-Building => City-Building
BuildingMMOStrategy => Building MMO Strategy
AdventureMMORPG => Adventure MMO RPG
ActionAdventureIdle => Action Adventure Idle
MOBARPGStrategy => MOBA RPG Strategy
MMORPGStrategy => MMO RPG Strategy
CardCollectibleIdle => Card Collectible Idle
Open-WorldPVPRPG => Open-World PVP RPG
DeFiMMOSpace => De Fi MMO Space
Collectible => Collectible
CardCollectiblePVP => Card Collectible PVP
Auto-BattlerDeFiRPG => Auto-Battler De Fi RPG
AdventureMMOOpen-World => Adventure MMO Open-World
CollectibleOpen-WorldVirtual-World => Collectible Open-World Virtual-World
CollectibleIdleRPG => Collectible Idle RPG
CardCollectiblePVP => Card Collectible PVP
ActionAdventurePVP => Action Adventure PVP
Sci-FiShooterSurvival => Sci-Fi Shooter Survival
ActionStrategy => Action Strategy
ArcadeMinigame => Arcade Minigame
BreedingPVPRacing => Breeding PVP Racing
MOBAPVP => MOBA PVP
ActionSports => Action Sports
PVPSpaceTurn-based => PVP Space Turn-based
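MMOStrategyTower-Defense => MMO Strategy Tower-Defense
The [a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z]) regex matches:
[a-z](?=[A-Z]) - a lowercase letter that is immediately followed by an uppercase letter
| - or
[A-Z](?=[A-Z][a-z]) - an uppercase letter that is immediately followed by an uppercase letter and then a lowercase letter.
A space is added after each of these matches.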
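The effect of this first pattern on its own can be checked with a short sketch. The helper name `pass_one` is hypothetical. The second input shows what the pattern cannot do by itself: a run of uppercase letters with no lowercase letter after it offers no boundary to split on.

```python
import re

# Same pattern as rx_1: the lookaheads test the next characters
# without consuming them, so only one letter is matched each time.
rx_1 = re.compile(r"[a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z])")

def pass_one(s):
    # \g<0> is the whole match; a space is appended after it.
    return rx_1.sub(r"\g<0> ", s)

print(pass_one("AdventureMMOStrategy"))  # Adventure MMO Strategy
print(pass_one("AdventureMMORPG"))       # Adventure MMORPG
```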
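The rx_2 and rx_3 regexes are built from the list of ALLCAPS words. They add a space to the right or to the left of a known word, depending on which side another letter is glued to it.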
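One possible way to package the three passes for reuse is sketched below. The function name `split_glued` and the use of `re.escape` on the terms are additions for illustration, not part of the original answer.

```python
import re

def split_glued(s, allcaps=("RPG", "MOBA", "PVP", "MMO")):
    # Hypothetical wrapper around the three passes described above.
    alt = "|".join(map(re.escape, allcaps))
    rx_1 = re.compile(r"[a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z])")
    rx_2 = re.compile(rf"\b(?:{alt})(?=[A-Za-z])")   # space goes to the right
    rx_3 = re.compile(rf"(?<=[A-Za-z])(?:{alt})\b")  # space goes to the left
    s = rx_1.sub(r"\g<0> ", s)
    s = rx_2.sub(r"\g<0> ", s)
    return rx_3.sub(r" \g<0>", s)

print(split_glued("AdventureMMORPG"))  # Adventure MMO RPG
print(split_glued("MMORPGPVP"))        # MMO RPG PVP
```

Mixed-case terms such as DeFi are not in the allcaps list, and the first pass splits them at the lowercase/uppercase boundary, which is why the output above shows De Fi.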
Posted on 2022-04-25 06:48:19
Edit based on the comments: you only need to add a '+' after [A-Z], i.e. r"(\w)([A-Z]+)". This will match one or more uppercase letters.
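A quick check of this variant, as a sketch with a hypothetical helper name `split_plus`: it keeps a trailing acronym together, but the greedy `[A-Z]+` also swallows the capital that starts the next word, and an acronym at the start of the string loses its first letter to `(\w)`.

```python
import re

# The '+' variant: one word character followed by a RUN of uppercase letters.
plus_pattern = re.compile(r"(\w)([A-Z]+)")

def split_plus(s):
    return plus_pattern.sub(r"\1 \2", s)

print(split_plus("BreedingCardPVP"))       # Breeding Card PVP
print(split_plus("AdventureMMOStrategy"))  # Adventure MMOStrategy
print(split_plus("PVPSci-Fi"))             # P VPSci-Fi
```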
https://stackoverflow.com/questions/71995355