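I have data that captures codes and their descriptions, and I need to extract the quantity from the description. How can I use a regular expression to extract the quantity, i.e. the number followed by G/KG/L/ML?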
df
code description
1 ABC CHILLIE POWDER 100G
2 DEF POWDER 200G
3 DIL PDWR POWDER 100G
4 RAIN HILL HERB SOU GREED 40G 2 1FRE
5 DEAR CHILLI 200G+COCO POWDER 330ML
6 DIL PDWR 10L POWDER
result_df
code description qty
1 ABC CHILLIE POWDER 100G 100G
2 DEF POWDER 200G 200G
3 DIL PDWR POWDER 100G 100G
4 RAIN HILL HERB SOU GREED 40G 2 1FRE 40G
5 DEAR CHILLI 200G+COCO POWDER 330ML 200G
6 DIL PDWR 10L POWDER 10L
I am using:
df.withColumn("qty", F.regexp_extract(F.col("description"), "\dG", 1))
Posted on 2019-11-06 12:21:04
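A quick way to see why the attempt above does not produce the desired column is to try the same pattern with Python's `re` module. This is a minimal sketch that assumes `re` behaves like the Java regex engine Spark uses for a pattern this simple: `\dG` matches only a single digit before G, and it defines no capturing group, so there is nothing for group index 1 to return.

```python
import re

description = "ABC CHILLIE POWDER 100G"

# "\dG" matches exactly one digit followed by G, so only the last digit survives.
attempt = re.search(r"\dG", description)
print(attempt.group(0))  # 0G

# The pattern has no capturing group, so there is no group 1 to extract.
print(re.compile(r"\dG").groups)  # 0

# Quantifying the digits with \d+ and wrapping them in a group fixes both issues
# (this version still only handles G, not KG, L or ML).
fixed = re.search(r"(\d+G)", description)
print(fixed.group(1))  # 100G
```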
You can use
df.withColumn("qty", F.regexp_extract(F.col("description"), r"(\d+\s?(?:K?G|M?L))\b", 1))
The (\d+\s?(?:K?G|M?L))\b pattern matches:
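(\d+\s?(?:K?G|M?L)) - capturing group 1:
  \d+ - one or more digits
  \s? - one or zero whitespace characters
  (?:K?G|M?L) - either an optional K followed by G, or an optional M followed by L
\b - a word boundary.
See the regex demo. Since regexp_extract returns group 1 of the first match only, row 5 yields 200G rather than 330ML.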
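To check the pattern without a Spark session, here is a minimal sketch in plain Python that runs it over the six sample descriptions. It assumes Python's `re` interprets this pattern the same way as Spark's Java regex engine (which holds for these plain ASCII inputs), and `regexp_extract_like` is a hypothetical helper that only imitates `regexp_extract`'s behaviour of returning group `idx` of the first match, or an empty string when nothing matches.

```python
import re

# The pattern from the answer. For plain ASCII input like this, Python's re
# module interprets it the same way as the Java regex engine used by Spark.
QTY_PATTERN = re.compile(r"(\d+\s?(?:K?G|M?L))\b")


def regexp_extract_like(text: str, pattern: re.Pattern, idx: int) -> str:
    """Hypothetical stand-in for Spark's regexp_extract: return group `idx`
    of the first match, or "" when nothing matches or the group did not
    take part in the match."""
    match = pattern.search(text)
    if match is None or match.group(idx) is None:
        return ""
    return match.group(idx)


# The sample rows from the question (code -> description).
descriptions = {
    1: "ABC CHILLIE POWDER 100G",
    2: "DEF POWDER 200G",
    3: "DIL PDWR POWDER 100G",
    4: "RAIN HILL HERB SOU GREED 40G 2 1FRE",
    5: "DEAR CHILLI 200G+COCO POWDER 330ML",
    6: "DIL PDWR 10L POWDER",
}

# Build the qty column: only the first quantity in each description is kept.
qty = {code: regexp_extract_like(desc, QTY_PATTERN, 1) for code, desc in descriptions.items()}

for code, desc in descriptions.items():
    print(code, desc, qty[code])
```

The same helper also exercises the other units and the optional space: "SUGAR 1 KG" gives "1 KG" and "MILK 500ML" gives "500ML", while "100GRAMS" gives "" because \b rejects a unit that is followed by more letters.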
https://stackoverflow.com/questions/58729748