Problem with a ConvertFrom-String template on an irregular data set

Stack Overflow user
Asked 2017-10-13 01:00:55
Answers: 1, Views: 385, Followers: 0, Votes: 1

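I have a data set that I am trying to normalize into PSCustomObjects. I have been experimenting with the machine-learning template feature of ConvertFrom-String and have had partial success. One problem is that every example I can find uses a data set in which all rows share the same structure. Mine does not.

I am sure a wizard could do this straight from the raw data, but I have manipulated it somewhat to get to where I am now.

Raw sample data: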

IDE00001-ENG99061-Production mode-Access control
IDE00001-ENG115730-Production mode-Aussenbeleuchtung
IDE00001-ENG112304-Production mode-Heckwischer
IDE00001-ENG98647-Production mode-Interior lighting
IDE00001-ENG115729-Production mode-Scheinwerferreinigung
IDE00001-ENG115731-Production mode-Virtuel_pedal
IDE00002-Transport mode
IDE00820-Activating and deactivating all development messages
IDE01550-Service position
IDE02152-Characteristics in production mode
IDE02269-MAS04382-Acknowledgement signals-Optical feedback during locking
IDE02332-Deactivate production mode
IDE02488-DWA Interior monitoring
IDE02711-ENG116690-Rear Window Wiper-Automatisches Heckwischen

Using the following script:

$lines = $testText.Split("`n") #$testText is the above data wrapped in a here-string
$NewLines = @()
foreach($line in $lines)
{
    [regex]$regex = '-'
    $HyphenCount = $regex.Matches($line).count
    #$HyphenCount
    switch ($HyphenCount)
    {
        1{
            $newLines += $line -replace "-",","
         }
        2{
            $split = $line.Split("-",2)
            $newlines += $split -join ","
         }
        3{
            if($line.Contains("mode-"))
            {
                #$line
                $split = $line.Split("-",4)
                $newlines += $split -join ","
            }
            else
            {
                $split = $line.Split("-",3)
                $newlines += $split -join ","
            }
         }        
        4{
           $split = $line.Split("-",3) #this assumes the fourth hyphen is part of description
           $newlines += $split -join ","
         }
        5{
           $split = $line.Split("-",4) 
           $newlines += $split -join ","
         }
    }
}

Manipulated data set:

The script gives me the following data:

IDE00001,ENG99061,Production mode,Access control
IDE00001,ENG115730,Production mode,Aussenbeleuchtung
IDE00001,ENG112304,Production mode,Heckwischer
IDE00001,ENG98647,Production mode,Interior lighting
IDE00001,ENG115729,Production mode,Scheinwerferreinigung
IDE00001,ENG115731,Production mode,Virtuel_pedal
IDE00002,Transport mode
IDE00820,Activating and deactivating all development messages
IDE01550,Service position
IDE02152,Characteristics in production mode
IDE02269,MAS04382,Acknowledgement signals-Optical feedback during locking
IDE02332,Deactivate production mode
IDE02488,DWA Interior monitoring
IDE02711,ENG116690,Rear Window Wiper-Automatisches Heckwischen
IDE99999,Test-two hyphens
IDE99999,ENG123456,Test-four-Hyphens
IDE99999,ENG123456,Production mode,test-five-hyphens

Passing the data above through the template below gets me very close to what I need, but some problems remain:

$template = @'
{object*:{ide:IDE00001},{code?:ENG99061},{mode?:Production mode},{description?:Access control}}
{object*:{ide:IDE00001},{code?:ENG115730},{mode?:Dev mode},{description?:Aussenbeleuchtung}}
{object*:{ide:IDE00001},{code?:ENG115731},{mode?:Production mode},{description?:Virtuel_pedal}}
{object*:{ide:IDE02711},{code?:ENG116690},{description?:Rear Window Wiper-Automatisches Heckwischen}}
{object*:{ide:IDE00820},{description?:{!mode?:{!code?:Activating and deactivating all development messages}}}}
{object*:{ide:IDE01550},{description?:{!mode?:{!code?:Service position}}}}
{object*:{ide:IDE02488},{description?:{!mode?:{!code?:DWA Interior monitoring}}}}
{object*:{ide:IDE00002},{mode?:Transport mode}}
'@

$testText | ConvertFrom-String -TemplateContent $template -OutVariable out | Out-Null
$out.object

Results so far:

The output looks like this:

ide      code      mode            description                                            
---      ----      ----            -----------                                            
IDE00001 ENG99061  Production mode Access control                                         
IDE00001 ENG115730 Production mode Aussenbeleuchtung                                      
IDE00001 ENG112304 Production mode Heckwischer                                            
IDE00001 ENG98647  Production mode Interior lighting                                      
IDE00001 ENG115729 Production mode Scheinwerferreinigung                                  
IDE00001 ENG115731 Production mode Virtuel_pedal                                          
IDE00002           Transport mode  Transport mode                                         
IDE00820                           Activating and deactivating all development messages   
IDE01550                           Service position                                       
IDE02152           production mode Characteristics in production mode                     
IDE02269 MAS04382                  Acknowledgement signals-Optical feedback during locking
IDE02332           production mode Deactivate production mode                             
IDE02488                           DWA Interior monitoring                                
IDE02711 ENG116690                 Rear Window Wiper-Automatisches Heckwischen            
IDE99999                           Test-two hyphens                                       
IDE99999 ENG123456                 Test-four-Hyphens    

Problem areas:

IDE00002           Transport mode  Transport mode

IDE02152           production mode Characteristics in production mode

IDE02332           production mode Deactivate production mode 
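  1. Transport mode should not appear in the description column.
  2. production mode should not appear in the mode column. The template has somehow learned it from the description.

I just cannot figure this out. So if anyone has any ideas...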

1 Answer

Stack Overflow user

Accepted answer

Answered 2017-10-13 08:10:15

As an alternative, if the input data is systematic enough, you can parse it with a regular expression:

$inputText = @"
IDE00001-ENG99061-Production mode-Access control
IDE00001-ENG115730-Production mode-Aussenbeleuchtung
IDE00001-ENG112304-Production mode-Heckwischer
IDE00001-ENG98647-Production mode-Interior lighting
IDE00001-ENG115729-Production mode-Scheinwerferreinigung
IDE00001-ENG115731-Production mode-Virtuel_pedal
IDE00002-Transport mode
IDE00820-Activating and deactivating all development messages
IDE01550-Service position
IDE02152-Characteristics in production mode
IDE02269-MAS04382-Acknowledgement signals-Optical feedback during locking
IDE02332-Deactivate production mode
IDE02488-DWA Interior monitoring
IDE02711-ENG116690-Rear Window Wiper-Automatisches Heckwischen
"@ -split "\r?\n"  # also accept CRLF line endings so no trailing carriage return ends up in the description

$pattern = '^((?<ide>[IDE0-9]+)-)((?<code>[A-Z0-9]+)-)?((?<mode>Production mode|Transport mode)-?)?(?<description>.*?)$'

foreach ($line in $inputText)
{
    $isMatch = $line -match $pattern
    if (-not $isMatch)
    {
        Write-Warning "Cannot parse expression: $line"
        continue
    }

    New-Object psobject -Property ([ordered]@{
        'Ide' = $Matches.ide
        'Code' = $Matches.code
        'Mode' = $Matches.mode
        'Description' = $Matches.description
    })
}

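You said that your data is not all structured the same way. Your regular expression may therefore need to be considerably more complex than the one above. Alternatively, if you can identify all the different structures that can occur, run the parse several times with different regular expressions.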
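The "several regular expressions" idea can be sketched roughly as follows. This is only an illustration under assumptions that are not part of the original answer: the function name `ConvertTo-IdeRecord` and the four patterns are hypothetical, codes are assumed to be three uppercase letters followed by digits (as in `ENG99061` and `MAS04382`), and the only mode values are assumed to be `Production mode` and `Transport mode`. It uses `-cmatch` because `-match` is case-insensitive by default, so `[A-Z]` would otherwise also match lowercase letters.

```powershell
# Hypothetical helper (not from the original answer): tries each pattern in
# order and emits one object for the first pattern that matches the line.
function ConvertTo-IdeRecord {
    param(
        [Parameter(ValueFromPipeline)]
        [string]$Line
    )
    begin {
        # Ordered from most specific to least specific.
        # ASSUMPTION: codes look like three uppercase letters plus digits, and
        # the only modes are 'Production mode' and 'Transport mode'.
        $patterns = @(
            '^(?<ide>IDE\d+)-(?<code>[A-Z]{3}\d+)-(?<mode>Production mode)-(?<description>.+)$'
            '^(?<ide>IDE\d+)-(?<code>[A-Z]{3}\d+)-(?<description>.+)$'
            '^(?<ide>IDE\d+)-(?<mode>Transport mode)$'
            '^(?<ide>IDE\d+)-(?<description>.+)$'
        )
    }
    process {
        # Drop trailing whitespace, including a stray carriage return.
        $text = $Line.TrimEnd()
        if (-not $text) { return }   # skip empty lines

        foreach ($pattern in $patterns) {
            # -cmatch is case-sensitive, so [A-Z] does not match 'Tes' in 'Test'.
            if ($text -cmatch $pattern) {
                # Groups that are absent from the matching pattern yield $null.
                return [pscustomobject]@{
                    Ide         = $Matches['ide']
                    Code        = $Matches['code']
                    Mode        = $Matches['mode']
                    Description = $Matches['description']
                }
            }
        }
        Write-Warning "Cannot parse expression: $text"
    }
}

# Usage: pipe raw lines in, get one object per parsable line.
$records = @(
    'IDE00001-ENG99061-Production mode-Access control'
    'IDE00002-Transport mode'
    'IDE02152-Characteristics in production mode'
) | ConvertTo-IdeRecord
```

Because each pattern describes exactly one row shape, a new shape can be supported by adding one more pattern at the right position in the list, without touching the others.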

Votes: 1
Original page content provided by Stack Overflow.
Source:

https://stackoverflow.com/questions/46721012
