Question: Haskell drop reference to a list in do (IO)

Stack Overflow user

Asked 2014-02-03 01:39:23

Answers: 1, Views: 126, Followers: 0, Votes: 1

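I'm new to Haskell (and also to FP and lazy evaluation). I'm trying to write a log analyzer, but at the moment it allocates 4 GB of memory and therefore crashes even on logs as small as 90 MB.

I stripped the program down to just the part that collects the most frequently referenced pages. In addition, I store them in a ternary trie (since most URLs share common prefixes), so they should not take up that much memory.

So I would expect the program to need only a few MB of memory to store these references, not this much.

I think the culprit is the readStats function in the main file below: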

-- main.hs
import Record
import Output
import Stats

import System.Environment    
import Data.List
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C8

readStats :: String -> IO Stats
readStats p = do
    f <- B.readFile p
    return $ foldl' 
               (\t l -> applyEither t (parseLogLine l)) 
               emptyStats
               (C8.lines f)
    where applyEither t (Right rec) = applyRecord t rec 
          applyEither t (Left err)  = applyError t err 

main :: IO ()
main = do
    args <- getArgs
    stats <- readStats $ head args 
    putStrLn $ page stats

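My thinking is that because I bind the result of B.readFile to f, the whole file is kept in memory as [Char], which I suppose takes up even more memory because of the pointers.

How can I get the GC to collect the lines from f as soon as parseLogLine has parsed them?

Also, I would appreciate any advice on structure and coding style, since I'm new to Haskell.

Thanks.

Edit: here are the other functions and data structures: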

Trie:

data Trie a = Node Char (Trie a) (Trie a) (Trie a) (Maybe a)
              | Empty deriving (Show, Eq) 

sanify :: Trie a -> Trie a
sanify (Node _ Empty Empty Empty Nothing) = Empty
sanify (Node _ Empty lo    Empty Nothing) = lo
sanify (Node _ Empty Empty hi    Nothing) = hi
sanify t = t 

update :: Trie a -> String -> (Maybe a -> Maybe a) -> Trie a
update _ [] _ = error "Can not insert an empty string to a Trie"
update Empty (x:[]) f = sanify $ Node x Empty Empty Empty (f Nothing) 
update Empty (x:xs) f = sanify $ Node x (update Empty xs f) Empty Empty Nothing
update (Node c eq lo hi val) xss@(x:xs) f = 
    case x `compare` c of
        LT -> sanify $ Node c eq (update lo xss f) hi val 
        GT -> sanify $ Node c eq lo (update hi xss f) val 
        EQ -> case xs of
                [] -> sanify $ Node c eq lo hi (f val)
                _  -> sanify $ Node c (update eq xs f) lo hi val

Record:

import Network.URL

data Record = Record {
    ip :: IP, 
    date :: UTCTime,
    method :: Method,
    path :: URL,
    referer :: Maybe URL,
    status :: Integer,
    userAgent :: String
} deriving (Show, Eq) 

parseRecord :: Parser Record
parseRecord = do
    ip <- parseIP
    P8.skipWhile (/= '[')
    date <- parseDate
    P.string (B8.pack " \"")
    method <- P8.takeWhile (/= ' ')
    .....


data LogError = LogError {msg :: String, line :: B8.ByteString}
parseLogLine :: B8.ByteString -> Either LogError Record
parseLogLine line = case parseOnly parseRecord line of
                        Right a -> Right a
                        Left msg -> Left $ LogError msg line

Stats:

type StringCounter = T.Trie Int 
increment :: StringCounter -> String -> StringCounter
increment t s = T.update t s incNode
                where incNode n = case n of  
                                    Nothing -> Just 1
                                    Just i -> Just (i+1)

sortCounter :: StringCounter -> [(String, Int)]
sortCounter = sortWith (negate.snd) . T.toList

data Stats = Stats {
    paths :: StringCounter,
    referers :: StringCounter,
    errors :: [LogError]
}

emptyStats :: Stats
emptyStats = Stats T.Empty T.Empty []

buildStats :: [Record] -> Stats
buildStats = foldl' applyRecord emptyStats 

applyRecord :: Stats -> Record -> Stats
applyRecord env rec = env {
    paths = increment (paths env) (exportURL $ path rec),
    referers = case referer rec of
                 Nothing -> referers env 
                 Just ref -> increment (referers env) (exportURL $ stripParams ref)
    }   

applyError :: Stats -> LogError -> Stats
applyError env err = env { errors = err : errors env }
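
A note on the fold over Stats: foldl' only evaluates the accumulator to weak head normal form, that is, to the outermost constructor. With lazy record fields, each record update can leave an unevaluated thunk inside the field, and those thunks may keep the parsed input alive. The following is only a minimal sketch of the idea, not the asker's code: `Counts`, `applyLine` and `summarise` are hypothetical stand-ins that use plain Int counters in place of the tries.

```haskell
import Data.List (foldl')

-- Hypothetical stand-in for Stats. The bang annotations make both fields
-- strict: they are evaluated whenever the Counts constructor is evaluated.
data Counts = Counts
  { hits   :: !Int
  , misses :: !Int
  } deriving (Show, Eq)

emptyCounts :: Counts
emptyCounts = Counts 0 0

-- A record update builds a new constructor, so the updated strict field is
-- forced as soon as foldl' evaluates the new accumulator.
applyLine :: Counts -> Either String Int -> Counts
applyLine c (Right _) = c { hits = hits c + 1 }
applyLine c (Left _)  = c { misses = misses c + 1 }

-- Strict left fold: the accumulator never holds a chain of pending updates.
summarise :: [Either String Int] -> Counts
summarise = foldl' applyLine emptyCounts
```

A strict field is itself only forced to weak head normal form. For a tree-shaped value such as the Trie, that reaches the root Node only, so the Node fields would presumably need the same annotation for the effect to carry through the whole structure.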

haskell

garbage-collection

lazy-evaluation

1 Answer

Stack Overflow user

Answered 2014-02-03 03:53:03

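I didn't really look at your code, but here is one piece of general advice: use pipes, Luke. For processing streams of data, such as log streams, they are really great. Most importantly, they let you run in O(1) space. Don't mess with lazy IO like readFile; it is meant for one-off code.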
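
To illustrate the constant-space idea without pulling in the pipes library, here is a rough sketch that reads one line at a time from a handle using only base and bytestring. It is a swapped-in technique, not the pipes API: `foldLines` is a hypothetical helper, and the step function is whatever per-line update you need.

```haskell
{-# LANGUAGE BangPatterns #-}

import qualified Data.ByteString.Char8 as C8
import System.IO (IOMode (ReadMode), hIsEOF, withFile)

-- Fold over the lines of a file, reading one line per iteration.
-- Only the current line and the accumulator are referenced at any point,
-- so lines that have already been processed can be garbage collected.
foldLines :: (a -> C8.ByteString -> a) -> a -> FilePath -> IO a
foldLines step z path = withFile path ReadMode (go z)
  where
    -- The bang pattern forces the accumulator (to weak head normal form)
    -- on every iteration, so updates do not pile up as thunks.
    go !acc h = do
      eof <- hIsEOF h
      if eof
        then return acc
        else do
          l <- C8.hGetLine h
          go (step acc l) h
```

For example, `foldLines (\n _ -> n + 1) (0 :: Int) "access.log"` would count the lines of a file named access.log (a placeholder path). The accumulator is only forced to weak head normal form, so a record accumulator still needs strict fields to avoid retaining thunks.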

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/21513869
