从R6RS计划过渡到R7RS是非常有教育意义和乐趣的。赤壁计划是我最喜欢的,因为它密切符合标准。
在启动一个新项目时,我希望重用我在Chez (R6RS)中开发的Trie数据过程库。一开始,这种转变是无痛的。然而,当开始在一些单元测试上工作时,性能并不好。在一本大型(约172,000本)单词词典中阅读并对它们进行计数的测试花费了数百倍的时间。这基本上是即时在Chez,增加了一个几乎不明显的延迟开始的测试。
在方便的R7RS发行版中运行,执行时间要短得多。
下面是一个小例子。这是一个大大简化的原始库版本。它加载一本词典,并对字典中的单词进行计数。(用于构建字典的文件来自:https://www.wordgamedictionary.com/enable/)
;;;
;;; trie-problem.scm -- illustrate performance problem with trie library.
;;;
(import (scheme base)
(scheme file)
(scheme write)
;; Depending on Scheme implementation used, you may have to
;; un-comment one of the following.
;;(chibi time)
;;(gauche time)
)
(define alphabet-size 256)
;; In a trie, each node consists of an "end-of-word" marker and
;; and a vector of pointers to child nodes.
(define-record-type trie
(mk-trie c end-of-word? children)
trie?
(c trie-c trie-c-set!)
(end-of-word? trie-end-of-word? trie-end-of-word-set!)
(children trie-children trie-children-set!))
;; Return a new, initialized trie node.
(define (trie-new-node)
(let ((node (mk-trie #\null #f (make-vector alphabet-size '()))))
node))
;; Return the child node associated with the index into the child vector.
(define (child-for-index node idx)
(vector-ref (trie-children node) idx))
;; Return the child node associated with the character in the child vector.
(define (child-for-char node c)
(child-for-index node (char->integer c)))
;; Add a word to the trie node.
(define (trie-add-word! node word)
(let loop ((crawler node)
(lst (string->list word)))
(if (null? lst)
(trie-end-of-word-set! crawler #t)
(let* ((c (car lst))
(idx (char->integer c)))
(when (null? (child-for-index crawler idx))
(let ((new-node (trie-new-node)))
(trie-c-set! new-node c)
(vector-set! (trie-children crawler) idx new-node)))
(loop (child-for-index crawler idx) (cdr lst))))))
;; Convenience function to convert a vector index to the corresponding
;; letter and append it to the string.
(define (append-char-for-index s i)
(string-append s (string (integer->char i))))
;; Do an in-order traversal of the tree and, for each complete
;; word, invoke the callback function with the word.
(define (trie-process-helper node s callback)
(when (trie-end-of-word? node)
(callback s))
(let loop ((idx 0))
(when (< idx alphabet-size)
(let ((child (child-for-index node idx)))
(when (not (null? child))
(trie-process-helper child (append-char-for-index s idx) callback))
(loop (+ 1 idx))))))
;; Return the number of words stored in the trie.
(define (trie-count-words node)
(let* ((cntr 0)
(count-fn (lambda (s) (set! cntr (+ 1 cntr)))))
(trie-process-helper node "" count-fn)
cntr))
;; A dedicated word-counting procedure. Uses no "helper" function
;; and does not contain code to generalize to additional abilities.
(define (faster-counter node)
(letrec ((word-count 0)
(cf (lambda (nd)
(when (trie-end-of-word? nd)
(set! word-count (+ 1 word-count)))
(let loop ((idx 0))
(when (< idx alphabet-size)
(let ((child (child-for-index nd idx)))
(when (not (null? child))
(cf child))
(loop (+ 1 idx))))))))
(cf node)
word-count))
;; Return an ordered list of lines in the named file.
(define (readlines filename)
(call-with-input-file filename
(lambda (p)
(let loop ((line (read-line p))
(result '()))
(if (eof-object? line)
result
;;(reverse result)
(loop (read-line p) (cons line result)))))))
(let ((root (trie-new-node)))
;; Open the dictionary file, containing one word per line, and
;; build a trie from the contents.
(define (trie-build-dictionary! word-list-file-name root-node)
(display "Enter 'trie-build-dictionary!") (newline)
(with-input-from-file word-list-file-name
;; Process the file by adding words from the
;; file, one word per line, to the trie root-node.
(lambda ()
(let loop ((line (read-line (current-input-port))))
(when (not (eof-object? line))
(trie-add-word! root-node line)
(loop (read-line (current-input-port)))))))
(display "Exit 'trie-build-dictionary!") (newline))
(time "Build dictionary"
;; Use the small dictionary file for debugging.
;;(trie-build-dictionary! "./word-lists/subset_list.txt" root))
(trie-build-dictionary! "./word-lists/ENABLE_word_list.txt" root))
(display "Done loading dictionary. Counting...") (newline)
(time "Counting words"
(begin
(display "Words in trie: ") (display (trie-count-words root))
(newline)))
(time "Fast counting words"
(begin
(display "Fast count of words: ")
(display (faster-counter root)) (newline))))正如我提到的,运行这段代码几乎是瞬间的,它是用Chez编写的程序的启动例程的一部分。但在赤壁我得到了:
;; Run with chibi
➜ david in trees chibi-scheme trie-problem.scm
Enter 'trie-build-dictionary!
Exit 'trie-build-dictionary!
Build dictionary: 356168 ms
Done loading dictionary. Counting...
Words in trie: 172820
Counting words: 238451 ms
Fast count of words: 172820
Fast counting words: 30125 ms(2017年iMac Pro运行macOS 13.3 x86)
我已经看到,Chibi的开发人员不像开发工作的其他方面那样关心性能,但我仍然对这种差异感到惊讶。六分钟来加载文件并构建基于trie的字典。用四分钟数字典中的字数。
此代码使用trie-count-words过程进行计数。与完整库中的其他几个过程一样,trie-count-words将真正的工作推送给了“助手”trie-process-helper。这个过程需要两个参数来指定它的行为。在这种情况下,只使用一个。当找到一个单词时,它只增加一个计数器。
考虑到广义处理例程可能不是最优的,我添加了第二个专门用于计数的过程,faster-counter。结果显示速度显著提高,但仍需30秒进行计数。
也许赤壁不是最好的选择?因此,我尝试使用其他几个R7RS实现,Gauche方案和小鸡方案。以下是一些结果。
;; Run with the Gauche interpreter
➜ david in trees gosh -r7 trie-problem.scm
Enter 'trie-build-dictionary!
Exit 'trie-build-dictionary!
;(time "Build dictionary" (trie-build-dictionary! "./word-lists/ENABLE_w"...
; real 1.554
; user 2.080
; sys 0.460
Done loading dictionary. Counting...
Words in trie: 172820
;(time "Counting words" (begin (display "Words in trie: ") (display (tri ...
; real 8.902
; user 8.880
; sys 0.030
Fast count of words: 172820
;(time "Fast counting words" (begin (display "Fast count of words: ") (d ...
; real 8.750
; user 8.750
; sys 0.000;; Run with Chicken interpreter
➜ david in trees csi -R r7rs -q -batch trie-problem.scm
Enter 'trie-build-dictionary!
Exit 'trie-build-dictionary!
5.032s CPU time, 2.217s GC time (major), 1866429/163502 mutations (total/tracked), 12/29010 GCs (major/minor), maximum live heap: 775.96 MiB
Done loading dictionary. Counting...
Words in trie: 172820
57.799s CPU time, 0.182s GC time (major), 16193729/2250046 mutations (total/tracked), 1/786642 GCs (major/minor), maximum live heap: 775.95 MiB
Fast count of words: 172820
56.73s CPU time, 0.184s GC time (major), 15785834/2246632 mutations (total/tracked), 1/784933 GCs (major/minor), maximum live heap: 775.95 MiB;; Run with Chicken compiler
➜ david in trees csc -R r7rs trie-problem.scm
➜ david in trees ./trie-problem
Enter 'trie-build-dictionary!
Exit 'trie-build-dictionary!
2.584s CPU time, 1.872s GC time (major), 387893/25139 mutations (total/tracked), 10/3388 GCs (major/minor), maximum live heap: 775.67 MiB
Done loading dictionary. Counting...
Words in trie: 172820
4.948s CPU time, 387884/135 mutations (total/tracked), 0/73428 GCs (major/minor), maximum live heap: 775.67 MiB
Fast count of words: 172820
4.773s CPU time, 4/0 mutations (total/tracked), 0/68512 GCs (major/minor), maximum live heap: 337.52 KiB再次,好得多,但仍然不是很好的Chez。
不过也有一些奇怪的地方。“更快”的计票程序几乎没有什么区别。(也许更好的优化者?)此外,计数过程比加载和构建字典花费更长的时间。这似乎不对。
问题是,是否有更好的方法来做到这一点?我没有想到的更快的方法?
谢谢你的帮助。
发布于 2023-03-30 16:08:30
你想知道慢下来是怎么来的。
(let ((child (child-for-index nd idx)))
(when (not (null? child))
(cf child))
(loop (+ 1 idx))))))))我们能不能把那个(incf idx)移开,这样它就不会在尽头了?也许通过使idx成为递归cf调用的一个参数?
我推测您的R6编译器巧妙地找到了一些代码动作来启用TCO,而R7编译器却没有跳过太多的循环。
检查生成的程序集代码将确认或拒绝这一假设。
发布每行CPU分析的比较也会有帮助。
tl;dr:“慢”代码似乎是递归的,而不是尾递归的。
发布于 2023-04-06 15:58:16
在审查了这个问题并提出了更多的问题后,我想到Chez实现的时间并没有显示出来。我试着用Chez写一个问题的版本来收集这些时间。
;; Run with Chez
➜ david in trees chez --script trie-problem.ss
Building dictionary...
(time (trie-build-dictionary! "./word-lists/ENABLE_word_list.txt" ...))
101 collections
1.337951594s elapsed cpu time, including 1.218552223s collecting
1.337968000s elapsed real time, including 1.218811000s collecting
847145312 bytes allocated, including 40059392 bytes reclaimed
Counting words...
Words in trie: 172820
(time (begin (display "Words in trie: ") ...))
3 collections
0.561774181s elapsed cpu time, including 0.008546659s collecting
0.561770000s elapsed real time, including 0.008564000s collecting
24809104 bytes allocated, including 23145280 bytes reclaimed这比预期的要慢一些。部分原因是用于示例程序的alphabet-size是最初用Chez编写的程序的两倍大。所以,有一些表演。
最初实现的Trie数据结构的一个问题是,它占用了大量内存,其中很多内存通常是空的。而指定字母表大小的需要是丑陋的。
因此,我使用SRFI 69中的哈希表重新实现了完整的库(以及这里显示的示例程序)来存储子库。因此,Trie中使用的字母表没有硬连线限制,所使用的内存取决于所存储的字符,而不是用于保存指向孩子的指针的固定长度向量。
这里是原始问题中提供的示例的一个版本,但是使用哈希表而不是向量来存储指向当前节点的子节点的指针。
;;;
;;; try-hash-problem.scm -- illustrate performance problem with trie library.
;;; This version is re-worked to use a hash-table as the basis for the
;;; trie children rather than a vector. Also eliminates the need to
;;; hard-wire an alphabet size into the implementation.
;;;
(import (scheme base)
(scheme file)
(scheme write)
(srfi 69))
(cond-expand
(gauche (import (gauche time)))
(chibi (import (chibi time)))
(else))
;; In a trie, each node consists of an "end-of-word" marker and
;; a hash-table of pointers to child nodes.
(define-record-type trie
(mk-trie c end-of-word? children)
trie?
(c trie-c trie-c-set!)
(end-of-word? trie-end-of-word? trie-end-of-word-set!)
(children trie-children trie-children-set!))
;; Return a new, initialized trie node.
(define (trie-new-node)
(let ((node (mk-trie #\null #f (make-hash-table))))
node))
;; Add a word to the trie.
(define (trie-add-word! trie word)
(let loop ((node trie)
(chars (string->list word)))
(cond ((null? chars)
(trie-end-of-word-set! node #t))
(else
(let* ((c (car chars))
(children (trie-children node))
(child (hash-table-ref/default children c #f)))
(unless child
(set! child (trie-new-node))
(hash-table-set! children c child))
(loop child (cdr chars)))))))
;; Return the number of words stored in the trie.
(define (trie-count-words trie)
(let ((children (trie-children trie)))
(hash-table-fold
children
(lambda (k v count)
(+ count (trie-count-words v)))
(if (trie-end-of-word? trie) 1 0))))
;; Open the dictionary file, containing one word per line, and
;; build a trie from the contents.
(define (trie-build-dictionary! word-list-file-name root-node)
(with-input-from-file word-list-file-name
;; Process the file by adding words from the
;; file, one word per line, to the trie root-node.
(lambda ()
(let loop ((line (read-line (current-input-port))))
(when (not (eof-object? line))
(trie-add-word! root-node line)
(loop (read-line (current-input-port))))))))
;; Do some timing on building a trie from a file of words, then
;; counting the words in the trie.
(let ((root (trie-new-node)))
(display "Building dictionary...") (newline)
(time
;; Use the small dictionary file for debugging.
;;(trie-build-dictionary! "./word-lists/subset_list.txt" root))
(trie-build-dictionary! "./word-lists/ENABLE_word_list.txt" root))
(display "\nCounting words...") (newline)
(time
(begin
(display "Words in trie: ")
(display (trie-count-words root))
(newline))))所需的更改几乎是微不足道的,甚至在更完整的库版本(完整库中更多的过程)中也是如此。
运行这一计划显示,所收集的时间有了很大的改善。在测试的实现中,Gauche解释器报告了最佳性能:
;; Run with Gauche
➜ david in trees gosh -r7 try-hash-problem.scm
Building dictionary...
;(time (trie-build-dictionary! "./word-lists/ENABLE_word_list.txt" root))
; real 0.705
; user 1.890
; sys 0.060
Counting words...
Words in trie: 172820
;(time (begin (display "Words in trie: ") (display (trie-count-words roo ...
; real 0.258
; user 0.590
; sys 0.010例子中的问题也是行数减少。
还请注意,trie-count-words过程已被重新处理以使用一种类型的折叠,即SRFI中的hash-table-fold。
有了这些更改,一些实现的时间就进入了范围,从而使库再次有用。
https://codereview.stackexchange.com/questions/284236
复制相似问题