This double loop is 50 times slower in Chez Scheme than in C++ (compiled with --optimize-level 3 and -O3, respectively):
(import
  (rnrs)
  (rnrs r5rs))

(let* ((n (* 1024 16))
       (a (make-vector n))
       (acc 0))
  (do ((i 0 (+ i 1)))
      ((= i n) #f)
    (vector-set! a i (cons (cos i) (sin i))))
  (do ((i 0 (+ i 1)))
      ((= i n) #f)
    (do ((j 0 (+ j 1)))
        ((= j n) #f)
      (let ((ai (vector-ref a i))
            (aj (vector-ref a j)))
        (set! acc (+ acc (+ (* (car ai) (cdr aj))
                            (* (cdr ai) (car aj))))))))
  (write acc)
  (newline))
(exit)

vs.

#include <iostream>
#include <cmath>
#include <vector>
#include <utility>   // std::pair
#include <algorithm>

typedef std::pair<double, double> pr;
typedef std::vector<pr> vec;

double loop(const vec& a)
{
    double acc = 0;
    const int n = a.size();
    for(int i = 0; i < n; ++i)
        for(int j = 0; j < n; ++j)
        {
            const pr& ai = a[i];
            const pr& aj = a[j];
            acc += ai.first * aj.second +
                   ai.second * aj.first;
        }
    return acc;
}

int main()
{
    const int n = 1024 * 16;
    vec v(n);
    for(int i = 0; i < n; ++i)
        v[i] = pr(std::cos(i), std::sin(i));
    std::cout << loop(v) << std::endl;
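}

I realize there is more memory indirection in Scheme than in C++, but the size of the performance gap is still surprising.
Is there an easy way to speed up the Scheme version (without changing the memory layout to a completely uniform one)?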
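To make the "memory indirection" point concrete, here is a C++ sketch that mimics one plausible Scheme layout: a vector of pointers to pairs whose fields point to individually heap-allocated doubles. The names `BoxedPair`, `make_boxed` and `loop_boxed` are hypothetical. It assumes flonums stored in pairs are boxed, which is common in Scheme implementations but not guaranteed, and it models only the extra pointer chasing, not any allocation of intermediate results the Scheme version may perform.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical model of a Scheme pair holding two boxed flonums:
// each double lives in its own heap allocation.
struct BoxedPair {
    std::unique_ptr<double> car;
    std::unique_ptr<double> cdr;
};

// A Scheme vector holds references to pairs, modeled here as pointers.
using BoxedVec = std::vector<std::unique_ptr<BoxedPair>>;

// Fill the vector the way the Scheme code does: a[i] = (cos i . sin i).
BoxedVec make_boxed(int n) {
    BoxedVec a;
    a.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) {
        auto p = std::make_unique<BoxedPair>();
        p->car = std::make_unique<double>(std::cos(i));
        p->cdr = std::make_unique<double>(std::sin(i));
        a.push_back(std::move(p));
    }
    return a;
}

// Same double loop as loop() in the question, but every number is reached
// by chasing pointers: vector slot -> pair -> boxed double.
double loop_boxed(const BoxedVec& a) {
    double acc = 0;
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            const BoxedPair& ai = *a[i];
            const BoxedPair& aj = *a[j];
            acc += *ai.car * *aj.cdr + *ai.cdr * *aj.car;
        }
    return acc;
}
```

Timing `loop_boxed(make_boxed(n))` against the question's `loop(v)` gives a rough feel for how much of the gap is due to layout alone, separate from the type checks discussed in the answer.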
Posted on 2018-09-27 16:23:56
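So while these programs look the same, they are not. The C++ version uses fixed-size machine arithmetic (`int` counters and `double` values), while the Scheme version uses the standard numeric tower, where `+` and `*` must accept any kind of number. To make the C++ version behave more like the Scheme one, try doing the calculations with a bignum library.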
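As a test, I replaced the arithmetic with operations from (rnrs arithmetic flonums) and (rnrs arithmetic fixnums), and that halved the execution time in DrRacket. I would expect the same to happen in any implementation.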
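Why the flonum-specific operations help can be sketched with a deliberately simplified C++ model of tagged Scheme numbers. Everything here (`Number`, `generic_add`, `fl_add`) is hypothetical: a real numeric tower also has bignums, exact rationals and complex numbers, and fixnum overflow is ignored. The generic `+` has to inspect both tags and choose a result kind on every call; an `fl+`-style operation only verifies that both arguments are flonums and then does a plain double addition, which is all the C++ loop ever does.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical, heavily simplified Scheme number: a run-time tag plus payload.
struct Number {
    enum class Tag { Fixnum, Flonum } tag;
    std::int64_t fix;  // meaningful only when tag == Tag::Fixnum
    double flo;        // meaningful only when tag == Tag::Flonum
};

Number make_fixnum(std::int64_t v) { return {Number::Tag::Fixnum, v, 0.0}; }
Number make_flonum(double v) { return {Number::Tag::Flonum, 0, v}; }

double as_double(const Number& x) {
    return x.tag == Number::Tag::Fixnum ? static_cast<double>(x.fix) : x.flo;
}

// Generic +, in the spirit of Scheme's +: both tags are inspected on every
// call, and exact + exact stays exact while any inexact argument gives a flonum.
Number generic_add(const Number& a, const Number& b) {
    if (a.tag == Number::Tag::Fixnum && b.tag == Number::Tag::Fixnum)
        return make_fixnum(a.fix + b.fix);  // overflow to bignum not modeled
    return make_flonum(as_double(a) + as_double(b));
}

// Flonum-only +, in the spirit of R6RS fl+: no dispatch over number kinds,
// just one check that both arguments are flonums, then a plain double add.
Number fl_add(const Number& a, const Number& b) {
    if (a.tag != Number::Tag::Flonum || b.tag != Number::Tag::Flonum)
        throw std::invalid_argument("fl_add: expected two flonums");
    return make_flonum(a.flo + b.flo);
}
```

The check in `fl_add` also shows why the edited Scheme code feeds `cos` and `sin` an inexact counter `f`: for an exact argument an implementation may return an exact result (Racket, for example, returns exact 1 for `(cos 0)`), which `fl+` and `fl*` would reject.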
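My initial tests showed the C++ code running about 25 times faster, not the 50 times you reported. After switching to floating-point arithmetic, it was only about 15 times faster.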
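I thought I could make it faster still by using unsafe procedures, because Scheme checks the type of every argument at run time, before each procedure performs its operation, and that does not happen in the C++ version. As a test, I changed the code to use my implementation's unsafe procedures, and now it is only 10 times slower.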
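Hope it helps in Chez as well :) Note, though, that Chez Scheme's --optimize-level 3, which the question already uses, generates unsafe code, so many of the checks that the unsafe procedures remove may already be gone there.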
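The safe/unsafe distinction has a rough C++ analogue: `std::vector::at` checks the index on every call and throws `std::out_of_range`, while `operator[]` performs no check at all. In C++ the static types already rule out a non-vector or non-pair argument, so only the index check remains, whereas a safe Scheme `vector-ref` or `car` also checks the argument's type. The sketch below shows both flavours of the question's loop; `loop_checked` and `loop_unchecked` are hypothetical names, and the `pr`/`vec` typedefs are reused from the question. It illustrates what gets checked, not a benchmark: an optimizing compiler can often prove these particular bounds checks redundant and remove them.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

typedef std::pair<double, double> pr;
typedef std::vector<pr> vec;

// Checked flavour, like the safe procedures: vector::at verifies
// index < size() on every access and throws std::out_of_range otherwise.
double loop_checked(const vec& a) {
    double acc = 0;
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            const pr& ai = a.at(i);
            const pr& aj = a.at(j);
            acc += ai.first * aj.second + ai.second * aj.first;
        }
    return acc;
}

// Unchecked flavour, like unsafe-vector-ref: operator[] trusts the caller,
// and an out-of-range index is undefined behaviour.
double loop_unchecked(const vec& a) {
    double acc = 0;
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            const pr& ai = a[i];
            const pr& aj = a[j];
            acc += ai.first * aj.second + ai.second * aj.first;
        }
    return acc;
}
```

For `vec{pr(1.0, 2.0), pr(3.0, 4.0)}` both functions return 48 (4 + 10 + 10 + 24); the difference only shows up when an index is wrong, where `at` throws and `operator[]` silently misbehaves.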
EDIT
Here is my modified source, which makes it twice as fast:
#!r6rs
(import
  (rnrs)
  ;; import fl+ and fl*, versions of + and * that only work on flonums
  ;; (faster, but they still check their arguments)
  (only (rnrs arithmetic flonums) fl+ fl*))

(let* ((n (* 1024 16))
       (a (make-vector n))
       (acc 0.0)) ; we want a float, so let's tell Scheme about that!
  ;; using inexact f instead of integer i
  ;; makes every result of cos and sin inexact
  (do ((i 0 (+ i 1))
       (f 0.0 (+ f 1)))
      ((= i n) #f)
    (vector-set! a i (cons (cos f) (sin f))))
  (do ((i 0 (+ i 1)))
      ((= i n) #f)
    (do ((j 0 (+ j 1)))
        ((= j n) #f)
      (let ((ai (vector-ref a i))
            (aj (vector-ref a j)))
        ;; use the float versions of + and *
        ;; since this is where most of the time is spent
        (set! acc (fl+ acc
                       (fl+ (fl* (car ai) (cdr aj))
                            (fl* (cdr ai) (car aj))))))))
  (write acc)
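  (newline))

The following implementation-specific (lock-in) Racket version is only meant to show that the type checks done at run time really do matter; it runs 30% faster than the previous optimization: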
#lang racket
;; this imports unsafe versions of primitives such as fl+, fl*,
;; vector-ref, car and cdr (unsafe-fl+, unsafe-fl*, etc.)
(require racket/unsafe/ops)

(let* ((n (* 1024 16))
       (a (make-vector n))
       (acc 0.0)) ; we want a float, so let's tell Scheme about that!
  (do ((i 0 (+ i 1))
       (f 0.0 (+ f 1)))
      ((= i n) #f)
    ;; using inexact f instead of integer i
    ;; makes every result of cos and sin inexact
    (vector-set! a i (cons (cos f) (sin f))))
  (do ((i 0 (+ i 1)))
      ((= i n) #f)
    (do ((j 0 (+ j 1)))
        ((= j n) #f)
      ;; we guarantee the argument is a vector,
      ;; so nothing can go wrong with the unsafe accessors
      (let ((ai (unsafe-vector-ref a i))
            (aj (unsafe-vector-ref a j)))
        ;; use the unsafe float versions of + and *
        ;; since this is where most of the time is spent;
        ;; also use unsafe car/cdr, as we guarantee the argument is
        ;; a pair
        (set! acc (unsafe-fl+ acc
                              (unsafe-fl+ (unsafe-fl* (unsafe-car ai) (unsafe-cdr aj))
                                          (unsafe-fl* (unsafe-cdr ai) (unsafe-car aj))))))))
  (write acc)
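  (newline))

I tried to keep the style of the original code, so this is not very idiomatic Scheme. For example, I would not use set! at all, but that does not affect the speed.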
https://stackoverflow.com/questions/52528682