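I'm new to concurrent/parallel programming in general. To try out (and hopefully see) the performance benefits of goroutines, I wrote a small test program that simply generates 100 million random ints: first in a single goroutine, and then in as many goroutines as runtime.NumCPU() reports.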
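However, using more goroutines always performs worse than using a single one. I assume I'm missing something important in the design of my program or in the way I use goroutines/channels/other Go features. Any feedback is much appreciated.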
I've attached the code below.
package main

import "fmt"
import "time"
import "math/rand"
import "runtime"

func main() {
	// Figure out how many CPUs are available and tell Go to use all of them
	numThreads := runtime.NumCPU()
	runtime.GOMAXPROCS(numThreads)

	// Number of random ints to generate
	var numIntsToGenerate = 100000000
	// Number of ints to be generated by each spawned goroutine thread
	var numIntsPerThread = numIntsToGenerate / numThreads
	// Channel for communicating from goroutines back to main function
	ch := make(chan int, numIntsToGenerate)

	// Slices to keep resulting ints
	singleThreadIntSlice := make([]int, numIntsToGenerate, numIntsToGenerate)
	multiThreadIntSlice := make([]int, numIntsToGenerate, numIntsToGenerate)

	fmt.Printf("Initiating single-threaded random number generation.\n")
	startSingleRun := time.Now()
	// Generate all of the ints from a single goroutine, retrieve the expected
	// number of ints from the channel and put in target slice
	go makeRandomNumbers(numIntsToGenerate, ch)
	for i := 0; i < numIntsToGenerate; i++ {
		singleThreadIntSlice = append(singleThreadIntSlice, (<-ch))
	}
	elapsedSingleRun := time.Since(startSingleRun)
	fmt.Printf("Single-threaded run took %s\n", elapsedSingleRun)

	fmt.Printf("Initiating multi-threaded random number generation.\n")
	startMultiRun := time.Now()
	// Run the designated number of goroutines, each of which generates its
	// expected share of the total random ints, retrieve the expected number
	// of ints from the channel and put in target slice
	for i := 0; i < numThreads; i++ {
		go makeRandomNumbers(numIntsPerThread, ch)
	}
	for i := 0; i < numIntsToGenerate; i++ {
		multiThreadIntSlice = append(multiThreadIntSlice, (<-ch))
	}
	elapsedMultiRun := time.Since(startMultiRun)
	fmt.Printf("Multi-threaded run took %s\n", elapsedMultiRun)
}

func makeRandomNumbers(numInts int, ch chan int) {
	source := rand.NewSource(time.Now().UnixNano())
	generator := rand.New(source)
	for i := 0; i < numInts; i++ {
		ch <- generator.Intn(numInts * 100)
	}
}

Posted on 2017-01-13 10:51:49
First, let's fix and optimize a few things in your code:
Since Go 1.5, GOMAXPROCS defaults to the number of available CPU cores, so there is no need to set it (although doing so does no harm).
The numbers to generate:
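var numIntsToGenerate = 100000000
var numIntsPerThread = numIntsToGenerate / numThreads

Suppose numThreads is 3. Because of integer division, the goroutines in the multi-goroutine run would then generate fewer numbers in total than numIntsToGenerate, and the collecting loop would wait forever for the missing ones. So let's correct that: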
numIntsToGenerate = numIntsPerThread * numThreads

There's no need for a buffer that holds 100 million values. Reduce it to something reasonable (e.g. 1000):
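ch := make(chan int, 1000)

If you want to use append(), the slices you create should have zero length (and the proper capacity). Otherwise append adds the new values after the numIntsToGenerate zeros that make has already put into the slice: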
singleThreadIntSlice := make([]int, 0, numIntsToGenerate)
multiThreadIntSlice := make([]int, 0, numIntsToGenerate)

In your case, though, you don't need append at all. Only one goroutine collects the results, so you can simply use indexing and create the slices like this:
singleThreadIntSlice := make([]int, numIntsToGenerate)
multiThreadIntSlice := make([]int, numIntsToGenerate)

And when collecting the results:
for i := 0; i < numIntsToGenerate; i++ {
	singleThreadIntSlice[i] = <-ch
}
// ...
for i := 0; i < numIntsToGenerate; i++ {
	multiThreadIntSlice[i] = <-ch
}

OK, the code is better now. If you run it, though, you'll still find that the multi-goroutine version is slower. Why is that?
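The reason is that controlling and synchronizing multiple goroutines and collecting their results has a real overhead. If each goroutine's tasks are tiny, that communication overhead dominates, and you lose performance overall.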
That is exactly your situation: once a rand.Rand has been set up, generating a single random number is quite fast.
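One way to see how uneven these costs are is to time the two pieces separately. The sketch below compares n bare Intn calls with passing n plain ints through a buffered channel from one goroutine to another. The helper names timeIntn and timeChannel are illustrative and do not come from the answer. Absolute numbers depend on hardware and Go version. On typical machines the channel handoff tends to cost noticeably more per item than the Intn call, which is the imbalance described above, but measure on your own machine.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// timeIntn times n calls to Intn on a private, already-initialized generator.
func timeIntn(n int) time.Duration {
	r := rand.New(rand.NewSource(1))
	sum := 0
	start := time.Now()
	for i := 0; i < n; i++ {
		sum += r.Intn(1000)
	}
	elapsed := time.Since(start)
	_ = sum // sum only exists so the Intn results are consumed
	return elapsed
}

// timeChannel times handing n ints from one goroutine to another through a
// channel with the given buffer size. No random numbers are generated here.
// It also returns how many values were received.
func timeChannel(n, buf int) (time.Duration, int) {
	ch := make(chan int, buf)
	start := time.Now()
	go func() {
		for i := 0; i < n; i++ {
			ch <- i
		}
		close(ch)
	}()
	received := 0
	for range ch {
		received++
	}
	return time.Since(start), received
}

func main() {
	const n = 10 * 1000 * 1000
	fmt.Println("Intn only:      ", timeIntn(n))
	d, _ := timeChannel(n, 1000)
	fmt.Println("channel handoff:", d)
}
```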
Let's make your "task" big enough for the benefit of multiple goroutines to show:
// 1 million is enough now:
var numIntsToGenerate = 1000 * 1000

func makeRandomNumbers(numInts int, ch chan int) {
	source := rand.NewSource(time.Now().UnixNano())
	generator := rand.New(source)
	for i := 0; i < numInts; i++ {
		// Kill time, do some processing:
		for j := 0; j < 1000; j++ {
			generator.Intn(numInts * 100)
		}
		// and now return a single random number
		ch <- generator.Intn(numInts * 100)
	}
}

Here, for each random number we return, we first generate 1000 random numbers and throw them away (to do some computation / kill time). We do this so that each worker's computation time outweighs the communication overhead of using multiple goroutines.
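The snippets above are fragments. A minimal sketch that puts them together into one runnable program might look like the following. It is a reconstruction, not the answerer's exact code, so the timings it prints on your machine need not match the results reported next. It makes these assumptions: the helper run and the constant workPerNumber are hypothetical names; each goroutine gets its own seed offset, so that goroutines started in the same instant don't share a seed; and the results are collected by index through a 1000-element buffered channel, as recommended above.

```go
package main

import (
	"fmt"
	"math/rand"
	"runtime"
	"time"
)

// workPerNumber mirrors the answer's inner loop: throwaway Intn calls made
// before each number that is actually sent.
const workPerNumber = 1000

// makeRandomNumbers sends numInts random ints on ch, burning `work` extra
// Intn calls per number so each item costs more than its channel send.
func makeRandomNumbers(numInts, work int, seed int64, ch chan<- int) {
	generator := rand.New(rand.NewSource(seed))
	for i := 0; i < numInts; i++ {
		for j := 0; j < work; j++ {
			generator.Intn(numInts * 100)
		}
		ch <- generator.Intn(numInts * 100)
	}
}

// run starts `workers` goroutines, rounds total down to a multiple of workers
// (as the answer does) and collects the results by index in one goroutine.
func run(total, workers, work int) []int {
	perWorker := total / workers
	total = perWorker * workers // never wait for numbers nobody will send
	ch := make(chan int, 1000)
	base := time.Now().UnixNano()
	for w := 0; w < workers; w++ {
		// Offset the seed so goroutines started in the same instant differ.
		go makeRandomNumbers(perWorker, work, base+int64(w), ch)
	}
	result := make([]int, total)
	for i := range result {
		result[i] = <-ch
	}
	return result
}

func main() {
	const numIntsToGenerate = 1000 * 1000
	numThreads := runtime.NumCPU()

	start := time.Now()
	single := run(numIntsToGenerate, 1, workPerNumber)
	fmt.Printf("Single-threaded run took %s (%d ints)\n", time.Since(start), len(single))

	start = time.Now()
	multi := run(numIntsToGenerate, numThreads, workPerNumber)
	fmt.Printf("Multi-threaded run took %s (%d ints)\n", time.Since(start), len(multi))
}
```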
Running the app now gives these results on my 4-core machine:
Initiating single-threaded random number generation.
Single-threaded run took 2.440604504s
Initiating multi-threaded random number generation.
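Multi-threaded run took 987.946758ms

The multi-goroutine version runs about 2.5 times faster than the single-goroutine version. This means that if your goroutines delivered random numbers in blocks of 1000, you would see about 2.5 times faster execution than with single-goroutine generation.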
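The "2.5 times" figure is simply the ratio of the two measured run times:

```latex
\text{speedup} = \frac{t_\text{single}}{t_\text{multi}}
              = \frac{2.440604504\ \text{s}}{0.987946758\ \text{s}}
              \approx 2.47
```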
One last note:
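Your single-goroutine version also uses multiple goroutines: one to generate the numbers and one to collect the results. The collector most likely doesn't fully use a CPU core and mostly just waits for results, but two CPU cores are still in use. Let's estimate that "1.5" CPU cores are utilized. The multi-goroutine version, on the other hand, uses 4 CPU cores. As a rough estimate, 4 / 1.5 = 2.66, which is very close to the performance gain we measured.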
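A related approach, not taken from this answer, avoids per-number channel traffic altogether. You preallocate the result slice once and let each goroutine fill its own disjoint part of it, using sync.WaitGroup only to wait for completion. The sketch below follows that design. The function name fillRandom, the seed offsets, and the 10 million size (chosen to keep memory use modest) are illustrative choices, not anything the question or answers prescribe.

```go
package main

import (
	"fmt"
	"math/rand"
	"runtime"
	"sync"
	"time"
)

// fillRandom fills dst with random ints in [0, limit) using the given number
// of goroutines. Each goroutine owns a disjoint sub-slice of dst and its own
// *rand.Rand, so no channel or mutex is needed; the WaitGroup only signals
// that all goroutines have finished.
func fillRandom(dst []int, limit, workers int, seed int64) {
	if workers < 1 {
		workers = 1
	}
	chunk := (len(dst) + workers - 1) / workers // ceiling division covers any remainder
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		lo := w * chunk
		if lo >= len(dst) {
			break // fewer elements than workers: nothing left to hand out
		}
		hi := lo + chunk
		if hi > len(dst) {
			hi = len(dst)
		}
		wg.Add(1)
		go func(part []int, s int64) {
			defer wg.Done()
			r := rand.New(rand.NewSource(s)) // per-goroutine generator, distinct seed
			for i := range part {
				part[i] = r.Intn(limit)
			}
		}(dst[lo:hi], seed+int64(w))
	}
	wg.Wait()
}

func main() {
	// Smaller than the question's 100 million to keep memory use modest.
	const numIntsToGenerate = 10 * 1000 * 1000
	const limit = 1000 * 1000
	nums := make([]int, numIntsToGenerate)

	start := time.Now()
	fillRandom(nums, limit, 1, time.Now().UnixNano())
	fmt.Println("1 goroutine:      ", time.Since(start))

	start = time.Now()
	fillRandom(nums, limit, runtime.NumCPU(), time.Now().UnixNano())
	fmt.Println("NumCPU goroutines:", time.Since(start))
}
```

Each goroutine writes only to its own sub-slice and uses its own rand.Rand, so there is no shared mutable state to protect. The ceiling division for chunk also handles the case where the number of goroutines does not divide the total evenly.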
Posted on 2017-01-13 11:31:48
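If you really want to generate random numbers in parallel, each task should generate all of its numbers and return them in one go. Generating one number at a time and feeding each one into a channel is slower, because in the multi-goroutine case the channel reads and writes are what slow things down. Below is the modified code, in which each task generates the numbers it needs all at once; this performs better in the multi-goroutine case. I also use a slice of slices to collect the results from the multiple goroutines.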
package main

import "fmt"
import "time"
import "math/rand"
import "runtime"

func main() {
	// Figure out how many CPUs are available and tell Go to use all of them
	numThreads := runtime.NumCPU()
	runtime.GOMAXPROCS(numThreads)

	// Number of random ints to generate
	var numIntsToGenerate = 100000000
	// Number of ints to be generated by each spawned goroutine thread
	var numIntsPerThread = numIntsToGenerate / numThreads
	// Channel for communicating from goroutines back to main function
	ch := make(chan []int)

	fmt.Printf("Initiating single-threaded random number generation.\n")
	startSingleRun := time.Now()
	// Generate all of the ints in a single goroutine and receive them
	// from the channel as one slice
	go makeRandomNumbers(numIntsToGenerate, ch)
	singleThreadIntSlice := <-ch
	elapsedSingleRun := time.Since(startSingleRun)
	fmt.Printf("Single-threaded run took %s\n", elapsedSingleRun)

	fmt.Printf("Initiating multi-threaded random number generation.\n")
	multiThreadIntSlice := make([][]int, numThreads)
	startMultiRun := time.Now()
	// Run the designated number of goroutines, each of which generates its
	// share of the total random ints, then receive one slice per goroutine
	for i := 0; i < numThreads; i++ {
		go makeRandomNumbers(numIntsPerThread, ch)
	}
	for i := 0; i < numThreads; i++ {
		multiThreadIntSlice[i] = <-ch
	}
	elapsedMultiRun := time.Since(startMultiRun)
	fmt.Printf("Multi-threaded run took %s\n", elapsedMultiRun)

	// Use the result to avoid the "declared and not used" compile error
	fmt.Println(len(singleThreadIntSlice))
}

func makeRandomNumbers(numInts int, ch chan []int) {
	source := rand.NewSource(time.Now().UnixNano())
	generator := rand.New(source)
	// Fill a whole slice locally, then send it with a single channel operation
	result := make([]int, numInts)
	for i := 0; i < numInts; i++ {
		result[i] = generator.Intn(numInts * 100)
	}
	ch <- result
}

https://stackoverflow.com/questions/41632285