I am trying to learn parallel computing in Julia and am running this code (found here).
using Distributed
addprocs(2, exeflags="--project=.")
@everywhere begin
    using Distributed
    using StatsBase
    using BenchmarkTools
end
data = rand(1000,2000)
@everywhere function t2(d1,d2)
    append!(d1,d2)
    d1
end
@btime begin
    res = @distributed (t2) for col = 1:size(data)[2]
        [(myid(), col, StatsBase.mean(data[:,col]))]
    end
end

The result on my laptop with 4 cores and 8 threads (2.21 GHz) is:
11.836 ms (182 allocations: 78.06 KiB)

But when I try to scale up by adding 2 more workers, the time does not seem to improve:
addprocs(2, exeflags="--project=.")
nworkers() # result 4
@everywhere begin
    using Distributed
    using StatsBase
    using BenchmarkTools
end
data = rand(1000,2000)
@everywhere function t2(d1,d2)
    append!(d1,d2)
    d1
end
@btime begin
    res = @distributed (t2) for col = 1:size(data)[2]
        [(myid(), col, StatsBase.mean(data[:,col]))]
    end
end

The final timing is:
15.449 ms (340 allocations: 132.34 KiB)

Do you know what I am doing wrong? Thanks!

1 answer: @distributed seems to work, function return is wonky
Posted on 2021-05-06 22:01:40
Below is an example using GroupedDataFrames:
using DataFrames
using CSV
function main(input_file::String, output_file::String, by_index::Array{Symbol,1})
    data = DataFrame(CSV.File(input_file))
    grouped_rows = groupby(data, by_index)
    Threads.@threads for group in collect(SubDataFrame, grouped_rows)
        index_value = group[1, by_index]
        println(index_value)
        # compute slow function for group of rows in dataframe
        output_vector = costly_function(group)
        # copy vector to elements in the dataframe
        group[:, :p] .= output_vector
    end
    CSV.write(output_file, data)
end

https://stackoverflow.com/questions/67360088
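The `main` above relies on `costly_function`, which the answer leaves undefined. A minimal, self-contained sketch of the same `Threads.@threads`-over-groups pattern, with a hypothetical stand-in (`slow_score`) for the slow per-group computation and an in-memory `DataFrame` instead of a CSV file:

```julia
using DataFrames

# Hypothetical stand-in for the answer's undefined costly_function:
# returns one value per row of the group (here, the group mean of column :x).
slow_score(g::SubDataFrame) = fill(sum(g.x) / nrow(g), nrow(g))

df = DataFrame(id = repeat(1:4, inner = 5), x = rand(20), p = zeros(20))
grouped = groupby(df, [:id])

# Each iteration handles one group; writing through the SubDataFrame views
# mutates the parent df in place, as in the answer's code.
Threads.@threads for group in collect(SubDataFrame, grouped)
    group[:, :p] .= slow_score(group)
end
```

This pattern is safe as long as each iteration writes only to its own group's rows, so no two threads touch the same cells.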
复制相似问题
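Returning to the question's timings: a plausible explanation (an assumption, not a confirmed diagnosis) is that each iteration only averages 1000 numbers, while `data` lives on the master process, so the closure captured by `@distributed` must be shipped to every worker; the communication and reduction overhead then dominates, and adding workers cannot help. One way to test this hypothesis is to put the matrix in shared memory with `SharedArrays` (single-machine only), so workers read it without copying:

```julia
using Distributed
addprocs(2)

@everywhere using SharedArrays, Statistics

# A SharedArray lives in shared memory; serializing it to a worker only
# sends a lightweight handle, not the underlying data.
data = SharedArray{Float64}(1000, 2000)
data .= rand(1000, 2000)

# (vcat) as the reducer concatenates the per-worker result vectors,
# playing the same role as t2 in the question.
res = @distributed (vcat) for col = 1:size(data, 2)
    [(myid(), col, mean(view(data, :, col)))]
end
```

Even so, for work this cheap per iteration, a plain serial loop or `Threads.@threads` may well beat distributed workers.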