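This is Part 2 of this question; the task is to plot the data I obtained from my timing measurements.
I have never worked with Jupyter notebooks (.ipynb) before, because most people I have talked to criticize them.
There is a lot of obvious repetition. That is because I have not yet grasped how code cells work (I thought they worked independently -> variable lifetime).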
Edit: .ipynb files are a nightmare to format here, so I use:
###
# Plot [n] placeholder (indicates where the plot is)
###
###
# new cell (indicates a new cell)
###
lists = [[]]
sizes = [10, 100, 1000, 10000, 100000]
current_list = 0
with open('data.txt', 'r') as file:
    lines = file.readlines()
    for line in lines:
        line = line.strip()
        if line == "":
            lists.append([])
            current_list += 1
        elif line.startswith('[time]'):
            time = int(line.split(':')[1].strip())
            lists[current_list].append(time)
ms1_times = lists[0]
qs1_times = lists[1]
hs1_times = lists[2]
ms2_times = lists[3]
qs2_times = lists[4]
hs2_times = lists[5]
ms3_times = lists[6]
qs3_times = lists[7]
hs3_times = lists[8]
###
# new cell
###
import matplotlib.pyplot as plt
sizes = [10, 100, 1000, 10000, 100000]
ms1_times = lists[0]
ms2_times = lists[3]
ms3_times = lists[6]
plt.figure(figsize=(10, 6))
plt.plot(sizes, ms1_times, marker='o', label='MS #1')
plt.plot(sizes, ms2_times, marker='s', label='MS #2')
plt.plot(sizes, ms3_times, marker='^', label='MS #3')
plt.xlabel('Sizes')
plt.ylabel('Times [ms]')
plt.title('Comparison of Measure Sort Execution')
plt.legend()
plt.show()
###
# Plot 1 placeholder
###
###
# new cell
###
import matplotlib.pyplot as plt
sizes = [10, 100, 1000, 10000, 100000]
ms1_times = lists[1]
ms2_times = lists[4]
ms3_times = lists[7]
plt.figure(figsize=(10, 6))
plt.plot(sizes, ms1_times, marker='o', label='QS #1')
plt.plot(sizes, ms2_times, marker='s', label='QS #2')
plt.plot(sizes, ms3_times, marker='^', label='QS #3')
plt.xlabel('Sizes')
plt.ylabel('Times [ms]')
plt.title('Comparison of Quick Sort Execution')
plt.legend()
plt.show()
###
# Plot 2 placeholder
###
###
# new cell
###
import matplotlib.pyplot as plt
sizes = [10, 100, 1000, 10000, 100000]
ms1_times = lists[2]
ms2_times = lists[5]
ms3_times = lists[8]
plt.figure(figsize=(10, 6))
plt.plot(sizes, ms1_times, marker='o', label='HS #1')
plt.plot(sizes, ms2_times, marker='s', label='HS #2')
plt.plot(sizes, ms3_times, marker='^', label='HS #3')
plt.xlabel('Sizes')
plt.ylabel('Time [ms]')
plt.title('Comparison of Heap Sort Execution')
plt.legend()
plt.show()
###
# Plot 3 placeholder
###
###
# new cell
###
import matplotlib.pyplot as plt
ms_10 = ms1_times[0]
qs_10 = qs1_times[0]
hs_10 = hs1_times[0]
ms_100 = ms1_times[1]
qs_100 = qs1_times[1]
hs_100 = hs1_times[1]
ms_1000 = ms1_times[2]
qs_1000 = qs1_times[2]
hs_1000 = hs1_times[2]
x = ['10', '100', '1000']
y_ms = [ms_10, ms_100, ms_1000]
y_qs = [qs_10, qs_100, qs_1000]
y_hs = [hs_10, hs_100, hs_1000]
plt.figure(figsize=(10, 6))  # Width: 10 inches, Height: 6 inches
plt.scatter(x, y_hs, c='red', label='Heap Sort')
plt.plot(x, y_hs, c='red', linestyle='-')
plt.scatter(x, y_ms, c='blue', label='Merge Sort')
plt.plot(x, y_ms, c='blue', linestyle='--')
plt.scatter(x, y_qs, c='green', label='Quick Sort')
plt.plot(x, y_qs, c='green', linestyle='-')
plt.xlabel('Size')
plt.ylabel('Time')
plt.title('Smaller Sizes [10 - 1000]')
plt.legend()
plt.show()
###
# Blue plot and the Red plot are overlapping due to the speed of the sorting
# algorithm with small arrays (size: 10, 100, 1000)
###
###
# Plot 4 placeholder
###
###
# new cell
###
import numpy as np
import matplotlib.pyplot as plt
ms1_times = lists[0]
ms2_times = lists[3]
ms3_times = lists[6]
ms_times = [ms1_times, ms2_times, ms3_times]
qs_times = [qs1_times, qs2_times, qs3_times]
hs_times = [hs1_times, hs2_times, hs3_times]
ms_avg = np.mean(ms_times, axis=0)
qs_avg = np.mean(qs_times, axis=0)
hs_avg = np.mean(hs_times, axis=0)
sizes = [10, 100, 1000, 10000, 100000]
plt.figure(figsize=(10, 6))  # Width: 10 inches, Height: 6 inches
plt.plot(sizes, ms_avg, marker='o', markersize=4, alpha=1, label='Merge Sort')
plt.plot(sizes, qs_avg, marker='o', markersize=4, alpha=1, label='Quick Sort')
plt.plot(sizes, hs_avg, marker='o', markersize=4, alpha=1, label='Heap Sort')
plt.xlabel('Sizes')
plt.ylabel('Time')
plt.title('Averages of MS, QS, and HS')
plt.legend()
plt.show()
###
# Plot 5 placeholder
###
###
# new cell
###
import matplotlib.pyplot as plt
ms_1000 = ms1_times[2]
qs_1000 = qs1_times[2]
hs_1000 = hs1_times[2]
ms_10000 = ms1_times[3]
qs_10000 = qs1_times[3]
hs_10000 = hs1_times[3]
ms_100000 = ms1_times[4]
qs_100000 = qs1_times[4]
hs_100000 = hs1_times[4]
x = ['1000', '10000', '100000']
y_ms = [ms_1000, ms_10000, ms_100000]
y_qs = [qs_1000, qs_10000, qs_100000]
y_hs = [hs_1000, hs_10000, hs_100000]
plt.figure(figsize=(10, 6))
plt.scatter(x, y_hs, c='red', label='Heap Sort')
plt.plot(x, y_hs, c='red', linestyle='-')
plt.scatter(x, y_ms, c='blue', label='Merge Sort')
plt.plot(x, y_ms, c='blue', linestyle='-')
plt.scatter(x, y_qs, c='green', label='Quick Sort')
plt.plot(x, y_qs, c='green', linestyle='-')
plt.xlabel('Size')
plt.ylabel('Time')
plt.title('Bigger Sizes [1000 - 100000]')
plt.legend()
plt.show()
###
# Plot 6 placeholder
###
Sample data (put it in data.txt, since that is the file my script looks for in the directory):
[time]:2
[time]:31
[time]:168
[time]:2015
[time]:22956

[time]: 2
[time]: 13
[time]: 186
[time]: 2125
[time]: 24799

[time]:1
[time]:15
[time]:246
[time]:3192
[time]:41653

[time]: 2
[time]: 14
[time]: 169
[time]: 2061
[time]: 23578

[time]: 2
[time]: 12
[time]: 160
[time]: 2218
[time]: 26630

[time]: 1
[time]: 17
[time]: 253
[time]: 3367
[time]: 42713

[time]: 2
[time]: 14
[time]: 163
[time]: 1980
[time]: 22682

[time]: 2
[time]: 12
[time]: 164
[time]: 2092
[time]: 25826

[time]: 1
[time]: 15
[time]: 245
[time]: 3253
[time]: 40700
I would appreciate feedback on potential improvements and suggestions about my design choices. Next year I am going into Data Science, where Jupyter notebooks and Python are very widely used.
Posted 2023-05-30 07:51:14
Your code needs a lot of improvement.
for line in lines:
    line = line.strip()
    if line == "":
        lists.append([])
        current_list += 1
    elif line.startswith('[time]'):
        time = int(line.split(':')[1].strip())
        lists[current_list].append(time)
In the code above, you convert your sample data into:
[[2, 31, 168, 2015, 22956],
[2, 13, 186, 2125, 24799],
[1, 15, 246, 3192, 41653],
[2, 14, 169, 2061, 23578],
[2, 12, 160, 2218, 26630],
[1, 17, 253, 3367, 42713],
[2, 14, 163, 1980, 22682],
[2, 12, 164, 2092, 25826],
[1, 15, 245, 3253, 40700]]
This is a rather convoluted and inefficient way to do it.
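Observe that every group has five lines, and every number is preceded by the common prefix [time]:.
So you can do this with a list comprehension.
First split on two newlines '\n\n' to get the five-line groups, then split each group into individual lines, remove the prefix, and convert to int: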
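import numpy as np

with open('data.txt') as file:
    lines = file.read()  # the whole file as one string, not readlines()

times = np.array([
    [
        int(row.removeprefix('[time]:'))  # str.removeprefix needs Python 3.9+
        for row
        in group.splitlines()
    ]
    for group
    in lines.split('\n\n')
])
ms1_times = lists[0]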
qs1_times = lists[1]
hs1_times = lists[2]
ms2_times = lists[3]
qs2_times = lists[4]
hs2_times = lists[5]
ms3_times = lists[6]
qs3_times = lists[7]
hs3_times = lists[8]
What is the purpose of the lines above? They are completely unnecessary and serve no purpose.
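If you want to access anything in the list, just use indexing, as you already do.
Then, in the following cells, you immediately reassign those names. So even within your own script, the lines above serve no purpose.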
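To make the reassignment problem concrete, here is a small simulation. It assumes the notebook cells are run top to bottom in one kernel, where all cells share a single namespace, and it replays only the assignments to ms1_times and hs1_times. By the time the "Smaller Sizes" cell runs, ms1_times refers to lists[2], which is the same Heap Sort data as hs1_times. The blue ("Merge Sort") and red ("Heap Sort") lines in that plot would therefore coincide regardless of sorting speed.

```python
# The nine groups parsed from data.txt (values copied from the sample data).
lists = [
    [2, 31, 168, 2015, 22956],
    [2, 13, 186, 2125, 24799],
    [1, 15, 246, 3192, 41653],
    [2, 14, 169, 2061, 23578],
    [2, 12, 160, 2218, 26630],
    [1, 17, 253, 3367, 42713],
    [2, 14, 163, 1980, 22682],
    [2, 12, 164, 2092, 25826],
    [1, 15, 245, 3253, 40700],
]

# Cell 1: parsing cell
ms1_times = lists[0]
hs1_times = lists[2]
# Cell 2: Merge Sort comparison
ms1_times = lists[0]
# Cell 3: Quick Sort comparison reuses the name ms1_times
ms1_times = lists[1]
# Cell 4: Heap Sort comparison reuses it again
ms1_times = lists[2]

# Cell 5: "Smaller Sizes" reads ms1_times, which now holds Heap Sort data
y_ms = [ms1_times[0], ms1_times[1], ms1_times[2]]
y_hs = [hs1_times[0], hs1_times[1], hs1_times[2]]
print(y_ms == y_hs)  # True
```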
But judging from the variable names, you probably want to turn this:
array([[ 2, 31, 168, 2015, 22956],
[ 2, 13, 186, 2125, 24799],
[ 1, 15, 246, 3192, 41653],
[ 2, 14, 169, 2061, 23578],
[ 2, 12, 160, 2218, 26630],
[ 1, 17, 253, 3367, 42713],
[ 2, 14, 163, 1980, 22682],
[ 2, 12, 164, 2092, 25826],
[ 1, 15, 245, 3253, 40700]])
into this:
array([[[ 2, 31, 168, 2015, 22956],
[ 2, 13, 186, 2125, 24799],
[ 1, 15, 246, 3192, 41653]],
[[ 2, 14, 169, 2061, 23578],
[ 2, 12, 160, 2218, 26630],
[ 1, 17, 253, 3367, 42713]],
[[ 2, 14, 163, 1980, 22682],
[ 2, 12, 164, 2092, 25826],
[ 1, 15, 245, 3253, 40700]]])
This is easy to do.
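Using a list-comprehension-based approach:
times = [times[i:i+3] for i in range(0, len(times), 3)]
But this is inefficient. You can do it more efficiently with np.reshape:
times = times.reshape((-1, 3, 5))  # -1 lets NumPy infer the number of runs (here 3)
import matplotlib.pyplot as plt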
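The line above appears several times in the global scope. Don't do that. Once a name is imported, all subsequent code in that scope can see it, so there is no need to re-import it. Imports belong at the top of the script, as PEP 8 recommends.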
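The parsing and reshaping steps can be checked end to end in one small script, with every import at the top. This is a minimal sketch that assumes Python 3.9+ (for str.removeprefix). To stay self-contained, it reads a hypothetical inline string, SAMPLE, holding the first three groups of the sample data instead of data.txt. It also confirms that the list-comprehension chunking and np.reshape produce the same layout.

```python
import numpy as np

# Hypothetical stand-in for data.txt: the first three groups of the sample
# data (one run of MS, QS and HS), separated by blank lines.
SAMPLE = """[time]:2
[time]:31
[time]:168
[time]:2015
[time]:22956

[time]: 2
[time]: 13
[time]: 186
[time]: 2125
[time]: 24799

[time]:1
[time]:15
[time]:246
[time]:3192
[time]:41653
"""

# Split into blank-line-separated groups, strip the prefix, convert to int.
# int() ignores the leading space left by lines such as "[time]: 2".
# strip() drops the trailing newline so no empty group is produced.
times = np.array([
    [int(row.removeprefix('[time]:')) for row in group.splitlines()]
    for group in SAMPLE.strip().split('\n\n')
])
print(times.shape)  # (3, 5)

# Group rows into runs of three (MS, QS, HS): list comprehension vs reshape.
chunked = [times[i:i + 3] for i in range(0, len(times), 3)]
reshaped = times.reshape((-1, 3, 5))  # -1: NumPy infers the number of runs
print(reshaped.shape)  # (1, 3, 5)
print(np.array_equal(np.array(chunked), reshaped))  # True
```

With the full nine-group file, the same code yields a (9, 5) array and a (3, 3, 5) reshape.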
sizes = [10, 100, 1000, 10000, 100000]
ms1_times = lists[0]
ms2_times = lists[3]
ms3_times = lists[6]
plt.figure(figsize=(10, 6))
plt.plot(sizes, ms1_times, marker='o', label='MS #1')
plt.plot(sizes, ms2_times, marker='s', label='MS #2')
plt.plot(sizes, ms3_times, marker='^', label='MS #3')
plt.xlabel('Sizes')
plt.ylabel('Times [ms]')
plt.title('Comparison of Measure Sort Execution')
plt.legend()
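plt.show()
You use the structure above three times with only minor variations, so you should turn it into a function. sizes is already defined and you never change it, so there is no need to redefine it. And since it is never mutated, you should make it a global constant tuple.
Consider: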
SIZES = (10, 100, 1000, 10000, 100000)
MARKERS = ('o', 's', '^')

def plot(rows, algorithm):
    plt.figure(figsize=(10, 6))
    letter = algorithm[0].upper()
    for i, row in enumerate(rows):
        plt.plot(SIZES, row, marker=MARKERS[i], label=f'{letter}S #{i+1}')
    plt.xlabel('Sizes')
    plt.ylabel('Times [ms]')
    plt.title(f'Comparison of {algorithm} Sort Execution')
    plt.legend()
    plt.show()
Use it like this:
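plot(times[:, 0], 'Measure')
Then your second and third plots:
plot(times[:, 1], 'Quick')
plot(times[:, 2], 'Heap')
You can do all three in one loop:
algorithms = ('Measure', 'Quick', 'Heap')
for i, algorithm in enumerate(algorithms):
    plot(times[:, i], algorithm)
The next three code blocks also contain a lot of repetition and should also be refactored into functions. But I have other things to do, and if I did all of it for you, you would not learn anything. I will leave that as an exercise for you.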
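One possible shape for that refactoring follows. It is a sketch, not the answerer's solution. A single hypothetical function, plot_sizes, draws one scatter-plus-line series per algorithm for a chosen slice of size indices, which covers both the "Smaller Sizes" and "Bigger Sizes" cells. It assumes one run is a (3, 5) array with rows in MS, QS, HS order and columns following SIZES. It uses the non-interactive Agg backend and returns the Axes so it can run outside a notebook; in Jupyter you would drop the matplotlib.use line and call plt.show(). The "[ms]" unit is assumed from the question's other plots.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this in a notebook
import matplotlib.pyplot as plt
import numpy as np

SIZES = (10, 100, 1000, 10000, 100000)
ALGORITHMS = ('Merge Sort', 'Quick Sort', 'Heap Sort')
COLORS = ('blue', 'green', 'red')  # same colors as the original cells


def plot_sizes(run, indices, title):
    """Plot one run's times at the size positions in `indices`.

    `run` is a (3, 5) array: rows are MS, QS, HS; columns follow SIZES.
    """
    fig, ax = plt.subplots(figsize=(10, 6))
    x = [str(SIZES[i]) for i in indices]  # categorical x-axis, as before
    for row, name, color in zip(run, ALGORITHMS, COLORS):
        y = [row[i] for i in indices]
        ax.scatter(x, y, c=color, label=name)
        ax.plot(x, y, c=color, linestyle='-')
    ax.set_xlabel('Size')
    ax.set_ylabel('Time [ms]')
    ax.set_title(title)
    ax.legend()
    return ax


# First run from the sample data (MS, QS, HS).
run1 = np.array([
    [2, 31, 168, 2015, 22956],
    [2, 13, 186, 2125, 24799],
    [1, 15, 246, 3192, 41653],
])
ax_small = plot_sizes(run1, range(0, 3), 'Smaller Sizes [10 - 1000]')
ax_big = plot_sizes(run1, range(2, 5), 'Bigger Sizes [1000 - 100000]')
```

With the reshaped (3, 3, 5) array, times[0] plays the role of run1.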
Here are some hints for you:
If you follow my suggestions, ms1_times, qs1_times and hs1_times together are simply times[0].
ms_times = times[:, 0]
qs_times = times[:, 1]
hs_times = times[:, 2]
times[..., 2]
times[..., 3]
times[..., 4]
Do you know what the three examples above mean?
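For reference, here is a sketch of what those expressions select. It assumes times has been reshaped to (runs, algorithms, sizes) = (3, 3, 5), with MS, QS, HS in that order within each run. The Ellipsis (...) stands for all leading axes, so times[..., 2], times[..., 3] and times[..., 4] pick the size-1000, size-10000 and size-100000 columns for every run and algorithm. The same axis-based thinking also replaces the separate averaging cell: one mean over the run axis yields all three per-algorithm averages at once.

```python
import numpy as np

# The nine groups from the sample data, reshaped to (run, algorithm, size).
times = np.array([
    [2, 31, 168, 2015, 22956],
    [2, 13, 186, 2125, 24799],
    [1, 15, 246, 3192, 41653],
    [2, 14, 169, 2061, 23578],
    [2, 12, 160, 2218, 26630],
    [1, 17, 253, 3367, 42713],
    [2, 14, 163, 1980, 22682],
    [2, 12, 164, 2092, 25826],
    [1, 15, 245, 3253, 40700],
]).reshape((-1, 3, 5))

ms_times = times[:, 0]  # all Merge Sort runs, shape (3, 5)

# times[..., 2] is the same as times[:, :, 2]: size 1000 for every
# run (rows) and algorithm (columns MS, QS, HS).
size_1000 = times[..., 2]
print(size_1000.shape)  # (3, 3)
print(size_1000[0])     # [168 186 246]

# Average over the run axis: one row per algorithm, one column per size.
averages = times.mean(axis=0)
print(averages.shape)   # (3, 5)
ms_avg, qs_avg, hs_avg = averages
```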
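Posted 2023-05-30 08:30:32
Use the most appropriate tool
For some use cases, Jupyter Notebook is a great tool. It is designed to let you combine code, rich text and rendered output. It is indeed widely used in academia (in many fields, including data science), because it can be a good teaching and presentation tool.
However, the example code does not take advantage of these features. Since it has no text/Markdown cells, it would work just as well as a standard Python script (though it could be improved; see Ξένη-Γήινος's answer).
With that in mind, there are two paths you could take: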
https://codereview.stackexchange.com/questions/285242