Plotting with Python

Code Review user

Asked on 2023-05-30 03:44:48

Answers: 2, Views: 697, Followers: 0, Votes: 3

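This is part 2 of a previous question; the task is to plot the data I obtained from timing measurements.

I have never worked with Jupyter notebooks (ipynb), because most people I have talked to criticize them (in a negative way).

The heavy repetition is obvious. It is there because I have not yet grasped how code cells work (I assumed they run independently, i.e. that variables only live within their own cell).

Edit: .ipynb files are a nightmare to format here, so I use the following markers: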

###
# Plot [n] placeholder (indicated where the plot is)
###

###
# new cell (indicates new cell)
###

Code:

lists = [[]]
sizes = [10, 100, 1000, 10000, 100000]
current_list = 0

with open('data.txt', 'r') as file:
    lines = file.readlines()

for line in lines:
    line = line.strip()
    if line == "":
        lists.append([])
        current_list += 1
    elif line.startswith('[time]'):
        time = int(line.split(':')[1].strip())
        lists[current_list].append(time)

ms1_times = lists[0]
qs1_times = lists[1]
hs1_times = lists[2]
ms2_times = lists[3]
qs2_times = lists[4]
hs2_times = lists[5]
ms3_times = lists[6]
qs3_times = lists[7]
hs3_times = lists[8]

###
# new cell
###

import matplotlib.pyplot as plt

sizes = [10, 100, 1000, 10000, 100000]
ms1_times = lists[0]
ms2_times = lists[3]
ms3_times = lists[6]

plt.figure(figsize=(10, 6))

plt.plot(sizes, ms1_times, marker='o', label='MS #1')
plt.plot(sizes, ms2_times, marker='s', label='MS #2')
plt.plot(sizes, ms3_times, marker='^', label='MS #3')

plt.xlabel('Sizes')
plt.ylabel('Times [ms]')
plt.title('Comparison of Measure Sort Execution')
plt.legend()

plt.show()

###
# Plot 1 placeholder
###

###
# new cell
###

import matplotlib.pyplot as plt

sizes = [10, 100, 1000, 10000, 100000]
ms1_times = lists[1]
ms2_times = lists[4]
ms3_times = lists[7]

plt.figure(figsize=(10, 6)) 

plt.plot(sizes, ms1_times, marker='o', label='QS #1')
plt.plot(sizes, ms2_times, marker='s', label='QS #2')
plt.plot(sizes, ms3_times, marker='^', label='QS #3')

plt.xlabel('Sizes')
plt.ylabel('Times [ms]')
plt.title('Comparison of Quick Sort Execution')
plt.legend()
plt.show()

###
# Plot 2 placeholder
###

###
# new cell
###

import matplotlib.pyplot as plt

sizes = [10, 100, 1000, 10000, 100000]
ms1_times = lists[2]
ms2_times = lists[5]
ms3_times = lists[8]

plt.figure(figsize=(10, 6))

plt.plot(sizes, ms1_times, marker='o', label='HS #1')
plt.plot(sizes, ms2_times, marker='s', label='HS #2')
plt.plot(sizes, ms3_times, marker='^', label='HS #3')

plt.xlabel('Sizes')
plt.ylabel('Time [ms]')
plt.title('Comparison of Heap Sort Execution')
plt.legend()
plt.show()

###
# Plot 3 placeholder
###

###
# new cell
###

import matplotlib.pyplot as plt

ms_10 = ms1_times[0]
qs_10 = qs1_times[0]
hs_10 = hs1_times[0]

ms_100 = ms1_times[1]
qs_100 = qs1_times[1]
hs_100 = hs1_times[1]

ms_1000 = ms1_times[2]
qs_1000 = qs1_times[2]
hs_1000 = hs1_times[2]

x = ['10', '100', '1000']
y_ms = [ms_10, ms_100, ms_1000]
y_qs = [qs_10, qs_100, qs_1000]
y_hs = [hs_10, hs_100, hs_1000]

plt.figure(figsize=(10, 6))  # Width: 8 inches, Height: 6 inches

plt.scatter(x, y_hs, c='red', label='Heap Sort')
plt.plot(x, y_hs, c='red', linestyle='-')

plt.scatter(x, y_ms, c='blue', label='Merge Sort')
plt.plot(x, y_ms, c='blue', linestyle='--')

plt.scatter(x, y_qs, c='green', label='Quick Sort')
plt.plot(x, y_qs, c='green', linestyle='-')

plt.xlabel('Size')
plt.ylabel('Time')
plt.title('Smaller Sizes [1 - 100]')
plt.legend()
plt.show()

###
# Blue plot and the Red plot are overlapping due to the speed of the sorting
# algorithm with small arrays (size: 10, 100, 1000)
###

###
# Plot 4 placeholder
###

###
# new cell
###

import numpy as np
import matplotlib.pyplot as plt

ms1_times = lists[0]
ms2_times = lists[3]
ms3_times = lists[6]

ms_times = [ms1_times, ms2_times, ms3_times]
qs_times = [qs1_times, qs2_times, qs3_times]
hs_times = [hs1_times, hs2_times, hs3_times]

ms_avg = np.mean(ms_times, axis=0)
qs_avg = np.mean(qs_times, axis=0)
hs_avg = np.mean(hs_times, axis=0)

sizes = [1, 10, 100, 1000, 10000]
plt.figure(figsize=(10, 6))  # Width: 8 inches, Height: 6 inches

plt.plot(sizes, ms_avg, marker='o', markersize=4, alpha=1, label='Merge Sort')
plt.plot(sizes, qs_avg, marker='o', markersize=4, alpha=1, label='Quick Sort')
plt.plot(sizes, hs_avg, marker='o', markersize=4, alpha=1, label='Heap Sort')

plt.xlabel('Sizes')
plt.ylabel('Time')
plt.title('Averages of MS, QS, and HS')
plt.legend()
plt.show()

###
# Plot 5 placeholder
###

###
# new cell
###

import matplotlib.pyplot as plt

ms_1000 = ms1_times[2]
qs_1000 = qs1_times[2]
hs_1000 = hs1_times[2]

ms_10000 = ms1_times[3]
qs_10000 = qs1_times[3]
hs_10000 = hs1_times[3]

ms_100000 = ms1_times[4]
qs_100000 = qs1_times[4]
hs_100000 = hs1_times[4]

x = ['1000','10000', '1000000']
y_ms = [ms_1000, ms_10000, ms_100000] 
y_qs = [qs_1000, qs_10000, qs_100000]  
y_hs = [hs_1000, hs_10000, hs_100000]  

plt.figure(figsize=(10, 6))

plt.scatter(x, y_hs, c='red', label='Heap Sort')
plt.plot(x, y_hs, c='red', linestyle='-')

plt.scatter(x, y_ms, c='blue', label='Merge Sort')
plt.plot(x, y_ms, c='blue', linestyle='-')

plt.scatter(x, y_qs, c='green', label='Quick Sort')
plt.plot(x, y_qs, c='green', linestyle='-')

plt.xlabel('Size')
plt.ylabel('Time')
plt.title('Bigger Sizes [1000 - 100000]')
plt.legend()
plt.show()

###
# Plot 6 placeholder
###

Data

Note: if anyone wants to check this, please copy the following data into data.txt, because that is the file my script looks for in the directory.

[time]:2
[time]:31
[time]:168
[time]:2015
[time]:22956

[time]: 2
[time]: 13
[time]: 186
[time]: 2125
[time]: 24799

[time]:1
[time]:15
[time]:246
[time]:3192
[time]:41653

[time]: 2
[time]: 14
[time]: 169
[time]: 2061
[time]: 23578

[time]: 2
[time]: 12
[time]: 160
[time]: 2218
[time]: 26630

[time]: 1
[time]: 17
[time]: 253
[time]: 3367
[time]: 42713

[time]: 2
[time]: 14
[time]: 163
[time]: 1980
[time]: 22682

[time]: 2
[time]: 12
[time]: 164
[time]: 2092
[time]: 25826

[time]: 1
[time]: 15
[time]: 245
[time]: 3253
[time]: 40700

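I would appreciate feedback on potential improvements and suggestions on design choices. I will be moving into Data Science next year, where Jupyter notebooks and Python are very widely used.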

jupyter

python

homework

2 Answers

Code Review user

Posted on 2023-05-30 07:51:14

Your code needs a lot of improvement.

Splitting the data

for line in lines:
    line = line.strip()
    if line == "":
        lists.append([])
        current_list += 1
    elif line.startswith('[time]'):
        time = int(line.split(':')[1].strip())
        lists[current_list].append(time)

In the code above, you convert your sample into:

[[2, 31, 168, 2015, 22956],
 [2, 13, 186, 2125, 24799],
 [1, 15, 246, 3192, 41653],
 [2, 14, 169, 2061, 23578],
 [2, 12, 160, 2218, 26630],
 [1, 17, 253, 3367, 42713],
 [2, 14, 163, 1980, 22682],
 [2, 12, 164, 2092, 25826],
 [1, 15, 245, 3253, 40700]]

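This is a rather convoluted and inefficient way to do it.

Observe that each group of lines has five items, and every number is preceded by the common prefix [time]:.

So you can do it with a single list comprehension.

First split on two newlines '\n\n' to get the groups of five lines, then split each group into individual lines, remove the prefix, and convert to int: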

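import numpy as np

# Read the file as one string; readlines() returns a list, which has no .split().
with open('data.txt') as file:
    text = file.read()

times = np.array([
    [
        int(row.removeprefix('[time]:'))  # int() ignores the leading space
        for row
        in group.splitlines()
    ]
    for group
    in text.strip().split('\n\n')
])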
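As a quick check of the splitting logic, here is a minimal sketch that parses an inline sample instead of data.txt. The sample string and the helper name parse_times are assumptions made for illustration; the function returns plain nested lists so it runs without NumPy, and str.removeprefix requires Python 3.9 or later.

```python
def parse_times(text):
    """Parse blank-line-separated groups of '[time]: N' rows into lists of ints."""
    return [
        [int(row.removeprefix('[time]:')) for row in group.splitlines()]
        for group in text.strip().split('\n\n')
    ]


# Hypothetical sample: the first two groups of the data, cut to three rows each.
sample = "[time]:2\n[time]:31\n[time]:168\n\n[time]: 2\n[time]: 13\n[time]: 186\n"

groups = parse_times(sample)
print(groups)  # [[2, 31, 168], [2, 13, 186]]
```

Note that this sketch assumes the groups are separated by exactly one empty line; a separator line containing stray spaces would need an extra normalisation step.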

Regrouping the data

ms1_times = lists[0]
qs1_times = lists[1]
hs1_times = lists[2]
ms2_times = lists[3]
qs2_times = lists[4]
hs2_times = lists[5]
ms3_times = lists[6]
qs3_times = lists[7]
hs3_times = lists[8]

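What is the purpose of the lines above? They are completely unnecessary.

If you want to access anything in a list, just index it, as you already do.

In the cells that follow you immediately reassign those names, so even within your own script the lines above serve no purpose.

Judging from the variable names, though, you probably want to turn this: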

array([[    2,    31,   168,  2015, 22956],
       [    2,    13,   186,  2125, 24799],
       [    1,    15,   246,  3192, 41653],
       [    2,    14,   169,  2061, 23578],
       [    2,    12,   160,  2218, 26630],
       [    1,    17,   253,  3367, 42713],
       [    2,    14,   163,  1980, 22682],
       [    2,    12,   164,  2092, 25826],
       [    1,    15,   245,  3253, 40700]])

into this:

array([[[    2,    31,   168,  2015, 22956],
        [    2,    13,   186,  2125, 24799],
        [    1,    15,   246,  3192, 41653]],

       [[    2,    14,   169,  2061, 23578],
        [    2,    12,   160,  2218, 26630],
        [    1,    17,   253,  3367, 42713]],

       [[    2,    14,   163,  1980, 22682],
        [    2,    12,   164,  2092, 25826],
        [    1,    15,   245,  3253, 40700]]])

This is easy to do.

Using a list-comprehension-based approach:

times = [times[i:i+3] for i in range(0, len(times), 3)]

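But this is inefficient. You can use np.reshape to do it more efficiently. Passing -1 lets NumPy infer the number of runs (len(times) is the row count, 9, so dividing it by 15 would give 0):

times = times.reshape((-1, 3, 5))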
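To make the resulting layout concrete, here is a small sketch of the reshape applied to the data from the question. The variable name flat is an assumption for illustration; the axis order (run, algorithm, size) follows from the file order MS, QS, HS used in the question's own assignments.

```python
import numpy as np

# The nine groups from data.txt in file order: MS, QS, HS for run 1, then run 2, run 3.
flat = np.array([
    [2, 31, 168, 2015, 22956],
    [2, 13, 186, 2125, 24799],
    [1, 15, 246, 3192, 41653],
    [2, 14, 169, 2061, 23578],
    [2, 12, 160, 2218, 26630],
    [1, 17, 253, 3367, 42713],
    [2, 14, 163, 1980, 22682],
    [2, 12, 164, 2092, 25826],
    [1, 15, 245, 3253, 40700],
])

# -1 lets NumPy infer the first dimension: 9 rows / 3 algorithms = 3 runs.
times = flat.reshape((-1, 3, 5))

print(times.shape)            # (3, 3, 5)
print(times[0, 1].tolist())   # run 1, Quick Sort: [2, 13, 186, 2125, 24799]
```

Using -1 rather than a computed count keeps the call valid if more runs are appended to the file, as long as each run still has three groups of five measurements.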

Repeated global imports

import matplotlib.pyplot as plt

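The line above appears multiple times at global scope. Don't do that. Once a name has been imported, all subsequent code in that scope can see it, so there is no need to import the same name again. Put your imports at the top of the script, as PEP 8 recommends.

Repeated code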

sizes = [10, 100, 1000, 10000, 100000]
ms1_times = lists[0]
ms2_times = lists[3]
ms3_times = lists[6]

plt.figure(figsize=(10, 6))

plt.plot(sizes, ms1_times, marker='o', label='MS #1')
plt.plot(sizes, ms2_times, marker='s', label='MS #2')
plt.plot(sizes, ms3_times, marker='^', label='MS #3')

plt.xlabel('Sizes')
plt.ylabel('Times [ms]')
plt.title('Comparison of Measure Sort Execution')
plt.legend()

plt.show()

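You used the structure above three times with only minor variations. You should turn it into a function. sizes is already defined and you never change it, so there is no need to redefine it. And since it is never mutated, you should make it a module-level constant tuple.

Refactored: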

SIZES = (10, 100, 1000, 10000, 100000)
MARKERS = ('o', 's', '^')

def plot(rows, algorithm):
    plt.figure(figsize=(10, 6))
    letter = algorithm[0].upper()
    for i, row in enumerate(rows):
        plt.plot(SIZES, row, marker=MARKERS[i], label=f'{letter}S #{i+1}')

    plt.xlabel('Sizes')
    plt.ylabel('Times [ms]')
    plt.title(f'Comparison of {algorithm} Sort Execution')
    plt.legend()

    plt.show()

Use it like this (times is the reshaped array from above):

plot(times[:, 0], 'Measure')

Then your second and third plots:

plot(times[:, 1], 'Quick')
plot(times[:, 2], 'Heap')

You can do all three in a single loop:

algorithms = ('Measure', 'Quick', 'Heap')
for i, algorithm in enumerate(algorithms):
    plot(times[:, i], algorithm)

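More repetition

The next three code blocks also contain a lot of repetition and should likewise be refactored into functions. But I have things to do, and if I did all of this for you, you would learn nothing. I will leave it to you as an exercise.

Here are some hints:

If you follow my instructions, ms1_times, qs1_times and hs1_times are simply the three rows of times[0].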

ms_times = times[:, 0]
qs_times = times[:, 1]
hs_times = times[:, 2]

times[..., 2]
times[..., 3]
times[..., 4]

Do you know what the last three expressions above mean?
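For readers unsure about these indexing hints, a small sketch on a made-up array whose values encode their own position may help. The array demo and its digit encoding are assumptions for illustration only; the axis order (run, algorithm, size) matches the reshaped times array described earlier in this answer.

```python
import numpy as np

# Hypothetical 3x3x5 array: value = 100*run + 10*algorithm + size_index,
# so every entry shows where it sits along (run, algorithm, size).
run, algo, size = np.indices((3, 3, 5))
demo = 100 * run + 10 * algo + size

# demo[:, 0] -> algorithm 0 for every run: shape (3, 5)
print(demo[:, 0].tolist())
# [[0, 1, 2, 3, 4], [100, 101, 102, 103, 104], [200, 201, 202, 203, 204]]

# demo[..., 2] -> size index 2 for every run and algorithm: shape (3, 3)
print(demo[..., 2].tolist())
# [[2, 12, 22], [102, 112, 122], [202, 212, 222]]

# Averaging over axis 0 collapses the runs, leaving one row per algorithm.
print(demo.mean(axis=0).shape)  # (3, 5)
```

The Ellipsis (...) stands for "all leading axes", so demo[..., 2] is equivalent to demo[:, :, 2] here.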

Votes: 5

Code Review user

Posted on 2023-05-30 08:30:32

Use the most appropriate tool for the job

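For some use cases, Jupyter notebooks are a great tool. They are designed to combine code, rich text, and rendered output. They are indeed quite widely used in academia (in many fields, including data science), because they can be a great teaching and presentation tool.

However, the example code does not take advantage of these features. It would run just as well as a standard Python script (although it can be improved, see Ξένη-Γήινος's answer), since it has no text/markdown cells.

With that in mind, you can take one of two paths:

Use vanilla Python: this is the most boring path, since you can simply run the current code as a Python script.
Use the features of Jupyter notebooks: your code processes some unlabeled data and presents it as a handful of charts. You can add value to the data by explaining where it comes from, why each chart shows an interesting result, and what conclusions you can draw from those results. That is what this tool is for.

Votes: 2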

Original content provided by Code Review.

Original link:

https://codereview.stackexchange.com/questions/285242
