首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >Python包索引递归包依赖解析器

Python包索引递归包依赖解析器
EN

Code Review用户
提问于 2023-05-11 16:49:35
回答 2查看 441关注 0票数 6

我的工作有一个独立的网络,开发人员可以编写应用程序在其中运行。这些开发人员经常编写Python代码。这段Python代码通常要求使用类似于pip的工具下载(PyPi)中的模块。为了让这些开发人员在这个孤立的网络上运行他们的软件,我需要为他们提供一个PyPi镜像,以便他们能够获得他们的软件所依赖的模块。为此,我在DMZ中的VM上设置了一个班德瑞奇镜像,并将VM配置为使用此镜像,而不是使用(从他们的角度看是无法访问的) internet。到现在为止还好。

策略规定必须首先批准进入孤立网络的软件,这意味着我不能简单地对整个Python包索引做一个完整的镜像(如果我有这样的磁盘空间的话.我不确定我做了什么),让人们拥有他们想要的一切。Bandersnatch支持这个没有问题;通过将它包含在它的配置文件中,我基本上只有一个列表,这些包将显示在镜像上:

代码语言:javascript
复制
 [allowlist]
 packages =
    absl-py
    astunparse
    caproto
    confluent_kafka
    ConfigArgParse
    ess-streaming-data-types
    flatbuffers
    gast
    gpytorch
    ......

不幸的是,Bander抓举不对此包列表执行任何依赖项解析,从而允许镜像和他们似乎不太可能实现这一功能。。似乎没有任何其他的PyPi镜像服务器被设置为允许这样做,但是如果我遗漏了什么,请告诉我。

因此,我发现自己有一个Python包的列表,我需要递归地了解它们的所有依赖项,这样我就可以反映整个列表。我不是Python程序员,所以我用Perl编写了它。我觉得很管用,但我真的很想再给你一双眼睛看看。

我的代码:https://pastebin.com/hR6Yru4W

代码语言:javascript
复制
#!/usr/bin/env perl
use strict;
use warnings;

# This script is a (probably naive) attempt at writing a recursive dependency resolver for the Python package index (PyPi)
# A list of (space seperated) Python Package names provided on the command line are the starting point
# Each package in that list is added to a graph, then every node(vertex) in that graph has edges defined between the node itself
# and each of it's dependencies (which become their own nodes as a result)
# Every node in the graph is iterated through, has it's dependencies(edges) defined, and the process is repeated until 
# the graph stops changing. Once there are no more dependencies to find, it prints out a (alphabetically sorted) list
# of every package that would be required to install every package given as an argument to this script. Note that this list
# WILL INCLUDE the packages that were initially provided as part of the list.

use JSON;
use Graph;

# Our main dependency graph objects
# We need two so we can check if the graph changed between dependency runs
# When the two graphs stay the same after a dependency run, we know our graph is complete
my $Agraph = Graph->new();
my $Bgraph = Graph->new();

# Accepts a list of python packages to check dependencies for
# Returns a list of python packages the arguments depend upon
# Note that this function is NOT RECURSIVE. It ONLY provides the direct dependencies.
sub get_deps {
    my $ret = [];
    my $json = JSON->new;
    foreach my $package ( @_ ) {
        my $curl = `curl -s "https://pypi.org/pypi/$package/json"`;
        my $reqs = $json->decode($curl)->{info}->{requires_dist};
        if (defined($reqs)) {
            foreach my $dep (@$reqs) {
                $dep =~ s/[^a-zA-Z0-9-].*$//;
                push(@$ret, $dep);
            }
        }
    }
    return $ret;
}

# @ARGV is a list of packages we need to find all dependencies for and is the roots of our dependency graph
foreach my $package (@ARGV) {
    $Agraph->add_vertex($package);
}

# A list of packages we have already gotten dependnecies for so we don't check twice
my $checked = [];

# Loop until our graphs stop changing between dependency runs
while ( $Agraph ne $Bgraph ) {
    $Bgraph = $Agraph->deep_copy();

    # Check every single vertice in our graph
    my @vertlist = $Agraph->vertices;
    foreach my $package (@vertlist) {
        # If we haven't checked this package before, get it added to the graph with all of it's dependencies
        if (! grep( /^$package$/, @$checked )) {
            my $deplist = get_deps($package);
            foreach my $dep (@$deplist) {
                $Agraph->add_edge($package, $dep);
            }
            # Add this package to our list of already checked packages so we don't waste time
            push(@$checked, $package);
        }
    }
}

my @fulldeplist = sort $Agraph->vertices;
foreach my $package (@fulldeplist) {
    print "$package\n";
}
EN

回答 2

Code Review用户

回答已采纳

发布于 2023-05-11 17:51:25

代码的布局非常好,很好地使用了垂直空格和一致的缩进。它也很好地利用了评论。很高兴您使用了strictwarnings

grep行中,最好使用\Q\E来转义$package变量中的任何潜在正则表达式元字符。另见商塔

其余的建议大多是为了良好的编码风格,有些是我个人的喜好。

当我使用use模块(如JSON )时,我只喜欢导入代码所使用的内容,以避免名称空间混乱。在您的例子中,由于您使用了面向对象的接口,您可以指定一个空列表;我将在下面展示这一点。

当我有标题注释时,我喜欢将它们放在Perl 吊舱 (=head等)中。当您使用perldoc命令时,这将为您提供一个命令行手册:

代码语言:javascript
复制
perldoc script.pl

当取消引用数组时,我更喜欢使用额外的大括号:

代码语言:javascript
复制
@{ $ret }

有关更多详细信息,请参阅此评审员策略:Perl::Critic::Policy::References::ProhibitDoubleSigil

由于backticks (``)很容易被忽略,所以我更喜欢使用qx引用操作符;它在代码中更突出。

这是经过修改的代码。我还通过拼写检查器运行了您的代码,并修复了您的注释中的一些输入(dependneciesseperated)。

代码语言:javascript
复制
#!/usr/bin/env perl

use strict;
use warnings;

=head1 Description

 This script is a (probably naive) attempt at writing a recursive dependency resolver for the Python package index (PyPi)
 A list of (space separated) Python Package names provided on the command line are the starting point
 Each package in that list is added to a graph, then every node(vertex) in that graph has edges defined between the node itself
 and each of it's dependencies (which become their own nodes as a result)
 Every node in the graph is iterated through, has it's dependencies(edges) defined, and the process is repeated until 
 the graph stops changing. Once there are no more dependencies to find, it prints out a (alphabetically sorted) list
 of every package that would be required to install every package given as an argument to this script. Note that this list
 WILL INCLUDE the packages that were initially provided as part of the list.

=cut

use JSON  qw();
use Graph qw();

# Our main dependency graph objects
# We need two so we can check if the graph changed between dependency runs
# When the two graphs stay the same after a dependency run, we know our graph is complete
my $Agraph = Graph->new();
my $Bgraph = Graph->new();

# Accepts a list of python packages to check dependencies for
# Returns a list of python packages the arguments depend upon
# Note that this function is NOT RECURSIVE. It ONLY provides the direct dependencies.
sub get_deps {
    my $ret = [];
    my $json = JSON->new;
    foreach my $package ( @_ ) {
        my $curl = qx(curl -s "https://pypi.org/pypi/$package/json");
        my $reqs = $json->decode($curl)->{info}->{requires_dist};
        if (defined($reqs)) {
            foreach my $dep (@{ $reqs }) {
                $dep =~ s/[^a-zA-Z0-9-].*$//;
                push(@{ $ret }, $dep);
            }
        }
    }
    return $ret;
}

# @ARGV is a list of packages we need to find all dependencies for and is the roots of our dependency graph
foreach my $package (@ARGV) {
    $Agraph->add_vertex($package);
}

# A list of packages we have already gotten dependencies for so we don't check twice
my $checked = [];

# Loop until our graphs stop changing between dependency runs
while ( $Agraph ne $Bgraph ) {
    $Bgraph = $Agraph->deep_copy();

    # Check every single vertice in our graph
    my @vertlist = $Agraph->vertices;
    foreach my $package (@vertlist) {
        # If we haven't checked this package before, get it added to the graph with all of it's dependencies
        if (! grep( /^\Q$package\E$/, @{ $checked } )) {
            my $deplist = get_deps($package);
            foreach my $dep (@{ $deplist }) {
                $Agraph->add_edge($package, $dep);
            }
            # Add this package to our list of already checked packages so we don't waste time
            push(@{ $checked }, $package);
        }
    }
}

my @fulldeplist = sort $Agraph->vertices;
foreach my $package (@fulldeplist) {
    print "$package\n";
}

我喜欢在warnings中使用这一行,但对于某些人的口味来说,它可能过于严格:

代码语言:javascript
复制
use warnings FATAL => 'all';

致命

票数 6
EN

Code Review用户

发布于 2023-05-12 14:21:29

我怀疑,带有“虚拟环境”的Python的PIP可以用来跟踪依赖关系,创建一组目录,然后再将这些目录复制到您的“墙花园”中。

这将防止您的开发人员需要安装任何东西(他们将自动获得所有批准的库),但只有在库交叉兼容的情况下才能工作。

主要的优点是如果您希望在库代码被复制到您的独立环境之前检查它。如果开发的应用程序必须协同工作,也可能有利于解决一些跨包的不兼容问题。

票数 0
EN
页面原文内容由Code Review提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://codereview.stackexchange.com/questions/284931

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档