Q: Refactoring: a Pythonic way of reducing the subclasses?

Software Engineering user

Asked 2020-09-13 07:10:12

1 answer, 147 views, 0 followers, score -1

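Background: I am working on an NLP problem in which I need to extract different types of features depending on the type of context in a text document. My current setup has a FeatureExtractor base class that is subclassed several times, once per context type. Each subclass computes a different set of features and independently returns a pandas DataFrame as output.

All of these subclasses are in turn called by a wrapper class named FeatureExtractionRunner, which invokes every subclass, computes the features on a document, and returns the output for all context types.

Problem: this pattern of computing features leads to a large number of subclasses. I currently have 14 subclasses because I have 14 different contexts, and the number may grow further. That is too many classes to maintain. Is there another way to do this, with fewer subclasses?

Here is some representative sample code for what I described: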

from abc import ABCMeta, abstractmethod

class FeatureExtractor(metaclass=ABCMeta):
    #base feature extractor class
    def __init__(self, document):
        self.document = document
        
        
    @abstractmethod
    def doc_to_features(self):
        # subclasses must override this; NotImplemented is meant for binary operators
        raise NotImplementedError
    
    
class ExtractorTypeA(FeatureExtractor):
    #do some feature calculations.....
    
    def _calculate_shape_features(self):
        return None
    
    def _calculate_size_features(self):
        return None
    
    def doc_to_features(self):
        #calls all the fancy feature calculation methods
        #the helpers take only self and read self.document themselves
        f1 = self._calculate_shape_features()
        f2 = self._calculate_size_features()
        #do some calculations on the document and return a pandas dataframe by merging them  (merge f1, f2....etc)
        data = "dataframe-1"
        return data
    
    
class ExtractorTypeB(FeatureExtractor):
    #do some feature calculations.....
    
    def _calculate_some_fancy_features(self):
        return None
    
    def _calculate_some_more_fancy_features(self):
        return None
    
    def doc_to_features(self):
        #calls all the fancy feature calculation methods
        #the helpers take only self and read self.document themselves
        f1 = self._calculate_some_fancy_features()
        f2 = self._calculate_some_more_fancy_features()
        #do some calculations on the document and return a pandas dataframe (merge f1, f2 etc)
        data = "dataframe-2"
        return data
    
class ExtractorTypeC(FeatureExtractor):
    #do some feature calculations.....
    
    def doc_to_features(self):
        #do some calculations on the document and return a pandas dataframe
        data = "dataframe-3"
        return data

class FeatureExtractionRunner:
    #a class to call all types of feature extractors 
    def __init__(self, document, *args, **kwargs):
        self.document = document
        self.type_a = ExtractorTypeA(self.document)
        self.type_b = ExtractorTypeB(self.document)
        self.type_c = ExtractorTypeC(self.document)
        #more of these extractors would be there
        
    def call_all_type_of_extractors(self):
        type_a_features = self.type_a.doc_to_features()
        type_b_features = self.type_b.doc_to_features()
        type_c_features = self.type_c.doc_to_features()
        #more such extractors would be there....
        
        return [type_a_features, type_b_features, type_c_features]
        
        
all_type_of_features = FeatureExtractionRunner("some document").call_all_type_of_extractors()

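So, by default, the base class computes some features, each represented as a method; there are roughly 10 such methods. When the base class is subclassed, each subclass has all the default features plus some extra special features of its own, ranging from 2 or 3 methods up to 6 at most. These special methods/features are specific to each context, so the other subclasses neither know about, need, nor share them.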

Tags: functional-programming, refactoring, design-patterns, object-oriented, python

1 Answer

Software Engineering user

Accepted answer

Answered 2020-09-13 10:34:25

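Your class structure seems reasonable. You have already extracted the common functionality into the base class. Maybe you can find more commonalities between some of the extractors and put them into a new intermediate level of the class hierarchy. Or turn them into free functions. Nothing forces you to use a strict OOP approach.

But in the end, if you have a use case with complex individual behavior, your implementation will reflect that complexity. There is no way around it.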
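As a rough illustration of the free-function idea mentioned above, here is a minimal sketch. Everything in it is an assumption made for the example: the function names, the `CONTEXTS` table, and the use of plain dicts instead of pandas DataFrames are hypothetical and not taken from the question.

```python
from typing import Callable, Dict, List

FeatureFunc = Callable[[str], Dict[str, float]]

# Hypothetical feature functions: each takes the document text and
# returns a mapping of feature name to value.
def shape_features(document: str) -> Dict[str, float]:
    return {"n_lines": float(document.count("\n") + 1)}

def size_features(document: str) -> Dict[str, float]:
    return {
        "n_chars": float(len(document)),
        "n_tokens": float(len(document.split())),
    }

# One context is described by a list of functions instead of a subclass.
CONTEXTS: Dict[str, List[FeatureFunc]] = {
    "type_a": [shape_features, size_features],
    "type_b": [size_features],
}

def extract(document: str, context: str) -> Dict[str, float]:
    """Merge the output of every feature function listed for a context."""
    features: Dict[str, float] = {}
    for func in CONTEXTS[context]:
        features.update(func(document))
    return features

def extract_all(document: str) -> Dict[str, Dict[str, float]]:
    """Run every context and return the results keyed by context name."""
    return {name: extract(document, name) for name in CONTEXTS}
```

With this layout, adding a context means adding one entry to the table, and shared features are reused by listing the same function twice. Whether it is actually simpler than subclasses depends on how much state the real feature calculations share.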

You can easily avoid the code duplication in the runner. A simple solution is a list of extractor types that acts as a registry:

class ExtractorTypeA(FeatureExtractor): ...  # body as in the question
class ExtractorTypeB(FeatureExtractor): ...  # body as in the question
#...

# Doesn't have to be global.
# Could also be an attribute of the runner.
extractor_classes = [
    ExtractorTypeA,
    ExtractorTypeB,
    # ...
]

class FeatureExtractionRunner:
    def __init__(self, document, *args, **kwargs):
        self.document = document
        # instead of duplicated code a list comprehension using the registry
        self.extractors = [
            Extractor(self.document) for Extractor in extractor_classes
        ]

    def call_all_type_of_extractors(self):
        return [extr.doc_to_features() for extr in self.extractors]

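Now the only duplication left is the registry itself. If you add an extractor class, you have to add it to the registry explicitly. This is probably not a problem, because it is easy to document, and forgetting it is easy to catch with a unit test.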
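One way such a unit test could look is sketched below. It compares the registry with the subclasses Python already knows about through `__subclasses__()`. The trimmed-down classes and the `all_subclasses` helper are stand-ins written for this example, not code from the question.

```python
import unittest
from abc import ABC, abstractmethod

class FeatureExtractor(ABC):
    def __init__(self, document):
        self.document = document

    @abstractmethod
    def doc_to_features(self):
        ...

class ExtractorTypeA(FeatureExtractor):
    def doc_to_features(self):
        return "dataframe-1"

class ExtractorTypeB(FeatureExtractor):
    def doc_to_features(self):
        return "dataframe-2"

# The explicit registry that the test guards.
extractor_classes = [ExtractorTypeA, ExtractorTypeB]

def all_subclasses(cls):
    """Recursively collect every subclass that has been defined so far."""
    found = set()
    for sub in cls.__subclasses__():
        found.add(sub)
        found |= all_subclasses(sub)
    return found

class RegistryTest(unittest.TestCase):
    def test_every_extractor_is_registered(self):
        # Any class that subclasses FeatureExtractor but is not listed
        # in the registry shows up in `missing`.
        missing = all_subclasses(FeatureExtractor) - set(extractor_classes)
        self.assertEqual(missing, set())
```

Note that the check only sees classes whose modules have been imported, and it would also flag abstract intermediate classes, so a real test may need to filter those out.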

You could use a custom metaclass to register the classes automatically. Off the top of my head and untested:

from abc import ABCMeta

extractor_classes = []

class MetaFeatureExtractor(ABCMeta):
    def __init__(cls, name, bases, attrs):
        super().__init__(name, bases, attrs)

        # registry must only contain subclasses
        if name != 'FeatureExtractor':
            extractor_classes.append(cls)

class FeatureExtractor(metaclass=MetaFeatureExtractor):
    ...  # base-class body as in the question

class ExtractorTypeA(FeatureExtractor): ...  # body as in the question
class ExtractorTypeB(FeatureExtractor): ...  # body as in the question
#...
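A possible alternative to the metaclass, not part of the original answer, is the `__init_subclass__` hook available since Python 3.6. The sketch below assumes the registry is kept as a class attribute on the base class and uses trimmed-down extractors that return placeholder strings.

```python
from abc import ABC, abstractmethod

class FeatureExtractor(ABC):
    # Assumption for this sketch: the registry lives on the base class.
    registry = []

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Runs once per subclass at class-definition time,
        # never for FeatureExtractor itself.
        FeatureExtractor.registry.append(cls)

    def __init__(self, document):
        self.document = document

    @abstractmethod
    def doc_to_features(self):
        ...

class ExtractorTypeA(FeatureExtractor):
    def doc_to_features(self):
        return "dataframe-1"

class ExtractorTypeB(FeatureExtractor):
    def doc_to_features(self):
        return "dataframe-2"

class FeatureExtractionRunner:
    def __init__(self, document):
        # Instantiate every registered extractor for this document.
        self.extractors = [cls(document) for cls in FeatureExtractor.registry]

    def call_all_type_of_extractors(self):
        return [extr.doc_to_features() for extr in self.extractors]
```

Here `FeatureExtractionRunner("some document").call_all_type_of_extractors()` returns `["dataframe-1", "dataframe-2"]`, in class-definition order. As with the metaclass version, an extractor is only registered once the module that defines it has been imported.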

Score: 1

Original page content provided by Software Engineering. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://softwareengineering.stackexchange.com/questions/415843
