Q: Pythonic way of reducing the number of subclasses

Stack Overflow user

Asked 2020-09-13 01:52:11

Answers: 1 · Views: 23 · Followers: 0 · Votes: 0

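Background: I am working on an NLP problem. I need to extract different types of features depending on the type of text document. I currently have a setup with a FeatureExtractor base class that is subclassed once per document type; each subclass computes a different set of features and returns a pandas dataframe as output.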

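All these subclasses are in turn called by a wrapper class named FeatureExtractionRunner, which calls every subclass, computes the features on the documents, and returns the output for all document types.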

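Problem: this pattern of computing features leads to a lot of subclasses. I currently have 14 subclasses because I have 14 types of docs, and the number can grow further. That is too many classes, and they are hard to maintain. Is there an alternative way of doing this, with less subclassing?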

Here is some representative sample code for what I described:

from abc import ABCMeta, abstractmethod

class FeatureExtractor(metaclass=ABCMeta):
    #base feature extractor class
    def __init__(self, document):
        self.document = document
        
        
    @abstractmethod
    def doc_to_features(self):
        # Subclasses must override this and return a pandas dataframe
        raise NotImplementedError
    
    
class ExtractorTypeA(FeatureExtractor):
    #do some feature calculations.....
    
    def _calculate_shape_features(self):
        return None
    
    def _calculate_size_features(self):
        return None
    
    def doc_to_features(self):
        # Call the feature calculation methods; they read self.document
        # themselves, so no extra argument is passed
        f1 = self._calculate_shape_features()
        f2 = self._calculate_size_features()
        # Do some calculations on the document and return a pandas dataframe by merging f1, f2, etc.
        data = "dataframe-1"
        return data
    
    
class ExtractorTypeB(FeatureExtractor):
    #do some feature calculations.....
    
    def _calculate_some_fancy_features(self):
        return None
    
    def _calculate_some_more_fancy_features(self):
        return None
    
    def doc_to_features(self):
        # Call the feature calculation methods; they read self.document
        # themselves, so no extra argument is passed
        f1 = self._calculate_some_fancy_features()
        f2 = self._calculate_some_more_fancy_features()
        # Do some calculations on the document and return a pandas dataframe by merging f1, f2, etc.
        data = "dataframe-2"
        return data
    
class ExtractorTypeC(FeatureExtractor):
    #do some feature calculations.....
    
    def doc_to_features(self):
        #do some calculations on the document and return a pandas dataframe
        data = "dataframe-3"
        return data

class FeatureExtractionRunner:
    #a class to call all types of feature extractors 
    def __init__(self, document, *args, **kwargs):
        self.document = document
        self.type_a = ExtractorTypeA(self.document)
        self.type_b = ExtractorTypeB(self.document)
        self.type_c = ExtractorTypeC(self.document)
        #more of these extractors would be there
        
    def call_all_type_of_extractors(self):
        type_a_features = self.type_a.doc_to_features()
        type_b_features = self.type_b.doc_to_features()
        type_c_features = self.type_c.doc_to_features()
        #more such extractors would be there....
        
        return [type_a_features, type_b_features, type_c_features]
        
        
all_type_of_features = FeatureExtractionRunner("some document").call_all_type_of_extractors()

python-3.x

design-patterns

nlp

pipeline

composition

1 Answer

Stack Overflow user

Answered 2020-09-17 15:29:08

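To answer the question first: you can avoid subclassing entirely, at the cost of writing the __init__ method every time. Or you could get rid of the classes altogether and convert them into a set of functions. Or you could even merge all of the classes into a single one. Note that none of these approaches will make the code simpler or more maintainable; in fact, they only change its shape to some extent.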
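As a rough sketch of the second option (functions instead of classes), assuming features can be represented as plain dictionaries standing in for pandas dataframes: the function names, the `EXTRACTORS` registry, and the document type labels below are hypothetical and do not come from the question.

```python
from typing import Callable, Dict, List

# Hypothetical feature functions: each takes the document text and returns
# a dict of feature name -> value (a stand-in for a pandas dataframe).
def shape_features(document: str) -> Dict[str, float]:
    return {"n_lines": float(len(document.splitlines()))}

def size_features(document: str) -> Dict[str, float]:
    return {
        "n_chars": float(len(document)),
        "n_words": float(len(document.split())),
    }

FeatureFn = Callable[[str], Dict[str, float]]

# Hypothetical registry: one entry per document type instead of one subclass.
EXTRACTORS: Dict[str, List[FeatureFn]] = {
    "type_a": [shape_features, size_features],
    "type_c": [size_features],
}

def doc_to_features(document: str, doc_type: str) -> Dict[str, float]:
    """Run every feature function registered for doc_type and merge the results."""
    features: Dict[str, float] = {}
    for fn in EXTRACTORS[doc_type]:
        features.update(fn(document))
    return features

def run_all(document: str) -> Dict[str, Dict[str, float]]:
    """Counterpart of FeatureExtractionRunner: features for every registered type."""
    return {doc_type: doc_to_features(document, doc_type) for doc_type in EXTRACTORS}
```

This replaces 14 class definitions with 14 registry entries, which illustrates the point above: the shape of the code changes, but each document type still needs its own list of feature computations.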

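This kind of situation is a good example of inherent problem complexity, by which I mean that the domain (NLP) and the particular use case (document feature extraction) are complex in and of themselves.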

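For example, featureX and featureY are likely to be completely different things that cannot be computed together, so you end up with one method for each. Similarly, the procedure for merging those features into a dataframe may differ from the one for merging the fancy features. Having many functions/classes in this scenario seems entirely reasonable to me, and keeping them separate is both logical and maintainable.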

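That said, a real reduction in code might be possible if you can combine some of the feature calculation methods into more generic functions, although I can't say for sure whether that is possible here.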
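One way this could look, as a sketch under the assumption that several feature methods differ only in a parameter (here, a regular expression): the factory `count_pattern`, the feature names, and the sample text are hypothetical illustrations, not part of the original code.

```python
import re
from typing import Callable, Dict

def count_pattern(name: str, pattern: str) -> Callable[[str], Dict[str, int]]:
    """Build a feature function that counts regex matches in a document."""
    compiled = re.compile(pattern)

    def feature(document: str) -> Dict[str, int]:
        return {name: len(compiled.findall(document))}

    return feature

# Three would-be methods become three calls to one generic factory.
count_digits = count_pattern("n_digits", r"\d")
count_capitalized = count_pattern("n_capitalized", r"\b[A-Z][a-z]+\b")
count_sentence_ends = count_pattern("n_sentence_ends", r"[.!?]")

# Merge the individual feature dicts for one sample document.
features: Dict[str, int] = {}
for fn in (count_digits, count_capitalized, count_sentence_ends):
    features.update(fn("Call Bob at 555. Thanks!"))
```

Whether the real feature calculations share enough structure for this kind of parameterization depends on the features themselves, which the question does not show.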

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/63863272
