I am using Python 3 to mask a dataset with the Faker package. I got a working example from: http://blog.districtdatalabs.com/a-practical-guide-to-anonymizing-datasets-with-python-faker
Code:
def anonymize_rows(rows):
    """
    Rows is an iterable of dictionaries that contain name and
    email fields that need to be anonymized.
    """
    # Load the faker and its providers
    faker = Factory.create()
    # Create mappings of names & emails to faked names & emails.
    c1 = defaultdict(faker.CARD_NO_ID)
    c2 = defaultdict(faker.ISS_USER_NAME)
    # Iterate over the rows and yield anonymized rows.
    for row in rows:
        # Replace the name and email fields with faked fields.
        row['CARD_NO_ID'] = c1[row['CARD_NO_ID']]
        row['ISS_USER_NAME'] = c2[row['ISS_USER_NAME']]
        # Yield the row back to the caller
        yield row

"""
The source argument is a path to a CSV file containing data to
anonymize, while target is a path to write the anonymized CSV data to.
"""
source = 'card_transaction_data_all.csv'
target = 'card_transaction_data_all_fake.csv'
with open(source, 'rU') as f:
    with open(target, 'w') as o:
        # Use the DictReader to easily extract fields
        reader = csv.DictReader(f)
        writer = csv.DictWriter(o, reader.fieldnames)
        # Read and anonymize data, writing to target file.
        for row in anonymize_rows(reader):
            writer.writerow(row)

But I keep getting the following error:
C:\Anaconda3.4\lib\site-packages\spyderlib\widgets\externalshell\start_ipython_kernel.py:1: DeprecationWarning: 'U' mode is deprecated
  # -*- coding: utf-8 -*-
Traceback (most recent call last):
  File "", line 5, in <module>
    writer = csv.DictWriter(o, reader.fieldnames)
  File "C:\Anaconda3.4\lib\csv.py", line 96, in fieldnames
    self._fieldnames = next(self.reader)
  File "C:\Anaconda3.4\lib\site-packages\unicodecsv\py3.py", line 55, in __next__
    return
  File "C:\Anaconda3.4\lib\site-packages\unicodecsv\py3.py", line 51, in <genexpr>
    f = (bs.decode(encoding, errors=errors) for bs in f)
AttributeError: 'str' object has no attribute 'decode'
Can anyone help me get this code working in Python 3? Many thanks.
Answered 2018-01-09 15:22:33
For Python 3, use the standard csv module (import csv) and remove the U from 'rU'.
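A minimal sketch of that advice, assuming the error comes from `unicodecsv` having been imported under the name `csv` (the traceback points into `unicodecsv\py3.py`). In Python 3 the standard-library module works on text-mode files directly. The in-memory `io.StringIO` objects and the sample row are stand-ins for the real files, used only for illustration:

```python
import csv  # standard-library csv, not unicodecsv
import io

# Stand-in for open(source, 'r', newline=''); the sample row is made up.
source = io.StringIO("CARD_NO_ID,ISS_USER_NAME\n1234,alice\n", newline="")
# Stand-in for open(target, 'w', newline='').
target = io.StringIO(newline="")

reader = csv.DictReader(source)
writer = csv.DictWriter(target, fieldnames=reader.fieldnames)
writer.writeheader()
for row in reader:
    writer.writerow(row)

output = target.getvalue()
```

With real files, the same pattern applies with `open(source, 'r', newline='')` and `open(target, 'w', newline='')`; plain `'r'` already handles the line endings that `'rU'` was used for in Python 2.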
Answered 2019-01-20 05:49:45
I also spent some time converting the Python 2 Faker examples I found online to Python 3. The conversion below should work (thanks to @AKhooli's answer!)
import csv
from collections import defaultdict

from faker import Faker


def anonymize_rows(rows):
    """
    Rows is an iterable of dictionaries that contain the CARD_NO_ID
    and ISS_USER_NAME fields that need to be anonymized.
    """
    # Load the faker and its providers
    faker = Faker()
    # Map each original card number / user name to a faked value.
    c1 = defaultdict(faker.msisdn)
    c2 = defaultdict(faker.name)
    # Iterate over the rows and yield anonymized rows.
    for row in rows:
        # Replace the two fields with their faked counterparts.
        row['CARD_NO_ID'] = c1[row['CARD_NO_ID']]
        row['ISS_USER_NAME'] = c2[row['ISS_USER_NAME']]
        # Yield the row back to the caller
        yield row


# source is the path to the CSV file containing the data to anonymize;
# target is the path the anonymized CSV data is written to.
source = 'card_transaction_data_all.csv'
target = 'card_transaction_data_all_fake.csv'
with open(source, 'r', newline='') as f:
    with open(target, 'w', newline='') as o:
        # Use the DictReader to easily extract fields
        reader = csv.DictReader(f)
        writer = csv.DictWriter(o, reader.fieldnames)
        # Read and anonymize data, writing to target file
        # with header!
        writer.writeheader()
        for row in anonymize_rows(reader):
            writer.writerow(row)

https://stackoverflow.com/questions/44094916
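On why `defaultdict(faker.name)` in the answer above gives every repeated original value the same fake value: `defaultdict` calls its factory only the first time a key is looked up and stores the result for later lookups. A small sketch of that behaviour, using a hypothetical `fake_name` counter function in place of `faker.name` so that it runs without Faker installed:

```python
from collections import defaultdict
from itertools import count

_counter = count(1)


def fake_name():
    """Hypothetical stand-in for faker.name: returns a new value per call."""
    return "user_{}".format(next(_counter))


# The factory runs once per previously unseen key; the result is cached.
mapping = defaultdict(fake_name)

originals = ["alice", "bob", "alice"]
masked = [mapping[name] for name in originals]
# masked == ['user_1', 'user_2', 'user_1']
```

Because the mapping lives only inside one call to `anonymize_rows`, a second run will generally assign different fake values to the same originals.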