CSV文件如下所示
Stock
asin name quantiy
----------------------------------
B01EI65LTI Blue Shoes 20
B079VGHTM2 Black Shoes 10
B07B33J5CK Green Shoes 3
Storage fees
asin currency estimated-monthly-storage-fee
----------------------------------------------------
B01EI65LTI EUR 0.7988
B079VGHTM2 GBP 0.4656
B07B33J5CK EUR 0.1264
Sales
amazon-order-id asin quantity item-price
----------------------------------------------------------
404-2135868-6928346 B01EI65LTI 1 59.95
402-0310960-9618709 B079VGHTM2 1 18.95
403-6094647-7799558 B07B33J5CK 1 14.95 目前,我的解决方案是向实体添加额外的is和关联,例如,股票和存储费用
/**
* @ORM\Entity(repositoryClass="App\Repository\StockRepository")
* @ORM\Table(indexes={@ORM\Index(name="asin_idx", columns={"asin"})})
*/
class Stock
{
/**
* @ORM\Id()
* @ORM\GeneratedValue()
* @ORM\Column(type="integer")
*/
private $id;
/** @ORM\Column(type="string", length=16) */
private $asin;
/** @ORM\Column(type="string", length=255) */
private $productName;
/** @ORM\Column(type="integer") */
private $afnWarehouseQuantity;
/**
* @OneToOne(targetEntity="StorageFee")
* @JoinColumn(name="storageFeeId", referencedColumnName="id")
*/
private $storageFee;
}
/** @ORM\Entity(repositoryClass="App\Repository\StorageFeeRepository") */
class StorageFee
{
/**
* @ORM\Id()
* @ORM\GeneratedValue()
* @ORM\Column(type="integer")
*/
private $id;
/** @ORM\Column(type="string", length=16) */
private $asin;
/** @ORM\Column(type="string", length=8) */
private $currency;
/** @ORM\Column(type="float") */
private $estimatedMonthlyStorageFee;
}和“批量插入”实体
class StockRepository extends ServiceEntityRepository
{
public function insertFromFile(string $fileName)
{
$this->getEntityManager()->getConnection()
->getConfiguration()->setSQLLogger(null);
// read csv...
foreach ($csv as $row) {
$stock = (new Stock())
->setAsin($row['asin'])
->setName($row['name'])
->setQuantity($row['quantity'])
;
$this->getEntityManager()->persist($stock);
if (($numInsert % $20) === 0) {
$this->getEntityManager()->flush();
$this->getEntityManager()->clear();
}
}
$this->getEntityManager()->flush(); // flush remaining
$this->getEntityManager()->clear();
}
}并且在第二步中“批量更新”该关联。
public function updateAssociation()
{
$this->getEntityManager()->getConnection()
->getConfiguration()->setSQLLogger(null);
$query = $this->getEntityManager()->createQuery('SELECT sf.id, sf.asin FROM App\Entity\StorageFee sf');
$storageFees = $query->getResult();
$query = $this->getEntityManager()->createQuery('UPDATE App\Entity\Stock s SET s.storageFee = :id WHERE s.asin = :asin');
foreach ($storageFees as $row) {
$query->setParameter('id', $row['id'])
->setParameter('asin', $row['asin']);
$query->execute();
}
}我尽了最大的努力,但对于这些大文件(最多35.000行),我仍然有很长的脚本运行时间-20-40秒。在第二步中设置关联看起来并不是很“专业”。我认为。但混合插入和更新+60秒运行时(?)也是一个问题。
也许我有一个错误的概念。这里建议使用已有的asin作为密钥?以前从未使用过字符串键...并且还没有在理论上做太多工作。
对于任何建议,我都会非常高兴。谢谢&致以最良好的问候
发布于 2019-03-30 15:56:43
关于Jakumi -我已经更新了我的代码,按照正确的导入顺序,它的速度是原来的两倍。(批处理大小为200会导致更高的内存使用峰值)
class StockRepository extends ServiceEntityRepository
{
public function insertFromFile(string $fileName)
{
$this->getEntityManager()->getConnection()
->getConfiguration()->setSQLLogger(null);
// get storeage fee ids
$query = $this->getEntityManager()
->createQuery(/** @lang DQL */'
SELECT sf.id, sf.asin
FROM App\Entity\StorageFee sf
');
$map = [];
foreach($query->getResult() as $row) {
$map[$row['asin']] = $row['id'];
}
// read csv...
foreach ($csv as $row) {
$stock = (new Stock())
->setAsin($row['asin'])
->setName($row['name'])
->setQuantity($row['quantity'])
;
// add reference
if (isset($map[$row['asin']])) {
$storageFee = $this->getEntityManager()->getReference('App\Entity\StorageFee', $map[$row['asin']]);
$stock->setStorageFee($storageFee);
}
$this->getEntityManager()->persist($stock);
if (($numInsert % 200) === 0) {
$this->getEntityManager()->flush();
$this->getEntityManager()->clear();
}
}
$this->getEntityManager()->flush(); // flush remaining
$this->getEntityManager()->clear();
}
}https://stackoverflow.com/questions/55426180
复制相似问题