首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >如何使用Phrets加速从MLS导入Rets (光明MLS)

如何使用Phrets加速从MLS导入Rets (光明MLS)
EN

Stack Overflow用户
提问于 2019-06-28 07:25:41
回答 2查看 369关注 0票数 1

我是RETS导入的新手,所以我开发了这个脚本:

代码语言:javascript
复制
<?php
use Aws\Credentials\Credentials;
use Aws\S3\S3Client;

date_default_timezone_set('America/New_York');

/* --------------------------------------------------------
------- GET THE ARRAY OF SELECTED FIELDS TO INDEX FROM MLS ---
--------------------------------------------------------*/

require_once ('retsfields.php');
$filteredretsfields = array_filter( $retsfields );

/* --------------------------------------------------------
------- DATABASE ------------
--------------------------------------------------------*/

$servername = "xxx";
$username = "xxx";
$password = "xxx";
$dbname = "xxx";

// Create connection
$conn = new mysqli($servername, $username, $password, $dbname);
mysqli_query($conn, "SET SESSION sql_mode = ''");
// Check connection
if ($conn->connect_error) {
    die("Connection failed: " . $conn->connect_error);
}

$startTime = date('Y-m-d H:i:s');

/* ----------------------------------------------------------------------*/
/* ------------ IS A PREVIOUS FEED ALREADY RUNNING? ----------------*/
/* -------------------------------------------------------------------*/

$getFeedRunning = "SELECT * FROM feedsettings WHERE name = 'feedrunning';";
$rowFeedRunning = $conn->query( $getFeedRunning );

while($FeedRunning = $rowFeedRunning->fetch_assoc()) {
    $feedRunningValue = $FeedRunning['val'];
}

if ($feedRunningValue == '0') {

    /* ----------------------------------------------------------------------*/
    /* ---------------- UPDATE FEEDRUNNING -------------------------*/
    /* -----------------------------------------------------------------------*/

    $updateFeedRunning = "UPDATE feedsettings SET val='1', updated_at='".$startTime."' WHERE name='feedrunning';";

    if ($conn->query($updateFeedRunning) === TRUE) {
        //echo "Record updated successfully";
    } else {
        //echo "Error updating record: " . $conn->error;
    }

    /* ---------------------------------------------------------------------*/
    /* ---------------- ADD TO LOG TABLE -------------------------- */
    /* ------------------------------------------------------------------------*/

    $sqlLogs = "INSERT INTO importlogs (totallistings, starttime, endtime, totalimages, lastindex, newlastindex, comments) VALUES ('', '$startTime', '', '', '', '', 'Feed started!')";

    if ( $conn->query( $sqlLogs ) === true ) {
        //echo "New record created successfully";
    } else {
        //echo "Error: " . $sql . "<br>" . $conn->error;
    }

    /* --------------------------------------------------------
    ------- RETS APIS ------------
    --------------------------------------------------------*/

    require_once("vendor/autoload.php");

    $config = new \PHRETS\Configuration;
    $config->setLoginUrl('xxx')
           ->setUsername('xxx')
           ->setPassword('xxx')
           ->setUserAgent('Bright RETS Application/1.0')
            ->setRetsVersion('1.7.2');

    $rets = new \PHRETS\Session($config);

    $connect = $rets->Login();

    /* -----------------------------------------------------------------*/
    /* ---------------- AMAZON S3 ---------------------------*/
    /* -----------------------------------------------------------------*/

    $credentials = new Aws\Credentials\Credentials('xxx', 'xxx');

    $s3 = new Aws\S3\S3Client([
        'region'  => 'us-east-2',
        'version' => 'latest',
        'credentials' => $credentials,
        //'debug' => true
    ]);

    $s3->registerStreamWrapper();

    /* ----------------------------------------------------------------------*/
    /*  LASTINDEX: WHAT WAS THE LAST IMPORTED DATE: EG: 2019-06-24T08:45:45 */
    /* -------------------------------------------------------------------*/

    $getLastIndex = "SELECT * FROM feedsettings WHERE name = 'lastindex';";
    $rowLastIndex = $conn->query( $getLastIndex );

    while($lastIndex = $rowLastIndex->fetch_assoc()) {
        $lastIndexValue = $lastIndex['val'];
    }

    $newLastIndex = date('Y-m-d').'T'.date('H:i:s');

    /* ----------------------------------------------------------------------*/
    /* --- FETCH NUMBER OF LISTINGS: MAP LASTINDEX TO MODIFICATION DATE ----*/
    /* -------------------------------------------------------------------*/

    $totalProperties = $rets->Search('Property', 'ALL', '(ModificationTimestamp='.$lastIndexValue.'+)', ['Limit' => 1]);

    $totalPropertiesCount = $totalProperties->getTotalResultsCount();
    $numberOfOffsetPages = ceil($totalPropertiesCount / 1000);

    $iCount = 0;
    $numberOfPhotos = 0;

    for ($offset = 1; $offset <= $totalPropertiesCount; $offset=($iCount*1000)+1) {

        /* ------------ FETCH RECORDS PER 1000 LISTINGS --------------- */
        $results = $rets->Search('Property', 'ALL', '(ModificationTimestamp='.$lastIndexValue.'+)', [
            'Limit' => 1000,
            'Offset' => $offset,
            'StandardNames' => 0, // give system names
        ]);

        foreach ( $results as $r ) {

            $retsdata = array();

            foreach ( $filteredretsfields as $key => $value ) {
                $retsdata[ $key ] = htmlspecialchars( $r[ $value ], ENT_QUOTES );
            }

            $date = date( 'Y-m-d H:i:s' );

            $retsdata['created_at'] = $date;
            $retsdata['updated_at'] = $date;

            $columns = implode( ",", array_keys( $retsdata ) );
            $values  = "'" . implode( "','", $retsdata ) . "'";

            $searchsql = "SELECT * FROM properties WHERE ListingId = '" . $r['ListingId'] . "'";
            $checkExistingListings = $conn->query( $searchsql );

            if ( $checkExistingListings->num_rows > 0 ) {
                //DELETE EXISTING RECORDS
                $conn->query( 'DELETE FROM properties WHERE ListingId = "' . $r['ListingId'] . '";' );
                $conn->query( 'DELETE FROM images WHERE ListingId = "' . $r['ListingId'] . '";' );
                //TODO DELETE THE EXISTING IMAGES
                //INSERT NEW RECORD
                $sql = "INSERT INTO properties ($columns) VALUES ($values)";
            } else {
                //INSERT NEW RECORD
                $sql = "INSERT INTO properties ($columns) VALUES ($values)";
            }

            if ( $conn->query( $sql ) === true ) {
                //echo "New record created successfully -> ";
                //echo $sql;
            } else {
                //echo "Error: " . $sql . "<br>" . $conn->error;
            }

            /* -------------------------------------------------------------------*/
            /* ------------ FETCH IMAGES ----------------*/
            /* -------------------------------------------------------------------*/

            $numberOfPhotos = $numberOfPhotos + $r['TotalPhotos'];

            $photos = '';
            $photoNumber = 0;

            if ($r['TotalPhotos'] != 0) {
                for ($photoNumber = 1; $photoNumber <= $r['TotalPhotos']; $photoNumber++) {

                    /* ----- FETCH PHOTO PER PHOTO TO AVOID MEMORY ISSUES ----- */
                    /* ---- FETCH ALL PHOTOS IN ONE GETOBJECT() CRASHED THE SERVER ---- */
                    $photos = $rets->GetObject("Property", "Thumbnail", $r['ListingKey'], $photoNumber);

                    for ($i = 0; $i < 4096; $i++) {
                        preg_replace('/\d/', '', 1);
                    }

                    foreach ($photos as $photo) {
                        $listing = $r['ListingId'];
                        $number = $photo->getObjectId();

                        if (!$photo->isError()) {
                            $key = $listing.'/image-'.$listing.'-'.$number.'.jpg';
                            // Send a PutObject request and get the result object.

                            $context = stream_context_create(array(
                                's3' => array(
                                    'ACL' => 'public-read'
                                )
                            ));

                            /* -- COPY THE PHOTO TO S3 -- */
                            file_put_contents('s3://my-bucket-url/'.$key, $photo->getContent(), 0, $context);

                            $sqlImages = "INSERT INTO images (ListingId, storedUrl) VALUES ('$listing', '$key')";

                            if ( $conn->query( $sqlImages ) === true ) {
                                //echo "New record created successfully -> ";
                                //echo $sql;
                            } else {
                                //echo "Error: " . $sqlImages . "<br>" . $conn->error;
                            }
                        }
                        else {
                            //echo 'error';
                        }
                    }
                }

                unset($photos);
            }

            /* ----- CLEAN VARIABLES --------------------------------*/
            unset($columns, $values, $retsdata);
        }

        /* ----- CLEAN VARIABLES --------------------------------*/
        unset($results);
        $iCount++;
    }

    $endTime = date('Y-m-d H:i:s');

    /* ----------------------------------------------------------------------*/
    /* ---------------- UPDATE LASTINDEX -------------------------*/
    /* -----------------------------------------------------------------------*/

    $updateLastIndex = "UPDATE feedsettings SET val='".$newLastIndex."', updated_at='".$endTime."' WHERE name='lastindex';";

    if ($conn->query($updateLastIndex) === TRUE) {
        //echo "Record updated successfully";
    } else {
        //echo "Error updating record: " . $conn->error;
    }

    /* ----------------------------------------------------------------------*/
    /* ---------------- UPDATE FEEDRUNNING -------------------------*/
    /* -----------------------------------------------------------------------*/

    $updateFeedRunning = "UPDATE feedsettings SET val='0', updated_at='".$endTime."' WHERE name='feedrunning';";

    if ($conn->query($updateFeedRunning) === TRUE) {
        //echo "Record updated successfully";
    } else {
        //echo "Error updating record: " . $conn->error;
    }

    /* ---------------------------------------------------------------------*/
    /* ---------------- ADD TO LOG TABLE -------------------------- */
    /* ------------------------------------------------------------------------*/

    $sqlLogs = "INSERT INTO importlogs (totallistings, starttime, endtime, totalimages, lastindex, newlastindex) VALUES ($totalPropertiesCount, '$startTime', '$endTime', $numberOfPhotos, '$lastIndexValue', '$newLastIndex')";

    if ( $conn->query( $sqlLogs ) === true ) {
        //echo "New record created successfully -> ";
        //echo $sql;
    } else {
        //echo "Error: " . $sql . "<br>" . $conn->error;
    }

} else {

    $endTime = date('Y-m-d H:i:s');

    /* ---------------------------------------------------------------------*/
    /* ---------------- ADD TO LOG TABLE -------------------------- */
    /* ------------------------------------------------------------------------*/

    $sqlLogs = "INSERT INTO importlogs (totallistings, starttime, endtime, totalimages, lastindex, newlastindex, comments) VALUES ('', '$startTime', '$endTime', '', '', '', 'Feed already running!')";

    if ( $conn->query( $sqlLogs ) === true ) {
        //echo "New record created successfully -> ";
        //echo $sql;
    } else {
        //echo "Error: " . $sql . "<br>" . $conn->error;
    }
}

$conn->close();

原则上,代码运行良好,虽然我有一些内存问题,但请参阅代码中的注释。我主要关心的是如何改进,加快进口过程。我需要索引的MLS是光明MLS。我们需要存储所有的数据,包括服务器上的图像。所以我在过去的18个小时里运行了这段代码,它导入了大约8000个列表和200.000张图片,然后由于一个坏的网关而崩溃。

我知道这种大规模的导入只需要一次(在开始时),因为我可以每x小时做一次部分更新。

所以我有两个问题:

  • 如何使它更稳定,让我从这些崩溃中更容易恢复,因为这意味着我需要重新启动每一次从一开始。
  • 如何加快导入速度,因为在这个速度下,导入所有的列表需要11天的时间。也许我不需要有五年的历史

我正在考虑导入没有图像的所有内容,存储图像urls,这样完整的MLS就被索引了。之后,我将运行一个进程,一个接一个地获取这些图像(这将需要11天)。任何将图像导入到Amazon队列中的想法(没有经验)。

非常感谢

EN

回答 2

Stack Overflow用户

回答已采纳

发布于 2019-12-31 09:06:28

您可以拆分列表、数据和图像导入进程。先下载所有的清单细节,然后尝试获取它们的图像。这样,就不会出现崩溃或内存问题。

票数 1
EN

Stack Overflow用户

发布于 2020-03-18 08:33:49

我只是在回答你的两个问题

  • 如何使它更稳定,让我从这些崩溃中更容易恢复,因为这意味着我需要重新启动每一次从一开始。

您可以拆分查询以获取数据。考虑到您需要最近5年的历史数据,那么您可以像下面这样拆分查询

代码语言:javascript
复制
$query1 = (ModificationTimestamp=2016-01-01T00:00:00-2016-12-31T23:59:59);
$query2 = (ModificationTimestamp=2017-01-01T00:00:00-2017-12-31T23:59:59);
$query3 = (ModificationTimestamp=2018-01-01T00:00:00-2018-12-31T23:59:59);
$query4 = (ModificationTimestamp=2019-01-01T00:00:00-2019-12-31T23:59:59);
$query5 = (ModificationTimestamp=2020-01-01T00:00:00+);

如果需要的话,分头再跑。可以按照您的逻辑在循环中运行这些拆分查询。因此,如果在中间停止,您知道哪个批处理崩溃了,以及您需要重新启动的位置。

  • 如何加快导入速度,因为在这个速度下,导入所有的列表需要11天的时间。也许我不需要有五年的历史

这个完全取决于MLS板。它们保持着提取速度。一些MLS板可以根据请求临时增加带宽。在这段时间内,尝试提取完整的数据。

票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/56802639

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档