Parsing a large (100 MB) JSON file in PHP with salsify's JsonStreamingParser in PHPProBid


I have a JSON file:

curl https://api.mercadolibre.com/sites/MLM/categories/all  > categoriesMLM.gz
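It contains an object of objects (about 60,000+). I also installed salsify's JsonStreamingParser via Composer to save each item to the database in PHPProBid, as shown in the following code: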

public function SaveCategoryByStream(){
    require_once dirname(__FILE__).'/../../../../../../vendor/autoload.php';
    ini_set('memory_limit', '4024M');
    ini_set('max_execution_time', 0);
    $testfile = '/home/richi/Desktop/categoriesMLM.json';

    $listener = new \JsonStreamingParser\Listener\GeoJsonListener(function ($category) {
        
        $category_name = Array(
            $category['name']
        );
        $category['id'] = preg_replace('/[^0-9]/', '', $category['id']);
        $id = $order_id = Array(
            $category['id']
        );
        $parent_id = $category['path_from_root'];
        if($parent_id == null){

        }else{
            $parent_id = end($category['path_from_root']);
            $parent_id = prev($category['path_from_root']);
            $parent_id = preg_replace('/[^0-9]/', '', $parent_id['id']);
        }
        $new_category = Array(
            'parent_id' => $parent_id,
            'id' => $id,
            'order_id' => $order_id,
            'name' => $category_name,
            'full_name' => $category_name
        );
        try {
            $categoriesService = new Service\Table\Relational\Categories();
            $categoriesService->save($new_category);
            $parent_id = $id;//['id'];
            //Saving all children categories of that specific category
            for($ii = 0; $ii < count($category['children_categories']); $ii++){
                $this->SaveChildrenCategory($parent_id, $category['children_categories'][$ii], $categoriesService);
            }
        }catch(Exception $e){
            echo $e;
        }
    });
    $stream = fopen($testfile, 'r');
    try {
        $parser = new \JsonStreamingParser\Parser($stream, $listener);
        $parser->parse();
        fclose($stream);
    } catch (Exception $e) {
        fclose($stream);
        throw $e;
    }

    $controller = 'Mercadolibre';
    $headline = $this->_('MeliSync');
    $filter = 'first_time';
    return array(
        'controller' => $controller,
        'headline'   => $headline,
        'messages'   => $this->_flashMessenger->getMessages(),
        'filter'     => $filter
    );

}
In human-readable form, the JSON file looks like this:

{
   "MLMXXXXX":{...},
   "MLMXXXY":{...},
    ...
}
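However, when I call the function, it gets stuck after saving 3,552 items. I have also read that GeoJsonListener loads the JSON into memory. My question is: how do I create a listener that loads each object individually instead of loading the whole JSON into memory?

Here is the output for item 3552: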

{ id: 'MLM45922',
  name: 'Mitsubishi',
  picture: null,
  permalink: null,
  total_items_in_this_category: 39,
  path_from_root: 
   [ { id: 'MLM1747', name: 'Accesorios para Vehículos' },
     { id: 'MLM179617', name: 'Tuning y Performance' },
     { id: 'MLM179724', name: 'Performance' },
     { id: 'MLM4859', name: 'Filtros Alto Flujo' },
     { id: 'MLM45922', name: 'Mitsubishi' } ],
  children_categories: [],
  attribute_types: 'attributes',
  settings: 
   { adult_content: false,
     buying_allowed: true,
     buying_modes: [ 'buy_it_now', 'auction' ],
     catalog_domain: null,
     coverage_areas: 'not_allowed',
     currencies: [ 'USD', 'MXN' ],
     fragile: false,
     immediate_payment: 'required',
     item_conditions: [ 'used', 'not_specified', 'new' ],
     items_reviews_allowed: false,
     listing_allowed: true,
     max_description_length: 50000,
     max_pictures_per_item: 12,
     max_pictures_per_item_var: 10,
     max_sub_title_length: 70,
     max_title_length: 60,
     maximum_price: null,
     minimum_price: null,
     mirror_category: null,
     mirror_master_category: null,
     mirror_slave_categories: [],
     price: 'required',
     reservation_allowed: 'not_allowed',
     restrictions: [],
     rounded_address: false,
     seller_contact: 'not_allowed',
     shipping_modes: [ 'not_specified', 'custom', 'me1', 'me2' ],
     shipping_options: [ 'custom', 'carrier' ],
     shipping_profile: 'optional',
     show_contact_information: false,
     simple_shipping: 'optional',
     stock: 'required',
     sub_vertical: null,
     subscribable: false,
     tags: [],
     vertical: null,
     vip_subdomain: 'articulo' },
  meta_categ_id: null,
  attributable: false }

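GeoJsonListener does exactly what you want: it keeps only the second level of objects in memory. That way it loads each MLM object in the file one at a time, without ever loading the whole file into memory.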
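
To illustrate the idea of emitting one second-level object at a time, here is a minimal self-contained sketch. It is not the library's code: `streamTopLevelEntries` is a hypothetical helper that tracks nesting depth by hand, and it assumes valid JSON whose root is an object like the one in the question.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of JsonStreamingParser): yields key => value
 * for each entry of a top-level JSON object, buffering one entry at a time.
 * Assumes valid JSON whose root is an object, e.g. {"MLM1": {...}, ...}.
 *
 * @param resource $stream readable stream positioned at the start of the JSON
 */
function streamTopLevelEntries($stream, int $chunkSize = 8192): Generator
{
    $depth = 0;        // nesting depth of {} and [] seen so far
    $inString = false; // true while inside a JSON string literal
    $escaped = false;  // true right after a backslash inside a string
    $buffer = '';      // raw text of the entry currently being collected

    while (($chunk = fread($stream, $chunkSize)) !== false && $chunk !== '') {
        $length = strlen($chunk);
        for ($i = 0; $i < $length; $i++) {
            $ch = $chunk[$i];
            $entryDone = false;

            if ($inString) {
                // Inside a string nothing is structural; only track escapes.
                $buffer .= $ch;
                if ($escaped) {
                    $escaped = false;
                } elseif ($ch === '\\') {
                    $escaped = true;
                } elseif ($ch === '"') {
                    $inString = false;
                }
            } elseif ($ch === '"') {
                $inString = true;
                $buffer .= $ch;
            } elseif ($ch === '{' || $ch === '[') {
                if ($depth >= 1) {
                    $buffer .= $ch; // keep nested brackets, drop the root "{"
                }
                $depth++;
            } elseif ($ch === '}' || $ch === ']') {
                $depth--;
                if ($depth >= 1) {
                    $buffer .= $ch;
                } else {
                    $entryDone = true; // root object closed: flush the last entry
                }
            } elseif ($ch === ',' && $depth === 1) {
                $entryDone = true; // comma between two top-level entries
            } elseif ($depth >= 1) {
                $buffer .= $ch;
            }

            if ($entryDone) {
                $entry = trim($buffer);
                $buffer = '';
                if ($entry !== '') {
                    // Re-wrap the single '"key": value' pair and decode only that.
                    $pair = json_decode('{' . $entry . '}', true, 512, JSON_THROW_ON_ERROR);
                    foreach ($pair as $key => $value) {
                        yield $key => $value;
                    }
                }
            }
        }
    }
}
```

Iterating it with `foreach (streamTopLevelEntries($stream) as $id => $category)` hands over one decoded category per step, so peak memory depends on the largest single entry rather than on the size of the file.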

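I tested the code you included against the file you reference (with the memory limit reduced to 32M, since a streaming parser does not need 4 GB of memory). It parses the file without trouble: on an old MacBook it read 27,200 objects in about 10 minutes before I cancelled the process.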


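This leads me to believe the problem has nothing to do with the JSON parser or with the way you parse the file, and is probably caused by something else (for example, your host or web server not honoring the request to lift the time limit, or your database layer locking or choking on something).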
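
One way to narrow down which of these it is would be to log every item before it is handled, so the last log line points at the record where things stall. The sketch below is only an illustration: `withProgressLog` is a hypothetical wrapper, and it assumes each item is an array with an `id` key, as in the question's data.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical wrapper: returns a callback that logs each item before
 * passing it to $handler, and logs again if the handler was slow.
 *
 * @param callable $handler     the real per-item work (e.g. the DB save)
 * @param callable $log         receives one string per log line
 * @param float    $slowSeconds threshold above which a call is reported
 */
function withProgressLog(callable $handler, callable $log, float $slowSeconds = 1.0): callable
{
    $count = 0;

    return function (array $item) use ($handler, $log, $slowSeconds, &$count): void {
        $count++;
        // Logged before the work starts, so a hang still leaves a trace.
        $log(sprintf(
            '#%d start id=%s mem=%.1fMB',
            $count,
            $item['id'] ?? '?',
            memory_get_usage(true) / 1048576
        ));

        $start = microtime(true);
        $handler($item);
        $elapsed = microtime(true) - $start;

        if ($elapsed >= $slowSeconds) {
            $log(sprintf('#%d took %.2fs', $count, $elapsed));
        }
    };
}
```

The returned callback could be passed wherever the question passes its closure, for example with `'error_log'` as the `$log` argument. If memory stays flat while one item never finishes, the database layer is the more likely suspect than the parser.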

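Can you dump document 3552?

Oh, I see! I need to check that the children are not empty. Silly me. I'll update the post with the result of that change. Thanks! (It is still a bug, though.) I'd still like to know how to keep the listener from holding the whole thing in memory.

Waiting for your answer! After that I'll check what else might be causing it to get stuck.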
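
The check mentioned in the comments could look roughly like this. Both helpers are hypothetical names introduced for illustration; they assume a category decoded to a PHP array with the `children_categories` and `path_from_root` keys shown in the dump above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns the child categories as a list, or an
 * empty list when the key is missing, null, or not an array.
 */
function childCategories(array $category): array
{
    $children = $category['children_categories'] ?? [];

    return is_array($children) ? array_values($children) : [];
}

/**
 * Hypothetical helper: returns the numeric part of the direct parent's id
 * taken from path_from_root, or null when the category has no parent.
 */
function parentIdFromPath(array $category): ?string
{
    $path = $category['path_from_root'] ?? [];
    if (!is_array($path) || count($path) < 2) {
        return null; // root category, or no path information at all
    }

    $path = array_values($path);
    $parent = $path[count($path) - 2]; // last element is the category itself

    return preg_replace('/[^0-9]/', '', (string) ($parent['id'] ?? ''));
}
```

With these, the loop over children becomes `foreach (childCategories($category) as $child) { ... }`, which simply does nothing for a leaf such as MLM45922 instead of calling `count()` on a missing or null value.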