Parsing a large (100MB) JSON file in PHP with salsify's JsonStreamingParser in PHPProBid
I have a JSON file:
curl https://api.mercadolibre.com/sites/MLM/categories/all > categoriesMLM.gz
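It contains an object of objects (about 60,000+ of them). I also installed salsify's JsonStreamingParser via composer in order to save each item to the database in PHPProBid, as shown in the following code: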
public function SaveCategoryByStream(){
    require_once dirname(__FILE__).'/../../../../../../vendor/autoload.php';
    ini_set('memory_limit', '4024M');
    ini_set('max_execution_time', 0);
    $testfile = '/home/richi/Desktop/categoriesMLM.json';
    $listener = new \JsonStreamingParser\Listener\GeoJsonListener(function ($category) {
        $category_name = Array(
            $category['name']
        );
        $category['id'] = preg_replace('/[^0-9]/', '', $category['id']);
        $id = $order_id = Array(
            $category['id']
        );
        $parent_id = $category['path_from_root'];
        if($parent_id == null){
        }else{
            $parent_id = end($category['path_from_root']);
            $parent_id = prev($category['path_from_root']);
            $parent_id = preg_replace('/[^0-9]/', '', $parent_id['id']);
        }
        $new_category = Array(
            'parent_id' => $parent_id,
            'id' => $id,
            'order_id' => $order_id,
            'name' => $category_name,
            'full_name' => $category_name
        );
        try {
            $categoriesService = new Service\Table\Relational\Categories();
            $categoriesService->save($new_category);
            $parent_id = $id;//['id'];
            //Saving all children categories of that specific category
            for($ii = 0; $ii < count($category['children_categories']); $ii++){
                $this->SaveChildrenCategory($parent_id, $category['children_categories'][$ii], $categoriesService);
            }
        }catch(Exception $e){
            echo $e;
        }
    });
    $stream = fopen($testfile, 'r');
    try {
        $parser = new \JsonStreamingParser\Parser($stream, $listener);
        $parser->parse();
        fclose($stream);
    } catch (Exception $e) {
        fclose($stream);
        throw $e;
    }
    $controller = 'Mercadolibre';
    $headline = $this->_('MeliSync');
    $filter = 'first_time';
    return array(
        'controller' => $controller,
        'headline' => $headline,
        'messages' => $this->_flashMessenger->getMessages(),
        'filter' => $filter
    );
}
In human-readable form, the JSON file is structured like this:
{
"MLMXXXXX":{...},
"MLMXXXY":{...},
...
}
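Nevertheless, when I call the function, it gets stuck after saving 3552 items. I also read that GeoJsonListener loads the JSON into memory. My question is: how can I create a listener that loads each object individually, instead of loading the entire JSON into memory?
Here is the output for item 3552: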
{ id: 'MLM45922',
name: 'Mitsubishi',
picture: null,
permalink: null,
total_items_in_this_category: 39,
path_from_root:
[ { id: 'MLM1747', name: 'Accesorios para Vehículos' },
{ id: 'MLM179617', name: 'Tuning y Performance' },
{ id: 'MLM179724', name: 'Performance' },
{ id: 'MLM4859', name: 'Filtros Alto Flujo' },
{ id: 'MLM45922', name: 'Mitsubishi' } ],
children_categories: [],
attribute_types: 'attributes',
settings:
{ adult_content: false,
buying_allowed: true,
buying_modes: [ 'buy_it_now', 'auction' ],
catalog_domain: null,
coverage_areas: 'not_allowed',
currencies: [ 'USD', 'MXN' ],
fragile: false,
immediate_payment: 'required',
item_conditions: [ 'used', 'not_specified', 'new' ],
items_reviews_allowed: false,
listing_allowed: true,
max_description_length: 50000,
max_pictures_per_item: 12,
max_pictures_per_item_var: 10,
max_sub_title_length: 70,
max_title_length: 60,
maximum_price: null,
minimum_price: null,
mirror_category: null,
mirror_master_category: null,
mirror_slave_categories: [],
price: 'required',
reservation_allowed: 'not_allowed',
restrictions: [],
rounded_address: false,
seller_contact: 'not_allowed',
shipping_modes: [ 'not_specified', 'custom', 'me1', 'me2' ],
shipping_options: [ 'custom', 'carrier' ],
shipping_profile: 'optional',
show_contact_information: false,
simple_shipping: 'optional',
stock: 'required',
sub_vertical: null,
subscribable: false,
tags: [],
vertical: null,
vip_subdomain: 'articulo' },
meta_categ_id: null,
attributable: false }
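GeoJsonListener does exactly what you want: it keeps only the second level of the object in memory. That way it loads each MLM object in the file on its own, without loading the whole file into memory. I tested the code you included against the file you referenced (with the memory limit reduced to 32M, since a streaming parser does not need 4G of memory). It parsed the file without trouble, reading 27,200 objects in about 10 minutes on an old Macbook before I cancelled the process.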
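To illustrate what "keeping only the second level in memory" can look like, here is a minimal sketch of a depth-tracking listener. It is not the library's GeoJsonListener. `TopLevelEntryListener` is a hypothetical class, and its method names (`startObject`, `key`, `value`, and so on) only mirror the events that a streaming parser such as salsify's is assumed to emit, so check them against the listener interface of your installed version. To stay self-contained, the sketch is driven by hand instead of by the real parser.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical listener: rebuilds one direct child of the root at a time
 * and hands it to a callback, so the root container itself stays empty.
 */
final class TopLevelEntryListener
{
    /** @var callable Receives ($key, $value) for each direct child of the root. */
    private $onEntry;
    /** Containers currently open; $stack[0] is the root. */
    private array $stack = [];
    /** Pending key per open container (stays null inside arrays). */
    private array $keys = [];

    public function __construct(callable $onEntry)
    {
        $this->onEntry = $onEntry;
    }

    public function startObject(): void { $this->open(); }
    public function startArray(): void { $this->open(); }
    public function endObject(): void { $this->close(); }
    public function endArray(): void { $this->close(); }

    public function key(string $key): void
    {
        $this->keys[count($this->keys) - 1] = $key;
    }

    public function value($value): void
    {
        $this->attach($value);
    }

    private function open(): void
    {
        $this->stack[] = [];
        $this->keys[] = null;
    }

    private function close(): void
    {
        $finished = array_pop($this->stack);
        array_pop($this->keys);
        $this->attach($finished);
    }

    private function attach($value): void
    {
        $depth = count($this->stack);
        if ($depth === 0) {
            return; // the root itself just closed; nothing to attach it to
        }
        $key = $this->keys[$depth - 1];
        if ($depth === 1) {
            // Direct child of the root: emit it and do not keep a copy.
            ($this->onEntry)($key, $value);
            return;
        }
        if ($key === null) {
            $this->stack[$depth - 1][] = $value;     // inside an array
        } else {
            $this->stack[$depth - 1][$key] = $value; // inside an object
        }
    }
}

// Events for {"MLM1":{"name":"A","children_categories":[]},"MLM2":{"name":"B"}}
$seen = [];
$listener = new TopLevelEntryListener(function ($id, $category) use (&$seen) {
    $seen[$id] = $category;
});
$listener->startObject();
$listener->key('MLM1');
$listener->startObject();
$listener->key('name');
$listener->value('A');
$listener->key('children_categories');
$listener->startArray();
$listener->endArray();
$listener->endObject();
$listener->key('MLM2');
$listener->startObject();
$listener->key('name');
$listener->value('B');
$listener->endObject();
$listener->endObject();
```

Because the root container never stores its children, peak memory in this sketch is bounded by the largest single entry rather than by the size of the file.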
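This leads me to believe that the problem has nothing to do with the JSON parser or the way the file is parsed, and is probably caused by something else (for example, your host/web server not honoring the request to change the time limit, or your database layer locking or choking on something).
Can you dump document 3552?
Oh, I see! I need to verify that the children are not empty. Silly me. I will update the post with the result of this change. Thanks! Even so, this is still an error. I would be glad to know how to make the listener not keep the whole thing in memory.
Waiting for your answer! Then I will check what else could be causing it to get stuck.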
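The check on the children mentioned above could be sketched as follows. The helper names `numericId`, `parentIdOf`, and `childrenOf` are hypothetical, and the sketch assumes (as in the dumped record) that `path_from_root` lists the ancestors in order and ends with the category itself, so the parent is the second-to-last entry.

```php
<?php
declare(strict_types=1);

/** Keep only the digits of an ID such as "MLM45922". */
function numericId(string $id): string
{
    return (string) preg_replace('/[^0-9]/', '', $id);
}

/**
 * Numeric ID of the direct parent, or null for a root category
 * (a path that is missing or has fewer than two entries).
 */
function parentIdOf(array $category): ?string
{
    $path = $category['path_from_root'] ?? [];
    if (!is_array($path) || count($path) < 2) {
        return null;
    }
    $parent = $path[count($path) - 2];
    return numericId((string) $parent['id']);
}

/** Children as an array, even when the field is missing or null. */
function childrenOf(array $category): array
{
    $children = $category['children_categories'] ?? [];
    return is_array($children) ? $children : [];
}

// Trimmed-down version of item 3552 from the question.
$mitsubishi = [
    'id' => 'MLM45922',
    'name' => 'Mitsubishi',
    'path_from_root' => [
        ['id' => 'MLM1747', 'name' => 'Accesorios para Vehículos'],
        ['id' => 'MLM179617', 'name' => 'Tuning y Performance'],
        ['id' => 'MLM179724', 'name' => 'Performance'],
        ['id' => 'MLM4859', 'name' => 'Filtros Alto Flujo'],
        ['id' => 'MLM45922', 'name' => 'Mitsubishi'],
    ],
    'children_categories' => [],
];

$parentId = parentIdOf($mitsubishi);
foreach (childrenOf($mitsubishi) as $child) {
    // In the question's code, the call to SaveChildrenCategory would go here.
}
```

Iterating with `foreach` over `childrenOf(...)` avoids calling `count()` on a missing or null field, and reading the second-to-last path entry by index avoids relying on the internal array pointer moved by `end()` and `prev()`.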