Python: Creating a corpus from different JSON files

Tags: python, json, loops, pandas


I want to create a corpus made up of different articles stored in JSON format. They are in different files named by year, for example:

with open('Scot_2005.json') as f:
    data = [json.loads(line) for line in f] 
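This corresponds to The Scotsman newspaper for the year 2005. The other files for this newspaper are named APJ_2006 through APJ_2015. I also have another newspaper, the Scottish Daily Mail, which only covers the years 2014-2015: SDM_2014, SDM_2015. I would like to create a common list containing all of these articles: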

doc_set = [d['body'] for d in data]
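My problem is looping the first part of the code I posted so that data corresponds to all the articles, not just those of one particular newspaper in one particular year. Any ideas on how to accomplish this? In my attempt, I tried using pandas, for example: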

for i in range(2005,2016):
    df = pandas.DataFrame([json.loads(l) for l in open('Scot_%d.json' % i)])

doc_set = df.body
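As I see it, this approach has two problems: it does not append all the years (df is overwritten on each iteration, so only the last year survives), and I do not know how to include the other newspapers, which cover intervals other than 2005-15. The result of this approach looks like this: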

date
2015-12-31    The Institute of Directors (IoD) has added its...
2015-12-31    It is startling to see how much the Holyrood l...
2015-12-31    A hike in interest rates in the new year will ...
2015-12-31    The First Minister has resolved to make 2016 a...
2015-12-30    The Scottish Government announced yesterday th...
2015-12-30    The Footsie closed lower amid falling oil pric...
2015-12-28    BEFORE we start the guessing game for 2016, a ...
2015-12-27    AS WE ushered in 2015, few would have predicte...
2015-12-23    No matter how hard Derek McInnes and his Aberd...
2015-12-21    THE HEAD of a Scottish Government task force s...
2015-12-17    A Scottish local authority has fought off a le...
2015-12-17    Markets lifted after the Federal Reserve hiked...
2015-12-17    Significant increases in UK quotas for fish in...
2015-12-17    WAR of words with Donald Trump suggests its ti...
2015-12-16    SCOTLAND'S national performance companies have...
2015-12-15    Markets jumped ahead of what investors expect ...
2015-12-14    Political uncertainty in back seat as transpor...
2015-12-11    The International Monetary Fund (IMF) has warn...
2015-12-08    Scotland has a "spring in its step" with the j...
2015-12-07    London's leading share index struggled for dir...
2015-12-03    REDUCING carbon is just the start of it, write...
2015-11-26    One of the country's most prized salmon rivers...
2015-11-23    Tax and legislative changes undermine strong f...
2015-11-23    A second House of Lords committee has called f...
2015-11-14    At first glance, Scotland's economic performan...
2015-11-13    THE United States has long been viewed as the ...
2015-11-12    IT IS vital for a new governance group to rest...
2015-11-12    Former SSE chief Ian Marchant has criticised r...
2015-11-11    Telecoms firm TalkTalk said it will take a hit...
2015-11-09    Improvements to consumer rights legislation ma...
                                    ...                        
2015-02-25    Traders baulked at an assault on the 7,000 lev...
2015-02-24    BRITISH military personnel are to be deployed ...
2015-02-20    DAVID Cameron has announced a £859 million inv...
2015-02-16    Falling oil prices and slowing inflation have ...
2015-02-14    DEFENCE spending cuts and falling oil prices h...
2015-02-14    Brent crude rallied to a 2015 high and helped ...
2015-02-12    THE HOUSING markets in Scotland and Northern I...
2015-02-10    INVESTMENT in Scotland's commercial property m...
2015-02-09    Investors took flight after Greece's new gover...
2015-02-01    Experts say large numbers are delaying decisio...
2015-01-29    MORE than 300 jobs are at risk after Tesco sai...
2015-01-27    THE Three Bears have hit out at the Rangers bo...
2015-01-21    GEORGE Osborne has challenged the right of SNP...
2015-01-19    Employment figures this week should show Briti...
2015-01-19    Why haven't petrol pump prices fallen as fast ...
2015-01-18    Without an agreement on immediate action,  the...
2015-01-17    A SECOND independence referendum could be trig...
2015-01-14    THE RETAILER, which like its rivals has come u...
2015-01-14    HOUSE prices in Scotland rose by more than 4 p...
2015-01-13    HOUSE builder Taylor Wimpey is preparing for a...
2015-01-13    Supermarket group Sainsbury's today said it wo...
2015-01-13    INFLATION has tumbled to its lowest level on r...
2015-01-12    BUSINESSES are bullish about their ­prospects ...
2015-01-11    FOR decades, oil has dripped through our natio...
2015-01-09    Shares in the housebuilding sector fell heavil...
2015-01-08    THE Bank of England is expected to leave inter...
2015-01-05    COMPANIES in Scotland are more optimistic abou...
2015-01-04    UK is doing OK, but uncertainty looms on mid-y...
2015-01-02    The London market began the new year in a subd...
2015-01-02    The famous election mantra of Bill Clinton's c...
Name: body, dtype: object

Suppose you have a list of files:

file_name_list = ( 'Scot_2005.json', 'APJ_2006.json' )
You can append every article to a single list like this:

import json

data = list()
for file_name in file_name_list:
    # each file holds one JSON document per line
    with open(file_name, 'r') as json_file:
        for line in json_file:
            data.append(json.loads(line))
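
As a rough sketch of how the loop above could cover several newspapers with different year ranges, the file names can be generated from a small mapping and the bodies collected in one pass. The names NEWSPAPER_YEARS, build_file_names and load_bodies are hypothetical and not part of the original answer; the prefixes and year ranges are taken from the question and may need adjusting to the actual files.

```python
import json

# Hypothetical mapping of file prefix -> (first_year, last_year), inclusive.
# Prefixes and ranges follow the question; adjust them to your own files.
NEWSPAPER_YEARS = {
    "Scot": (2005, 2015),
    "SDM": (2014, 2015),
}


def build_file_names(newspaper_years):
    """Return names such as 'Scot_2005.json' for every prefix and year."""
    return [
        f"{prefix}_{year}.json"
        for prefix, (first, last) in newspaper_years.items()
        for year in range(first, last + 1)
    ]


def load_bodies(file_names):
    """Read JSON-lines files and return the 'body' of every article."""
    doc_set = []
    for file_name in file_names:
        with open(file_name, encoding="utf-8") as json_file:
            for line in json_file:
                if line.strip():  # skip blank lines
                    doc_set.append(json.loads(line)["body"])
    return doc_set
```

With the files in the working directory, doc_set = load_bodies(build_file_names(NEWSPAPER_YEARS)) would then give one flat list of article bodies across all newspapers and years.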

If you want to build file_name_list programmatically, you can use a library to do so.
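
Assuming the library meant here is Python's standard glob module (the text does not name it), a minimal sketch of building the file list from a directory might look like this. The function name find_json_files and the default pattern are hypothetical; the pattern simply matches names that end in an underscore, four digits and .json.

```python
import glob
import os


def find_json_files(directory=".", pattern="*_[0-9][0-9][0-9][0-9].json"):
    """Return sorted paths of files like 'Scot_2005.json' in a directory."""
    # glob expands the shell-style pattern; sorting makes the order stable.
    return sorted(glob.glob(os.path.join(directory, pattern)))
```

The result can be passed directly as file_name_list to the loop above, so newspapers with different year ranges are picked up without listing each year by hand.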

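So what is the reason you are trying to do it this way? What is the problem? I don't see any attempt to loop over the newspaper names or the years. Maybe try that?
@jornshape, I just updated the question. As you can see, by using pandas I am not able to generate the list.
No, you get a DataFrame, which is exactly what you want. What's wrong?!
So then I would produce a list? The problem is integrating the other newspaper, for example the Scottish Daily Mail for 2014-15.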
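
For the pandas route discussed in the comments, one possible sketch is to build one DataFrame per file and concatenate them, instead of overwriting df on every iteration. The function name load_frame is hypothetical, and the sketch assumes every file is in JSON-lines format with a body field, as in the question.

```python
import json

import pandas as pd


def load_frame(file_names):
    """Read each JSON-lines file into a DataFrame and stack them all."""
    frames = []
    for file_name in file_names:
        with open(file_name, encoding="utf-8") as json_file:
            records = [json.loads(line) for line in json_file if line.strip()]
        frames.append(pd.DataFrame(records))
    # pd.concat raises ValueError if file_names is empty.
    return pd.concat(frames, ignore_index=True)
```

Because the file list can mix prefixes such as Scot_ and SDM_, the same call covers newspapers with different year ranges, and load_frame(file_name_list)["body"].tolist() would give the plain list of article bodies.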