Apache Pig: subtract the values in two columns and get a new column
I have filtered a set of rows according to my criteria. Now I need the difference between two of the columns, and then I want to sort by that difference. These are the commands I have used so far:
data = LOAD '/user/imohit01017881/jk/a2dbe50d-c6e5-42e2-8fd0-5386720ce07b_Data.csv' USING PigStorage(',')
    AS (Country:chararray, CountryCode:chararray, Series:chararray, SeriesCode:chararray,
        yr2000:float, yr2001:float, yr2002:float, yr2003:float, yr2004:float, yr2005:float,
        yr2006:float, yr2007:float, yr2008:float, yr2009:float, yr2010:float, yr2011:float,
        yr2012:float, yr2013:float, yr2014:float, yr2015:float);
Filter the rows that contain the requested data:
ggdif = FILTER data BY Series == 'Improved sanitation facilities (% of population with access)';
dump data;
The following line gives me an error:
sub_data = FOREACH ggdif GENERATE SUBTRACT(yr2015, yr2000);
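SUBTRACT takes two bags as arguments and returns a new bag made up of the tuples of the first bag that are not in the second bag. If a bag argument is null, it is replaced by an empty bag.
yr2015 and yr2000 are float fields, not bags, so SUBTRACT does not apply here. Use the arithmetic minus operator instead: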
sub_data = FOREACH ggdif GENERATE (yr2015 - yr2000);
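The question also asks for the result to be sorted. A minimal sketch of one way to do that, assuming you want the country kept next to the difference: the alias yr_diff and the relation names with_diff, non_null and sorted are made up for illustration and are not part of the original script.

```pig
-- Keep Country and give the computed column a name (yr_diff is a hypothetical alias).
with_diff = FOREACH ggdif GENERATE Country, (yr2015 - yr2000) AS yr_diff;

-- Arithmetic on a null operand yields null, so rows missing either year are dropped here.
non_null = FILTER with_diff BY yr_diff IS NOT NULL;

-- Sort by the new column, largest increase first.
sorted = ORDER non_null BY yr_diff DESC;
DUMP sorted;
```

Naming the column with AS matters because ORDER then has a field name to refer to. Without it, the column would have to be addressed by position ($1).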