Hadoop: In Pig, is it possible to create column fields by defining the column field positions?


Suppose I have the following structured data file:

1298712061228765236542123049824234209374
1203972012073042198531203948203498023498023
1203712012092329385612350924395798456892345
12348120121012234989230482034893204820398

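In the file above, positions 1-6 hold the user ID, positions 7-12 hold the year/date, and positions 13-18 hold a count field. Likewise, in the same flat file, positions 19-30 hold the product ID and positions 31-42 hold a character-values column. I want to load my data split into these fields. Is there an option for this in Pig or Hive?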

Could you use SUBSTRING?

-- SUBSTRING(str, start, stop): start is 0-based, stop is exclusive
A = LOAD 'DATA' USING PigStorage() AS (line:chararray);
B = FOREACH A GENERATE SUBSTRING(line,0,6) AS UserID, SUBSTRING(line,6,12) AS Year_date ...

You can do this in both Pig and Hive. Here are two solutions.
Pig:

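-- SUBSTRING(str, start, stop): start is 0-based, stop is exclusive,
-- so 1-based positions a-b become SUBSTRING(line, a-1, b)
data = LOAD '/data.txt' USING PigStorage() AS (line:chararray);
strsplit = FOREACH data GENERATE
    SUBSTRING(line,0,6) AS UserID,
    SUBSTRING(line,6,12) AS year_date,
    SUBSTRING(line,12,18) AS Count,
    SUBSTRING(line,18,30) AS product_id,
    SUBSTRING(line,30,42) AS Character_values;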
When you run DUMP strsplit; each input line is printed as one tuple containing the five extracted fields.
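Off-by-one mistakes are easy to make here, because the question gives 1-based inclusive positions while Pig's SUBSTRING takes a 0-based start and an exclusive stop. The sketch below is not part of the original answer: it is a small Python check, with made-up names (LAYOUT, split_fixed_width), that slices the first sample record the same way so the offsets can be verified locally before running the job.

```python
# Field layout from the question, as 1-based inclusive (start, end) positions.
LAYOUT = {
    "UserID": (1, 6),
    "year_date": (7, 12),
    "Count": (13, 18),
    "product_id": (19, 30),
    "Character_values": (31, 42),
}


def split_fixed_width(line, layout=LAYOUT):
    """Slice one record; line[start-1:end] uses a 0-based start and an exclusive stop."""
    return {name: line[start - 1:end] for name, (start, end) in layout.items()}


# First record from the sample file in the question (40 characters long).
sample = "1298712061228765236542123049824234209374"
fields = split_fixed_width(sample)
for name, value in fields.items():
    print(name, value)
```

Python slicing truncates silently when a line is shorter than 42 characters, so the last field of this sample record comes back with 10 characters instead of 12.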

Hive:

Step 1: Create a temporary table and load the raw data

create table temp(line String)
ROW FORMAT DELIMITED
LINES TERMINATED BY '\n';
LOAD DATA INPATH '/data.txt' INTO TABLE temp;  
Step 2: Create a table that fits your data

create table user(UserID String,year_date String,Count String,product_id String,Character_values String)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
Step 3: Insert from the temporary table into the actual table

-- substr(str, pos, len): pos is 1-based and the third argument is a length, not an end position
INSERT INTO TABLE user
SELECT substr(line,1,6),substr(line,7,6),substr(line,13,6),substr(line,19,12),substr(line,31,12) FROM temp;
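Hive's substr takes a 1-based start position and a length, so a 1-based inclusive range a-b from the file layout becomes substr(line, a, b-a+1). As a rough illustration only (the helper name hive_substr_exprs and the RANGES list are invented for this sketch, not part of the original answer), the arguments can be generated from the ranges rather than worked out by hand:

```python
# 1-based inclusive (start, end) positions of the five fields, from the question.
RANGES = [(1, 6), (7, 12), (13, 18), (19, 30), (31, 42)]


def hive_substr_exprs(ranges, column="line"):
    """Turn inclusive position ranges into substr(column, start, length) expressions."""
    return [f"substr({column},{start},{end - start + 1})" for start, end in ranges]


exprs = hive_substr_exprs(RANGES)
print("SELECT " + ",".join(exprs) + " FROM temp;")
```

Keeping the ranges in one list makes it harder for the start positions and lengths to drift apart when the layout changes.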
