Regular expression for a string with a specific delimiter in a Hive SerDe

Tags: regex, hadoop, hive, hive-serde

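I am using a SerDe to read data in a specific format that uses | as the delimiter.

A line of my data might look like key1=value2|key2=value2|key3="va , lues", and I created the Hive table as follows: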

CREATE EXTERNAL TABLE mytable (
field1 STRING,
field2 STRING,
field3 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^\\|]*)\\|([^\\|]*)\\|([^\\|]*)",
  "output.format.string" = "%1$s %2$s %3$s"
)
STORED AS TEXTFILE;
I need to extract all the values, ignoring the quotes if they are present. The result should look like this:

 value2  value2 va , lues

How can I change my current regexp so that it extracts the values?
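
To see what the pattern in the question actually captures, it can be tried outside Hive. The following is a minimal Java sketch, assuming (as the RegexSerDe documentation describes) that the pattern has to match the whole line and that each capture group becomes one column. The class CurrentRegexDemo and its extract helper are hypothetical names used only for illustration; they are not part of Hive.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hive: mimics how capture groups map to columns.
class CurrentRegexDemo {
    // The same regex that Hive sees after unescaping the "input.regex" string literal.
    static final Pattern CURRENT = Pattern.compile("([^\\|]*)\\|([^\\|]*)\\|([^\\|]*)");

    // Returns one string per capture group, or null when the whole line does not match.
    static String[] extract(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] columns = new String[m.groupCount()];
        for (int i = 0; i < columns.length; i++) {
            columns[i] = m.group(i + 1);
        }
        return columns;
    }

    public static void main(String[] args) {
        String line = "key1=value2|key2=value2|key3=\"va , lues\"";
        for (String column : extract(CURRENT, line)) {
            System.out.println(column);
        }
        // prints:
        // key1=value2
        // key2=value2
        // key3="va , lues"
    }
}
```

Each column still carries its key and, for the third field, the surrounding quotes, so a replacement pattern has to skip the key= prefix and deal with the optional quotes.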

I can currently offer two options; neither of them is perfect.
By the way, "output.format.string" is obsolete and has no effect.


What is the current output for the given input?
key1=value2 key2=value2 key3="va , lues"
So just change it to the following:
“input.regex”=“[^\\\\\\\\\\\=]*=”?([^\\\\\\\\\\]*=”?([^\\\\\\\\\\\\\\\\\\\\\=]*=”?([^\\\\\\\\\\\\\\\\\\\\\\\\]*=”?”,
The use case seems odd. Why don't you need the keys?
@horcrux - | might be contained within a quoted value
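
The point raised in the last comment can be checked directly. This is a small Java sketch, assuming the pattern from the question and a made-up input line whose quoted value contains the delimiter; PipeInQuotesDemo and matchesWholeLine are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: shows why splitting on every | is not enough.
class PipeInQuotesDemo {
    // The question's pattern: three runs of non-pipe characters separated by |.
    static final Pattern SPLIT_ON_PIPE = Pattern.compile("([^\\|]*)\\|([^\\|]*)\\|([^\\|]*)");

    // True only if the pattern matches the entire line.
    static boolean matchesWholeLine(String line) {
        return SPLIT_ON_PIPE.matcher(line).matches();
    }

    public static void main(String[] args) {
        // Delimiter appears only between the fields.
        System.out.println(matchesWholeLine("key1=value2|key2=value2|key3=\"va , lues\"")); // true
        // Made-up line with a | inside the quoted value: three pipes, so no full match.
        System.out.println(matchesWholeLine("key1=value2|key2=value2|key3=\"va | lues\"")); // false
    }
}
```

According to the RegexSerDe documentation, a row that does not match the regex yields NULL in every column, so such a line would be lost rather than split incorrectly.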
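1. Capture the optional opening quote of each value in a column of its own (q1, q2, q3) and require the same character again after the value:

create external table mytable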
(
    q1          string    
   ,field1      string
   ,q2          string
   ,field2      string
   ,q3          string
   ,field3      string
)
row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'
with serdeproperties ('input.regex' = '.*?=(?<q1>"?)(.*?)(?:\\k<q1>)\\|.*?=(?<q2>"?)(.*?)(?:\\k<q2>)\\|.*?=(?<q3>"?)(.*?)(?:\\k<q3>)')
stored as textfile
;
select * from mytable
;
+----+--------+----+--------+----+-----------+
| q1 | field1 | q2 | field2 | q3 |  field3   |
+----+--------+----+--------+----+-----------+
|    | value2 |    | value2 | "  | va , lues |
+----+--------+----+--------+----+-----------+
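
The first pattern can be tried with plain java.util.regex, which is a convenient way to experiment before recreating the table. This is only a sketch: OptionOneDemo and columns are hypothetical names, and the Java string literal is the Hive literal with the double quotes escaped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the named-group / backreference pattern.
class OptionOneDemo {
    static final Pattern OPTION_ONE = Pattern.compile(
          ".*?=(?<q1>\"?)(.*?)(?:\\k<q1>)\\|"
        + ".*?=(?<q2>\"?)(.*?)(?:\\k<q2>)\\|"
        + ".*?=(?<q3>\"?)(.*?)(?:\\k<q3>)");

    // Returns the six capture groups in column order, or null if the line does not match.
    static String[] columns(String line) {
        Matcher m = OPTION_ONE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] columns = new String[m.groupCount()];
        for (int i = 0; i < columns.length; i++) {
            columns[i] = m.group(i + 1);
        }
        return columns;
    }

    public static void main(String[] args) {
        for (String column : columns("key1=value2|key2=value2|key3=\"va , lues\"")) {
            System.out.println("[" + column + "]");
        }
        // prints:
        // []
        // [value2]
        // []
        // [value2]
        // ["]
        // [va , lues]
    }
}
```

The values come out clean, at the cost of three extra quote columns; a query can simply select field1, field2 and field3 and ignore q1, q2 and q3.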
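2. Match each value either as a quoted string or as an unquoted one; the quotes remain part of the extracted value:

create external table mytable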
(
    field1 string
   ,field2 string
   ,field3 string
)
row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'
with serdeproperties ('input.regex' = '.*?=(".*?"|.*?)\\|.*?=(".*?"|.*?)\\|.*?=(".*?"|.*?)')
stored as textfile
;
select * from mytable
;
+--------+--------+-------------+
| field1 | field2 |   field3    |
+--------+--------+-------------+
| value2 | value2 | "va , lues" |
+--------+--------+-------------+
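
The second pattern can be checked the same way. The sketch below also strips the leftover quotes after extraction; OptionTwoDemo, columns and stripQuotes are hypothetical names, and the second input line (with a | inside the quotes) is made up for illustration. In Hive, a comparable clean-up could presumably be done at query time with regexp_replace.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the quoted-or-unquoted alternation pattern.
class OptionTwoDemo {
    static final Pattern OPTION_TWO = Pattern.compile(
        ".*?=(\".*?\"|.*?)\\|.*?=(\".*?\"|.*?)\\|.*?=(\".*?\"|.*?)");

    // Returns the three capture groups, or null if the line does not match.
    static String[] columns(String line) {
        Matcher m = OPTION_TWO.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    // Removes one leading and one trailing double quote, if present.
    static String stripQuotes(String value) {
        return value.replaceAll("^\"|\"$", "");
    }

    public static void main(String[] args) {
        String[] row = columns("key1=value2|key2=value2|key3=\"va , lues\"");
        System.out.println(row[2]);               // "va , lues"
        System.out.println(stripQuotes(row[2]));  // va , lues

        // Made-up line: the delimiter inside the quotes stays within the value.
        String[] piped = columns("key1=value2|key2=value2|key3=\"va | lues\"");
        System.out.println(piped[2]);             // "va | lues"
    }
}
```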