Java CSV file pattern matching also matches the separator

java regex csv


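I am using this regex, which I took from another source:

(?:^|;)\s*(?:(?:(?=")"([^"].*?)")|(?:(?!")(.*?)))(?=;|$)

My problem is that it also matches the separating semicolon, which I then have to remove by hand. That is bad style.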

String separator = ";";
String patternString = "(?:^|" + separator + ")\\s*(?:(?:(?=\")\"([^\"].*?)\")|(?:(?!\")(.*?)))(?=" + separator + "|$)";
pattern = Pattern.compile(patternString);

Matcher match = pattern.matcher(line);
// Find all cells and add them to row.
while (match.find())
{
    String cell = match.group();
    // HACK: something is wrong with the pattern to match
    if (cell.indexOf(";") == 0) cell = cell.substring(1);
    cell = unescapeCsv(cell);
    // do something with cell
}

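This is the data I am trying to match. The pattern has to handle columns that are quoted and contain semicolons in their data (see the last columns):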

Name;inst_type;Position;Currency;cftype;amount;tenor;fwterm;compoundtype;resetterm;histrates;sdate;edate;fwdate;callput;capfloor;strike;vola;volavalue;ulspot;ulspotvalue;divcurve;cleanflag;fixrate;compfr;dayc;refindex;spread;floatfactor;paystart;payend;annuity;amortizingtype;cgm;resrate;nextrate;isprorated;rolldate;rollday;islongstub;isarrear;payrule;paydays;paycal;resrule;resdays;rescal;isfixcoupon;spreadcurve;cds_spread_value;recovrate;payoutrate;payrec;creditspread;disc;"C""F"
Bond1;;1;EUR;2;100;6M;;;;;01.09.2007;01.09.2010;;;;;;;;;;;0,0625;1;1;;;;0;100;;;1;;;1;01.06.2008;;1;;;;;;;;;;;;;;;IR-EUR;"2;01062008;100;0;0.0625;;01092007;01062008";"2;01122008;100;0;0.0625;;01062008;01122008";"2;01062009;100;0;0.0625;;01122008;01062009";"2;01122009;100;0;0.0625;;01062009;01122009";"2;01092010;100;0;0.0625;;01122009;01092010";"1;01092010;100";
Bond2;;1;EUR;2;100;6M;;;;;01.09.2007;01.09.2010;;;;;;;;;;;0,0625;1;1;;;;0;100;;;-1;;;1;;;;;;;;;;;;;;;;;;IR-EUR;"2;01092010;100;0;0.0625;;01032010;01092010";"2;01032010;100;0;0.0625;;01092009;01032010";"2;01092009;100;0;0.0625;;01032009;01092009";"2;01032009;100;0;0.0625;;01092008;01032009";"2;01092008;100;0;0.0625;;01032008;01092008";"2;01032008;100;0;0.0625;;01092007;01032008";"1;01092010;100";

Just use a regular CSV parser.
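As a rough illustration of parsing instead of pattern matching, a minimal hand-written splitter might look like the sketch below. The class name `SimpleCsvSplitter` and the method `splitLine` are hypothetical and not taken from any library. The sketch assumes one record per line, `"` as the quote character, and `""` as an escaped quote inside a quoted cell; a real CSV library covers more cases, such as fields that span several lines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
class SimpleCsvSplitter {

    // Splits one CSV line into cells. Quoted cells may contain the separator,
    // and a doubled quote ("") inside a quoted cell stands for a literal quote.
    static List<String> splitLine(String line, char separator) {
        List<String> cells = new ArrayList<>();
        StringBuilder cell = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        cell.append('"'); // escaped quote inside a quoted cell
                        i++;              // skip the second quote of the pair
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    cell.append(c);       // separators are plain data here
                }
            } else if (c == '"') {
                inQuotes = true;          // opening quote
            } else if (c == separator) {
                cells.add(cell.toString());
                cell.setLength(0);        // start the next cell
            } else {
                cell.append(c);
            }
        }
        cells.add(cell.toString());       // last cell (empty if the line ends with a separator)
        return cells;
    }

    public static void main(String[] args) {
        // Shortened sample in the style of the data from the question.
        String line = "Bond1;;1;\"2;01062008;100\";\"C\"\"F\"";
        for (String c : splitLine(line, ';')) {
            System.out.println("[" + c + "]");
        }
    }
}
```

Because the separator is consumed by the loop and never appended to a cell, there is nothing to strip afterwards, and quoted cells come back without their surrounding quotes.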

Try this, it should work:

(?

Why are you doing this to yourself? Just use a CSV parsing library, or write a simple tokenizer that doesn't use regular expressions.

You are using capturing groups in your regex. All you need to do is replace String cell = match.group() by String cell = match.group(1). That gives you group 1 instead of the full match.

Well, I just saw that you have two groups, so this will be a bit more difficult…

There is still a problem with "3;01022007;100;0;;IR-EUR;01082006;01022007;01082006;6M;0;1";"3;01022007;100;0;;IR-EUR;01082006;01022007;01082006;6M;0;2";"3;01022007;100;0;;IR-EUR;01082006;01022007;01082006;6M;0;3" Only the first one is matched.

Cut and paste the string into the regex101 editor; it contains hidden characters that break the pattern.

Thanks, it works. The error was elsewhere, in the data representation.
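The capturing-group suggestion from the comments can be sketched as follows. Since the pattern has two groups (group 1 for quoted cells, group 2 for unquoted ones), the sketch picks whichever one took part in the match. The class name `CsvRegexCells` and the method `cells` are hypothetical. Wrapping the separator in `Pattern.quote` and replacing `""` by `"` by hand are my own assumptions, not part of the original code. The pattern itself is unchanged, so its limits remain: for example, a line that starts with an empty cell is not split correctly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper built around the pattern from the question.
class CsvRegexCells {

    static List<String> cells(String line, String separator) {
        // Assumption: quote the separator so characters such as '|' stay literal.
        String sep = Pattern.quote(separator);
        Pattern pattern = Pattern.compile(
            "(?:^|" + sep + ")\\s*(?:(?:(?=\")\"([^\"].*?)\")|(?:(?!\")(.*?)))(?=" + sep + "|$)");

        List<String> result = new ArrayList<>();
        Matcher match = pattern.matcher(line);
        while (match.find()) {
            if (match.group(1) != null) {
                // Quoted cell: group 1 excludes the outer quotes and the separator.
                // Assumption: doubled quotes are the only escape to undo.
                result.add(match.group(1).replace("\"\"", "\""));
            } else {
                // Unquoted cell: group 2 excludes the leading separator.
                result.add(match.group(2));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "Bond1;;1;\"2;01062008;100\";IR-EUR";
        for (String c : cells(line, ";")) {
            System.out.println("[" + c + "]");
        }
    }
}
```

Reading the groups instead of `match.group()` makes the manual `substring(1)` step unnecessary, because the separator sits outside both capturing groups.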