Regex: how do I find the pattern?
I need to split the text below using regular expression syntax. I have already found patterns for dddddddd and dddddddd[x]. But what about the text? I need to get strings with values like "British Journal of Applied Science & Technology". How do I write that as a regular expression?
337 British Journal of Applied Science & Technology 2231-0843 5
338 British Journal of Economics, Management & Trade 2278-098X 5
339 British Journal of Education, Society & Behavioural Science 2278-0998 6
340 British Journal of Environment and Climate Change 2231-4784 5
341 British Journal of Mathematics & Computer Science 2231-0851 4
342 British Journal of Medicine and Medical Research 2231-0614 8
343 British Journal of Pharmaceutical Research 2231-2919 4
344 British Microbiology Research Journal 2231-0886 9
345 Bromatologia i Chemia Toksykologiczna 0365-9445 5
346 Budownictwo Górnicze i Tunelowe 1234-5342 5
347 Budownictwo i Architektura 1899-0665 3
348 Budownictwo, Technologie, Architektura 1644-745X 3
349 Builder 1896-0642 2
350 Built Environment 0263-7960 10
351 Bulgarian Journal of Veterinary Medicine 1311-1477 8
352 Bulgarian Medicine 1314-3387 2
353 Bulletin de la Société des sciences et des lettres de Łódź, Série: Recherches sur les déformations 0459-6854 7
354 Bulletin of Alfred Nobel University. Series "Legal Science" 2226-2873 6
355 Bulletin of Geography. Socio-economic Series 1732-4254 10
356 Bulletin of Geography: Physical Geography Series 2080-7686 9
357 Bulletin of the Polish Academy of Sciences. Mathematics 0239-7269 9
358 Business and Economic Horizons 1804-1205 8
359 Business and Economics Research Journal 1309-2448 10
360 Business Process Management Journal 1463-7154 10
Since you did not specify a target language or anything similar, here is how to do it with perl:
cat test.txt | perl -pe 's/^\d+\s//' | perl -pe 's/[0-9X "-]+$//'
The second expression may need adjusting depending on what the rest of the data looks like.
The output is:
British Journal of Applied Science & Technology
British Journal of Economics, Management & Trade
British Journal of Education, Society & Behavioural Science
British Journal of Environment and Climate Change
[snip]
Bulletin of the Polish Academy of Sciences. Mathematics
Business and Economic Horizons
Business and Economics Research Journal
Business Process Management Journal
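For readers who are not using perl, the same two substitutions can be sketched in Java, the language another answer in this thread uses. This is only an illustration under the same assumptions as the perl one-liner; the class name `StripEnds` and method name `title` are made up for this example.

```java
class StripEnds {
    // Applies the two substitutions from the perl one-liner to a single line:
    // 1) drop the leading running number and the whitespace after it,
    // 2) drop the trailing run of digits, 'X', spaces, quotes and hyphens.
    static String title(String line) {
        return line.replaceFirst("^\\d+\\s", "")
                   .replaceFirst("[0-9X \"-]+$", "");
    }

    public static void main(String[] args) {
        System.out.println(
            title("337 British Journal of Applied Science & Technology 2231-0843 5"));
    }
}
```

Note that the second character class contains the double quote, so a closing quote directly before the ISSN (as in the "Legal Science" line) is stripped along with the numbers. That is one of the places where the expression may need adjusting.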
You can do this with an expression that uses a lookbehind and a lookahead, like this:
(?<=\d{3}\s).*(?=\s\d{4}-)
This extracts:
British Journal of Applied Science & Technology
British Journal of Economics, Management & Trade
British Journal of Education, Society & Behavioural Science
British Journal of Environment and Climate Change
British Journal of Mathematics & Computer Science
British Journal of Medicine and Medical Research
British Journal of Pharmaceutical Research
[... cut ...]
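As a rough sketch of how that lookaround expression could be applied from code, here is a small Java example using `java.util.regex`. The class name `LookaroundTitle` and the method `title` are hypothetical, and returning `null` for non-matching lines is just one possible choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundTitle {
    // Lookbehind: three digits plus one whitespace character must precede the match.
    // Lookahead: whitespace, four digits and a hyphen must follow it.
    private static final Pattern TITLE =
        Pattern.compile("(?<=\\d{3}\\s).*(?=\\s\\d{4}-)");

    // Returns the matched title, or null when the line does not fit the pattern.
    static String title(String line) {
        Matcher m = TITLE.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(title(
            "354 Bulletin of Alfred Nobel University. Series \"Legal Science\" 2226-2873 6"));
    }
}
```

Because the lookbehind asks for exactly three digits, running numbers shorter than three digits would not match without changing the expression.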
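(\d{3})\s([\D]+)(\d{4}-\d{3,4}X?\s\d{1,2})
This splits the string into 3 capture groups:
1. the 3-digit number
2. anything that contains no digits, up to the next digit
3. the reference at the end (assuming it starts with 4 digits and is consistently formatted)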
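To make the three groups concrete, here is a minimal Java sketch that uses the expression `(\d{3})\s([\D]+)(\d{4}-\d{3,4}X?\s\d{1,2})` and reads the groups back. The names `ThreeGroups` and `split` are invented for illustration, and trimming the second group is an addition of this sketch, since that group also captures the space before the ISSN.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreeGroups {
    private static final Pattern LINE =
        Pattern.compile("(\\d{3})\\s([\\D]+)(\\d{4}-\\d{3,4}X?\\s\\d{1,2})");

    // Returns {number, title, reference}, or null if the line does not match.
    static String[] split(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        // Group 2 includes the space in front of the ISSN, so trim it.
        return new String[] { m.group(1), m.group(2).trim(), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts =
            split("338 British Journal of Economics, Management & Trade 2278-098X 5");
        System.out.println(String.join(" | ", parts));
    }
}
```

Since the second group only accepts non-digit characters, a title that itself contains a digit makes the whole line fail to match.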
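I know you are looking for a regex, but if you want something more direct, your document looks like it can be parsed easily with plain string manipulation. Here is an alternative for people who do not want to use regex.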
import java.util.StringTokenizer;

String tmp = "340 British Journal of Environment and Climate Change 2231-4784 5";
// Assumes the trailing reference is exactly 11 characters long, e.g. "2231-4784 5".
String ending = tmp.substring(tmp.length() - 11);
tmp = tmp.substring(0, tmp.length() - 11); // cut off the ending
StringTokenizer st = new StringTokenizer(tmp, " ");
String index = st.nextToken(); // reads the leading number up to the first space
tmp = tmp.substring(index.length()).trim(); // cut off the number and the surrounding spaces
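Now tmp is the name of the journal, index is the leading number, and the reference at the end is stored in ending. This approach only works on the assumption that every string is formatted exactly like the ones listed above, or close to it.
This approach: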
(?<=\d\s)\D+(?=\s\d)
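How do I parse it correctly? PS: I am writing my application in C# (.NET).
What program do you use to run the regex? Please state more clearly what you want to extract.
I don't see why the number could not exceed 999 or, more importantly, be below 100, and the "Legal Science" example might also break the lookahead. Still, a few small fixes would make this work.
@AndrisLeduskrasts "Legal Science" works like a charm (see the demo). As for allowing more digits in the lookbehind, that should be a simple fix for the OP to manage.
Oh, OK, he wants the title, sorry. As for the digits, a beginner regex user might not find it that simple, because the lookbehind is fixed length.
This expression does not work for journal names that contain digits, e.g. "Perfect 10", "J-14", "Route 66 Magazine", and of course "2600: The Hacker Quarterly".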
Example input line with a digit inside the title:
338 British Journal of 5Economics, Management & Trade 2278-098X 5
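One way to cope with titles that contain digits, which is the concern raised in the comments, is to anchor on the ISSN at the end of the line rather than excluding digits from the title. The following is a minimal sketch, not a definitive answer. It assumes every line consists of a running number, the title, an ISSN (four digits, a hyphen, three digits, then a digit or X) and a final number. The names `IssnAnchoredTitle` and `title` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IssnAnchoredTitle {
    // Running number, lazily matched title, ISSN, final number.
    // The whole line must match, so the title may contain digits itself.
    private static final Pattern LINE =
        Pattern.compile("\\d+\\s+(.+?)\\s+\\d{4}-\\d{3}[\\dX]\\s+\\d+");

    // Returns the title, or null when the line does not have the assumed layout.
    static String title(String line) {
        Matcher m = LINE.matcher(line.trim());
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(title(
            "338 British Journal of 5Economics, Management & Trade 2278-098X 5"));
    }
}
```

The running number is matched with `\d+`, so it is not limited to three digits. The same pattern string should carry over to .NET's `Regex` class with little change, though that has not been verified here.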