Python: extracting the part of a string that lies between two matching substrings

Tags: python, r, perl, pattern-matching, substring

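I have three files, each containing a set of strings. File1 and File2 contain substrings of the strings in File3. I want to extract from File3 the string that lies between the substring from File1 and the substring from File2. See the example below:

File1 (substring 1):

File2 (substring 2):

File3:

For example: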

                                 String in File1                       String in  File2
                              AGGGCUUAGCUGCUUGUGAGCA                   UUCACAGUGGCUAAGUUCCGC
String in File3      CUGAGGAGCAGGGCUUAGCUGCUUGUGAGCAGGGUCCACACCAAGUCGUGUUCACAGUGGCUAAGUUCCGCCCCCCAG
Output for this example:

GGGUCCACACCAAGUCGUG

Here is a solution in R:

file1 <- "AGGGCUUAGCUGCUUGUGAGCA"
file2 <- "UUCACAGUGGCUAAGUUCCGC"
file3 <- "CUGAGGAGCAGGGCUUAGCUGCUUGUGAGCAGGGUCCACACCAAGUCGUGUUCACAGUGGCUAAGUUCCGCCCCCCAG"

# create a regular expression
pattern <- paste0(".*", file1, "(.*)", file2, ".*")

# extract the substring
sub(pattern, "\\1", file3)
# [1] "GGGUCCACACCAAGUCGUG"

In Python:

>>> import re
>>> a='AGGGCUUAGCUGCUUGUGAGCA'
>>> b='UUCACAGUGGCUAAGUUCCGC'
>>> c='CUGAGGAGCAGGGCUUAGCUGCUUGUGAGCAGGGUCCACACCAAGUCGUGUUCACAGUGGCUAAGUUCCGCCCCCCAG'
>>> regex = a + '(.*?)' + b
>>> regex
'AGGGCUUAGCUGCUUGUGAGCA(.*?)UUCACAGUGGCUAAGUUCCGC'
>>> re.findall(regex,c)
['GGGUCCACACCAAGUCGUG']

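Try this, using strapplyc from the gsubfn package. We assume there is only one instance each of s1 and s2 or, if there are several, that you want the string between the first instance of s1 and the last instance of s2. If there can be multiple instances and you want something different, please add that to the question.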

s1 <- "AGGGCUUAGCUGCUUGUGAGCA"
s2 <- "UUCACAGUGGCUAAGUUCCGC"
s3 <- "CUGAGGAGCAGGGCUUAGCUGCUUGUGAGCAGGGUCCACACCAAGUCGUGUUCACAGUGGCUAAGUUCCGCCCCCCAG"

library(gsubfn)
fn$strapplyc(s3, "$s1(.*)$s2", simplify = TRUE)
##  [1] "GGGUCCACACCAAGUCGUG"
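
The assumption above about first and last instances comes down to greedy versus lazy matching. Here is a minimal Python sketch of the difference, using made-up strings s1, s2 and s3 (not the asker's data). The re.escape calls are a precaution I have added in case the delimiters ever contain regex metacharacters; the RNA sequences in the question do not need them.

```python
import re

# Hypothetical input containing two s1...s2 pairs
s1 = "AB"
s2 = "CD"
s3 = "xxABfirstCDyyABsecondCDzz"

left = re.escape(s1)
right = re.escape(s2)

# Greedy (.*): from the first s1 up to the LAST s2
greedy = re.search(left + "(.*)" + right, s3).group(1)

# Lazy (.*?): from the first s1 up to the NEXT s2
lazy = re.search(left + "(.*?)" + right, s3).group(1)

# Lazy with findall: one result per s1...s2 pair
all_lazy = re.findall(left + "(.*?)" + right, s3)

print(greedy)    # firstCDyyABsecond
print(lazy)      # first
print(all_lazy)  # ['first', 'second']
```

With only one instance of each delimiter, as in the question's example, both forms return the same result.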

In Python, using str.find:

string1 = "AGGGCUUAGCUGCUUGUGAGCA"
string2 = "UUCACAGUGGCUAAGUUCCGC"
string_main = "CUGAGGAGCAGGGCUUAGCUGCUUGUGAGCAGGGUCCACACCAAGUCGUGUUCACAGUGGCUAAGUUCCGCCCCCCAG"
# slice from the end of string1 to the start of string2
print(string_main[string_main.find(string1) + len(string1):string_main.find(string2)])
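
One caveat with the str.find approach: find returns -1 when a delimiter is missing, so the slice silently yields a wrong string instead of failing. A hedged alternative is sketched below using str.partition. The helper name between is my own, not from the answer, and it returns None when either delimiter is absent or when the right delimiter does not come after the left one.

```python
def between(text, left, right):
    """Return the text between the first `left` and the next `right`, or None."""
    # partition returns (before, separator, after); separator is '' if not found
    _, found_left, rest = text.partition(left)
    if not found_left:
        return None
    middle, found_right, _ = rest.partition(right)
    return middle if found_right else None


# Made-up strings for illustration
print(between("xxABmidCDyy", "AB", "CD"))  # mid
print(between("CDxxABmid", "AB", "CD"))    # None (no "CD" after "AB")
print(between("xxmidCDyy", "AB", "CD"))    # None ("AB" is missing)
```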

In Perl, you can try the following code:

use strict;
use warnings;

my $file1 = "AGGGCUUAGCUGCUUGUGAGCA";
my $file2 = "UUCACAGUGGCUAAGUUCCGC";
my $file3 = "CUGAGGAGCAGGGCUUAGCUGCUUGUGAGCAGGGUCCACACCAAGUCGUGUUCACAGUGGCUAAGUUCCGCCCCCCAG";

my ($result) = $file3 =~ /$file1(.*?)$file2/;

print $result;
Output:

GGGUCCACACCAAGUCGUG

For the input you gave, the following works:

f1 <- "AGGGCUUAGCUGCUUGUGAGCA"
f2 <- "UUCACAGUGGCUAAGUUCCGC"
f3 <- "CUGAGGAGCAGGGCUUAGCUGCUUGUGAGCAGGGUCCACACCAAGUCGUGUUCACAGUGGCUAAGUUCCGCCCCCCAG"
strsplit(f3, paste(f1, f2, sep='|'))[[1]][2]
# [1] "GGGUCCACACCAAGUCGUG"

In R, using the qdapRegex package:

f1 <- "AGGGCUUAGCUGCUUGUGAGCA"
f2 <- "UUCACAGUGGCUAAGUUCCGC"
f3 <- "CUGAGGAGCAGGGCUUAGCUGCUUGUGAGCAGGGUCCACACCAAGUCGUGUUCACAGUGGCUAAGUUCCGCCCCCCAG"

library(qdapRegex)
rm_between(f3, f1, f2, extract=TRUE)

## [[1]]
## [1] "GGGUCCACACCAAGUCGUG"

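Comments:

What are the two substrings? Post your code here so we can see where you are running into the problem.

I have edited my question. I have multiple strings in files 1, 2 and 3.

@user3741035 Do you want to use every combination of the strings in File1 and File2?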
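
The thread does not settle how multiple strings per file should be paired. As a rough sketch only, assuming every File1/File2 combination should be tried against every File3 string, something like the following could work in Python. The function name extract_all is hypothetical, the lists stand in for the file contents (reading the files is left out), and the second File1 entry is a made-up string that does not match.

```python
import re
from itertools import product


def extract_all(starts, ends, targets):
    """Try every (target, start, end) combination; keep the ones that match."""
    results = []
    for target, start, end in product(targets, starts, ends):
        # Lazy match: text between `start` and the next `end` in `target`
        match = re.search(re.escape(start) + "(.*?)" + re.escape(end), target)
        if match:
            results.append((start, end, match.group(1)))
    return results


file1 = ["AGGGCUUAGCUGCUUGUGAGCA", "AAAAAAAA"]  # second entry is made up
file2 = ["UUCACAGUGGCUAAGUUCCGC"]
file3 = ["CUGAGGAGCAGGGCUUAGCUGCUUGUGAGCAGGGUCCACACCAAGUCGUGUUCACAGUGGCUAAGUUCCGCCCCCCAG"]

results = extract_all(file1, file2, file3)
for start, end, middle in results:
    print(middle)  # GGGUCCACACCAAGUCGUG
```

If the files are instead meant to be paired line by line, zip(file1, file2, file3) would replace product here.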