Simplest scripting method to merge two text files - Ruby, Python, JavaScript, Java?

java javascript python ruby scripting

I have two text files, one containing HTML and the other containing URL slugs:

File 1 (HTML):

<li><a href="/article/"><button class="showarticle"/><span class="author">Thomas Friedman</span> - <span class="title">The World Is Flat</span></a></li>
<li><a href="/article/"><button class="showarticle"/><span class="author">Michael Dagleish</span> - <span class="title">Scotland In Wartime</span></a></li>
<li><a href="/article/"><button class="showarticle"/><span class="author">Dr. Raymond Kinsella</span> - <span class="title">Progress In Cancer Treatments</span></a></li>
...

File 2 (slugs):

thomas-friedman-the-world-is-flat
michael-dagleish-scotland-in-wartime
dr-raymond-kinsella-progress-in-cancer-treatments
...

I need to merge them so that the slugs from File 2 are inserted into the HTML in File 1, like this:

Output:

<li><a href="/article/thomas-friedman-the-world-is-flat"><button class="showarticle"/><span class="author">Thomas Friedman</span> - <span class="title">The World Is Flat</span></a></li>
<li><a href="/article/michael-dagleish-scotland-in-wartime"><button class="showarticle"/><span class="author">Michael Dagleish</span> - <span class="title">Scotland In Wartime</span></a></li>
<li><a href="/article/dr-raymond-kinsella-progress-in-cancer-treatments"><button class="showarticle"/><span class="author">Dr. Raymond Kinsella</span> - <span class="title">Progress In Cancer Treatments</span></a></li>

What is the best way to do this? Which language is best suited to this task with the least complexity?

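You need the zip function, which is available in most languages. Its purpose is to process two or more arrays in parallel.
In Ruby, it would look like this: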

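# Read both files into arrays of lines (each line keeps its trailing newline)
f1 = File.readlines('file1.txt')
f2 = File.readlines('file2.txt')

File.open('file3.txt','w') do |output_file|

    # Pair line N of file1 with line N of file2
    f1.zip(f2) do |a,b|
        # chomp the slug so its newline does not end up inside the href;
        # to_s guards against nil when file2 has fewer lines than file1
        output_file.puts a.sub('/article/','/article/'+b.to_s.chomp)
    end

end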
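
To see what zip does in isolation, here is a minimal Python sketch on in-memory lists. The sample values and variable names are made up for illustration and are not from the question. One difference is worth knowing: Ruby's Array#zip pads with nil when an argument is shorter than the receiver, whereas Python's zip stops at the shortest input, and itertools.zip_longest gives the padding behaviour.

```python
from itertools import zip_longest

# Hypothetical sample data standing in for the lines of the two files
html_lines = [
    '<a href="/article/">A</a>',
    '<a href="/article/">B</a>',
    '<a href="/article/">C</a>',
]
slugs = ["first-slug", "second-slug"]

# zip pairs items by position and stops at the shortest input
pairs = list(zip(html_lines, slugs))

# zip_longest keeps going and fills the gaps with a placeholder
padded = list(zip_longest(html_lines, slugs, fillvalue=""))

# Insert each slug after the first "/article/" on its paired line
merged = [html.replace("/article/", "/article/" + slug, 1) for html, slug in pairs]
print(merged)
```

With mismatched inputs like these, the plain zip version silently drops the third HTML line, which is why checking the line counts first can be worthwhile.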

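In Ruby, to zip more than two arrays, you can use f1.zip(f2, f3, ...) do |a, b, c, ...|

The simplest way is to use whichever of the listed languages you are most familiar with. Even if it does not produce the neatest solution, you will get the job done with the least (mental) effort.

If you don't know any of them, Perl is a good choice, because this is exactly the kind of job it was designed for. (I am assuming you understand regular expressions...) Judging from some of the other answers, Python is also a good choice.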

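This is easy in any language. Here is some pseudo-Python; I have omitted the lxml bits because I don't have access to them and can't remember the syntax offhand. They are not difficult, though.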

import lxml.etree

with open(...) as htmls, open(...) as slugs, open(...) as output:
    for html, slug in zip(htmls, slugs):
        root = lxml.etree.fromstring(html)
        # do some fiddling with lxml to get the name

        # `name` comes from the omitted lxml step above
        slug = slug.split("-")[len(name.split()):]
        # add in the extra child in lxml

        output.write(lxml.etree.tostring(root, encoding="unicode"))

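Points of interest:

This does not read the whole file at once; it works chunk by chunk (well, line by line, but Python buffers it). That is useful if the files are large, though probably not relevant here.
lxml may be overkill, depending on how strictly formatted the HTML strings are. If they are guaranteed to be identical in shape and well-formed, simple string manipulation may be easier. On the other hand, lxml is very fast and offers more flexibility.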
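
As a minimal sketch of the simple string manipulation route, assuming every HTML line contains the literal text /article/ exactly where the slug belongs: the function name merge_slugs, its parameters, and the marker default are hypothetical names chosen for illustration. It keeps the line-by-line streaming described above and uses only the standard library.

```python
def merge_slugs(html_path, slug_path, out_path, marker="/article/"):
    """Append line N of slug_path to the first marker on line N of html_path."""
    with open(html_path, encoding="utf-8") as htmls, \
         open(slug_path, encoding="utf-8") as slugs, \
         open(out_path, "w", encoding="utf-8") as output:
        # zip over file objects yields one pair of lines at a time,
        # so neither input is ever held in memory as a whole
        for html, slug in zip(htmls, slugs):
            # strip() drops the slug's newline so it cannot land inside the href
            output.write(html.replace(marker, marker + slug.strip(), 1))
```

Calling merge_slugs("file1.txt", "file2.txt", "file3.txt") would write the merged lines to file3.txt, borrowing the file names used in the Ruby answer. Because zip stops at the shorter input, extra lines in either file are silently ignored.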

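// Read both files into arrays, one element per line
$firstFile = file('file1.txt');
$secondFile = file('file2.txt');

$findKey = '/article/';
$output = '';

// Abort if the two files do not have the same number of lines
count($firstFile) == count($secondFile)
    or die('record counts dont match');

for ($i = 0; $i < count($firstFile); $i++)
{
    // Insert the slug (newline trimmed) right after the marker in the HTML line
    $output .= str_replace($findKey, $findKey . trim($secondFile[$i]), $firstFile[$i]);
}

file_put_contents('output.txt', $output);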
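
The record-count check is the part of this answer that the other solutions leave out. A hedged Python sketch of the same idea follows, raising an exception where the PHP version calls die so that the caller decides how to report the problem. The function name merge_checked and its parameters are hypothetical.

```python
def merge_checked(html_lines, slugs, marker="/article/"):
    """Return merged lines, refusing to run if the inputs differ in length."""
    if len(html_lines) != len(slugs):
        # raising lets the caller decide how to report or recover
        raise ValueError(
            f"record counts don't match: {len(html_lines)} vs {len(slugs)}"
        )
    return [
        # insert the slug (whitespace trimmed) after the first marker only
        html.replace(marker, marker + slug.strip(), 1)
        for html, slug in zip(html_lines, slugs)
    ]
```

Like the PHP version, this assumes both files fit in memory, since the lengths have to be known before any output is produced.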

Python is a great language.
Look at these six lines of Python code.
They can merge text files of any size; I just merged 2 text files of 10 GB each.
# Concatenate 1.txt and 2.txt into 3.txt, streaming line by line
with open("E:/temp/3.txt", "wb") as o:  # open for write
    for path in ("E:/temp/1.txt", "E:/temp/2.txt"):
        with open(path, "rb") as f:
            for line in f:
                o.write(line)
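
Note that this appends one file to the other, which is a different operation from the line-by-line merge the question asks for. If plain concatenation is what you want, a sketch using the standard library's shutil.copyfileobj avoids depending on line lengths by copying in fixed-size chunks. The function name concatenate and the chunk_size default are assumptions made for illustration.

```python
import shutil


def concatenate(sources, destination, chunk_size=1024 * 1024):
    """Append each source file to destination, copying in fixed-size chunks."""
    with open(destination, "wb") as out:
        for path in sources:
            with open(path, "rb") as src:
                # copyfileobj reads and writes chunk_size bytes at a time,
                # so memory use stays flat regardless of file size
                shutil.copyfileobj(src, out, chunk_size)
```

For the paths in the answer above, the call would be concatenate(["E:/temp/1.txt", "E:/temp/2.txt"], "E:/temp/3.txt").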

Ruby one-liner:
File.open("joined.txt","w") { |f| f.puts ['file1.txt', 'file2.txt'].map{ |s| IO.read(s) }}

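I would use Perl, but only because I have 10+ years of Perl experience.
Are the files guaranteed to be in the same order?
I would use Perl, but only because Perl is so well suited to this problem. I have 10+ minutes of Perl experience.
@Katrielex: Good question. Yes, the files match line for line.
No, I don't know regular expressions.
@Sven Marnach, of course, I forgot to mention that when I posted :) I have updated the answer.
Great! Thanks, this is beautiful.
You can do f1.zip(f2, f3, ...) in Ruby!
Die die die, excellent exception handling :)
Have you tested this code? It looks rather messy, and using die is quite unnecessary.
Explain why using die() is not recommended.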