Python or Bash: iterating over all words in a text file

I have a text file containing thousands of words, for example:

laban
labrador
labradors
lacey
lachesis
lacy
ladoga
ladonna
lafayette
lafitte
lagos
lagrange
lagrangian
lahore
laius
lajos
lakeisha
lakewood

I want to iterate over every word so that I get:

labanlaban
labanlabrador
labanlabradors
labanlacey
labanlachesis
etc...

In bash I can do the following, but it is very slow:

#!/bin/bash
( cat words.txt | while read word1; do
  cat words.txt | while read word2; do
    echo "$word1$word2" >> doublewords.txt
 done; done )

Is there a faster, more efficient way?

Also, how would I iterate over two different text files in this way?

If you can fit the list into memory:

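import itertools

words_filename = 'words.txt'  # path to the word list

with open(words_filename, 'r') as words_file:
    words = [word.strip() for word in words_file]

# Emit every ordered pair of words, joined without a separator.
for pair in itertools.product(words, repeat=2):
    print(''.join(pair))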

(You could also write a double for loop, but I'm feeling itertools tonight.)
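
The double for loop mentioned above might look like the following minimal sketch. The helper name `concat_pairs` and the short sample list are illustrative only and are not part of the original answer.

```python
import itertools


def concat_pairs(words):
    """Yield every ordered pair of words joined together (hypothetical helper)."""
    for first in words:
        for second in words:
            yield first + second


# Small sample list standing in for the contents of words.txt.
sample = ['laban', 'labrador', 'labradors']
pairs = list(concat_pairs(sample))
print('\n'.join(pairs))
```

The nested loops visit the pairs in the same order as `itertools.product(words, repeat=2)`, so the two forms should be interchangeable here.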

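I suspect the win here is that we avoid re-reading the file over and over; the inner loop in the bash example spawns a cat of the file for every iteration of the outer loop. Also, I think Python tends to execute faster than bash, IIRC.

You could certainly pull off the same trick in bash (read the file into an array, write a double for loop); it is just more painful.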

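You can do this in a pythonic way by creating a tempfile, writing data to it while reading the existing file, and finally removing the original file and moving the new file to the original path:
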
import os
from shutil import move
from tempfile import mkstemp


def data_redundent(source_file_path):
    # mkstemp returns an open file descriptor plus the path of the new file.
    fd, target_file_path = mkstemp()
    # Wrap the descriptor so it is closed when the block ends.
    with os.fdopen(fd, 'w') as target_file:
        with open(source_file_path, 'r') as source_file:
            for line in source_file:
                # Write the word twice on one line: "laban" -> "labanlaban".
                target_file.write(line.replace('\n', '') + line)
    os.remove(source_file_path)
    move(target_file_path, source_file_path)

data_redundent('test_data.txt')
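
The same temp-file pattern can be sketched a little more defensively. This is a variant under stated assumptions, not the answer's own code: it creates the temporary file next to the source with `tempfile.NamedTemporaryFile` and swaps it in with `os.replace`, and the function name `double_each_line` and the demo file contents are hypothetical. Like the function above, it only doubles each line (`laban` becomes `labanlaban`); it does not produce every combination of two words.

```python
import os
import tempfile


def double_each_line(source_path):
    """Rewrite source_path in place so each line holds its word twice."""
    directory = os.path.dirname(os.path.abspath(source_path))
    # Create the temp file in the same directory so os.replace stays on one filesystem.
    with tempfile.NamedTemporaryFile('w', dir=directory, delete=False) as target:
        with open(source_path, 'r') as source:
            for line in source:
                word = line.rstrip('\n')
                target.write(word + word + '\n')
    # Swap the finished file over the original.
    os.replace(target.name, source_path)


# Demo on a throwaway file (hypothetical contents).
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'test_data.txt')
with open(demo_path, 'w') as f:
    f.write('laban\nlabrador\n')
double_each_line(demo_path)
with open(demo_path) as f:
    result = f.read()
print(result)
```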

sed seems to be very efficient at appending text to each line.
I suggest:
#!/bin/bash

for word in $(< words.txt)
do 
    sed "s/$/$word/" words.txt;
done > doublewords.txt
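
To make the output order of the sed loop concrete, here is a rough Python model of it, assuming the words contain no characters that are special to sed (such as `/` or `&`). For each word, sed appends it to every line of the file, so the second half of each result stays fixed while the first half cycles. The set of pairs is the same as with a nested loop, but they are listed in a different order. The function name `sed_style` is made up for this sketch.

```python
def sed_style(words):
    """Model of: for word in words; do sed "s/$/$word/" words.txt; done."""
    out = []
    for suffix in words:      # outer shell loop over each word
        for line in words:    # sed walks every line of the file
            out.append(line + suffix)
    return out


sample = ['laban', 'labrador']
result = sed_style(sample)
print(result)
```

If the order in the question matters (first word fixed, second word cycling), the output would need to be reordered or generated the other way round.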

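You need to understand that neither bash nor Python is good at double for loops: that is why you use a trick (@Thanatos) or a predefined command (sed). I recently ran into a double-loop problem (given a set of 10,000 points in 3D, compute the distances between all pairs), and I solved it successfully with C++ rather than Python or Matlab.

If you have GHC, Cartesian products are a cinch!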
Q1: one file
-- words.hs
import Control.Applicative
main = interact f
    where f = unlines . g . words
          g x = map (++) x <*> x

Then run it with I/O redirection:
./words <words.txt >out

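Bleh, compile?
Want the convenience of a shell script and the performance of a compiled executable? Why not both?
Just wrap the Haskell program you want in a wrapper script that compiles it in /var/tmp and then replaces itself with the resulting executable: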
#!/bin/bash
# wrapper.sh

cd /var/tmp
# Quote the delimiter so the shell does not expand $ or backticks in the Haskell source.
cat > c.hs <<'CODE'
-- replace this comment with haskell code
CODE
ghc c.hs >/dev/null
cd - >/dev/null
exec /var/tmp/c "$@"

I'm not sure how efficient this is, but a very simple way is to use the Unix tool designed specifically for this sort of thing:
paste -d'\0' <file> <file>

The -d option specifies the delimiter to use between the joined parts, and \0 denotes the null character (i.e. no delimiter at all).
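
In Python terms, paste behaves like `zip` rather than `itertools.product`: it joins line 1 of the first file with line 1 of the second, line 2 with line 2, and so on. A small sketch with an in-memory sample list follows; the function name `paste_no_delim` is hypothetical, and `zip_longest` is used on the assumption that paste treats the missing lines of a shorter file as empty.

```python
from itertools import zip_longest


def paste_no_delim(lines_a, lines_b):
    """Join corresponding lines with no separator, like paste with an empty delimiter."""
    # Keep going until the longer input ends, treating missing lines as empty.
    return [a + b for a, b in zip_longest(lines_a, lines_b, fillvalue='')]


words = ['laban', 'labrador', 'lacey']
pasted = paste_no_delim(words, words)
print(pasted)
```

With the same file given twice, this yields only doubled words, one per input line, rather than every combination of two words.
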
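Can the list fit into memory?

Thanks. That works well. What if I want to iterate over two different text files rather than the same one?

Load both files with the with and … lines, and use itertools.product(words1, words2) instead of itertools.product(words, repeat=2). (See "[")

Thanks for the help, but I can't get this to work. I tried loading both files in the same "with", but it didn't go well (e.g. with open('a', 'w') as a, open('b', 'w') as b:). I also tried two separate "with" blocks, with no luck either.

If you want to open both files for reading, you need 'r' as the second argument to open ('w' is "write", and will erase the file!)

Thanks. I'll try your solution. Also, how do I iterate over two different text files rather than the same text file against itself?

The two occurrences of words.txt are independent. Just replace the first with words1.txt and the second with words2.txt.

I think I'm doing something wrong here. Is this correct? #!/bin/bash for word in $(words-numbers.txt

@user3552978 You mean to change $( to $(<. Apart from that, it looks correct (except for the missing newlines, which I assume comes from it being in a comment).

Awesome! What about two different text files?

@user3552978 I have now extended the Haskell script to handle two different text files and updated the results.

This only outputs doubled words like "labanlaban", not combinations like "labanlacey". (It just doubles each line; the OP seems to want the Cartesian product.)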
Q2: two files
-- words2.hs
import Control.Applicative
import Control.Monad
import System.Environment
main = do
    ws <- mapM ((liftM words) . readFile) =<< getArgs
    putStrLn $ unlines $ g ws
    where g (x:y:_) = map (++) x <*> y

./words2 words1.txt words2.txt > out

$ time ./words2 words1.txt words2.txt >out
3.75s user 0.20s system 98% cpu 4.026 total

$ time ./wrapper.sh words1.txt words2.txt > words2
4.12s user 0.26s system 97% cpu 4.485 total

$ time ./thanatos.py > out
4.93s user 0.11s system 98% cpu 5.124 total

$ time ./styko.sh
7.91s user 0.96s system 74% cpu 11.883 total

$ time ./user3552978.sh
57.16s user 29.17s system 93% cpu 1:31.97 total
