Using Python to find the most common text line length in a file with lines of various sizes

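So I'm a bit stuck thinking about this:

I have a file containing many lines of characters, one after another. They are not in paragraphs but in the following form: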

xxxxxxx
xxx
xxxxxxxxxxxx
xxx
xxxxxxx
xxxx
xxxxxxxx
xxx
xxxxxx
xxxx
xxx

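The idea is to find the number of lines that have the most common size (character count). In the example above, the answer is 4 lines (there are four lines of length 3).

I'm trying to do this in Python because the rest of my code is written in Python.

Any help would be greatly appreciated.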

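Build a list of line lengths, then take the maximum number of occurrences:

with open('file.txt') as data:
    length = [len(i.rstrip('\n')) for i in data]  # line length, without the trailing newline
    common = max(length.count(i) for i in length)  # how many lines share the most common length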
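Why strip the newline first: every line read from a file keeps its trailing `"\n"` except, possibly, the last one. The minimal sketch below compares the count with and without stripping. It assumes the question's sample is stored with no newline after its final line, and uses `io.StringIO` as a stand-in for the real file; `mode_count` is a hypothetical name.

```python
import io

# Assumption: the question's sample, with no newline after the final line.
SAMPLE = (
    "xxxxxxx\nxxx\nxxxxxxxxxxxx\nxxx\nxxxxxxx\nxxxx\n"
    "xxxxxxxx\nxxx\nxxxxxx\nxxxx\nxxx"
)


def mode_count(lines, strip_newline):
    """Return how many lines share the most common length."""
    if strip_newline:
        lengths = [len(line.rstrip("\n")) for line in lines]
    else:
        lengths = [len(line) for line in lines]
    return max(lengths.count(n) for n in lengths)


raw = mode_count(io.StringIO(SAMPLE), strip_newline=False)
clean = mode_count(io.StringIO(SAMPLE), strip_newline=True)
print(raw, clean)  # 3 4
```

Without stripping, the final `xxx` is measured as a different length from the other three, so the result drops from 4 to 3. Note also that `length.count(i)` rescans the whole list for every line, which becomes slow on large files; the `Counter`-based answers avoid that.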

You can use a Counter and then the Counter's most_common method:
from collections import Counter
with open("a.txt") as f:
    c = Counter(len(line.rstrip("\n")) for line in f)
print(c.most_common(1))

Result:
[(3, 4)]

This means length 3 is the most common, occurring 4 times.

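`most_common(1)` returns a single entry even when several lengths tie for first place (elements with equal counts are ordered by first occurrence). If ties matter, a minimal sketch that reports every tied length could look like this; the function name `most_common_lengths` is hypothetical.

```python
from collections import Counter


def most_common_lengths(lines):
    """Return (count, sorted lengths that reach that count) for an iterable of lines."""
    counts = Counter(len(line.rstrip("\n")) for line in lines)
    if not counts:
        return 0, []  # empty input: no lines, no lengths
    top = max(counts.values())
    return top, sorted(length for length, n in counts.items() if n == top)


# Two lengths (2 and 3) tie with two lines each.
print(most_common_lengths(["ab\n", "cd\n", "xyz\n", "uvw\n", "q"]))  # (2, [2, 3])
```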
Here's how to get the most common length:
with open('file.txt', 'rb') as fin:
    lst = [len(line.strip()) for line in fin]

print(max(set(lst), key=lst.count))  # the most common length itself (3 for the sample)
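Note that `max(set(lst), key=lst.count)` yields the most common *length* (3 for the sample), while the question asks how many lines have it (4), which takes one more `count` call. A minimal sketch returning both, swapping in the standard library's `statistics.mode` for the `max(set(...))` idiom and assuming Python 3.8 or later (where `mode` returns the first mode encountered instead of raising on ties); `mode_and_count` is a hypothetical name.

```python
import statistics


def mode_and_count(lengths):
    """Return (most common length, number of lines with that length)."""
    mode = statistics.mode(lengths)  # Python 3.8+: first mode encountered on ties
    return mode, lengths.count(mode)


# Line lengths of the question's sample, newline already stripped.
sample_lengths = [7, 3, 12, 3, 7, 4, 8, 3, 6, 4, 3]
print(mode_and_count(sample_lengths))  # (3, 4)
```

`statistics.mode` raises `StatisticsError` on an empty list, so an empty file needs a separate check.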

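OK, starting with reading the lines, there are a few approaches you can take:
with open(path) as my_file:  # path: location of the file to read
    for line in my_file:
        pass  # do something with 'line'

Or perhaps:
with open(path) as my_file:
    lines = my_file.readlines()
for i in range(len(lines)):
    line = lines[i]  # do something with 'line'

Then you need to store the length of each line somehow:
lengths = []  # create once, before the loop
lengths.append(len(line.rstrip("\n")))  # inside the loop, once per line

Now you just need to find the most frequent length:
frequencies = {}
for length in lengths:
    if length in frequencies:  # check whether we have seen this length before
        frequencies[length] += 1  # increment its count
    else:
        frequencies[length] = 1  # first line with this length

Finding the maximum should be simple, but just in case:
maximum = 0
for i in frequencies:
    if frequencies[i] > maximum:
        maximum = frequencies[i]
# after this completes, no value in frequencies is greater than maximum,
# so maximum is the number of lines that share the most common length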
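Put together, the steps above can be sketched as a single function. This version also remembers *which* length reached the maximum, and uses `dict.get` in place of the `if`/`else` tally; `most_common_length` is a hypothetical name.

```python
def most_common_length(lines):
    """Return (length, count) for the most frequent line length, or (None, 0) if there are no lines."""
    # Store the length of every line, ignoring the trailing newline.
    lengths = []
    for line in lines:
        lengths.append(len(line.rstrip("\n")))

    # Tally how often each length occurs.
    frequencies = {}
    for length in lengths:
        frequencies[length] = frequencies.get(length, 0) + 1

    # Find the largest tally and remember which length produced it.
    best_length, maximum = None, 0
    for length, count in frequencies.items():
        if count > maximum:
            best_length, maximum = length, count
    return best_length, maximum


sample = ["xxxxxxx", "xxx", "xxxxxxxxxxxx", "xxx", "xxxxxxx", "xxxx",
          "xxxxxxxx", "xxx", "xxxxxx", "xxxx", "xxx"]
print(most_common_length(sample))  # (3, 4)
```

With a real file, pass the open file object itself, e.g. `most_common_length(my_file)`. Because the comparison is a strict `>`, a tie keeps whichever length was tallied first.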

The `collections` module has a dictionary subclass called `Counter` that can be used to keep track of the length of every line encountered. This makes the problem very easy to solve. If the file isn't very large, you can use it like this:
from collections import Counter

def most_common_line_len(filename):
    with open(filename) as f:
        return Counter(map(len, f.read().splitlines())).most_common(1)[0][0]

print(most_common_line_len('somefile.txt'))  # --> 3 for your sample data

Otherwise, you can avoid reading it all into memory at once by passing a generator expression over the file to `map()`:
def most_common_line_len(filename):
    with open(filename) as f:
        return Counter(map(lambda line: len(line.rstrip()),
                           (line for line in f))).most_common(1)[0][0]
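If the file is opened elsewhere and a function already runs on each line (the situation the asker describes in the comments), the same `Counter` can be updated one line at a time instead of reading the file again. A minimal sketch; the class `LineLengthTracker` and its methods are hypothetical names.

```python
from collections import Counter


class LineLengthTracker:
    """Hypothetical helper: record lines one at a time, then ask for the most common length."""

    def __init__(self):
        self.counts = Counter()  # maps line length -> number of lines seen with it

    def add(self, line):
        # Drop the line terminator so a final line without "\n" is measured the same way.
        self.counts[len(line.rstrip("\n"))] += 1

    def result(self):
        """Return (length, number_of_lines) for the most common length, or None if no lines."""
        top = self.counts.most_common(1)
        return top[0] if top else None


# Usage: call add() from whatever code already loops over the file's lines.
tracker = LineLengthTracker()
for line in ["xxxxxxx\n", "xxx\n", "xxxxxxxxxxxx\n", "xxx\n", "xxxxxxx\n", "xxxx\n",
             "xxxxxxxx\n", "xxx\n", "xxxxxx\n", "xxxx\n", "xxx"]:
    tracker.add(line)
print(tracker.result())  # (3, 4)
```

Only one entry per distinct length is kept in memory, so this stays small even for multi-gigabyte files.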

What's the problem with your code? Please add it to your question.

Sorry, I don't have any real code for this part yet. My main problem is that the file is opened elsewhere and a function runs on each line of it; I then have to store the line lengths somehow and find the most common size.

To be honest, my files are huge. They come from biological datasets, so the smallest file is about 4 GB…

Then use the second version, or you could make the function smart and have it decide which technique to use by checking the file size first. That can be done with the `st_size` attribute of the `stat_result` that the … function returns.

Thanks, I'll just modify the second part and use it for my case; I'm sure the files will be large.

Actually, if your files are that big, it may not be worth the time it takes to read the whole thing once, whether or not it is all kept in memory, just to determine the most common line length. For serious work you might want to consider the Python Data Analysis Library, which I've heard is very good and fast, and of course there's also ….
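One way to follow the suggestion of checking the file size first is sketched below. It assumes `os.stat()` supplies the `stat_result`, and the 100 MiB threshold `SMALL_FILE_LIMIT` is an arbitrary, hypothetical value; it also opens `filename` rather than a hard-coded name.

```python
import os
import tempfile
from collections import Counter

# Assumption: an arbitrary cut-off; tune it to the memory actually available.
SMALL_FILE_LIMIT = 100 * 1024 * 1024  # 100 MiB


def most_common_line_len(filename, limit=SMALL_FILE_LIMIT):
    """Return the most common line length, picking a reading strategy by file size."""
    size = os.stat(filename).st_size  # file size in bytes
    with open(filename) as f:
        if size <= limit:
            # Small file: read it whole and split it into lines.
            counts = Counter(map(len, f.read().splitlines()))
        else:
            # Large file: iterate lazily so only one line is held in memory at a time.
            counts = Counter(len(line.rstrip("\n")) for line in f)
    return counts.most_common(1)[0][0] if counts else None


# Usage: write the question's sample to a temporary file and try both strategies.
sample = "xxxxxxx\nxxx\nxxxxxxxxxxxx\nxxx\nxxxxxxx\nxxxx\nxxxxxxxx\nxxx\nxxxxxx\nxxxx\nxxx\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
try:
    whole = most_common_line_len(tmp.name)              # small-file path
    streamed = most_common_line_len(tmp.name, limit=0)  # forces the streaming path
finally:
    os.remove(tmp.name)
print(whole, streamed)  # 3 3
```

Even the streaming branch still reads every byte of the file once, which is the cost the last comment warns about.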