How can I get the last line of a text file as an index, similar to len(array) - 1 for an array in Python?

python algorithm search

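I have a very large file of hashes and a single hash that I am given. I want to compare the given hash against the file to see whether it is in there. I chose binary search for this. My current problem is finding the correct index for the rightmost element.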

def binarySearch (l, r, x):

    # Check base case
    if r >= l:

        mid = l + (r - l)/2

        # If element is present at the middle itself
        if getLineFromFile(mid) == x:
            return mid

        # If element is smaller than mid, then it
        # can only be present in left subarray
        elif getLineFromFile(mid) > x:
            return binarySearch(l, mid-1, x)

        # Else the element can only be present
        # in right subarray
        else:
            return binarySearch(mid + 1, r, x)

    else:
        # Element is not present in the array
        return -1

x = '0000000A0E3B9F25FF41DE4B5AC238C2D545C7A8:15'

def getLineFromFile(lineNumber):
    with open('testfile.txt') as f:
        for i, line in enumerate(f):
            if i == lineNumber:
                return line
        else:
            print('Not 7 lines in file')
            line = None

# get last Element of List
def tail():
    for line in open('pwned.txt', 'r'):
        pass
    else:
        print line

ausgabetail = tail()
#print ausgabetail
result = binarySearch( 0, ausgabetail, x)
if result != -1:
    print "Element is present at index % d" % result
else:
    print "Element is not present in array"

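My problem now is getting the correct index for the right-hand side of the binary search. I pass (l, r, x) to the function. The left side starts at 0. The right side should be the end of the file, that is, the last line. I tried to get it, but it did not work. I tried to obtain it with the function tail(), but when I print r while testing, I get the value None. Do you have any other ideas?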
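The value None is explained by tail() itself: it prints the last line but never returns anything, and even if it did return the line, binarySearch needs a line number for r, not the line's content. A minimal sketch of one way to get that number, assuming Python 3 (last_line_index is a hypothetical helper name, not part of the original code):

```python
def last_line_index(fname):
    """Return the zero-based index of the last line, or -1 for an empty file."""
    last = -1
    with open(fname) as f:
        # enumerate() numbers the lines from 0; after the loop, `last`
        # holds the index of the final line that was read.
        for last, _ in enumerate(f):
            pass
    return last
```

With this, the call would become something like binarySearch(0, last_line_index('pwned.txt'), x). Note that counting the lines still reads the whole file once.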

The example requested in the comments above:

def checkForHash(h, fname):
    with open(fname) as f:
        for i, line in enumerate(f):
            if h == line:
                return i
    return -1

x = '0000000A0E3B9F25FF41DE4B5AC238C2D545C7A8:15'
checkForHash (x, 'testfile.txt')
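One caveat with the exact comparison h == line: lines read from a file keep their trailing newline, so the test only succeeds if h ends with a newline too. The following is a hedged variant, assuming Python 3 and one entry per line (check_for_hash is a hypothetical name used for illustration), that strips the line ending before comparing:

```python
def check_for_hash(h, fname):
    """Return the zero-based index of the line equal to h, or -1 if absent."""
    with open(fname) as f:
        for i, line in enumerate(f):
            # Drop the trailing line ending so the whole line is compared.
            if h == line.rstrip('\r\n'):
                return i
    return -1
```

Because the comparison uses == on the stripped line, a substring match elsewhere in a line does not count as a hit.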

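I don't quite follow your algorithm. 1. Are the hashes in that file sorted? (Binary search requires it.) 2. You cannot access an arbitrary line of a text file, which makes it impossible to implement binary search properly. You worked around this by creating getLineFromFile(), but that function searches for the right line over and over again. So from my point of view, simply looping over the file and checking each line for the hash would be more efficient...

@SpghttCd Yes, the file is sorted by hash. It is also a huge txt file (11 GB). I think simply looping over the file would take a long time? Could you give me an example of what you have in mind?

Thanks for the example. How can I make the string be compared as a whole rather than in part? For example, with x=12345 it also returns a result if the substring 12345 appears somewhere. Also, the performance is not the best, but I don't know any other way.

Edited to if h == line: so that it requires equality. Of course, performance won't be great when you search for something in an 11 GB file... The problem here is IO: if the data is not in memory, you cannot access an arbitrary element n of the list, but that is what binary search requires in addition to a sorted list. In your algorithm you very soon want to access the last element (see your own title), and that is where your getLineFromFile(lineNumber) function alone needs as much time to return that one line as the simple approach above, which iterates over the file until it finds the hash, needs for the entire search.

Thank you for the great reply :). I really appreciate it. Now I have one last question. My text file looks like this:

6E252D5ED04B0A3D34CC5EEE0DB42FB1C7C0FEBA:1
5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8:3730471

Now I only want everything before the colon (:). I know that with if h == line[:-4]: the last 4 characters are removed, but the number after the colon varies in length. So I need something generic. Do you have any ideas?

Good idea, the performance is the same as when I cut off the part after the marker with [:-X]. You helped me a lot. Thank you very much! I have marked your answer as the solution :).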
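The comments are right that a text file offers no random access by line number. It does, however, offer random access by byte offset through seek(). One possible workaround, offered as a sketch and not as the approach taken in the answer, is to bisect over byte offsets instead of line numbers and to compare only the part before the colon using partition(). The sketch assumes Python 3, ASCII lines of the form HASH:COUNT, and a file sorted in ascending byte order by hash; find_hash is a hypothetical name.

```python
import os


def find_hash(fname, target):
    """Binary-search a hash-sorted file of 'HASH:COUNT' lines by byte offset.

    Returns the matching line without its line ending, or None if absent.
    """
    # Compare only the part before the colon, whatever follows it.
    key = target.partition(':')[0].encode('ascii')
    with open(fname, 'rb') as f:
        lo, hi = 0, os.path.getsize(fname)
        while lo < hi:
            mid = (lo + hi) // 2
            if mid == 0:
                f.seek(0)
            else:
                # Move to the first line that starts at or after mid.
                f.seek(mid - 1)
                f.readline()
            pos = f.tell()
            line = f.readline()
            if pos < hi and line.partition(b':')[0] < key:
                lo = pos + len(line)   # target can only start after this line
            else:
                hi = mid               # target starts before mid, if at all
        # lo is now the start of the first line whose hash is >= key.
        f.seek(lo)
        line = f.readline().rstrip(b'\r\n')
        if line and line.partition(b':')[0] == key:
            return line.decode('ascii')
    return None
```

Each step reads at most two lines, so the number of reads grows with the logarithm of the file size instead of with the number of lines. If the file is not sorted by hash, the result is meaningless and the linear scan remains the safe choice.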