Python: trying to search the first column of a CSV and, if len > 30, put the rest in another file on the same row
The title isn't long enough to explain this, so here it is: I have a CSV file that looks like this:
long string with some special characters , number, string, number
long string with some special characters , number, string, number
long string with some special characters , number, string, number
long string with some special characters , number, string, number
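I want to go through the first column and, if the length of the string is greater than 20, do the following:
line 20: long string with som,e special characters
Split the string, modify the first CSV with the first part of the string, then create a new CSV and add the other part on the same line number, leaving the remaining lines blank.
What I have right now: the code below doesn't do anything yet; I'm just trying to explain it to myself and figure out how to write the new file with splitString.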
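The splitting step described above can be sketched in a few lines. This is a minimal sketch, assuming each row is a list of fields as returned by csv.reader and a split length of 20; split_first_field is a hypothetical helper, not part of the original code.

```python
def split_first_field(row, n=20):
    """Shorten row[0] to its first n characters (hypothetical helper).

    Returns (new_row, rest): new_row is a copy of row whose first field holds
    only the first piece, and rest lists the remaining n-character pieces
    (empty when the field is n characters or shorter).
    """
    text = row[0]
    # Consecutive slices of n characters; `or ['']` keeps an empty field as ''.
    pieces = [text[i:i + n] for i in range(0, len(text), n)] or ['']
    new_row = [pieces[0]] + row[1:]
    return new_row, pieces[1:]


row = ["long string with some special characters", "1", "abc", "2"]
new_row, rest = split_first_field(row)
print(new_row)  # ['long string with som', '1', 'abc', '2']
print(rest)     # ['e special characters']
```

The first piece goes back into the original row; each entry of rest would go to the same line number of a new CSV.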
import csv

# fileName         = file name
# maxCollumnLength = number of rows in the whole set
# lineNum          = line number of a string that is longer than 20
# splitString      = second part of the split string, to be written to another file
def newopenfile(fileName, maxCollumnLength, lineNum, splitString):
    with open(fileName, 'w', encoding="utf8", newline='') as nf:
        writer = csv.writer(nf)
        for i in range(maxCollumnLength):
            # write a blank row until reaching lineNum of a string that's longer than 20,
            # then write that part of the string to the csv
            writer.writerow([splitString] if i == lineNum else [''])
This goes through the first column and checks the length:
import csv

fileName = 'uskrs.csv'
firstColList = []  # an empty list to store the first column
splitString = []
with open(fileName, 'r', encoding="utf8", newline='') as rf:
    reader = csv.reader(rf, delimiter=',')
    for i, row in enumerate(reader):  # i = line number (0-based)
        if len(row[0]) > 20:
            # split row and pass the other end of the row to
            # newopenfile(fileName, <number of rows>, i, splitString)
            pass
        # print(row[0])  # for debugging
        firstColList.append(row[0])
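From this point on, I'm stuck on how to actually change the strings in the CSV and how to split them.
The strings can also be longer than 60 characters, so they may need to be split into more than two parts and stored in more than two CSVs.
I'm not good at explaining this, so if you have any questions, please ask.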
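Because a value can be longer than 60 characters, the number of extra CSVs depends on the longest string in the column. A minimal sketch, assuming the first-column values are already collected in a list (as firstColList is above) and a split length of 20; extra_columns is a hypothetical helper. Each inner list it returns is the first column of one new CSV, with '' on the lines whose string is too short to have that piece.

```python
import math


def extra_columns(values, n=20):
    """Spread the overflow of each value across as many new columns as needed.

    Column k (0-based) holds characters [(k + 1) * n, (k + 2) * n) of every
    value; slicing past the end of a short value simply gives ''.
    """
    longest = max((len(v) for v in values), default=0)
    num_extra = max(math.ceil(longest / n) - 1, 0)  # pieces beyond the first one
    return [[v[(k + 1) * n:(k + 2) * n] for v in values] for k in range(num_extra)]


cols = extra_columns(["a" * 61, "short", "b" * 25])
print(len(cols))                             # 3 (61 chars -> 4 pieces -> 3 extra CSVs)
print(cols[0] == ["a" * 20, "", "b" * 5])    # True
```

Writing cols[k] to its own file then gives one CSV per extra piece, all aligned line by line with the original.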
OK, I managed to split the first column if its length is greater than 20, and replace the first column with the first 20 characters:
import csv


def checkLength(column, readFile, writeFile, maxLen):
    counter = 0
    i = 0
    idxSplitItems = []
    final = []
    newSplits = 0
    with open(readFile, 'r', encoding="utf8", newline='') as f:
        reader = csv.reader(f)
        your_list = list(reader)
    final = your_list  # same list object, not a copy: changing final changes your_list
    for sublist in your_list:
        # del sublist[-1]  - remove last invisible element
        i += 1
        data = removeUnwanted(sublist[column])
        print(data)
        if len(data) > maxLen:
            counter += 1  # Number of large
            idxSplitItems.append(split_bylen(i, data, maxLen))
            if len(idxSplitItems) > newSplits:
                newSplits = len(idxSplitItems)
            # index 0 is the row number inserted by split_bylen, so [1] is the first piece
            final[i-1][column] = split_bylen(i, data, maxLen)[1]
            final[i-1][column] = removeUnwanted(final[i-1][column])
        print("After split data: " + data)
        print("After split final: " + final[i-1][column])
    writeSplitToCSV(writeFile, final)
    checkCols(final, 6)
    return final, idxSplitItems


def removeUnwanted(data):
    data = data.replace(',', ' ')
    return data


def split_bylen(index, item, maxLen):
    clean = removeUnwanted(item)
    splitList = [clean[ind:ind+maxLen] for ind in range(0, len(item), maxLen)]
    splitList.insert(0, index)
    return splitList


def writeSplitToCSV(writeFile, data):
    with open(writeFile, 'w', encoding="utf8", newline='') as f:
        writer = csv.writer(f)
        writer.writerows(data)


def checkCols(data, columns):
    for sublist in data:
        if len(sublist)-1 != columns:
            # str() is needed: concatenating a str and a list raises TypeError
            print("[X] This row doesn't have the same amount of columns as others: " + str(sublist))
        else:
            print("All okay")


# len(data)  # how many split items
# print(your_list[0][0])
# print("Number of large: ", counter)
final, idxSplitItems = checkLength(0, 'test.csv', 'final.csv', 30)
print("------------------------")
print(idxSplitItems)
print("-------------------------")
print(final)
Now I have a problem with this part of the code:
print("After split data: "+ data)
print("After split final: "+ final[i-1][column])
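This is to check whether removing the comma worked. With
"BUTKOVIĆ VESNA, DIPL.IUR."
data returns
BUTKOVIĆ VESNA
but final returns
BUTKOVIĆ VESNA, DIPL.IUR
Why does final still contain the "," when it's already gone from data? It must be something split_bylen() does that causes this.
Dictionaries are fun.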
To overwrite the original CSV, you have to use DictReader and DictWriter. I kept your reading method just for clarity.
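import csv

fileName = 'uskrs.csv'  # assumed input file, as in the question
maxLen = 20             # assumed split length
writecsvs = {}  # store each line of each new csv
# e.g. {'csv1': [[row0_split1, row0_num, row0_str, row0_num], [row1_split1, row1_num, row1_str, row1_num], ...],
#       'csv2': [[row0_split2, row0_num, row0_str, row0_num], [row1_split2, row1_num, row1_str, row1_num], ...],
#       ...}
with open(fileName, mode='r', encoding="utf-8-sig", newline='') as rf:
    rows = list(csv.reader(rf, delimiter=','))

# check size & split the first column of every row (assumes no blank lines)
splits = [[row[0][i:i + maxLen] for i in range(0, len(row[0]), maxLen)] or [''] for row in rows]
# decide number of new csvs: one per piece of the longest string
numCsvs = max((len(s) for s in splits), default=1)
# store new content in writecsvs dict; rows without a k-th piece get ''
for k in range(numCsvs):
    writecsvs['csv%d' % (k + 1)] = [[s[k] if k < len(s) else ''] + row[1:]
                                    for s, row in zip(splits, rows)]

# overwrite original csv: its first column keeps only the first piece
with open(fileName, mode='w', encoding="utf-8", newline='') as wf:
    csv.writer(wf).writerows(writecsvs['csv1'])

# Loop over each remaining csv in writecsvs; the keys are used as filenames
for name, writelines in writecsvs.items():
    if name != 'csv1':
        with open(name + '.csv', mode='w', encoding="utf-8", newline='') as out_file:
            csv.writer(out_file).writerows(writelines)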
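The answer above says DictReader and DictWriter are needed to overwrite the original file. One way that pattern could look, as a minimal sketch: the file has no header row, so the column names in FIELDS are made up for illustration, and shorten_in_place is a hypothetical helper. The file is read completely and closed before it is reopened for writing; plain csv.reader and csv.writer would work the same way, since DictReader only adds access by column name.

```python
import csv

FIELDS = ["text", "num1", "label", "num2"]  # hypothetical names; the file has no header


def shorten_in_place(path, n=20):
    """Overwrite path so every 'text' value keeps only its first n characters.

    Assumes exactly four fields per row. Returns the cut-off remainders
    (one per row, '' if nothing was cut) so they can go to other files.
    """
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f, fieldnames=FIELDS))  # read fully before reopening

    remainders = []
    for row in rows:
        remainders.append(row["text"][n:])
        row["text"] = row["text"][:n]

    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writerows(rows)  # no writeheader(): keep the original headerless layout
    return remainders
```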
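Hope this helps.
I've worked on it and found a workaround that does the job. What I did:
1. I can search the first column of the CSV for strings longer than 30 characters.
2. I can split a long string every 30 characters and store the pieces in a list.
3. I can write a new CSV with the correct information but shorter strings.
The problem: it doesn't remove the "," inside the string (not the "," the CSV needs as a separator), but I created a function that removes commas if there are any.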
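On the comma problem: csv.writer already quotes any field that contains the delimiter (its default QUOTE_MINIMAL behaviour), so a comma inside the first column does not break the file, and csv.reader gives the field back intact. If the commas should still go, the replacement has to run on every first-column value, not only inside the len(data) > maxLen branch. In checkLength above, a short value such as "BUTKOVIĆ VESNA, DIPL.IUR." (25 characters, under the limit of 30) never has removeUnwanted applied to final, which would explain why final still shows the comma. A small sketch of both points using an in-memory file; clean_first_column is a hypothetical helper.

```python
import csv
import io


def clean_first_column(rows):
    """Return new rows where commas in the first field become spaces.

    Applied to every row regardless of length, so short values are cleaned too.
    """
    return [[row[0].replace(",", " ")] + row[1:] for row in rows]


rows = [["BUTKOVIĆ VESNA, DIPL.IUR.", "1"]]

# 1) Without cleaning: the writer quotes the field and reading it back round-trips.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
print(repr(buf.getvalue()))   # '"BUTKOVIĆ VESNA, DIPL.IUR.",1\r\n'
buf.seek(0)
print(list(csv.reader(buf)))  # [['BUTKOVIĆ VESNA, DIPL.IUR.', '1']]

# 2) With cleaning applied to every row, long or short.
print(clean_first_column(rows))  # [['BUTKOVIĆ VESNA  DIPL.IUR.', '1']]
```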