Python: how to split binary data into three columns in a DataFrame
I have been working on this for a while and it makes no sense to me. I have Twitter data in the following form:
column Lines
"585978391360221184|Thu Apr 09 01:31:50 +0000 2015|Breast cancer risk test devised
http://bbc.in/1CimpJF"
"585947808772960257|Wed Apr 08 23:30:18 +0000 2015|GP workload harming care - BMA poll
http://bbc.in/1ChTBRv"
"585947807816650752|Wed Apr 08 23:30:18 +0000 2015|Short people's 'heart risk greater'
http://bbc.in/1ChTANp"
"585866060991078401|Wed Apr 08 18:05:28 +0000 2015|New approach against HIV 'promising'
http://bbc.in/1E6jAjt"
"585794106170839041|Wed Apr 08 13:19:33 +0000 2015|Coalition 'undermined NHS' - doctors
http://bbc.in/1CnLwK7"
'586266687017771008|Thu Apr 09 20:37:25 +0000 2015|Sabra hummus recalled in U.S.
http://www.cbc.ca/news/health/sabra-hummus-recalled-in-u-s-1.3026865?cmp=rss'
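I need to split the data into three columns in the DataFrame using the | character.
I read the data from text files and convert it to a DataFrame. The column is named Lines.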
import pandas as pd

# all_files is the list of input file paths, built earlier (not shown)
data = []
for f in all_files:
    # skip the script files themselves
    if f == 'Health-Tweets.py' or f == 'Heath-Tweets.py':
        continue
    else:
        with open(f, "rb") as myfile:
            data1 = myfile.readlines()
        if not data1:
            continue
        print(f)
        data.append(data1)
# flattening the list data
data2 = [j for sub in data for j in sub]
# transforming the data to a dataframe
df = pd.DataFrame(data2)
# renaming the column
df.columns = ['Lines']
# decode each raw bytes line, falling back to windows-1252 if utf-8 fails
for i in range(df.shape[0]):
    try:
        df.loc[i, 'Lines'] = df.loc[i, 'Lines'].decode('utf-8')
    except UnicodeDecodeError:
        df.loc[i, 'Lines'] = df.loc[i, 'Lines'].decode('windows-1252')
# split on '|' and strip whitespace from every resulting piece
df[['binary','date','data']] = df['Lines'].str.split('|', expand=True).apply(lambda x: x.str.strip())
I get an error:
ValueError: Columns must be same length as key
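You may be hitting a case where the split does not give you exactly 3 columns:
'binary', 'date', 'data'
With expand=True the result has as many columns as the row with the most pieces, so if that number is not 3, assigning it to three column names fails. In some rows, the split leaves the binary column or the date column with no data.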
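A minimal sketch of how this could be checked and worked around, assuming the cause is rows whose number of | delimiters is not exactly two. The sample rows below are hypothetical stand-ins for the decoded Lines column; the real files may contain other malformed cases.

```python
import pandas as pd

# Hypothetical sample: one well-formed line, one with an extra '|' inside
# the tweet text, and one with no '|' at all.
df = pd.DataFrame({
    "Lines": [
        "585978391360221184|Thu Apr 09 01:31:50 +0000 2015|Breast cancer risk test devised http://bbc.in/1CimpJF",
        "1|Wed Apr 08 23:30:18 +0000 2015|left | right",
        "no delimiter here",
    ]
})

# Diagnose: count delimiters per row; well-formed rows have exactly 2.
pipe_counts = df["Lines"].str.count(r"\|")
bad_rows = df[pipe_counts != 2]

# Split at most twice so any extra '|' stays inside the tweet text.
parts = df["Lines"].str.split("|", n=2, expand=True)
parts = parts.apply(lambda col: col.str.strip())
# Pad to exactly three columns in case no row contained two delimiters.
parts = parts.reindex(columns=range(3))
parts.columns = ["binary", "date", "data"]

# Attach the new columns; rows with missing pieces get missing values.
df = df.join(parts)
```

Inspecting bad_rows shows which input lines are malformed, and limiting the split with n=2 keeps the result at three columns, so the column assignment no longer depends on every line being clean.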