ParserError in Python Pandas concatenation


I scraped some tweets with the twint module into separate csv files. However, I need to merge them into a single dataframe and csv file.

dataframes = [jan, feb, mar]
total = pd.concat(dataframes).reset_index(drop=True)
After the concatenation, for time series visualization, I tried to convert the objects in the date column from strings to datetime objects:

total['date'] = pd.to_datetime(total['date'])
However, I am facing this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
   2084         try:
-> 2085             values, tz_parsed = conversion.datetime_to_datetime64(data)
   2086             # If tzaware, these values represent unix timestamps, so we

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.datetime_to_datetime64()

TypeError: Unrecognized value type: <class 'str'>

During handling of the above exception, another exception occurred:

ParserError                               Traceback (most recent call last)
<ipython-input-12-1a4afbc77b66> in <module>
----> 1 total['date'] = pd.to_datetime(total['date'])

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/tools/datetimes.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, origin, cache)
    799                 result = result.tz_localize(tz)
    800     elif isinstance(arg, ABCSeries):
--> 801         cache_array = _maybe_cache(arg, format, cache, convert_listlike)
    802         if not cache_array.empty:
    803             result = arg.map(cache_array)

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/tools/datetimes.py in _maybe_cache(arg, format, cache, convert_listlike)
    176         unique_dates = unique(arg)
    177         if len(unique_dates) < len(arg):
--> 178             cache_dates = convert_listlike(unique_dates, format)
    179             cache_array = Series(cache_dates, index=unique_dates)
    180     return cache_array

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/tools/datetimes.py in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
    463         assert format is None or infer_datetime_format
    464         utc = tz == "utc"
--> 465         result, tz_parsed = objects_to_datetime64ns(
    466             arg,
    467             dayfirst=dayfirst,

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
   2088             return values.view("i8"), tz_parsed
   2089         except (ValueError, TypeError):
-> 2090             raise e
   2091 
   2092     if tz_parsed is not None:

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
   2073 
   2074     try:
-> 2075         result, tz_parsed = tslib.array_to_datetime(
   2076             data,
   2077             errors=errors,

pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()

pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()

pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime_object()

pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime_object()

pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string()

~/opt/anaconda3/lib/python3.8/site-packages/dateutil/parser/_parser.py in parse(timestr, parserinfo, **kwargs)
   1372         return parser(parserinfo).parse(timestr, **kwargs)
   1373     else:
-> 1374         return DEFAULTPARSER.parse(timestr, **kwargs)
   1375 
   1376 

~/opt/anaconda3/lib/python3.8/site-packages/dateutil/parser/_parser.py in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
    647 
    648         if res is None:
--> 649             raise ParserError("Unknown string format: %s", timestr)
    650 
    651         if len(res) == 0:

ParserError: Unknown string format: ['ya60binyagidin', 'günaydın', 'boğaziçitutuklanamaz', 'esnaf1marttaacılıyor', 'cizrevahsetbodrumları', 'fakülteyisarayakur', 'aşağıbakmayacağız', 'abdullahkığılıyaboykot', 'ayağakalk', 'cumhurbaskanıi̇stifa', 'boğaziçi']
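I believe the problem is caused by the concatenation process, but I could not find how to deal with it. At this stage, I would be fine even with dropping the rows whose date column contains values that cannot be converted to datetime objects. Could you please help me?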


Thanks in advance for your help!

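Because your date column contains string values that are not dates, try:
total['date'] = pd.to_datetime(total['date'], errors='coerce')
With errors='coerce', values that cannot be parsed become NaT instead of raising an exception.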
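As a minimal sketch of that suggestion, the snippet below coerces unparseable values to NaT and then drops those rows, which is what the question said would be acceptable. The small `total` frame and its `tweet` column are hypothetical stand-ins for the real concatenated data, not the asker's actual files.

```python
import pandas as pd

# Hypothetical sample standing in for the concatenated `total` frame.
total = pd.DataFrame({
    "date": ["2021-01-05", "günaydın", "2021-02-17", "boğaziçi"],
    "tweet": ["a", "b", "c", "d"],
})

# Strings that cannot be parsed become NaT instead of raising ParserError.
total["date"] = pd.to_datetime(total["date"], errors="coerce")

# Drop the rows whose date could not be parsed and renumber the index.
clean = total.dropna(subset=["date"]).reset_index(drop=True)
print(clean)
```

Dropping rows is only one option; whether it is appropriate depends on how many rows are affected and why they are malformed.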
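Certainly, your "date" column contains strings that the parser cannot parse. For example, "ya60binyagidin" is not a date. You should first display this column to see what is actually in it. You can also use methods such as unique and count to get more information about the values in the column. You should not expect the data to contain only correct values, so analyze the data before you try to use it. Your error shows that this column holds incorrect values, and now you have to decide what to do with them: drop the rows, convert them to some other value, use the mean of other rows, use the correct value from the nearest row, and so on.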
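The inspection step described above could look roughly like the following sketch. The sample values are hypothetical, and the names `parsed`, `bad_mask`, and `bad_values` are introduced here only for illustration.

```python
import pandas as pd

# Hypothetical sample standing in for the real "date" column.
total = pd.DataFrame({
    "date": ["2021-01-05", "günaydın", "2021-02-17", "günaydın", "boğaziçi"],
})

# Parse into a separate Series so the original strings stay available.
parsed = pd.to_datetime(total["date"], errors="coerce")

# Rows that had a value but failed to parse as a date.
bad_mask = parsed.isna() & total["date"].notna()
bad_values = total.loc[bad_mask, "date"]

print(bad_values.unique())        # distinct offending strings
print(bad_values.value_counts())  # how often each one occurs
print(bad_mask.sum(), "of", len(total), "rows failed to parse")
```

Looking at the offending values first makes it easier to decide whether the rows are safe to drop or whether they point to a problem earlier in the pipeline, such as misaligned columns in one of the csv files.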