Pandas CSV reader: separator not working
I have a comma-separated CSV file:
df = pd.read_csv('data/data_notebook-1_crime.csv', sep= ',')
print(df.head)
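Unfortunately, when I print the result, all the values appear to be in the first column, as shown in the image.
CSV file:
You have to call df.head() here, with parentheses.
Output of df.head (the bound method, printed without being called):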
<bound method NDFrame.head of Incident ID Offence Code CR Number Dispatch Date / Time ... Latitude Longitude Police District Number Location
0 201087097 5707 16033232 NaN ... 39.078911 -77.080827 4D (39.0789, -77.0808)
1 201215730 5311 180058531 11/22/2018 04:58:01 AM ... 38.973022 -76.996799 8D (38.973, -76.9968)
2 201229073 3562 190009928 03/03/2019 04:59:49 AM ... 38.956840 -77.111362 2D (38.9568, -77.1114)
3 201233523 1114 190015440 04/03/2019 11:53:15 AM ... 39.020392 -77.012776 3D (39.0204, -77.0128)
4 201087102 3562 16033238 NaN ... 38.991701 -77.024096 3D (38.9917, -77.0241)
[225681 rows x 30 columns]
Output of df.head():
Incident ID Offence Code CR Number Dispatch Date / Time ... Latitude Longitude Police District Number Location
0 201087097 5707 16033232 NaN ... 39.078911 -77.080827 4D (39.0789, -77.0808)
1 201215730 5311 180058531 11/22/2018 04:58:01 AM ... 38.973022 -76.996799 8D (38.973, -76.9968)
2 201229073 3562 190009928 03/03/2019 04:59:49 AM ... 38.956840 -77.111362 2D (38.9568, -77.1114)
3 201233523 1114 190015440 04/03/2019 11:53:15 AM ... 39.020392 -77.012776 3D (39.0204, -77.0128)
4 201087102 3562 16033238 NaN ... 38.991701 -77.024096 3D (38.9917, -77.0241)
[5 rows x 30 columns]
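The difference between the two outputs above comes down to the parentheses: df.head is the method object, while df.head() calls it and returns a DataFrame. A minimal sketch of that distinction, using a tiny made-up frame rather than the real crime data (the values are illustrative only):

```python
import pandas as pd

# Tiny made-up frame standing in for the crime data (assumption, not the real file).
df = pd.DataFrame(
    {
        "Incident ID": [201087097, 201215730],
        "Offence Code": ["5707", "5311"],
    }
)

# Without parentheses we get the method object; its repr embeds the whole frame,
# which is why the printed result looks like one long blob of text.
method_repr = repr(df.head)

# With parentheses the method is called and returns a DataFrame.
first_rows = df.head(1)

print(method_repr.startswith("<bound method"))
print(type(first_rows).__name__, first_rows.shape)
```

So the separator was applied correctly in both cases; only the way the result was printed differed.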
To check whether all columns have been set up correctly, we can use the info method:
import pandas as pd
df = pd.read_csv("./Crime.csv", sep=",")
df.info()  # info() prints the summary itself and returns None, so no print() is needed
We can see that all columns are set up as expected:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 225681 entries, 0 to 225680
Data columns (total 30 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Incident ID 225681 non-null int64
1 Offence Code 225681 non-null object
2 CR Number 225681 non-null int64
3 Dispatch Date / Time 157045 non-null object
4 NIBRS Code 225681 non-null object
5 Victims 225681 non-null int64
6 Crime Name1 225540 non-null object
7 Crime Name2 225540 non-null object
8 Crime Name3 225540 non-null object
9 Police District Name 225681 non-null object
10 Block Address 205179 non-null object
11 City 224624 non-null object
12 State 225681 non-null object
13 Zip Code 222494 non-null float64
14 Agency 225681 non-null object
15 Place 225681 non-null object
16 Sector 225622 non-null object
17 Beat 225622 non-null object
18 PRA 225640 non-null object
19 Address Number 205253 non-null float64
20 Street Prefix 9949 non-null object
21 Street Name 225681 non-null object
22 Street Suffix 4243 non-null object
23 Street Type 225367 non-null object
24 Start_Date_Time 225681 non-null object
25 End_Date_Time 109034 non-null object
26 Latitude 225681 non-null float64
27 Longitude 225681 non-null float64
28 Police District Number 225681 non-null object
29 Location 225681 non-null object
dtypes: float64(4), int64(3), object(23)
memory usage: 51.7+ MB
So it seems you are using different data, or doing something different from what the question states. Here are the results in Python 3.8.6:
Python 3.8.6 (default, Oct 28 2020, 18:56:32)
[Clang 12.0.0 (clang-1200.0.31.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> df = pd.read_csv('~/Downloads/Crime.csv',sep=',')
sys:1: DtypeWarning: Columns (1,18) have mixed types.Specify dtype option on import or set low_memory=False.
>>> df
Incident ID Offence Code CR Number Dispatch Date / Time NIBRS Code Victims ... Start_Date_Time End_Date_Time Latitude Longitude Police District Number Location
0 201223224 2303 190002520 01/16/2019 03:51:46 PM 23C 1 ... 01/16/2019 03:51:00 PM NaN 39.037367 -77.051662 4D (39.0374, -77.0517)
1 201224613 2006 190004310 01/27/2019 06:05:56 PM 200 1 ... 01/27/2019 06:05:00 PM NaN 39.146531 -77.184940 6D (39.1465, -77.1849)
2 201267200 1103 190057412 11/28/2019 06:08:02 AM 11A 1 ... 11/28/2019 06:08:00 AM NaN 39.034255 -77.049163 4D (39.0343, -77.0492)
3 201230900 1399 190011960 03/15/2019 10:53:22 AM 13B 2 ... 03/15/2019 10:50:00 AM 03/15/2019 10:55:00 AM 39.141812 -77.224489 6D (39.1418, -77.2245)
4 201265312 1399 190055150 11/15/2019 03:31:20 PM 13B 1 ... 11/15/2019 03:20:00 PM NaN 39.159339 -77.198516 6D (39.1593, -77.1985)
... ... ... ... ... ... ... ... ... ... ... ... ... ...
225676 201248696 4104 190034187 07/18/2019 06:48:53 PM 90G 1 ... 07/18/2019 06:48:00 PM NaN 39.075179 -77.112958 1D (39.0752, -77.113)
225677 201250353 2902 190036349 07/30/2019 04:39:04 PM 290 1 ... 07/11/2019 03:00:00 PM 07/11/2019 11:13:00 PM 39.119276 -77.156921 1D (39.1193, -77.1569)
225678 201250255 2305 190035784 07/27/2019 01:20:11 PM 23F 1 ... 07/19/2019 09:30:00 AM 07/27/2019 11:30:00 AM 39.030905 -77.057040 4D (39.0309, -77.057)
225679 201243750 2203 190028212 06/13/2019 10:33:01 AM 220 1 ... 06/13/2019 10:33:00 AM 06/13/2019 11:30:00 AM 38.990216 -77.024017 3D (38.9902, -77.024)
225680 201244611 2309 190029400 06/19/2019 08:22:39 PM 23H 1 ... 04/19/2019 12:00:00 PM 06/19/2019 08:22:00 PM 39.098924 -76.920850 3D (39.0989, -76.9208)
[225681 rows x 30 columns]
>>>
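The dtype warning can be avoided by passing the dtype parameter to read_csv, for example (the column names here are placeholders, not columns of this file):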
dtype={"user_id": int, "username": "string"}
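Applied to this data set, the warning names columns 1 and 18, which the info output lists as Offence Code and PRA. A minimal sketch, assuming those two columns should simply be read as text; the sample rows below are made up so the snippet runs without the real file:

```python
import io

import pandas as pd

# Made-up sample rows; only the column names come from the info() output.
sample = io.StringIO(
    "Incident ID,Offence Code,PRA\n"
    "201087097,5707,123\n"
    "201215730,A311,45B\n"
)

# Declaring the mixed-type columns as str gives pandas one type for the whole
# column, so it no longer has to guess chunk by chunk.
df = pd.read_csv(sample, sep=",", dtype={"Offence Code": str, "PRA": str})

codes = df["Offence Code"].tolist()
ids = df["Incident ID"].tolist()
print(codes, ids)
```

The warning text itself also suggests low_memory=False as an alternative; an explicit dtype is the more predictable choice when the intended type is known.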
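I would check your Python and pandas versions.
Can you paste the source code here and format it as code?
Check whether there are quotes in the file.
Could you share the first 20 rows of the dataframe so we can try to find a useful solution?
Yes, but I still have the problem that all the values in the columns I get are wrong.
I added an example showing that all columns are set up as expected for the data you linked.
Thanks, everyone. The saved CSV file turned out to be the problem. I really appreciate your help.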
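The suggestion to check for quotes can be turned into a quick diagnostic. The thread does not say what was actually wrong with the saved file, so the following is only a sketch of one possible cause: if every line is wrapped in double quotes, a CSV parser treats the whole line as a single field, and all values land in the first column. The sample text and the helper names fields_per_line and wholly_quoted_lines are hypothetical.

```python
import csv
import io

# Hypothetical file content: each whole line is wrapped in double quotes.
raw = (
    '"Incident ID,Offence Code,CR Number"\n'
    '"201087097,5707,16033232"\n'
)


def fields_per_line(text, delimiter=","):
    """Return how many fields the csv module sees on each line."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [len(row) for row in reader]


def wholly_quoted_lines(text):
    """Return True for each line that both starts and ends with a double quote."""
    return [
        line.startswith('"') and line.endswith('"')
        for line in text.splitlines()
    ]


counts = fields_per_line(raw)
quoted = wholly_quoted_lines(raw)
print(counts, quoted)
```

A field count of 1 per line together with wholly quoted lines would point at the file rather than at the sep argument.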