Python: How do I expand the output display to see more columns of a pandas DataFrame?
Is there a way to widen the display of output in either interactive or script-execution mode?
Specifically, I am using the describe() function on a pandas DataFrame. When the DataFrame is five columns (labels) wide, I get the descriptive statistics that I want. However, if the DataFrame has any more columns, the statistics are suppressed and something like this is returned:
>> Index: 8 entries, count to max
>> Data columns:
>> x1 8 non-null values
>> x2 8 non-null values
>> x3 8 non-null values
>> x4 8 non-null values
>> x5 8 non-null values
>> x6 8 non-null values
>> x7 8 non-null values
The "8" value is given whether there are 6 or 7 columns. What does the "8" refer to?
I have already tried dragging the window larger, as well as increasing the "Configure IDLE" width options, to no avail.
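My purpose in using pandas and describe() is to avoid using a second program like Stata for basic data manipulation and investigation.
You can use print(df.describe().to_string()) to force it to show the whole table. (You can use to_string() like this for any DataFrame. The result of describe is just a DataFrame itself.)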
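A minimal sketch of that approach, assuming a made-up 8 x 12 frame of random numbers (the column names x1 to x12 are illustrative only). It also shows that the summary is an ordinary DataFrame whose index holds the names of the statistics:

```python
import numpy as np
import pandas as pd

# Made-up wide frame: 8 rows x 12 numeric columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(8, 12)),
                  columns=[f"x{i}" for i in range(1, 13)])

summary = df.describe()          # an ordinary DataFrame, one row per statistic
stat_names = list(summary.index)
print(stat_names)

# to_string() renders every row and column instead of a truncated repr.
text = summary.to_string()
print(text)
```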
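The 8 is the number of rows in the DataFrame holding the "description" (because describe computes 8 statistics: count, mean, std, min, the 25%, 50% and 75% percentiles, and max).
You can adjust pandas print options with set_printoptions.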
In [3]: df.describe()
Out[3]:
<class 'pandas.core.frame.DataFrame'>
Index: 8 entries, count to max
Data columns:
x1 8 non-null values
x2 8 non-null values
x3 8 non-null values
x4 8 non-null values
x5 8 non-null values
x6 8 non-null values
x7 8 non-null values
dtypes: float64(7)
In [4]: pd.set_printoptions(precision=2)
In [5]: df.describe()
Out[5]:
            x1       x2       x3       x4       x5       x6       x7
count      8.0      8.0      8.0      8.0      8.0      8.0      8.0
mean   69024.5  69025.5  69026.5  69027.5  69028.5  69029.5  69030.5
std       17.1     17.1     17.1     17.1     17.1     17.1     17.1
min    69000.0  69001.0  69002.0  69003.0  69004.0  69005.0  69006.0
25%    69012.2  69013.2  69014.2  69015.2  69016.2  69017.2  69018.2
50%    69024.5  69025.5  69026.5  69027.5  69028.5  69029.5  69030.5
75%    69036.8  69037.8  69038.8  69039.8  69040.8  69041.8  69042.8
max    69049.0  69050.0  69051.0  69052.0  69053.0  69054.0  69055.0
Also, the API for setting options has changed:
In [4]: pd.set_option('display.precision', 2)
In [5]: df.describe()
Out[5]:
            x1       x2       x3       x4       x5       x6       x7
count      8.0      8.0      8.0      8.0      8.0      8.0      8.0
mean   59832.4  27356.7  49317.3  51214.8  51254.8  41863.0  33950.2
std    22600.7  26867.2  28071.7  21012.4  33831.5  38709.5  29075.7
min    31906.7   1648.4     56.4  16278.3     43.7   3591.0   1833.5
25%    45264.6  12799.5  41429.6  40374.3  29789.6  15145.8   6879.5
50%    56340.2  18666.5  51995.7  54894.6  47667.7  22139.2  33706.0
75%    75587.0  31375.6  61069.2  67811.9  76014.9  72039.0  51449.9
max    98136.5  84544.5  91744.0  75154.6  99012.7  98601.2  83309.1
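In current pandas the same effect can also be scoped to a single block with pd.option_context, which restores the previous value afterwards. A small sketch with made-up numbers:

```python
import pandas as pd

# Made-up data with more decimals than we want to see.
df = pd.DataFrame({"x1": [69000.123456, 69049.987654],
                   "x2": [1 / 3, 2 / 3]})

before = pd.get_option("display.precision")

# Show two decimal places only inside the with-block.
with pd.option_context("display.precision", 2):
    inside = pd.get_option("display.precision")
    shown = repr(df)
    print(shown)

after = pd.get_option("display.precision")  # previous value is back
```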
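Update: pandas 0.23.4 onwards
This is not necessary. pandas autodetects the size of your terminal window if you set pd.options.display.width = 0. (For older versions, see at bottom.)
pandas.set_printoptions(...) is deprecated. Instead, use pandas.set_option(optname, val), or equivalently pd.options.<optname> = val. Like: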
import pandas as pd
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
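Both spellings mentioned above address the same setting. A quick sketch (the values 500 and 50 are arbitrary), which also uses pd.reset_option to return to the default:

```python
import pandas as pd

# Attribute-style assignment and set_option() change the same setting.
pd.options.display.max_columns = 500
via_attribute = pd.get_option("display.max_columns")

pd.set_option("display.max_columns", 50)
via_function = pd.options.display.max_columns

# Put the option back to its default when you are done.
pd.reset_option("display.max_columns")
```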
Here is the help for set_option:
set_option(pat,value) - Sets the value of the specified option
Available options:
display.[chop_threshold, colheader_justify, column_space, date_dayfirst,
date_yearfirst, encoding, expand_frame_repr, float_format, height,
line_width, max_columns, max_colwidth, max_info_columns, max_info_rows,
max_rows, max_seq_items, mpl_style, multi_sparse, notebook_repr_html,
pprint_nest_depth, precision, width]
mode.[sim_interactive, use_inf_as_null]
Parameters
----------
pat - str/regexp which should match a single option.
Note: partial matches are supported for convenience, but unless you use the
full option name (e.g., *x.y.z.option_name*), your code may break in future
versions if new options with similar names are introduced.
value - new value of option.
Returns
-------
None
Raises
------
KeyError if no such option exists
display.chop_threshold: [default: None] [currently: None]
: float or None
if set to a float value, all float values smaller than the given threshold
will be displayed as exactly 0 by repr and friends.
display.colheader_justify: [default: right] [currently: right]
: 'left'/'right'
Controls the justification of column headers. used by DataFrameFormatter.
display.column_space: [default: 12] [currently: 12]No description available.
display.date_dayfirst: [default: False] [currently: False]
: boolean
When True, prints and parses dates with the day first, eg 20/01/2005
display.date_yearfirst: [default: False] [currently: False]
: boolean
When True, prints and parses dates with the year first, e.g., 2005/01/20
display.encoding: [default: UTF-8] [currently: UTF-8]
: str/unicode
Defaults to the detected encoding of the console.
Specifies the encoding to be used for strings returned by to_string,
these are generally strings meant to be displayed on the console.
display.expand_frame_repr: [default: True] [currently: True]
: boolean
Whether to print out the full DataFrame repr for wide DataFrames
across multiple lines, `max_columns` is still respected, but the output will
wrap-around across multiple "pages" if its width exceeds `display.width`.
display.float_format: [default: None] [currently: None]
: callable
The callable should accept a floating point number and return
a string with the desired format of the number. This is used
in some places like SeriesFormatter.
See core.format.EngFormatter for an example.
display.height: [default: 60] [currently: 1000]
: int
Deprecated.
(Deprecated, use `display.max_rows` instead.)
display.line_width: [default: 80] [currently: 1000]
: int
Deprecated.
(Deprecated, use `display.width` instead.)
display.max_columns: [default: 20] [currently: 500]
: int
max_rows and max_columns are used in __repr__() methods to decide if
to_string() or info() is used to render an object to a string. In case
python/IPython is running in a terminal this can be set to 0 and Pandas
will correctly auto-detect the width of the terminal and swap to a smaller
format in case all columns would not fit vertically. The IPython notebook,
IPython qtconsole, or IDLE do not run in a terminal and hence it is not
possible to do correct auto-detection.
'None' value means unlimited.
display.max_colwidth: [default: 50] [currently: 50]
: int
The maximum width in characters of a column in the repr of
a Pandas data structure. When the column overflows, a "..."
placeholder is embedded in the output.
display.max_info_columns: [default: 100] [currently: 100]
: int
max_info_columns is used in DataFrame.info method to decide if
per column information will be printed.
display.max_info_rows: [default: 1690785] [currently: 1690785]
: int or None
max_info_rows is the maximum number of rows for which a frame will
perform a null check on its columns when repr'ing to a console.
The default is 1,000,000 rows. So, if a DataFrame has more than
1,000,000 rows there will be no null check performed on the
columns and thus the representation will take much less time to
display in an interactive session. A value of None means always
perform a null check when repr'ing.
display.max_rows: [default: 60] [currently: 500]
: int
This sets the maximum number of rows Pandas should output when printing
out various output. For example, this value determines whether the repr()
for a dataframe prints out fully or just a summary repr.
'None' value means unlimited.
display.max_seq_items: [default: None] [currently: None]
: int or None
when pretty-printing a long sequence, no more than `max_seq_items`
will be printed. If items are omitted, they will be denoted by the addition
of "..." to the resulting string.
If set to None, the number of items to be printed is unlimited.
display.mpl_style: [default: None] [currently: None]
: bool
Setting this to 'default' will modify the rcParams used by matplotlib
to give plots a more pleasing visual style by default.
Setting this to None/False restores the values to their initial value.
display.multi_sparse: [default: True] [currently: True]
: boolean
"sparsify" MultiIndex display (don't display repeated
elements in outer levels within groups)
display.notebook_repr_html: [default: True] [currently: True]
: boolean
When True, IPython notebook will use html representation for
Pandas objects (if it is available).
display.pprint_nest_depth: [default: 3] [currently: 3]
: int
Controls the number of nested levels to process when pretty-printing.
display.precision: [default: 7] [currently: 7]
: int
Floating point output precision (number of significant digits). This is
only a suggestion.
display.width: [default: 80] [currently: 1000]
: int
Width of the display in characters. In case python/IPython is running in
a terminal this can be set to None and Pandas will correctly auto-detect the
width.
Note that the IPython notebook, IPython qtconsole, or IDLE do not run in a
terminal and hence it is not possible to correctly detect the width.
mode.sim_interactive: [default: False] [currently: False]
: boolean
Whether to simulate interactive mode for purposes of testing
mode.use_inf_as_null: [default: False] [currently: False]
: boolean
True means treat None, NaN, INF, -INF as null (old way),
False means None and NaN are null, but INF, -INF are not null
(new way).
Call def: pd.set_option(self, *args, **kwds)
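The partial-match and KeyError behaviour described in the help text can be checked directly. A sketch; display.no_such_option is a deliberately made-up name, and OptionError (a KeyError subclass) is the exception class recent pandas versions raise:

```python
import pandas as pd

# A unique partial name is accepted and resolves to display.max_colwidth.
width = pd.get_option("max_colwidth")

# An unknown option raises pandas' OptionError, which subclasses KeyError.
try:
    pd.set_option("display.no_such_option", 1)  # deliberately made-up name
except KeyError as exc:
    unknown_error = type(exc).__name__

# A pattern matching several options (max_rows, max_columns, ...) is rejected.
try:
    pd.get_option("display.max")
except KeyError as exc:
    ambiguous_error = str(exc)
```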
pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.width', pd.util.terminal.get_terminal_size()[0])
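pd.util.terminal.get_terminal_size() comes from older pandas and may not exist in your version. A sketch of the same idea using the standard library's shutil.get_terminal_size() instead (the 120-column fallback, used when no terminal is attached, is an arbitrary choice):

```python
import shutil

import pandas as pd

# Ask the OS for the console size; fall back to 120x24 if there is no terminal.
columns = shutil.get_terminal_size(fallback=(120, 24)).columns

pd.set_option("display.expand_frame_repr", False)  # keep each row on one line
pd.set_option("display.width", columns)            # use the full console width
```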
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(df)
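Because option_context only changes the options inside the with-block, the session's normal truncation comes back afterwards. A sketch with a made-up 100 x 30 frame, larger than the default 60-row limit:

```python
import numpy as np
import pandas as pd

# Made-up frame: 100 rows x 30 columns, more than the default row limit.
df = pd.DataFrame(np.arange(100 * 30).reshape(100, 30))

# Inside the block, no row or column limits apply.
with pd.option_context("display.max_rows", None, "display.max_columns", None):
    full = repr(df)

# Outside the block the previous limits are back, so the repr is truncated.
truncated = repr(df)
```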
pd.set_option('max_colwidth', 800)
pd.set_option('display.large_repr', 'truncate')
pd.set_option('display.max_columns', 0)
pd.options.display.width = None
In [1]: import pandas as pd
In [2]: pd.options.display.max_rows
Out[2]: 15
In [3]: pd.options.display.max_rows = 999
In [4]: pd.options.display.max_rows
Out[4]: 999
pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.max_colwidth', None)  # newer pandas uses None for "no limit"; -1 is deprecated
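In recent pandas, None is the way to ask for no limit on display.max_colwidth (the -1 value was deprecated). A sketch contrasting a limited and an unlimited width on a made-up text column:

```python
import pandas as pd

# Made-up frame with one long text cell (104 characters).
long_text = "abcdefghijklmnopqrstuvwxyz" * 4
df = pd.DataFrame({"text": [long_text]})

with pd.option_context("display.max_colwidth", 20):
    short = repr(df)   # the cell is cut and ends with "..."

with pd.option_context("display.max_colwidth", None):
    full = repr(df)    # None: no width limit on the cell
```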
# Environment settings:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_seq_items', None)
pd.set_option('display.max_colwidth', 500)
pd.set_option('display.expand_frame_repr', True)
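After loosening several display limits like this, pd.reset_option accepts a regular expression, so every display.* option can be put back to its default in one call. A sketch; the checked defaults (60 rows, 50 characters) are the ones listed for display.max_rows and display.max_colwidth in the help text above:

```python
import pandas as pd

# Loosen a couple of display limits, as in the settings above.
pd.set_option("display.max_rows", None)
pd.set_option("display.max_colwidth", 500)

# Reset every option whose name matches the regex ^display to its default.
pd.reset_option("^display")

rows_after = pd.get_option("display.max_rows")
colwidth_after = pd.get_option("display.max_colwidth")
```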
import pandas as pd
pd.set_option('display.max_columns', 100)
pd.set_option('display.width', 1000)
SentenceA = "William likes Piano and Piano likes William"
SentenceB = "Sara likes Guitar"
SentenceC = "Mamoosh likes Piano"
SentenceD = "William is a CS Student"
SentenceE = "Sara is kind"
SentenceF = "Mamoosh is kind"
bowA = SentenceA.split(" ")
bowB = SentenceB.split(" ")
bowC = SentenceC.split(" ")
bowD = SentenceD.split(" ")
bowE = SentenceE.split(" ")
bowF = SentenceF.split(" ")
# Creating a set consisting of all words
wordSet = set(bowA).union(set(bowB)).union(set(bowC)).union(set(bowD)).union(set(bowE)).union(set(bowF))
print("Set of all words is: ", wordSet)
# Initiating dictionary with 0 value for all BOWs
wordDictA = dict.fromkeys(wordSet, 0)
wordDictB = dict.fromkeys(wordSet, 0)
wordDictC = dict.fromkeys(wordSet, 0)
wordDictD = dict.fromkeys(wordSet, 0)
wordDictE = dict.fromkeys(wordSet, 0)
wordDictF = dict.fromkeys(wordSet, 0)
for word in bowA:
    wordDictA[word] += 1
for word in bowB:
    wordDictB[word] += 1
for word in bowC:
    wordDictC[word] += 1
for word in bowD:
    wordDictD[word] += 1
for word in bowE:
    wordDictE[word] += 1
for word in bowF:
    wordDictF[word] += 1
# Printing term frequency
print("SentenceA TF: ", wordDictA)
print("SentenceB TF: ", wordDictB)
print("SentenceC TF: ", wordDictC)
print("SentenceD TF: ", wordDictD)
print("SentenceE TF: ", wordDictE)
print("SentenceF TF: ", wordDictF)
print(pd.DataFrame([wordDictA, wordDictB, wordDictC, wordDictD, wordDictE, wordDictF]))
   CS  Guitar  Mamoosh  Piano  Sara  Student  William  a  and  is  kind  likes
0   0       0        0      2     0        0        2  0    1   0     0      2
1   0       1        0      0     1        0        0  0    0   0     0      1
2   0       0        1      1     0        0        0  0    0   0     0      1
3   1       0        0      0     0        1        1  1    0   1     0      0
4   0       0        0      0     1        0        0  0    0   1     1      0
5   0       0        1      0     0        0        0  0    0   1     1      0
df.columns.values
for col in df.columns:
    print(col)
pd.set_option('display.max_columns', None)
import pandas as pd
pd.options.display.max_columns = 10
pd.options.display.max_rows = 999
pd.options.display.max_columns = 100
from IPython.display import display  # display() is provided by IPython/Jupyter
import pandas as pd

def display_all(df):  # For any DataFrame df
    with pd.option_context('display.max_rows', 1000):  # Change number of rows accordingly
        with pd.option_context('display.max_columns', 1000):  # Change number of columns accordingly
            display(df)
import math

col_range = 5  # number of columns to describe at a time
for i in range(int(math.ceil(len(df_data.columns) / col_range))):
    idx1 = i * col_range
    idx2 = idx1 + col_range
    print(df_data.iloc[:, idx1:idx2].describe())
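The same chunking idea as a small reusable function. The name describe_in_chunks and the 20 x 12 random frame are hypothetical; stepping range() by the chunk size avoids the math.ceil arithmetic:

```python
import numpy as np
import pandas as pd

def describe_in_chunks(df, chunk_size=5):
    """Hypothetical helper: describe() the columns in groups of chunk_size."""
    return [df.iloc[:, start:start + chunk_size].describe()
            for start in range(0, df.shape[1], chunk_size)]

# Made-up 20 x 12 frame -> summaries covering 5, 5 and 2 columns.
df_data = pd.DataFrame(np.random.default_rng(1).normal(size=(20, 12)))
parts = describe_in_chunks(df_data)
for summary in parts:
    print(summary)
```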
import numpy as np
np.set_printoptions(linewidth=160)
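np.set_printoptions affects how NumPy arrays are printed, not the pandas DataFrame repr, which follows the display.* options. A sketch using the np.printoptions context manager (NumPy 1.15+) so the wider line width applies only temporarily; 75 characters is NumPy's default:

```python
import numpy as np

arr = np.arange(40)

# Temporarily widen NumPy's line width so the array fits on one line.
with np.printoptions(linewidth=160):
    wide = np.array2string(arr)

narrow = np.array2string(arr)   # default 75-character width again: wraps
```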