Python: How to find the adjective frequency in a specific category of the Brown corpus in NLTK


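I'm a beginner at this, and I'd like to know whether it is possible to extract adjective frequencies from a category of the Brown corpus and build a list of adjectives with Python.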

from collections import Counter
from nltk.corpus import brown

# Split the words and POS tags
words, poss = zip(*brown.tagged_words())
# Put them into a Counter object
pos_freq = Counter(poss)

# Print each tag with its count, in sorted tag order
for pos in sorted(pos_freq):
    print(pos, pos_freq[pos])
[out]:

' 317
'' 8789
( 2264
(-HL 162
) 2273
)-HL 184
* 4603
*-HL 8
*-NC 1
*-TL 1
, 58156
,-HL 171
,-NC 5
,-TL 4
-- 3405
---HL 26
. 60638
.-HL 598
.-NC 16
.-TL 2
: 1558
:-HL 138
:-TL 22
ABL 357
ABN 3010
ABN-HL 4
ABN-NC 1
ABN-TL 7
ABX 730
AP 9522
AP$ 9
AP+AP-NC 1
AP-HL 40
AP-NC 2
AP-TL 18
AT 97959
AT-HL 332
AT-NC 35
AT-TL 746
AT-TL-HL 5
BE 6360
BE-HL 13
BE-TL 1
BED 3282
BED* 22
BED-NC 3
BEDZ 9806
BEDZ* 154
BEDZ-HL 1
BEDZ-NC 8
BEG 686
BEM 226
BEM* 9
BEM-NC 2
BEN 2470
BEN-TL 2
BER 4379
BER* 47
BER*-NC 1
BER-HL 11
BER-NC 5
BER-TL 6
BEZ 10066
BEZ* 117
BEZ-HL 30
BEZ-NC 5
BEZ-TL 8
CC 37718
CC-HL 119
CC-NC 5
CC-TL 307
CC-TL-HL 2
CD 13510
CD$ 5
CD-HL 444
CD-NC 5
CD-TL 898
CD-TL-HL 17
CS 22143
CS-HL 25
CS-NC 5
CS-TL 2
DO 1353
DO* 485
DO*-HL 3
DO+PPSS 1
DO-HL 4
DO-NC 2
DO-TL 5
DOD 1047
DOD* 402
DOD*-TL 1
DOD-NC 1
DOZ 467
DOZ* 89
DOZ*-TL 1
DOZ-HL 16
DOZ-TL 2
DT 8957
DT$ 5
DT+BEZ 179
DT+BEZ-NC 1
DT+MD 3
DT-HL 6
DT-NC 7
DT-TL 9
DTI 2921
DTI-HL 6
DTI-TL 2
DTS 2435
DTS+BEZ 2
DTS-HL 2
DTX 104
EX 2164
EX+BEZ 105
EX+HVD 3
EX+HVZ 2
EX+MD 4
EX-HL 1
EX-NC 1
FW-* 6
FW-*-TL 2
FW-AT 24
FW-AT+NN-TL 13
FW-AT+NP-TL 2
FW-AT-HL 1
FW-AT-TL 44
FW-BE 1
FW-BER 3
FW-BEZ 4
FW-CC 27
FW-CC-TL 14
FW-CD 7
FW-CD-TL 2
FW-CS 3
FW-DT 2
FW-DT+BEZ 2
FW-DTS 1
FW-HV 1
FW-IN 84
FW-IN+AT 4
FW-IN+AT-T 3
FW-IN+AT-TL 18
FW-IN+NN 5
FW-IN+NN-TL 2
FW-IN+NP-TL 2
FW-IN-TL 40
FW-JJ 53
FW-JJ-NC 2
FW-JJ-TL 74
FW-JJR 1
FW-JJT 1
FW-NN 288
FW-NN$ 9
FW-NN$-TL 4
FW-NN-NC 6
FW-NN-TL 170
FW-NN-TL-NC 1
FW-NNS 83
FW-NNS-NC 2
FW-NNS-TL 36
FW-NP 7
FW-NP-TL 4
FW-NPS 2
FW-NPS-TL 1
FW-NR 1
FW-NR-TL 3
FW-OD-NC 1
FW-OD-TL 4
FW-PN 1
FW-PP$ 3
FW-PP$-NC 1
FW-PP$-TL 2
FW-PPL 9
FW-PPL+VBZ 2
FW-PPO 4
FW-PPO+IN 3
FW-PPS 1
FW-PPSS 6
FW-PPSS+HV 1
FW-QL 1
FW-RB 32
FW-RB+CC 1
FW-RB-TL 3
FW-TO+VB 1
FW-UH 8
FW-UH-NC 1
FW-UH-TL 1
FW-VB 26
FW-VB-NC 3
FW-VB-TL 1
FW-VBD 2
FW-VBD-TL 1
FW-VBG 7
FW-VBG-TL 1
FW-VBN 12
FW-VBZ 4
FW-WDT 16
FW-WPO 1
FW-WPS 1
HV 3928
HV* 42
HV+TO 3
HV-HL 3
HV-NC 11
HV-TL 3
HVD 4895
HVD* 99
HVD-HL 1
HVG 281
HVG-HL 1
HVN 237
HVZ 2433
HVZ* 22
HVZ-NC 2
HVZ-TL 4
IN 120557
IN+IN 1
IN+PPO 1
IN-HL 508
IN-NC 41
IN-TL 1477
IN-TL-HL 6
JJ 64028
JJ$-TL 1
JJ+JJ-NC 2
JJ-HL 396
JJ-NC 41
JJ-TL 4107
JJ-TL-HL 26
JJ-TL-NC 1
JJR 1958
JJR+CS 1
JJR-HL 17
JJR-NC 5
JJR-TL 15
JJS 359
JJS-HL 1
JJS-TL 20
JJT 1005
JJT-HL 6
JJT-NC 1
JJT-TL 4
MD 12431
MD* 866
MD*-HL 1
MD+HV 7
MD+PPSS 1
MD+TO 2
MD-HL 27
MD-NC 2
MD-TL 8
NIL 157
NN 152470
NN$ 1480
NN$-HL 20
NN$-TL 361
NN+BEZ 34
NN+BEZ-TL 2
NN+HVD-TL 1
NN+HVZ 5
NN+HVZ-TL 1
NN+IN 1
NN+MD 2
NN+NN-NC 1
NN-HL 1471
NN-NC 118
NN-TL 13372
NN-TL-HL 129
NN-TL-NC 3
NNS 55110
NNS$ 257
NNS$-HL 4
NNS$-NC 2
NNS$-TL 74
NNS$-TL-HL 1
NNS+MD 2
NNS-HL 609
NNS-NC 26
NNS-TL 2226
NNS-TL-HL 14
NNS-TL-NC 3
NP 34476
NP$ 2565
NP$-HL 8
NP$-TL 141
NP+BEZ 25
NP+BEZ-NC 3
NP+HVZ 6
NP+HVZ-NC 1
NP+MD 2
NP-HL 517
NP-NC 15
NP-TL 4019
NP-TL-HL 7
NPS 1275
NPS$ 38
NPS$-HL 1
NPS$-TL 3
NPS-HL 8
NPS-NC 2
NPS-TL 67
NR 1566
NR$ 66
NR$-TL 11
NR+MD 1
NR-HL 10
NR-NC 4
NR-TL 309
NR-TL-HL 5
NRS 16
NRS-TL 1
OD 1935
OD-HL 8
OD-NC 1
OD-TL 201
PN 2573
PN$ 89
PN+BEZ 7
PN+HVD 1
PN+HVZ 3
PN+MD 3
PN-HL 2
PN-NC 2
PN-TL 5
PP$ 16872
PP$$ 164
PP$-HL 10
PP$-NC 13
PP$-TL 35
PPL 1233
PPL-HL 1
PPL-NC 2
PPL-TL 1
PPLS 345
PPO 11181
PPO-HL 5
PPO-NC 9
PPO-TL 13
PPS 18253
PPS+BEZ 430
PPS+BEZ-HL 1
PPS+BEZ-NC 3
PPS+HVD 83
PPS+HVZ 43
PPS+MD 144
PPS-HL 19
PPS-NC 9
PPS-TL 6
PPSS 13802
PPSS+BEM 270
PPSS+BER 278
PPSS+BER-N 1
PPSS+BER-NC 1
PPSS+BER-TL 1
PPSS+BEZ 1
PPSS+BEZ* 1
PPSS+HV 241
PPSS+HV-TL 1
PPSS+HVD 83
PPSS+MD 484
PPSS+MD-NC 2
PPSS+VB 2
PPSS-HL 25
PPSS-NC 31
PPSS-TL 9
QL 8735
QL-HL 4
QL-NC 2
QL-TL 6
QLP 261
RB 36464
RB$ 9
RB+BEZ 11
RB+BEZ-HL 1
RB+BEZ-NC 1
RB+CS 3
RB-HL 49
RB-NC 26
RB-TL 40
RBR 1182
RBR+CS 1
RBR-NC 1
RBT 101
RN 9
RP 6009
RP+IN 4
RP-HL 14
RP-NC 5
RP-TL 4
TO 14918
TO+VB 2
TO-HL 55
TO-NC 13
TO-TL 10
UH 608
UH-HL 1
UH-NC 5
UH-TL 15
VB 33693
VB+AT 2
VB+IN 3
VB+JJ-NC 1
VB+PPO 71
VB+RP 2
VB+TO 4
VB+VB-NC 1
VB-HL 125
VB-NC 41
VB-TL 96
VBD 26167
VBD-HL 8
VBD-NC 11
VBD-TL 6
VBG 17893
VBG+TO 17
VBG-HL 146
VBG-NC 16
VBG-TL 133
VBN 29186
VBN+TO 5
VBN-HL 137
VBN-NC 9
VBN-TL 591
VBN-TL-HL 6
VBN-TL-NC 3
VBZ 7373
VBZ-HL 72
VBZ-NC 7
VBZ-TL 17
WDT 5539
WDT+BER 1
WDT+BER+PP 1
WDT+BEZ 47
WDT+BEZ-HL 1
WDT+BEZ-NC 2
WDT+BEZ-TL 1
WDT+DO+PPS 1
WDT+DOD 1
WDT+HVZ 2
WDT-HL 30
WDT-NC 7
WP$ 252
WPO 280
WPO-NC 1
WPO-TL 4
WPS 3924
WPS+BEZ 21
WPS+BEZ-NC 2
WPS+BEZ-TL 1
WPS+HVD 6
WPS+HVZ 2
WPS+MD 8
WPS-HL 2
WPS-NC 3
WPS-TL 12
WQL 176
WQL-TL 5
WRB 4509
WRB+BER 1
WRB+BEZ 11
WRB+BEZ-TL 3
WRB+DO 1
WRB+DOD 6
WRB+DOD* 1
WRB+DOZ 1
WRB+IN 1
WRB+MD 1
WRB-HL 36
WRB-NC 7
WRB-TL 9
`` 8837
Then:

# POS tags that start with JJ are adjectives; sum their counts
print(sum(pos_freq[i] for i in pos_freq if i.startswith('JJ')))
[out]:

71994
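
To restrict the count to one category and keep the adjectives themselves, the same Counter idea can be applied to the words instead of the tags. The following is only a minimal sketch: it assumes that `brown.tagged_words` is called with a `categories` argument and that any tag beginning with JJ counts as an adjective, as in the snippet above. The helper names `adjective_freq` and `brown_adjective_freq`, the lowercasing step, and the hand-made sample are illustrative assumptions, not part of the original answer.

```python
from collections import Counter


def adjective_freq(tagged_words):
    """Count lowercased words whose tag starts with 'JJ'."""
    return Counter(
        word.lower() for word, tag in tagged_words if tag.startswith('JJ')
    )


def brown_adjective_freq(category):
    """Adjective frequencies for one Brown category (needs the NLTK Brown data)."""
    # Imported lazily so that adjective_freq() can be used without NLTK installed
    from nltk.corpus import brown
    return adjective_freq(brown.tagged_words(categories=category))


# Hand-made sample in the same (word, tag) shape that brown.tagged_words() yields
sample = [('The', 'AT'), ('big', 'JJ'), ('dog', 'NN'),
          ('Big', 'JJ-TL'), ('better', 'JJR')]

freq = adjective_freq(sample)
adjectives = sorted(freq)  # the list of distinct adjectives
print(freq.most_common())
print(adjectives)
```

With the corpus downloaded (for example via `nltk.download('brown')`), something like `brown_adjective_freq('news').most_common(10)` should list the ten most frequent adjectives in the news category. Note that foreign-word tags such as FW-JJ are not matched by the `startswith('JJ')` test.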

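The Brown corpus is already POS-tagged, so all you need to do is iterate over it (as shown) and put every adjective (as defined) into a list.

I tried this just to test: print(brown.tagged_words(tagset='JJ')), but the words come in tuples, so I get (any_word, unknown). How can I specify that I only want to print the tuples in which a word is paired with the tag JJ?

That is fairly simple; perhaps you should review the basics of Python.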
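
The filtering asked about in the comment comes down to a list comprehension over the (word, tag) tuples. This is a minimal sketch, assuming that the `tagset` argument selects a tag mapping rather than acting as a filter, so the tag test has to be written by hand. `words_with_tag` is a hypothetical helper name, and the sample list stands in for the output of `brown.tagged_words(categories=...)`.

```python
def words_with_tag(tagged_words, tag='JJ'):
    """Return the (word, tag) tuples whose tag is exactly `tag`."""
    return [(word, t) for word, t in tagged_words if t == tag]


# Stand-in for brown.tagged_words(categories='news'), which yields the same tuple shape
sample = [('The', 'AT'), ('grand', 'JJ'), ('jury', 'NN'),
          ('recent', 'JJ'), ('Grand', 'JJ-TL')]

jj_pairs = words_with_tag(sample)
jj_words = [word for word, _ in jj_pairs]  # just the words, without the tags
print(jj_pairs)
print(jj_words)
```

Using `t == tag` keeps only the plain JJ tag; switching the test to `t.startswith(tag)` would also pick up variants such as JJ-TL, JJR and JJT.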