Python: counting the words a character speaks in a movie script

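With some help, I have already managed to work out which words are spoken. Now I am looking for the text spoken by one selected person, so that I can type in MIA and get every word she says in the movie, like this: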

name = input("Enter name:")
wordsspoken(script, name)
name1 = input("Enter another name:")
wordsspoken(script, name1)
so that I can count the words afterwards.

This is what the movie script looks like:

An awkward beat. They pass a wooden SALOON -- where a WESTERN
 is being shot. Extras in COWBOY costumes drink coffee on the
 steps.
                     Revision                        25.


                   MIA (CONT'D)
      I love this stuff. Makes coming to work
      easier.

                   SEBASTIAN
      I know what you mean. I get breakfast
      five miles out of the way just to sit
      outside a jazz club.

                   MIA
      Oh yeah?

                   SEBASTIAN
      It was called Van Beek. The swing bands
      played there. Count Basie. Chick Webb.
             (then,)
      It's a samba-tapas place now.

                   MIA
      A what?

                   SEBASTIAN
      Samba-tapas. It's... Exactly. The joke's on
      history.

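I would first ask the user to enter all the names in the script, then ask which name they want the words for. I would search the text word by word until I found the desired name, then copy the following words into a variable until I reached a name that matches someone else in the script. A character can of course say another character's name, but if you assume that speaker headings are all upper case, or on a line of their own, the text should be easy to filter.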

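recording = False   # True while the chosen speaker is talking
spoken_text = ""

for word in script.split():  # assumes script is one string; iterate over its words
    if word == speaker and word.isupper(): # you may want to check that this is on its own line as well.
        recording = True
        continue  # do not copy the speaker's own name
    elif word in character_names and word.isupper():  # you may want to check that this is on its own line as well.
        recording = False

    if recording:
        spoken_text += word + " "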
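
The "on its own line" check mentioned in the comments above can be sketched by walking the script line by line rather than word by word. This is only a minimal sketch under a few assumptions: a speaker cue is a known name alone on its line, optionally followed by a parenthetical such as (CONT'D); lines that start with "(" are stage directions rather than speech; and the helper name spoken_text and its character_names parameter are hypothetical, not part of the original answer.

```python
import re

def spoken_text(script, speaker, character_names):
    """Return everything `speaker` says, joined into one string."""
    recording = False
    spoken = []
    for line in script.splitlines():
        # Drop a trailing parenthetical such as "(CONT'D)" before comparing.
        cue = re.sub(r"\s*\(.*\)\s*$", "", line).strip()
        if cue in character_names:
            # A known name alone on its line switches the current speaker.
            recording = (cue == speaker)
        elif recording and line.strip() and not line.strip().startswith("("):
            # Assumption: lines starting with "(" are directions, not speech.
            spoken.append(line.strip())
    return " ".join(spoken)
```

The word count then follows from len(spoken_text(script, "MIA", names).split()). A dialogue line that consists of nothing but a character's name in upper case would still be misread as a cue.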

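If you only want to make one pass over the script (which I imagine could be quite long) to compute your counts, you can just keep track of which character is currently speaking; set up something like a small state machine: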

import re
from collections import Counter, defaultdict

words_spoken = defaultdict(Counter)
currently_speaking = 'Narrator'

for line in SCRIPT.split('\n'):
    name = line.replace('(CONT\'D)', '').strip()
    if re.match('^[A-Z]+$', name):
        currently_speaking = name
    else:
        words_spoken[currently_speaking].update(line.split())
You could use a more sophisticated regular expression to detect when the speaker changes, but this should do the trick.
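
As an illustration of what a "more sophisticated" cue pattern might look like, and of how to turn the per-word Counter into a single number per speaker, here is a hedged sketch. The pattern CUE_RE and the function count_words are hypothetical names; the pattern assumes names are written in upper case (letters, spaces, dots, apostrophes, hyphens) and may be followed by any parenthetical, and the sketch additionally skips lines that start with "(", on the assumption that these are directions rather than speech.

```python
import re
from collections import Counter, defaultdict

# Hypothetical cue pattern: an upper-case name, optionally followed by a
# parenthetical such as (CONT'D) or (V.O.), alone on its line.
CUE_RE = re.compile(r"^\s*([A-Z][A-Z .'\-]*?)\s*(?:\([^)]*\))?\s*$")

def count_words(script):
    """Map each speaker to a Counter of the words they say."""
    words_spoken = defaultdict(Counter)
    currently_speaking = 'Narrator'
    for line in script.split('\n'):
        match = CUE_RE.match(line)
        if match:
            currently_speaking = match.group(1)
        elif not line.strip().startswith('('):
            # Assumption: parenthetical-only lines are directions; skip them.
            words_spoken[currently_speaking].update(line.split())
    return words_spoken

def total_words(words_spoken, name):
    """Total number of words (not distinct words) spoken by `name`."""
    return sum(words_spoken[name].values())
```

Summing the Counter's values gives the total word count, whereas len(words_spoken[name]) would only give the number of distinct words. A short all-caps dialogue line such as "OK." would still be mistaken for a cue by this pattern.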

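I will outline how to generate a dict that gives the number of words spoken for every speaker, and also provide a version that approximates your existing implementation.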

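General use

If we define a word as any chunk of characters obtained by splitting a string on ' ' (a space), the result is a dict of the form {'JOHN DOE': 55} when JOHN DOE speaks 55 words.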

Sample output:

>>> word_count['MIA']

13
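
A minimal sketch of how such a dict could be built, assuming the layout of the excerpt in the question: speaker names indented by exactly 19 spaces and dialogue lines by exactly 6. The function name count_words_by_speaker is hypothetical, and dialogue that appears before any speaker has been introduced is simply discarded here.

```python
import re
from collections import Counter

def count_words_by_speaker(script):
    """Return a dict such as {'JOHN DOE': 55} for every speaker in `script`."""
    word_count = Counter()
    speaker = None  # nobody has spoken yet
    for line in script.split('\n'):
        if re.match(r'^ {19}\S', line):
            # Speaker cue: keep the name, drop any "(CONT'D)"-style suffix.
            speaker = line.split('(')[0].strip()
        elif re.match(r'^ {6}\S', line) and speaker is not None:
            # Dialogue line: a word is any whitespace-separated chunk.
            word_count[speaker] += len(line.split())
    return dict(word_count)
```

With the full script loaded into a string, word_count = count_words_by_speaker(script) gives a dict that can be indexed by name, as in the sample output above.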
Your implementation

Below is a version of the above procedure that approximates your implementation:

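import re

def wordsspoken(script, name):
    word_count = 0
    # note: speaker is only assigned once a speaker line has been seen
    for line in script.split('\n'):
        if re.match('^[ ]{19}[^ ]{1,}.*', line): # name of speaker (indented 19 spaces)
            speaker = line.split(' (')[0][19:]  # drop the indent and any " (CONT'D)" suffix
        if re.match('^[ ]{6}[^ ]{1,}.*', line): # dialogue line (indented 6 spaces)
            if speaker == name:
                word_count += len(line.split())
    print(word_count)

def main():
    # script is assumed to already hold the screenplay text as a single string
    name = input("Enter name:")
    wordsspoken(script, name)
    name1 = input("Enter another name:")
    wordsspoken(script, name1)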

There are some good ideas above. The following should work fine in both Python 2.x and 3.x:

import codecs
from collections import defaultdict

speaker_words = defaultdict(str)

with codecs.open('script.txt', 'r', 'utf8') as f:
  speaker = ''
  for line in f.read().split('\n'):
    # skip empty lines
    if not line.split():
      continue

    # speakers have their names in all uppercase
    first_word = line.split()[0]
    if (len(first_word) > 1) and all([char.isupper() for char in first_word]):
      # remove the (CONT'D) from a speaker string
      speaker = line.split('(')[0].strip()

    # check if this is a dialogue line
    elif len(line) - len(line.lstrip()) == 6:
      speaker_words[speaker] += line.strip() + ' '

# get a Python-version-agnostic input
try:
  prompt = raw_input
except NameError:  # raw_input does not exist on Python 3
  prompt = input

speaker = prompt('Enter name: ').strip().upper()
print(speaker_words[speaker])
Sample output:

Enter name: sebastian
I know what you mean. I get breakfast five miles out of the way just to sit outside a jazz club. It was called Van Beek. The swing bands played there. Count Basie. Chick Webb. It's a samba-tapas place now. Samba-tapas. It's... Exactly. The joke's on history.

Comments:

This is a rough algorithm and may need refining to deal with unwanted items such as (CONT'D).

In screenwriting, things other than (CONT'D) can be placed in parentheses after the character name in dialogue. I will change the line.replace statement to reflect this.

The only problem seems to be that the dict shows each word once. But I need a number at the end @duhaime

Pinged: just swap out set for Counter in the words_spoken initialization (just adjusted the post to reflect this).

I get this error: Traceback (most recent call last): File "/Users/*path*.py", line 19, in wordsspoken(script, name) File "/Users/*path*.py", line 13, in wordsspoken if speaker == name: UnboundLocalError: local variable 'speaker' referenced before assignment. Do you know what to change?

This happens if you give wordsspoken a script in which the first line of dialogue is read before a speaker has been introduced, for example if you use everything after MIA (CONT'D) rather than the whole script. This code does not account for dialogue without a speaker, but you could handle that by assigning a generic name or by discarding dialogue lines that have no speaker.
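
The "generic name" option from the last comment could look roughly like the sketch below. It assumes the same 19-space and 6-space indentation as the answer's code; the default_speaker parameter and its value "UNKNOWN" are hypothetical, and this variant returns the count instead of printing it.

```python
import re

def wordsspoken(script, name, default_speaker="UNKNOWN"):
    """Return the number of words spoken by `name`."""
    word_count = 0
    # Assigned up front, so dialogue before the first cue has a speaker.
    speaker = default_speaker
    for line in script.split('\n'):
        if re.match(r'^ {19}\S', line):      # speaker cue
            speaker = line.split(' (')[0][19:].strip()
        elif re.match(r'^ {6}\S', line):     # dialogue line
            if speaker == name:
                word_count += len(line.split())
    return word_count
```

Dialogue that precedes the first speaker cue is then counted under the generic name rather than raising UnboundLocalError.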