Python 3.x: Is there a way to extract 'company name', 'job title', and 'job location' from each of the strings below?

Tags: python-3.x, nlp, data-processing


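From each of the strings below, I want to extract the company name, the job title, and the job location. Is there a way to do this, given that the patterns are not consistent? Thanks.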

"Jerry (YC S17) Is Hiring Senior Software Dev, Data Engineer (Toronto/Remote)"

"Iris Automation Is Hiring an Account Executive for B2B Flying Vehicle Software"

"Strikingly (YC W13) is hiring in our Shanghai office"

"BuildZoom (YC W13) is hiring  help make remodeling cheaper"

"EquipmentShare (YC W15) Is Looking for an Experienced React Native Dev"

"Saleswhale (YC S16) AI Assistant Startup Is Hiring Customer Success Managers"

"Streak (YC S11) is profitable, well funded and hiring in Vancouver"

"Tesorio (YC S15) Is Hiring Engineering Managers, Senior Python Engineer"

"Checkr (YC S14) is hiring engineers to build the future of online trust"

"Rescale Is Hiring a Senior DevOps Engineer in San Francisco"

"Tremendous.com is hiring its first engineer"

"Remix is looking for a front-end engineer to help build better public transit"

"Atomwise (YC W15) Is Hiring a Senior Machine Learning Research Scientist in SF"

"Confident Cannabis (YC S15) Is Hiring Engineers"

"WaystoCap (YC W17) is hiring a software engineer in Spain"

"Smarking (YC W15) Is Hiring a Customer Service Manager"

"Sunsama (YC W19) Is Hiring a Senior Full Stack Engineer (RN/GraphQL/Node)"

"Pachyderm Raised $10M and Is Looking for a Senior Full-Stack Engineer"

"Picktrace (YC S15) is hiring a senior Android engineer"

"Segment is hiring engineers to create our developer platform"

"XIX Is Hiring a Senior Front End Engineer"

"Athelas (YC S16) is hiring software engineers"

"Dyneti (YC W19) is hiring software engineers"

"ZeroCater (YC W11) Is Hiring a Principal Engineer in SF: Must Love Food"

"Mux is looking for developers who want to help developers build better video"

"Munich, Germany: Demodesk (YC W19) Is Hiring Software Engineers"

"New Story (YC Nonprofit) Hiring a JavaScript Software Engineer"

"Quit Genius (YC W18) Is Hiring a Product Manager in London"

"Flexport is hiring senior engineers in SF  Come get to know us"

"OneSignal Is Hiring Ruby on Rails and DevOps Engineers in San Mateo"
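************* This is what I want **************

Example 1

"Jerry (YC S17) Is Hiring Senior Software Dev, Data Engineer (Toronto/Remote)"

Company name: Jerry

Job title: Senior Software Dev, Data Engineer

Location: Toronto/Remote

Example 2

"Remix is looking for a front-end engineer to help build better public transit"

Company name: Remix

Job title: front-end engineer

Location:

Example 3

"Munich, Germany: Demodesk (YC W19) Is Hiring Software Engineers"

Company name: Demodesk

Job title: Software Engineers

Location: Munich, Germany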

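  • Collect a large number of examples like these. Around 100,000 should do, but 1 million samples would be even better.
  • Split them into fields by hand. If you can afford it, handing the work to interns, Mechanical Turk, or similar will go much faster.
  • Train an ML model on the dataset. Remember to split the samples randomly into training and test sets. Aim for 90%+ accuracy, but do not overfit the data.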
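
The random train/test split mentioned in the last step could look roughly like the sketch below. It uses only the standard library. The function name `train_test_split`, its `test_fraction` and `seed` parameters, and the 30% figure in the demo are illustrative assumptions rather than anything prescribed by the answer. The three labelled rows are the examples given in the question.

```python
import random


def train_test_split(samples, test_fraction=0.1, seed=0):
    """Shuffle a copy of `samples` and return (train, test) lists.

    `test_fraction` and `seed` are hypothetical parameters chosen for this sketch.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(samples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    # Keep at least one sample in the test set.
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hand-labelled rows: (headline, company, job title, location).
labelled = [
    ("Jerry (YC S17) Is Hiring Senior Software Dev, Data Engineer (Toronto/Remote)",
     "Jerry", "Senior Software Dev, Data Engineer", "Toronto/Remote"),
    ("Remix is looking for a front-end engineer to help build better public transit",
     "Remix", "front-end engineer", ""),
    ("Munich, Germany: Demodesk (YC W19) Is Hiring Software Engineers",
     "Demodesk", "Software Engineers", "Munich, Germany"),
]

train, test = train_test_split(labelled, test_fraction=0.3)
print(len(train), "training rows,", len(test), "test rows")
```

Splitting once with a fixed seed, and only ever evaluating on the held-out rows, is what makes the accuracy figure a fair check against overfitting.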

  • As far as I know, if the pattern is inconsistent, we cannot extract the data from the strings directly.
    Only a human, who can actually understand the text, can do it; otherwise you need to implement ML.

  • We can easily use spaCy, CRF, StanfordNLP, or LSTM models, with 70% of the data for training and 30% for testing. I prefer a bidirectional LSTM.

  • In fact, relatively simple models (for example, pretrained word embeddings + CRF) need far fewer examples. In my experience, about 1,000 examples are already enough to get good quality. As for the exact model structure, you can reuse an existing architecture for part-of-speech tagging.
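
Reusing a part-of-speech tagging architecture means treating the task as per-token labelling. One common way to prepare hand-labelled headlines for such a tagger is BIO encoding, sketched below under some assumptions: the `tokenize` and `bio_tags` helpers, the regex tokenizer, and the label names `COMPANY`, `TITLE`, and `LOCATION` are all hypothetical choices for illustration and are not taken from any of the libraries named above.

```python
import re


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def bio_tags(headline, fields):
    """Return (tokens, tags) with one BIO tag per token.

    `fields` maps a label to the exact substring of the headline that was
    labelled by hand. Empty values (for example a missing location) are skipped.
    """
    tokens = tokenize(headline)
    tags = ["O"] * len(tokens)
    for label, value in fields.items():
        target = tokenize(value)
        if not target:
            continue
        size = len(target)
        for start in range(len(tokens) - size + 1):
            span_is_free = all(tag == "O" for tag in tags[start:start + size])
            if tokens[start:start + size] == target and span_is_free:
                tags[start] = "B-" + label  # first token of the field
                for i in range(start + 1, start + size):
                    tags[i] = "I-" + label  # remaining tokens of the field
                break
        else:
            raise ValueError(f"{label} value {value!r} not found in headline")
    return tokens, tags


tokens, tags = bio_tags(
    "Munich, Germany: Demodesk (YC W19) Is Hiring Software Engineers",
    {"COMPANY": "Demodesk", "TITLE": "Software Engineers", "LOCATION": "Munich, Germany"},
)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

The resulting (token, tag) pairs are the usual input shape for CRF or BiLSTM sequence taggers, so the same labelled data can be tried with either kind of model.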