Parsing chat logs in Python, currently using BeautifulSoup - Fatal编程技术网

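I'm having some trouble parsing IM chat logs with Python 2.7. I'm currently using BeautifulSoup's get_text(). This usually works, but sometimes it obscures the interesting stuff. For example: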

<font color="#A82F2F"><font size="2">(3/11/2016 3:11:57 PM)</font> <b>user name:</b></font> <html xmlns='http://jabber.org/protocol/xhtml-im'><body xmlns='http://www.w3.org/1999/xhtml'><p>Have you posted the key to <a href="https://___.edu/sshkeys/?">https://___.edu/sshkeys/?</a></p></body></html><br/>

(3/11/2016 3:11:57 PM) user name: Have you posted the key to

In this case I get the "Have you posted the key to" part, but it strips out the https://___.edu/sshkeys/? part.
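Most (not all) lines have the same format, i.e. date, time, user, interesting stuff.
Is there a better way to parse this that gets the text and all the interesting stuff?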
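For the "date, time, user, interesting stuff" layout described above, one option, once a line has been reduced to plain text (for example the get_text() output shown above), is a regular expression plus datetime.strptime. This is a minimal sketch, not taken from the thread: the pattern, the helper name split_chat_line, and the month-first date order are all assumptions based on the single sample line.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the one sample: "(M/D/YYYY H:MM:SS AM|PM) user: message"
LINE_RE = re.compile(r"^\((?P<stamp>[^)]+)\)\s*(?P<user>[^:]+?):\s*(?P<message>.*)$")


def split_chat_line(text):
    """Split one plain-text log line into (timestamp, user, message).

    Returns None for lines that don't follow the assumed layout,
    since not every line in the log does.
    """
    match = LINE_RE.match(text.strip())
    if match is None:
        return None
    try:
        # Assumption: month/day order, as in "3/11/2016"; use %d/%m/%Y if the client writes day-first
        stamp = datetime.strptime(match.group("stamp"), "%m/%d/%Y %I:%M:%S %p")
    except ValueError:
        return None
    # The user is everything before the first colon; the message keeps any later colons (e.g. in URLs)
    return stamp, match.group("user").strip(), match.group("message")
```

Lines that don't match (status or sign-off messages, for instance) come back as None, so they can be handled separately instead of being silently mangled.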
You can make use of:
Depending on how you want to output this information, you'll have to get more or less clever.
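One way to keep link targets visible even when the link text differs from the URL is to rewrite each <a> tag as "text <href>" before calling get_text(). This is a minimal sketch, assuming bs4 with the built-in "html.parser" backend; text_with_links is a hypothetical helper name, and the "text <href>" output format is just one possible choice.

```python
from bs4 import BeautifulSoup


def text_with_links(html_line):
    """Return the visible text of one log line, writing each link as 'text <href>'.

    Links whose text already equals their href are left unchanged.
    """
    soup = BeautifulSoup(html_line, "html.parser")
    # find_all() returns a list, so replacing tags inside the loop is safe
    for anchor in soup.find_all("a", href=True):
        label = anchor.get_text()
        if label.strip() != anchor["href"]:
            anchor.replace_with("{} <{}>".format(label, anchor["href"]))
    # A " " separator keeps timestamp, user and message from running together;
    # strip=True trims each piece and drops whitespace-only strings
    return soup.get_text(" ", strip=True)
```

The result is a single plain-text line, so it can be fed straight into whatever splits out the date, time, user and message.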
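This works for me: I get (3/11/2016 3:11:57 PM) user name: Have you posted the key to https://___.edu/sshkeys/? when using get_text(). Post the code you have so far.
Thanks, hmm. I don't quite like this: soup = BeautifulSoup(i, "lxml"); soupy = soup.get_text()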
# soup comes from BeautifulSoup(i, "lxml") above; print each link's URL and its visible text
for anchor in soup.find_all('a', href=True):
    print("The anchor url={} text={}".format(anchor['href'], anchor.get_text()))

                                                                Copyright © 2024. All Rights Reserved by  - Fatal编程技术网