Regex: Improving Shell Script Performance (Regex, Performance, Unix, Shell, Grep) - Fatal编程技术网

Regex: Improving Shell Script Performance

regex performance unix shell grep

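This shell script extracts a line of data from $2 if that line contains the pattern $line. $line is built from the lines in file $1 using the regular expression [A-Z0-9.-]+@[A-Z0-9.-]+ (a simple email match).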

#! /bin/sh

clear

for line in `cat "$1" | grep -i -o -E "[A-Z0-9.-]+@[A-Z0-9.-]+"`
do
    echo `cat "$2" | grep -m 1 "\b$line\b"`
done

File $1 has short lines of data (<100 characters). It contains about 50k lines (about 1-1.5 MB).

File $2 has somewhat longer lines of text (>80 and <200 characters). It contains over 2 million lines (about 200 MB).

The desktop it runs on has plenty of RAM (6 GB) and a 2-4 core Xeon processor.

It currently takes 1-2 hours to run to completion (with the output going to another file). Is there any quick fix to improve performance?

Note: I'm open to all suggestions, but we can't rewrite the whole system. Also, the data comes from a third party and is prone to random formatting.

If $1 is a file, don't use "cat | grep". Pass the file to grep directly instead. It should look like

grep -i -o -E "[A-Z0-9.-]+@[A-Z0-9.-]+" "$1"

You may also want to adjust the regex. At the very least you should allow the underscore ("_") in email addresses, so

grep -i -o -E "[A-Z0-9._-]+@[A-Z0-9.-]+" "$1"
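Here is a quick way to see what the underscore changes. It uses a made-up address and assumes a grep that supports -o, such as GNU grep:

```shell
#!/bin/sh
# Compare the original character class with one that also allows "_".
# The address below is hypothetical sample data.
sample='Contact: first_last@example.com'

# Original class: "_" is not allowed, so the match starts after the underscore.
old=$(printf '%s\n' "$sample" | grep -i -o -E "[A-Z0-9.-]+@[A-Z0-9.-]+")

# Class with "_" added: the whole address is matched.
new=$(printf '%s\n' "$sample" | grep -i -o -E "[A-Z0-9._-]+@[A-Z0-9.-]+")

printf 'old: %s\n' "$old"   # prints "old: last@example.com"
printf 'new: %s\n' "$new"   # prints "new: first_last@example.com"
```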

Quick suggestions:

Avoid the useless use of cat by changing

cat X | grep Y

to

grep Y X

Process the grep output as it is produced, by piping it instead of using backticks. With backticks, the first grep has to finish completely before the second grep can start.
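Below is a minimal sketch of the piped version, wrapped in a function so it can be tried on its own. The name find_first_matches is hypothetical. The function takes the same two file arguments as the original script and assumes a grep that supports -m and \b, such as GNU grep:

```shell
#!/bin/sh
# Hypothetical helper: $1 = file to extract addresses from, $2 = file to search.
# The first grep streams addresses into the loop through a pipe, so the
# per-address greps start as soon as the first address is found.
find_first_matches() {
    grep -i -o -E "[A-Z0-9.-]+@[A-Z0-9.-]+" "$1" | while IFS= read -r line; do
        # -m 1: stop after the first line of $2 that contains this address
        grep -m 1 "\b$line\b" "$2"
    done
}
```

Call it from the script as find_first_matches "$1" "$2". There are two differences from the original. It prints nothing (rather than an empty line) for an address with no match. It also still scans $2 once per address, so it only removes the backtick bottleneck.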

Next:

Don't process $2 repeatedly; it's huge. Instead, save all the patterns and then run a single grep over the file.

Replace the loop with a sed step, so grep is no longer run over and over:

grep -i -o -E "[A-Z0-9.-]+@[A-Z0-9.-]+" "$1" | sed -E 's/^|$/\\b/g' > patterns
grep -f patterns "$2"
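The sed step wraps each extracted address in \b word-boundary anchors before grep -f reads it. Here is a small check on a made-up address. It assumes GNU sed and GNU grep, since other implementations may treat the empty matches of ^|$ or the \b escape differently:

```shell
#!/bin/sh
# What one line of the generated "patterns" file looks like.
# alice@example.com is a made-up sample address.
pattern=$(printf '%s\n' 'alice@example.com' | sed -E 's/^|$/\\b/g')
printf '%s\n' "$pattern"   # prints \balice@example.com\b

# Only the line containing the whole address matches; in
# "malice@example.community" there is no word boundary before "alice".
printf '%s\n' 'to: alice@example.com' 'to: malice@example.community' | grep -e "$pattern"
```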

Finally, with a bit of bash fanciness (see man bash → Process Substitution), we can drop the temporary file and do it all in one long line:

grep -f <(grep -i -o -E "[A-Z0-9.-]+@[A-Z0-9.-]+" "$1" | sed -E 's/^|$/\\b/g') "$2"

grep can handle 100 patterns in one go. The more patterns it handles at once, the fewer passes it has to make over your second file.

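The problem is that you are piping through too many shell commands, and you are also using cat unnecessarily.

One possible solution is to use awk alone: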

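awk 'FNR==NR {
    # First file (file1): collect every email-like substring into the array "email"
    for (i = 1; i <= NF; i++) {
        if (match($i, /[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+/)) {
            email[substr($i, RSTART, RLENGTH)]
        }
    }
    next
}
{
    # Second file (file2): print the line once if it contains any collected address.
    # index() does a literal substring search, so "." in an address is not a wildcard.
    for (e in email) {
        if (index($0, e)) {
            print
            next
        }
    }
}' file1 file2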

I'd take the loop out, since grepping a 2-million-line file 50k times is probably pretty expensive ;)
To take the loop out, first use the outer grep command to create a file containing all the email addresses.
Then use that file as the pattern file for the second grep, with grep -f.
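Below is a sketch of this two-step approach, assuming GNU grep. As a variation not in the original answer, it passes the addresses as fixed strings with -F and uses -w (whole words) instead of \b-wrapped regexes. This stops the dots in addresses from acting as wildcards, and GNU grep can usually search for a long list of fixed strings much faster than for the same number of regexes. The helper name grep_by_address_list and the temporary pattern file are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: $1 = file with addresses, $2 = large file to search.
grep_by_address_list() {
    patterns=$(mktemp) || return 1
    # Step 1: one pass over $1, writing each unique address to the pattern file.
    grep -i -o -E "[A-Z0-9.-]+@[A-Z0-9.-]+" "$1" | sort -u > "$patterns"
    # Step 2: one pass over $2. -F: literal strings, -w: whole-word matches,
    # -f: read all patterns from the file.
    grep -F -w -f "$patterns" "$2"
    status=$?
    rm -f "$patterns"
    return "$status"
}
```

Unlike the original loop with -m 1, this prints every matching line of $2, not just the first one per address.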
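
As John Kugelman already answered, process the grep output through a pipe instead of using backticks. With backticks, the whole expression inside the backticks runs first. Only then does the outer expression run, with the backticks' output as its arguments.
First, this is much slower than necessary, because a pipe would let both programs run at the same time (which is very good if both are CPU-intensive and you have more than one CPU). But there is another very important aspect: the line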
for line in `cat "$1" | grep -i -o -E "[A-Z0-9.-]+@[A-Z0-9.-]+"`

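might become too long for the shell to handle. Most shells (at least as far as I know) limit the length of a command line, or at least of the arguments to a command, and I think this could also become a problem for the for loop.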
Don't do this.




Copyright © 2024. All Rights Reserved by - Fatal编程技术网