Regex - Notepad++; search and replace losing lines
Tags: regex, csv, notepad++

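I am very new to regex, and I am trying to clean up some CSV files with Notepad++. I am running version 7.8.2 (64-bit) because my files are too large for the 32-bit version to open.

In the data, most fields are standardized and generated automatically by the system. Each row has exactly 30 fields. However, users can enter comments in one field, and in a handful of cases a user entered a line feed in that field. When this happens, Notepad++ shows the rest of that record on a new line.

For example, the third line below should be a continuation of the second line (edited down from the original post for a concise example):

I am trying to remove the extra line feed in the second line so that the data looks like: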

"39901","0002286898","88","ACTUALS","TO RECORD ACCRUED LIABILITIES FOR GOODS OR SERVICES RECEIVED AT JUNE 30 2016 PER ATTACHED SCHEDULE. FOR 39901, IU journal  2297455 CONTACT: [NAME PHONE NUMBER] / [NAME PHONE NUMBER]","LA","34000000","Accrued Liabilities","","11000","","","","","","","","","","","","","2017","1","191313.130","07/28/2016","07/01/2016","","Accrued Liabilities","" 
"39901","0002290128","7","ACTUALS","To record accrued liabilities for goods or services received at June 30, 2016 per the attached schedule.  Contact [NAME PHONE NUMBER EMAIL] or [NAME PHONE NUMBER EMAIL]","LA","34000000","Accrued Liabilities","","11000","","","","","","","","","","","","","2017","1","2556242.170","07/31/2016","07/01/2016","","Accrued Liabilities","" 
"39901","0002291224","37","ACTUALS","TO RECORD ACCRUED LIABILITIES FOR GOODS OR SERVICES RECEIVED AT JUNE 30 PER THE ATTACHED SCHEDULE.  FOR 34530, CONTACT: [NAME PHONE NUMBER EMAIL]","LA","34000000","Accrued Liabilities","","11000","","","","","","","","","","","","","2017","1","3010262.140","07/27/2016","07/01/2016","","Accrued Liabilities","" 
"39901","0002291259","2","ACTUALS","TO RECORD ACCRUED LIABILITIES FOR GOODS OR SERVICES RECEIVED AT JUNE 30 PER THE ATTACHED SCHEDULE.  FOR 34571, CONTACT: [NAME PHONE NUMBER] / [NAME PHONE NUMBER]","LA","34000000","Accrued Liabilities","","11000","","","","","","","","","","","","","2017","1","38140.260","07/27/2016","07/01/2016","","Accrued Liabilities","" 
"39901","0002291336","12","ACTUALS","TO RECORD ACCRUED LIABILITIES FOR GOODS OR SERVICES RECEIVED AT JUNE 30  PER ATTACHED SCHEDULE. FOR 345.20","LA","34000000","Accrued Liabilities","","11000","","","","","","","","","","","","","2017","1","2768000.000","08/01/2016","07/01/2016","","Accrued Liabilities",""
There are no carriage returns, only line feeds, so searching for \n also flags every line feed that legitimately ends a row.

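In this case, the data is structured so that the last column is always empty (""). So I tried searching for lines that do not end with that empty field, i.e. lines that end in a letter, digit, period, space, and so on. My plan was to replace those instances with a unique, odd word, and then run a second, extended search and replace to remove the new expression together with the line feed.

Clumsy as it is, I have been doing this in steps: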

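  • \d{1}$
    finds lines whose last character is a digit
  • \w{1}$
    finds lines whose last character is a letter
  • \s{1}$
    finds lines whose last character is whitespace; and
  • \.$
    finds lines that end with a period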

Then I would run one last search to find any stragglers that do not start with 39901.
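
As a rough cross-check outside Notepad++, the same idea (flag lines that do not end with the empty last field, or that do not start with the expected first field) can be sketched in Python. This is only an illustration under assumptions: the function name `suspicious_lines` and its parameters are hypothetical, and the first field is assumed to be the quoted value "39901" as in the sample rows above.

```python
def suspicious_lines(text, first_field='"39901"', last_field='""'):
    """Return 1-based numbers of lines that look like broken records.

    Assumptions (not from the original post): a complete record starts with
    `first_field` and ends with `last_field`, ignoring trailing spaces.
    """
    flagged = []
    for number, line in enumerate(text.split("\n"), start=1):
        stripped = line.rstrip(" ")
        if not stripped:
            continue  # ignore empty lines, e.g. after the final line feed
        starts_ok = stripped.startswith(first_field)
        ends_ok = stripped.endswith(last_field)
        if not (starts_ok and ends_ok):
            flagged.append(number)
    return flagged


if __name__ == "__main__":
    sample = '"39901","x","note part one\npart two",""\n"39901","y","ok","" \n'
    print(suspicious_lines(sample))
```

A record split by a line feed shows up as two flagged lines: the first half fails the ending check and the second half fails the starting check.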

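I run these searches as regular-expression searches and replace the matches with REPLACEHERE999, which I assume nobody has typed into the data. I know this deletes and replaces the last character of the line (the final digit, letter, space, etc.), but I can live with that. After these replacements, I plan to run a second, extended search that replaces REPLACEHERE999\n with a single space, getting rid of both REPLACEHERE999 and the line feed.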

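When I run the first searches, they make a reasonable number of replacements given the number of errors I originally got in Power Query: 377 for \d{1}$, for example. However, once I make these replacements, the number of lines drops dramatically. Originally I had 3,919,186 lines, but after the first search and replace (\d{1}$) I have only 1,543,818 lines, fewer than half of what I started with. When I make the first several replacements one at a time I do not lose lines, but when I use "Replace All" they disappear.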
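
One way to check whether a given replacement changes the number of line feeds is to count them before and after. The sketch below does this with Python's `re` module; it is not Notepad++'s regex engine, so it does not explain the lost lines described above, and the helper name `replace_and_count` is made up for this example. It does show one general pitfall: in Python, `\s` can match a line feed itself, so a pattern such as \s{1}$ can consume a line break, while \d{1}$ cannot.

```python
import re


def replace_and_count(text, pattern, replacement):
    """Apply a multiline regex replacement and report line-feed counts.

    Returns (new_text, line_feeds_before, line_feeds_after).
    """
    before = text.count("\n")
    # re.MULTILINE makes $ match before each line feed, not only at the end.
    new_text = re.sub(pattern, replacement, text, flags=re.MULTILINE)
    after = new_text.count("\n")
    return new_text, before, after


if __name__ == "__main__":
    for pattern, sample in [(r"\d{1}$", "a1\nb2\nc.\n"), (r"\s{1}$", "a \nb\n")]:
        _, before, after = replace_and_count(sample, pattern, "REPLACEHERE999")
        print(pattern, "line feeds before:", before, "after:", after)
```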

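Again, I am just getting started with regex and Notepad++, so I may be missing something basic. But if I only made a limited number of replacements, why have so many of my lines disappeared?

Comments and suggestions on my searches or my thinking are welcome, but the disappearing lines are the key question here.

Thanks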

  • Ctrl+H
  • Find what:
    \R(?!")
  • Replace with:
    LEAVE EMPTY
  • Check Wrap around
  • Check Regular expression
  • Replace all
Explanation:

\R          # any kind of linebreak
(?!")       # negative lookahead, make sure we haven't " after
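
For readers who want to try the same idea outside Notepad++, here is a minimal Python sketch. Python's `re` module has no `\R`, so `\r?\n` is used as a stand-in (an assumption that fits the question, which says the file contains only line feeds). The function name is hypothetical.

```python
import re


def remove_breaks_not_before_quote(text):
    """Delete every line break that is not followed by a double quote.

    Real records start with a quote, so a break in front of a quote is kept;
    any other break is treated as a stray one inside a comment field.
    """
    return re.sub(r'\r?\n(?!")', "", text)


if __name__ == "__main__":
    broken = '"a","b\nc",""\n"d","e",""\n'
    print(remove_breaks_not_before_quote(broken))
```

Two caveats follow from the lookahead: a continuation line that itself starts with a double quote is not joined, and the final line break of the file is removed as well because nothing follows it.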

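Assuming that each row has exactly 30 columns and that each column can contain any character other than a double quote:

With Wrap around checked, you can do this in two steps (Extended search mode is enough for the first step; the second needs Regular expression mode):

  • Remove all line breaks

  • Use this regex,
    (("[^"]*",){29}"[^"]*")\s?

    and replace it with the following in the "Replace with:" field
    $1\n

  • Explanation:

    • Each field has the form "[^"]*". In your example there are 30 of them, and the first 29 are followed by a comma
    • In my regex, the allowed characters are everything except the double quote
    • Let's write [^"] as \x. Then each field has the form "\x*", and the regex (("\x*",){29}"\x*") matches over and over again. We add a new line after each segment of that form
    • \s? takes care of the space left over after every 30 entries

    Note: the link uses an earlier regex that contains fewer fields.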
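
    The two steps above can also be sketched in Python, as a rough equivalent rather than the answer's exact method. The function name `rebuild_rows` and the `n_fields` parameter are hypothetical; a non-capturing group is used for the repeated field, and the sketch relies on the same assumption that no field contains a double quote.

```python
import re


def rebuild_rows(text, n_fields=30):
    """Flatten the text, then re-insert one line feed after every n_fields fields."""
    # Step 1: remove all line breaks.
    flat = text.replace("\r", "").replace("\n", "")
    # Step 2: match n_fields quoted fields, the first n_fields - 1 followed by
    # a comma, plus an optional trailing whitespace character.
    field = r'"[^"]*"'
    row = re.compile(r'((?:%s,){%d}%s)\s?' % (field, n_fields - 1, field))
    return row.sub(lambda m: m.group(1) + "\n", flat)


if __name__ == "__main__":
    broken = '"a","b\nc",""\n"d","e","" \n'
    print(rebuild_rows(broken, n_fields=3))
```

    The example uses three fields per row only to keep the sample short; with the default of 30 the pattern corresponds to the {29} in the regex above.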
