Bash: search for a word and output the 35 characters after it with a shell script


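I have a file, input.txt, that contains lots of strange characters, HTML tags, and useful material. I want to write the 35 characters that follow the word "description" to a new file, output.txt, excluding strange characters such as $&lmp and without any HTML tags.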

Input sample:

    </image>
  <title>A Londoner Looks Back: Were The Olympics Awesome?</title>
  <link>http://www.askmen.com/sports/fanatic/london-olympics-post-mortem.html</link>
  <description rdf:parseType="Literal">

                The other evening I walked out of London&amp;rsquo;s &lt;a
href="http://www.askmen.com/fashion/watch_100/135_olympic-watches.html"&gt;Olympic
stadium onto the new &amp;ldquo;Javelin&amp;rdquo; train into town. (The journey from east to
central London, quite recently still something of a commuter&amp;rsquo;s nightmare, took just
six minutes.) A railway worker on the platform didn&amp;rsquo;t just point everyone the way
onto the train; he did a dance for us. You don&amp;rsquo;t usually get that on London
transport. These Olympics made the city happier.I now live in Paris, but I
consider myself a Londoner. I went to nursery school in London, spent 15 years of my life
in the city, speak in a London accent, visit my parents and siblings here, and, as someone
of mongrel origin who belongs nowhere, I feel at home in the world&amp;rsquo;s most
cosmopolitan city. To steal a line from the 1980s film Sammy and Rosie Get Laid:
&amp;ldquo;I&amp;rsquo;m not English. I&amp;rsquo;m a Londoner.&amp;rdquo; But London is also a sprawling,
gray, wet, overpriced city where traveling anywhere always seems to take forever, and
Londoners are not positive people. In fact, we are whiners. Going into the
Olympics, the whining was at full blast. Landing in London days before &lt;a
href="http://www.askmen.com/sports/bodybuilding/olympic-bobsledding.html"&gt;the Games
began, I found my friends and family full of dread. The Games&amp;rsquo; organizers had
indicated that while the Olympics were on, traveling anywhere would take even longer than
forever. My sister had been told to be at her desk at 7 a.m. during the Games to avoid the
rush hour -- this in a city where many people start work nearer to 10 a.m. A friend showed
me a kind of war scenario prepared by the bank where he worked, full of ominous questions
like, &amp;ldquo;What if your supply chain stopped?&amp;rdquo; &amp;ldquo;What if your technology
failed?&amp;rdquo; &amp;ldquo;What if your brand, image and reputation were impacted by any of the
above?&amp;rdquo; And what was all this upheaval in aid of? To watch some doped-up
moustachioed Eastern European women win incomprehensible weightlifting events? In a YouGov
survey days before the opening ceremony, only 51% of Britons expressed an interest in the
Olympics -- and that was a lot better than earlier surveys.On the day of the
opening ceremony I happened to have a meeting down the street from my last &lt;a
href="http://www.askmen.com/london/"&gt;London address (a shared flat above a now defunct
liquor store). I ran to Baker Street tube, as I&amp;rsquo;d done a thousand times before. Then
I got on a media bus to the opening ceremony that passed Southwark Bridge with the
Financial Times building where I had worked in the 1990s. It was like a dream:
You move through a familiar landscape that has been transformed. The Olympics helped me
see London afresh.It was during the opening ceremony that the mood among
Londoners changed. I know foreigners didn&amp;rsquo;t get all the references: the Windrush
ship that brought the first Jamaican immigrants to Britain in 1948, the BBC weather
forecaster Michael Fish assuring us there would be no hurricane the night before one
struck in 1987, the dance of the state-funded National Health Service nurses. But Londoners got it. Danny Boyle, the director, gave us a multicultural and funny
Britain that had finally shed its imperial delusions of grandeur. The Olympic torch was
run into the stadium not by an Aryan superman but by the pot-bellied middle-aged ex-rower
Steve Redgrave, who can&amp;rsquo;t run. For the first time in my life, Boyle&amp;rsquo;s Britain
made me feel a patriot. The opening ceremony remains the highlight of my Olympics.Then the sports began, and with it the instinctive expectation that the Brits
would fall flat on their faces. We may have invented modern sports, but England&amp;rsquo;s
soccer team hasn&amp;rsquo;t won a prize since 1966, and no British man has won Wimbledon
since 1936. Surely our Olympians would continue the tradition?&amp;nbsp;It seemed
so on the first day, when Britain&amp;rsquo;s much-hyped male cyclists failed to win a medal
or even to figure in the run-in in front of Buckingham Palace. Only on the second day did
our first medal arrive: A silver for cyclist Lizzie Armitstead, a polite young vegetarian
from the rural north so little-known that at the press conference she had to introduce
herself to the nation. &amp;ldquo;I could never get my head around eating corpses,&amp;rdquo; she
explained. On the fourth morning, Britain still had no golds. The more
excitable newspapers began demanding inquests. And then the golds came in a crazy rush,
won by a bunch of underpaid Britons of all colors whose frank delight was irresistible.
Above all, there was Mo Farah, the Somali-born runner, who had arrived in London&amp;rsquo;s
suburbs as an eight-year-old barely able to speak English, and had really wanted to play
on the wing for Arsenal, but who won gold in the 10,000 and 5,000 meters instead. After
his first gold, an African journalist asked if he wouldn&amp;rsquo;t rather have been running
for Somalia. &amp;ldquo;Look, mate, this is my country,&amp;rdquo; replied Farah. He was
Boyle&amp;rsquo;s multicultural Britain. The second Saturday of the Olympics, when
Farah was among six Britons to win gold, was Britain&amp;rsquo;s best sporting day since 1966.
It was our best single Olympic day since the Games were held in London in 1908. Of course,
we embarked on an orgy of patriotism. On BBC TV, the new &amp;ldquo;British heroes&amp;rdquo; were
feted much like &amp;ldquo;heroes of the harvest&amp;rdquo; on North Korean state TV. Foreigners
rightly accused the Britons of practically ignoring the other 200 nations. However,
that&amp;rsquo;s what every country at the Olympics does. Each country watches its own Games.                


</item>
  <title>How Facial Hair Can Save You From Skin Cancer</title>
  <link>http://www.askmen.com/sports/news/moustaches-and-skin-cancer.html</link>
  <description rdf:parseType="Literal">

I tried:

    sed 's/^.*<description>/<description>/
    s/&lt;/</g
    s/&gt;/>/g
    s/&amp;rsquo;/'"'"'/g
    s/&amp;ccedil;/c/g
    s/<[^>]*>//g
    s/^\(.\{35\}\).*/\1/' inputsample.txt
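Two things keep this attempt from doing what is wanted. Its first substitution looks for the literal text `<description>`, but the sample's tag carries an attribute (`<description rdf:parseType="Literal">`), so that command never fires; and the last command simply cuts every line longer than 35 characters down to 35, whether or not it follows a description. A small check of the first point, assuming a POSIX sed; `try_literal` and `try_attr` are hypothetical helper names:

```shell
#!/usr/bin/env bash
# The opening tag exactly as it appears in the question's sample.
line='  <description rdf:parseType="Literal">'

# Pattern from the attempt: requires ">" immediately after "description".
try_literal() { printf '%s\n' "$1" | sed -n 's/^.*<description>/<description>/p'; }

# Variant that allows attributes between "description" and the closing ">".
try_attr() { printf '%s\n' "$1" | sed -n 's/^.*<description[^>]*>/<description>/p'; }

try_literal "$line"   # prints nothing: the attribute sits before ">"
try_attr "$line"      # prints <description>
```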

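I don't think this is possible with sed, because sed doesn't understand XML entities. For a job like this you need a programming language such as Perl or Python.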
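sed indeed has no built-in notion of entities, but if the feed only ever uses a known handful of them, each one can be decoded with its own substitution. The sketch below assumes GNU sed (for `\n` in a replacement) and GNU cut; the entity list, the rule that a description ends at `</description>` or at the next `<description ...>` tag, and the function name `extract_descriptions` are assumptions for illustration, not a general XML solution.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU sed and GNU cut, and a feed that uses only the
# entities decoded below. extract_descriptions is a hypothetical name.
extract_descriptions() {
  # Join all input lines (plus one final newline) so each tag and its text share a line.
  { tr '\n' ' '; echo; } |
    # Turn every opening <description ...> tag into a line break (GNU sed allows \n here).
    sed -E 's/<description[^>]*>/\n/g' |
    # Drop whatever preceded the first <description> tag.
    tail -n +2 |
    # Cut at </description>, decode entities (&amp; first, because the feed
    # double-escapes: &amp;rsquo; -> &rsquo; -> '), strip tags, tidy spaces.
    sed -E -e 's#</description>.*##' \
      -e 's/&amp;/\&/g' -e 's/&lt;/</g' -e 's/&gt;/>/g' \
      -e "s/&(rsquo|lsquo);/'/g" -e 's/&(ldquo|rdquo);/"/g' -e 's/&nbsp;/ /g' \
      -e 's/<[^>]*>//g' -e 's/ +/ /g' -e 's/^ //' -e 's/ $//' |
    # Keep the first 35 characters of each description (GNU cut -c counts bytes).
    cut -c1-35
}

# Usage with the question's file names:
#   extract_descriptions < input.txt > output.txt
```

Decoding `&amp;` first is what turns the feed's double-escaped `&amp;rsquo;` into `'`; the price is that a genuinely literal `&amp;lt;` would also be decoded twice. Any entity missing from the list passes through unchanged, which is why an XML-aware tool remains the more robust choice.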

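The closest I could get is:

    $ sed -nE '/description/s/.*description(.{,35}).*>/\1/p' file_name

`-E` enables extended regular expressions, so the unescaped parentheses and the interval `{,35}` (at most 35 characters, a GNU extension) work. `-n` turns off sed's automatic printing, so only lines on which the substitution succeeds are printed, by the `p` flag. I capture the (up to) 35 characters after "description" and replace the whole line with them.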


However, once special entities are involved, all bets are off.
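Feeding the answer's command the `<description>` line from the sample shows what it actually captures: everything between "description" and the tag's closing `>`, which is the output reported in the comments. A minimal sketch, assuming GNU sed; `after_description` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Assumes GNU sed: -E selects extended regexes and {,35} (0 to 35 times) is a GNU extension.
# after_description is a hypothetical wrapper around the answer's one-liner.
after_description() {
  sed -nE '/description/s/.*description(.{,35}).*>/\1/p'
}

# The <description> line from the question's sample: the capture can only extend up
# to the last ">", so it holds the tag's attribute, not the article text.
printf '%s\n' '  <description rdf:parseType="Literal">' | after_description
# prints ' rdf:parseType="Literal"' (with a leading space)
```

Because the article text starts on the lines after the tag, a line-by-line command like this never sees it on the same line as "description".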

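Are you willing to use something other than `sed`? Or perhaps not bash at all, but something better suited to this job, such as a programming language with an XML parser? Writing an XPath expression to parse the XML would be better.

No, I want to do this with sed.

Your example doesn't seem to match your sample data, and none of the sample output is limited to 35 characters. To experienced members, a request like "please help me, it's urgent" amounts to "do it for me". Surely that's not what you mean. Consider editing your post to show us the code you have tried and why you think it doesn't work. I agree with the other comments that XPath or xmlstarlet would be better tools for this, so I'll warn you that this isn't easy in sed. awk would be a bit easier; can you use it? Good luck.

Thanks David, I've edited my question... I'll run the script and let you know... thanks!

Your code gives a syntax error; I can't run it. I changed it to `sed '/description/s/description(.{35,35}).*/\1/p' filename`, but the output is the same as the input file.

Found the problem. sed normally requires a backslash before capture-group parentheses. However, in the GNU version (on Linux), with the `-E` (extended regular expressions) option, the parentheses must not be escaped: an escaped `\(` then matches a literal parenthesis. I also changed `{35,35}` to `{,35}`, meaning up to 35 characters rather than exactly 35, and assumed that the trailing part of the line matched by `.*>` is something you don't want. It should work now.

The result is quite messy:

    rdf:parseType="Literal"
    rdf:parseType="Literal"
    rdf:parseType="Literal"
    etc.

Well, that's exactly what you asked for: the 35 characters after the word "description". The matching line in your sample is `<description rdf:parseType="Literal">`, so I select `rdf:parseType="Literal"` from that text. If that's not what you want, please explain what your expected output should be.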
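Both points from the "Found the problem" comment can be checked on short strings: how capture groups are written in basic versus extended regexes, and how an exact `{35}` interval differs from `{,35}`. A minimal sketch, assuming GNU sed (`{,n}` meaning "0 to n times" is a GNU extension); the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumes GNU sed. All helpers read one line from their first argument.
s='description short'

# BRE (default): a capture group is written \( \).
bre() { printf '%s\n' "$1" | sed -n 's/.*description\(.*\)/\1/p'; }
# ERE (-E): a capture group is plain ( ); an escaped \( would match a literal "(".
ere() { printf '%s\n' "$1" | sed -nE 's/.*description(.*)/\1/p'; }

# Exactly 35 characters: no match when fewer follow, so -n leaves nothing printed.
exact35() { printf '%s\n' "$1" | sed -nE 's/.*description(.{35}).*/\1/p'; }
# At most 35 characters: captures whatever follows, up to 35.
upto35() { printf '%s\n' "$1" | sed -nE 's/.*description(.{,35}).*/\1/p'; }

bre "$s"      # prints ' short'
ere "$s"      # prints ' short'
exact35 "$s"  # prints nothing
upto35 "$s"   # prints ' short'
```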