Java: split a string at the first uppercase letter


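So, I am scraping a lyrics website and I want to format the output the way it appears on the site. Right now, when I get the output, the whole string is on a single line, as shown below. I am using Jsoup to get the information from the HTML. What I would like to do is break the text into lines before capital letters, like the lyrics on the website.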

I was told a million times Of all the troubles in my way How I had to keep on trying Little better ev'ry day But if I crossed a million rivers And I rode a million miles Then I'd still be where I started Bread and butter for a smile Well I sold a million mirrors In a shop in Alley Way But I never saw my face In any window any day Well they say your folks are telling you To be a super star But I tell you just be satisfied To stay right where you are Keep yourself alive keep yourself alive It'll take you all your time and a money Honey you'll survive Well I've loved a million women In a belladonic haze And I ate a million dinners Brought to me on silver trays Give me ev'rything I need To feed my body and my soul And I'll grow a little bigger Maybe that can be my goal I was told a million times Of all the people in my way How I had to keep on trying And get better ev'ry day But if I crossed a million rivers And I rode a million miles Then I'd still be where I started Still be where I started Keep yourself alive keep yourself alive It'll take you all your time and money honey You'll survive Keep yourself alive Keep yourself alive It'll take you all your time and money To keep me satisfied Do you think you're better ev'ry day No I just think I'm two steps nearer to my grave Keep yourself alive Keep yourself alive mm You take your time and take your money Keep yourself alive Keep yourself alive Keep yourself alive All you people keep yourself alive Keep yourself alive Keep yourself alive It'll take you all your time and a money To keep me satisfied Keep yourself alive Keep yourself alive All you people keep yourself alive Take you all your time and money honey You will survive Keep you satisfied Keep you satisfied
I would like it to be formatted like this:

My code so far is:

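import java.io.IOException;
import java.util.Scanner;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public static void lyricScrape() throws IOException {

    Scanner search = new Scanner(System.in);
    String artist;
    String song;
    Document doc;

    // Read the artist name and normalize it for the URL (lowercase, no spaces)
    artist = search.nextLine();
    artist = artist.toLowerCase();
    artist = artist.replaceAll(" ", "");
    System.out.println("Artist saved");

    // Read the song title and normalize it the same way
    song = search.nextLine();
    song = song.toLowerCase();
    song = song.replaceAll(" ", "");
    System.out.println("Song saved");

    // Fetch the page and pull the text out of the lyrics <div>
    doc = Jsoup.connect("http://www.azlyrics.com/lyrics/" + artist + "/" + song + ".html").get();
    Elements element = doc.select("div[style^=margin]");
    String lyrics = element.text();
    System.out.println(lyrics);
}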

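String.split accepts a regular expression. The regex for an uppercase letter is "[A-Z]", but you only want to split on the space in front of it, so look for "\\ [A-Z]" (note the leading space). Finally, put the letter in a lookahead so that it is not consumed by the match:

String[] lines = lyrics.split("\\ (?=[A-Z])");
String formatted = lyrics.replaceAll("\\ (?=[A-Z])", "\n");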
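
As a rough illustration of the lookahead pattern above, here is a minimal sketch that applies it to a short excerpt of the lyrics. The class name `SplitBeforeCapital` and the helper `splitLines` are made up for this example and are not part of the original answer.

```java
import java.util.Arrays;
import java.util.List;

public class SplitBeforeCapital {

    // Hypothetical helper: splits on a space only when the next character
    // is an uppercase ASCII letter. The lookahead (?=[A-Z]) checks the
    // letter without consuming it, so the letter stays in the next piece.
    static List<String> splitLines(String lyrics) {
        return Arrays.asList(lyrics.split("\\ (?=[A-Z])"));
    }

    public static void main(String[] args) {
        String sample = "Of all the troubles in my way "
                + "Little better ev'ry day "
                + "Bread and butter for a smile";
        for (String line : splitLines(sample)) {
            System.out.println(line);
        }
    }
}
```

Note that this pattern also splits in front of a capital letter in the middle of a line, such as the pronoun "I".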
To account for the one-letter word I, you can use

String[] lines = lyrics.split("\\ (?!I\\s)(?=[A-Z])");
String formatted = lyrics.replaceAll("\\ (?!I\\s)(?=[A-Z])", "\n");
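
The effect of the extra negative lookahead can be seen in a small sketch like the one below. The class name `SplitKeepingPronoun` and the helper `splitLines` are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class SplitKeepingPronoun {

    // Hypothetical helper: same split as before, except that (?!I\s)
    // refuses to split when the next word is a standalone "I"
    // (an "I" followed directly by whitespace).
    static List<String> splitLines(String lyrics) {
        return Arrays.asList(lyrics.split("\\ (?!I\\s)(?=[A-Z])"));
    }

    public static void main(String[] args) {
        // " I never" is left alone, " In any" still starts a new line.
        for (String line : splitLines("But I never saw my face In any window any day")) {
            System.out.println(line);
        }
    }
}
```

Two limitations follow from how the pattern is written: contractions such as "I'd" or "I've" still trigger a split, because the "I" there is not followed by whitespace, and a line that really does begin with "I " will not be separated from the line before it.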
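Based on a related answer: how about adding some special text after each <br> in the HTML? That way, when you call text(), instead of "line line" you will get something like "line[specialString]line", and you can then replace [specialString] with \n. I mean something like: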

element.select("br").append("@REPLACEME@");
String lyrics = element.text().replaceAll("\\s*@REPLACEME@\\s*", "\n");
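
To show what the replaceAll step does on its own, here is a small sketch that does not call Jsoup at all. The input string is only an assumption about what text() might return once the marker has been appended (whitespace around the marker may differ in practice), and the class name `MarkerToNewline` and the helper `restoreBreaks` are invented for this example.

```java
public class MarkerToNewline {

    // Same marker as in the snippet above.
    static final String MARKER = "@REPLACEME@";

    // Hypothetical helper: replaces each marker, together with any
    // whitespace on either side of it, by a single newline.
    static String restoreBreaks(String text) {
        return text.replaceAll("\\s*" + MARKER + "\\s*", "\n");
    }

    public static void main(String[] args) {
        // Assumed shape of the text() output after the markers were added.
        String flattened = "I was told a million times @REPLACEME@ "
                + "Of all the troubles in my way @REPLACEME@ "
                + "How I had to keep on trying";
        System.out.println(restoreBreaks(flattened));
    }
}
```

The `\\s*` on both sides of the marker is what keeps stray spaces from ending up at the start or end of each line.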

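You can also use the Jsoup.clean method on the HTML of the lyrics to remove all unwanted tags except the ones you define, in this case <br>, and then replace the br tag with \n or with an empty string, depending on whether the HTML already has a line break after <br>. So your code can look like: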

String lyrics = Jsoup.clean(
                    element.html(), //html to clean
                    Whitelist.none().addTags("br")//allowed tags
                ).replace("<br /> ", "");
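
Because the exact way the cleaned HTML writes the tag (for example `<br>` versus `<br />`, with or without a following line break) may vary, a somewhat more tolerant final step is sketched below. This is a variation on the replace call above and not the answer's own code. It uses only the standard library and operates on a string that is assumed to look like the output of Jsoup.clean, and the class name `BrToNewline` and the helper `brToNewline` are hypothetical.

```java
public class BrToNewline {

    // Hypothetical helper: turns every <br>, <br/> or <br /> tag, plus any
    // whitespace that follows it, into exactly one newline.
    static String brToNewline(String cleanedHtml) {
        return cleanedHtml.replaceAll("(?i)<br\\s*/?>\\s*", "\n").trim();
    }

    public static void main(String[] args) {
        // Assumed shape of cleaned HTML that contains only <br> tags.
        String cleaned = "Keep yourself alive<br /> keep yourself alive<br>\nHoney you'll survive";
        System.out.println(brToNewline(cleaned));
    }
}
```

Collapsing the tag and its trailing whitespace into one newline avoids having to know in advance whether the cleaned HTML already contains a line break after each tag.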
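What about the pronoun "I"? It is always capitalized.

@manouti Thanks for the comment, I have added a second solution. Thanks for the tip.

I don't think this is what I am looking for. I want to know where to start a new line.

What is the reason for escaping the space?

@BlackOranges - what is wrong with this solution?

@TedHopp Some regex flavors... I was just being extra careful. Not sure how Java handles unescaped spaces.

@TedHopp I am trying to avoid arrays.

@BlackOranges Then what are you looking for?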