Best way to remove the first word from a string in Java
What is the fastest way to strip the first token from a string? So far I have tried:
String parentStringValue = this.stringValue.split(" ", 2)[1];
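It is very inefficient in both memory and speed (the call is repeated millions of times on strings about 15 words long). Assume the string consists of space-delimited tokens.
You can use a combination of String.substring and String.indexOf, roughly like this: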
// TODO check indexOf does not return -1
this.stringValue.substring(this.stringValue.indexOf(" ") + 1)
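The TODO above can be made explicit. Without a check, indexOf returns -1 when there is no space, so substring(0) is called and the whole string comes back unchanged. Below is a minimal sketch, assuming that a string with no space holds a single token and should yield an empty result; the names FirstToken and removeFirstToken are hypothetical and do not come from the thread.

```java
final class FirstToken {
    /** Returns everything after the first space, or "" when there is no space. */
    static String removeFirstToken(String s) {
        int space = s.indexOf(' ');
        if (space < 0) {
            return ""; // assumption: a lone token leaves nothing behind
        }
        return s.substring(space + 1);
    }

    public static void main(String[] args) {
        System.out.println(removeFirstToken("I want to remove I")); // want to remove I
        System.out.println(removeFirstToken("single").isEmpty());   // true
    }
}
```

If returning the input unchanged is the behavior you want for single-token strings, the unchecked one-liner already does that.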
There is no need to split and create an array; just use substring:
String str="I want to remove I";
String parentStringValue = str.substring(str.indexOf(" ")+1);
System.out.println(parentStringValue);
Output:
want to remove I
Try this:
String s = "This is a test";
System.out.println(s.replaceFirst("\\w+\\s", ""));
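The regex version is not a drop-in equivalent of the indexOf approach, because \w+\s only matches word characters immediately followed by whitespace. The sketch below illustrates this with a precompiled Pattern; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class RegexFirstWord {
    // Compiled once and reused; String.replaceFirst compiles its pattern on every call.
    private static final Pattern FIRST_WORD = Pattern.compile("\\w+\\s");

    static String removeFirstWord(String s) {
        return FIRST_WORD.matcher(s).replaceFirst("");
    }

    public static void main(String[] args) {
        System.out.println(removeFirstWord("This is a test")); // is a test
        // "Hello" is followed by a comma rather than whitespace, and "world" by
        // the end of input, so nothing matches and the input is returned as is.
        System.out.println(removeFirstWord("Hello, world"));   // Hello, world
    }
}
```

For input that is strictly space-delimited, as the question assumes, both approaches give the same result on multi-word strings made of word characters.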
If you are not averse to using Apache Commons Lang, you can use the StringUtils class.
That means you do not have to deal with String.indexOf returning -1:
String parentStringValue = StringUtils.substringAfter(yourString, " ");
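To make the difference from the plain substring one-liner concrete, here is a plain-Java stand-in that mirrors the behavior of StringUtils.substringAfter as I understand it from the Commons Lang documentation: an empty string is returned when the separator is not found. This is an illustrative sketch, not the library's code, and the class name is hypothetical.

```java
final class SubstringAfterSketch {
    /** Text after the first occurrence of separator, or "" if it does not occur. */
    static String substringAfter(String str, String separator) {
        if (str == null || str.isEmpty()) {
            return str; // null and "" are passed through
        }
        if (separator == null) {
            return "";
        }
        int pos = str.indexOf(separator);
        return pos < 0 ? "" : str.substring(pos + separator.length());
    }

    public static void main(String[] args) {
        System.out.println(substringAfter("I want to remove I", " ")); // want to remove I
        System.out.println(substringAfter("single", " ").isEmpty());   // true
    }
}
```

By contrast, "single".substring("single".indexOf(" ") + 1) returns "single", so the two approaches disagree on single-token input.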
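When doing string manipulation, try using StringBuffer or StringBuilder so that you do not leave behind a large number of new, unused objects and waste memory, since, as you mentioned, the operation is repeated millions of times.
StringBuilder vs substring(x) vs split(x) vs Regex
Answer edited: major flaws corrected
After correcting some rather serious flaws in my benchmark (as Jay Askren pointed out in the comments), the StringBuilder method came out fastest by a significant margin (although this assumes the StringBuilder objects are created in advance), with substring in second place. split() came second to last, 10 times slower than the StringBuilder method.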
// Requires: import java.util.ArrayList; import java.util.regex.Pattern;
ArrayList<String> strings = new ArrayList<String>();
ArrayList<StringBuilder> stringBuilders = new ArrayList<StringBuilder>();
for(int i = 0; i < 1000; i++) strings.add("Remove the word remove from String "+i);
for(int i = 0; i < 1000; i++) stringBuilders.add(new StringBuilder(i+" Remove the word remove from String "+i));
Pattern pattern = Pattern.compile("\\w+\\s");
long before, after; // timing variables shared by all four measurements

// StringBuilder method
before = System.currentTimeMillis();
for(int i = 0; i < 5000; i++){
    for(StringBuilder s : stringBuilders){
        s.delete(0, s.indexOf(" ") + 1);
    }
}
after = System.currentTimeMillis() - before;
System.out.println("StringBuilder Method Took "+after);

// Substring method
before = System.currentTimeMillis();
for(int i = 0; i < 5000; i++){
    for(String s : strings){
        String newvalue = s.substring(s.indexOf(" ") + 1);
    }
}
after = System.currentTimeMillis() - before;
System.out.println("Substring Method Took "+after);

// Split method
before = System.currentTimeMillis();
for(int i = 0; i < 5000; i++){
    for(String s : strings){
        String newvalue = s.split(" ", 2)[1];
        System.out.print("");
    }
}
after = System.currentTimeMillis() - before;
System.out.println("Your Method Took "+after);

// Regex method
before = System.currentTimeMillis();
for(int i = 0; i < 5000; i++){
    for(String s : strings){
        String newvalue = pattern.matcher(s).replaceFirst("");
    }
}
after = System.currentTimeMillis() - before;
System.out.println("Regex Method Took "+after);
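One thing to keep in mind when reading the StringBuilder loop above: delete mutates the builder in place, so calling it repeatedly on the same object keeps stripping tokens until none are left. A small sketch of that behavior follows; the class and method names are hypothetical.

```java
final class BuilderDeleteDemo {
    /** Deletes the first space-delimited token of sb in place and returns sb. */
    static StringBuilder dropFirstToken(StringBuilder sb) {
        // indexOf returns -1 when there is no space, so delete(0, 0) is a no-op.
        sb.delete(0, sb.indexOf(" ") + 1);
        return sb;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("7 Remove the word");
        System.out.println(dropFirstToken(sb)); // Remove the word
        System.out.println(dropFirstToken(sb)); // the word
        System.out.println(dropFirstToken(sb)); // word
        System.out.println(dropFirstToken(sb)); // word (no space left, unchanged)
    }
}
```

This is convenient when the builder is the working copy you want to keep modifying, but it means a loop that reuses the same builders is not doing the same amount of work on every pass.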
It is worth mentioning that calling split() with a string of length greater than 1...
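Rudi's benchmark has a number of problems, including unfairly and incorrectly favoring the split method, so I took his benchmark and improved it. If you happen to already have a collection of StringBuilders, the StringBuilder method is slightly faster, but if you first need to convert them from Strings it is very slow. The substring method is the next fastest and is the one to use if you have Strings rather than StringBuilders. Commons Lang is the next fastest after that, and both the substring method and the Commons Lang method are 4 to 5 times faster than using split. String.replaceFirst() uses a regular expression and is very slow because it has to compile the regex on every call, which doubles the run time. Even without the compile step, it is much slower than the other methods.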
Below is the code for the benchmark. You will need to add Apache Commons Lang to your classpath to run it.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import org.apache.commons.lang3.StringUtils;

/**
 * Benchmarks several ways of removing the first word from a string.
 */
public class StringTest {
    public static void main(String[] args) {
        int numIterations = 100000;
        int numRuns = 10;
        ArrayList<String> strings = new ArrayList<String>();
        for(int i = 0; i < 1000; i++) strings.add("Remove the word remove from String "+i);

        // Your method
        long before = 0;
        long after = 0;
        for(int j=0; j < numRuns; j++) {
            before = System.currentTimeMillis();
            for(int i = 0; i < numIterations; i++){
                for(String s : strings){
                    String newvalue = s.split(" ", 2)[1];
                    // System.out.println("split " + newvalue);
                }
            }
            after = System.currentTimeMillis() - before;
            System.out.println("Split Took "+after + " ms");
        }

        // Substring method
        for(int j=0; j < numRuns; j++) {
            before = System.currentTimeMillis();
            for(int i = 0; i < numIterations; i++){
                for(String s : strings){
                    String newvalue = s.substring(s.indexOf(" ") + 1);
                }
            }
            after = System.currentTimeMillis() - before;
            System.out.println("Substring Took "+after + " ms");
        }

        // Apache Commons Lang method
        for(int j=0; j < numRuns; j++) {
            before = System.currentTimeMillis();
            for(int i = 0; i < numIterations; i++){
                for (String s : strings) {
                    String parentStringValue = StringUtils.substringAfter(s, " ");
                }
            }
            after = System.currentTimeMillis() - before;
            System.out.println("CommonsLang Took "+after + " ms");
        }

        // StringBuilder method: times the delete calls separately from
        // the cost of converting each String to a StringBuilder
        for(int j=0; j < numRuns; j++) {
            long deleteTime = 0L;
            before = System.currentTimeMillis();
            for(int i = 0; i < numIterations; i++){
                List<StringBuilder> stringBuilders = new ArrayList<StringBuilder>();
                for (String s : strings) {
                    stringBuilders.add(new StringBuilder(s));
                }
                long beforeDelete = System.currentTimeMillis();
                for (StringBuilder s : stringBuilders) {
                    s.delete(0, s.indexOf(" ") + 1);
                }
                deleteTime+=(System.currentTimeMillis() - beforeDelete);
            }
            after = System.currentTimeMillis() - before;
            System.out.println("StringBuilder Delete " + deleteTime + " ms out of " + after + " total ms");
        }

        // Faster Regex method (pattern compiled once)
        Pattern pattern = Pattern.compile("\\w+\\s");
        for(int j=0; j < numRuns; j++) {
            before = System.currentTimeMillis();
            for(int i = 0; i < numIterations; i++){
                for (String s : strings) {
                    String newvalue = pattern.matcher(s).replaceFirst("");
                }
            }
            after = System.currentTimeMillis() - before;
            System.out.println("Faster Regex Took "+after + " ms");
        }

        // Slow Regex method (pattern compiled on every call)
        for(int j=0; j < numRuns; j++) {
            before = System.currentTimeMillis();
            for(int i = 0; i < numIterations; i++){
                for (String s : strings) {
                    String newvalue = s.replaceFirst("\\w+\\s", "");
                }
            }
            after = System.currentTimeMillis() - before;
            System.out.println("Slow Regex Took " + after + " ms");
        }
    }
}
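Hand-rolled timing loops like the one above can be skewed by JIT warm-up and by the compiler discarding results that are never used. The following is a rough sketch of two precautions, not a rigorous benchmark: it times with System.nanoTime and folds every result into a running total. The names TimingSketch, time, and sink are hypothetical, and the round counts are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

final class TimingSketch {
    /** Applies op to every input, rounds times over; returns elapsed nanoseconds. */
    static long time(List<String> inputs, int rounds, UnaryOperator<String> op, long[] sink) {
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (String s : inputs) {
                // Use the result so the work is harder for the JIT to discard.
                sink[0] += op.apply(s).length();
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        for (int i = 0; i < 1000; i++) strings.add("Remove the word remove from String " + i);

        UnaryOperator<String> viaSubstring = s -> s.substring(s.indexOf(' ') + 1);
        UnaryOperator<String> viaSplit = s -> s.split(" ", 2)[1];
        long[] sink = new long[1];

        // Untimed warm-up passes, then the measured passes.
        time(strings, 200, viaSubstring, sink);
        time(strings, 200, viaSplit, sink);
        System.out.println("substring: " + time(strings, 1000, viaSubstring, sink) + " ns");
        System.out.println("split:     " + time(strings, 1000, viaSplit, sink) + " ns");
        System.out.println("checksum:  " + sink[0]);
    }
}
```

The absolute numbers will vary by machine and JVM, so treat them as a relative comparison only.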
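Did you try substring? The question is not clear.
It is still slow. This seems worse than what I am currently doing.
@user3639557 Why do you think it is slow?
I can see how fast I am iterating over the lines in the file. This operation is performed on every line, and I produce output once every 10,000 lines.
I tried it in a loop without reading a file and got 10,000 records in 405 ms. I do not think that is slow.
@user3639557 - If you are watching it run, then it is your System.out.println() statements that are slowing your process down. Printing to the screen is very slow compared to most operations. Remove it and only output the start and end times.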