Algorithm: Find the first non-repeated character in a string

Tags: algorithm, language-agnostic, string

What is the quickest way to find the first character that appears only once in a string?

I see that people have posted some delightful answers below, so I'd like to offer something more in-depth.

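Idiomatic solution in Ruby

We can find the first non-repeated character in a string like so:

def first_unrepeated_char(string)
  # &. returns nil instead of raising when every character repeats
  string.each_char.tally.find { |_, n| n == 1 }&.first
end

How does Ruby do this?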

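Reading Ruby's source code

Let's break the solution down and consider the algorithm Ruby uses at each step. First, we call each_char on the string. This creates an enumerator that lets us visit the string one character at a time. This is complicated by the fact that Ruby handles Unicode characters, so each value we get from the enumerator can span a variable number of bytes. If we know our input is ASCII or similar, we could use each_byte instead.

The each_char method is implemented like so: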

static VALUE
rb_str_each_char(VALUE str)
{
    RETURN_SIZED_ENUMERATOR(str, 0, 0, rb_str_each_char_size);
    return rb_str_enumerate_chars(str, 0);
}
In turn, rb_str_enumerate_chars is implemented as:

rb_str_enumerate_chars(VALUE str, VALUE ary)
{
    VALUE orig = str;
    long i, len, n;
    const char *ptr;
    rb_encoding *enc;

    str = rb_str_new_frozen(str);
    ptr = RSTRING_PTR(str);
    len = RSTRING_LEN(str);
    enc = rb_enc_get(str);

    if (ENC_CODERANGE_CLEAN_P(ENC_CODERANGE(str))) {
        for (i = 0; i ...
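From this we can see that it calls rb_enc_mbclen (or its fast version) to get the length, in bytes, of the next character in the string, so that it can advance to the next step. Because the string is iterated lazily, reading just one character at a time, we make only one full pass over the input string as tally consumes the iterator.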
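To make the variable-width stepping concrete, here is a minimal Python sketch of the same idea: look at the lead byte, work out how many bytes the character occupies, yield it, and advance. This is not Ruby's implementation. It assumes valid UTF-8 input only (Ruby supports many encodings), and the names utf8_char_length and each_char are made up for this illustration.

```python
def utf8_char_length(lead_byte: int) -> int:
    """Return how many bytes the UTF-8 character starting with lead_byte occupies."""
    if lead_byte < 0x80:            # 0xxxxxxx: single-byte (ASCII)
        return 1
    if lead_byte >> 5 == 0b110:     # 110xxxxx: two-byte sequence
        return 2
    if lead_byte >> 4 == 0b1110:    # 1110xxxx: three-byte sequence
        return 3
    if lead_byte >> 3 == 0b11110:   # 11110xxx: four-byte sequence
        return 4
    raise ValueError("not a valid UTF-8 lead byte")


def each_char(data: bytes):
    """Lazily yield one character at a time from UTF-8 encoded bytes."""
    i = 0
    while i < len(data):
        n = utf8_char_length(data[i])        # length of the next character in bytes
        yield data[i:i + n].decode("utf-8")  # hand back exactly one character
        i += n                               # step over it


print(list(each_char("aé€".encode("utf-8"))))
```

Because each_char is a generator, nothing is decoded until a consumer asks for the next character, which is the property the analysis relies on.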

tally is then implemented like so:

static void
tally_up(VALUE hash, VALUE group)
{
    VALUE tally = rb_hash_aref(hash, group);
    if (NIL_P(tally)) {
        tally = INT2FIX(1);
    }
    else if (FIXNUM_P(tally) && tally ...
Here, tally_i uses RB_BLOCK_CALL_FUNC_ARGLIST to call tally_up repeatedly, which updates the tally hash on every iteration.

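Rough time & memory analysis

The each_char method doesn't allocate an array to eagerly hold the characters of the string, so it has a small, constant memory overhead. When we tally the characters, we allocate a hash and put our tally data into it, which in the worst case can take up as much memory as the input string times some constant factor.

Time-wise, tally does a full scan of the string, and calling find to locate the first non-repeated character scans the hash again. Each of those scans has a worst-case complexity of O(n).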

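However, tally also updates a hash on every iteration. Updating the hash for every character can itself be as slow as O(n), so the worst-case complexity of this Ruby solution is perhaps O(n^2).

However, under reasonable assumptions, updating a hash takes O(1) time, so we can expect the amortized average case to look like O(n).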
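For readers who don't use Ruby, the same two-step shape (tally in one pass, then scan the tally) can be sketched in Python. This is an illustrative analogue, not a translation of Ruby's internals. It assumes Python 3.7 or later, where dict preserves insertion order, so scanning the dict visits characters in order of first appearance.

```python
def first_unrepeated_char(string):
    """Return the first character that occurs exactly once, or None."""
    tally = {}
    for ch in string:                      # one full pass over the input
        tally[ch] = tally.get(ch, 0) + 1   # expected O(1) hash update
    # Second pass is over the tally, not the string; dict order is
    # insertion order, i.e. order of first appearance.
    return next((ch for ch, n in tally.items() if n == 1), None)


print(first_unrepeated_char("aabccd"))
```

The second scan touches at most one entry per distinct character, so with expected O(1) hash updates the whole function is O(n) on average.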


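My previously accepted answer, in Python

You can't know that a character is non-repeated until you've processed the whole string, so my suggestion would be this: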

def first_non_repeated_character(string):
  chars = []
  repeated = []
  for character in string:
    if character in chars:
      chars.remove(character)
      repeated.append(character)
    else:
      if not character in repeated:
        chars.append(character)
  if len(chars):
    return chars[0]
  else:
    return False
Edit: the originally posted code was bad, but this latest snippet is Certified To Work On Ryan's Computer™.

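It has to be at least O(n), because you don't know whether a character will be repeated until you've read all the characters.

So you can iterate over the characters, append each one to a list the first time you see it, and separately keep a count of how many times you've seen it (in fact, the only values that matter for the count are "0", "1" or "more than 1").

When you reach the end of the string, you just have to find the first character in the list that has a count of exactly one.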


Example code in Python:

from collections import defaultdict

def first_non_repeated_character(s):
    counts = defaultdict(int)
    l = []
    for c in s:
        counts[c] += 1
        if counts[c] == 1:
            l.append(c)

    for c in l:
        if counts[c] == 1:
            return c

    return None

This runs in O(n).
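The remark that the count only needs the states "0", "1" or "more than 1" can be taken literally. The following is a sketch under that assumption, with made-up names: a character is either unseen, in the insertion-ordered dict once, or in the set many. It relies on Python 3.7+ dict ordering.

```python
def first_non_repeated_character_three_state(s):
    """Track only three states per character: unseen, seen once, seen more."""
    once = {}     # insertion-ordered: characters seen exactly once so far
    many = set()  # characters seen more than once
    for c in s:
        if c in many:
            continue          # already known to repeat
        if c in once:
            del once[c]       # second sighting: move from "1" to "more than 1"
            many.add(c)
        else:
            once[c] = None    # first sighting: move from "0" to "1"
    # The earliest surviving key is the first non-repeated character.
    return next(iter(once), None)


print(first_non_repeated_character_three_state("aabccd"))
```

No second pass over a list of counts is needed here, because whatever remains in once at the end has a count of exactly one.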

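Why not use a heap-based data structure such as a minimum priority queue? As you read each character from the string, add it to the queue with a priority based on its position in the string and the number of occurrences so far. You could modify the queue to add priorities on collision, so that a character's priority is the total number of times that character has appeared. At the end of the loop, the first element in the queue will be the least frequent character in the string, and if there are multiple characters with a count of 1, the first element will be the first unique character that was added to the queue.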
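A rough Python sketch of the priority-queue idea follows. It simplifies the scheme described above: Python's heapq has no operation for changing an entry's priority in place, so this version tallies counts first and only then builds the heap, ordering entries by (count, first position). Treat it as an illustration of the ordering rather than of the incremental updates.

```python
import heapq


def first_non_repeated_character_heap(s):
    """Pick the least frequent, earliest character via a min-heap."""
    counts = {}
    first_index = {}
    for i, c in enumerate(s):
        counts[c] = counts.get(c, 0) + 1
        first_index.setdefault(c, i)   # remember where c first appeared

    # Priority is (count, first position): lowest count wins, ties go
    # to the character that appeared earliest.
    heap = [(n, first_index[c], c) for c, n in counts.items()]
    heapq.heapify(heap)

    if heap and heap[0][0] == 1:
        return heap[0][2]
    return None   # empty input, or every character repeats


print(first_non_repeated_character_heap("aaaebbbcddd"))
```

Note that only the minimum is ever read, so a plain min() over the same tuples would give the same answer without building a heap.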

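In C, this is almost a brute-force approach: nowhere near O(n!), but O(n^2) in the worst case.

But it will outperform "better" algorithms for reasonably sized strings, because the constant factor hidden in the O is so small. It can also easily tell you the position of the first non-repeated character.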

char FirstNonRepeatedChar(char * psz)
{
   for (int ii = 0; psz[ii] != 0; ++ii)
   {
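      // compare against every other position, earlier ones included, so a
      // later copy of an already-seen character is not reported as unique
      for (int jj = 0; ; ++jj)
      {
         // if we hit the end of string, then we found a non-repeat character.
         //
         if (psz[jj] == 0)
            return psz[ii]; // this character doesn't repeat

         // if we found a repeat character, we can stop looking.
         //
         if (jj != ii && psz[ii] == psz[jj])
            break; 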
      }
   }

   return 0; // there were no non-repeating characters.
}

Edit: this code assumes you don't mean consecutive repeated characters.
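The point about reporting the position can be illustrated with the same quadratic idea in Python. This is a sketch, not a port of the C function: it leans on str.count, which rescans the whole string for each candidate, and the function name and the -1 "not found" value are choices made for this example.

```python
def first_non_repeated_index(s):
    """Return the index of the first character occurring exactly once, else -1."""
    for i, c in enumerate(s):
        # s.count(c) scans the entire string, so the loop is O(n^2) overall,
        # but each scan is a tight, low-overhead operation.
        if s.count(c) == 1:
            return i
    return -1


print(first_non_repeated_index("aabccd"))
```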

Counter requires Python 2.7 or Python 3.1.

>>> from collections import Counter
>>> def first_non_repeated_character(s):
...     counts = Counter(s)
...     for c in s:
...         if counts[c]==1:
...             return c
...     return None
... 
>>> first_non_repeated_character("aaabbbcddd")
'c'
>>> first_non_repeated_character("aaaebbbcddd")
'e'

Here is another fun way to do it. Counter requires Python 2.7 or Python 3.1.

>>> from collections import Counter
>>> def first_non_repeated_character(s):
...     return min((k for k,v in Counter(s).items() if v<2), key=s.index)
...
>>> first_non_repeated_character("aaabbbcddd")
'c'
>>> first_non_repeated_character("aaaebbbcddd")
'e'
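One caveat with the min-based one-liner: when every character repeats, the generator is empty and min raises ValueError. A hedged variant, assuming Python 3.4 or later where min accepts a default argument, returns None instead:

```python
from collections import Counter


def first_non_repeated_character(s):
    """Like the one-liner above, but returns None when nothing is unique."""
    once = (k for k, v in Counter(s).items() if v < 2)
    # key=s.index orders candidates by first position in s;
    # default=None covers the case where the generator is empty.
    return min(once, key=s.index, default=None)


print(first_non_repeated_character("aaabbbcddd"))
```

Each s.index call is itself a linear scan, so this form trades some speed for brevity when many characters are unique.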
use strict;
use warnings;

foreach my $word(@ARGV)
{
  my @distinct_chars;
  my %char_counts;

  my @chars=split(//,$word);

  foreach (@chars)
  {
    push @distinct_chars,$_ unless $_~~@distinct_chars;
    $char_counts{$_}++;
  }

  my $first_non_repeated="";

  foreach(@distinct_chars)
  {
    if($char_counts{$_}==1)
    {
      $first_non_repeated=$_;
      last;
    }
  }

  if(length($first_non_repeated))
  {
    print "For \"$word\", the first non-repeated character is '$first_non_repeated'.\n";
  }
  else
  {
    print "All characters in \"$word\" are repeated.\n";
  }
}
jmaney> perl non_repeated.pl aabccd "a huge string in which some characters repeat" abcabc
For "aabccd", the first non-repeated character is 'b'.
For "a huge string in which some characters repeat", the first non-repeated character is 'u'.
All characters in "abcabc" are repeated.
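#include <string.h> /* memset */

unsigned char find_first_unique(unsigned char *string)
{
    int chars[256];
    int i;
    memset(chars, 0, sizeof(chars));

    /* count every byte; test string[i] before using it so the first
       character is not skipped and the terminator is not counted */
    for (i = 0; string[i]; i++)
    {
        chars[string[i]]++;
    }

    for (i = 0; string[i]; i++)
    {
        if (chars[string[i]] == 1) return string[i];
    }
    return 0;
}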
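    public static String findFirstUnique(String str)
    {
        String seen = "";    // every distinct character encountered so far
        String unique = "";  // characters encountered exactly once so far

        foreach (char ch in str)
        {
            // tracking "seen" separately stops a third occurrence from being re-added
            if (seen.Contains(ch.ToString())) unique = unique.Replace(ch.ToString(), "");
            else { seen += ch.ToString(); unique += ch.ToString(); }
        }
        // empty string when every character repeats
        return unique.Length > 0 ? unique[0].ToString() : "";
    }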
from collections import defaultdict

def first_non_repeated_character(s):
    counts = defaultdict(int)
    for c in s:
        counts[c] += 1
    for c in s:
        if counts[c] == 1:
            return c
    return None
string = "conservationist deliberately treasures analytical";

Cases[Gather @ Characters @ string, {_}, 1, 1][[1]]
{"v"}
var string = "tooth";
var hash = {}; // plain object used as a character -> count map
for (var i = 0, j = string.length; i < j; i++) {
    if (hash[string[i]] !== undefined) {
        hash[string[i]] = hash[string[i]] + 1;
    } else {
        hash[string[i]] = 1;
    }
}

for (i = 0; i < j; i++) {
    if (hash[string[i]] === 1) {
        console.info(string[i]);
        break; // stop at the first match ("return" is only legal inside a function)
    }
}
// prints "h"
C code 
-----
#include <stdio.h>
#include <stdlib.h>

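int main(int argc, char *argv[])
{
    unsigned char t_c;      /* unsigned so it is always a valid array index */
    char *t_p;
    int count[256] = {0};   /* one counter per possible byte value */

    if (argc < 2)           /* no string given on the command line */
        return 1;

    t_p = argv[1];
    for(t_c = *t_p; t_c != '\0'; t_c = *(++t_p))
        count[t_c]++;
    t_p = argv[1];
    for(t_c = *t_p; t_c != '\0'; t_c = *(++t_p))
    {
        if(count[t_c] == 1)
        {
            printf("Element is %c\n",t_c);
            break;
        }
    }

return 0;    
} 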
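// Only detects runs of adjacent repeats (e.g. "aabcc"); a character that
// reappears later in the string is not caught.
// Needs #include <iostream> and using namespace std; for cout.
char FindUniqueChar(char *a)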
{
    int i=0;
    bool repeat=false;
    while(a[i] != '\0')
    {
      if (a[i] == a[i+1])
      {
        repeat = true;
      }
      else
      {
            if(!repeat)
            {
            cout<<a[i];
            return a[i];
            }
        repeat=false;
      }
      i++;
    }
    return a[i];
}
using System;
using System.Linq;
using System.Text;

namespace SomethingDigital
{
    class FirstNonRepeatingChar
    {
        public static void Main()
        {
            String input = "geeksforgeeksandgeeksquizfor";
            char[] str = input.ToCharArray();

            bool[] b = new bool[256];
            String unique1 = "";
            String unique2 = "";

            foreach (char ch in str)
            {
                if (!unique1.Contains(ch))
                {
                    unique1 = unique1 + ch;
                    unique2 = unique2 + ch;
                }
                else
                {
                    unique2 = unique2.Replace(ch.ToString(), "");
                }
            }
            if (unique2 != "")
            {
                Console.WriteLine(unique2[0].ToString());
                Console.ReadLine();
            }
            else
            {
                Console.WriteLine("No non repeated string");
                Console.ReadLine();
            }
        }
    }
}
def first_non_repeated_character(string)
  string1 = string.split('')
  string2 = string.split('')

  string1.each do |let1|
    counter = 0
    string2.each do |let2|
      counter += 1 if let1 == let2
    end
    # return as soon as a character with exactly one occurrence is found
    return let1 if counter == 1
  end
  nil # every character repeats
end

p first_non_repeated_character('dont doddle in the forest')
var first_non_repeated_character = function (string) {
  var string1 = string.split('');
  var string2 = string.split('');

  var single_letters = [];

  for (var i = 0; i < string1.length; i++) {
    var count = 0;
    for (var x = 0; x < string2.length; x++) {
      if (string1[i] == string2[x]) {
        count++
      }
    }
    if (count == 1) {
      return string1[i];
    }
  }
}

console.log(first_non_repeated_character('dont doddle in the forest'));
console.log(first_non_repeated_character('how are you today really?'));
#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <climits>
using namespace std;

#define No_of_chars 256

//store the count and the index where the char first appear
typedef struct countarray
{
    int count;
    int index;
}countarray;

//returns the count array
    countarray *getcountarray(char *str)
    {
        countarray *count;
        count=new countarray[No_of_chars];
        for(int i=0;i<No_of_chars;i++)
        {
            count[i].count=0;
            count[i].index=-1;
        }
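        for(int i=0;*(str+i);i++)
        {
            // index through unsigned char so bytes >= 128 cannot give a negative subscript
            unsigned char c = (unsigned char)*(str+i);
            (count[c].count)++;
            if(count[c].count==1) //if count==1 then update the index
                count[c].index=i; 

        }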
        return count;
    }

    char firstnonrepeatingchar(char *str)
    {
        countarray *array;
        array = getcountarray(str);
        int result = INT_MAX;
        for(int i=0;i<No_of_chars;i++)
        {
            if(array[i].count==1 && result > array[i].index)
                result = array[i].index;
        }
        delete[] (array);
        // result is still INT_MAX when no character occurred exactly once
        return (result == INT_MAX) ? '\0' : str[result];
    }

    int main()
    {
        char str[] = "geeksforgeeks";
        cout<<"First non repeating character is "<<firstnonrepeatingchar(str)<<endl;        
        return 0;
    }
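// The function header is missing from the source; this name is assumed.
function first_non_repeated_character(string) {
var arr = string.split("");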
var occurences = {};
var tmp;
var lowestindex = string.length+1;

arr.forEach( function(c){ 
  tmp = c;
  if( typeof occurences[tmp] == "undefined")
    occurences[tmp] = tmp;
  else 
    occurences[tmp] += tmp;
});


for(var p in occurences) {
  if(occurences[p].length == 1)
    lowestindex = Math.min(lowestindex, string.indexOf(p));
}

if(lowestindex > string.length)
  return null;

return string[lowestindex];

}
private static string FirstNoRepeatingCharacter(string aword)
    {
        Dictionary<string, int> dic = new Dictionary<string, int>();            

        for (int i = 0; i < aword.Length; i++)
        {
            if (!dic.ContainsKey(aword.Substring(i, 1)))
                dic.Add(aword.Substring(i, 1), 1);
            else
                dic[aword.Substring(i, 1)]++;
        }

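        // Walk the string again rather than the dictionary:
        // Dictionary enumeration order is not guaranteed to be insertion order.
        for (int i = 0; i < aword.Length; i++)
        {
            if (dic[aword.Substring(i, 1)] == 1) return aword.Substring(i, 1);
        }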
        return string.Empty;
    }
public void firstUniqueChar(String str){
    String unique= "";
    String repeated = "";
    str = str.toLowerCase();
    for(int i=0; i<str.length();i++){
        char ch = str.charAt(i);
        if(!(repeated.contains(str.subSequence(i, i+1))))
            if(unique.contains(str.subSequence(i, i+1))){
                // replace() treats ch literally; replaceAll() would parse it as a regex
                unique = unique.replace(Character.toString(ch), "");
                repeated = repeated+ch;
            }
            else
                unique = unique+ch;
    }
    if(unique.isEmpty())
        System.out.println("No unique character found");
    else
        System.out.println(unique.charAt(0));
}
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;

import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Runs in O(N) time and uses lambdas and the stream API from Java 8
//   Also, it is only three lines of code!
private static String findFirstUniqueCharacterPerformantWithLambda(String inputString) {
  // convert the input string into a list of characters
  final List<String> inputCharacters = Arrays.asList(inputString.split(""));

  // first, construct a map to count the number of occurrences of each character
  final Map<Object, Long> characterCounts = inputCharacters
    .stream()
    .collect(groupingBy(s -> s, counting()));

  // then, find the first unique character by consulting the count map
  return inputCharacters
    .stream()
    .filter(s -> characterCounts.get(s) == 1)
    .findFirst()
    .orElse(null);
}
public void findUnique(String string) {
    ArrayList<Character> uniqueList = new ArrayList<>();
    int[] chatArr = new int[128];
    for (int i = 0; i < string.length(); i++) {
        Character ch = string.charAt(i);
        if (chatArr[ch] != -1) {
            chatArr[ch] = -1;
            uniqueList.add(ch);
        } else {
            uniqueList.remove(ch);
        }
    }
    if (uniqueList.size() == 0) {
        System.out.println("No unique character found!");
    } else {
        System.out.println("First unique character is :" + uniqueList.get(0));
    }
}
def first_unique(s):
    repeated = []

    while s:
        if s[0] not in s[1:] and s[0] not in repeated:
            return s[0]
        else:
            repeated.append(s[0])
            s = s[1:]
    return None
(first_unique('abdcab') == 'd', first_unique('aabbccdad') == None, first_unique('') == None, first_unique('a') == 'a')
public class Test4 {
    public static void main(String[] args) {
        String a = "GiniGinaProtijayi";

        firstUniqCharindex(a);
    }

    public static void firstUniqCharindex(String a) {
        int[] count = new int[256];
        for (int i = 0; i < a.length(); i++) {
            count[a.charAt(i)]++;
        }
        int index = -1;
        for (int i = 0; i < a.length(); i++) {
            if (count[a.charAt(i)] == 1) {
                index = i;
                break;
            } // if
        }
        System.out.println(index);// output => 8
        System.out.println(a.charAt(index)); //output => P

    }// end1
}
def firstUniqChar(a):
  count = [0] * 256
  for i in a: count[ord(i)] += 1 
  element = ""
  for items in a:
      if(count[ord(items) ] == 1):
          element = items ;
          break
  return element


a = "GiniGinaProtijayi";
print(firstUniqChar(a)) # output is P
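import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class Test2 {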
    public static void main(String[] args) {
        String a = "GiniGinaProtijayi";

        Map<Character, Long> map = a.chars()
                .mapToObj(
                        ch -> Character.valueOf((char) ch)

        ).collect(
                Collectors.groupingBy(
                        Function.identity(), 
                        LinkedHashMap::new,
                        Collectors.counting()));

        System.out.println("MAP => " + map);
        // {G=2, i=5, n=2, a=2, P=1, r=1, o=1, t=1, j=1, y=1}

        Character chh = map
                .entrySet()
                .stream()
                .filter(entry -> entry.getValue() == 1L)
                .map(entry -> entry.getKey())
                .findFirst()
                .get();
        System.out.println("First Non Repeating Character => " + chh);// P
    }// main

}