Finding duplicates within a string


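I was handed a project at work that requires finding repeated pairings across multiple rows of a data set. The full data set is much larger, but the relevant part revolves around training dates, training locations, and trainer names. Each row of data therefore has a date, a location, and then a comma-separated list of names: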

Date    Location       Names
1/13/2014   Seattle    A, B, D
1/16/2014   Dallas     C, D, E
1/20/2014   New York   A, D
1/23/2014   Dallas     C, E
1/27/2014   Seattle    B, D
1/30/2014   Houston    C, A, F
2/3/2014    Washington DC   D, A, F
2/6/2014    Phoenix    B, E
2/10/2014   Seattle    C, B
2/13/2014   Miami      A, B, E
2/17/2014   Miami      C, D 
2/20/2014   New York   B, E, F
2/24/2014   Houston    A, B, F
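
My goal is to find the rows that share a pair of names. For example, I want to know that A and B were paired in Seattle on 1/13, in Miami on 2/13, and in Houston on 2/24, even though the third name is different each time. So I am not just looking for duplicates of the entire name string; I also want to find pairings within partial segments of the "Names" column.

Can this be done in Excel, or do I need a programming language to accomplish it?

I could do this by hand, but that would take a large amount of time that could be spent on other things. If there is a way to automate it, this part of my task would become much simpler.

Thanks in advance for any help or advice on how to move forward.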

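OK. I was bored and wrote the whole thing in Python. I assume you are familiar with the language; either way, you should be able to run the following code on any computer that has Python installed.

I made some assumptions. For example, I treated your sample input as the definitive input format.

A few things will break the program:

  • Inconsistent capitalization. The lookup is case-sensitive, so "A" and "a" count as different people
  • An input file that contains the header line "Date Location Names". Just delete it and keep only the data rows in the file. I was lazy and did not want to handle this
  • A ton of other small things. For instance, the parser reads each field up to the first space and expects single-letter names separated by ", ". Just do what the program asks and do not feed it funky input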
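About the program:

It revolves around a dictionary keyed by person name. Each value is a set of tuples, and each tuple holds a date and the location where that person was on that date. By comparing these sets and taking their intersection, we find the sessions the people have in common.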
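
The dictionary-of-sets idea described above can be sketched on its own, separate from any file parsing. This is only a minimal illustration, not the answer's actual code: the names `build_index` and `shared_sessions` are hypothetical, and the rows are hard-coded from a few lines of the question's table.

```python
from collections import defaultdict

# A few rows from the question's table, already split into fields
# (hard-coded here as an assumption; the real program reads a file).
rows = [
    ("1/13/2014", "Seattle", ["A", "B", "D"]),
    ("1/20/2014", "New York", ["A", "D"]),
    ("2/13/2014", "Miami", ["A", "B", "E"]),
    ("2/24/2014", "Houston", ["A", "B", "F"]),
]


def build_index(rows):
    """Map each person to the set of (date, location) sessions they attended."""
    index = defaultdict(set)
    for date, location, names in rows:
        for name in names:
            index[name].add((date, location))
    return index


def shared_sessions(index, *names):
    """Return the sessions attended by every person in `names`."""
    # .get() avoids creating empty entries for unknown names.
    sets = [index.get(name, set()) for name in names]
    return set.intersection(*sets) if sets else set()


index = build_index(rows)
for session in sorted(shared_sessions(index, "A", "B")):
    print(session)
```

Because the values are sets, asking about three or more people is just a longer intersection; no extra bookkeeping is needed.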

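Because I treated this as a Python exercise, it is a bit messy. It has been a while since I wrote any Python, and I got excited about doing it all without objects. Just follow the prompts, and keep the input file that holds all the data in the same folder as the code you are running.

As a side note, you may want to verify that the program produces correct output.

If you have any questions, feel free to contact me.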

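def readWord(line, stringIndex):
    # Read characters up to the next space (or the end of the line).
    # A field that contains a space, such as "New York", is cut off
    # after its first word.
    word = ""
    while(stringIndex < len(line) and line[stringIndex] != " "):
        word += line[stringIndex]
        stringIndex += 1
    return word, stringIndex

def removeSpacing(line, stringIndex):
    # Skip over the run of spaces that separates two columns.
    while(stringIndex < len(line) and line[stringIndex] == " "):
        stringIndex += 1
    return stringIndex

def readPeople(line, stringIndex):
    # Assumes single-character names separated by ", ",
    # so consecutive names start three characters apart.
    lineSize = len(line)
    people = []
    while(stringIndex < lineSize):
        people.append(line[stringIndex])
        stringIndex += 3
    return people


def readLine(travels, line):
    # Drop the trailing newline and spaces so they are never read as a name.
    line = line.rstrip()
    stringIndex = 0

    date, stringIndex = readWord(line, stringIndex)
    stringIndex = removeSpacing(line, stringIndex)
    location, stringIndex = readWord(line, stringIndex)
    stringIndex = removeSpacing(line, stringIndex)
    people = readPeople(line, stringIndex)

    # Record the (date, location) session under every person on the row.
    for person in people:
        if(person not in travels.keys()):
            travels[person] = set()
        travels[person].add((date, location))

    return travels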


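def main():
    # Build the person -> {(date, location)} dictionary from the input file.
    filename = input("Enter filename (must be in same folder as this program code. For instance, name could be: testDocument.txt\n\n")
    travels = dict()
    with open(filename) as f:
        for line in f:
            travels = readLine(travels, line)
    print("\n\n\n\n PROGRAM RUNNING \n \n")
    while(True):
        # Collect names until the user presses Enter on an empty prompt.
        persons = []
        userInput = "empty"
        while(userInput):
            userInput = input("Enter person name (Type Enter to finish typing names): ")
            if(userInput):
                persons.append(userInput)
        if(not persons):
            # No names entered: quit instead of failing on persons[0].
            break
        # Intersect the session sets; an unknown name contributes an empty set.
        output = travels.get(persons[0], set())
        for person in persons[1:]:
            output = output.intersection(travels.get(person, set()))
        print("")
        for hit in output:
            print(hit)
        print("\nFINISHED WITH ONE RUN. STARTING NEW ONE\n")


if __name__ == "__main__":
    main()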
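
The caveats listed earlier (header line, capitalization, locations such as "New York") mostly come from the character-by-character parsing. One way to relax them, offered only as a sketch, is to split each row on the column gaps instead. The function name `parse_row` is hypothetical, and the sketch assumes that columns are separated by a tab or by at least two spaces, which holds for the sample table in the question but may not hold for the real data.

```python
import re

# Assumption: columns are separated by a tab or by two or more spaces,
# while a single space may appear inside a location ("New York").
COLUMN_GAP = re.compile(r"\s{2,}|\t")


def parse_row(line):
    """Return (date, location, names) for a data row, or None for blank/header lines."""
    line = line.strip()
    if not line:
        return None
    parts = COLUMN_GAP.split(line, maxsplit=2)
    if parts[0].lower() == "date":
        return None  # header row: "Date  Location  Names"
    if len(parts) != 3:
        raise ValueError("expected three columns, got: %r" % line)
    date, location, raw_names = parts
    # Upper-casing is an assumption made here so that "a" and "A" match.
    names = [name.strip().upper() for name in raw_names.split(",") if name.strip()]
    return date, location, names


print(parse_row("1/20/2014   New York   A, D"))
```

The returned tuple has the same three fields the dictionary-building step needs, so a loop over the file could feed it directly into the per-person sets.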
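You can do this with VBA. The solution below assumes:

  • Your data is in columns A:C of the active worksheet
  • Your results will be output in columns starting at E1 (E:G in the code below)
  • The output will be a list sorted by pair and then by date, so you can easily see where pairs repeat
  • The routine assumes no more than three trainers at a time, but it can be modified to add more possible combinations
  • Cities with only one trainer are ignored

The routine uses a class module to collect the information and two collections to process the data. It also relies on the fact that a collection does not allow two items to be added with the same key.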
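
For comparison with the Python answer, the same pair-collecting idea can be sketched in Python. This is not a translation of the VBA routine: it swaps the collection-key trick for a dictionary keyed by the sorted pair, and it uses `itertools.combinations`, so it is not limited to three trainers per row. The function name `repeated_pairs` and the hard-coded rows are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def repeated_pairs(rows):
    """Map each trainer pair to its sessions, keeping only pairs seen more than once."""
    sessions = defaultdict(list)
    for date, location, names in rows:
        # Sorting makes ("B", "A") and ("A", "B") the same key.
        # A row with a single trainer yields no pairs and is skipped.
        for pair in combinations(sorted(set(names)), 2):
            sessions[pair].append((date, location))
    return {pair: hits for pair, hits in sessions.items() if len(hits) > 1}


# The first four rows of the question's table, already split into fields.
rows = [
    ("1/13/2014", "Seattle", ["A", "B", "D"]),
    ("1/16/2014", "Dallas", ["C", "D", "E"]),
    ("1/20/2014", "New York", ["A", "D"]),
    ("1/23/2014", "Dallas", ["C", "E"]),
]

for pair, hits in sorted(repeated_pairs(rows).items()):
    print(pair, hits)
```

Sessions are kept in input order here; the VBA routine instead sorts its output by pair and then by date on the worksheet.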

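Class Module (rename the class module to cPairs), followed by the Regular Module. Each listing starts with its own Option Explicit line.
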
' Class Module: cPairs
Option Explicit
Private pTrainer1 As String
Private pTrainer2 As String
Private pCity As String
Private pDT As Date
Public Property Get Trainer1() As String
    Trainer1 = pTrainer1
End Property
Public Property Let Trainer1(Value As String)
    pTrainer1 = Value
End Property
Public Property Get Trainer2() As String
    Trainer2 = pTrainer2
End Property
Public Property Let Trainer2(Value As String)
    pTrainer2 = Value
End Property
Public Property Get City() As String
    City = pCity
End Property
Public Property Let City(Value As String)
    pCity = Value
End Property

Public Property Get DT() As Date
    DT = pDT
End Property
Public Property Let DT(Value As Date)
    pDT = Value
End Property

' Regular Module
Option Explicit
Option Compare Text
Public cP As cPairs, colP As Collection
Public colCityPairs As Collection
Public vSrc As Variant
Public vRes() As Variant
Public rRes As Range
Public I As Long, J As Long
Public V As Variant
Public sKey As String

Sub FindPairs()
vSrc = Range("A1", Cells(Rows.Count, "C").End(xlUp))
Set colP = New Collection
Set colCityPairs = New Collection

'Collect Pairs
For I = 2 To UBound(vSrc)
    V = Split(Replace(vSrc(I, 3), " ", ""), ",")

    If UBound(V) >= 1 Then
        'sort the pairs
        SingleBubbleSort V

        Select Case UBound(V)
            Case 1
                AddPairs V(0), V(1)

            Case 2
                AddPairs V(0), V(1)
                AddPairs V(0), V(2)
                AddPairs V(1), V(2)
        End Select
    End If
Next I

ReDim vRes(0 To colCityPairs.Count, 1 To 3)
    vRes(0, 1) = "Date"
    vRes(0, 2) = "Location"
    vRes(0, 3) = "Pairs"

For I = 1 To colCityPairs.Count
    With colCityPairs(I)
        vRes(I, 1) = .DT
        vRes(I, 2) = .City
        vRes(I, 3) = .Trainer1 & ", " & .Trainer2
    End With
Next I

Set rRes = Range("E1").Resize(UBound(vRes, 1) + 1, UBound(vRes, 2))
With rRes
    .EntireColumn.Clear
    .Value = vRes
    With .Rows(1)
        .HorizontalAlignment = xlCenter
        .Font.Bold = True
    End With

    .Sort key1:=.Columns(3), order1:=xlAscending, key2:=.Columns(1), order2:=xlAscending, _
            Header:=xlYes
    .EntireColumn.AutoFit

    V = VBA.Array(vbYellow, vbGreen)
    J = 0
    For I = 2 To rRes.Rows.Count
        If rRes(I, 3) = rRes(I - 1, 3) Then
            .Rows(I).Interior.Color = .Rows(I - 1).Interior.Color
        Else
            J = J + 1
            .Rows(I).Interior.Color = V(J Mod 2)
        End If
    Next I
End With
End Sub

Sub AddPairs(T1, T2)

Set cP = New cPairs
With cP
    .Trainer1 = T1
    .Trainer2 = T2
    .City = vSrc(I, 2)
    .DT = vSrc(I, 1)
    sKey = .Trainer1 & "|" & .Trainer2

    On Error Resume Next

    colP.Add cP, sKey
    If Err.Number = 457 Then
        Err.Clear
        colCityPairs.Add colP(sKey), sKey & "|" & colP(sKey).DT & "|" & colP(sKey).City
        colCityPairs.Add cP, sKey & "|" & .DT & "|" & .City
    Else
        If Err.Number <> 0 Then Stop
    End If

    On Error GoTo 0

End With

End Sub

Sub SingleBubbleSort(TempArray As Variant)
'copied directly from support.microsoft.com
    Dim Temp As Variant
    Dim I As Integer
    Dim NoExchanges As Integer

    ' Loop until no more "exchanges" are made.
    Do
        NoExchanges = True

        ' Loop through each element in the array.
        For I = LBound(TempArray) To UBound(TempArray) - 1

            ' If the element is greater than the element
            ' following it, exchange the two elements.
            If TempArray(I) > TempArray(I + 1) Then
                NoExchanges = False
                Temp = TempArray(I)
                TempArray(I) = TempArray(I + 1)
                TempArray(I + 1) = Temp
            End If
        Next I
    Loop While Not (NoExchanges)
End Sub