Javascript: finding duplicates in a string
I've been handed a project at work that requires finding duplicate pairings across multiple rows of a data set. The full data set is much larger, but the relevant part revolves around training dates, training locations, and trainer names. Each row has a date, a location, and then a comma-separated list of names:
Date Location Names
1/13/2014 Seattle A, B, D
1/16/2014 Dallas C, D, E
1/20/2014 New York A, D
1/23/2014 Dallas C, E
1/27/2014 Seattle B, D
1/30/2014 Houston C, A, F
2/3/2014 Washington DC D, A, F
2/6/2014 Phoenix B, E
2/10/2014 Seattle C, B
2/13/2014 Miami A, B, E
2/17/2014 Miami C, D
2/20/2014 New York B, E, F
2/24/2014 Houston A, B, F
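My goal is to find the rows that share a pair of names. For example, I want to know that A and B were paired in Seattle on 1/13, in Miami on 2/13, and in Houston on 2/24, even though the third name was different each time. So I'm not just looking for duplicates of the entire name string; I also want to find pairings within subsets of the "Names" column.
Is this possible in Excel, or do I need a programming language for the task?
I could do this by hand, but it would take a lot of time that could go to other things. If there is a way to automate it, this part of my task would be much simpler.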
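Thanks in advance for any help or advice you can offer.

Okay. I got bored and wrote the whole thing in Python. I assume you're familiar with the language; either way, you should be able to run the following code on any computer with Python installed.
I made some assumptions. For example, I treated your sample input as the definitive input format.
A few things will break the program:
- Input is case-sensitive. Watch out for capital letters and the like.
- An input file containing the header line "Date Location Names". Just delete it and leave only the actual data in the file. I was lazy and didn't want to handle this.
- A ton of other small things. Just do what the program asks and don't feed it funky input.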
def readWord(line, stringIndex):
    # Read characters up to the next space. Multi-word locations such as
    # "New York" are therefore not handled.
    word = ""
    while(line[stringIndex] != " "):
        word += line[stringIndex]
        stringIndex += 1
    return word, stringIndex

def removeSpacing(line, stringIndex):
    # Skip over the spaces between two columns.
    while(line[stringIndex] == " "):
        stringIndex += 1
    return stringIndex

def readPeople(line, stringIndex):
    # Assumes single-character names separated by ", " (hence the step of 3).
    lineSize = len(line)
    people = []
    while(stringIndex < lineSize):
        people.append(line[stringIndex])
        stringIndex += 3
    return people

def readLine(travels, line):
    stringIndex = 0
    date, stringIndex = readWord(line, stringIndex)
    stringIndex = removeSpacing(line, stringIndex)
    location, stringIndex = readWord(line, stringIndex)
    stringIndex = removeSpacing(line, stringIndex)
    people = readPeople(line, stringIndex)
    # Record, for every person, the set of (date, location) rows they appear in.
    for person in people:
        if(person not in travels.keys()):
            travels[person] = set()
        travels[person].add((date, location))
    return travels

def main():
    f = open(input("Enter filename (must be in same folder as this program code. For instance, name could be: testDocument.txt\n\n"))
    travels = dict()
    for line in f:
        travels = readLine(travels, line)
    print("\n\n\n\n PROGRAM RUNNING \n \n")
    while(True):
        persons = []
        userInput = "empty"
        while(userInput):
            userInput = input("Enter person name (Type Enter to finish typing names): ")
            if(userInput):
                persons.append(userInput)
        # Rows shared by all entered names = intersection of their row sets.
        output = travels[persons[0]]
        for person in persons[1:]:
            output = output.intersection(travels[person])
        print("")
        for hit in output:
            print(hit)
        print("\nFINISHED WITH ONE RUN. STARTING NEW ONE\n")

main()
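The parser above reads a single word for the location and assumes one-letter names, so a row such as "1/20/2014 New York A, D" would be misread. Below is a minimal sketch of a more tolerant parser. It assumes each row still follows the "date location name, name, ..." layout and that the date and the names contain no spaces. The function names parse_row and build_travels are made up for illustration and are not part of the program above.

```python
def parse_row(line):
    """Split one row into (date, location, names).

    Assumes the layout "date location name, name, ..." where the date and
    each name contain no spaces; the location may contain spaces.
    """
    date, rest = line.strip().split(None, 1)
    parts = rest.split(",")
    # The first comma-separated chunk holds the location plus the first name.
    location, first_name = parts[0].rsplit(None, 1)
    names = [first_name] + [p.strip() for p in parts[1:]]
    return date, location, names


def build_travels(lines):
    """Map each name to the set of (date, location) rows it appears in."""
    travels = {}
    for line in lines:
        if not line.strip() or line.startswith("Date"):
            continue  # skip blank lines and the header row
        date, location, names = parse_row(line)
        for name in names:
            travels.setdefault(name, set()).add((date, location))
    return travels


rows = [
    "Date Location Names",
    "1/13/2014 Seattle A, B, D",
    "1/20/2014 New York A, D",
    "2/3/2014 Washington DC D, A, F",
]
travels = build_travels(rows)
# Rows shared by A and D, found by set intersection as in the program above.
print(sorted(travels["A"] & travels["D"]))
```

Splitting from the right on the first comma-separated chunk is what lets the location contain spaces; the trade-off is that a name containing a space would be absorbed into the location.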
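You can do this with VBA. The solution below assumes:
- Your data is in columns A:C of the active worksheet
- Your results will be written to columns E:G
- The output is a list sorted by pair and then by date, so you can easily see where pairs repeat
- The routine assumes no more than three trainers at a time, but it can be modified to handle more combinations
- Cities with only one trainer are ignored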
Class Module (it must be named cPairs, because the regular module declares cP As cPairs)
Option Explicit

Private pTrainer1 As String
Private pTrainer2 As String
Private pCity As String
Private pDT As Date

Public Property Get Trainer1() As String
    Trainer1 = pTrainer1
End Property
Public Property Let Trainer1(Value As String)
    pTrainer1 = Value
End Property

Public Property Get Trainer2() As String
    Trainer2 = pTrainer2
End Property
Public Property Let Trainer2(Value As String)
    pTrainer2 = Value
End Property

Public Property Get City() As String
    City = pCity
End Property
Public Property Let City(Value As String)
    pCity = Value
End Property

Public Property Get DT() As Date
    DT = pDT
End Property
Public Property Let DT(Value As Date)
    pDT = Value
End Property
Regular Module
Option Explicit
Option Compare Text

Public cP As cPairs, colP As Collection
Public colCityPairs As Collection
Public vSrc As Variant
Public vRes() As Variant
Public rRes As Range
Public I As Long, J As Long
Public V As Variant
Public sKey As String

Sub FindPairs()
    vSrc = Range("A1", Cells(Rows.Count, "C").End(xlUp))
    Set colP = New Collection
    Set colCityPairs = New Collection

    'Collect Pairs
    For I = 2 To UBound(vSrc)
        V = Split(Replace(vSrc(I, 3), " ", ""), ",")
        If UBound(V) >= 1 Then
            'sort the pairs
            SingleBubbleSort V
            Select Case UBound(V)
                Case 1
                    AddPairs V(0), V(1)
                Case 2
                    AddPairs V(0), V(1)
                    AddPairs V(0), V(2)
                    AddPairs V(1), V(2)
            End Select
        End If
    Next I

    ReDim vRes(0 To colCityPairs.Count, 1 To 3)
    vRes(0, 1) = "Date"
    vRes(0, 2) = "Location"
    vRes(0, 3) = "Pairs"
    For I = 1 To colCityPairs.Count
        With colCityPairs(I)
            vRes(I, 1) = .DT
            vRes(I, 2) = .City
            vRes(I, 3) = .Trainer1 & ", " & .Trainer2
        End With
    Next I

    Set rRes = Range("E1").Resize(UBound(vRes, 1) + 1, UBound(vRes, 2))
    With rRes
        .EntireColumn.Clear
        .Value = vRes
        With .Rows(1)
            .HorizontalAlignment = xlCenter
            .Font.Bold = True
        End With
        .Sort key1:=.Columns(3), order1:=xlAscending, key2:=.Columns(1), order2:=xlAscending, _
            Header:=xlYes
        .EntireColumn.AutoFit
        V = VBA.Array(vbYellow, vbGreen)
        J = 0
        For I = 2 To rRes.Rows.Count
            If rRes(I, 3) = rRes(I - 1, 3) Then
                .Rows(I).Interior.Color = .Rows(I - 1).Interior.Color
            Else
                J = J + 1
                .Rows(I).Interior.Color = V(J Mod 2)
            End If
        Next I
    End With
End Sub

Sub AddPairs(T1, T2)
    Set cP = New cPairs
    With cP
        .Trainer1 = T1
        .Trainer2 = T2
        .City = vSrc(I, 2)
        .DT = vSrc(I, 1)
        sKey = .Trainer1 & "|" & .Trainer2
        On Error Resume Next
        colP.Add cP, sKey
        If Err.Number = 457 Then
            Err.Clear
            colCityPairs.Add colP(sKey), sKey & "|" & colP(sKey).DT & "|" & colP(sKey).City
            colCityPairs.Add cP, sKey & "|" & .DT & "|" & .City
        Else
            If Err.Number <> 0 Then Stop
        End If
        On Error GoTo 0
    End With
End Sub

Sub SingleBubbleSort(TempArray As Variant)
    'copied directly from support.microsoft.com
    Dim Temp As Variant
    Dim I As Integer
    Dim NoExchanges As Integer

    ' Loop until no more "exchanges" are made.
    Do
        NoExchanges = True
        ' Loop through each element in the array.
        For I = LBound(TempArray) To UBound(TempArray) - 1
            ' If the element is greater than the element
            ' following it, exchange the two elements.
            If TempArray(I) > TempArray(I + 1) Then
                NoExchanges = False
                Temp = TempArray(I)
                TempArray(I) = TempArray(I + 1)
                TempArray(I + 1) = Temp
            End If
        Next I
    Loop While Not (NoExchanges)
End Sub
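
For comparison, the pair-collection idea behind the VBA routine above (sort each row's names, form every pair, keep only the pairs seen in more than one row, list them by pair and then by date) can be sketched in Python. This is an illustration under stated assumptions, not a translation of the macro: rows are passed in as already-parsed (date, location, names) tuples, dates use the M/D/YYYY form of the sample data, and find_repeated_pairs is a made-up name. Using itertools.combinations also removes the three-trainer limit.

```python
from collections import defaultdict
from datetime import datetime
from itertools import combinations


def find_repeated_pairs(rows):
    """Return {pair: [(date, location), ...]} for pairs seen in 2+ rows.

    `rows` is an iterable of (date_string, location, names) tuples; this
    input shape is an assumption made for the sketch.
    """
    seen = defaultdict(list)
    for date_text, location, names in rows:
        when = datetime.strptime(date_text, "%m/%d/%Y").date()
        # Sorting the names makes ("B", "A") and ("A", "B") the same key.
        # A row with a single trainer yields no pairs at all.
        for pair in combinations(sorted(set(names)), 2):
            seen[pair].append((when, location))
    # Keep only pairs that occurred more than once, ordered by date.
    return {pair: sorted(hits) for pair, hits in seen.items() if len(hits) > 1}


rows = [
    ("1/13/2014", "Seattle", ["A", "B", "D"]),
    ("1/16/2014", "Dallas", ["C", "D", "E"]),
    ("1/20/2014", "New York", ["A", "D"]),
    ("2/13/2014", "Miami", ["A", "B", "E"]),
    ("2/24/2014", "Houston", ["A", "B", "F"]),
]
repeated = find_repeated_pairs(rows)
for pair in sorted(repeated):
    for when, location in repeated[pair]:
        print(", ".join(pair), when.isoformat(), location)
```

Parsing the dates before sorting matters here: sorted as plain text, "2/3/2014" would come after "2/24/2014".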