Find diff between different arrays


GOAL

I have two arrays that hold the same data in different formats. For example, CLIENT1-00 in ARRAY1 is the same as CLIENT1 in ARRAY2. I just need a DIFF that returns what is in ARRAY1 but not in ARRAY2, which would be the failures. Both arrays are filled by a SELECT, ARRAY1 -> IBM DB2 and ARRAY2 -> POSTGRES, using their libs ibm_db_dbi and psycopg2.

PROBLEM

I am having trouble with the way the data is laid out inside the arrays, and with formatting ARRAY1 to match ARRAY2. I don't know whether the formatting problem in ARRAY1 is related to the DB API, but whenever I attempt this intersection I get some error related to the arrays.

ARRAY 1

(u'CLIENT1-00', u'SRVDADOS1:OS', u'MICROSOFT WINDOWS SERVER 2008 R2 STANDARD')
(u'CLIENT2-01', u'SRVDADOS2:OS', u'MICROSOFT WINDOWS SERVER 2008 STANDARD')
(u'CLIENT3-01', u'SRVDADOS3:OS', u'MICROSOFT WINDOWS SERVER 2008 STANDARD')
(u'CLIENT4-00', u'SRVDADOS4:OS', u'LINUX CENTOS')
(u'CLIENT5-00', u'SRVDADOS7:OS', u'MAINFRAME')

ARRAY 2

('CLIENT1', 'SRVDADOS1', 'Windows')
('CLIENT2', 'SRVDADOS2', 'Windows')
('CLIENT3', 'SRVDADOS3', 'Windows')
('CLIENT4', 'SRVDADOS4', 'Linux')

DESIRED OUTPUT

["CLIENT5","SRVDADOS7","MAINFRAME"]

DETAILING

Column 1 of ARRAY1 is the client name field. The only difference from ARRAY2 is a hyphen followed by a number ranging from 00 to 01.

Column 2 of ARRAY1 is the field containing the server hostname. The only difference from ARRAY2 is the :OS suffix (I can concatenate in the SELECT to add :OS).

Column 3 of ARRAY1 is the operating system field, which I believe will be the biggest problem, because unlike ARRAY2 it also carries the operating system version. I thought of using something similar to SQL's LIKE, so that whenever the word WINDOWS appears in ARRAY1 the rest is ignored and the value is treated as just WINDOWS, making it comparable with ARRAY2.


  • And what is the expected output for this example?

  • For example, if there is a CLIENT5 in ARRAY1 with a HOST that differs from the one in ARRAY2, it must be returned in a new array. newresult = ["CLIENT5","SRVDADOS7","MAINFRAME"]

  • So for the example of the question the output would be empty? The host and operating system values will not be taken into account?

  • They would be. What I need is to reshape the data so that both arrays are identical for this comparison, and then get what exists in ARRAY1 and does not exist in ARRAY2.

  • The array will have about 10 thousand records; I do not know to what extent it pays off to use IF...

  • In the question example, customer 1 has SRVDADOS1:OS in array 1 and SRVDADOS1 in array 2. The values are different, so should it be in the output?

  • The data are written differently, but they are the same. This is an error in the database behind ARRAY1: it should not return, for example, SRVDADOS:OS, but only SRVDADOS. What I need is to transform this information before making the comparison, because otherwise it will always show up as a DIFF, since the data are written differently even though they represent the same thing. The desired OUTPUT is what really does not exist in both...

  • This question is kind of confusing...

  • Very. So I will ask you to edit the question and describe in detail the process for transforming this data (all of it). For example, the name seems to need only the hyphen handled, but that's an assumption. What if the name has more than one hyphen? What if it has no hyphen? What if it has lowercase and uppercase letters? What if in one it is CLIENT1 and in the other CLIENT01? The same goes for the host... And the operating system only makes the situation worse. How will you extract the simple OS name from the first array to compare with the second?

  • I will try to explain better in the question, but the problem is complex, because I am tied to the database behind ARRAY1. It is my reference, and I have nowhere else to take as a reference.

  • For the operating system, as mentioned in the question, I thought of using something similar to SQL's LIKE, so that when the word WINDOWS is identified at the beginning of the value in ARRAY1, it is changed to just WINDOWS.

  • @Andersoncarloswoss, I rewrote it; see if it is easier to understand.


1 answer



Friend, since your arrays hold values that only partially match, you will need to do the comparisons manually; Python has no ready-made method to help you here.
If the rows that represent the same data were written identically in both arrays, you could do this:

array1 = ['um', 'dois', 'três', 'quatro']
array2 = ['um', 'dois']
print(list(set(array1) - set(array2))) 

The output of this command is (the order may vary, because sets are unordered):

['quatro', 'três']
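
The same subtraction also works when each element is a whole row, since tuples of strings are hashable. A minimal sketch, assuming hypothetical rows that are already written identically on both sides (rows_a and rows_b are illustrative names, not from the question); sorted() is used only to get a stable order:

```python
# Hypothetical rows that already share the same format in both lists.
rows_a = [("CLIENT1", "SRVDADOS1", "Windows"),
          ("CLIENT5", "SRVDADOS7", "Mainframe")]
rows_b = [("CLIENT1", "SRVDADOS1", "Windows")]

# Set difference keeps the rows present in rows_a but absent from rows_b.
only_in_a = sorted(set(rows_a) - set(rows_b))
print(only_in_a)  # [('CLIENT5', 'SRVDADOS7', 'Mainframe')]
```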

But in your case, you will have to check whether each element of one array contains the corresponding element of the other array. It can be done this way:

array1 = [(u'CLIENT1-00', u'SRVDADOS1:OS', u'MICROSOFT WINDOWS SERVER 2008 R2 STANDARD'),
(u'CLIENT2-01', u'SRVDADOS2:OS', u'MICROSOFT WINDOWS SERVER 2008 STANDARD'),
(u'CLIENT3-01', u'SRVDADOS3:OS', u'MICROSOFT WINDOWS SERVER 2008 STANDARD'),
(u'CLIENT4-00', u'SRVDADOS4:OS', u'LINUX CENTOS'),
(u'CLIENT6-00', u'SRVDADOS6:OS', u'LINUX CENTOS'),
(u'CLIENT7-00', u'SRVDADOS7:OS', u'LINUX CENTOS')]

array2 = [('CLIENT1', 'SRVDADOS1', 'Windows'),
('CLIENT2', 'SRVDADOS2', 'Windows'),
('CLIENT3', 'SRVDADOS3', 'Windows'),
('CLIENT4', 'SRVDADOS4', 'Linux'),
('CLIENT5', 'SRVDADOS5', 'Linux')]

def matches(row1, row2):
    # True when every field of row1 (array1) contains the corresponding
    # field of row2 (array2), ignoring case.
    return all(b.upper() in a.upper() for a, b in zip(row1, row2))

new_result = []

# Find what is in array2 and not in array1.
for row2 in array2:
    for row1 in array1:
        if matches(row1, row2):
            # Found it, so move on to the next row.
            break
    else:
        # The inner loop ended without a match: add the row to new_result.
        new_result.append(row2)

# Find what is in array1 and not in array2.
for row1 in array1:
    for row2 in array2:
        if matches(row1, row2):
            # Found it, so move on to the next row.
            break
    else:
        # The inner loop ended without a match: add the row to new_result.
        new_result.append(row1)

print(new_result)
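
With about 10 thousand records, as mentioned in the comments, the nested loops above compare every row of one array against every row of the other. Substring matching also has a pitfall: CLIENT1 is contained in CLIENT10-00 as well. One possible alternative, only a sketch, is to normalize both arrays to a common form first and then use a set lookup. The rules below are assumptions taken from the question's description (the client suffix is everything after the last hyphen, the host suffix is a literal :OS, and the OS family is found by keyword); normalize and OS_FAMILIES are hypothetical names.

```python
# Hypothetical list of OS families to look for; extend it to match your data.
OS_FAMILIES = ("WINDOWS", "LINUX")

def normalize(row):
    """Reduce a row from either array to a comparable (client, host, os) tuple."""
    client, host, os_name = (field.upper() for field in row)
    # Assumption: the client suffix is everything after the last hyphen.
    client = client.rsplit("-", 1)[0]
    # Assumption: the host suffix, when present, is the literal ":OS".
    if host.endswith(":OS"):
        host = host[:-3]
    # Assumption: the family is found by keyword, similar to LIKE '%WINDOWS%'.
    for family in OS_FAMILIES:
        if family in os_name:
            os_name = family
            break
    return (client, host, os_name)

array1 = [("CLIENT1-00", "SRVDADOS1:OS", "MICROSOFT WINDOWS SERVER 2008 R2 STANDARD"),
          ("CLIENT2-01", "SRVDADOS2:OS", "MICROSOFT WINDOWS SERVER 2008 STANDARD"),
          ("CLIENT3-01", "SRVDADOS3:OS", "MICROSOFT WINDOWS SERVER 2008 STANDARD"),
          ("CLIENT4-00", "SRVDADOS4:OS", "LINUX CENTOS"),
          ("CLIENT5-00", "SRVDADOS7:OS", "MAINFRAME")]

array2 = [("CLIENT1", "SRVDADOS1", "Windows"),
          ("CLIENT2", "SRVDADOS2", "Windows"),
          ("CLIENT3", "SRVDADOS3", "Windows"),
          ("CLIENT4", "SRVDADOS4", "Linux")]

# Build the lookup set once, then test each array1 row against it.
known = set(normalize(row) for row in array2)
only_in_array1 = [row for row in array1 if normalize(row) not in known]
print(only_in_array1)  # [('CLIENT5-00', 'SRVDADOS7:OS', 'MAINFRAME')]
```

Because the lookup set is built once, each row of array1 is checked with a single hash lookup instead of a scan over array2. Note that a client name in ARRAY2 that itself contains a hyphen would be truncated by this rule, so the normalization would need adjusting for such data.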


  • Man, if I could accept your answer twice I would. It was excellent, congratulations!!

  • Perhaps it would be interesting to use the "in" operator instead of calling the __contains__ method directly.
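
To illustrate the suggestion in the last comment, here is a small sketch using the first row of each array from the answer. Both forms perform the same substring check; note that the operands swap sides when the operator is used.

```python
row1 = (u'CLIENT1-00', u'SRVDADOS1:OS', u'MICROSOFT WINDOWS SERVER 2008 R2 STANDARD')
row2 = ('CLIENT1', 'SRVDADOS1', 'Windows')

# Calling the special method directly, as in the answer.
via_method = row1[0].upper().__contains__(row2[0].upper())

# The equivalent check with the in operator: needle on the left, haystack on the right.
via_operator = row2[0].upper() in row1[0].upper()

print(via_method == via_operator)  # True
```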
