GOAL
I have two arrays that hold the same data in different formats. For example, CLIENT1-00 in ARRAY1 is the same as CLIENT1 in ARRAY2. I just need a DIFF that returns what is in ARRAY1 but not in ARRAY2, which would be the failures. Both arrays are filled by a SELECT: ARRAY1 comes from IBM DB2 and ARRAY2 from POSTGRES, and I use their libs for this, ibm_db_dbi and psycopg2.
PROBLEM
I am having trouble with how the data is stored in the arrays and with converting ARRAY1 to the format of ARRAY2. I don't know whether the ARRAY1 formatting problem is related to the DB API, but whenever I try this intersection I get some error related to the arrays.
ARRAY 1
(u'CLIENT1-00', u'SRVDADOS1:OS', u'MICROSOFT WINDOWS SERVER 2008 R2 STANDARD')
(u'CLIENT2-01', u'SRVDADOS2:OS', u'MICROSOFT WINDOWS SERVER 2008 STANDARD')
(u'CLIENT3-01', u'SRVDADOS3:OS', u'MICROSOFT WINDOWS SERVER 2008 STANDARD')
(u'CLIENT4-00', u'SRVDADOS4:OS', u'LINUX CENTOS')
(u'CLIENT5-00', u'SRVDADOS7:OS', u'MAINFRAME')
ARRAY 2
('CLIENT1', 'SRVDADOS1', 'Windows')
('CLIENT2', 'SRVDADOS2', 'Windows')
('CLIENT3', 'SRVDADOS3', 'Windows')
('CLIENT4', 'SRVDADOS4', 'Linux')
DESIRED OUTPUT
["CLIENT5","SRVDADOS7","MAINFRAME"]
DETAILING
COLUMN 1 of ARRAY1 is the client name field. The only difference from ARRAY2 is a hyphen followed by a number ranging from 00 to 01.
COLUMN 2 of ARRAY1 is the field containing the hostname of the server. The only difference from ARRAY2 is the :OS suffix (I can concatenate in the SELECT to add :OS).
COLUMN 3 of ARRAY1 is the field that contains the operating system, which I believe will be the biggest problem, because in ARRAY1 it comes with the version of the operating system and in ARRAY2 it does not. I thought of using something similar to SQL's LIKE: whenever the word WINDOWS appears in the ARRAY1 value, ignore the rest and treat it as just WINDOWS, so that it can be compared with ARRAY2.
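A minimal sketch of the three transformations described above, not a definitive implementation. The function names and the OS_FAMILIES list are hypothetical, and it assumes the client suffix is always a trailing hyphen plus digits, the host suffix is exactly :OS, and a substring match on the family name is an acceptable stand-in for LIKE. Everything is upper-cased so that 'Windows' and 'WINDOWS' compare equal.

```python
import re

# Hypothetical list of OS families to collapse versions into; extend as needed.
OS_FAMILIES = ("WINDOWS", "LINUX")


def normalize_client(name):
    # Drop a trailing "-NN" numeric suffix: "CLIENT1-00" -> "CLIENT1".
    return re.sub(r"-\d+$", "", name.strip()).upper()


def normalize_host(host):
    # Drop a trailing ":OS" marker: "SRVDADOS1:OS" -> "SRVDADOS1".
    host = host.strip().upper()
    return host[:-3] if host.endswith(":OS") else host


def normalize_os(os_name):
    # Rough equivalent of LIKE '%WINDOWS%': keep only the family name.
    upper = os_name.strip().upper()
    for family in OS_FAMILIES:
        if family in upper:
            return family
    # Unknown systems (e.g. MAINFRAME) are kept as they are, upper-cased.
    return upper


def normalize_row(row):
    client, host, os_name = row
    return (normalize_client(client), normalize_host(host), normalize_os(os_name))


row1 = (u'CLIENT1-00', u'SRVDADOS1:OS', u'MICROSOFT WINDOWS SERVER 2008 R2 STANDARD')
row2 = ('CLIENT1', 'SRVDADOS1', 'Windows')
print(normalize_row(row1) == normalize_row(row2))  # True
```

Applying the same normalize_row to both arrays keeps the comparison symmetric, so rows already in the ARRAY2 format pass through with only the case changed.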
Developed to date...
And what is the expected output for this example?
– Woss
For example, if ARRAY1 has a CLIENT5 whose HOST differs from ARRAY2, it must be returned in a new array: newresult = ["CLIENT5","SRVDADOS7","MAINFRAME"]
– Luis Henrique
So for the example in the question the output would be empty? Will the host and operating system values not be taken into account?
– Woss
They would be. What I need is to reshape the data so that both arrays use the same format for the comparison, and then get what exists in ARRAY1 and does not exist in ARRAY2.
– Luis Henrique
The array will have about 10 thousand records, so I don't know to what extent it is worth using IF...
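For a volume like that, one option is to avoid row-by-row IF comparisons and use a set lookup instead. The sketch below is only an illustration: rows_missing_from and upper_key are hypothetical names, the sample rows are assumed to be already in the same format apart from letter case, and key stands for whatever normalization is applied to each row.

```python
def rows_missing_from(array1, array2, key=lambda row: row):
    # Build the lookup set once from ARRAY2.
    seen = {key(row) for row in array2}
    # Each membership test is a hash lookup, so the whole diff is
    # roughly linear in the number of rows instead of quadratic.
    return [row for row in array1 if key(row) not in seen]


def upper_key(row):
    # Illustrative key: compare rows case-insensitively.
    return tuple(value.upper() for value in row)


array1 = [("CLIENT4", "SRVDADOS4", "LINUX"), ("CLIENT5", "SRVDADOS7", "MAINFRAME")]
array2 = [("CLIENT4", "SRVDADOS4", "Linux")]

missing = rows_missing_from(array1, array2, key=upper_key)
print(missing)  # [('CLIENT5', 'SRVDADOS7', 'MAINFRAME')]
```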
– Luis Henrique
In the question's example, client 1 has SRVDADOS1:OS in array 1 and SRVDADOS1 in array 2. The values are different, so should it be in the output?
– Woss
The data are written differently but are the same. This is an error inside the database behind ARRAY1: it should not return SRVDADOS:OS, for example, but only SRVDADOS. What I need is to transform this information before making the comparison, because otherwise everything will always be returned as a DIFF, since the data are written differently even though they represent the same thing. The desired OUTPUT is what really does not exist in both...
– Luis Henrique
That question is kind of confusing...
– Luis Henrique
Very. So I will ask you to edit the question and describe in detail the transformation process for this data (all of it). For example, the name seems to only need the hyphen handled, but that's an assumption. What if the name has more than one hyphen? What if it has no hyphen? What if it mixes lowercase and uppercase letters? What if one has CLIENT1 and the other CLIENT01? The same goes for the host... And the operating system only makes the situation worse. How will you extract the simple OS name from the first array to compare with the second?
– Woss
I will try to explain it better in the question, but the problem is complex because I am stuck with the database behind ARRAY1. It is my reference, and I have no other source to use as a reference.
– Luis Henrique
For the operating system, I thought of using something similar to SQL's LIKE: when it identifies the word WINDOWS at the beginning of the ARRAY1 value, it changes the value to just WINDOWS.
– Luis Henrique
@Andersoncarloswoss, I rephrased the question; see if it is easier to understand now.
– Luis Henrique