Insertion of foreign keys by the kettle

Asked

Viewed 287 times

0

I need to load some data that are in spreadsheets in a relational database, but I have been facing some challenges regarding the insertion of foreign keys.

In these images you can see that I first insert the project data, the auto increment of the database generates a key, I recover it in the table input and then turn this step with Excel Input of the relief data sheet.

I have only one project and several reliefs related to it.

The project data is in a spreadsheet and the relief data is in a different spreadsheet. inserir a descrição da imagem aqui

inserir a descrição da imagem aqui

inserir a descrição da imagem aqui

When trying to enter no error is returned, but the relief data is not included. It’s a seemingly simple problem, but I’m starting to understand Kettle now.

inserir a descrição da imagem aqui

inserir a descrição da imagem aqui

Thank you in advance!

EDIT2:

inserir a descrição da imagem aqui inserir a descrição da imagem aqui inserir a descrição da imagem aqui

1 answer

1


You are using multiple input sequences:

Excel input - Table input - Table input - Excel input - Table input

Each time an input step is used in the flow, the resulting table of this input is reset, that is, in the next step you only have what was added in the last input. What you need is for these 5 inputs to be made in separate streams on the same KTR, and unified by a common key, using the Join Teps (I recommend Multiway Merge Join, since there are more than 2 streams).

I also see that you are using the "Accept filenames from Previous step" option in your Excel input. This way Excel input will receive the absolute paths of the files by the table that is arriving at the input step, and will not use the desired path in the list of files/directories.

EDIT:

If no other update parameter is required, you can use the "Run SQL statements" step. In this step you can perform queries with variable substitution, that is, table rows that are fed to this step, will be part of the Query. inserir a descrição da imagem aqui In the query you will use, you should put a question mark in the query attributes, these "?" will be replaced in the same order as the parameter list.

Ex. If a table this way reaches the step:

inserir a descrição da imagem aqui

With a query this way:

inserir a descrição da imagem aqui

The query will be executed 2 times, changing the "?" in the id_project order, name and location.

  • In fact, I need to complete the output of the project table in order to recover the auto increment id that will be generated and include it in the relief table. I did a test with the merge, as you suggested, but I can’t visualize how useful it would be for me, since the completion of one flow is necessary for the beginning of the other.

  • I would need an example to help you in this case, it wasn’t very clear to me. Do you want to update the relief table from data from the project table ? Or from project table data you perform searches in the relief table ?

  • That’s it, I want to update the relief table from the project table data. The project has an ID that is generated automatically. What I need is for this automatically generated project ID to be inserted into the relief table, making it possible to link the tables. I’ll insert the print in the post to be clear.

  • As you can see in the image, I entered the project and the automatically generated ID was 157. Now, to realize the relationship with the relief table, I need to retrieve that 157 and insert into these relief table data.

  • I was doing this process manually, the problem is that the bank is in constant use and as there are many tables, ended up becoming unviable.

  • I edited the answer with a possible solution.

  • It only worked when only one amount is recovered. For example, when I need to update several different Ids like you put in your example, it didn’t work. The first number was repeated for everyone, instead of each line receiving a different value. I edited the question with the images to be clearer.

  • It may be necessary to control the data flow with 2 KTR’s inside a JOB, but still think that a WHERE clause in your update would better solve the problem.

Show 3 more comments

Browser other questions tagged

You are not signed in. Login or sign up in order to post.