How to write the preview data shown by the Merge Duplicates component?
I would like to write out the same preview data that the Merge Duplicates component shows, in order to create a synonym catalog: the survivor will be the reference data, and the non-survivors will be the synonyms.
This is not really simple, as they'll be in their own rows, and DataCleaner mostly works on a row-by-row basis.
You'll probably need to do it in multiple jobs. In the first job, include the group ID and the merge status in the output and write them to a temporary file. In a second job, filter on the status to pick all the non-survivor records, and use a table lookup on the same file to find the master term for each group ID (an expression-based column makes it easy to filter down to the survivor records). Then write the new output data from the original and the looked-up columns.
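As a rough illustration of the two-job approach, here is a minimal Python sketch of the same logic applied to the temporary file outside DataCleaner. It assumes the file is a CSV with the hypothetical columns group_id, merge_status and term, and that survivors are marked with the hypothetical status value SURVIVOR; check the actual column names and status values in your own output before relying on it. The first pass plays the role of the table lookup (group ID to master term), and the second pass is the filter on non-survivors.

```python
import csv
import io
import sys

# Hypothetical status label for survivor records; replace it with the
# value your Merge Duplicates output actually writes.
SURVIVOR = "SURVIVOR"


def build_synonym_catalog(rows):
    """Turn merged rows into (reference term, synonym) pairs.

    rows: iterable of dicts with the hypothetical keys
    'group_id', 'merge_status' and 'term'.
    """
    rows = list(rows)
    # Pass 1 (the "lookup table"): map each group ID to its survivor's term.
    master = {
        r["group_id"]: r["term"] for r in rows if r["merge_status"] == SURVIVOR
    }
    # Pass 2 (the filter): keep non-survivors and attach their group's master term.
    # Rows whose group has no survivor are skipped.
    catalog = []
    for r in rows:
        if r["merge_status"] != SURVIVOR and r["group_id"] in master:
            catalog.append((master[r["group_id"]], r["term"]))
    return catalog


if __name__ == "__main__":
    # Made-up sample standing in for the temporary file from the first job.
    sample = io.StringIO(
        "group_id,merge_status,term\n"
        "1,SURVIVOR,IBM\n"
        "1,NON_SURVIVOR,I.B.M.\n"
        "2,SURVIVOR,Apple\n"
        "2,NON_SURVIVOR,Apple Inc\n"
    )
    out = csv.writer(sys.stdout)
    out.writerow(["reference", "synonym"])
    out.writerows(build_synonym_catalog(csv.DictReader(sample)))
```

Each output row pairs the survivor (the reference data) with one of its non-survivors (a synonym), which is the shape of the synonym catalog described in the question.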
I tried to follow your instructions without success. The Merge Duplicates component does not have a group_id column. The options available for "Links All rows to" are Merge Status, Record_Id (merged) and Messages (merged). There is no option to include group_id and status together.
I somehow missed your reply until now, sorry about that.
The merge component outputs all columns that are mapped as input columns. So to have group_id in the output as well, adding it as an input column should make it available in the output.
Please note that there is an issue in 5.1.5 that prevents the dialog from updating the output datastreams until it has been closed and reopened, so you won't see that "group_id (Merged)" has been added until then.