Topic: Building a synonym catalog
Hi everybody,
I am trying to build a text-file synonym catalog in order to standardize the acronyms used for the legal structure of some companies. My problem is that some acronyms end with a comma or even a semicolon, yet the synonyms themselves have to be comma-separated. How can I define the synonym for the following example:
"Ltd,"
Thank you.
Hi Juan,
This is a limitation of text-file synonym catalogs. Fortunately, it is not a limitation of datastore-based synonym catalogs. So if you can represent your synonyms in a normalized format, for instance in a CSV file or a database table, then you should be able to work around it.
Here's your basic structure (in CSV format):
"master","synonym"
"Limited","Ltd"
"Limited","Ltd,"
Hope it works out.
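If the list of synonyms grows, a file in the structure above could also be generated instead of typed by hand. Here is a minimal Python sketch of that idea. The function names and the "Ltd;" row are my own assumptions for illustration; only the two "Limited" rows come from the example. Every field is quoted, so a trailing comma or semicolon stays inside its value:

```python
import csv
import io

# (master term, synonym) pairs. The "Ltd;" row is a hypothetical addition
# for the semicolon case; the other two rows are from the example above.
PAIRS = [
    ("Limited", "Ltd"),
    ("Limited", "Ltd,"),
    ("Limited", "Ltd;"),
]


def render_synonym_csv(pairs):
    """Return the master/synonym table as CSV text with every field quoted."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(["master", "synonym"])
    writer.writerows(pairs)
    return buffer.getvalue()


def parse_synonym_csv(text):
    """Read the CSV text back into a synonym -> master lookup."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return {row["synonym"]: row["master"] for row in reader}


if __name__ == "__main__":
    # Save this output to a file and register that file as a CSV datastore.
    print(render_synonym_csv(PAIRS), end="")
```

Reading the text back with `parse_synonym_csv` is just a quick way to check that "Ltd," survived the round trip as a single value.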
Hi Kasper,
Thank you very much for the suggestion, it works fine. As a matter of principle: if I have a SQL table with two columns of type varchar that replicates the structure of the CSV file, is the quote character (") still required?
Furthermore, do you have an example of how to use regular expressions to replace certain strings?
By the way great software :-)
Thank you!
Hi Juan,
No, you don't need the quotes when using another kind of data storage. The example just used quoted CSV values so that you could copy-paste it if you liked.
With regard to regexes - have you checked out the webcast called 'Regex parsing'? I think this might help you.
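On the database point: in a table the values are stored as plain strings, so the CSV quote characters simply are not part of the data. A minimal sketch of that, using Python's built-in sqlite3 as a stand-in for your actual database. The table name `synonyms` and the column names are assumptions for illustration, not something DataCleaner prescribes:

```python
import sqlite3


def build_synonym_table(pairs):
    """Create an in-memory two-column synonym table and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE synonyms (master VARCHAR(100), synonym VARCHAR(100))"
    )
    # Parameter binding stores each value verbatim: no CSV-style quotes,
    # and a trailing comma is just another character in the string.
    conn.executemany(
        "INSERT INTO synonyms (master, synonym) VALUES (?, ?)", pairs
    )
    conn.commit()
    return conn


def lookup_master(conn, synonym):
    """Return the master term for a synonym, or None if it is not listed."""
    row = conn.execute(
        "SELECT master FROM synonyms WHERE synonym = ?", (synonym,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = build_synonym_table([("Limited", "Ltd"), ("Limited", "Ltd,")])
    print(lookup_master(conn, "Ltd,"))
```

If the quotes were inserted into the table as part of the value, a lookup for the unquoted string would no longer match, which is why they should be left out.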
Hi Kasper,
Unfortunately I always get an error when trying to use the same synonym catalog from another datastore (a table in MS SQL with the same values as in the CSV file, column types nvarchar and unquoted values). Anyway, I am happy that it works by way of CSV.
The Regex parsing webcast is helpful, thank you. Which regular expression syntax (or "flavor") is supported, e.g. JavaScript?
Thank you for your patience and support.
Best regards,
Juan
Hi Juan,
Sorry to hear about the error you get. Feel free to share more details - maybe it can be fixed. But good that it works with the CSV at least.
The regex "flavour" is Java:
http://docs.oracle.com/javase/tutorial/essential/regex/
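As a small illustration of regex-based replacement, the sketch below strips trailing commas and semicolons from a value, which could be an alternative to listing every punctuated variant as its own synonym. It is written in Python only to keep it short and runnable. The pattern sticks to constructs (a character class, `+` and the `$` anchor) that work the same way in Java's regex engine for simple inputs like these. The function name and sample values are hypothetical:

```python
import re

# One or more commas/semicolons at the very end of the value.
TRAILING_PUNCTUATION = re.compile(r"[,;]+$")


def strip_trailing_punctuation(value):
    """Remove trailing commas and semicolons; leave everything else alone."""
    return TRAILING_PUNCTUATION.sub("", value)


if __name__ == "__main__":
    for sample in ["Ltd,", "Ltd;", "Smith, Jones Ltd"]:
        print(repr(sample), "->", repr(strip_trailing_punctuation(sample)))
```

In plain Java the same replacement would be `value.replaceAll("[,;]+$", "")`. Commas inside the value, as in the last sample, are not touched because the pattern is anchored to the end.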
Hi Kasper,
I don't know if this is going to work but here is the link to a screenshot of the error message:
[http://dl.dropbox.com/u/385662/Screenshot.png]
Furthermore I noticed that after adding this new datastore synonym catalog (from MS SQL) and subsequently trying to edit it I get the following error message:
http://dl.dropbox.com/u/385662/Screenshot2.png
Hi Juan,
Can you take a look in the log directory of DataCleaner? If you can email me the most recent log, that would be great. Then at least I have more details to work out a detailed bug report.
Just send it to kasper.sorensen@humaninference.com
BR,
Kasper
Hi Juan,
I have investigated your datastore synonym issue a bit more. Are you using the latest version of DataCleaner (version 2.5.2)? It seems to me that the SQL quoting issue is an old one that should have been fixed in this version.
Hi Juan,
If I am not correct in the above, can you tell me which JDBC driver you are using? I have just tried with version 4 of the MS JDBC driver. You could also try the jTDS driver, which is bundled with DC.
Hi Kasper,
Apologies for the delay. I can confirm that with the jTDS driver and DC version 2.5.2 the synonym catalog works fine. Thank you!
Furthermore, I use version 4 of the JDBC driver.
Best regards,
Juan
GREAT! I am really glad to hear that, was starting to wonder if there was some devil in the detail that I had missed :-)
Happy data cleaning then.
