Topic: Building a synonym catalog

Topic by
juan

2012-06-12
19:30

Hi everybody,

I am trying to build a text-file synonym catalog in order to standardize acronyms used for the legal structure of some companies. My problem is that some acronyms end with a comma or even a semicolon, yet the synonyms themselves have to be comma-separated. How can I define the synonym for the following example:

"Ltd,"

Thank you.

Reply by
kasper

2012-06-12
22:17
Hi Juan,

This is a limitation of the text-file synonym catalogs. Fortunately it is not a limitation of datastore-based synonym catalogs. So if you can represent your synonyms in a normalized format in a CSV file, a database or similar, then you should be able to work around it.

Here's your basic structure (in CSV format):

"master","synonym"
"Limited","Ltd"
"Limited","Ltd,"
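
To illustrate why the quoted layout above copes with a synonym like "Ltd,", here is a minimal Java sketch. It is not DataCleaner's own CSV reader; the class name SynonymCsv and its methods are hypothetical, and it only assumes the usual CSV convention of double-quoted fields with doubled quote characters inside them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only, not part of DataCleaner.
class SynonymCsv {

    // Wrap a value in double quotes and double any embedded quote characters,
    // so a comma or semicolon inside the value is not read as a separator.
    static String quote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Build one CSV line: master term first, then the synonym.
    static String row(String master, String synonym) {
        return quote(master) + "," + quote(synonym);
    }

    // Minimal parser for a single line: splits on commas outside quotes.
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote means a literal quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote of the field
                    }
                } else {
                    current.append(c);       // commas in here are plain data
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote of a field
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(row("master", "synonym"));
        System.out.println(row("Limited", "Ltd"));
        System.out.println(row("Limited", "Ltd,"));
        System.out.println(parse(row("Limited", "Ltd,")));
    }
}
```

The third row keeps the trailing comma inside the quoted synonym field, which a plain comma-separated text file cannot express.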

Hope it works out.

Reply by
juan

2012-06-13
19:47
Hi Kasper,

Thank you very much for the suggestion, it works fine. As a matter of principle: if I have a SQL table with two varchar columns that replicates the structure of the CSV file, is the quote character (i.e. " ") still required?

Furthermore, do you have an example of how to use regular expressions to replace certain strings?

By the way great software :-)

Thank you!

Reply by
kasper

2012-06-14
09:04
Hi Juan,

No, you don't need the quotes when using another kind of data storage. The example was just made with quoted CSV values so you could copy-paste it if you liked.

With regard to regexes - have you checked out the webcast called 'Regex parsing'? I think this might help you.

Reply by
juan

2012-06-14
16:22
Hi Kasper,

Unfortunately I always get an error when trying to use the same synonym catalog from another datasource (a table in MS SQL with the same values as in the CSV file, column types nvarchar and unquoted values). Anyway, I am happy that it works by way of CSV.

The Regex parsing webcast is helpful, thank you. Which regular expression syntax (or "flavor") is supported, e.g. JavaScript?

Thank you for your patience and support.

Best regards,
Juan

Reply by
kasper

2012-06-14
16:48
Hi Juan,

Sorry to hear about the error you get. Feel free to share more details - maybe it can be fixed. But good that it works with the CSV at least.

The regex "flavour" is Java:

http://docs.oracle.com/javase/tutorial/essential/regex/
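
As a rough illustration of that flavour, here is a small sketch using only the standard java.util.regex classes. It is plain Java, not a DataCleaner component; the class name LegalFormRegex, the method normalize and the pattern itself are assumptions made up for this example.

```java
import java.util.regex.Pattern;

// Hypothetical example class, not part of DataCleaner.
class LegalFormRegex {

    // "Ltd" at a word boundary, optionally followed by '.', ',' or ';',
    // and only when whitespace or the end of the input comes next.
    private static final Pattern LTD = Pattern.compile("\\bLtd[.,;]?(?=\\s|$)");

    // Replace every match with the standardized term "Limited".
    static String normalize(String input) {
        return LTD.matcher(input).replaceAll("Limited");
    }

    public static void main(String[] args) {
        System.out.println(normalize("Acme Ltd,"));
        System.out.println(normalize("Acme Ltd; Berlin"));
        System.out.println(normalize("Ltdx Corp"));
    }
}
```

Note that backslashes have to be doubled inside a Java string literal, so the regex \b is written as "\\b". The lookahead keeps a name such as "Ltdx" untouched.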

Reply by
juan

2012-06-14
17:02
Hi Kasper,

I don't know if this is going to work but here is the link to a screenshot of the error message:

http://dl.dropbox.com/u/385662/Screenshot.png

Reply by
juan

2012-06-14
17:23
Furthermore, I noticed that after adding this new datastore synonym catalog (from MS SQL) and then trying to edit it, I get the following error message:

http://dl.dropbox.com/u/385662/Screenshot2.png

Reply by
kasper

2012-06-15
09:35
Hi Juan,

Can you take a look in the log directory of DataCleaner and email me the most recent log? That would be great. Then at least I will have more details to write up a detailed bug report.

Just send it to kasper.sorensen@humaninference.com

BR,
Kasper

Reply by
kasper

2012-06-19
17:07
Hi Juan,

I have investigated your datastore synonym issue a bit more. Are you using the latest version of DataCleaner (version 2.5.2)? It seems to me that the SQL quoting issue is an old one that should have been fixed in this version.

Reply by
kasper

2012-06-21
13:02
Hi Juan,

If I am not correct in the above, can you tell me which JDBC driver you are using? I have just tried with version 4 of the MS JDBC driver. You could also try the JTDS driver which is bundled with DC.

Reply by
juan

2012-06-26
09:22
Hi Kasper,

Apologies for the delay. I can confirm that with the JTDS driver and DC version 2.5.2 the synonym catalog works fine. Thank you!

Furthermore I use version 4 of the JDBC driver....

Best regards,
Juan

Reply by
kasper

2012-06-26
10:05
GREAT! I am really glad to hear that. I was starting to wonder if there was some devil in the detail that I had missed :-)

Happy data cleaning then.
