FableFox

comparing company names

This topic is 2615 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


I'm making a simple app that compares company names from two different lists and links the same company under different spellings.

eg:

"Yahoo Public Limited Company" = "Yahoo PLC"
"Oracle Systems Bhd" = "Oracle"
"PT Indo Company TBK" = "Indo Company TBK PT"
"Trololo India" = "Trololo (I)"
"GE" = "General Electric"
"National Bank of India" = "India National Bank"
"Bank Hardcore" = "Hardcore Bank"
"MS" = "MicroSoft"

There are many ways, it seems, to write a company name. Sometimes, depending on the language, the words get shifted around, and there are long names and short names. So what is the best way to do this?

I was thinking about linked lists and a flag score, like this:

Oregon Fried Chicken
Oregon Fried Fish
Oregon Hospital
Kentucky Hospital.

Each company is a linked list in itself, and each word of its name is put into that list.

Oregon Fried Chicken Public Limited Company compared to Oregon Fried Chicken PLC = flag score of 3.
Oregon Fried Chicken Public Limited Company compared to Oregon Fried Fish Public Limited Company = score of 2. This is because Public Limited Company is considered a business word and is therefore ignored.
Oregon Fried Chicken compared to Oregon Hospital only gets a score of 1.
OFC compared to Kentucky Hospital gets a score of zero (nothing is similar at all).
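
The flag score described above can be sketched in a few lines of Python. This is only a rough illustration under assumptions: the `BUSINESS_WORDS` list and the function names `main_words` and `flag_score` are made up for this example, and a set stands in for the linked list, since only word membership matters for the score.

```python
import re

# Assumed starter list of "business words" to ignore; extend it as new forms turn up.
BUSINESS_WORDS = {
    "public", "private", "limited", "company", "plc", "ltd", "inc",
    "incorporated", "sdn", "bhd", "sendirian", "berhad", "pt", "tbk", "of",
}

def main_words(name):
    """Lower-case the name, split it into words, and drop the business words."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return {w for w in words if w not in BUSINESS_WORDS}

def flag_score(a, b):
    """Count the main words two names share, regardless of word order."""
    return len(main_words(a) & main_words(b))

print(flag_score("Oregon Fried Chicken Public Limited Company",
                 "Oregon Fried Chicken PLC"))
print(flag_score("OFC", "Kentucky Hospital"))
```

Because the score counts shared words and ignores their position, "National Bank of India" and "India National Bank" score the same as an exact match. The flip side is that treating "company" as a business word also strips it from a name like "PT Indo Company TBK", so the word list needs checking by hand.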

I plan to build a dictionary out of the scores and comparisons so that future searches are easy.

E.g., if a bank called Bank of America is called America Bank (due to grammar and all) in, let's say, China, any report from there using America Bank will be referred to as Bank of America. Thing is, even within the same country there are many variations, like PLC = Public Limited Company, or a bracketed country name: India vs (I) or even (India).

This is so that I can take a list of companies in Excel that came from one country and match it against a list of the same companies from another country.

So if one list contains

GE,$500
KFC,$400
Yahoo PLC,$200

and another file contains

Kentucky Fried Chicken,$600
Yahoo Private Limited Company,$100
General Electric,$1000

it can combine them into another Excel file:

Kentucky Fried Chicken,$1000
General Electric,$1500
Yahoo PLC,$300
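
As a minimal sketch of the combining step, assuming the links between spellings have already been verified by hand: the `ALIASES` table and the `combine` function are hypothetical names for this example, and the rows are adapted from the two sample lists above.

```python
from collections import defaultdict

# Hand-verified links: spelling seen in a report -> name to report under.
# These entries are illustrative, not a real master list.
ALIASES = {
    "GE": "General Electric",
    "KFC": "Kentucky Fried Chicken",
    "Yahoo Private Limited Company": "Yahoo PLC",
}

def combine(*lists):
    """Sum the amounts of all rows that resolve to the same company name."""
    totals = defaultdict(int)
    for rows in lists:
        for name, amount in rows:
            # Unknown spellings fall through unchanged so nothing is lost.
            totals[ALIASES.get(name, name)] += amount
    return dict(totals)

list_a = [("GE", 500), ("KFC", 400), ("Yahoo PLC", 200)]
list_b = [("Kentucky Fried Chicken", 600),
          ("Yahoo Private Limited Company", 100),
          ("General Electric", 1000)]

print(combine(list_a, list_b))
# {'General Electric': 1500, 'Kentucky Fried Chicken': 1000, 'Yahoo PLC': 300}
```

Reading the rows from and writing them back to Excel or CSV is left out here; only the merge logic is shown.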

Don't ask me why I have to do this, or why all those reports don't follow a standard company name (like a master list or something).
The dict will keep growing as it learns more "grammars" of how people write company names. But I will of course verify each comparison before it is added to the dict. And maybe tie it to the country too:
MAS in Malaysia = Malaysian Airline System
MAS in Singapore = Monetary Authority of Singapore.
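
One way to tie a link to a country is to key the dictionary on the pair (country, name as written). The sketch below is an assumption about how that could look; `LINKS` and `resolve` are hypothetical names, and `None` is used here to mean "valid for any country".

```python
# (country, name as written) -> verified full name.
# A country of None marks a link that holds everywhere.
LINKS = {
    ("Malaysia", "MAS"): "Malaysian Airline System",
    ("Singapore", "MAS"): "Monetary Authority of Singapore",
    (None, "GE"): "General Electric",
}

def resolve(name, country=None):
    """Prefer a country-specific link, then a global one, else return None."""
    for key in ((country, name), (None, name)):
        if key in LINKS:
            return LINKS[key]
    return None

print(resolve("MAS", "Malaysia"))
print(resolve("MAS", "Singapore"))
print(resolve("GE", "China"))
```

Returning `None` for an unknown pair (for example "MAS" with no country) leaves the decision to a human instead of guessing.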

Please help me here.

It can be either

VBA Excel (2007 or 2010)
VBA Access (2007 or 2010)
Visual Basic 2010 (free one)
C# 2010 (free one)
Combination of VB 2010 and office application sdk (excel or access, 2010 or 2007).

or

PureBasic + SQLite

It's up to you how much you want to help:

ideas, code, and whatnot, or maybe utilities / tools that can already do the above, if they exist.

Thanks.


There are many ways, it seems, to write a company name.


There is precisely one, single way to write a legal name of a registered company. That name is defined down to letter symbols and punctuation. So if you have "BigCo. Ltd", "BigCo Ltd" would not properly refer to said company, or at least is not proper naming of the intended company. It gets even more complicated when you have trademarks.

I plan to build a dictionary out of scores

I once had winter tires branded "Nokian", made by a company of the same name. They have absolutely nothing to do with phones or electronics.

If a bank called Bank of America is being called (due to grammar and all) America Bank, let say, In China

Bank of America must be called "Bank of America", in roman letters, regardless of language. Otherwise, there needs to exist some sort of affiliated company which conducts business inside China, which could use a different name.


Different spelling of names is commonly used by companies. Any non-trivial company today will have several, perhaps dozens and up to hundreds or thousands of different registered companies in different shapes and forms. Purposes range from dealing with local laws, franchising, side-effects of M&A and right up there to money laundering and tax evasion.

In short, a company name is an opaque blob. It cannot be changed in any way. If it's spelled incorrectly, for instance on a legal document, that document might no longer be binding, since it refers to a different company. So by definition, an algorithm that matches such names cannot exist.

Don't ask me why I have to do this, or why all those report doesn't follow a standard company name (like a master list or something).

If this is intended for corporate use, you're on a fool's errand. Not only can it not be done and must not be done, it will also mismatch too many cases. If it's for a pet project, look into the various text matching algorithms under Natural Language Processing.
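
For a pet project, one readily available text matcher is `difflib` from the Python standard library. The sketch below is only a stand-in for the broader family of matching algorithms; the `KNOWN` list, the `suggest` helper and the 0.8 cutoff are assumptions chosen for illustration.

```python
import difflib

# Hypothetical list of names that have already been verified.
KNOWN = ["Kentucky Fried Chicken", "General Electric", "Yahoo PLC"]

def suggest(name, known=KNOWN, cutoff=0.8):
    """Return the closest known name by character similarity, or None."""
    lowered = {k.lower(): k for k in known}
    hits = difflib.get_close_matches(name.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None

print(suggest("Kentucky Fryed Chicken"))  # a misspelling close to a known name
print(suggest("GE"))                      # an abbreviation, far from any known name
```

Character similarity catches misspellings but not abbreviations such as GE, and it can just as easily pair two different companies with similar names, so a suggestion like this is best treated as a candidate for manual review.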


Also related: a registered company name is rarely associated with the brand, and a company is rarely a single-address operation. So the majority of the names listed above are trademarks (which also cannot be changed in any way). The company itself, if it exists in a single place, will almost always be called something slightly different. The lower down the profit ladder one goes, the uglier it gets.

It's relatively simple to match a string against a predefined set of synonyms. If you are aiming to do this procedurally (i.e. by guessing which strings are synonymous) then you're on a fool's errand.

Maybe I didn't give enough information.

1) I only plan to compare names, not products etc., so it's name vs name (except the names come from different people / companies / countries).

2) There are many ways to write a legal name: PLC and Public Limited Company are both legal. Sdn Bhd, Sendirian Berhad and SB are all legal. This is the main reason all business words are removed from the comparison.

3) It's not for a legal process, it's for data comparison and mining. Ironically, while the samples above are a mix of popular and made-up names, they are all based on real-life cases.

4) The reason I'm building a dict tied to the right name (in my opinion, maybe the real legal name) is so that the next time I receive data, I already know what is what. So in the case of Bank of America, if I receive a report from China that writes it as America Bank, it no longer needs to be compared; there is already a direct link: if report country == china, america bank = bank of america.

5) The biggest problem is the business words (PT, of, SB, PLC, etc.), which is why I plan to keep the main words only. Every comparison will go past my eyes before being entered into the dictionary for future linking/comparison.

The current problem is that the names can't be compared string by string, because of business words (PLC and its long form) and because some reports use a short form (GE for General Electric). Of course I know GE IS General Electric from the address, website, etc. listed in the Excel file. What I need is comparison-and-combining software that can be told that and will remember it in the future. In other words, the dictionary looks like this:

Yahoo Inc = (Yahoo, Yahoo!, Yahoo Inc, Yahoo Incorporated)
General Electric = (GE)
MAS = (Malaysian Airline System, Malaysian Airlines)

So the number of names linked to a given name depends on the reports received, and it will keep on learning. And I will keep updating the software to fit the situation.
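
A dictionary that "keeps on learning" could be a single SQLite table. The sketch below uses Python's built-in `sqlite3` module purely as an illustration; the table layout and the helper names `open_dictionary`, `learn`, `lookup` and `variants_of` are assumptions, not an existing tool.

```python
import sqlite3

def open_dictionary(path=":memory:"):
    """Open (or create) the alias table; pass a file path to keep it between runs."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS alias ("
        " variant TEXT PRIMARY KEY,"   # spelling as seen in a report, lower-cased
        " canonical TEXT NOT NULL)"    # the name it was manually linked to
    )
    return db

def learn(db, variant, canonical):
    """Store a verified link; learning the same variant again overwrites it."""
    db.execute(
        "INSERT OR REPLACE INTO alias (variant, canonical) VALUES (?, ?)",
        (variant.lower(), canonical),
    )
    db.commit()

def lookup(db, variant):
    """Return the linked name for a spelling, or None if it is not known yet."""
    row = db.execute(
        "SELECT canonical FROM alias WHERE variant = ?", (variant.lower(),)
    ).fetchone()
    return row[0] if row else None

def variants_of(db, canonical):
    """List every spelling currently linked to one name."""
    rows = db.execute(
        "SELECT variant FROM alias WHERE canonical = ?", (canonical,)
    ).fetchall()
    return [r[0] for r in rows]

db = open_dictionary()
for spelling in ("Yahoo", "Yahoo!", "Yahoo Incorporated"):
    learn(db, spelling, "Yahoo Inc")
learn(db, "GE", "General Electric")
print(lookup(db, "ge"))
print(variants_of(db, "Yahoo Inc"))
```

A spelling that returns `None` from `lookup` is the cue to compare it by hand and then call `learn`, so the table only ever holds links that a person has approved.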

Currently I plan to use PureBasic. I have experience with it, it can be used to create portable applications, and its built-in SQLite is good for the dictionary. I'll just export the Excel sheet to a file it can read, edit it, and turn it into a file that Excel can import.

Maybe as a start I'll remove the "business names" (e.g. PLC).

Still, I'm reading up on VBA to see whether it's better that way.

Anyway, thanks for all the input.

I googled "Natural Language Processing" and it's overkill for my project. I understand what it's trying to do, but it's like using Oracle for something where SQLite is enough. You know, like using Oracle for software such as KeePass.

It's a hobby project to help with my work. It doesn't have to be perfect on day one. Any hour it can shave off is good enough. That is why the dictionary and the scores are important. The more names and variations I encounter, the smarter the software can be, and the more hours it can save.
