Company Name - Reg ex pattern

Hi there,

Is there a regex pattern we should follow to validate the company name before we trigger a search?

I see company names are more of mix of special characters however just concerned if we add XML tag or some character with different format (different language…)

Thanks.

The Companies House XML Gateway lists data formats in its data schema (available from Companies House XML Gateway Input - Schema Status) at:
http://xmlgw.companieshouse.gov.uk/v1-0/schema/chbase-v2-5.xsd

The Registered Company Name - minLength 1, maxLength 160

They capitalise company names (see Formatting of company names using all caps), but I believe search is case-insensitive here. In theory they’d normalise spacing e.g. convert multiple spaces to single but it seems this is not the case (Names with double spaces). I don’t know if the matching algorithm ignores number of spaces but suspect so.
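
For a client-side pre-check before calling search, something like the following might do. It is only a sketch: the 1 to 160 length bounds come from the schema above, while collapsing runs of spaces and upper-casing are my own assumptions about sensible normalisation, not documented Companies House behaviour. The function names are made up for illustration.

```python
# Length bounds taken from the "Registered Company Name" schema entry.
MIN_LEN = 1
MAX_LEN = 160


def normalise_name(name: str) -> str:
    """Trim, collapse whitespace runs to one space, and upper-case (assumed normalisation)."""
    return " ".join(name.split()).upper()


def has_valid_length(name: str) -> bool:
    """Check the name against the schema's minLength / maxLength."""
    return MIN_LEN <= len(name) <= MAX_LEN
```

Normalise first and then check the length, so a name made only of spaces is rejected as empty.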

company names are more of mix of special characters

Do you mean that they allow characters from the unicode range? Here’s an example:

FC011780 - this is COÖPERATIEVE RABOBANK U.A.

however just concerned if we add XML tag or some character with different format (different language…)

I’m not sure what you’re asking - the JSON input to search in the API is in the UTF-8 character set. The search seems to be reasonably sensible e.g. searching for “Cooperatieve Rabobank” (without umlaut o) brings up the company FC011780 above as the first item. I don’t know how far you can take this - but then I believe most company names are registered using standard ASCII characters.
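
If you want the same kind of tolerance in your own matching code, accent folding is one way to get it. This is a rough sketch and not a description of how Companies House search works internally; `fold_accents` is a hypothetical helper name.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Decompose characters (NFKD) and drop combining marks, so 'Ö' becomes 'O'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


registered = "COÖPERATIEVE RABOBANK U.A."
query = "Cooperatieve Rabobank"

# Compare case-insensitively after folding accents on both sides.
matches = fold_accents(registered).casefold().startswith(fold_accents(query).casefold())
```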

I don’t know what you’re referring to with the “XML tag” part. Is that something to do with the XML gateway? If so then maybe try posting to that forum.

Hi, Thanks for the reply.

Regarding XML tags, I meant: can the company name be something like “ABC <xyz>PQR123</xyz> PLC”?

Ah, understood. You’re issuing a “GET”, so the company name you search for should be URL-encoded, and any special characters you enter would be encoded the same way. I don’t know how CH would interpret e.g. UTF-8 characters. You can always test, e.g. get your browser / JavaScript to send something! My previous:

…wasn’t correct e.g. here you’re not sending in JSON, it’s url-encoded strings.
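
As a rough illustration of the URL-encoding, here is a sketch using Python's standard library. The base URL is a placeholder and the `q` parameter name is an assumption on my part, so check both against the API documentation before relying on them.

```python
from urllib.parse import urlencode

# Placeholder endpoint, not the real API host (assumption for illustration).
SEARCH_URL = "https://api.example.invalid/search/companies"


def build_search_url(company_name: str) -> str:
    """URL-encode the name as a query string; 'q' is an assumed parameter name."""
    return f"{SEARCH_URL}?{urlencode({'q': company_name})}"


url = build_search_url("ABC <xyz>PQR123</xyz> PLC")
```

`urlencode` encodes the string as UTF-8 and percent-escapes it, so angle brackets, ampersands and accented letters all end up as plain ASCII in the request.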

As to what data you may get out, the UTF-8 / JSON comment stands. As to odd characters which are present in company names, here’s a selection I’ve encountered:

&, quotes, dashes, punctuation, dots
07881928 - &OFFICES CANARY WHARF LIMITED
FC004087 - “EL AL” ISRAEL AIRLINES LIMITED
02464812 - LLOYD’S OF LONDON (CASSIDY MEMBERS) NOMINEES LIMITED
08013348 - .CO LTD
08209882 - @ LTD
10449726 - — LTD
08209948 - ! LTD
05063820 - ? LTD
05062633 - * LTD
03237285 - … IN FOR A £ LIMITED
11641085 - RS.SAI LIMITED

Some companies are clearly tests - you are escaping characters, right?

05060411 - \ COMPANY LTD
08804157 - SAFDASD & SFSAF ’ SFDAASF" LTD

…especially for your database:

10219186 - - LTD
10542519 - ; DROP TABLE “COMPANIES”;-- LTD
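
On the database side, the usual defence is to pass names as bound parameters instead of building SQL strings. A minimal sketch with SQLite follows; the table layout and the `store_company` helper are invented for the example.

```python
import sqlite3


def store_company(conn: sqlite3.Connection, number: str, name: str) -> None:
    """Insert using placeholders so the name is treated as data, never as SQL."""
    conn.execute(
        "INSERT INTO companies (number, name) VALUES (?, ?)",
        (number, name),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (number TEXT PRIMARY KEY, name TEXT)")

# The awkward name from the list above is stored verbatim and the table survives.
store_company(conn, "10542519", "; DROP TABLE “COMPANIES”;-- LTD")
stored = conn.execute(
    "SELECT name FROM companies WHERE number = ?", ("10542519",)
).fetchone()[0]
```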

Apparently there were some “long” names (although no examples are given) - but 160 characters is the limit you get back anyway:

There were some issues with odd characters in names but now fixed:

Double spaces in names:

Other notes and queries:

Partnerships can be dissolved and then return with the same name:

Plus a few historical exceptions which will remain as duplicates:

P.S. it seems this question was asked before but not answered:

This is the legislation covering allowable characters:
http://www.legislation.gov.uk/uksi/2009/1085/made

I never knew these were called “solidus”: \ /

Some more quirky names:
11363219 ; (they have exemption from use of Limited in name)
07846401 AUS %U2013 SERVICE LIMITED - did someone urlencode the n-dash by mistake when incorporating it?
11338331 COCO &AMP;AMP; ROX LIMITED - more encoding issues?
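
If you want to see what those two names probably looked like before the encoding went wrong, something like this would undo it. It is a guess at the cause (a JavaScript-style `%uXXXX` escape and a double HTML escape), not a confirmed explanation, and both helper names are hypothetical.

```python
import html
import re


def undo_html_escapes(name: str) -> str:
    """Apply html.unescape repeatedly until the text stops changing."""
    previous = None
    while previous != name:
        previous, name = name, html.unescape(name)
    return name


def undo_percent_u(name: str) -> str:
    """Replace non-standard %uXXXX escapes with the character they encode."""
    return re.sub(
        r"%[uU]([0-9A-Fa-f]{4})",
        lambda m: chr(int(m.group(1), 16)),
        name,
    )
```

You would only use this for display or matching on your side; the registered name is still whatever the register says it is.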

We’ve not found any emoji-names yet, probably only a matter of time…

Here’s a list of ampersand problems that we have found:

And a few more with other marks: quotation, plus, apostrophe:

Wondering how the people at ; (formerly ; LIMITED) pronounce their company name and could only think of:

Nobody’s perfect. Search result looks OK :


…but (how it appears in the API data too, including search):

I’ll let them know…
Chris

Just as a by-the-way: some of the companies listed by @dmw will not be found in the API, as they were dissolved more than 6 years ago (you can still see them in webCHeck).