Encoding error in basic company info data (UTF-8)

Hi all, there seems to be an issue with BasicCompanyDataAsOneFile-2023-08-01.zip (443Mb), downloaded on 18th Aug 2023.

Here is an example (company number 13944857): the special character has become the byte 3f, which is not its correct UTF-8 encoding, so it shows up as a question mark.

I checked the encoding, and past releases had the correct bytes, “e2 80 99”, for the special character (that is the UTF-8 encoding of the right single quotation mark ’, U+2019).
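
For anyone who wants to reproduce the two byte patterns, here is a minimal Python sketch. The company name in it is made up, and the lossy ASCII conversion is only one plausible way to end up with 3f; it is an assumption, not a description of what actually happened in the bulk file process.

```python
# Hypothetical company name containing U+2019 (right single quotation mark).
name = "Bob\u2019s Ltd"

# Correct UTF-8: the quotation mark becomes the three bytes e2 80 99.
utf8_bytes = name.encode("utf-8")
print(utf8_bytes.hex(" "))

# Assumed failure mode: converting to an encoding that lacks the character,
# with errors="replace", substitutes a literal '?' (byte 3f).
lossy_bytes = name.encode("ascii", errors="replace")
print(lossy_bytes.hex(" "))
print(lossy_bytes.decode("ascii"))
```

Once the character has been replaced by 3f, the original cannot be recovered from the file itself, which is why the file needs to be regenerated at the source.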

Does anyone else see the same issue, or have any advice? Could it be fixed, please? Thanks.

Sorry if it’s too technical. To put it simply, the company name of 13944857 in the bulk file differs from the one shown on the website (the special character).

Cheng

Hello @wccheng, I see the same issue (with ’s appearing as ?s). It appears to be fairly widespread in the August 2023 BasicCompanyDataAsOneFile file, affecting around 7300 data rows. The issue is not limited to the Company Name; it can occur in any field in the feed.

The same issue is also present in the ‘Company data as multiple files’ download (I tested part 2).

For now, I will code around this issue. Hopefully the site admins will pick up on it next week.
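
As a rough illustration of how affected rows could be flagged, here is a minimal Python sketch. It is a heuristic under my own assumptions, not the method used for the count above: it treats a '?' with a letter or digit on both sides (as in "BOB?S") as suspect, so it will miss replaced characters next to spaces and may flag genuine question marks. The sample rows and column names are invented for the example.

```python
import csv
import io
import re

# Heuristic: a '?' with a word character on each side, e.g. "BOB?S".
SUSPECT = re.compile(r"(?<=\w)\?(?=\w)")

def suspect_rows(text_stream):
    """Yield (line_number, [field_indexes]) for rows with a suspect '?'."""
    reader = csv.reader(text_stream)
    next(reader, None)  # skip the header row
    for line_number, row in enumerate(reader, start=2):
        hits = [i for i, field in enumerate(row) if SUSPECT.search(field)]
        if hits:
            yield line_number, hits

# Invented sample data; the real file has many more columns.
sample = io.StringIO(
    "CompanyName,CompanyNumber\n"
    "BOB?S BAKERY LTD,00000001\n"
    "WHO KNOWS? LTD,00000002\n"
)
print(list(suspect_rows(sample)))
```

Only the first data row is flagged here, because the '?' in the second row is followed by a space and looks like a genuine question mark.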

Good spot - thank you. :+1:


Yes, we are aware of the encoding issue. A change was made to the process that creates these bulk files and the encoding was not correct.
We have corrected that and will re-run the bulk files.


Much appreciated, everyone (@mh.hunt @MArkWilliams)!

The issue has been resolved and the file has been regenerated.
The frontend processing scripts are running now, so the file should be available within the next 20 minutes, hopefully around 09:30 today (Tue 22nd Aug).

The encoding of the new release looks correct now, at least on my computer. Thank you so much!
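
In case it helps others double-check a release, here is a minimal Python sketch of one possible check. It assumes the extracted CSV is read as raw bytes in chunks. Note that the broken file would still pass a plain UTF-8 validity test (3f is a valid byte), so the sketch also counts non-ASCII characters; a count of zero across the whole file would be a warning sign. The function name is my own.

```python
import codecs

def scan_utf8(chunks):
    """Return (is_valid_utf8, non_ascii_char_count) for an iterable of byte chunks."""
    # The incremental decoder handles characters split across chunk boundaries.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    non_ascii = 0
    try:
        for chunk in chunks:
            text = decoder.decode(chunk)
            non_ascii += sum(1 for ch in text if ord(ch) > 127)
        decoder.decode(b"", final=True)  # fail on a truncated trailing sequence
    except UnicodeDecodeError:
        return False, non_ascii
    return True, non_ascii

# The apostrophe bytes e2 80 99 are deliberately split across two chunks.
print(scan_utf8([b"Bob\xe2\x80", b"\x99s Ltd"]))
# A name where the character was already replaced by '?'.
print(scan_utf8([b"Bob?s Ltd"]))
```

To run it on a real file, the chunks could come from something like `iter(lambda: f.read(1 << 20), b"")` on a file opened in binary mode.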