The following will be a big issue for us going forward: if it’s not resolved we’re going to need to buy new internal PDF components (and those aren’t cheap) and then do all the coding, which touches a lot of legacy systems, to switch from one to the other.
Ignore references to attachments as this is copied from an email:
On the developer forum someone commented that some files do not render in Firefox (Doc images do not render in FireFox). Chris Smith responded that this was a known issue related to Firefox’s PDF rendering and it was being investigated.
However, I’ve been noticing some odd results with some files that suggest to me that this might be a bigger problem. For me it shows up as errors when trying to manipulate documents using a third-party PDF component (Aspose.pdf). It’s worth noting that this is not universal: I’ve tried some other third-party services that don’t have problems with these files.
It’s also worth noting that not all images have problems rendering in Firefox. Some are fine. I can’t say for certain but I think the distinction is between documents that have been filed electronically and documents that have been scanned.
For the purpose of this I’m looking at two documents from the filing history of XYZ LIMITED (02533344): https://beta.companieshouse.gov.uk/company/02533344/filing-history
The first is the first item in the filing history, an annual return submitted electronically on the first of June. I’m going to call this DOCUMENT E.
The second is the second item in the filing history, an MR04 Satisfaction of Charge submitted on paper in mid-May this year and scanned. I’m going to call this DOCUMENT S.
DOCUMENT E
Renders in Chrome
Renders in IE
Does not render in Firefox
Downloaded from beta API - Crashes Aspose.pdf
Downloaded from CH Direct – Does not crash Aspose.pdf
DOCUMENT S
Renders in Chrome
Renders in IE
Renders in Firefox
Downloaded from beta API - Does not crash Aspose.pdf
Downloaded from CH Direct – Does not crash Aspose.pdf
I think the last results are the key. The version of Document E from the API causes problems; the version from CH Direct does not. I’m calling these Eapi and Edirect and have attached them to this email.
There’s not much difference between them. They differ by a few bytes, and they were produced by different versions of libtiff/tiff2pdf.
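For anyone wanting to repeat the comparison, here is a minimal Python sketch of how two downloads of the same document could be compared. It is only an illustration: the function names are made up for this post, and the /Producer lookup assumes the Info dictionary is stored as plain, unencrypted text, which will not hold for every PDF.

```python
import re
import sys
from typing import Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One file is a prefix of the other: they differ where the shorter ends.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def producer(data: bytes) -> Optional[str]:
    """Best-effort read of a literal /Producer string (assumes plain text)."""
    m = re.search(rb"/Producer\s*\(([^)]*)\)", data)
    return m.group(1).decode("latin-1") if m else None


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_pdfs.py Eapi.pdf Edirect.pdf
    with open(sys.argv[1], "rb") as f1, open(sys.argv[2], "rb") as f2:
        a, b = f1.read(), f2.read()
    print("sizes:", len(a), len(b))
    print("first differing offset:", first_difference(a, b))
    print("producers:", producer(a), "|", producer(b))
```

Because the two files differ in length, everything after the first inserted or removed byte will be shifted, so the first differing offset is the most useful number here.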
I ran them both (as well as the API version of Document S) through the PDF validator at pdf-tools.com. The results are meaningless to me as I don’t speak PDF, but it’s clear that neither the scanned API document nor the electronically filed CH Direct document has the issues that the electronically filed API document has.
Validating file “Sapi.pdf” [SCANNED DOCUMENT DOWNLOADED THROUGH API] for conformance level pdfa-1b
The key Metadata is required but missing.
A device-specific color space (DeviceGray) without an appropriate output intent is used.
The document does not conform to the requested standard.
The document contains device-specific color spaces.
The document’s meta data is either missing or inconsistent or corrupt.
Done.
Validating file “Edirect.pdf” [ELECTRONIC DOC DOWNLOADED THROUGH CH DIRECT] for conformance level pdfa-1b
The key Metadata is required but missing.
A device-specific color space (DeviceGray) without an appropriate output intent is used.
The document does not conform to the requested standard.
The document contains device-specific color spaces.
The document’s meta data is either missing or inconsistent or corrupt.
Done.
Validating file “Eapi.pdf” [ELECTRONIC DOC DOWNLOADED THROUGH API] for conformance level pdfa-1b
The ‘xref’ keyword was not found or the xref table is malformed.
The file trailer dictionary is missing or invalid.
The key Metadata is required but missing.
The “Length” key of the stream object is wrong.
The separator before ‘endstream’ must be an EOL. (6)
A device-specific color space (DeviceGray) without an appropriate output intent is used.
The “Length” key of the stream object is wrong.
The “Length” key of the stream object is wrong.
The “Length” key of the stream object is wrong.
The “Length” key of the stream object is wrong.
The “Length” key of the stream object is wrong.
The document does not conform to the requested standard.
The file format (header, trailer, objects, xref, streams) is corrupted.
The document contains device-specific color spaces.
The document’s meta data is either missing or inconsistent or corrupt.
Done.
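The structural errors reported only for Eapi.pdf (missing xref, missing trailer, wrong stream “Length”) can be roughly checked by hand. Below is a Python sketch of such a check, under some loud assumptions: it is not a validator, it only understands classic xref tables (not xref streams), it only looks at /Length values written as direct numbers, and the function name is hypothetical.

```python
import re
from typing import List


def check_structure(data: bytes) -> List[str]:
    """Return a list of structural problems found in the raw PDF bytes."""
    problems = []

    # The last 'startxref' gives the byte offset of the cross-reference section.
    last = None
    for last in re.finditer(rb"startxref\s+(\d+)", data):
        pass
    if last is None:
        problems.append("no startxref found")
    else:
        offset = int(last.group(1))
        # A classic xref table starts with the keyword 'xref' at that offset.
        if data[offset:offset + 4] != b"xref":
            problems.append(
                f"startxref points to {offset}, but no 'xref' keyword there"
            )

    if b"trailer" not in data:
        problems.append("no trailer keyword found")

    # For each stream with a direct /Length, the data should be followed
    # by an optional end-of-line and then the keyword 'endstream'.
    pattern = rb"/Length\s+(\d+)\b(?!\s+\d+\s+R)[^>]*?>>\s*stream\r?\n"
    for s in re.finditer(pattern, data):
        length = int(s.group(1))
        end = s.end() + length
        tail = data[end:end + 11].lstrip(b"\r\n")
        if not tail.startswith(b"endstream"):
            problems.append(
                f"stream at byte {s.end()}: /Length {length} "
                "does not end at 'endstream'"
            )
    return problems
```

Calling `check_structure(open("Eapi.pdf", "rb").read())` should return an empty list for a structurally sound file and one message per problem otherwise. If the byte offsets and lengths are all wrong by small amounts, that would at least be consistent with the file having been altered after it was generated, though this sketch cannot say how.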
Based on all that, my tentative conclusion is that there is an issue with PDF documents generated by the new API system where the original form was submitted electronically.
This issue shows up as some kind of corruption/non-conformance in some PDF tools (Firefox, Aspose.pdf and the pdf-tools.com validator) but not others (Chrome, IE, Adobe).
The same issue does not seem to affect documents generated by the new API when the original document was scanned, or documents downloaded through CH Direct.
Hope that helps troubleshoot it, as this has the potential to be a major issue for us.