I am looking at the Document API response and see that the “resources” parameter defines the formats in which a document can be downloaded.
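To illustrate the kind of check involved, here is a minimal sketch. It assumes the document metadata response is JSON with a "resources" object keyed by content type; the sample payloads and the helper names `available_formats` and `tally_formats` are made up for illustration and are not real API output or part of any Companies House client.

```python
from collections import Counter


def available_formats(metadata):
    """Return the content types listed under "resources" in one metadata response."""
    # Assumption: "resources" is a JSON object whose keys are content types.
    return sorted(metadata.get("resources", {}))


def tally_formats(metadata_list):
    """Count how many documents offer each content type."""
    counts = Counter()
    for metadata in metadata_list:
        counts.update(available_formats(metadata))
    return counts


# Hypothetical sample payloads, not real responses.
sample = [
    {"resources": {"application/pdf": {"content_length": 12345}}},
    {
        "resources": {
            "application/pdf": {"content_length": 23456},
            "application/xhtml+xml": {"content_length": 3456},
        }
    },
]

for content_type, count in sorted(tally_formats(sample).items()):
    print(content_type, count)
```

Running the same tally over the metadata of a larger batch of filings would give the share of documents that offer each format.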
I analysed around 2,000 filings and found that the majority of them are in PDF format. Around a quarter of them have XML+XHTML present. So, my questions are:
- Why are so few filings available in a non-PDF format? What determines the content formats available for a filing?
- If a non-PDF format is not available for a filing now, will it be made available in the future? Do you have any process to convert PDFs to a non-PDF format and make the result available in the Document API?
- I also came across the daily files that Companies House makes available for download. How can I link an individual file from the zip to the response of the filings API or the Document API if I want more metadata about the filing?
Thanks in advance.