Document API ignoring requested document format

alex_g · November 12, 2019, 1:36pm

When querying the Document API with the header 'Accept: application/xhtml+xml', it often returns a PDF, essentially ignoring the requested document type. My current workaround is below (in Python); however, it requires multiple attempts (set to 4, but it has taken up to 8 on occasion), and given the rate limiting this is very time-inefficient. Thus I have three questions:

1. Is there a better way of doing this?
2. Has anyone else encountered this?
3. (To our friends in the Companies House dev team) Can this be investigated? It would reduce the server load if we didn't have to send the same request multiple times.
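One thing worth ruling out before retrying at all: as far as I understand it, the Document API also has a metadata endpoint whose JSON lists the formats actually available for a document under a `resources` key, keyed by content type. If XHTML isn't listed for a given filing, no number of retries will produce it. Below is a minimal sketch that picks the Accept value from that metadata; the `resources` shape is an assumption on my part, and `available_formats`, `choose_accept` and the sample dicts are hypothetical names/data for illustration only.

```python
from typing import Any, Mapping, Set

XHTML = "application/xhtml+xml"
PDF = "application/pdf"


def available_formats(metadata: Mapping[str, Any]) -> Set[str]:
    """Content types listed under the (assumed) 'resources' key of the
    document metadata; an empty set if the key is missing."""
    resources = metadata.get("resources") or {}
    return set(resources.keys())


def choose_accept(metadata: Mapping[str, Any],
                  preferred: str = XHTML,
                  fallback: str = PDF) -> str:
    """Request the preferred type only if the metadata says it exists,
    otherwise fall back (e.g. to PDF) instead of retrying blindly."""
    return preferred if preferred in available_formats(metadata) else fallback


# Hypothetical metadata: one filing with only a PDF, one with both formats
pdf_only = {"resources": {PDF: {"content_length": 12345}}}
both = {"resources": {PDF: {}, XHTML: {}}}

print(choose_accept(pdf_only))  # application/pdf
print(choose_accept(both))      # application/xhtml+xml
```

If this assumption holds, documents with no XHTML rendition can be skipped or fetched as PDF straight away, which leaves retries only for the genuinely inconsistent responses.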

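```python
import time

MAX_ATTEMPTS = 4  # has occasionally needed up to 8

request_params = {'Accept': 'application/xhtml+xml'}
n_api_attempts = 0
wrong_content_type = True
while wrong_content_type and n_api_attempts < MAX_ATTEMPTS:
    n_api_attempts += 1

    # My own wrapper around the Document API request; it returns a response
    # object with .status_code and .headers
    doc_output = Companies_House_Api.query_document_api(docs_url, api_key, doc_id, request_params)

    # Only stop once the API has actually honoured the requested format
    if doc_output.status_code == 200:
        if doc_output.headers.get('Content-Type') == 'application/xhtml+xml':
            wrong_content_type = False

    # Wait briefly before the next attempt (the API is rate limited)
    if wrong_content_type and n_api_attempts < MAX_ATTEMPTS:
        print('retrying...')
        time.sleep(1)
```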

Note: this code works, so there is no problem with the querying function; you request exactly the same thing multiple times until you get the right answer.
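A slightly more general version of the same workaround, in case it helps anyone else hitting this. The retry loop sits behind a callable so it can wrap whatever query function you already use; the Content-Type check ignores parameters such as '; charset=utf-8' (I haven't confirmed whether the API ever sends those); and the wait doubles between attempts, up to a cap, to be gentler on the rate limit. `fetch_until_type`, `media_type` and the `(status, content_type, body)` shape returned by `fetch` are my own hypothetical names, not part of any Companies House library.

```python
import time
from typing import Callable, Optional, Tuple

FetchResult = Tuple[int, Optional[str], bytes]  # (status, Content-Type, body)


def media_type(content_type: Optional[str]) -> str:
    """Drop parameters like '; charset=utf-8' and normalise case."""
    if not content_type:
        return ""
    return content_type.split(";", 1)[0].strip().lower()


def fetch_until_type(fetch: Callable[[], FetchResult],
                     wanted: str,
                     max_attempts: int = 8,
                     base_delay: float = 1.0,
                     max_delay: float = 4.0,
                     sleep: Callable[[float], None] = time.sleep):
    """Call fetch() until it returns HTTP 200 with the wanted media type.

    Returns (body, attempts_used) on success, or (None, max_attempts) if
    every attempt came back with the wrong type. The wait after each failed
    attempt doubles from base_delay and is capped at max_delay."""
    for attempt in range(1, max_attempts + 1):
        status, content_type, body = fetch()
        if status == 200 and media_type(content_type) == wanted.lower():
            return body, attempt
        if attempt < max_attempts:
            sleep(min(base_delay * 2 ** (attempt - 1), max_delay))
    return None, max_attempts


# Simulated responses: two PDFs, then the XHTML we asked for
responses = iter([(200, "application/pdf", b"%PDF"),
                  (200, "application/pdf", b"%PDF"),
                  (200, "application/xhtml+xml; charset=utf-8", b"<html/>")])
body, attempts = fetch_until_type(lambda: next(responses),
                                  "application/xhtml+xml",
                                  sleep=lambda s: None)  # skip real waits here
print(body, attempts)  # b'<html/>' 3
```

The `sleep` parameter is only there so the loop can be exercised without actually waiting; in real use leave it as `time.sleep`. The capped backoff is a trade-off: it spaces requests out further than a flat 1 s wait, so tune `base_delay`/`max_delay` to whatever the rate limit allows.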

Cheers