Latest Article

  • Looking at Companies House company profile stream data

    • Streaming process to extract data from the endpoint
      • Use an HTTP library to send the GET request and handle the response from the server, e.g. response = requests.get(url, auth=(api_key, ""), stream=True)
    • Process the response data and save it to persistent storage. There are many options:
      • Kafka
      • a relational database such as PostgreSQL
      • cloud services such as Google Cloud Storage and BigQuery, or AWS DynamoDB and S3
    • Deploy the streaming ETL script on your local server or with your preferred cloud provider, using serverless functions such as Google Cloud Functions or AWS Lambda.
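The streaming and storage steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the script used for this article: the stream URL, the `events` table, and the helper names `parse_stream_lines`, `save_events` and `stream_company_profiles` are all assumptions for illustration, and SQLite simply stands in for whichever storage option you pick. The stream is assumed to deliver one JSON document per line, with blank keep-alive lines in between.

```python
import json
import sqlite3

# Assumed endpoint for the company profile stream; check the API reference.
STREAM_URL = "https://stream.companieshouse.gov.uk/companies"


def parse_stream_lines(lines):
    """Yield one decoded event per non-empty line of the response."""
    for line in lines:
        if not line:  # skip blank keep-alive lines
            continue
        yield json.loads(line)


def save_events(conn, events):
    """Store each event as raw JSON next to its resource_id; return the count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (resource_id TEXT, body TEXT)"
    )
    count = 0
    for event in events:
        conn.execute(
            "INSERT INTO events VALUES (?, ?)",
            (event.get("resource_id"), json.dumps(event)),
        )
        conn.commit()  # commit per event, since the stream may never end
        count += 1
    return count


def stream_company_profiles(api_key, db_path="events.db"):
    """Open the stream and persist events until the connection closes."""
    import requests  # third-party; imported here so the helpers work without it

    with requests.get(STREAM_URL, auth=(api_key, ""), stream=True) as response:
        response.raise_for_status()
        conn = sqlite3.connect(db_path)
        try:
            return save_events(conn, parse_stream_lines(response.iter_lines()))
        finally:
            conn.close()
```

Keeping the parsing and saving helpers separate from the network call makes them easy to try against a few sample lines before pointing the script at the live stream.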
    SQLite 
    sqlite3 -header -csv ./data/shared-local-instance.db "select * from company_profile;" > company_profile.csv
    
    or using the AWS command line tool
    aws dynamodb scan --table-name company_profile --endpoint-url http://localhost:8000   --max-items 2 --output json > ./export.json
    DynamoDB output
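The SQLite export shown above can also be done from Python with the standard library. This is only a sketch: `export_table_to_csv` is a hypothetical helper, and the database path and table name in the usage note are taken from the command above.

```python
import csv
import sqlite3


def export_table_to_csv(db_path, table, csv_path):
    """Write every row of `table` to `csv_path`, header row first."""
    conn = sqlite3.connect(db_path)
    try:
        # Table names cannot be bound as parameters, so quote the identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        cursor = conn.execute(f"SELECT * FROM {quoted}")
        headers = [column[0] for column in cursor.description]
        with open(csv_path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(headers)
            writer.writerows(cursor)
    finally:
        conn.close()
    return headers
```

For example, `export_table_to_csv("./data/shared-local-instance.db", "company_profile", "company_profile.csv")` would mirror the `sqlite3 -header -csv` command.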
    • Extract data from DynamoDB Local using the AWS command line tool, or from the SQLite database, e.g. aws dynamodb scan --table-name company_profile --endpoint-url http://localhost:8000 --max-items 2 --output json > ./export.json
    • Read the exported file and convert it to normal JSON format
    • Create a dataframe for further use or data analysis with Pandas or Spark (provided you are running a Spark session)
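    # map DynamoDB type descriptors to Python converters
    DATA_TYPES = {
      "S": str,                  # string
      "N": str,                  # number, kept as the string DynamoDB returns
      "B": str,                  # binary, a base64-encoded string
      "BOOL": bool,              # boolean
      "NULL": lambda x: None,    # null
      "L": lambda x: cast_value_by_type(x),
      "M": lambda x: cast_value_by_type(x),
    }
    
    # convert DynamoDB json to normal JSON
    def cast_value_by_type(raw):
      if isinstance(raw, list):
        return [cast_value_by_type(item) for item in raw]
    
      if isinstance(raw, dict):
        # a single-key dict whose key is a type descriptor wraps one typed value
        if len(raw) == 1:
          key, value = next(iter(raw.items()))
          if key in DATA_TYPES:
            return DATA_TYPES[key](value)
        # otherwise it is a plain mapping: convert each of its values
        return {key: cast_value_by_type(value) for key, value in raw.items()}
    
      # scalars (str, int, bool, None) pass through unchanged
      return raw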
    import json
    import pandas as pd
    
    # read DynamoDB JSON file
    file_path = "/content/export.json"
    with open(file_path, 'r') as f:
      dynamo_data = json.load(f)
    
    # DynamoDB JSON to normal JSON
    dynamo_json_data = cast_value_by_type(dynamo_data)
    
    # save the converted JSON so it can be loaded into a dataframe
    with open("/content/dump-export.json", 'w') as f:
      json.dump(dynamo_json_data, f)
    
    # create Pandas dataframe
    df = pd.read_json("/content/dump-export.json")
    
    # create another dataframe for the nested JSON fields
    df1 = pd.json_normalize(df['Items'])
    DynamoDB json to normal JSON
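`pd.json_normalize` flattens nested fields into dotted column names. For readers who want to see the idea without pandas, it can be sketched with the standard library. `flatten` is a hypothetical helper, not part of pandas, and like `json_normalize` with its defaults it leaves lists untouched. The record below is a trimmed copy of the sample shown later in this article.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into a single dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value  # lists and scalars are kept as-is
    return flat


item = {
    "resource_id": "08632930",
    "data": {
        "company_status": "active",
        "sic_codes": ["96090"],
        "accounts": {"accounting_reference_date": {"month": "08", "day": "31"}},
    },
}
row = flatten(item)
print(sorted(row))
```

Each key of `row` corresponds to one column a dataframe would have for this record.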

    Company Profile data items description

    Ref: https://developer-specs.company-information.service.gov.uk/streaming-api/resources/companyprofilestream?v=latest

    The company Profile resource data.
    Company accounts information.
    The Accounting Reference Date (ARD) of the company.
    The Accounting Reference Date (ARD) day.
    The Accounting Reference Date (ARD) month.
    The last company accounts filed.
    The date the last company accounts were made up to.
    The type of the last company accounts filed.
    For enumeration descriptions see account_type section in the enumeration mappings.
    Possible values are: null, full, small, medium, group, dormant, interim, initial, total-exemption-full, total-exemption-small, partial-exemption, audit-exemption-subsidiary, filing-exemption-subsidiary, micro-entity, no-accounts-type-available, audited-abridged, unaudited-abridged
    The date the next company accounts are due.
    The date the next company accounts should be made up to.
    Flag indicating if the company accounts are overdue.
    Annual return information. This member is only returned if a confirmation statement has not been filed.
    The date the last annual return was made up to.
    The date the next annual return is due. This member will only be returned if a confirmation statement has not been filed and the date is before 28th July 2016, otherwise refer to confirmation_statement.next_due
    The date the next annual return should be made up to. This member will only be returned if a confirmation statement has not been filed and the date is before 30th July 2016, otherwise refer to confirmation_statement.next_made_up_to
    Flag indicating if the annual return is overdue.
    UK branch of a foreign company.
    Type of business undertaken by the UK establishment.
    Parent company name.
    Parent company number.
    Flag indicating whether this company can file.
    The name of the company.
    The number of the company.
    The status of the company.
    For enumeration descriptions see company_status section in the enumeration mappings.
    Possible values are: active, dissolved, liquidation, receivership, administration, voluntary-arrangement, converted-closed, insolvency-proceedings, registered, removed, closed, open
    Extra details about the status of the company.
    For enumeration descriptions see company_status_detail section in the enumeration mappings.
    Possible values are: transferred-from-uk, active-proposal-to-strike-off, petition-to-restore-dissolved, transformed-to-se, converted-to-plc
    Confirmation statement information (N.B. refers to the Annual Statement where type is registered-overseas-entity).
    The date to which the company last made a confirmation statement.
    The date by which the next confirmation statement must be received.
    The date to which the company must next make a confirmation statement.
    Flag indicating if the confirmation statement is overdue.
    The date which the company was converted/closed, dissolved or removed. Please refer to company status to determine which.
    The date when the company was created.
    The ETag of the resource.
    Foreign company details.
    Accounts requirement.
    Type of accounting requirement that applies.
    For enumeration descriptions see foreign_account_type section in the enumeration mappings.
    Possible values are: accounting-requirements-of-originating-country-apply, accounting-requirements-of-originating-country-do-not-apply
    Describes how the publication date is derived.
    For enumeration descriptions see terms_of_account_publication section in the enumeration mappings.
    Possible values are: accounts-publication-date-supplied-by-company, accounting-publication-date-does-not-need-to-be-supplied-by-company, accounting-reference-date-allocated-by-companies-house
    Foreign company account information.
    Date account period starts under parent law.
    Day on which accounting period starts under parent law.
    Month in which accounting period starts under parent law.
    Date account period ends under parent law.
    Day on which accounting period ends under parent law.
    Month in which accounting period ends under parent law.
    Time allowed from period end for disclosure of accounts under parent law.
    Number of months within which to file.
    Type of business undertaken by the company.
    Legal form of the company in the country of incorporation.
    Law governing the company in country of incorporation.
    Is it a financial or credit institution.
    Company origin information.
    Country in which company was incorporated.
    Identity of register in country of incorporation.
    Registration number in country of incorporation.
    The flag indicating if the company has been liquidated in the past.
    The flag indicating if the company has any charges.
    The flag indicating if the company has insolvency history.
    The flag indicating if the company is a Community Interest Company.
    The jurisdiction specifies the political body responsible for the company.
    Possible values are: england-wales, wales, scotland, northern-ireland, european-union, united-kingdom, england, noneu
    The date of last full members list update.
    A set of URLs related to the resource, including self.
    The URL of the persons with significant control list resource.
    The URL of the persons with significant control statements list resource.
    The URL of the registers resource for this company
    The URL of the resource.
    The previous names of this company.
    The date on which the company name ceased.
    The date from which the company name was effective.
    The previous company name
    The address of the company’s registered office.
    The first line of the address.
    The second line of the address.
    The care of name.
    The country.
    Possible values are: Wales, England, Scotland, Great Britain, Not specified, United Kingdom, Northern Ireland
    The locality e.g. London.
    The post-office box number.
    The postal code e.g. CF14 3UZ.
    The property name or number.
    The region e.g. Surrey.
    Flag indicating registered office address has been replaced.
    The correspondence address of a Registered overseas entity
    The first line of the address.
    The second line of the address.
    The care of name.
    The country e.g. United Kingdom.
    The locality e.g. London.
    The post-office box number.
    The postal code e.g. CF14 3UZ.
    The region e.g. Surrey.
    SIC codes for this company.
    The total count of super secure managing officers for a registered-overseas-entity.
    The type of the company.
    For enumeration descriptions see company_type section in the enumeration mappings.
    Possible values are: private-unlimited, ltd, plc, old-public-company, private-limited-guarant-nsc-limited-exemption, limited-partnership, private-limited-guarant-nsc, converted-or-closed, private-unlimited-nsc, private-limited-shares-section-30-exemption, protected-cell-company, assurance-company, oversea-company, eeig, icvc-securities, icvc-warrant, icvc-umbrella, registered-society-non-jurisdictional, industrial-and-provident-society, northern-ireland, northern-ireland-other, royal-charter, investment-company-with-variable-capital, unregistered-company, llp, other, european-public-limited-liability-company-se, uk-establishment, scottish-partnership, charitable-incorporated-organisation, scottish-charitable-incorporated-organisation, further-education-or-sixth-form-college-corporation, registered-overseas-entity
    Flag indicating whether post can be delivered to the registered office.
    Link to the related resource
    Array of fields that have been changed by this event. Nested fields are referenced by dot notation e.g. links.document_metadata
    The date and time the data notification was raised
    The point-in-time identifier for this stream document. Use to re-establish a connection to the stream at this point.
    The type of event denoted by this stream document.
    Possible values are: changed, deleted
    The ID of the resource.
    The type of resource contained within the stream document.
    Possible values are: company-profile#company-profile, filing-history#filing-history
    The URI of the resource.
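Two of the event members described above lend themselves to a short sketch: the changed-fields array uses dot notation to point at nested members, and the timepoint can be used to re-establish the stream. `resolve_path`, `changed_values` and `resume_url` are hypothetical helpers. The sketch also assumes that the array is named `fields_changed`, that its dotted paths are relative to the `data` object, that the timepoint is passed as a `timepoint` query parameter, and that the stream URL is as shown; all of these should be verified against the API reference.

```python
from urllib.parse import urlencode

# Assumed endpoint for the company profile stream.
STREAM_URL = "https://stream.companieshouse.gov.uk/companies"


def resolve_path(data, dotted_path):
    """Follow a dotted path such as 'links.self' through nested dicts."""
    current = data
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None  # the path does not exist in this document
        current = current[part]
    return current


def changed_values(document):
    """Map each changed field path to its current value in `data`."""
    data = document.get("data", {})
    fields = document.get("event", {}).get("fields_changed", [])
    return {path: resolve_path(data, path) for path in fields}


def resume_url(timepoint, base_url=STREAM_URL):
    """Build a URL that asks the stream to resume from `timepoint`."""
    return f"{base_url}?{urlencode({'timepoint': timepoint})}"
```

Storing the last processed timepoint alongside the saved data lets a restarted script call `resume_url` instead of starting from the live position.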
    Company Profile data
    {
       "Item": [
          {
             "resource_id": "08632930",
             "data.company_status": "active",
             "item": {
                "resource_id": "08632930",
                "resource_kind": "company-profile",
                "data": {
                   "jurisdiction": "england-wales",
                   "type": "ltd",
                   "sic_codes": [
                      "96090"
                   ],
                   "company_number": "08632930",
                   "last_full_members_list_date": "2015-08-01",
                   "confirmation_statement": {
                      "last_made_up_to": "2023-08-01",
                      "next_made_up_to": "2024-08-01",
                      "next_due": "2024-08-15"
                   },
                   "company_name": "AR ART CONSULTING LTD.",
                   "date_of_creation": "2013-08-01",
                   "registered_office_address": {
                      "address_line_1": "111a High Street",
                      "locality": "Harrow",
                      "care_of": "MUNNA MANJI ACCOUNTANTS",
                      "address_line_2": "Wealdstone",
                      "postal_code": "HA3 5DL",
                      "region": "Middlesex"
                   },
                   "company_status": "active",
                   "etag": "a642233483044571b496286e48ec1ff32dc94469",
                   "links": {
                      "self": "/company/08632930",
                      "persons_with_significant_control": "/company/08632930/persons-with-significant-control",
                      "filing_history": "/company/08632930/filing-history",
                      "officers": "/company/08632930/officers"
                   },
                   "accounts": {
                      "next_made_up_to": "2024-08-31",
                      "last_accounts": {
                         "period_end_on": "2023-08-31",
                         "made_up_to": "2023-08-31",
                         "period_start_on": "2022-09-01",
                         "type": "total-exemption-full"
                      },
                      "next_accounts": {
                         "period_end_on": "2024-08-31",
                         "due_on": "2025-05-31",
                         "period_start_on": "2023-09-01"
                      },
                      "next_due": "2025-05-31",
                      "accounting_reference_date": {
                         "month": "08",
                         "day": "31"
                      }
                   },
                   "can_file": true
                },
                "event": {
                   "published_at": "2024-01-23T22:21:05",
                   "type": "changed",
                   "timepoint": "72513023"
                },
                "resource_uri": "/company/08632930"
             }
          }
       ]
    }
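Once a record is in plain JSON, its date fields can be used directly. As a small illustration, the sketch below computes how many days remain until the accounts and the confirmation statement are due, using the `next_due` values from the sample record. `days_until_due` is a hypothetical helper, and the reference date is passed in explicitly so the result is reproducible.

```python
from datetime import date


def days_until_due(profile_data, today):
    """Return the days remaining until each filing deadline in `profile_data`."""
    deadlines = {
        "accounts": profile_data.get("accounts", {}).get("next_due"),
        "confirmation_statement": profile_data.get(
            "confirmation_statement", {}
        ).get("next_due"),
    }
    return {
        name: (date.fromisoformat(value) - today).days
        for name, value in deadlines.items()
        if value  # skip deadlines that are missing from the record
    }


# Values copied from the sample record above.
data = {
    "accounts": {"next_due": "2025-05-31"},
    "confirmation_statement": {"next_due": "2024-08-15"},
}
remaining = days_until_due(data, today=date(2024, 1, 23))
print(remaining)
```

A negative number would mean the deadline has already passed relative to the reference date.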