
Data Engineering as a Service

Financial data lies at the heart of every investment decision. As the finance industry becomes more technologically advanced and AI-driven investment decisions become more commonplace, the ease with which financial data can be accessed and used is increasingly important. There is a significant lag, however: major financial data providers still rely on outdated technologies, with little consideration for the end-user experience. As a result, organisations spend thousands of man-hours surveying data sources, integrating with legacy data delivery mechanisms, and following laborious manual processes to get their data.

Let’s consider an extremely simple task. An Alis Exchange client would like to get historical price and market capitalisation data for Apple, Microsoft and Tesla using their Bloomberg Data License subscription. Simple, right? Not so fast.

At Alis Exchange we believe the answer should be a resounding yes. The sad truth, however, is that the current state of financial data providers left us baffled when we attempted to answer this seemingly simple question. We found that legacy systems and poorly designed data delivery pipelines can make even such a rudimentary request feel like a full day’s work.

Getting data from Bloomberg Data License

Bloomberg Data License is the Rolls-Royce of market data providers in terms of coverage, quality and depth. It also provides a useful case study of the many hoops, contortions and escape-room-like puzzles that consumers of financial data are expected to solve - even on the very best of these systems.

The request protocol involves a cryptic request message in the form of a .txt file that is uploaded, from a whitelisted IP address, to an SSH File Transfer Protocol (SFTP) server; an undisclosed waiting period; and, finally, a .txt response file containing your data in yet another cryptic format. If that sounds slightly complicated, not only are you right, but you are not yet enlightened to the hidden complexities of this task. Let’s break it up into steps… (to skip the tedious explanation of the status quo and see our solution, skip ahead to Data Engineering as a Service)

Step 1: Construct the request file

To start off, you are required to manually construct a text file in a very particular format, indicating which Instruments and Fields you want data for. For our example, we would like to request price and market cap data for Apple, Microsoft and Tesla from 1 January 2022 up to today (5 October 2022).

The request file, as seen in the image below, is a plain text file with very strict formatting rules, including START-OF-FILE, START-OF-FIELDS and START-OF-DATA lines, nudging you to start filling in text, with accompanying END statements signifying that you can stop.

Outline of a Bloomberg request text file

If a single one of these lines is out of place, or a single typo is present in the request, the entire request becomes void (though this only becomes apparent in Step 4). With no linting and no documentation to guide you, you are left with a trial-and-error approach until the glorious day when you guess the correct sequence of numbers and letters.

Example of a Bloomberg request text file
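To make the description concrete, a request file for our example might look roughly like the sketch below. This is an illustrative approximation only: the header keys, date format and security identifiers shown here are assumptions for the sake of the example, not the exact Bloomberg Data License specification.

```
START-OF-FILE
FIRMNAME=dl2345
PROGRAMNAME=gethistory
DATERANGE=20220101|20221005
START-OF-FIELDS
PX_LAST
CUR_MKT_CAP
END-OF-FIELDS
START-OF-DATA
AAPL US Equity
MSFT US Equity
TSLA US Equity
END-OF-DATA
END-OF-FILE
```

Every one of these section markers, key names and delimiters has to be exactly right - and, as noted above, you only find out whether they are once the response comes back.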

Step 2: Drop the request file on an SFTP server

The next step in the process involves making a Secure Shell (SSH) connection to Bloomberg’s remote server in order to transfer your request file. Moreover, you can’t just do this from any old computer: Bloomberg requires requests to be made from an IP address that it has officially whitelisted. This means you either need to spin up a Virtual Machine (VM) or walk to your company’s dedicated “Bloomberg request machine”.

Generally, open-source file transfer applications such as FileZilla are used to drop the request files. After searching for the email in which your colleague sent you the SFTP credentials, an SSH connection is established and you can manually upload your request file.

Step 3: Wait

That’s all you can do at this point. Go make yourself a coffee. Go reply to that email you’ve been ignoring. Wait for an undisclosed amount of time while Bloomberg processes your request. Every now and then, you can run back to your dedicated Bloomberg computer, SSH back onto the SFTP server, and check whether your data is there yet.

Step 4: Decipher the data on the reply file

The response came from Bloomberg! Success! Right?

Well, as mentioned in Step 1, your request file might contain typos or missing information, which only becomes apparent from Bloomberg’s response illuminating your shortcomings. In the trial-and-error cycle, this is where you receive the (albeit quite cryptic) error.

Example of a Bloomberg response text file

Let’s take the optimistic view and assume that your first attempt resulted in a positive response from Bloomberg. Surely now we’ve come to the end of our journey? Not quite. Next, you have to deal with yet another text file and figure out how the response data can be rendered useful. The infamous START-OF-FILE, START-OF-FIELDS and START-OF-DATA make their comeback, with their END counterparts indicating that we have come to the end of the response.

Characters such as | are used to separate the effective date and its corresponding historical value. Once you’ve deciphered the reply file, you can spend your time parsing it, converting values to the relevant data types, and copying these into your data warehouse.

Step 5: Go home

You’ve done enough work for today. That was exhausting.

Introducing Data Engineering as a Service (DEaaS)

At Alis Exchange, we often have moments where we think: “there has to be an easier way to do this”. This is often followed by the dreaming phase, characterised by the phrase “imagine a world where {insert the perfect solution}”. And finally, we immerse ourselves in designing the solution, aiming wholeheartedly at making our clients’ lives as easy as possible.

Imagine a world where:

  • Onboarding a data source took minutes, not months
  • Data synchronisation never failed
  • Data sources were pre-mapped to a common identifier
  • Data could be consumed safely by developers in their programming language of choice without leaving their IDE

Our Data Engineering as a Service (DEaaS) offering, Alis DE, turns that aspiration into reality.

A core pillar of the DE philosophy is acting as a proxy over data providers’ existing interfaces. We manage all the complexities of the data providers and wrap them in a simple-to-use interface, significantly reducing the effort required to get data into your system.

Let’s put the words into action by showing you the process of making the exact same request as above using DE… (The examples shown are in Go, although all methods are implemented and available in multiple supported languages.)

Step 1: Instantiate your Bloomberg client

Once you’re signed up as a DE client, full access to the Bloomberg methods is as simple as establishing a DE Bloomberg client in your favourite programming language. This is a once-off process, which requires minimal effort. Read more about how to make your first request to an Alis Exchange service in your favourite language.

Step 2: Make the request

Once the client is established, use the automatically generated protobufs to guide the process of constructing your request, and interpreting your response. Each request field is well defined, with documentation providing more context on how to populate the field.

DE is a cloud-native service, which ensures a scalable, readily accessible and secure service. The request will be reliably handled, without a single concern over using the correct, whitelisted computer (this is all managed in the background by DE).

// Instruments for which to request historical data
instruments := []*pb.GetHistoryRequest_Instrument{
	{
		Id: "BBG000B9XRY4",
	},
	{
		Id: "BBG000BPH459",
	},
	{
		Id: "BBG000N9MNX3",
	},
}
// Historical data fields requested for all instruments
fields := []*pb.GetHistoryRequest_Field{
	{
		FieldMnemonic: "PX_LAST",
		Type:          pb.FieldType_BB_DECIMAL,
	},
	{
		FieldMnemonic: "CUR_MKT_CAP",
		Type:          pb.FieldType_BB_DECIMAL,
	},
	{
		FieldMnemonic: "PX_HIGH",
		Type:          pb.FieldType_BB_DECIMAL,
	},
}

// Construct the GetHistory request message
req := pb.GetHistoryRequest{
	Headers: &pb.GetHistoryRequest_Headers{
		Firm:        "dl2345",
		ProgramFlag: pb.ProgramFlag_ADHOC,
		StartDate: &date.Date{
			Year:  2022,
			Month: 1,
			Day:   1,
		},
		EndDate: &date.Date{
			Year:  2022,
			Month: 10,
			Day:   5,
		},
	},
	Fields:            fields,
	Instruments:       instruments,
	FtpUploadRequired: true,
}

// Make the GetHistory request
res, err := BloombergClient.GetHistory(context.Background(), &req)
if err != nil {
	// TODO: Handle error
}

Step 3: Make your code do the waiting for you

Now that your request has been made, we wait for the response from Bloomberg. This time, however, DE will poll the SFTP server for you, and parse your response as soon as it is available. No checking on the FTP, and certainly no manual interpreting of the returned file.

// Wait for long-running operation to complete 
// (i.e. wait for response from Bloomberg)
//
// DE manages the polling of the SFTP, mapping the returned file 
// to the original request, and parsing the response file into a 
// well-defined response type.
resOp, err := wait(context.Background(), res)
if err != nil {
    // TODO: Handle error
}

// Marshal the long-running operation into the expected 
// response format
data := &pb.GetHistoryResponse{}
err = anypb.UnmarshalTo(resOp.GetResponse(), data, proto.UnmarshalOptions{})
if err != nil {
    // TODO: Handle error
}

// Iterate through the instruments returned by the GetHistory method
for _, instrument := range data.GetData().GetInstruments() {
    // Print a summary line for each instrument/field pair
    fmt.Printf("Instrument %s: Field: %s # Historical records: %v\n",
        instrument.GetId(), instrument.GetFieldMnemonic(),
        len(instrument.GetHistoricalRecords()))
}

You are left with a clean response with well-defined fields and type-safe values, all within the confines of your IDE.

Conclusion

It is time to help along the evolution of our investment processes. Data should no longer be accessed by constructing obscure request files, performing manual processes, and interpreting unclear response files. Experience having your data brought to you, consumed in your language and seamlessly incorporated into your pipelines. Experience the power of Data Engineering as a Service.