Normalizing FIX Logs
The FIX (Financial Information eXchange) protocol is a messaging standard used by financial markets and institutions to exchange financial messages securely and reliably. FIX log files vary in structure: each message is a sequence of tag=value fields, and different messages can carry different sets of tags.
In this article, we will explore how to normalize a FIX log file into a CSV (Comma Separated Values) format, complete with headers.
Introduction
FIX Log File Format
A typical FIX log file has the following structure:
8=FIX.4.2|9=435|35=8|34=8766|49=SENDERCOMPID|50=ET1|52=20230228-14:31:17.796|56=TARGETCOMPID|
8=FIX.4.2|9=435|35=8|34=8767|49=SENDERCOMPID|50=ET1|52=20230228-14:31:17.796|56=TARGETCOMPID|
8=FIX.4.2|9=435|35=8|34=8768|49=SENDERCOMPID|50=ET1|52=20230228-14:31:17.796|56=TARGETCOMPID|
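In this format, each field is a tag=value pair and fields are separated by | (on the wire, FIX uses the SOH character, 0x01, which logs often display as |). Tag 8 is the protocol version (BeginString), 9 is the body length, 35 is the message type, 34 is the message sequence number, 49 and 56 are the sender and target company IDs, 50 is the sender sub ID, and 52 is the sending time.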
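To make the tag=value layout concrete, here is a minimal sketch that turns one log line into a dictionary keyed by tag number. The helper name parse_fix_line and the TAG_NAMES table are illustrative and not part of the original article; the table covers only the tags that appear in the sample lines, and the code assumes the log uses | as its delimiter.

```python
# Illustrative lookup table (hypothetical name): standard FIX names for the sample's tags only.
TAG_NAMES = {
    "8": "BeginString",
    "9": "BodyLength",
    "34": "MsgSeqNum",
    "35": "MsgType",
    "49": "SenderCompID",
    "50": "SenderSubID",
    "52": "SendingTime",
    "56": "TargetCompID",
}


def parse_fix_line(line, delimiter="|"):
    """Return a dict mapping each tag (as a string) to its value."""
    fields = {}
    for field in line.strip().split(delimiter):
        if not field:
            continue  # skip the empty piece left by the trailing delimiter
        # partition splits at the first '=' only, so values may contain '='
        tag, _, value = field.partition("=")
        fields[tag] = value
    return fields


sample = "8=FIX.4.2|9=435|35=8|34=8766|49=SENDERCOMPID|50=ET1|52=20230228-14:31:17.796|56=TARGETCOMPID|"
parsed = parse_fix_line(sample)
for tag, value in parsed.items():
    # Fall back to the raw tag number when the table has no name for it
    print(TAG_NAMES.get(tag, tag), "=", value)
```

Keeping the tags as strings avoids losing information at this stage; they can be converted to integers later when sorting columns.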
Header Lines
To begin normalizing the FIX log file into CSV, we first need to build the header row. It lists the tag numbers that appear in the log, one column per tag.
Regex for Header Line
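The following regular expression matches the tag (the part before the =) of each field:
import re
regexforheader = re.compile(r"(?:^|(?<=\|))(\d+)(?==)")
It matches a run of digits (\d+) that sits at the start of the line or immediately after a |, and that is immediately followed by an =. The lookbehind and lookahead only check the neighboring characters without consuming them, so each match is the tag alone. Anchoring at the start of the line matters because the first tag (8) has no | in front of it.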
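As a quick check of how a tag-matching pattern behaves, the sketch below runs two variants against the sample line. The variable names are illustrative. The anchored variant is an assumption about the intended behavior: it also accepts a tag at the start of the line, which a lookbehind-only pattern skips.

```python
import re

# Lookbehind-only form: a tag must be preceded by '|', so the first tag on the line is missed.
lookbehind_only = re.compile(r"(?<=\|)(.*?)(?==)")

# Anchored variant (an assumption): digits at line start or right after '|', followed by '='.
anchored = re.compile(r"(?:^|(?<=\|))(\d+)(?==)")

sample = "8=FIX.4.2|9=435|35=8|34=8766|49=SENDERCOMPID|50=ET1|52=20230228-14:31:17.796|56=TARGETCOMPID|"

tags_lookbehind = lookbehind_only.findall(sample)
tags_anchored = anchored.findall(sample)

print(tags_lookbehind)  # no '8' at the front
print(tags_anchored)    # includes '8'
```

Restricting the match to digits also keeps digits inside values, such as the timestamp in tag 52, from being mistaken for tags.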
Parsing Header Line
To extract the values from a line, a second regular expression captures the text between each = and the next |:
regexRowData = re.compile(r"(?<==)(.*?)(?=\|)")
This regular expression lazily matches any characters (.*?) that come right after an = and stops before the next |, which separates fields. Because the match is lazy, each value ends at the first | that follows it.
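The sketch below shows one way to combine a tag pattern and a value pattern into a single row. The names tag_pattern, value_pattern, and row are illustrative. Pairing matches by position assumes both patterns find the same number of matches, which holds for well-formed lines where every field ends with a |.

```python
import re

# Tag: digits at line start or after '|', followed by '=' (illustrative variant).
tag_pattern = re.compile(r"(?:^|(?<=\|))(\d+)(?==)")
# Value: shortest text after '=' up to, but not including, the next '|'.
value_pattern = re.compile(r"(?<==)(.*?)(?=\|)")

sample = "8=FIX.4.2|9=435|35=8|34=8766|49=SENDERCOMPID|50=ET1|"

tags = tag_pattern.findall(sample)
values = value_pattern.findall(sample)

# Pair the nth tag with the nth value
row = dict(zip(tags, values))
print(row)
```

For lines that may be malformed, splitting on | and then on the first = is usually easier to validate than two independent regex passes.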
Extracting Field Names
To extract the actual field names from the header line, we can use the following Python code:
with open('in.csv') as f_in:
    # Read the first line
    data = f_in.readline()
# Split the line into tag=value fields, dropping the empty string after the trailing '|'
header_row = [field for field in data.strip().split('|') if field]
# Create a dictionary that maps each column position to its field name (the tag)
field_names = {}
# Iterate over each field in the header row
for i, field in enumerate(header_row):
    # Split the field at the first '=' and take the first part (i.e., the field name)
    field_name = field.split('=', 1)[0]
    # Add the field name to the dictionary
    field_names[i] = field_name
print(field_names)
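Reading only the first line works when every message carries the same tags. If messages differ, the header needs the union of tags across the whole log. The sketch below is one way to collect it; collect_tags is a hypothetical helper, and the in-memory log (including the tag 58 field in the second line) is made-up sample data used so the example runs on its own.

```python
import io

# Stand-in for the log file; the second line carries an extra tag (58) for illustration.
log = io.StringIO(
    "8=FIX.4.2|9=435|35=8|34=8766|49=SENDERCOMPID|\n"
    "8=FIX.4.2|9=435|35=8|34=8767|58=example text|\n"
)


def collect_tags(lines):
    """Return every tag seen in the log, sorted numerically."""
    seen = set()
    for line in lines:
        for field in line.strip().split("|"):
            if field:  # ignore the empty piece after the trailing '|'
                seen.add(field.split("=", 1)[0])
    # Sort as integers so that '9' comes before '34'
    return sorted(seen, key=int)


header = collect_tags(log)
print(header)
```

Sorting numerically rather than lexically gives a stable column order that follows tag numbers.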
Normalizing FIX Log File
Normalizing the FIX log file into CSV involves splitting each line at the | character, separating each resulting field into its tag and value at the first =, and writing the values to a CSV file with one column per tag.
The following Python code snippet demonstrates how to achieve this:
import pandas as pd
# Read the input log file, one '|'-separated field per column
df = pd.read_csv('in.csv', sep='|', header=None)
# Stack the DataFrame into one tag=value field per row; drop the empty column left by the trailing '|'
stacked_df = df.stack().dropna()
# Split each field at the first '=' into a tag (column 0) and a value (column 1)
split_values = stacked_df.str.split('=', n=1, expand=True).droplevel(1)
# Pivot the resulting DataFrame so that each tag becomes a column
pivoted_df = split_values.pivot(columns=0, values=1)
# Sort the columns of the pivoted DataFrame by the integer value of the tag
sorted_pivoted_df = pivoted_df.sort_index(axis=1, key=lambda x: x.astype(int))
# Write the sorted pivoted DataFrame to a new CSV file
sorted_pivoted_df.to_csv('out.csv', index=False)
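For comparison, the same normalization can be sketched with only the standard csv module. This is an illustrative alternative, not the article's own implementation: the function name normalize is hypothetical, the sample log is made up, and the sketch holds all rows in memory so that it can compute the header before writing.

```python
import csv
import io


def normalize(lines, out):
    """Write the log lines to `out` as CSV with one column per tag; return the header."""
    rows = []
    tags = set()
    for line in lines:
        row = {}
        for field in line.strip().split("|"):
            if not field:
                continue  # empty piece after the trailing '|'
            tag, _, value = field.partition("=")
            row[tag] = value
        rows.append(row)
        tags.update(row)
    header = sorted(tags, key=int)
    # restval fills the cells for tags that a given message does not carry
    writer = csv.DictWriter(out, fieldnames=header, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return header


# Made-up sample input; the second message has an extra tag (58).
log = io.StringIO(
    "8=FIX.4.2|35=8|34=8766|\n"
    "8=FIX.4.2|35=8|34=8767|58=example text|\n"
)
out = io.StringIO()
header = normalize(log, out)
print(out.getvalue())
```

When writing to a real file instead of io.StringIO, open it with newline='' as the csv module documentation recommends.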
Timings
The snippet below times how long the pandas and csv versions take to read the same input file. Running it against inputs of different sizes shows how each version scales.
# Import necessary libraries
import timeit
import pandas as pd
import csv
# Define input and output files
input_file = 'in.csv'
output_file = 'out.csv'
# Time reading the input file with the pandas version
start_time_pandas = timeit.default_timer()
df = pd.read_csv(input_file, sep='|', header=None)
end_time_pandas = timeit.default_timer()
# Time reading the input file with the csv version
start_time_csv = timeit.default_timer()
with open(input_file, newline='') as f_in, open(output_file, 'w', newline='') as f_out:
    for line in csv.reader(f_in, delimiter='|'):
        # ...
        pass
end_time_csv = timeit.default_timer()
# Calculate timings
timings_pandas = end_time_pandas - start_time_pandas
timings_csv = end_time_csv - start_time_csv
print('Pandas version:', timings_pandas)
print('CSV version:', timings_csv)
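A single start/stop measurement can be noisy. As a rough sketch of how timings across several data sizes might be gathered, the example below generates synthetic lines shaped like the sample log and keeps the best of a few repeats using timeit.repeat. The helpers make_log, parse, and best_time are hypothetical, the field values are placeholders, and the numbers printed depend entirely on the machine.

```python
import timeit


def make_log(n):
    """Build n synthetic lines shaped like the sample log (values are placeholders)."""
    return [
        f"8=FIX.4.2|9=435|35=8|34={i}|49=SENDERCOMPID|56=TARGETCOMPID|"
        for i in range(n)
    ]


def parse(lines):
    """Parse each line into a tag -> value dict (a plain-Python stand-in for the parser under test)."""
    return [
        dict(field.split("=", 1) for field in line.split("|") if field)
        for line in lines
    ]


def best_time(func, arg, repeat=3):
    """Return the fastest of `repeat` single runs, in seconds."""
    return min(timeit.repeat(lambda: func(arg), number=1, repeat=repeat))


timings = {n: best_time(parse, make_log(n)) for n in (100, 1000, 10000)}
for n, seconds in timings.items():
    print(f"{n:>6} lines: {seconds:.6f} s")
```

Taking the minimum of several runs reduces the influence of unrelated system activity on the comparison.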
Conclusion
In this article, we demonstrated how to normalize a FIX log file into a CSV format using Python. We used regular expressions to extract tags and values, and the pandas library to pivot them into a table with one column per tag.
Last modified on 2023-10-14