Normalizing FIX Log Files: A Step-by-Step Guide to Converting FIX Messages into CSV Format

Normalizing FIX Logs

The FIX (Financial Information eXchange) protocol is a messaging standard that financial institutions use to exchange trading messages, such as orders and execution reports, electronically. FIX log files can be awkward to process: each line is a sequence of numeric tag=value fields, and different message types carry different sets of tags, so the set of columns is not fixed.

In this article, we will explore how to normalize a FIX log file into CSV (Comma-Separated Values) format, complete with a header row.

Introduction

FIX Log File Format

A typical FIX log file has the following structure:

8=FIX.4.2|9=435|35=8|34=8766|49=SENDERCOMPID|50=ET1|52=20230228-14:31:17.796|56=TARGETCOMPID|
8=FIX.4.2|9=435|35=8|34=8767|49=SENDERCOMPID|50=ET1|52=20230228-14:31:17.796|56=TARGETCOMPID|
8=FIX.4.2|9=435|35=8|34=8768|49=SENDERCOMPID|50=ET1|52=20230228-14:31:17.796|56=TARGETCOMPID|

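Each line is one message made up of tag=value fields separated by |. (Raw FIX uses the non-printing SOH character, ASCII 0x01, as the separator; logs often replace it with | for readability.) The tags are numbers defined by the FIX specification: 8 is BeginString (the protocol version), 9 is BodyLength, 35 is MsgType (8 means Execution Report), 34 is MsgSeqNum, 49 is SenderCompID, 50 is SenderSubID, 52 is SendingTime and 56 is TargetCompID.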
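As a quick illustration, the sketch below splits one message from the sample into a tag-to-value dictionary. The function `parse_fix_message` and the `TAG_NAMES` lookup are hypothetical helpers written for this example (the lookup only covers the tags in the sample), and it assumes each tag appears at most once per message, i.e. no repeating groups.

```python
SOH = '\x01'  # the field delimiter used in raw FIX messages

# Names of the tags in the sample log (a small subset of the FIX 4.2 field dictionary)
TAG_NAMES = {
    '8': 'BeginString', '9': 'BodyLength', '34': 'MsgSeqNum', '35': 'MsgType',
    '49': 'SenderCompID', '50': 'SenderSubID', '52': 'SendingTime', '56': 'TargetCompID',
}


def parse_fix_message(line: str) -> dict[str, str]:
    """Split one FIX message into {tag: value}, accepting SOH or '|' as the delimiter."""
    line = line.strip().replace(SOH, '|')
    # Split each field at the first '=' only, in case a value itself contains '='
    return dict(field.split('=', 1) for field in line.split('|') if field)


msg = parse_fix_message(
    '8=FIX.4.2|9=435|35=8|34=8766|49=SENDERCOMPID|50=ET1|52=20230228-14:31:17.796|56=TARGETCOMPID|'
)
for tag, value in msg.items():
    # Print tag, its name (or '?' if unknown) and the value
    print(f'{tag:>3} {TAG_NAMES.get(tag, "?"):<13} {value}')
```

Converting SOH to | first lets the same code handle both raw and "pretty-printed" logs.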

Header Lines

To convert the FIX log file into CSV format, we first need a header row. A FIX log has no header line of its own, so the column names have to come from the tags themselves (the part of each field before the =): every distinct tag becomes one column of the CSV file.
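Because different message types carry different tags, a header taken from a single line can miss columns. A minimal sketch, assuming |-separated lines and using a hypothetical `build_header` helper, takes the union of the tags across all messages and sorts them numerically. The two log lines are made up for illustration.

```python
def build_header(lines):
    """Collect every distinct tag across all messages, sorted numerically."""
    tags = set()
    for line in lines:
        for field in line.strip().split('|'):
            if field:  # skip the empty string after a trailing '|'
                # The tag is everything before the first '='
                tags.add(field.split('=', 1)[0])
    return sorted(tags, key=int)


# Two illustrative messages with different sets of tags
log = [
    '8=FIX.4.2|35=0|34=1|',
    '8=FIX.4.2|35=8|34=2|55=IBM|',
]
print(build_header(log))  # ['8', '34', '35', '55']
```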

Regex for Header Line

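The following regular expression extracts the tags (the part of each field before the =), which become the CSV column names:

import re
regexforheader = re.compile(r"(?:^|\|)([^|=]+)=")

This regular expression matches either the start of the line or a |, captures one or more characters that are neither | nor = (the tag), and then requires the = that separates the tag from its value. Used with findall(), it returns the tags of a line in order, including tag 8 at the start of the line.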
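One detail worth checking with any tag pattern is the first field: tag 8 has no | in front of it. The sketch below, using a shortened sample line, compares a pattern that relies only on a lookbehind for | with one that also accepts the start of the line.

```python
import re

line = '8=FIX.4.2|9=435|35=8|'

# A pattern that requires a '|' before the tag never sees the first field
lookbehind_only = re.findall(r'(?<=\|)(.*?)(?==)', line)

# Allowing either the start of the line or a '|' before the tag fixes that
anchored = re.findall(r'(?:^|\|)([^|=]+)=', line)

print(lookbehind_only)  # ['9', '35']
print(anchored)         # ['8', '9', '35']
```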

Parsing Field Values

To parse each message, we also need the value of every field:

import re
regexRowData = re.compile(r"=([^|]*)")

This regular expression matches an = (the end of a tag) and captures every character up to, but not including, the next | (the field separator), i.e. the field's value. Used with findall(), it returns the values of a line in the same order as the tags.
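The two patterns can be combined to turn a line into a row: a sketch, assuming the tag and value patterns below (our own choices) and a message where every field has a single tag=value shape, zips the two lists into a dictionary.

```python
import re

TAG_RE = re.compile(r'(?:^|\|)([^|=]+)=')  # tag: text before '=' at line start or after '|'
VALUE_RE = re.compile(r'=([^|]*)')         # value: text after '=' up to the next '|'

line = ('8=FIX.4.2|9=435|35=8|34=8766|49=SENDERCOMPID|50=ET1|'
        '52=20230228-14:31:17.796|56=TARGETCOMPID|')

tags = TAG_RE.findall(line)
values = VALUE_RE.findall(line)

# The lists line up one-to-one, so zip pairs each tag with its value
row = dict(zip(tags, values))
print(row['34'], row['52'])  # 8766 20230228-14:31:17.796
```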

Extracting Field Names

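To extract the field names (tags) from the first message in the log, we can use the following Python code. This assumes every message carries the same tags as the first one, as in the sample above: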

with open('in.csv') as f_in:
    # Read the first line (the first message)
    data = f_in.readline()

# Split the message into its tag=value fields, dropping the
# empty string left after the trailing '|'
header_row = [field for field in data.strip().split('|') if field]

# Create a list to store the field names (tags)
field_names = []

# Iterate over each field in the message
for field in header_row:
    # Split the field at the first '=' and take the first part (i.e., the tag)
    field_name = field.split('=', 1)[0]

    # Add the tag to the list of column names
    field_names.append(field_name)

print(field_names)

Normalizing FIX Log File

Normalizing the FIX log file into CSV format involves splitting each line of the log at the | character, splitting each resulting tag=value field at its first =, and reshaping the result so that every tag becomes a column and every message a row, before writing it out as a CSV file.

The following Python code snippet demonstrates how to achieve this:

import pandas as pd

# Read the FIX log: one message per line, one tag=value field per column.
# dtype=str keeps values such as sequence numbers as text.
df = pd.read_csv('in.csv', sep='|', header=None, dtype=str)

# Stack into one Series of 'tag=value' strings indexed by (message, position);
# dropna() removes the empty cells left by the trailing '|'
stacked_df = df.stack().dropna()

# Split each field at the first '=' into the tag (column 0) and the value (column 1),
# then drop the position level so the index is just the message number
split_values = stacked_df.str.split('=', n=1, expand=True).droplevel(1)

# Pivot so that each tag becomes a column and each message a row
pivoted_df = split_values.pivot(columns=0, values=1)

# Order the columns numerically by tag (8, 9, 34, 35, ...) rather than as strings
sorted_pivoted_df = pivoted_df.sort_index(axis=1, key=lambda x: x.astype(int))

# Write the result, with the tags as the CSV header row
sorted_pivoted_df.to_csv('out.csv', index=False)
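One caveat with read_csv is that it sizes the table from the first line, so a later message with more fields than the first one can make parsing fail with a tokenizing error. A sketch that sidesteps this, assuming the whole log fits in memory as a string, splits the lines itself and uses Series.explode before the same split-and-pivot steps. The function name `fix_log_to_frame` and the two-message log are made up for illustration.

```python
import pandas as pd


def fix_log_to_frame(text: str) -> pd.DataFrame:
    """Pivot '|'-separated FIX messages (one per line) into one column per tag."""
    # One list of fields per message, then one entry per field (index = message number)
    fields = pd.Series(text.splitlines()).str.split('|').explode()
    # Drop the empty strings left by trailing '|' separators
    fields = fields[fields.notna() & (fields != '')]
    # Split each field into the tag (column 0) and the value (column 1)
    pairs = fields.str.split('=', n=1, expand=True)
    # One row per message, one column per tag; tags a message lacks become NaN
    table = pairs.pivot(columns=0, values=1)
    # Order the columns numerically by tag and drop the leftover axis name
    table = table.sort_index(axis=1, key=lambda x: x.astype(int))
    table.columns.name = None
    return table.reset_index(drop=True)


# The second message has more fields than the first
log = "8=FIX.4.2|35=0|34=1|\n8=FIX.4.2|35=8|34=2|55=IBM|\n"
table = fix_log_to_frame(log)
print(table.to_csv(index=False))
```

Missing tags end up as empty cells in the CSV output, which is the same behaviour as the read_csv version for shorter lines.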

Timings

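The script below times two complete implementations on the same input file: the pandas version above and an equivalent version built on the standard-library csv module. Each is run several times and the average time per run is printed; to compare different data sizes, rerun it with input files of different lengths.

# Import necessary libraries
import csv
import timeit

import pandas as pd

# Define input and output files (separate outputs so the results can be compared)
input_file = 'in.csv'
output_file_pandas = 'out_pandas.csv'
output_file_csv = 'out_csv.csv'


def normalize_with_pandas():
    # Same steps as the pandas version above: read, stack, split, pivot, sort, write
    df = pd.read_csv(input_file, sep='|', header=None, dtype=str)
    pairs = df.stack().dropna().str.split('=', n=1, expand=True).droplevel(1)
    table = pairs.pivot(columns=0, values=1)
    table = table.sort_index(axis=1, key=lambda x: x.astype(int))
    table.to_csv(output_file_pandas, index=False)


def normalize_with_csv():
    rows = []
    tags = set()
    with open(input_file, newline='') as f_in:
        # QUOTE_NONE: treat '"' inside FIX values as an ordinary character
        for line in csv.reader(f_in, delimiter='|', quoting=csv.QUOTE_NONE):
            # One dict per message: {tag: value}, skipping the empty trailing field
            row = dict(field.split('=', 1) for field in line if field)
            if row:
                rows.append(row)
                tags.update(row)
    with open(output_file_csv, 'w', newline='') as f_out:
        # Tags sorted numerically form the header; missing tags are written as ''
        writer = csv.DictWriter(f_out, fieldnames=sorted(tags, key=int))
        writer.writeheader()
        writer.writerows(rows)


# Run each version several times and report the average time per run (seconds)
runs = 10
timings_pandas = timeit.timeit(normalize_with_pandas, number=runs) / runs
timings_csv = timeit.timeit(normalize_with_csv, number=runs) / runs

print('Pandas version:', timings_pandas)
print('CSV version:', timings_csv)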
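To time the versions at different data sizes, one option is to generate synthetic logs of increasing length. The sketch below uses a hypothetical `write_sample_log` helper that repeats the sample message with an increasing MsgSeqNum (34); the data is synthetic, so fields such as BodyLength (9) are not meaningful, and the file sizes are arbitrary choices.

```python
import os
import tempfile


def write_sample_log(path: str, n_messages: int) -> None:
    """Write n_messages synthetic '|'-separated FIX-style lines (test data only)."""
    with open(path, 'w') as f_out:
        for seq in range(1, n_messages + 1):
            fields = [
                '8=FIX.4.2', '9=435', '35=8', f'34={seq}',
                '49=SENDERCOMPID', '50=ET1', '52=20230228-14:31:17.796', '56=TARGETCOMPID',
            ]
            f_out.write('|'.join(fields) + '|\n')


# Create one input file per size in a temporary directory
tmp_dir = tempfile.mkdtemp()
for size in (100, 1_000, 10_000):
    path = os.path.join(tmp_dir, f'in_{size}.csv')
    write_sample_log(path, size)
    print('wrote', path)
```

Pointing `input_file` at each generated path and rerunning the timing script gives one measurement per size.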

Conclusion

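In this article, we demonstrated how to normalize a FIX log file into CSV format using Python. We extracted tags and values with regular expressions and string splitting, reshaped the messages into a table with pandas so that each tag becomes a column, and set up a timing comparison against a plain csv-module implementation.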

Last modified on 2023-10-14