Most small embedded devices do not have enough storage to justify a file system abstraction.  So, if you have to store binary data like an image or a sound waveform, the most common method is to embed it in the code as a linear array. In this article, I show you a Python script that can convert any binary file into a C array encoded in hex.

The Need

I came across the problem in the very first major embedded software project I undertook. The hardware platform consisted of a Freescale microcontroller connected to a Xilinx FPGA using some GPIO lines. The FPGA booted up blank and had to be programmed from scratch, so the FPGA configuration had to reside in the microcontroller flash ROM. The solution was to encode the FPGA configuration code as an integer array in C and store it with the firmware as data. Then, at bootup, you just march out the bytes over a GPIO line passing off as a serial data line, with another GPIO line acting as the clock line.
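
The "march out the bytes" step can be modelled in a few lines. This is only a sketch of the idea in Python, not the firmware I used: the function name and the two pin callbacks are hypothetical, and I am assuming the receiver wants the most significant bit first and samples data on the rising clock edge. Check the datasheet of your device for the actual bit order and timing.

```python
def bit_bang_bytes(data, set_data_pin, set_clock_pin):
    """Shift each byte out MSB-first, pulsing the clock once per bit.

    set_data_pin and set_clock_pin are hypothetical callbacks that drive
    one GPIO line each with the level 0 or 1.
    """
    for byte in bytearray(data):
        for bit_index in range(7, -1, -1):
            set_clock_pin(0)                       # clock low while the data line changes
            set_data_pin((byte >> bit_index) & 1)  # present the next bit
            set_clock_pin(1)                       # rising edge: receiver samples the bit
    set_clock_pin(0)                               # leave the clock idle low


# Stand-in "pins" that simply record every level written to them.
data_levels = []
clock_levels = []
bit_bang_bytes(b"\xa5", data_levels.append, clock_levels.append)
print(data_levels)  # [1, 0, 1, 0, 0, 1, 0, 1]
```

In real firmware the same two nested loops would be written in C against the port registers, iterating over the generated array instead of a Python byte string.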

In my latest project, I have to play back 2 sound waveforms through a TI TLV320 AIC26 Stereo Audio codec, and I’m adopting the same method – converting the 2 WAV files into integer arrays in the C source code. So, I wrote a moderately robust Python script to convert any file into a C file that can be compiled with your project.

Structure and Usage

The Python script is structured as a class – BinToArray, with only one method in it – ConvertFileToArray. Here is the pydoc documentation for the method:

ConvertFileToArray(strInFile, strOutFile, integerSize, ignoreBytes, endianNess)
Reads binary file at location strInFile and writes out a C array of hex values
    Parameters -
        strInFile - Path and filename of binary file to convert
        strOutFile - Path and filename of output. Suggested extension is .c or .cpp
        integerSize - Size in bytes of output array elements. Array generated is always
            of type uint8, uint16, uint32. These types would need to be defined using
            typedef if they don't exist, or the user can replace the type name with the
            appropriate keyword valid for the compiler size conventions
        ignoreBytes - Number of bytes to ignore at the beginning of binary file. Helps
            strip out file headers and only encode the payload/data.
        endianNess - Only used for integerSize of 2 or 4. 'l' for Little Endian, 'b' for
            Big Endian

You pass in the path to the binary file and the output file name as the basic parameters. The next 3 parameters determine how the hex-encoded array is generated from the binary file. ‘integerSize’ determines the word size of each element of the array – 8 bit (1), 16 bit (2) or 32 bit (4). For my current purpose (WAV file storage), I needed 16-bit words, so I passed the value 2. ‘ignoreBytes’ is used to skip any header at the beginning of the file. You will have to know the binary header format and determine where the actual data starts.

One of the most important things to get right is endianness. You have to know whether the binary file you are using has data encoded in little-endian or big-endian format. You also have to know how you are using the data in the embedded device. Depending on the intended use, you may have to flip the order of bytes in your firmware code. Refer to the binary file format, the target microcontroller architecture, the communication bus protocol and the hardware peripheral architecture to determine the most efficient method for storing the array (i.e., whether to store the array in big- or little-endian format in the firmware).
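
To make the effect of ‘endianNess’ concrete, here is a small sketch using the same struct format codes the script relies on ('<H', '>H', '<L' and '>L'). The four input bytes are made up for illustration; the point is that the same bytes in the file produce different array elements depending on the byte order you choose.

```python
import struct

raw = b"\x34\x12\x78\x56"  # four bytes exactly as they sit in the file

little_16 = struct.unpack("<2H", raw)    # 'l': least significant byte comes first
big_16 = struct.unpack(">2H", raw)       # 'b': most significant byte comes first
little_32 = struct.unpack("<L", raw)[0]
big_32 = struct.unpack(">L", raw)[0]

print(["0x%04x" % w for w in little_16])  # ['0x1234', '0x5678']
print(["0x%04x" % w for w in big_16])     # ['0x3412', '0x7856']
print("0x%08x" % little_32)               # 0x56781234
print("0x%08x" % big_32)                  # 0x34127856
```

If the values in the generated array look byte-swapped compared with what your peripheral expects, this parameter is the first thing to check.
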
The array in the output source code is always named dataArray – rename it to whatever you want. Also, the array is declared with the type uint8, uint16 or uint32. These are not native C types, so you will have to define them using typedef to correspond to unsigned integers of 8, 16 and 32 bits. The native type behind each of these names depends on the compiler you are using, so look up the compiler programmer’s manual/user guide. If your toolchain provides the C99 header <stdint.h>, its uint8_t, uint16_t and uint32_t are the portable equivalents. Here is a sample for Microchip’s C30 compiler:

typedef unsigned long	uint32;
typedef unsigned int	uint16;
typedef unsigned char	uint8;

The Code

Below is the Python script to perform the conversion. Feel free to use it as you please. Of course, if you do use it, use it at your own peril – I am not responsible if your billion-dollar rocket explodes because you used this code to store its GPS-based route and goofed up!

# Convert binary file to a hex encoded array for inclusion in C projects
 
import os
import struct
 
import logging
 
class BinToArray:
    def __init__(self):
        pass
 
    def ConvertFileToArray(self, strInFile, strOutFile, integerSize, ignoreBytes, endianNess):
        """ Reads binary file at location strInFile and writes out a C array of hex values
            Parameters - 
                strInFile - Path and filename of binary file to convert
                strOutFile - Path and filename of output. Suggested extension is .c or .cpp
                integerSize - Size in bytes of output array elements. Array generated is always
                    of type uint8, uint16, uint32. These types would need to be defined using
                    typedef if they don't exist, or the user can replace the type name with the
                    appropriate keyword valid for the compiler size conventions
                ignoreBytes - Number of bytes to ignore at the beginning of binary file. Helps
                    strip out file headers and only encode the payload/data.
                endianNess - Only used for integerSize of 2 or 4. 'l' for Little Endian, 'b' for
                    Big Endian
        """
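        # Check integerSize value
        if integerSize not in (1, 2, 4):
            logging.debug("Integer Size parameter must be 1, 2 or 4")
            return
        # endif
        # Check endianNess value (only used for multi-byte elements)
        if integerSize != 1 and endianNess not in ('l', 'b'):
            logging.debug("Endianness parameter must be 'l' or 'b'")
            return
        # endif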
        # Open input file
        try:
            fileIn = open(strInFile, 'rb')
        except IOError:
            logging.debug("Could not open input file %s" % (strInFile))
            return
        # end try
        # Open output file
        try:
            fileOut = open(strOutFile, 'w')
        except IOError:
            logging.debug("Could not open output file %s" % (strOutFile))
            fileIn.close()  # do not leak the input file handle
            return
        # end try
        # Start array definition preamble
        inFileName = os.path.basename(strInFile)
        strVarType = "uint%d" % (integerSize * 8)
        fileOut.write("// Array representation of binary file %s\n\n\n" % (inFileName))
        fileOut.write("%s dataArray[] = {\n" % strVarType)
        # Convert and write array into C file
        fileIn.seek(ignoreBytes)
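        if integerSize == 1:
            bufChunk = fileIn.read(20)
            # read() returns an empty (false) value at end of file
            while bufChunk:
                fileOut.write("        ")
                # bytearray yields integers on both Python 2 and Python 3
                for byteVal in bytearray(bufChunk):
                    fileOut.write("0x%02x, " % byteVal)
                # end for
                fileOut.write("\n")
                bufChunk = fileIn.read(20)
            # end while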
        else:
            if   endianNess == 'l' and integerSize == 2:
                endianFormatter = '<H'
                maxWordsPerLine = 10
            elif endianNess == 'l' and integerSize == 4:
                endianFormatter = '<L'
                maxWordsPerLine = 6
            elif endianNess == 'b' and integerSize == 2:
                endianFormatter = '>H'
                maxWordsPerLine = 10
            elif endianNess == 'b' and integerSize == 4:
                endianFormatter = '>L'
                maxWordsPerLine = 6
            # endif
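            bufChunk = fileIn.read(integerSize)
            i = 0
            fileOut.write("        ")
            # Stop at end of file or at a trailing partial word, which
            # struct.unpack cannot decode
            while len(bufChunk) == integerSize:
                # unpack returns a 1-tuple; take the integer out of it
                wordVal = struct.unpack(endianFormatter, bufChunk)[0]
                if integerSize == 2:
                    fileOut.write("0x%04x, " % wordVal)
                else:
                    fileOut.write("0x%08x, " % wordVal)
                # endif
                i += 1
                if i == maxWordsPerLine:
                    fileOut.write("\n        ")
                    i = 0
                # endif
                bufChunk = fileIn.read(integerSize)
            # end while
            if bufChunk:
                logging.debug("Ignored %d trailing byte(s) that do not fill a whole word" % len(bufChunk))
            # endif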
        # end if
        # Close array definition
        fileOut.write("\n    };\n")
        fileIn.close()
        fileOut.close()
 
if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    converter = BinToArray()
    converter.ConvertFileToArray(
        "Bang8KHz_Mono_16Bit.wav", "bang_uint32_lilend.c", 4, 16, 'l')

Sample Output

Here is part of a sample hex-encoded array generated by the script, using 32-bit little-endian words and skipping the first 16 bytes of the WAV file.

// Array representation of binary file Bang8KHz_Mono_16Bit.wav
uint32 dataArray[] = {
        0x00000010, 0x00010001, 0x00001f40, 0x00003e80, 0x00100002, 0x61746164, 
        0x00001520, 0x7ff41ba9, 0x36e05f6f, 0xb426e2ee, 0xd1b0e6b0, 0x4bcc30ba, 
        0x21d727da, 0xc2dcf050, 0xb132b174, 0xe702bfff, 0xff76efec, 0xfaaa0bd7, 
        0xf068f61e, 0x0bacf6d4, 0x4371246c, 0x07da135b, 0x14080a75, 0xfca82df9, 
        0x0321f673, 0x3c822af9, 0x1ddb2cd7, 0xffe70504, 0xed47e8d4, 0xdbdcf5eb, 
        0x0c3cebcc, 0x0bef180a, 0xeeadfe2a, 0xe8eadbe6, 0x0ffdfcb9, 0x04e1fc8a, 
        0xeba3f779, 0xee0ffdae, 0x14a00537, 0xfc0e16fe, 0xea1be839, 0xf989dc68, 
        0xfa3d04c0, 0xf778fd64, 0xfbeb1285, 0xca22d096, 0xf1cccdd8, 0x2d52149f, 
        0x11141b01, 0x12a50e1f, 0x15f81a12, 0xdc830325, 0xd6fcdd5f, 0xd0ebd67b, 
    };
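
Note that the first seven words in this sample are not audio. With only 16 bytes skipped, the rest of the WAV header is still encoded: 0x00000010 is the length of the format chunk, 0x00010001 is PCM format with 1 channel, 0x00001f40 is the 8000 Hz sample rate, 0x61746164 spells "data" and 0x00001520 is the length of the payload. If you want only the samples, ‘ignoreBytes’ has to point just past that "data" header. Here is a minimal sketch for finding that offset, assuming a well-formed RIFF/WAVE file. The helper find_wav_data_offset is hypothetical and not part of the script above, and the test image is synthetic, built from the header fields visible in the sample.

```python
import struct


def find_wav_data_offset(wav_bytes):
    """Return the byte offset of the first sample in a RIFF/WAVE image."""
    if wav_bytes[0:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12  # the first chunk starts after 'RIFF', the RIFF size and 'WAVE'
    while pos + 8 <= len(wav_bytes):
        chunk_id = wav_bytes[pos:pos + 4]
        chunk_size = struct.unpack("<L", wav_bytes[pos + 4:pos + 8])[0]
        if chunk_id == b"data":
            return pos + 8  # the payload follows the 8-byte chunk header
        pos += 8 + chunk_size + (chunk_size & 1)  # chunks are padded to an even length
    raise ValueError("no data chunk found")


# Synthetic file image: format chunk (PCM, mono, 8000 Hz, 16 bit) plus 4 data bytes.
fmt_chunk = b"fmt " + struct.pack("<LHHLLHH", 16, 1, 1, 8000, 16000, 2, 16)
data_chunk = b"data" + struct.pack("<L", 4) + b"\xa9\x1b\xf4\x7f"
riff_size = 4 + len(fmt_chunk) + len(data_chunk)
wav_image = b"RIFF" + struct.pack("<L", riff_size) + b"WAVE" + fmt_chunk + data_chunk

print(find_wav_data_offset(wav_image))  # 44
```

On a real file you would read the contents with open(path, 'rb').read(), pass them to the helper and use the result as ‘ignoreBytes’. For a file laid out like the sample above, that would be 44 rather than 16.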

Epilogue

I had to toil through a lot of documentation and empirical verification to get a TLV320 AIC26 audio codec to work with a dsPIC33 microcontroller. I did not find any readily available reference implementation of the TLV320 AIC26 device for any microcontroller. So, I have good intentions of posting a code sample for any interested parties. If you are one such party, and you want me to post it, please let me know. Knowing that someone is waiting on me makes me get things done faster.