Embedding a binary file as an array in firmware
Most small embedded devices do not have enough storage to justify a file system abstraction. So, if you have to store binary data like an image or a sound waveform, the most common method is to embed it in the code as a linear array. In this article, I show you a Python script that can convert any binary file into a C array encoded in hex.
The Need
I came across the problem in the very first major embedded software project I undertook. The hardware platform consisted of a Freescale microcontroller connected to a Xilinx FPGA through some GPIO lines. The FPGA booted up blank and had to be configured from scratch, so the FPGA configuration data had to reside in the microcontroller's flash ROM. The solution was to encode the configuration as an integer array in C and store it with the firmware as data. Then, at bootup, you just march the bytes out bit by bit over one GPIO line acting as a serial data line, with another GPIO line acting as the clock.
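The bit-banging step is easy to model on a host before writing firmware. This is a Python sketch, not the original Freescale code: set_pin, PinRecorder and the DATA/CLK pin names are hypothetical, and shifting each byte out most-significant bit first with data latched on the rising clock edge are assumptions to check against your FPGA's configuration documentation. On the microcontroller, the same loop would be written in C against the real GPIO registers.

```python
from typing import Callable, Iterable

# set_pin(pin_name, level) is a hypothetical stand-in for a GPIO write on the MCU
PinWriter = Callable[[str, int], None]


def shift_out_config(config: Iterable[int], set_pin: PinWriter, msb_first: bool = True) -> None:
    """Bit-bang each configuration byte over a DATA line, one bit per CLK pulse.

    Assumes the receiver samples DATA on the rising edge of CLK.
    """
    set_pin("CLK", 0)
    for byte in config:
        bit_positions = range(7, -1, -1) if msb_first else range(8)
        for bit in bit_positions:
            set_pin("DATA", (byte >> bit) & 1)  # set up the data bit while CLK is low
            set_pin("CLK", 1)                   # rising edge: receiver latches DATA
            set_pin("CLK", 0)                   # back low, ready for the next bit


class PinRecorder:
    """Host-side test double that records the DATA level at every rising CLK edge."""

    def __init__(self) -> None:
        self.levels = {"DATA": 0, "CLK": 0}
        self.latched_bits = []

    def __call__(self, pin: str, level: int) -> None:
        if pin == "CLK" and level == 1 and self.levels["CLK"] == 0:
            self.latched_bits.append(self.levels["DATA"])
        self.levels[pin] = level


if __name__ == "__main__":
    recorder = PinRecorder()
    shift_out_config(bytes([0xA5]), recorder)
    print(recorder.latched_bits)  # [1, 0, 1, 0, 0, 1, 0, 1]
```

Recording the pin activity instead of driving hardware lets you confirm the bit order before you ever touch the FPGA.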
In my latest project, I have to play back two sound waveforms through a TI TLV320AIC26 stereo audio codec, and I'm adopting the same method: converting the two WAV files into integer arrays in the C source code. So I wrote a moderately robust Python script to convert any file into a C file that can be compiled with your project.
Structure and Usage
The Python script is structured as a class, BinToArray, with only one method in it, ConvertFileToArray. Here is the pydoc documentation for the method:
ConvertFileToArray(strInFile, strOutFile, integerSize, ignoreBytes, endianNess)
    Reads binary file at location strInFile and writes out a C array of hex values

    Parameters -
    strInFile - Path and filename of binary file to convert
    strOutFile - Path and filename of output. Suggested extension is .c or .cpp
    integerSize - Size in bytes of output array elements. Array generated is always of type uint8, uint16, uint32. These types would need to be defined using typedef if they don't exist, or the user can replace the type name with the appropriate keyword valid for the compiler size conventions
    ignoreBytes - Number of bytes to ignore at the beginning of binary file. Helps strip out file headers and only encode the payload/data.
    endianNess - Only used for integerSize of 2 or 4. 'l' for Little Endian, 'b' for Big Endian
You pass in the path to the binary file and the output file name as the basic parameters. The next three parameters determine how the hex-encoded array is generated from the binary file. ‘integerSize’ sets the word size of each array element: 8-bit (1), 16-bit (2) or 32-bit (4). For my current purpose (WAV file storage), I needed 16-bit words, so I passed the value 2. ‘ignoreBytes’ skips any header at the beginning of the file; you will have to know the binary header format and determine where the actual data starts.
One of the most important things to get right is endianness. You have to know whether the binary file you are using stores multi-byte values in little-endian or big-endian format, because ‘endianNess’ tells the script how to assemble consecutive bytes of the file into each 16- or 32-bit value. Once those values are written out as hex literals, the compiler stores them in the target's native byte order; for example, reading a little-endian file with 'l' for a little-endian target leaves the bytes in memory in the same order as in the file. You also have to know how you are using the data in the embedded device: depending on the intended use, you may have to flip the order of bytes in your firmware code. Refer to the binary file format, the target microcontroller architecture, the communication bus protocol and the hardware peripheral architecture to determine the most efficient way to store the array (i.e., whether to store it in big- or little-endian format in the firmware).
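Here is a short illustration of what the endianness setting does. It is not part of the script, just Python's struct module (which the script also uses) applied to four example bytes; decode_words and swap16 are illustrative helper names. The little- and big-endian results are byte swaps of each other.

```python
import struct


def decode_words(raw: bytes, fmt: str) -> list:
    """Split raw into fixed-size words using a struct format such as '<H' or '>L'.

    '<' means little-endian and '>' big-endian; H is 2 bytes and L is 4 bytes
    when an explicit byte-order prefix is used.
    """
    return [word for (word,) in struct.iter_unpack(fmt, raw)]


def swap16(word: int) -> int:
    """Swap the two bytes of a 16-bit word, as firmware might do at run time."""
    return ((word & 0xFF) << 8) | (word >> 8)


raw = bytes([0x01, 0x02, 0x03, 0x04])
print([hex(w) for w in decode_words(raw, '<H')])  # ['0x201', '0x403']
print([hex(w) for w in decode_words(raw, '>H')])  # ['0x102', '0x304']
print([hex(w) for w in decode_words(raw, '<L')])  # ['0x4030201']
print([hex(w) for w in decode_words(raw, '>L')])  # ['0x1020304']
print(hex(swap16(0x0201)))                        # 0x102
```

If the array ends up in the wrong order for your peripheral, a swap like swap16 in the firmware is the run-time alternative to regenerating the array with the other endianness setting.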
The array in the output source code is always named dataArray; rename it to whatever you want. The array is declared with the types uint8, uint16 or uint32. These are not built-in C types, so you will have to define them with typedef as unsigned integers of 8, 16 and 32 bits respectively. Which native type corresponds to each one depends on the compiler you are using, so look it up in the compiler's programmer's manual or user guide. (If your toolchain supports C99, <stdint.h> already provides uint8_t, uint16_t and uint32_t, and you can simply replace the type names.) Here is a sample for Microchip's C30 compiler, where int is 16 bits and long is 32 bits:
typedef unsigned long uint32;
typedef unsigned int uint16;
typedef unsigned char uint8;
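Because the generated array carries no length and other source files need a declaration to use it, you might also generate a small header once you have renamed dataArray. This helper is a hypothetical addition, not part of the script: make_header and the _LEN macro naming are my own choices, and it assumes the same uint8/uint16/uint32 typedef names as above are visible wherever the header is included.

```python
def make_header(array_name: str, integer_size: int, payload_bytes: int) -> str:
    """Return a C header declaring an array generated with the given element size.

    payload_bytes is the input file size minus ignoreBytes and is assumed to be
    a whole number of elements.
    """
    if integer_size not in (1, 2, 4):
        raise ValueError("integer_size must be 1, 2 or 4")
    if payload_bytes % integer_size:
        raise ValueError("payload_bytes is not a multiple of integer_size")
    count = payload_bytes // integer_size
    c_type = "uint%d" % (integer_size * 8)  # same pseudo type names as the script
    macro = array_name.upper()
    return (
        "#ifndef %s_H\n#define %s_H\n\n" % (macro, macro)
        + "#define %s_LEN %du\n" % (macro, count)
        + "extern %s %s[%s_LEN];\n\n" % (c_type, array_name, macro)
        + "#endif\n"
    )


print(make_header("bangSamples", 2, 5408))
```

For 5408 bytes of 16-bit audio this yields #define BANGSAMPLES_LEN 2704u and extern uint16 bangSamples[BANGSAMPLES_LEN];, which is compatible with an unsized definition such as uint16 bangSamples[] = { ... }.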
The Code
Below is the Python script that performs the conversion. Feel free to use it as you please. Of course, if you do use it, use it at your own peril: I am not responsible if your billion-dollar rocket explodes because you used this code to store its GPS-based route and goofed up!
# Convert binary file to a hex encoded array for inclusion in C projects
# (Python 3 version: the input file is read as bytes)
import os
import struct
import logging


class BinToArray:
    def __init__(self):
        pass

    def ConvertFileToArray(self, strInFile, strOutFile, integerSize, ignoreBytes, endianNess):
        """
        Reads binary file at location strInFile and writes out a C array of hex values

        Parameters -
        strInFile - Path and filename of binary file to convert
        strOutFile - Path and filename of output. Suggested extension is .c or .cpp
        integerSize - Size in bytes of output array elements. Array generated is always
            of type uint8, uint16, uint32. These types would need to be defined using
            typedef if they don't exist, or the user can replace the type name with the
            appropriate keyword valid for the compiler size conventions
        ignoreBytes - Number of bytes to ignore at the beginning of binary file. Helps
            strip out file headers and only encode the payload/data.
        endianNess - Only used for integerSize of 2 or 4. 'l' for Little Endian,
            'b' for Big Endian
        """
        # Check integerSize value
        if integerSize not in (1, 2, 4):
            logging.error("Integer Size parameter must be 1, 2 or 4")
            return

        # Choose the struct format ('<' = little-endian, '>' = big-endian) and line width
        if integerSize > 1:
            if endianNess == 'l':
                endianFormatter = '<H' if integerSize == 2 else '<L'
            elif endianNess == 'b':
                endianFormatter = '>H' if integerSize == 2 else '>L'
            else:
                logging.error("Endianness parameter must be 'l' or 'b'")
                return
            maxWordsPerLine = 10 if integerSize == 2 else 6

        # Open input file
        try:
            fileIn = open(strInFile, 'rb')
        except IOError:
            logging.error("Could not open input file %s", strInFile)
            return

        # Open output file
        try:
            fileOut = open(strOutFile, 'w')
        except IOError:
            logging.error("Could not open output file %s", strOutFile)
            fileIn.close()
            return

        # Both files are closed automatically when this block exits
        with fileIn, fileOut:
            # Start array definition preamble
            inFileName = os.path.basename(strInFile)
            strVarType = "uint%d" % (integerSize * 8)
            fileOut.write("// Array representation of binary file %s\n\n\n" % inFileName)
            fileOut.write("%s dataArray[] = {\n" % strVarType)

            # Skip the header, then convert and write the array into the C file
            fileIn.seek(ignoreBytes)
            if integerSize == 1:
                bufChunk = fileIn.read(20)
                while bufChunk:  # read() returns b'' at end of file
                    fileOut.write("    ")
                    for byteVal in bufChunk:  # iterating over bytes yields ints
                        fileOut.write("0x%02x, " % byteVal)
                    fileOut.write("\n")
                    bufChunk = fileIn.read(20)
            else:
                i = 0
                fileOut.write("    ")
                bufChunk = fileIn.read(integerSize)
                while bufChunk:
                    if len(bufChunk) < integerSize:
                        # File length is not a multiple of integerSize: zero-pad the last word
                        logging.warning("Padding trailing partial word with zero bytes")
                        bufChunk = bufChunk.ljust(integerSize, b'\x00')
                    (wordVal,) = struct.unpack(endianFormatter, bufChunk)
                    if integerSize == 2:
                        fileOut.write("0x%04x, " % wordVal)
                    else:
                        fileOut.write("0x%08x, " % wordVal)
                    i += 1
                    if i == maxWordsPerLine:
                        fileOut.write("\n    ")
                        i = 0
                    bufChunk = fileIn.read(integerSize)

            # Close array definition
            fileOut.write("\n};\n")


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    converter = BinToArray()
    converter.ConvertFileToArray(
        "Bang8KHz_Mono_16Bit.wav", "bang_uint32_lilend.c", 4, 16, 'l')
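If you want to unit-test the conversion or call it from another tool, it can help to separate the formatting from the file I/O. The function below is an alternative sketch, not the author's script: bytes_to_c_array is a hypothetical name, it zero-pads a trailing partial word (a choice you may not want), and it relies on struct.iter_unpack, which needs Python 3.4 or later.

```python
import struct

# (element size, endianness) -> struct format; byte order is irrelevant for 1-byte elements
_FORMATS = {(1, 'l'): 'B', (1, 'b'): 'B',
            (2, 'l'): '<H', (2, 'b'): '>H',
            (4, 'l'): '<L', (4, 'b'): '>L'}


def bytes_to_c_array(data: bytes, integer_size: int, endianness: str = 'l',
                     name: str = "dataArray", per_line: int = 8) -> str:
    """Format data as a C array definition of uint8/uint16/uint32 hex literals."""
    fmt = _FORMATS.get((integer_size, endianness))
    if fmt is None:
        raise ValueError("integer_size must be 1, 2 or 4 and endianness 'l' or 'b'")
    remainder = len(data) % integer_size
    if remainder:
        data = data + bytes(integer_size - remainder)  # zero-pad the last word
    words = [w for (w,) in struct.iter_unpack(fmt, data)]
    literals = ["0x%0*x" % (integer_size * 2, w) for w in words]
    lines = [", ".join(literals[i:i + per_line]) for i in range(0, len(literals), per_line)]
    body = ",\n    ".join(lines)
    return "uint%d %s[] = {\n    %s\n};\n" % (integer_size * 8, name, body)


print(bytes_to_c_array(b"data", 4, 'l'))  # one element: 0x61746164
```

Called on the four ASCII bytes of a WAV "data" tag with integer_size=4 and 'l', it produces 0x61746164, the same word that appears in the sample output.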
Sample Output
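Here is part of the array generated by the script's __main__ block, i.e., with integerSize = 4, ignoreBytes = 16 and little-endian byte order. Skipping 16 bytes removes only the "RIFF" tag, the RIFF size, the "WAVE" tag and the "fmt " tag, so the first seven words are still WAV header: the fmt chunk size (0x10), the format and channel count (PCM, mono), the sample rate (0x1f40 = 8000 Hz), the byte rate (0x3e80 = 16000), the block alignment and bit depth (2 bytes, 16 bits), the "data" tag (0x61746164) and the data size (0x1520 = 5408 bytes). The audio samples start at the eighth word, packed two 16-bit samples per 32-bit word with the earlier sample in the low half. For a canonical PCM WAV file, ignoreBytes = 44 would skip the whole header.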
// Array representation of binary file Bang8KHz_Mono_16Bit.wav


uint32 dataArray[] = {
    0x00000010, 0x00010001, 0x00001f40, 0x00003e80, 0x00100002, 0x61746164,
    0x00001520, 0x7ff41ba9, 0x36e05f6f, 0xb426e2ee, 0xd1b0e6b0, 0x4bcc30ba,
    0x21d727da, 0xc2dcf050, 0xb132b174, 0xe702bfff, 0xff76efec, 0xfaaa0bd7,
    0xf068f61e, 0x0bacf6d4, 0x4371246c, 0x07da135b, 0x14080a75, 0xfca82df9,
    0x0321f673, 0x3c822af9, 0x1ddb2cd7, 0xffe70504, 0xed47e8d4, 0xdbdcf5eb,
    0x0c3cebcc, 0x0bef180a, 0xeeadfe2a, 0xe8eadbe6, 0x0ffdfcb9, 0x04e1fc8a,
    0xeba3f779, 0xee0ffdae, 0x14a00537, 0xfc0e16fe, 0xea1be839, 0xf989dc68,
    0xfa3d04c0, 0xf778fd64, 0xfbeb1285, 0xca22d096, 0xf1cccdd8, 0x2d52149f,
    0x11141b01, 0x12a50e1f, 0x15f81a12, 0xdc830325, 0xd6fcdd5f, 0xd0ebd67b,
};
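Rather than hard-coding ignoreBytes, you can find where the audio samples start by walking the RIFF chunks of the WAV file. This is a sketch that assumes a standard RIFF/WAVE file; find_wav_data_offset is a hypothetical helper, not part of the script. For the canonical 44-byte PCM header it returns an offset of 44, while files with extra chunks (LIST metadata, for example) give a larger offset.

```python
import struct


def find_wav_data_offset(blob: bytes):
    """Return (offset, size) of the 'data' chunk payload in a RIFF/WAVE byte string."""
    if len(blob) < 12 or blob[0:4] != b"RIFF" or blob[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12  # first chunk header follows "RIFF", the RIFF size and "WAVE"
    while pos + 8 <= len(blob):
        chunk_id = blob[pos:pos + 4]
        (chunk_size,) = struct.unpack("<I", blob[pos + 4:pos + 8])  # RIFF sizes are little-endian
        if chunk_id == b"data":
            return pos + 8, chunk_size
        # RIFF chunks are padded to an even number of bytes
        pos += 8 + chunk_size + (chunk_size & 1)
    raise ValueError("no 'data' chunk found")
```

Read the whole file in binary mode, pass the bytes to find_wav_data_offset, and use the returned offset as ignoreBytes; the returned size tells you how many bytes of samples follow.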
Epilogue
I had to toil through a lot of documentation and empirical verification to get a TLV320AIC26 audio codec working with a dsPIC33 microcontroller, and I did not find any readily available reference implementation of the TLV320AIC26 for any microcontroller. So I intend to post a code sample for any interested parties. If you are one of them and want me to post it, please let me know; knowing that someone is waiting on me makes me get things done faster.