Python Structs and Stuff.

by Chanku

Two thing to note: this will be a bit rambly, and this post probably won’t be long. Lately I have been working with Python 3 structs… they are a bit weird. While I don’t know much C, they are apparently meant to be like Structures in C. Here are basically the things I have learned so far:

1) If you want to write an organized binary file format, you want to use Python 3’s structs. It’s really the only version that has ‘types’, so if you want to use an unsigned long you can use it here. This is also used for SNTP clients as well, allowing you to get the time.

2) They can be used to ‘pack’ basic data and transmit it over the network. So if you are working with networking you can have data that uses structs to help with that. So let’s say that you are making an application which takes commands, and may also take user input. When sending this:

s.send(("say-[user]-[text]").encode('utf-8'))

or

s.send(("/say-[user]-text]").encode('utf-8'))

You could do the following:

data = struct.pack("s L%ds L%ds" %(len(s1),len(s2)), "say".encode('utf-8'), len(s1) s1, len(s2), s2)
s.send(data)

While this may look harder (and reading the data would be a bit more complicated as you would have to read it a byte at a time), it allows users to use text without having to have delimiters. Also note that the code is doing the following: defining the format as a String, Unsigned Long, then defining a string with a value that is added when the code is run, then another Unsigned Long, and then a similar string. It then says to replace the first %d with the len(s1) and the second with the length of (s2).

This is done so it reads the proper amount, as the strings are variable in length. The Unsigned Longs define the length of the string so it’s easier to read. it then encodes, for example, the first string, then the len(s1) as the first long and s1 as the first string. It repeats it for s2. Also note that you need to convert s1 and s2 to bytes first, so you should use bytes(s1, ‘utf-8’) and bytes(s2, ‘utf-8’) (or whatever encoding you want to use) to cast them to bytes. Alternatively you can use .encode(‘utf-8’) on the strings instead.

(Unpacking that is a bit harder requiring byte counts, however for some things it’s worth it.)

3) They are a bit finniky. For example I ran into an issue when I was transmitting a packed structure over the internet. The layout was an Unsigned Long-String, Unsigned Long-String, Unsigned Long-String, Unsigned Integer, Unsigned Long, Unsigned Integer, and I ran into an issue. According to the Python 3 docs an Unsigned Long is 4 bytes in length, however the program decoding it was requesting 8. I’m still trying to debug it! I had previously tested this out without sending it over the network…

4) While nice to use, they can be a bit of a headache. See 3

5) If you want to write to a custom binary file format, you need to use these, so you better get comfortable with them.

In the end Python Structs are interesting, but at the same time irritating. If you are working with a custom protocol or something (or working with something like SNTP), then learning it is a must. Note that actual structs are not implemented in Python completely, there are some ways to get the functionality using classes and what not, but generally the structs module is only for packed data.


Comments

By: Glenda (Fri Aug 28 08:24:00 EDT 2015)
It seems you're working with a binary network protocol. Have you considered using Erlang or Elixir for that?