Saturday, January 05, 2013

Parsing NMEA sentences from GPS with Python + PySerial

I needed to parse some NMEA output on my Raspberry Pi for a project I'm working on. In essence, it is pretty trivial to read from a serial port and parse ASCII data in any programming language, but building in some resiliency and efficiency needs to be handled with some care.

I happen to be interfacing with an EM-408 GPS module from my Raspberry Pi via the Rx/Tx USART pins on the GPIO header.

If you need a quick reference for the NMEA sentence standard, go here.

Working with PySerial

Below is a quick and dirty code sample to interface with a USART/serial interface. The biggest thing to take into consideration is the 'timeout' option when creating your serial.Serial() object.

From my trial and error, specifying timeout=0 (i.e. no blocking at all) makes some sense in a GPS NMEA polling application, since reads return immediately and you can keep polling for output, but it causes a serious amount of CPU overhead (almost 100% utilization).

Eliminating the timeout altogether (wait forever) isn't a great idea either, because your code will block endlessly waiting for output from the GPS module; not good if the module ever dies, loses power, etc.

Setting a generous timeout of 5-10 seconds (e.g. timeout=5 or timeout=10) seems to work well and ends up being the best of both worlds.
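
One way to act on a timed-out read: with a timeout set, PySerial's read() returns an empty result when nothing arrives in time, so a run of empty reads can be treated as a dead or unpowered module. Below is a minimal sketch of that idea, not code from the original class. The names read_or_stall, StalledPort and FakePort are hypothetical; FakePort only stands in for serial.Serial so the sketch runs without hardware.

```python
class StalledPort(Exception):
    """Raised when the port stays silent for too many consecutive reads."""


def read_or_stall(port, max_empty_reads=3):
    """Return the first non-empty read, or raise StalledPort.

    `port` is anything with a read(size) method that returns an empty
    result on timeout, which is how serial.Serial behaves when it is
    created with a timeout. Hypothetical helper, not part of PySerial.
    """
    for _ in range(max_empty_reads):
        data = port.read(1)
        if data:
            return data
    raise StalledPort("no data after %d timed-out reads" % max_empty_reads)


class FakePort(object):
    """Stand-in for serial.Serial that replays canned read() results."""

    def __init__(self, chunks):
        self.chunks = list(chunks)

    def read(self, size=1):
        # An empty bytes object simulates a read that timed out.
        return self.chunks.pop(0) if self.chunks else b""


if __name__ == "__main__":
    # One timed-out read, then a byte arrives.
    print(read_or_stall(FakePort([b"", b"$"])))
    try:
        # Nothing ever arrives: every read times out.
        read_or_stall(FakePort([]))
    except StalledPort as exc:
        print("GPS looks dead: %s" % exc)
```

With timeout=5 and max_empty_reads=3, the caller would find out about a silent module after roughly 15 seconds instead of blocking forever.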

Here's a snippet of my class for the EM-408:

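import serial


class EM408GPS:
    def __init__(self, serialport, baudratespeed):
        # timeout=5: a read gives up after 5 seconds instead of blocking forever
        self.gpsdevice = serial.Serial(port=serialport, baudrate=baudratespeed, timeout=5)

    def init(self):
        # True if the serial port is ready to use
        return self.isOpen()

    def open(self):
        # The body of this method is missing from the listing; reopening
        # the port only when it is closed is an assumption.
        if not self.isOpen():
            self.gpsdevice.open()

    def isOpen(self):
        return self.gpsdevice.isOpen()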

That rough class sketch should be a workable wrapper to get you going with interfacing with a GPS via a serial port or the USART pins on the Pi.

Reading data with PySerial: Buffer or Newline?

This was the most interesting piece so far. PySerial has a handful of methods for reading data that I tested with:

  • read(): Reads up to the requested number of bytes (size) from the serial port input buffer.
  • readline(): Reads serial port data until a "\n" (newline) character is seen, then returns the line.

    To be clever and witty, you'd generally want to use something like readline(), since each NMEA sentence that is output to the serial port is terminated with a CRLF, right? I mean, why the hell wouldn't you? That answer turns out to be wrong the second you notice the very high CPU utilization while reading data.

    The good thing is this isn't a new problem; it's documented quite extensively on Stack Overflow, amongst other places.

    The better way I found to attack this CPU utilization problem is to take advantage of another method that PySerial offers:

  • inWaiting(): Returns the number of bytes currently in the input buffer.

    ...and use this in combination with read(): read just 1 byte, which blocks until data arrives (or the timeout expires) rather than spinning, then read whatever is left in PySerial's input buffer, and return it all for me to parse.

    Here's my class method called 'readBuffer()', which partly solves this issue:

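    def readBuffer(self):
        try:
            # Block (up to the timeout) until the first byte arrives
            data = self.gpsdevice.read(1)
            # Then grab whatever else is already sitting in the input buffer
            n = self.gpsdevice.inWaiting()
            if n:
                data = data + self.gpsdevice.read(n)
            # Decode so callers get text (NMEA is plain ASCII)
            return data.decode("ascii", "replace")
        except Exception as e:
            print("Big time read error, what happened: %s" % e)
            # Return an empty string so the caller can keep appending
            return ""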

    The next part to deal with: now that we are reading everything out of the input buffer, the data no longer arrives one sentence at a time. A single read can end mid-sentence, or span the end of one sentence and the start of the next.

    So we need a bit of code to find the start and end of each NMEA sentence. It's not too much effort, since we know a NMEA sentence starts with a '$' and ends with a CRLF. The key point is to find the CRLF in your read data buffer, then keep the right-hand side of that CRLF split (which is the start, and some data, of the next NMEA sentence) as the new start of the data buffer, and keep appending to it until you find the next CRLF, and so on...
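
The splitting rule described above can be sketched as a small standalone function. This is an illustration under stated assumptions, not the code from the original main(): split_sentences is a hypothetical name, the sample chunks are made up, and the buffer is assumed to already be text rather than raw bytes. Unlike a single two-way split, it copes with any number of CRLFs in one read.

```python
def split_sentences(buffer):
    """Split buffered text into complete NMEA sentences plus a remainder.

    Returns (sentences, remainder): every CRLF-terminated line found in
    `buffer` that starts with '$', and whatever trailing partial
    sentence is left over for the next read.
    """
    parts = buffer.split("\r\n")
    # The last element is always the text after the final CRLF: either an
    # unfinished sentence or "" if the buffer ended exactly on a CRLF.
    remainder = parts.pop()
    # Drop empty lines and anything that does not start with '$', such as
    # the tail of a sentence we started reading halfway through.
    sentences = [p for p in parts if p.startswith("$")]
    return sentences, remainder


if __name__ == "__main__":
    buf = ""
    # Made-up chunks; in practice these would come from the serial port.
    for chunk in ["tail\r\n$GPGGA,1,2\r\n$GPG", "SA,A,3\r\n"]:
        sentences, buf = split_sentences(buf + chunk)
        for sentence in sentences:
            print(sentence)
```

The caller keeps the returned remainder and prepends it to the next chunk, which is the same "right-hand side of the split becomes the new start" idea.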

    Here's the code snippet from my main() area that shows the initialization of the GPS and the read-out of the NMEA sentences from my readBuffer() method:

    import re

    def main():
        device = EM408GPS("/dev/ttyAMA0", 4800)
        newdata = ""
        line = ""
        while device.isOpen():
            # If we have new data from the data CRLF split, then
            # it's the start + data of our next NMEA sentence.
            # Have it be the start of the new line
            if newdata:
                line = newdata
                newdata = ""
            # Read from the input buffer and append it to our line
            # being constructed
            line = line + device.readBuffer()
            # Look for \x0d\x0a or \r\n at the end of the line (CRLF)
            # after each input buffer read so we can find the end of our
            # line being constructed
            if re.search("\r\n", line):
                # Since we found a CRLF, split at the first one only;
                # anything after it is kept for the next pass
                data, newdata = line.split("\r\n", 1)
                # Separator between sentences; whatever was printed between
                # the dashes is missing from the listing
                print("--------")
                print(data)
                # Reset our line constructor variable
                line = ""
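
Once a complete sentence has been pulled out of the buffer, the remaining step is splitting it into fields. The sketch below assumes the common NMEA 0183 layout: a '$', comma-separated fields, a '*', and two hex digits that are the XOR of every character between the '$' and the '*'. The names nmea_checksum and parse_sentence are hypothetical, and the sample fields are made up for illustration rather than captured from the EM-408.

```python
def nmea_checksum(body):
    """XOR every character between '$' and '*' (the NMEA 0183 checksum)."""
    value = 0
    for ch in body:
        value ^= ord(ch)
    return value


def parse_sentence(sentence):
    """Return the list of fields, or None if the sentence is malformed.

    Expects one line without the trailing CRLF, e.g. "$GPGLL,...*hh".
    Sentences without a checksum are treated as malformed in this sketch.
    """
    if not sentence.startswith("$") or "*" not in sentence:
        return None
    body, _, given = sentence[1:].partition("*")
    try:
        if int(given, 16) != nmea_checksum(body):
            return None
    except ValueError:
        # The checksum field was not valid hexadecimal.
        return None
    return body.split(",")


if __name__ == "__main__":
    body = "GPGLL,4916.45,N,12311.12,W"  # made-up sample fields
    sentence = "$%s*%02X" % (body, nmea_checksum(body))
    print(parse_sentence(sentence))
```

A sentence that was truncated or corrupted on the wire fails the checksum comparison and comes back as None, so the caller can simply skip it and wait for the next one.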

    Below is graphed output from 'vmstat' on the Raspberry Pi (in 2 second intervals) showing the performance benefit of the readBuffer() approach, using read() + inWaiting(), vs. using PySerial's readline():


    Anonymous said...

    This was exceptionally helpful, I'm currently developing a project on the Pi using GPS and this was something I completely didnt even think to do, and I was getting annoyed by readline()!

    Anonymous said...

    What happens if the data returned includes 2 CRLF? The split function will fail the way you have set it up. Is there some reason that your returned data stream would never be that long?

    Anonymous said...

    This write-up saved us many hours of work on our CNC controller code - thanks!

    Anonymous said...


    Very nice write up, thanks. I tried the above but basically found worse results with your approach than with pyserial. Would you be willing to post a small but complete code using your approach (vs pyserial's readline) that gives you the results plotted?