Difficult formats

While it is excellent that we’re seeing more government statistics published online, one disappointing factor is that they’re usually not published in a machine-readable form that you could convert to a spreadsheet for calculations.

This is especially the case with this year’s employment brief. I’d like to do some solid analysis of the data, but I find myself facing two hard options: purchasing expensive OCR (Optical Character Recognition) software or typing out every stat by hand.

Sometimes the statistics are scanned to PDF with the OCR setting turned on, which lets you copy and paste the text. With some tricky conversion via table imports in Word/Writer you can usually rework the data into a spreadsheet without a great deal of work, which is how I usually process the Caribbean Tourism Organization’s statistics.
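When the text can be copied out of the PDF, the rework into a spreadsheet could also be scripted rather than done through Word/Writer. Below is a rough sketch in Python, not the process I actually use. It assumes the pasted columns are separated by a tab or by two or more spaces; the function names `pasted_text_to_rows` and `rows_to_csv` are made up for illustration, and the sample rows are placeholders, not real figures.

```python
import csv
import io
import re


def pasted_text_to_rows(text):
    """Split pasted table text into rows of cells.

    Assumption: cells are separated by a tab or by two or more spaces,
    so labels containing single spaces stay in one cell.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines left over from the paste
        rows.append(re.split(r"\t+|\s{2,}", line))
    return rows


def rows_to_csv(rows):
    """Return the rows as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Placeholder sample standing in for text pasted from a PDF (not real data).
sample = """
Category      2010    2011
Category A    1,200   1,350
Category B    800     790
"""

print(rows_to_csv(pasted_text_to_rows(sample)))
```

Numbers with thousands separators such as "1,200" are quoted by the CSV writer, so they stay in a single cell when the file is opened in a spreadsheet. Real pastes tend to be messier than this, so expect to tweak the separator pattern for each document.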

Sadly, no such luck with the survey tabulations from the last few years’ employment surveys. For a while now I’ve been hoping to compile charts on job growth by industry to see where most of the growth has been coming from, but so far I haven’t been able to.

Time to write to the stats department. Hopefully they’ll be willing and able to forward the data in a digitally friendly format.


1 thought on “Difficult formats”

  1. I have some OCR software somewhere, so give me a call and I’ll try and find it.
    Radical solution – Hire an outsourcing firm (e.g. GlobalTask) and pay them to do the rote work of putting the numbers into a spreadsheet. Rates are about $15 an hour and they seem to work quickly if you give good instructions.
