[Ometa] parsing data files?

Don Dwoske don at loraxis.com
Thu Jul 16 12:15:53 PDT 2009


Where I work, we need to parse output files from various
instruments.  The instruments measure analytes in biological
samples: levels of insulin, IL-6, etc.  We'd like to read these raw
files and convert the information into an easier-to-consume format,
say XML or JSON.

Currently, the parsers we use are homegrown for each type of file,
using if/then statements, regular expressions, simple state machines,
etc.  They are messy and inconsistent.  So we are thinking of
rewriting the slew of custom parsers on a single parsing toolkit or
framework, and I was wondering if we could use OMeta for this task.

Most of these files have key/value pairs in the "header" of the file,
then the body is usually a list or grid of results. There are perhaps
100 different raw formats we'd eventually want to parse, many of which
have commonalities... but let's start with one.

Raw input:
--------------
                                         MasterPlex QT Report By Analyte
    Report Date: 12/9/2005
      Run Date: 02/11/2005
    Report Time: 1:07:21 PM
      Run Time: 12:05:02 PM
    Data File: Output.csv
      Hardware Serial No. : XXXX2112
    Plate Name: 2005February11
      Operator:
    MasterPlex QT Version: 2.0.7.103 adipokines.mlx
                          Analyst: A. Kay

                           Background: 8.5
                                                  Analyte Name: IL-6
       Well                  Sample Name                      MFI
       Concentration           Unit   Count
         1             <unnamed> (11849)                       18
                8.08          pg/mL      57
         2             <unnamed> (11850)                       17
                7.06          pg/mL      82
         3             <unnamed> (11851)                       21
               11.10          pg/mL      69
         4             <unnamed> (11852)                       18
                8.08          pg/mL      60
         5             <unnamed> (11853)                       19
                9.09          pg/mL      73
         6             <unnamed> (11854)                       20
               10.10          pg/mL      62
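
To make the shape of the problem concrete, here is a minimal sketch in plain Python (not OMeta) of a parser for the sample above. It assumes, based only on this one sample, that header lines look like "Key: value", that "Analyte Name" starts a new block of results, and that each result row is wrapped over two lines. The function name, regexes and dictionary keys are hypothetical, and "Background" is simply stored with the other header values.

```python
import re

# Assumed line shapes, inferred only from the sample report above.
HEADER_RE = re.compile(r"^\s*([^:]+?)\s*:\s*(.*?)\s*$")         # Key: value
ROW_HEAD_RE = re.compile(r"^\s*(\d+)\s+(.+?)\s+(\d+)\s*$")      # well, sample name, MFI
ROW_TAIL_RE = re.compile(r"^\s*([\d.]+)\s+(\S+)\s+(\d+)\s*$")   # concentration, unit, count


def parse_report(text):
    """Return {'header': {...}, 'tests': [...]}; all values are kept as strings."""
    header, tests = {}, []
    current = None   # analyte block currently being filled
    pending = None   # first half of a result row, waiting for its wrapped second half
    for line in text.splitlines():
        if not line.strip():
            continue
        tail = ROW_TAIL_RE.match(line) if pending is not None else None
        if tail:
            pending.update(zip(("concentration", "unit", "count"), tail.groups()))
            current["measurements"].append(pending)
            pending = None
            continue
        pending = None  # an incomplete row is dropped
        head = ROW_HEAD_RE.match(line)
        if head and current is not None:
            pending = dict(zip(("well", "sample_name", "mfi"), head.groups()))
            continue
        kv = HEADER_RE.match(line)
        if kv:
            key, value = kv.groups()
            if key == "Analyte Name":
                current = {"name": value, "measurements": []}
                tests.append(current)
            else:
                header[key] = value
        # Anything else (report title, column headings) is ignored.
    return {"header": header, "tests": tests}


# Shortened copy of the sample report, for illustration only.
SAMPLE = """\
    MasterPlex QT Report By Analyte
    Report Date: 12/9/2005
    Report Time: 1:07:21 PM
    Plate Name: 2005February11
    Analyst: A. Kay
    Background: 8.5
    Analyte Name: IL-6
    Well   Sample Name   MFI
    Concentration   Unit   Count
    1   <unnamed> (11849)   18
        8.08   pg/mL   57
    2   <unnamed> (11850)   17
        7.06   pg/mL   82
"""

report = parse_report(SAMPLE)
print(report["header"]["Plate Name"], len(report["tests"][0]["measurements"]))
```

This is exactly the kind of regex-plus-state-machine code described above. The two-line row handling is the part a grammar-based toolkit would be expected to express more declaratively.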

Potential output:
--------------------

<readout date='2005-0912' plate-name='2005February11' analyst='A. Kay' ....  >
   <test name='IL-6'>
      <measurement well='1' sample-name='&lt;unnamed&gt; (11849)'  mfi='18'
concentration='8.08' unit='pg/mL' count='57' />
      <measurement well='2' sample-name='&lt;unnamed&gt; (11850)'  mfi='17'
concentration='7.06' unit='pg/mL' count='82' />
 ...

or its JSON equivalent.
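
As a rough illustration of the output side, the sketch below writes one already-parsed readout as both XML and JSON using only the Python standard library. The `report` dictionary and the `to_xml` helper are hypothetical. The element and attribute names follow the example above, and the date attribute is left out. The sample names contain `<` and `>`, which are not allowed raw inside XML attribute values; ElementTree escapes them when serializing.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical in-memory form of one parsed readout (values kept as strings).
report = {
    "plate-name": "2005February11",
    "analyst": "A. Kay",
    "tests": [
        {
            "name": "IL-6",
            "measurements": [
                {"well": "1", "sample-name": "<unnamed> (11849)", "mfi": "18",
                 "concentration": "8.08", "unit": "pg/mL", "count": "57"},
                {"well": "2", "sample-name": "<unnamed> (11850)", "mfi": "17",
                 "concentration": "7.06", "unit": "pg/mL", "count": "82"},
            ],
        }
    ],
}


def to_xml(report):
    """Serialize the readout as <readout><test><measurement .../></test></readout>."""
    attrs = {k: v for k, v in report.items() if k != "tests"}
    root = ET.Element("readout", attrs)
    for test in report["tests"]:
        test_el = ET.SubElement(root, "test", {"name": test["name"]})
        for measurement in test["measurements"]:
            # Each measurement dict becomes the attributes of one element.
            ET.SubElement(test_el, "measurement", measurement)
    return ET.tostring(root, encoding="unicode")


xml_text = to_xml(report)                  # special characters are escaped here
json_text = json.dumps(report, indent=2)   # the JSON equivalent of the same data
print(xml_text)
print(json_text)
```

Keeping the parsed data in one neutral structure and serializing it at the end means each of the raw formats only needs a parser; the XML and JSON writers can be shared.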

Suggestions for better / alternative approaches are also welcome.

Cheers,
  Don

-- 
--------------------------------------
Donald Dwoske
http://don.dwoske.com/


