[Ometa] parsing data files?
Don Dwoske
don at loraxis.com
Thu Jul 16 12:15:53 PDT 2009
Where I work, we need to parse output files from various instruments.
The instruments measure analytes in biological samples (levels of
insulin, IL-6, etc.). We'd like to read these raw files and convert
the information into an easier-to-consume format, say XML or JSON.
Currently, each type of file has its own homegrown parser built from
if/then statements, regular expressions, simple state machines, and so
on, which is messy and inconsistent. So we are thinking of rewriting
the slew of custom parsers on top of a single parsing toolkit or
framework, and I was wondering whether we could use OMeta for this task.
Most of these files have key/value pairs in the "header" of the file,
then the body is usually a list or grid of results. There are perhaps
100 different raw formats we'd eventually want to parse, many of which
have commonalities... but let's start with one.
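
To make the "header" part concrete, here is a rough sketch in plain
Python (not OMeta) of the key/value handling I have in mind. The
function name parse_header is made up for illustration, and it assumes
the title line has already been skipped and that the header ends at the
first line without a colon.

```python
def parse_header(lines):
    """Collect 'Key: value' lines into a dict; stop at the first line without a colon."""
    header = {}
    for line in lines:
        if ":" not in line:
            break  # assumption: the header ends where the results grid begins
        # Split on the first colon only, so values such as "1:07:21 PM" stay intact.
        key, _, value = line.partition(":")
        header[key.strip()] = value.strip()
    return header


# A few header lines in the same shape as the report, followed by the grid header.
sample = [
    "Report Date: 12/9/2005",
    "Report Time: 1:07:21 PM",
    "Hardware Serial No. : XXXX2112",
    "Operator:",
    "Analyst: A. Kay",
    "Well Sample Name MFI",
]
header = parse_header(sample)
print(header)
```

Splitting on the first colon only is what keeps time values whole; an
empty field such as "Operator:" simply maps to an empty string.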
Raw input:
--------------
MasterPlex QT Report By Analyte
Report Date: 12/9/2005
Run Date: 02/11/2005
Report Time: 1:07:21 PM
Run Time: 12:05:02 PM
Data File: Output.csv
Hardware Serial No. : XXXX2112
Plate Name: 2005February11
Operator:
MasterPlex QT Version: 2.0.7.103 adipokines.mlx
Analyst: A. Kay
Background: 8.5
Analyte Name: IL-6
Well  Sample Name        MFI  Concentration  Unit   Count
1     <unnamed> (11849)  18   8.08           pg/mL  57
2     <unnamed> (11850)  17   7.06           pg/mL  82
3     <unnamed> (11851)  21   11.10          pg/mL  69
4     <unnamed> (11852)  18   8.08           pg/mL  60
5     <unnamed> (11853)  19   9.09           pg/mL  73
6     <unnamed> (11854)  20   10.10          pg/mL  62
Potential output:
--------------------
<readout date='2005-12-09' plate-name='2005February11' analyst='A. Kay' .... >
  <test name='IL-6'>
    <measurement well='1' sample-name='&lt;unnamed&gt; (11849)' mfi='18'
                 concentration='8.08' unit='pg/mL' count='57' />
    <measurement well='2' sample-name='&lt;unnamed&gt; (11850)' mfi='17'
                 concentration='7.06' unit='pg/mL' count='82' />
...
or its JSON equivalent.
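
For the JSON variant, something like the following plain-Python sketch
(again not OMeta) shows the shape I am after. It assumes each result
record sits on one line in the real file; the regular expression, the
parse_row helper, and the JSON key layout are my own guesses for
illustration, not anything the instrument software defines.

```python
import json
import re

# Columns: well, sample name (may contain spaces), MFI, concentration, unit, count.
ROW = re.compile(r"^(\d+)\s+(.+?)\s+(\d+)\s+(\d+(?:\.\d+)?)\s+(\S+)\s+(\d+)$")


def parse_row(line):
    """Turn one results line into a dict, or return None if it does not match."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    well, name, mfi, conc, unit, count = m.groups()
    return {
        "well": int(well),
        "sample-name": name,
        "mfi": int(mfi),
        "concentration": float(conc),
        "unit": unit,
        "count": int(count),
    }


rows = [
    "1 <unnamed> (11849) 18 8.08 pg/mL 57",
    "2 <unnamed> (11850) 17 7.06 pg/mL 82",
]
# Hypothetical JSON layout mirroring the XML nesting: readout -> tests -> measurements.
readout = {
    "plate-name": "2005February11",
    "analyst": "A. Kay",
    "tests": [{"name": "IL-6", "measurements": [parse_row(r) for r in rows]}],
}
print(json.dumps(readout, indent=2))
```

The lazy match on the sample name is what lets names with embedded
spaces, such as "<unnamed> (11849)", come through as a single field.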
Suggestions for better / alternative approaches are also welcome.
Cheers,
Don
--
--------------------------------------
Donald Dwoske
http://don.dwoske.com/