New file format(s)?

March 2, 2010 12:11PM

Probably something of a heresy, but...

We've started doing a lot of ETL-type work using Data Manager (in 2.6) and are very impressed by the potential it offers for complex model management. My preferred approach is to build a 'Data Map' in Data Manager that ends with an output step, which produces a combined / cleaned / matched / transformed data file for use in Omniscope. This means that I have 3 files for a single analysis model:

1. An IOK file used only to define and execute a 'Data Map'
2. An IOK file used only as a 'Data Store' for...
3. An IOK file used for the 'Analysis Model'

This approach gives a lot of fine-grained control, for example:

* A 'developer' can build and test new versions of the 'Data Map', whilst an 'analyst' explores the results and prepares visualisations
* New formulae built into the 'Analysis Model' can be reviewed and taken back into the 'Data Map', where appropriate, to improve performance in the 'Analysis Model'
* Multiple end users can save personalised 'Analysis Models', all based on a consistent 'Data Store' - with central control over both the contents of the 'Data Store' and its rebuild schedule

All in all, this offers a very scalable approach for departmental/enterprise solutions, and works *very well* with our Omniscope Model Manager platform.

BUT

Storing all 3 of these as IOK files is very inefficient. We often have models that are 30MB+ in size, and files 1 & 3 don't really *need* to contain the data (file 1 certainly never does; there is, of course, an argument that file 3 is more self-contained if it retains its source data). It would be good to think about 3 file formats here, for example:

1. .ODM - Omniscope Data Map - a Data Manager model which holds all of the metadata, but doesn't store any data itself (always rebuilds itself from source data)
2. .ODS - Omniscope Data Store (or .ODB - Omniscope Data Base) - an IOK file which contains data only, and can be loaded into Omniscope but has no visuals etc.
3. .IOL - "IOK Lite" - an IOK file which relies on a link to an ODS file for its source data

I'm sure this presents plenty of minor headaches, but it would help to encourage a best practice approach... unless there are significant flaws in the approach highlighted above?!

I'm interested in views of other 2.6DM experimenters here...

Guy

steve · March 2, 2010 6:07PM

Guy,

It's a good idea, although I don't think multiple file formats are the way to go. Your first file, though, does not need any data in it, assuming you never try to "View" a block. As for removing the data from the last file, we haven't explicitly built in support for this, but you can configure a file to "refresh on open"; provided you can trick Omniscope into refreshing it with empty data before you save, you'll have solved this problem. A "save empty" tickbox in the save dialog would make this easier.

Steve

March 2, 2010 10:29PM

Steve

The reason for suggesting different file formats (if the premise of the different uses is sound) is to support easy identification, management and cataloguing. Dare I even suggest open source (or at least public API) access to the file formats, so that other tools (e.g. OMM!) can view and catalogue metadata? This would be extremely useful for large Omniscope users, where lots of IOK files exist (and could be content managed).

steve · March 3, 2010 10:31AM

Largely due to resource constraints, we'll implement an open file format only when there is significantly more demand for it. I'm not convinced re the different file extensions, though.
