Idea: Metadata extraction from IOK files? - Visokio Forums
  •     steve March 22, 2012 6:05AM
    This idea concerns a programmatic way of extracting metadata from an IOK file by a 3rd party application, e.g. for use in a content management system.

    Since this discussion began, it has been partially implemented with the following architecture:

    • An Omniscope Server XML action describes the request for metadata.
      This can be executed in the scheduler or a watch folder, or through command-line invocation. In future we plan to update the Server/Scheduler to also provide a network-based API (perhaps HTTP/REST), and perhaps a corresponding Java API: a facade over that network API for use in a 3rd-party application, potentially in a different process or on a different server.

    • You "execute" that XML action on a licensed Omniscope Server, permissioned to access that IOK file (the IOK file must not be locked by another license).

    • The metadata is returned in a publicly defined XML format. This is either written to a specified file or to the process/console output.

    The metadata comprises the bold items in the list below, with non-bold indicating possible future additions, to be implemented depending on complexity and usefulness.

    • Data structure:
      • Column count, row count
      • Field names, types, null/non-null count, min/max/mean/median/sum/std dev/range, unique values for category fields, first 100 unique values for text fields, default function, formatting etc. and formulae
      • Formula analysis: dependencies tree, formula structure, recalc settings
      • Named queries

    • Tabs:
      • Names, advanced types
      • Titles, annotations, branding
      • Thumbnails (if introduced into IOK files in future), as base64-encoded data
      • Colour and style settings
      • Active filters

    • Views:
      • Types, aggregations, subsets
      • Headers, footers
      • Content view's content
      • Some view settings
      • In-use fields

    • Branding:
      • Cover, back and help pages
      • Branding/logo images, as base64-encoded data

    • Data sources:
      • Referenced sources (types, summaries, file/urls)
      • DataManager model (block names, types, dependencies and file paths)
      • Edit actions repeated on refresh (e.g. ad-hoc merges)
      • Global auto-refresh settings

    • Origin / other:
      • Creator details / version
      • Window size when saved, and restore option
      • Version compatibility requirements, if available
      • Viewer empowerments
      • File security options
      • Compression options, file size
      • Configured image sets, links & web services
      • Last edit/save dates
      • Publisher details


    Related ideas:
    • Extend the "Folder of file metadata" data source to include the simpler information from the list, allowing you to obtain this data in IOK format for a batch of files (complete)
    • A GUI front-end for some of the analysis, e.g. formula dependency graph, in-use / not-used fields
    • A document containing the above information in report form

    See also: http://forums.visokio.com/discussion/comment/3364/#Comment_3364

    Please comment below.
  • 14 Comments
  •     steve March 27, 2012 6:24AM
    (Keesup, I have updated this to include the issues you phoned in about. Please comment on anything further below.)
  •     steve March 28, 2012 10:48AM
    This is now implemented in 2.8, but this version isn't available yet (it will be in the next few weeks).

    See attached: an IOK file used as a test case, the Server XML action to generate the metadata, and the output.

    This output can be considered a schema by example, and is designed to be short and simple. Providing no changes are made to it in the next few weeks, it will be a frozen schema, and only additional nodes and attributes will be added in future to support further metadata extraction.
  •     steve March 28, 2012 10:51AM
    All bold items in the list above are included in this first version.
  •     steve April 10, 2012 12:19PM
    Keesup asked via email why we chose to capture the "first 100 unique values" for non-category text fields. This is because we wanted a simple approach which yielded *something* for non-category fields but didn't return gigabytes of "data" rather than "metadata". If you want to retrieve all values, use Aggregate, then export the resulting column in CSV format. Alternatively, force the field to Category, since all unique values are listed for category fields.
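To illustrate the "first 100 unique values" cap described above, here is a minimal Python sketch. It is not Omniscope's implementation: the function name `first_unique_values`, the order-of-first-appearance behaviour and the skipping of nulls are all assumptions made for the example.

```python
def first_unique_values(values, limit=100):
    """Return up to `limit` distinct non-null values, in order of first appearance.

    Hypothetical helper: ordering and null handling are assumptions,
    not documented Omniscope behaviour.
    """
    seen = {}  # dict keeps insertion order, so it doubles as an ordered set
    for value in values:
        if len(seen) >= limit:
            break  # stop scanning once the cap is reached
        if value is not None and value not in seen:
            seen[value] = None
    return list(seen)


# A text field with 250 distinct values yields only the first 100.
column = [f"note {i % 250}" for i in range(1000)]
sample = first_unique_values(column)
print(len(sample))
```

Stopping at the cap keeps the output small however large the column is, which is the point of returning "metadata" rather than "data".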
  •     steve May 3, 2012 4:45AM
    This feature was updated today to include summary details of all sources; I have updated the bold items, above.
  •     steve May 3, 2012 4:57AM
    Attached, an updated example IOK file with the resulting metadata. This can be considered a schema by example.
    Attachments
    Metadata example.zip 94K
  •     andy_white May 16, 2012 8:35AM
    Please extend the model for metadata XML generation to include DataManager config blocks.

    Specifically, we want to know whether an IOK file contains an output block and whether it is a batch output block.

    For batch output blocks, we would like the metadata of the batch output block (the path/name of the config file).

    The general functionality of extracting the metadata of the DataManager IOK file, including all of the blocks and connections, would also be useful.
  •     steve May 17, 2012 9:15AM
    As requested, we've extended the metadata to include the full DataManager model including file paths and block connections. The original post has been updated accordingly.

    Attached, an updated example IOK file with the resulting metadata. This can be considered a schema by example. The generated XML now includes explanatory comments.
    Attachments
    Metadata example.zip 125K
  •     shaji_o October 28, 2012 6:49PM
    Any update with the GUI?

    For a client, we have set up a dashboard that collates many sources into a single view.
    The client would like to extract this data for additional analysis. Could we set up a GUI that allows them to extract it?
  •     tjbate October 29, 2012 7:11AM
    Shaji - To clarify, you have created an Omniscope solution for a client, and the client wants the extracted metadata for additional analysis? This would be an XML file and would not contain all the data. Why do they need a GUI to see an XML file, and what additional analysis do they want to do? If instead they want the data rather than the metadata, can we move this to a new thread?
  •     bfromson1 May 21, 2013 8:09AM
    We are trying to use the metadata extraction to generate the list of databases and tables used by all our reports. All our data comes from SQL Server tables.

    The block metadata gives the database name when custom SQL is used, but doesn't indicate that it was a custom SQL block.
    It gives the table name when a single table is selected, but doesn't give the database name.

    Any chance of getting both Database name and database table listed, or Database name and SQL statement?

    Thanks


    Source report for single table selection:

    Database "DynamicMortgage"
    April Data
    Database table

    Source report for custom SQL query:

    Database "invldnm22sql3 / cdwmonthly (SQL Server)"
    Latest Data
    Database table

  •     steve May 23, 2013 4:51AM
    Bernard, we have updated 2.9 to include this. When we make 2.9 available to alpha partners we'll post an announcement.

    The output has been augmented as follows:


    ...
    <source>
      <!--A summary of this source.-->
      <summary>Database "localhost:6000 / mydb (MySQL)"</summary>
      <!--The name of this block. All block names are unique within DataManager, including grouped blocks. May have been renamed by the user.-->
      <blockName>localhost:6000 / mydb (MySQL)</blockName>
      <!--The type of source, operation or publisher.-->
      <type>Database table</type>
      <!--Type-specific details of this block, where provided-->
      <details>
        <e>
          <k>port</k>
          6000
        </e>
        <e>
          <k>host</k>
          localhost
        </e>
        <e>
          <k>sql</k>
          SELECT * FROM MyTable ORDER BY Something
        </e>
        <e>
          <k>database</k>
          mydb
        </e>
      </details>
    </source>
    ...
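For anyone consuming this output from a 3rd-party application, here is a rough Python sketch of reading the `<details>` entries from a `<source>` element like the one above. It assumes the element layout is exactly as shown in the excerpt (each `<e>` holding a `<k>` key followed by the value as plain text). The function names `source_details` and `describe_source` are made up for the example, and the enclosing document structure (elided as "..." above) is not handled.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the excerpt above; the rest of the document is not modelled.
SAMPLE = """<source>
  <summary>Database "localhost:6000 / mydb (MySQL)"</summary>
  <blockName>localhost:6000 / mydb (MySQL)</blockName>
  <type>Database table</type>
  <details>
    <e><k>port</k>6000</e>
    <e><k>host</k>localhost</e>
    <e><k>sql</k>SELECT * FROM MyTable ORDER BY Something</e>
    <e><k>database</k>mydb</e>
  </details>
</source>"""


def source_details(source):
    """Collect the <e><k>key</k>value</e> entries of one <source> into a dict."""
    details = {}
    for entry in source.findall("./details/e"):
        key = entry.find("k")
        if key is None or key.text is None:
            continue  # skip malformed entries rather than fail
        # The value is the text that follows </k> inside <e>, i.e. the tail of <k>.
        details[key.text.strip()] = (key.tail or "").strip()
    return details


def describe_source(source):
    """Summarise one <source> element: block name, type and its details."""
    return {
        "blockName": source.findtext("blockName", default=""),
        "type": source.findtext("type", default=""),
        "details": source_details(source),
    }


info = describe_source(ET.fromstring(SAMPLE))
print(info["details"]["database"], "|", info["details"]["sql"])
# prints: mydb | SELECT * FROM MyTable ORDER BY Something
```

Running `describe_source` over every `<source>` element in each report's metadata file would give the list of databases and SQL statements asked about above. Stripping whitespace around keys and values means it works whether the value sits on the same line as `<k>` or on its own line.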
This discussion has been closed.