DataManager: Custom Alerts for Data Quality? - Visokio Forums
  •     JfJf October 22, 2012 12:28PM
    Hi, - It would be very nice to add a block dedicated to data quality. The Data Validation block is there, but it can't relate different blocks in one operation, only field data.

    For example, say I have a data stream of Shoe Models from sales, and I want to cross it with a static Excel price sheet, merging Shoe Models with Shoe Prices to get my sales data. If a new model is released, it is registered in the sales, but I need to add it to the Shoe Prices so that it merges successfully and is counted. Until then, those records end up among the non-merged records. In that case, or when I get a many:many relationship, it would be very useful to have a block, or an extension of the Data Validation block, that flags custom issues between blocks, so these errors could be fixed faster and more easily.

    In this shoes example, the solution would be: IF I get fewer records for a particular field in the merge output than there were for that same field in a previous block, then it would be flagged as [missing records] or simply as an error.
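To make the proposed check concrete, here is a minimal sketch in plain Python (not Omniscope formula syntax). It compares per-value record counts for one field before and after a merge. The function name `count_alerts`, the label "duplicated records" for the many:many case, and the sample rows are all hypothetical and only illustrate the idea.

```python
from collections import Counter

def count_alerts(before, after, field):
    """Compare per-value record counts for `field` before and after a merge.

    `before` and `after` are lists of dict records. Returns a dict mapping
    each value whose count changed to an alert label.
    """
    before_counts = Counter(row[field] for row in before)
    after_counts = Counter(row[field] for row in after)
    alerts = {}
    for value, expected in before_counts.items():
        actual = after_counts.get(value, 0)
        if actual < expected:
            # Fewer records came out of the merge than went in.
            alerts[value] = "missing records"
        elif actual > expected:
            # More records came out, e.g. from a many:many match.
            alerts[value] = "duplicated records"
    return alerts

# Hypothetical sample data: "NewModel" has no price yet, "Trail" has two prices.
sales = [{"model": "Runner"}, {"model": "Runner"}, {"model": "Trail"}, {"model": "NewModel"}]
merged = [{"model": "Runner"}, {"model": "Runner"}, {"model": "Trail"}, {"model": "Trail"}]

alerts = count_alerts(sales, merged, "model")
```

With this sample data, "NewModel" is flagged as missing and "Trail" as duplicated, while "Runner" passes unflagged.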

    It would be like adding an additional layer to the formulas that could include the Block name. So a formula which is usually [Function][Field][Record], could be [Function][Block][Field][Record]. I don't know if this is already possible, but it would be useful for data integrity and improved automation.

    Thank you, JfJf

    Juan FMV.
  • 3 Comments
  •     tjbate October 22, 2012 1:20PM
    Juan - Omniscope solutions are best developed as multi-file 'chains' of IOK files, each serving a defined role:

    1.) timeslicing and archiving IOK snapshots of remote sources,
    2.) pre-aggregating to reduce granularity/row count,
    3.) integrating multiple time-sliced, pre-aggregated sources (using merge/join & appends) into an in-memory 'datamart'
    4.) transforming/recasting datamart data sets in preparation for end-user visualisation
    5.) ...and finally, refreshing the contents of any number of personalised, templated, highly-visual, multi-tab, multi-view Report IOK 'mashboards'.

    Each of the 4 types of server-side files should include data quality formulae, checks for duplication, and branching to multiple flows and multiple output files. Visualisations in these 4 types of files should highlight data quality alerts. It is usually possible to test, trap and alert for many data quality issues using existing features and a few tricks.

    In the case you cite above, 2 or even 3 duplicate merge/join blocks could be used, each one in turn defining what to do with:
    1.) the matching records (flow on through to Datamart/Transform/Report IOK files)
    2.) the unmatched records from the left side of the merge,
    3.) and/or the unmatched records from the right side of the merge,

    Each of the 2-3 merge/join test blocks would define what to do with the matching and non-matching records, e.g. channel the non-matching records off to a separate IOK or database table output file for checking, with e-mail alerts triggered and a copy of the non-matching records attached.
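The three-way split described above can be sketched outside Omniscope as follows. This is plain Python under stated assumptions, not how the merge/join block is implemented: the function name `split_merge`, the field names and the sample rows are hypothetical.

```python
def split_merge(left, right, key):
    """Split two lists of dict records into matched, left-only and right-only."""
    # Index the right side by key so each left row can find its partners.
    right_by_key = {}
    for row in right:
        right_by_key.setdefault(row[key], []).append(row)
    left_keys = {row[key] for row in left}

    matched, left_only = [], []
    for row in left:
        partners = right_by_key.get(row[key])
        if partners:
            # One output row per matching pair (left values win on name clashes).
            for partner in partners:
                matched.append({**partner, **row})
        else:
            left_only.append(row)
    right_only = [row for row in right if row[key] not in left_keys]
    return matched, left_only, right_only

# Hypothetical sample data for the shoe example.
sales = [
    {"model": "Runner", "units": 3},
    {"model": "Trail", "units": 1},
    {"model": "NewModel", "units": 2},
]
prices = [
    {"model": "Runner", "price": 50.0},
    {"model": "Trail", "price": 80.0},
    {"model": "Classic", "price": 40.0},
]

matched, unmatched_sales, unmatched_prices = split_merge(sales, prices, "model")
```

Here `matched` would flow on downstream, while `unmatched_sales` and `unmatched_prices` are the two sets that would be written to a separate output and attached to an alert.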
  •     steve October 23, 2012 4:26AM
    Juan, multi-table formulas are currently only possible by appending all relevant blocks (with a "source" field) before validating. Then use SUBSET to reference different blocks by the source field.
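As a rough illustration of the append-then-subset idea, here is a plain Python sketch. It is only an analogue and does not reproduce Omniscope's SUBSET syntax. The helper names `append_with_source`, `values_in_source` and `missing_from`, and the sample rows, are hypothetical.

```python
def append_with_source(blocks):
    """Append several blocks into one list, tagging each row with its block name."""
    combined = []
    for name, rows in blocks.items():
        for row in rows:
            combined.append({**row, "source": name})
    return combined

def values_in_source(rows, source, field):
    """Distinct values of `field` among the rows that came from `source`."""
    return {row[field] for row in rows if row["source"] == source and field in row}

def missing_from(rows, field, present_in, absent_from):
    """Values of `field` seen in one source but not in the other."""
    found = values_in_source(rows, present_in, field)
    reference = values_in_source(rows, absent_from, field)
    return sorted(found - reference)

# Hypothetical sample data for the shoe example.
combined = append_with_source({
    "sales": [{"model": "Runner"}, {"model": "NewModel"}],
    "prices": [{"model": "Runner"}, {"model": "Classic"}],
})

unpriced_models = missing_from(combined, "model", present_in="sales", absent_from="prices")
```

Because every row carries its block name, a single validation step over the appended data can compare one block against another.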
  •     JfJf October 23, 2012 6:53AM
    Thomas, Steve.

    Thanks again, will try a bit of both by using the source field option and automating an email output that sends the non-matching records (or errors). The data set doesn't have many sources, but it is very large and needs constant updating.

    Cheers,

    jfjf
    Juan FMV.
