Sources: Importing only the most recent file in a folder? - Visokio Forums
  • briansj January 3, 2013 10:20AM
    I have files that I receive daily that have the file date in the filename. Is there a way to have Omniscope use the most recent file with the base name from that directory?

    For example, if I have:
    DailyFile 2013-01-01
    DailyFile 2012-12-30

    I would want to have DataManager pick up the most recent file with "DailyFile" in the file name.
  • 6 Comments
  •     paola January 3, 2013 11:51AM
    You can use Batch append source - please see the video below
    http://tc.visokio.com/videos/?name=DataManagerBatchAppend&title=Batch+append&lang=gb
  •     chris January 3, 2013 1:19PM
    Hi Brian,

    There is currently no way to read in the most recent file in a folder. As suggested by Paola, you could maybe use the batch append to read in ALL the files, and then use a combination of formulas and record filters to strip out the records from the old files.

    Would it be possible to modify your process so that you either delete all the old files or move them to an archive directory - so the directory will only contain the most recent file?
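
    The archive-folder approach can be scripted outside Omniscope. The following is only a minimal sketch in Python, not an Omniscope feature: it assumes the date appears in the file name as YYYY-MM-DD (as in the "DailyFile 2013-01-01" example), and the function names and folder paths are hypothetical.

```python
import re
import shutil
from datetime import date
from pathlib import Path

# Assumption: the file date is embedded in the name as YYYY-MM-DD.
DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def file_date(name):
    """Return the date embedded in a file name, or None if there is none."""
    match = DATE_RE.search(name)
    if match is None:
        return None
    try:
        return date(*map(int, match.groups()))
    except ValueError:  # e.g. 2013-13-45 is not a real date
        return None


def newest_file(names, base="DailyFile"):
    """Return the name starting with `base` that carries the latest date."""
    dated = [(file_date(n), n) for n in names if n.startswith(base)]
    dated = [(d, n) for d, n in dated if d is not None]
    return max(dated)[1] if dated else None


def archive_older(folder, archive, base="DailyFile"):
    """Move every dated `base` file except the newest into `archive`."""
    folder, archive = Path(folder), Path(archive)
    archive.mkdir(parents=True, exist_ok=True)
    names = [p.name for p in folder.iterdir() if p.is_file()]
    keep = newest_file(names, base)
    moved = []
    for name in names:
        if name.startswith(base) and name != keep and file_date(name) is not None:
            shutil.move(str(folder / name), str(archive / name))
            moved.append(name)
    return keep, sorted(moved)
```

    Run before each refresh, e.g. archive_older("incoming", "incoming_archive"), this would leave only the most recent DailyFile in the folder that the source block points to.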

  • briansj January 4, 2013 3:02PM
    I will try that. Thanks for your help.
  •     tjbate January 8, 2013 7:45AM
    Brian - This is one of the reasons that it can be best to implement Omniscope solutions as multi-file chains, with different IOK files playing different roles. In this case, your incoming spreadsheet folder could be monitored by an IOK 'timeslice' file whose job it is to import regular updates, perform all error-checking and transformations relating only to this source, and save a series of time-stamped, imported and re-exported data sets as much smaller and more efficient pre-processed IOK data files, in another folder that you control and whose naming conventions you can establish. No need to keep all the old copies of someone else's CSV files in folders controlled by others, etc.

    The next IOK file in the chain (usually a multi-source integrating 'datamart' IOK file) can then look at the folder of pre-processed IOK 'timeslice' files as a batch append source, and using your own IOK file-naming convention will load only the current data faster and with more intelligence, and you can use separate IOK 'timeslice' source files for all the sources to be integrated.

    Remember, never try to do too much in just one IOK file...use a 'chain' of smaller, more specialised IOK files, with plenty of data typing, field duplications, value-deduplications, pre-aggregation and other pre-processing data checks being performed only in these specialised source-monitoring files.
  • briansj January 8, 2013 7:52AM
    That sounds exactly like what I'm trying to do. I am using various IOK files in my process, but I'm not familiar with how to have an IOK file check for updates, etc.

    Is there a video tutorial related to this functionality?:

    "In this case, your spreadsheet folder could be monitored by an IOK 'timeslice' file whose job it is to import regular updates, perform all error-checking and transformations relating only to this source, and save a series of time-stamped imported data sets to a folder in the much smaller and more efficient, pre-processed IOK data file format"

  •     tjbate January 8, 2013 11:28AM
    Brian - A typical Source folder-monitoring, time-slicing IOK file will have a single Batch Append Source block pointing to the folder where the data files are arriving. Depending on the naming convention of the arriving files, you may be able to narrow the files being imported/appended to, say, a given year using the wildcard file filtering, e.g. a filter of *2012.csv will exclude any incoming data file whose name does not end in 2012.csv.
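
    To illustrate how a wildcard such as *2012.csv selects files, here is a rough sketch using Python's standard fnmatch module. It is only an analogy: Omniscope's own wildcard rules may differ in detail, and the file names below are made up.

```python
from fnmatch import fnmatchcase


def filter_files(names, pattern):
    """Keep only the names that match a shell-style wildcard pattern."""
    return [n for n in names if fnmatchcase(n, pattern)]


# Hypothetical contents of the monitored folder.
incoming = [
    "Sales_Jan2012.csv",
    "Sales_Feb2012.csv",
    "Sales_Jan2013.csv",
    "notes.txt",
]

selected = filter_files(incoming, "*2012.csv")
print(selected)
```

    Here only the two names ending in 2012.csv are selected; the 2013 file and the text file are excluded.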

    Each time a new file is added to the folder being monitored, if the refresh behaviour is set to auto-detect, the new incoming file will be automatically imported and appended to the pre-processing data set.

    Inside the pre-processing timeslicing file, you can add formula fields that classify and time-bucket each record/row of the incoming data, filtering out older records, standardising bucketing and marking only those records which should be classified as 'current', all using formula logic. Automatic recalculation of the formulae whenever a new data timeslice arrives and is appended will result in some appended records from the previous period being re-classified from 'current' to 'not current'.
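
    The re-classification logic can be pictured with a small Python sketch. This is not Omniscope formula syntax; it assumes each appended record carries a hypothetical slice_date field (the date of the file it came from) and treats a record as 'current' only if it belongs to the latest slice.

```python
from datetime import date


def mark_current(records):
    """Return copies of the records with an added boolean 'current' field.

    A record is 'current' when its slice_date equals the latest slice_date
    present in the whole appended data set.
    """
    if not records:
        return []
    latest = max(r["slice_date"] for r in records)
    return [{**r, "current": r["slice_date"] == latest} for r in records]


# Hypothetical data: two records from the 2012-12-30 file.
rows = [
    {"id": 1, "slice_date": date(2012, 12, 30)},
    {"id": 2, "slice_date": date(2012, 12, 30)},
]
before = mark_current(rows)  # both records are 'current'

# A new timeslice arrives and is appended; recalculating re-classifies
# the older records as 'not current'.
rows.append({"id": 3, "slice_date": date(2013, 1, 1)})
after = mark_current(rows)
```

    A record filter on the 'current' field would then keep only the rows from the most recent file.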

    A typical timeslicing IOK file has a data file Output block, which specifies a naming and timestamping convention that you control, with end result being a folder of IOK data files with many potential errors already flagged, consistent start and end periods for each file and a naming convention that suits your subsequent analysis and reporting.
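
    As a sketch of what such a naming and timestamping convention might look like, the hypothetical Python helper below builds names with an ISO date and an optional _current suffix. The exact pattern is an assumption chosen for illustration; the Output block lets you define your own.

```python
from datetime import date


def timeslice_name(source, slice_date, current=False):
    """Build an output file name such as DailyFile_2013-01-01_current.iok.

    ISO dates (YYYY-MM-DD) sort chronologically when sorted as text, and the
    optional '_current' suffix gives a wildcard filter something to match.
    """
    suffix = "_current" if current else ""
    return f"{source}_{slice_date.isoformat()}{suffix}.iok"


print(timeslice_name("DailyFile", date(2012, 12, 30)))
print(timeslice_name("DailyFile", date(2013, 1, 1), current=True))
```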

    In this way, you move from an incoming pattern of large inefficient .csv files you do not control, to your own folder of compact fast-loading IOK files perfectly suited to what you are doing and totally under your control.

    One or more of these timesliced IOK file folders, with the appropriate wildcard file filter settings, e.g. *current.iok, are then sources for the 'downstream' IOK files performing subsequent stages of your analysis and reporting.

    Using Server/ServerPlus Editions running 24/7 ensures that this process is totally automated.

