Performance: Batch append operations with many files?

adam_varney · January 17, 2012 5:24AM

I have been using the Batch Append function in Omniscope 2.7 and currently it is required to append 150 Excel templates into one table, however this functionaly does exist and currently work in omniscope the speed in which it takes this is over an hour, in the future it is looking like 400+ templates will be needed to batch appended and have other opertations carried out on them. Is thier a better way or a solution to increase the performance of this. I am currently using a 64 bit system with 8 Gb RAM so my machine should be able to handle it.

Thanks

chris · January 17, 2012 5:28AM

Hi Adam,

A couple of quick questions:

- How big are the Excel files your trying to append? What is the size (rows/columns) of the finished data. What are you doing with the data after appending?

- Which "Reader" is selected (this is shown in the "Worksheet" section of the batch append block?

Chris

adam_varney · January 17, 2012 5:37AM

Hi Chris,

Each excel template is no more than 500kb and it is loading around 250 rows from each and 100 columns

After appending all the data we are merging the results with another database.

the reader selected is Excel as Text(if available)

Thanks

Adam

chris · January 17, 2012 5:51AM

Hi,

It would be worth changing the reader to "Omniscope". Both the "Excel as text" and "Excel direct" use Excel to read the data. The downside of this is that Excel is opened and closed for every file that you want to append, increasing the time to append the files. We could add an optimisation here where if you are batch appending only Excel files then Excel is kept open for the duration of the append. The Omniscope reader does not use Excel, so should be quicker. Can you try switching to the Omniscope reader and let me know if this is any quicker?

Chris

adam_varney · January 17, 2012 5:53AM

Ah this would explain why excel becomes impossible to use when i am running the append i will give this a try and let you know if it speeds up the process.

Thanks!

chris · January 17, 2012 6:37AM

Ok great. A quick guideline to the different readers, let us know if you have any further questions:

Excel as text

Omniscope starts Excel and opens the file in Excel. Omniscope then tells Excel to save the file as CSV and the CSV file is read into Omniscope.

This reader only works on Windows and requires Excel to be installed. It is the most efficient method for reading large Excel files. Excel is opened/closed for every file. It provides better support for some data-format's than the Excel reader.

Excel direct

Omniscope starts Excel and opens the file in Excel. Omniscope then reads the data directly from Excel.

This reader only works on Windows and requires Excel to be installed. It is fairly slow, however because the data is read directly from Excel it is sometimes the best method to use for some data formats.

Omniscope

Omniscope reads directly from the Excel file.

You do not need to be running on Windows or have Excel installed to use this reader.

This reader requires more memory since the entire file is loaded into memory prior to reading.

adam_varney · January 17, 2012 6:50AM

Hi Chris,

Thanks for the information on the different readers, however after trying the omniscope method it seems to load the data differently resulting in different results.

using "excel as text" it loads 99 fields and 31533 records. then with the "omniscope" reader it loads 70 fields and 31527 records.

I don't understand why depending on the reader option would result in different data being loaded?

Thanks.

chris · January 17, 2012 6:55AM

Hi Adam,

There are likely to be discrepencies between the data read from the different readers, however this sounds fairly major. Can you determine which fields are missing and which file they come from (the "source" column should tell you that), and if possible send me the Excel file (i've sent you an email so you should have my work email address).

Thanks

Chris

Welcome!

Categories

Ideas Parade

Tagged