Published on Visokio (http://kb.visokio.com)

Home > Operations > Operations

Operations

Operations

 

 

Append

Append 

The append operation allows you to append the data from two or more data-sets to create a single data-set.

Options 

  • Field selection. Allows you to select the fields from the source data that you want to include in the appended data.
  • Source field. Select this option to create an additional field that lists the name of the input data-set each row in the data originated from. For example, if you are appending a series of time-sliced data-sets selecting this option will allow you to track which time-slice each row of data was otained from.

Example

In this example we are appending the data from 3 data-sets together. Each data-set contains spending data from selected days a single month. The first data-set contains data from January, the second contains data from February and the third contains data from March. The expense data is shown below. 

We use an append operation to combine the data from each of these three data-sets. In this example we select all fields and change the name of the Source field to 'Date'. The DataManager configuration for this operation is shown below.

 

Merge/Join

Merge/Join

The Merge/Join operation allows you to merge the data from two data-sets together to create a single data-set.

In most scenarios merging two data-sets requires that the two tables have at least one common field, and that at least one of these fields contains no duplicate records. If you intend to merge on a Date/Time field the storage/display formatting of both merge fields must be identical. 

 

Join by matching records where...

This allows you to specify the join criteria by selecting the matching fields from both datasets. You can define multiple join criteria; each criteria specifies a single match. To add a new join criteria click on the Add join criteria button. When you select a field to match on the number of unique records in that data-set are shown. In most cases one or both of the match fields should contain no duplicate values. If both fields contain duplicate values the merge may result in a huge number of records. Use the Accent sensitive and Case sensitive options to determine whether accent characters or case have a bearing on whether a value from the first data-set matches a value from the second. 

Which record sets do you want to retain?

This determines which records are included in the merged data based on the join criteria you specified. You can select any combination of these options. 

  • Non-matching records from data-set 1. Select this option to include all records from the first data-set that did not match any records from the second data-set.
  • Merged-records. Select this option to include all records that matched in both data-sets.
  • Non-matching records from data-set 2. Select this option to include all records from the second data-set that did not match any records from the first data-set. 

If there are any conflicting field names...

This alllows you to determine what action Omniscope should take if there are any fields outside of the join criteria with matching names. 

  • Add new field names. If a duplicate field name is found then a new field will be created with a new unique name.
  • Keep values from data-set 1. If a duplicate field name is found only the field from the first data-set will be retained.

Source field

Select this option to create an additional field that lists the name of the input data-set each row in the data originated from. For merged records both data-sets will be listed.

Example

In this example we will be merging the data of two tables. The first table contains a list of customers. The second table contains a list of transactions made by the customers during January. The input data for the Merge operation is shown below.

In order to merge these two data-sets we need to identify the join criteria. In this case both tables have a common field: Customer ID. We therefore need to create a merge operation with a single Join criteria, matching on records where Customer ID from the customers table matches Customer ID from the transactions table. 

The DataManager configuration for this operation is shown below.

 

 

 

 

Field organiser

Field organiser

The Field Organiser operation allows you to manage the fields inside a data-set. You can add, delete, re-order and edit the properties of the fields. 

 

The Field Organiser lists all of the fields in the data-set in the order that they appear. Each row of the list represents a single field.

You can add a field by clicking on the Add field button at the bottom of the operation. New fields will be added at the bottom of the list. 

You can rename all of the fields by clicking on the Rename all button.

You can delete all of the fields by clicking on the Delete all button. 

For each field Omniscope provides two sets of options: 

  • Top level options. These represent the most common operations that are executed on a field. You can change the field type, delete, re-order and duplicate the field and reset all of your changes.
  • Advanced and field-specific options. These options can be shown by clicking on the expand field icon , or by clicking on the field name. They represent options that are less common or options specific to a particular data type.

Top level field options 

  • Field type. The Field type dropdown displays the current data type. Omniscope classifies all data columns as either Integer numbers, Decimal numbers, Dates and Times, or Text. You can change the field to a different data type by clicking on the dropdown and selecting a new value, however doing so may result in some or all of the values being lost if the data cannot be converted. For example, If we have a Text field containg the values "Red","Green","23" and we attempt to convert it to an Integer number field, the first two values in this data will appear blank, since they cannot be converted to a number.
  • Rename. Click on this bttuon to change the name of the field. Field names must be unique. You can also rename the field by double-clicking on the field name.
  • Delete. Click on this button to delete the field from the data. If the field was in the source data you can click the Un-delete button to bring it back. If the field was one that you created then the field cannot be un-deleted.
  • Duplicate. Click on this button to create a duplicate of the selected field. The duplicate field will appear immediately bellow the source field. The name of the field will be a unique name prefixed by he name of the source field. All other options in the duplicated field will be identical to the source field.
  • Reset. Click on this button to reset all changes you have made to the selected field back to their original values.
  • Move field . This allows you to change the order of the field in the dataset. Click and drag to change the position.

Advanced and field specific options. 

Common field options

These options are available for all field types. 

  • Use formula. This allows you to enter a formula that will be used to generate the values inside the field. Immediately after selecting this option the formula editor dialog will be displayed. After entering the formula you can edit it at any time by clicking on the Edit... button. If the field previously contained some values these will be overwritten by the values generated by the formula. 
  
  • Replace values. This allows you to replace all of the values inside the field with a single value. The value you enter must be compatible with the selected field type. If the field previously contained some values these will be overwritten by the value specified.
  
Decimal number and Integer number options

These options only appear in Decimal or Integer number fields. 

 

  • Show thousand separator. This determines whether the thousand separator character is displayed.
  • Show as percentage. This determines whether the number is shown as a percentage value.
  • Decimal places. This option can be used to specify the number of decimal places shown for every value in the data. If left blank the number of decimal places is automatically calculated.
  • Prefix. This option allows to add a text prefix that is displayed at the beginning of the number. For example, you may choose to add the prefix "£".
  • Suffix. This option allows you to add a text prefix that is displayed at the end of the number.
Text options

These options only appear in Text fields. 

  • Tokenised. You can select this option if the data in the Text field contains a delimited list of multiple values. If you select this option you must also select the separator character.
  • Graded.  The graded option classes the data as sequential values, similar to numbers, such that they are coloured using a continuous colour scheme (shades of colour between start and end colours) rather than category colours (unrelated colours for each unique value).  See Style > Data colours.  For example, a rating field with values AAA, AA, A, BBB, BB, B, CCC, CC, C would have a custom value order and would be graded, so AAA appeared first and in the "max value" colour, and "C" appeared last in the "min value" colour.
  • Value order. By default, category values are sorted alphabetically. For many categories, alphabetical order is not meaningful. This option allows you to specify the default sorting order of the values. This order will be shown if you select the field in the Device bar in DataExplorer and will effect the colouring sequence. 
Date options

These options only appear in Date fields. 

  • Date format (input). This option shows the original date format of the field. You cannot edit this value unless you have converted from another field type to a date.
  • Date format (output). This option can be used to specify the date format applied to the data.
  • Time zone. This option can be used to specify the timezone.

 Go Back to data-operations [1].

Summarise fields

Summarise fields  

The summarise fields operation allows you to create a "Summary" field that combines the values from one or more fields into a single field. 

 

 Options

  • Fields. The field chooser you to select which fields from the input data will be included in the summary field.
  • Show field names. This option sllows you to select which fields from the selections made in the field chooser include their names in the summary data.
  • Summary field name. This option allows you to enter the name of the summary field.
  • Summary field separator. Allows you to select the characters that are used to separate the values of each field in the summary data.
  • Delete summary fields. If selected, the fields selected for inclusion in the summary data will be removed from the data-set. If unselected the fields will be retained.
  • Include empty values. If selected, blank values will be included in the summary data. If un-selected blank values will be skipped.  

Example

The Summarise fields operation is useful when importing data-sets with a large number of fields. Trying to analyse such a data-set can be inefficient, particularly when the data-set also contains a large number of records. If these fields aren't required for analysis, but you still want to retain the data it may be beneficial to create a single summary field instead.

In this example we are importing a data-set that contains information about a set of employees in the company. A sample of this data is shown below.

This data contains some fields that are useful for analysis: Name, Date of birth and Position. The data also contains some fields that we don't need to analyse: Performance evalutation and Academic qualifications. We could simply remove these fields using a field organiser, but we still want to be able to view the data. In this scenario we could use the Summarise fields operation to create a single field containing the values from these fields. Doing so will improve the overall performance of Omniscope once we have loaded the data.

The DataManager configuration for this operation is shown below. 

  

 

Go Back to data-operations [1]. 

Record filter

Record Filter

The Record filter operation allows you to generate a subset of the rows in a table by applying one or more filter rules.

 

Filter rules

Each filter rules defines a single condition for selecting a set of records in the data. You can create multiple conditions by defining multiple filter rules.

To add a new filter rule click on the Add rule button in the bottom toolbar. There is no limit to the number of rules you can add in a single Record filter operation.

To remove a single rule click on the Remove button or click the Remove all button to remove all of the rules in the operation.

To view or edit the rule click on the expand button  or click on the rule name. Each rule is comprised of 3 elements: 

  1. The name of the field you want to filter by.
  2. The rule you want to apply to the selected field.
  3. The value you want to filter against.
Match criteria

 

The match criteria options are shown at the top of the operation. They determine how the filter rules should be applied to the data. You can choose to either accept or reject the records that match all or any of the filter rules that are defines in the record filter operation.

Example

In the following example we have a data-set containing a list of company employees. The data is shown below.

We will use the Record filter operation to retrieve all Female employees based in London. To achieve this we need to create two filter rules. In the first filter rule we want to obtain all employees based in London. This rule is configured as folllows: 

  1. Field: Location
  2. Rule: =
  3. Value: London 

The result of applying this rule is shown below.

Now we want to add another filter rule. In this rule we want ot obtain all female employees. This rule is configured as follows: 

  1. Field: Sex
  2. Rule: =
  3. Value: London  
In the match criteria section we choose to Accept all records that match all of these rules. By combining these two rules together we are filtering the data to only accept records where Sex=Female and Location=London. The result of applying both of these rules together is shown below.

 

 

Go Back to data-operations [1].

 

 

Random sample

Random sample

The Random Sample operation generates a data-set containing a random sub-set of rows from the input data. This can be useful when you are working with very large data-sets, allowing you to work with a smaller sample of data while preparing and testing additional operations that need to be applied to the data. 

Options 

  • Records. The Records field allows you to select the number of records in the sample. The records are chosen at random, so each time you execute the operation you will obtain a different sample.

Example 

The Random sample operation can be useful when you are working with very large data-sets. You can use the Random sample operation to generate a small sample of the data.This is useful because some operations can take a long time to execute on large data-sets. By working with a smaller data-set you can create, configure and test additional operations that you want to apply to the data much more quickly.

In this example we are working with a fairly large data-set containing approximately 1,000,000 records. We want to use a combination of the Random sample operation and the Input switch operation to switch the data between a small sample of 1,000 records and the full data-set without having to reconnect our workflow. A configuration that allows us to do this is shown below.

 
If we want to change the data from the sample data to the full dataset we simply need to click on the switch in the Input switch operation.
 
 

 

More on Operations [1] 

Input switch

Input switch

The Input switch operation allows you to switch between two input data-sets.

Options

The Input switch operation contains only a single option: the switch. Clicking on the switch allows you to select the data-set you want to use.

Example

An example of using the Input switch operation can be found in the Random sample [2] operation documentation.

De-duplicate

De-duplicate 

The De-duplicate operation allows you to remove or retain the duplicate records in a data-set. 

 

 Options

  • Field selection. The field selection deterimes how the duplicate records are calculated. You should select a set of fields which together uniquely identify a single record. 
  • Minimum group size Define by 2 if duplicates, 3 if triplicates, etc.
  • Case sensitive
  • Output    


Pivot

Pivot

De-Pivot

De-Pivot


Transpose

Transpose

Aggregate

Aggregate

The aggregate options allows you to define an aggregated transformation of the input data and to define the aggregation functiona applied accross the values in each field.

 

 

 


Search/replace

Search/replace

The Searh/replace operation allows you to replace all occurrences of one value inside one or more fields with another.

Each Search/replace operation allows you to define one or more search actions. The actions appear as a list and are executed on the data in the order they appear in the list. You can change the order of the search actions by clicking and dragging the hand icon. You can change the name of the search by double-clicking on the name. 

  • Add search. Click this button to add a new search action.
  • Auto-generate. This allows you to auto-generate a series of search actions. When you click on this option the Auto-generate search dialog will be displayed. This dialog allows you to select a single field in the input data and to select which values inside that field you want to create searches  
 
  • Remove all. Click this to remove all the searches.  

Search options

You can view and edit individual search options by clicking on the name of the search action or by clicking on the expand button. 

  • Fields. This allows you to choose which fields you want to execute the search in.
  • Search for. This allows you to select the text you want to search for.
  • Match entire cell.
  • Replace with. This allows you to enter the text you want to replace the search text with.
  • Case sensitive.



Scramble

Scramble

The Scramble operation allows you to scamble the text in one or more fields. This allows sensitive data to be removed without affecting the structure of your file.

 

 


Expand values

Expand values

The Expand values operation allows you to expand the values inside a single field into one or more new fields.

 

 

  • Field selection.
  • Expand by.
  • Separator.

Expand by Separator example 

 

 

 

Expand by fixed width example 

 

 

Collapse values

Collapse values

The Collapse values operation allows you to combine all of the values in one or more fields into a single new field.

Options 

  • Fields. Allows you to select the fields you want to collapse.
  • Separator. Specifies how the values in the selected fields are separated.
  • Collapsed field name. The name of the field containing the collapsed values.
  • Skip empty cells. Determines whether empty or null values are included.
  • Delete collapsed fields. Determines whether the selected fields should be deleted. 

 

Tokenise

Tokenise 

 


De-tokenise

De-tokenise

Sort

Sort

The Sort operation allows you to sort your data by one or more fields. 

 

Options  

  • Field selection. Allows you to select which fields you want to sort by.   

 


Text mine

Text mine

The Text mine operation is used to extract and derive useful information from text in your data. The text mine operation currently provides three different mining algorithms; Top words, Word count and Sentiment. 

Field selection.

Select which text fields you want to mine.

Top words

The Top Words algorithm extracts the most popular words from the selected fields. The popularity of a word is determined by the number of occurrences in a text cell. Short words such as "the", "and" and "or" are ignored. 

Word count

The Word count algorithm counts the number of words and the number of unique words in a text cell.

Sentiment

The Sentiment algorithm attempts to determine the sentiment of the text. Sentiment is determined by counting the number of positive and negative words and calculating an overall score. A positive score indicates a positive sentiment whilst a negative score indicates a negative sentiment. The higher the score, the higher the sentiment. 

 


Google Translate

Google translate

The Google translate operation allows you to translate the text in one or more fields of a data-set from one language to another. It uses the Google translate service to perform the translation. You must have access to the internet to use this operation.

 

 

Options 

 

  • Fields. Select the text fields in your data that you want to translate. All values in the selected fields will be translated unless you specify an exclusion.
  • Source language. Select the language of the data in the selected fields.
  • Target language. Select the language that you want to translate the text to.
  • Exclude. Use this option to specify text that you want to exclude from the translation using a regular expression. Leave this option blank if you want to translate all text.

© Visokio | Privacy Policy | Terms of Use | Contact Us


Source URL (retrieved on 11/01/2017 - 21:18): http://kb.visokio.com/node/606

Links:
[1] http://kb.visokio.com/data-operations
[2] http://kb.visokio.com/node/661