Visokio website     Downloads     Video tutorials     KnowledgeBase  
Basket Analysis: finding category overlaps - Visokio Forums
Basket Analysis: finding category overlaps
  • tgcrouch December 21, 2012 7:12AM
    Hi,
    I have data at transactional level, where several transactions can form a single invoice. Assuming one invoice has 3 transactions with a different 'product group' (out of perhaps 10 available product groups), I would like to understand how many invoices contain which combinations of product groups. e.g.

    invoice 1
    trx1 - 3 apples = FRUIT product group
    trx2 - 2 twix bars = SNACK product group

    invoice 2
    trx1 - 1 orange = FRUIT product group
    trx2 - 1 newspaper = NEWSPAPER product group

    invoice 3
    trx1 - 1 apple = FRUIT product group
    trx 2 - 3 mars bars = SNACK product group
    trx 3 - 2 oranges = FRUIT product group

    I would like to count how many invoices have both FRUIT and SNACK product groups, in this case the overlap should equal 2 (because invoices 1 and 3 have this combination). So far, I have managed to get the number of invoices for each of the product groups separately, but with no overlap between any of them (just 0 values).

    Aggregation at invoice level doesn't seem to change anything.
    Any ideas?
    Thanks in advance!
  • 11 Comments
  •        daniel December 23, 2012 6:57AM
    I can't seem to get Venn diagram to aggregate how you want, this is probably because the aggregation part has to be applied before you can input it into the Venn view.

    So first aggregate the raw transaction data set to be one row per invoice with the remaining fields as unique value lists, then tokenise the type field and create queries based off the tokenised field.

    Published as a link as can't get the file to attach on this post:

    http://my.visokio.com/uploads/ExampleVennAggregate/
  •        daniel December 23, 2012 7:02AM
    Also as a note, there is a maximum number of 5 type queries the venn diagram can show at one time. This may create a problem if you have ten product groups.
  • tgcrouch January 2, 2013 7:16AM
    Thanks very much for that Daniel, it seems to be working! My only bottlenecks now (and resolving this may be tricky) are:
    - I'd love to keep the transaction breakdown if ever I wanted to examine an anomalous finding (presumably I'd have to do a manual delve into the relevant transactions)
    - One of the categories is called "OTHER SNACKS" and another "OTHER"; my query containing "OTHER" returns both. Is there a way to get round this?

    Thanks again, you've been a tremendous help so far!
    Tim
  •        daniel January 2, 2013 10:05AM
    Keeping the transaction is not too difficult, what you can do is merge the aggregated data to the transaction data based on the invoice id. You will get extra columns which are based on the invoice, in the example I called them "Agg fields".

    As for the query question, you'll have to elaborate what type of queries you are using. In Omniscope I didn't get an issue when setting queries between similar words.

    Attachments
    VennEx2.iok 9K
  • tgcrouch January 2, 2013 11:07AM
    Thanks Daniel - I've got it to show the transactional breakdown of a particular group of invoices. Fantastic...

    For the query front though, the "OTHER" query essentially gets all of the "OTHER SNACKS" and "OTHER" purists, whereas I only want the latter. Otherwise there ends up being a double count of "OTHER SNACKS" in the "OTHER SNACKS" and "OTHER" queries. Is there a way of writing... "OTHER" but not "OTHER SNACKS" ...in a query?
  •        daniel January 2, 2013 6:53PM
    I still not exactly sure where the query is being formed, is it within Omniscope or is it a query extracted from a database? When I form the query in Omniscope I use exact filtering by setting the query parameters using the search filters on the Agg Type field, if you click through the saved queries in my example file you will see it is separated out and 'Other' does not mix with 'Other Snacks' nor 'Other Snacks' with 'Snacks'. I am using 2.8 and not 2.7, perhaps there is an issue I am not seeing with queries definitions.

    However it may just be a non-issue by simply resolving it by using a completely different term by replacing 'Other' with 'Miscellaneous'. I think that would be the most obvious differentiation you can make and least likely to cause overlap issues:

    Fruit
    Newspaper
    Snackfoods
    Miscellaneous
    Other Snacks

    BTW out of curiosity where is your transaction data from? You don't have to answer if you have to safeguard your data!
  • tgcrouch January 3, 2013 4:21AM
    I think it's because I took a quicker approach of tokenising the 'type' field manually in Excel before Omniscope (so as to avoid having to sum/mean any figures that are already averages). So I think the best option is indeed to replace the 'other' category name!

    Regarding the data I'm using, I don't mind you asking but it's actually just some sample data that I've created to assess the power of various data analysis tools. So it doesn't mean much by itself, but it's meant to be similar to any potential projects that Omniscope might be used for in the long-term :)
  •     paola January 3, 2013 6:12AM
    Venn View is powerful, but also limited, when it comes to analysis of large number of transactions, as Daniel said.
    I would suggest Pivot View instead, where 'baskets' are identified by the unique Invoice number, and it is possible to display large number of product combinations.
    Please see below file to give you some ideas of what is possible to do with Pivot view, Variables or custom queries.
    You can easily tokenise 'basket' field inside Omniscope - please see the formula, also show the basket without the target product, so you can analyse products purchased with it.
    Attachments
    Basket sample.iok 19K
  •        daniel January 3, 2013 9:02AM
    Pivot is good for two group combination, though the more effective and flexible is in Paola's example's content view which uses custom queries/formulas. If you spend some time writing the formulas and then reformatting it into a table it wouldn't have any combination restrictions.

    You basically define for all the possible combinations and have the custom formulas calculate the number and even write out a list of them. However it is not possible to select and delve into them but you get the count and a list if necessary. It may get a quite long if you have a huge number of transactions and invoices.
    Attachments
    Basket samplePLUS.iok 19K
  •     paola January 3, 2013 10:56AM
    Pivot View is providing effective diagnostic tool for product pairs and more... If products A, B and C are frequently purchased together, Pivot table will register this with high B and C purchases against the target product A . If we then want to diagnose whether that is because A, B and C are purchased together (rather than in pairs A,B, and A, C), we can then proceed with more in-depth queries and subset formulas.
    To avoid having to list all product combinations - Variables can be used, as in the demo below, for dynamic calculations.
    To create a time series, showing only purchases of A,B and C, queries can be used, also Venn View to show the overlaps.
  • tgcrouch January 4, 2013 4:43AM
    Thanks to both of you for your additional ideas, OS seems to offer a lot of potential! I'll be playing around with it for a little bit and come back to you if I have any further questions.
    Thanks again!
This discussion has been closed.
← All Discussions

Welcome!

It looks like you're new here. If you want to get involved, click one of these buttons!

Sign In Apply for Membership