IP Address Lookup
  •     sherriff April 20, 2011 7:39AM
    I wish to display the location of IP addresses on a Map View within Omniscope.
    Can you please suggest the most efficient way to do this?
    So far I have researched two possible methods but am unable to complete either one due to lack of knowledge of how to implement them:
    1) An API for web lookup.
    2) Import a large IP address database into Omniscope (see file attached); this requires conversion of the IP address format into an integer number and then some clever lookup which I am unable to figure out.
  •     steve April 20, 2011 7:56AM
    1) is possible, but somewhat tricky and probably not worth trying. It will also suffer from performance issues due to bandwidth limits and the server's capacity to process large numbers of records.

    2) Why do you need to convert to Integer? If the IP addresses in your data and in the lookup are both text, a merge (2.5) or Join/Merge operation (2.6) should handle text matching correctly. Post your own data here too, and we'll take a look at the merge operation needed.
  •     sherriff April 20, 2011 9:42AM
    A sample of my data is attached. It has IP addresses in text format (123.345.456.678),
    whereas the IP lookup has data in integer ranges.
  •     steve April 20, 2011 11:45AM
    You can easily convert IP addresses from dotted-quad form to 32-bit integer form (duplicate the field, expand values by separator '.', convert to integer, then use the formula "256*(256*(256*([quad1])+[quad2])+[quad3])+[quad4]").
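Outside Omniscope, the same conversion can be sketched in Python (an illustration only; the arithmetic mirrors the formula quoted above):

```python
def ip_to_int(ip: str) -> int:
    """Convert a dotted-quad IP address to its 32-bit integer form,
    mirroring 256*(256*(256*q1 + q2) + q3) + q4."""
    q1, q2, q3, q4 = (int(part) for part in ip.split("."))
    return 256 * (256 * (256 * q1 + q2) + q3) + q4

print(ip_to_int("192.168.0.1"))  # 3232235521
```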

    But the problem here is that your IP lookup file specifies starts and ends of IP address ranges, and Join/Merge only supports exact key matching.

    You can append the two datasets together then use SUBSET formulas to do the lookup. This is fairly complicated. See attached. In your sample data, only 1 IP address block matches, but I faked another record to match also.

    I've done this ad-hoc inside Omniscope. Ideally you'd do all this using a sequence of repeatable DataManager operations, so you can refresh with new data against new IP address data and see the merged results, daily.
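For comparison, outside Omniscope the same range lookup can be sketched in Python with a binary search over the sorted range starts (the sample ranges and location codes below are hypothetical, not from the attached lookup file):

```python
import bisect

# Hypothetical IP lookup ranges, sorted by start address:
# (range_start, range_end, location)
ranges = [
    (16777216, 16777471, "AU"),
    (16777472, 16778239, "CN"),
    (3232235520, 3232301055, "Private"),
]
starts = [r[0] for r in ranges]

def lookup(ip_int):
    """Return the location for the range containing ip_int, else None."""
    i = bisect.bisect_right(starts, ip_int) - 1
    if i >= 0 and ranges[i][0] <= ip_int <= ranges[i][1]:
        return ranges[i][2]
    return None

print(lookup(16777500))  # CN
```

Each lookup is O(log n) in the number of ranges, so 42,000 lookups against millions of ranges would be fast, which is essentially the architectural point raised later in this thread.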
  •     sherriff April 20, 2011 12:02PM
    OK thanks I will take a look and see if I can figure it out
  •     sherriff April 20, 2011 12:21PM
    I cannot see any of your workings in that file.
    Can you send a full version so I can understand the SUBSET formula procedure?
  •     sherriff April 20, 2011 12:44PM
    I found the formula now and am working on it. Thanks. Have a good evening
  •     sherriff April 22, 2011 6:07AM
    I am delighted that the SUBSET formula works. Thank you very much for that. It does seem to require the latest daily build as of 20th April in order for the INTERSECTION command to work.

    However, the problem I now have is the huge computational task when I apply the formula to the whole IP table of 3.5m records. 100,000 records take a few minutes and 300,000 a few hours; the full 3.5m run has been going for over 24 hours so far and has moved from 71% to 73% complete in that time. For comparison I am running it on two machines: a server with an Intel Xeon 3470 quad-core CPU and 16GB RAM, and a desktop with an Intel Core i7-950 (8 hardware threads), also with 16GB RAM.

    Both are running at close to 100% CPU utilisation for many hours, and RAM utilisation fluctuates between 11 and 15GB on both machines. So both are working hard on the computations but making little progress, and adding more/faster processors or more RAM is unlikely to make much of an improvement.

    Therefore we need to look for an improvement in the computational architecture. Do you have any suggestions? I will send the IOK file for you to take a look.

    One thought: most of the records in the appended table come from the IP lookup table (3.5m lookup records vs 42,000 data records to look up), so roughly 99% of the computational effort goes into running the 3 lookup formulas on the 3.5m lookup rows, where the results are never needed. We really only need the formulas evaluated on the 42,000 data records, but the way the append is constructed forces them to run across the whole table. A data architecture that ran the lookup formulae only on the data records, and not on the lookup table, would reduce the effort hugely.
  •     steve April 22, 2011 9:31AM
    Surrounding the formula like this:
    IF(Source='My ipdata', {original formula}, null)
    might help?
    If not, I suspect some performance improvements will need to be developed which would be very specific to this subset configuration.
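The effect of that guard can be sketched in Python (names like 'Source' and 'My ipdata' follow the formula above; expensive_lookup is a hypothetical stand-in for the SUBSET formulas):

```python
def expensive_lookup(row):
    # Placeholder for the costly SUBSET-style range lookup.
    return row["ip"] * 2

def guarded(row):
    # Only evaluate the expensive formula for data records;
    # lookup-table rows short-circuit to null (None) immediately.
    if row["Source"] == "My ipdata":
        return expensive_lookup(row)
    return None

rows = [
    {"Source": "My ipdata", "ip": 5},
    {"Source": "ip lookup table", "ip": 7},
]
print([guarded(r) for r in rows])  # [10, None]
```

Since ~99% of the appended rows are lookup-table rows, short-circuiting them avoids almost all of the wasted evaluation, which matches the ~10x speedup reported below.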
  •     sherriff April 23, 2011 2:06PM
    Your suggestion works, reducing the calculation time from around 16 hours to 1.5 hours (Core i7-950 3GHz desktop with 24GB RAM). Still rather too long for daily calculations on an average machine, but just workable using Scheduler. Many thanks.
  •     steve April 24, 2011 3:02PM
    Great. We'll be looking at further performance improvements in version 2.7.

