Tuesday, April 24, 2012

A Refresher on "Big O" Notation (Python)

It's well known that asymptotic notation is used to convey the speed and efficiency of code blocks in computer programs. I haven't used them very much while working with Python, so I needed to refresh my memory before trying to use this great tool.


Cardinal Rule: Focus primarily the largest value in the equation of time complexity. All other factors in the time complexity equation are essentially trumped.


O(n^4+n^2+n^3+nm+100) ~= O(n^4)
Update: assuming m is linear.


Trump Rules for Time Complexity:

  • Notes
    • Remember that we care most about the upper bound and are not so concerned with the lower (in general)
    • The smaller the upper bound number the better (and consequently, faster)
  • The Ladder
    • Constants are less than logarithms
    • Logarithms are less than polynomials
    • Polynomials are less than exponentials
    • Exponentials are less than factorials

Notation and Hierarchy (Smaller Is Better):

Constant Θ(1)
Logarithmic Θ(lg n) 
Linear Θ(n) 
Loglinear Θ(n lg n) 
QuadraticΘ(n^2
Cubic Θ(n^3
Polynomial O(n^k
Exponential O(k^n
Factorial Θ(n!)

Quick Examples:

[i for i in list] {linear}
Functions that generally operate on lists or generators (sum, map, filter, reduce, min, max, etc) tend to be linear in time complexity

[i+k for i in list for k in list] {quadratic}
[i+k for i in list1 for k in list2] O(list1*list2) {quadratic I think, since it's linear*linear}
Add 1 to the exponent value for each nested loop. For example. [j+i+k+n for j in list1 for i in list1 for k in list1 for n in list1] would have a time complexity of O(n^4)

Note: Some programmers reduce quadratic time complexity a bit when using nested loops with sorted lists by ensuring that calculations aren't performed more than once. Consequently, that code block runs faster and faster and less and less has to be evaluated through each iteration of the loop. For example:

list1 = [i for i in range(10)]
size = len(list1)
number=100
for n in range(size-1):
    for k in range(n+1, size):
        print number*(n+k)


4 comments:

  1. Hello,

    It is a good idea to talk about BigO notation. But IMO there are two mistakes in what you say:

    - O(n^4+n^2+n^3+nm+100) is equal to O(n^4), not approximately equal.
    - You don't reduce the quadratic complexity by the trick you present.

    BigO is a mathematical concept. We may discuss on whether it is useful or not for programmer. But IMO it is clearly not useful if not used correctly: you add confusion instead of clarifying things.

    Cheers,
    markolopa

    ReplyDelete
  2. Hello,

    It's a good idea to talk about bigO notation. IMO there is however two errors in what you say.

    1. O(n^4+n^2+n^3+nm+100) is not approximately equal to O(n^4). If you make no assumption on m you can't say this. And if assume for instance that m is asymptotically smaller than n or n^2, than the two big O's are equal.

    2. You don't reduce the quadratic time complexity of the nested loops by doing the trick you describe.

    BigO notation is a mathematical concept. One can argue on whether it is useful or not for a programmer. But IMO if it is used approximately it adds more confusion than clarification.

    Regards,
    markolopa

    ReplyDelete
  3. Following up on markolopa's accurate comment, the equivalent to your big big-O expression is:

    O(n^4 + nm)

    if one makes no assumptions about the size of "m" compared to the magnitude of "n". And it would make no sense to even have a variable "m" if you knew it was somehow related to "n". :)

    ReplyDelete
  4. Hi Nikhil, thanks for the nice post within the context of Python. There's another page that helped, that is in the context of Java, good help as well:

    Big O Notation Java

    ReplyDelete