Top products from r/computervision

We found 40 product mentions on r/computervision and ranked the 24 resulting products by the number of redditors who mentioned them. Here are the top 20.

Top comments that mention products on r/computervision:

u/csp256 · 1 pointr/computervision

Now you've doubled your memory usage. That is not necessary.

Computers do things at a set speed. If you want fast code, you have to tell the computer to do fewer things. That means you need to be efficient: get a lot done with fewer instructions and less waste. If you just minimize the wasted effort you'll end up in a good spot.

That's going to be hard to do if you don't know what instructions actually are and don't understand basic computer architecture.

It's also why "threading will make it go faster" is so fallacious. Threading doesn't make things faster per se; it just attempts to minimize latency by using more resources, while often introducing overhead that decreases total system throughput. Using 8 cores to get a 6x speedup might seem like a good idea, until you realize you're introducing latency to every other process and your total CPU time is 33% higher. This also introduces all sorts of weird cache effects which can end up violating assumptions relevant to performance, both in your code and in other code running on the same system. Unless you need to minimize latency, focus on writing the best single-threaded code you can.

Do not write performance-sensitive code, such as image processing primitives, in Python. Write it in modern C++. You can call your compiled C++ function from Python. You can reasonably expect to see an order of magnitude improvement just from doing this. It also opens all sorts of doors when it comes to performance tuning.
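
For a flavor of what that boundary looks like, here is a minimal sketch using pybind11 (one binding option among several; the module and function names are made up, and the actual filter kernel is elided):

// fastfilter.cpp -- minimal pybind11 sketch; build via your usual pybind11 setup.
#include <cstdint>
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>

namespace py = pybind11;

// Accepts a 2-D uint8 NumPy array without copying it, returns a new array.
py::array_t<uint8_t> median3x3(py::array_t<uint8_t> img) {
    auto in = img.unchecked<2>();               // read-only 2-D view, no copy
    py::array_t<uint8_t> result({in.shape(0), in.shape(1)});
    auto out = result.mutable_unchecked<2>();
    for (py::ssize_t i = 0; i < in.shape(0); ++i)
        for (py::ssize_t j = 0; j < in.shape(1); ++j)
            out(i, j) = in(i, j);               // placeholder: the real kernel goes here
    return result;
}

PYBIND11_MODULE(fastfilter, m) {
    m.def("median3x3", &median3x3, "3x3 median filter (sketch)");
}

From Python this then loads like any other module: import fastfilter; filtered = fastfilter.median3x3(img).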

Allocating and deallocating memory (I'm not very good at Python but I think that's what you're doing with temp=[]) takes time. You know what takes zero time? Declaring an array with a certain fixed size, up front. Alternatively you can use a std::vector<> and immediately .reserve() the size you need. (Don't use .resize() unless you need it.) At the very least you don't need to clear and reappend to your buffer.
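
In C++ terms, the allocation-free pattern looks something like this (a sketch assuming a 3x3 window):

#include <cstdint>
#include <vector>

void filter_all(const uint8_t* img, int npixels) {
    std::vector<uint8_t> window;
    window.reserve(9);                 // one allocation, up front
    for (int p = 0; p < npixels; ++p) {
        window.clear();                // keeps the capacity; no free or realloc
        // ... push_back the 9 neighbours of pixel p, take their median ...
    }
}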

You do not need to sort the elements within the 3x3 window. You need to find their median. nth_element() can do this. As can a heap of 5 elements (for a 3x3 filter) you conditionally insert into. (That is the solution to a classic interview question, actually.)
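
A sketch of the nth_element version, assuming the 3x3 window has already been gathered into a 9-element array:

#include <algorithm>
#include <cstdint>

uint8_t median9(uint8_t w[9]) {
    // Partial selection: only w[4] is guaranteed to land in its sorted
    // position, which is exactly what a median needs. Average O(n).
    std::nth_element(w, w + 4, w + 9);
    return w[4];
}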

It is unlikely that .sort() is optimized for your specific case. With so few elements it is likely that something like insertion sort, or even bubble sort, will be faster.
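
For instance, a straight insertion sort over the window is tiny and allocation-free (sketch):

#include <cstdint>

void insertion_sort9(uint8_t a[9]) {
    for (int i = 1; i < 9; ++i) {
        uint8_t v = a[i];
        int j = i - 1;
        while (j >= 0 && a[j] > v) {   // shift larger elements right
            a[j + 1] = a[j];
            --j;
        }
        a[j + 1] = v;                  // drop v into its slot
    }
}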

Your current formulation can not take advantage of SIMD, which is leaving a lot of performance on the table. Since images are often single channel, 8 bit resolution and common vector sizes are 128 bits you could be leaving 16x performance on the table by not exploiting SIMD instructions. If you don't know what SIMD is you need to go fix that.

(Nitpick: "filter_size // 2" is correct, but why not just do a bit shift? I'm not sure whether Python makes that conversion for you.)

You are biasing the filter near the border of the image. By inserting zeros into your buffer instead of only looking at valid pixels you are biasing your filter towards darker values at the borders. You could do some tricky things to find the median of the valid pixels only, but I would recommend just not having the filter be defined there. In computer vision maximizing reliability is often a core focus, so it is often better to just let the output be smaller than the input. Zero-bias error is a really, really nice property to have: don't accidentally lose it over something so trivial.

I'm not that savvy with Python but I'm pretty sure that in "for j in range(len(data[0])):" the len() is being evaluated in each iteration of the "i" loop around it. Compute this once and cache it.

You have multiple if statements in your inner loop. You are guaranteeing that you will get multiple branch mispredictions here. Even if you somehow avoided them, you're checking for an edge condition on every single pixel.

There are a couple of ways to avoid your boundary conditions. The most obvious is to just zero pad your data. This is what most people do, and it can be the right thing. But it makes you use more memory and can introduce an image copy. What I like to do is explicitly write the boundary conditions, then go into a loop for the bulk of the image. This increases lines of code but you don't have to compromise on performance.
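
Structurally that looks something like this (the edge handlers are hypothetical helpers, declared but not shown; interior rows still peel their first and last columns the same way):

#include <cstdint>

void filter_border_row(const uint8_t* in, uint8_t* out, int y, int w, int h);  // hypothetical
void filter_interior_row(const uint8_t* in, uint8_t* out, int y, int w);       // hypothetical

// Peel the borders so the bulk loop carries no per-pixel edge checks.
void median_filter(const uint8_t* in, uint8_t* out, int w, int h) {
    filter_border_row(in, out, 0, w, h);
    for (int y = 1; y < h - 1; ++y)
        filter_interior_row(in, out, y, w);
    filter_border_row(in, out, h - 1, w, h);
}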

I had to solve a similar problem recently. It was single channel, uint16_t data from a very noisy sensor on a system with 128 bit vector width. I needed a 5x5 median filter and decided to use the median of medians approach. Median of medians gives a result whose position is guaranteed to be within 10% of the position of the median in a sorted list. That is, for a list L of size S which has been sorted to give a list K it will return an element between K[0.4*S] and K[0.6*S]. Here is how I implemented it:

The image width size was already a multiple of the vector width. I created a buffer of size 5*row_width. I treated this as a cyclic buffer of 5 rows (such that row n would be evicted once I added row n+5 to it). I was provided a separate output buffer.
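
The indexing for that kind of ring is just a modulus (a sketch; row_width is whatever your image uses):

#include <cstdint>
#include <vector>

// Row n of the stream lives in slot n % 5, so writing row n+5 evicts row n.
// The ring itself: std::vector<uint16_t> ring(5 * row_width);
uint16_t* row_slot(std::vector<uint16_t>& ring, int row_width, int n) {
    return ring.data() + (n % 5) * row_width;
}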

Before I tell you the next part, realize that C=A<B, where A and B are SIMD vectors, will fill each element of C with all 0 bits or all 1 bits depending on whether the comparison is true or false. This is useful as a bit mask. Perhaps you don't have a vector min instruction and need to synthesize C=min(A,B) like so:

M = A < B;
C = (A & M) | (B & (~M));
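
As a concrete instance of the same trick (my assumption here is SSE2 with signed 16-bit lanes; the original platform isn't named, and SSE2 actually has _mm_min_epi16, so the synthesized form only matters on ISAs without one):

#include <emmintrin.h>  // SSE2

// min(A, B) synthesized from a compare mask, 8 lanes of int16.
__m128i vec_min_i16(__m128i A, __m128i B) {
    __m128i M = _mm_cmplt_epi16(A, B);            // each lane: all 1s where A < B
    return _mm_or_si128(_mm_and_si128(M, A),      // keep A's lanes where A < B
                        _mm_andnot_si128(M, B));  // keep B's lanes elsewhere: (~M) & B
}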

I first prefill the buffer with 5 rows which have had horizontal median filtering applied. Here is how I do that filtering on each row:

I used 5 overlapping vector loads (unaligned loads are performant on this system) to create 5 vectors of 8 elements each (128/16=8). I then run a parallel median finding network on each (look up "sorting networks"). StackOverflow has some example code:

template<class V>
inline V median(const V &a, const V &b, const V &c)
{
return max(min(a,b),min(c,max(a,b)));
}

template<class V>
inline V median(const V &a, const V &b, const V &c, const V &d, const V &e)
{
V f=max(min(a,b),min(c,d)); // discards lowest from first 4
V g=min(max(a,b),max(c,d)); // discards biggest from first 4
return median(e,f,g);
}

Of course if you are lucky enough to have a med3 instruction you should use that instead of the 3 argument median function.

I write the result to the buffer and skip down 8 elements, repeating this process until I fill a full row into the buffer.

After the initial 5 are filled into the circular buffer, I am then ready to output a row of final results. I do this 8 at a time by loading from each of the 5 rows in the circular buffer and running that through the same median finding network. The result is written back in place to the input image. This introduces no RAW hazard because I am reading from 2 rows below it.

I then add another row to the buffer, and then immediately compute one more row of final results (as in the previous paragraph). This continues until I run out of output rows.
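
Putting the pipeline together, the steady state looks roughly like this (a sketch with hypothetical helpers; note that because a median is order-independent, the vertical pass can read the five ring slots in any order, so nothing needs rotating):

#include <cstdint>

void horizontal_median_row(const uint16_t* src, uint16_t* dst, int w);  // hypothetical
void vertical_median_row(const uint16_t* ring, int w, uint16_t* dst);   // hypothetical

// ring must hold 5 * w elements; assumes h >= 5. Output rows are 2..h-3.
void median5x5(uint16_t* img, uint16_t* ring, int w, int h) {
    for (int y = 0; y < 5; ++y)                                       // prefill rows 0..4
        horizontal_median_row(img + y * w, ring + (y % 5) * w, w);
    vertical_median_row(ring, w, img + 2 * w);                        // first output: row 2
    for (int y = 5; y < h; ++y) {
        horizontal_median_row(img + y * w, ring + (y % 5) * w, w);    // evicts row y-5
        vertical_median_row(ring, w, img + (y - 2) * w);              // emit row y-2
        // Writing over img row y-2 is safe: rows still to be read start at y+1.
    }
}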

Of course I also tweaked how the loops were unrolled.

(Actually, I interleaved these horizontal and vertical median finding operations so I could trade some register pressure for better performance by dodging some vector loads. I only bothered because I was already used to writing ASM on this platform.)

This runs at full resolution, high frame rate on a single (<1 GHz) core while leaving plenty of time for the rest of the processing of that frame before the next one comes in. Its runtime is within a few percent of optimal. I haven't timed your code but I'd be willing to bet it is more than 100x slower.

I suggest learning at least the basics of computer architecture if you want to write performant code. Tools like Godbolt are indispensable. You're likely not getting within an order of magnitude of optimal if you stick with Python.

u/SupportVectorMachine · 2 pointsr/computervision

I quite liked Charu Aggarwal's Neural Networks and Deep Learning: A Textbook and found it far more useful than the Goodfellow text (which still deserves credit for being the first comprehensive book on the topic). The Aggarwal text is also from Springer (I promise I don't work for them!), so the same things I mentioned earlier apply here.

On the practical side, it is hard to beat Francois Chollet's Deep Learning with Python. He is the creator of Keras, and especially if you are interested in moving to TensorFlow 2.0, learning the Keras API will prove very helpful. But the major selling point of this book is the platform-independent insights and best practices he offers. It's a really well-written and well-presented book and probably has one of the biggest payoff-to-page ratios out there.

You may find the Aggarwal book a bit repetitive in spots, but that is likely because he wrote it to allow readers to easily dip in and out of sections they need and use it as a reference. The repetition only becomes evident if you read it cover to cover (which I don't regret doing).

u/tyggerjai · 1 pointr/computervision

Doing it optically is a fun project, but it's worth pointing out that the various rotations are so well known and studied that once you calibrate for position, a purely mechanical positioning system is more than adequate, and computer control has been a solved problem for decades. You might want to read Trueblood and Genet to get a feel for the control and automation side, and start with a mathematical/physical solution - you can add vision later.

Edit: specifically https://www.amazon.com/Microcomputer-Control-Telescopes-Mark-Trueblood/dp/0943396050 - the hardware is much easier these days, but the control theory and algorithms are the same.

u/jimduk · 2 pointsr/computervision

If you have the time and money, or library access, this book, Color Appearance Models by M. Fairchild, is pretty good and comprehensive. Colour is a bit of a rabbit-hole topic; it goes quite a long way down (for instance, different camera manufacturers have different color models and, as I understand it, sell modified versions in different geographies - so point a Canon and a Nikon at an X-Rite chart and you get different results).
Book: https://www.amazon.co.uk/Appearance-Models-Imaging-Science-Technology/dp/1119967031

Also, this guy's blog is pretty good: http://www.strollswithmydog.com/perfect-color-filter-array/

u/EfficientStranger · 16 pointsr/computervision

I understand you’re trying to step back and generalize about color perception, but as someone who works in this field, I find the foundational theories problematic. There are a significant number of “beliefs” and “assumptions” in the writing that overlook the biological realities affecting our optical and neurological systems.

For example, you assume that “color itself is the consequence of the brain's distinctions between the possible distributions of luminosity across different wavelengths of light”, which it isn’t, as our visual systems are significantly more complex than this reduction suggests.

A few things to consider, off the top of my head:

  1. differences across the visual field of one eye
  2. structural differences between the left and right eyes
  3. color blindness
  4. the ability to learn color sense (much like “getting an ear” in music, with serious practice in the visual arts you will see more and more richly)
  5. changes in the organic visual system over time (from baby to geriatric, but also things like cataracts, lens elasticity, and optical tumors)
  6. the human visual system evolved for attention to pattern and change in the environment
  7. the “average” human visual system is tuned for highest sensitivity in green, so different colors have different sensitivities, as do luminance ranges
  8. optical illusions!

Here’s a quick overview: https://www.pantone.com/color-intelligence/articles/technical/how-do-we-see-color

A lot of what you’re investigating is related to the standard observer tests, which you should be aware of if you’re not already! The standard observer also makes many assumptions, generalizations, and simplifications in going from shared reality to individual perception: https://en.m.wikipedia.org/wiki/CIE_1931_color_space

And this is a great introductory book on color science before you get into textbooks:
https://www.amazon.com/Color-Science-Visual-Arts-Conservators/dp/1606064819

u/Geoe0 · 8 pointsr/computervision

I can recommend SLAM for Mobile Robotics: https://www.amazon.de/dp/1466621044/?coliid=I2IP24301C8HSY&colid=1N3R5MT3K6FKJ&psc=1&ref_=lv_ov_lig_dp_it It's not about aerial robots, but it's a really good book on SLAM. For aerial specifically, I would suggest the work of Cremers et al. at TU München. There is also a master's thesis by one of his students on dense VO; it's very well written. The research of Scaramuzza at ETH Zürich is also very good. His tutorial paper is a good starting point: https://www.ifi.uzh.ch/dam/jcr:5759a719-55db-4930-8051-4cc534f812b1/VO_Part_I_Scaramuzza.pdf

u/astebbin · 2 pointsr/computervision

I'd say that the answer to your question depends on the problem. For certain problems, such as detecting faces, there are functions out there that do everything for you. For other problems, such as circle detection, combinations of existing functions will get the job done (as MakingMacaroni describes in another comment). Then for some problems, such as abandoned luggage detection in airports, you really do need to be up on the current research and have a solid grasp of the mathematics involved.

I'd say that the task you're describing is probably in the second or third category. You might try thresholding optical flow over time, as RGKaizen suggests. Depending on how much training data you have to work with, you might also try training a machine learning classifier on one or more visual features to generate profiles of "normal" and "emergency" situations. If you expect big green tanks to appear or fires to break out, blob detection with color histogram analysis might even do the trick. The key is to make the problem as easy for the computer as possible, and figure out which of the functions OpenCV gives you are best suited for your particular situation.
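
If it helps, here is a minimal sketch of the optical-flow-thresholding idea with OpenCV (in C++; the Farneback parameters and the 2 px/frame threshold are placeholder values to tune, not recommendations):

#include <opencv2/opencv.hpp>
#include <vector>

// Flag pixels whose dense optical flow magnitude exceeds a threshold.
cv::Mat motion_mask(const cv::Mat& prevGray, const cv::Mat& gray) {
    cv::Mat flow;
    cv::calcOpticalFlowFarneback(prevGray, gray, flow,
                                 0.5, 3, 15, 3, 5, 1.2, 0);
    std::vector<cv::Mat> xy(2);
    cv::split(flow, xy);                 // flow is CV_32FC2: x and y components
    cv::Mat mag;
    cv::magnitude(xy[0], xy[1], mag);
    return mag > 2.0;                    // "moving": more than ~2 px/frame
}

Accumulating that mask over time and thresholding its area would give you a crude "unusual motion" score to build on.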

Best of luck! If you go forward on this project, please let us know what you come up with!

EDIT: Here are a few resources for figuring out which functions to use, what math to apply, etc.:

u/4gn3s · 4 pointsr/computervision

The most famous textbook on 3D reconstruction is Multiple View Geometry in Computer Vision, but I'm afraid it's much too complex for you right now.

You need to know how matrix and vector operations work; understand projective geometry in 3D (rays, intersections, planes, etc.); learn epipolar geometry and understand the difference between the essential matrix and the fundamental matrix; and take a look at some numerical and estimation methods (e.g. homogeneous least squares).

In the meantime, I suggest you take a look at this book: Programming Computer Vision with Python (you can download it for free). There's a chapter explaining the basics of 3D reconstruction with sample code, which can be good motivation for you.

And be patient, this is a pretty complex field, so better learn the basics first!

u/Berecursive · 3 pointsr/computervision

Pretty difficult to guess, to be honest. They may just end up asking 'Google'-style questions, in which case I would recommend Elements of Programming Interviews, which I think is much better than Cracking the Coding Interview.

u/ToCommit · 2 pointsr/computervision

a quick search on Amazon:

https://www.amazon.com/dp/B07VQWXGSQ/

Whether the claims are true or not, you'll have to test

u/245_points · 1 pointr/computervision

I learned everything from this book: http://www.amazon.com/Learning-OpenCV-Computer-Vision-Library/dp/0596516134 (which might be a little outdated by now). And yes, the inter-ocular range depends on the range you want to measure. Wider separation gives you better depth accuracy, but it can also require increasing the amount of image that is searched over ("max_disparities"), which can slow things down, so it's a balance. The cameras should be roughly parallel, and the calibration process will determine their exact orientation in order to process the images properly.

u/deliverator_011 · 1 pointr/computervision

We used this one for a computer vision class I took last winter: Trucco & Verri

u/ivorjawa · 2 pointsr/computervision

http://www.amazon.com/Making-Things-See-Processing-MakerBot/dp/1449307078

It's Processing-based, not OpenCV-based, but it's entirely built around the Kinect.

u/fingerflinger · 1 pointr/computervision

Yep, you've got it! Pick up Multiple View Geometry if you really want to get your hands dirty.