scalability

Answered

36 comments

  • Wishgranter
    Hi Jonathan_Tanant

    First, take a look at the hardware and software specification here: OS and hardware requirements

    RAM is mostly limiting on the ALIGNMENT step, not so much on the other processing steps. In most cases RAM is used only as a CACHE to speed up the process, not as a hard requirement.

    Jonathan_Tanant wrote:
    What computer can I get to improve the speed of processing? An SLI of 1080 Tis? 128 GB RAM or even more (256 GB)?

    For a recommendation, please click on the MODEL in the 1Ds view, so we can see the processing times (DEPTH MAPS, MESHING, Tx, etc.) for the particular steps.
  • Götz Echtenacher
    Hehe, "a little bit low" is quite subjective! :-)
    I would also like to know what you are looking at exactly.
    Have you tried the same set with other SW?
  • Jonathan_Tanant
    Thanks for the advice and the link.

    So if I understand correctly, memory is only an issue during alignment; it shouldn't be during reconstruction and texturing?
    And the GPU does the work during reconstruction.

    What I am basically trying to do is raise the quality of the reconstruction and at the same time lower the processing time (yes, I know, raising quality should raise the processing time, and that is why I need more powerful hardware).
    And I know that the same alignment in Photoscan would have taken maybe 10x the time.
    With 50 MP or 42 MP pictures, I have about 400,000 features per image (low sensitivity, small overlap). Maybe that is a little too much and I should align with bigger overlap and/or 2x downsampling? I already tried (this is much faster) and I still have to do more tests to look at the quality impact.

    So the hard memory swap I had during reconstruction can't be explained by a lack of memory? It happened twice, at the end of the processing, when the very last parts were being reconstructed with a small number of vertices (a few hundred). This was on a 400-part model; the calculation took about 8 hours.

    What I am looking for is real configs for sale in the real world. I just can't find a config with 128 GB/256 GB RAM, multiple CPUs (up to 32 cores max, including hyper-threading?) and 3 GPUs for sale... I guess I should build it myself. Any advice on motherboards with 128 GB/256 GB RAM and 3 fast PCI Express slots welcome!

    Thanks !
  • Jonathan_Tanant
    I found this motherboard:
    https://www.asus.com/Motherboards/Z10PED8_WS/

    - 2 CPUs (with two 8-core, 16-thread i7s this would give us 32 cores?)
    - 512 GB RAM
    - 4 fast (x16) PCI Express slots; let's put in 3x 1080 Ti.
    It would be quite a monster, what do you think?
  • Götz Echtenacher
    Hey Jonathan,

    Don't have the time right now to follow up on that but if you find a nice setup with good value/cost ratio, let me know!
    BTW, what is your budget? Because feeding those 512 GB will leave a big hole in it! :lol:
  • Götz Echtenacher
    I think I'd rather dump it onto one or 2 GPUs with as many CUDA cores as possible.
    But sadly, a second GPU will only result in about 30-40% more speed...
  • Jonathan_Tanant
    Yes, you are right, 512 GB of RAM is going to be a little costly ;-)
    So you think that doubling the GPUs does not double the processing power? Only a 30-40% increase? Can someone from CR confirm?
  • Jennifer Cross
    Hi Jonathan,
    I'm new to RC, but my experience so far is that the GPU processing portions of the workload seem to get done very quickly, and then we are back to waiting on the CPU-bound tasks.
    I'm using a Dell T7910 (dual 4C Xeon @ 3 GHz, 128 GB RAM, twin GTX 1080s, NOT SLI-configured) and at the moment pushing 300-ish photos in per model.
    With this workload/configuration RC seems to use a peak of 50 GB RAM, generally more like 20 GB most of the time. My modelling runs are taking 30 minutes or so using the "workflow" create-reality defaults. Aligning images usually takes only 5 minutes or so (I'm still getting used to the types of pictures and required control points, so this may change).

    Given the GPUs get their part of the job done very quickly anyway, I'd suggest spending more on RAM and CPU than on lots of GPU for a better balance across the whole workflow.

    Just my $0.02
    Jen
  • Jennifer Cross
    Jonathan_Tanant wrote:
    I found this motherboard :
    https://www.asus.com/Motherboards/Z10PED8_WS/

    -2 CPUs (with 2 8-cores 16-threads i7 this would give us 32 cores ?)
    -512 GB RAM.
    -4 fast (x16) PCI express ports, let's put 3x 1080 Ti.
    Would be quite a monster, what do you think ?


    For the record - no i7s there - you need Xeon chips for this board. But on the upside, with this design you get more processor-attached PCIe lanes. For most CUDA processing, though, the bus bandwidth isn't so important, so take whatever you can get. High-clock-speed Xeons tend to be pricey, and server mainboards prefer ECC RAM (also pricey). Having said that, you can stick 18C processors in each socket, 4x GTX 1080 (or even 4x Titan Xp) and up to 256 GB RAM. Much go fast... almost as fast as it would depreciate :D
  • Götz Echtenacher
    Hey, somebody with serious HW knowledge!
    Watch out Jennifer, soon you'll get PMs with advice requests! :lol:

    Jonathan, these numbers don't come from me but *drumroll and fanfare* the Internet.
    Some of it in this forum.
    It's almost impossible to get a real number because it varies greatly with the setup and image set etc.
    But as a general rule you get a notable increase with the second, a tiny one with the 3rd and from then on it's homeopathic.
    So it's nowhere near double or triple.
    I have no statistics; this is from memory of my own research.

    I would be inclined to agree with Jennifer that more CPU is probably more sensible.
    But maybe you are lucky and Wishgranter will leave a post...
  • Götz Echtenacher
    But then again, I am doing sizeable stuff (up to 2500 imgs) and my setup is pitiful, as you can see.
    Yes, the calculation times can get long, but do you want to spend 10 grand only so your model will be finished in 5 instead of 12 hours? Computers don't get paid overtime or a weekend bonus... ;-)
  • Jonathan_Tanant
    Wow, this is getting serious...
    Thanks Jennifer and Götz, this information is very valuable!
    The Dell T7910 looks really good; I am trying to see what would be the best value/cost ratio.
    Plugging in my numbers I am getting a nearly 20k€ config... Maybe a little bit expensive :-)

    So I guess that, based on what you said, a good compromise (that would still be a beast) would be:
    256 GB RAM
    DUAL XEON E5 16C
    DUAL 1080 Ti (I can't see the option on the website; did you install them yourself, Jennifer?)

    My typical project in RC:
    - 1000-2000 pictures to align at once (42-50 MP DSLR/hybrid)
    - merge about 10 to 20 components and reconstruct models of around 10,000-30,000 pictures.

    Milos, any idea?
  • Götz Echtenacher
    Jeez, 20k?
    What are you still thinking about?
    For that money you can't go far wrong.
    Surely you know this page already?
    https://www.pugetsystems.com/recommende ... mendations
    It's for a competitor, but I guess the general requirements are similar...
  • Vlad Kuzmin
    256 GB RAM
    DUAL XEON E5 16C
    are enough for a 100K-image project with the components workflow.

    Check the Nvidia presentation where Capturing Reality shows such a project with 100K images.
  • Jennifer Cross
    Jonathan_Tanant wrote:
    Wow, this is getting serious...
    Thanks Jennifer and Götz, this information is very valuable!
    The Dell T7910 looks really good; I am trying to see what would be the best value/cost ratio.
    Plugging in my numbers I am getting a nearly 20k€ config... Maybe a little bit expensive :-)


    Hi Jonathan... yes, the GTX 1080s were installed after purchase.
    The base machine was about AUD $6000 and the graphics cards were about $1k each.
    Generally Dell sells these systems to companies via value-adding suppliers who get a much better price than the one listed on the website. You can also watch their factory outlet, if they have one in your country, for good deals on Dell systems.

    You can't see the option for the GTX graphics because these machines are usually supplied with Quadro series cards (professional graphics rather than gaming). If you just want GPU compute performance, the Titan X Pascal / Titan Xp give you better compute and a bit more memory on a per-slot basis. PSA: you can't close the T7910 case with a full-height double-slot card in the upper bank of PCIe slots. For what it's worth, my system is configured for deep learning, but it turns out it works pretty well for VR and this image processing stuff too.

    Jonathan_Tanant wrote:
    So I guess that, based on what you said, a good compromise (that would still be a beast) would be:
    256 GB RAM
    DUAL XEON E5 16C
    DUAL 1080 Ti (I can't see the option on the website; did you install them yourself, Jennifer?)

    My typical project in RC:
    - 1000-2000 pictures to align at once (42-50 MP DSLR/hybrid)
    - merge about 10 to 20 components and reconstruct models of around 10,000-30,000 pictures.

    Milos, any idea?


    Not much compromise in there :D The high-core-count Xeons are adding a lot to your cost. You could probably pull back to dual 6-8 cores and a slightly higher clock rate and still be happy (and save $10k). But if you don't need the extra PCIe bandwidth, maybe even wait a bit and see how well AMD does with the Threadripper/X399 systems.

    Happy shopping :)
    Jennifer
  • Wishgranter
    Hi all,

    I highly recommend waiting for the AMD offerings (Threadripper and EPYC), as they will shake up Intel's prices and we will see some interesting solutions (price/performance).

    For now the AMD solution is focused on storage ( https://www.supermicro.nl/products/nfo/ ... .cfm?pg=SS ) and they will release a workstation/HPC solution soon (1-2 months away at most).

    But it's better to have a lower core count at a higher clock than many cores at a low frequency (low single-thread performance).
    For GPUs, the sweet spot for now is 3 GPUs; adding a 4th gives just approx. 10-12% speedup. This is a well-known issue with PCI-E latency that is sometimes hard to overcome.
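    The diminishing returns described above can be put into a quick back-of-the-envelope sketch; the per-GPU gains below are the rough figures quoted in this thread, not benchmarks:

    ```python
    # Rough multi-GPU scaling model using the approximate figures from this
    # thread: ~35% extra throughput for a 2nd GPU, ~15% for a 3rd and
    # ~10-12% for a 4th. These are anecdotal, dataset-dependent numbers.
    GAINS = [1.00, 0.35, 0.15, 0.11]

    def relative_speed(n_gpus):
        """Throughput relative to a single GPU, for 1-4 GPUs."""
        return sum(GAINS[:n_gpus])

    for n in range(1, 5):
        print(f"{n} GPU(s): {relative_speed(n):.2f}x")
    ```

    So even four cards land around 1.6x a single card, nowhere near 4x.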
  • Jennifer Cross
    Though we are getting away from commonly available hardware, multi-CPU systems help with PCIe loading since you have buses distributed across the CPUs (provided your affinity code works correctly, anyway).
    And not all video cards are the same when it comes to CUDA processing: the new Pascal cards are much faster for GPU processing than previous generations. And the Titans have more memory and better bandwidth per card. P100s are better yet.

    It just depends on how much your time is worth... as Götz said, sometimes it's better to just let it run over a weekend than over-capitalise. Also, no one has mentioned using cloud resources... All the big cloud vendors have GPU-accelerated instances. You can pay for your hardware by the minute rather than spending $10K on a box you only keep busy 5% of the time.
    Cheap as chips, and you just pay for what you need that day, or even hour by hour... and you save more if you can wait and get spot-priced instances. (Also a great way to get into machine learning.)

    For me, this is a basic deep-learning system that I'm just hijacking to do some extra prototyping for some workflows around the company. If we do more/lots, I'll probably foist it off onto Azure rather than buy hardware.
    Fun discussion...
    Jennifer
  • Götz Echtenacher
    Hey Jennifer,

    yeah, some nice info here!
    With Jonathan's estimated project size, I think he needs to dip a bit into the higher end though... :-)

    How does that cloud GPU stuff work? I've been looking on and off for something like that because I also thought: wouldn't it be nice to just rent some processing capacity? No worries about copyright or anything, since only bits and bytes will be flowing through the pipes instead of whole models (good for paranoid users like me).
    But I guess the software needs to be able to support it from within, right?
  • Jennifer Cross
    Götz Echtenacher wrote:
    Hey Jennifer,
    ...
    How does that cloud GPU stuff work? I've been looking on and off for something like that because I also thought wouldn't it be nice to just rent some processing capacities? No worries about copyright or anything since only bits and bytes will be flowing through the pipes instead of whole models (good for paranoid users like me).
    But I guess the software needs to be able to support it from within, right?


    Most of the current cloud providers just rent you "systems" of various sizes and optimisations (compute, storage bandwidth, GPU, etc.). So you could grab a tiny instance (1 core, a couple of GB of RAM, but a good network allowance) to load your images and maybe do some control pointing. Then get a large instance with lots of CPU and RAM for image alignment (24 cores, 600 GB RAM?), then swap to a GPU-enhanced instance for meshing. They make it easy by having preconfigured systems, so you can grab a pre-made Windows 10 image (no worries about OS licence or configuration). We're a Microsoft house so we tend to use Azure, but a lot of the machine-learning people are using Google or AWS, and there are pre-made configs for most of the machine-learning platforms.
    At the moment the GPU-optimised instances tend to use K80 GPUs, and you pick a size (1/2 K80 - 2x K80) for the job. Google are promising P100-accelerated instances (~40% faster). For the more expensive instance types, if you aren't in a hurry, you can wait and pick up spot pricing (spare capacity on sale if someone hasn't booked it). There are apps to help you save money on your instance bookings if you do this a lot. (The machine-learning beginners are always looking to suck up all the low-cost compute they can get :) )
    Yell if you want specifics... I'm hesitant to get too far off the mainstream in Jonathan's discussion.
    Jennifer
  • Vlad Kuzmin
    Interesting, Jennifer.

    Let's set aside the fact that RC does not have a floating license.
    But can you roughly estimate how much cloud computing would cost for a project?

    For example, 1000 images at 24 MP.
    If it's a perfect dataset and doesn't require any experiments with settings, on my PC with an i7, a GTX 960 and 64 GB RAM the whole process takes about 12-24 hours of work.

    Which instance is suitable for such RC work?

    All the cloud estimates I have tried before came out costing 3-5 times more over a 1-2 year timeline compared to just a good PC.
    Maybe I'm wrong?
  • Jennifer Cross
    Vladlen wrote:
    Interesting, Jennifer.

    Let's set aside the fact that RC does not have a floating license.

    Use a Steam licence - that floats ...
    Vladlen wrote:
    But can you roughly estimate how much cloud computing would cost for a project?

    For example, 1000 images at 24 MP.
    If it's a perfect dataset and doesn't require any experiments with settings, on my PC with an i7, a GTX 960 and 64 GB RAM the whole process takes about 12-24 hours of work.

    It would depend a lot on which i7 - some here are using laptop i7s and others may be using overclocked K-series chips.
    But for comparison, let's use my dual Xeon (each CPU is ~30% slower than a fast i7 because they only clock to 3 GHz) with 8 real cores (plus 8 virtual); RC uses all 16 cores at pretty much 100% during image alignment. Doing the default "start" on a scene with ~600 images (21 MP) takes about 25 minutes (a boardroom interior, with lots of alignment fails because everything is so repetitive). Meshing goes pretty quickly with 2x GTX 1080.
    On a smaller run, the model said 12 minutes for alignment and 8 for meshing.

    Vladlen wrote:
    Which instance is suitable for such RC work?

    All the cloud estimates I have tried before came out costing 3-5 times more over a 1-2 year timeline compared to just a good PC.
    Maybe I'm wrong?

    https://aws.amazon.com/ec2/instance-types/

    If you are keeping costs down, have a look at https://aws.amazon.com/ec2/spot/bid-advisor/
    For image loading and doing control points, you might get away with a t1.micro ($0.02/hr) or use an m1.small ($0.09/hr).
    For image alignment, maybe a c4.4xlarge (16 cores, 122 GB RAM) for $1.50/hr, or you can grab a c4.8xlarge (36 cores, 60 GB) for $3/hr.
    If you are in a rush, how about an x1.32xlarge (128 cores, 2 TB RAM) for $15.60/hr?
    (These are defined-duration prices; you can save ~50% if you can make spot pricing work for you.)
    When meshing, you probably want GPU, so you need to swap to a p2 or g2 instance (p2 for compute only, g2 if you want to spin your models around and have a look). These cost anywhere from $0.70 to $18/hr depending on how much GPU you want.
    Keep in mind these are billed by the minute - if you finish in 10 minutes, the expensive instances may end up costing less overall.

    Just always, always, always remember to shut the instance down when you finish. Quite a few people have logged off, left the instance running (idle), and come back to find they had run up quite a large bill. (Cheap per hour, but leaving it running over a weekend still adds up.)
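    To see how per-minute billing works out, here is a back-of-the-envelope cost sketch using the approximate hourly rates quoted above (the GPU-stage rate is an assumed mid-range p2 figure; real prices vary by region and change over time):

    ```python
    # Back-of-the-envelope cloud cost for a hypothetical RC-style workflow,
    # using the approximate hourly rates quoted above. The "p2-gpu" rate is
    # an assumed mid-range figure from the $0.70-$18/hr range mentioned.
    HOURLY_RATE = {
        "m1.small":   0.09,  # image upload / control points
        "c4.8xlarge": 3.00,  # CPU-heavy alignment
        "p2-gpu":     0.90,  # assumed mid-range GPU instance for meshing
    }

    def stage_cost(instance, minutes):
        """Per-minute billing: cost of running `instance` for `minutes`."""
        return HOURLY_RATE[instance] * minutes / 60.0

    workflow = [("m1.small", 120), ("c4.8xlarge", 60), ("p2-gpu", 45)]
    total = sum(stage_cost(name, mins) for name, mins in workflow)
    print(f"Estimated cost: ${total:.2f}")  # a few dollars for the whole run
    ```

    The point is that the expensive alignment instance only bills for the hour it actually runs, not for the weeks the box would otherwise sit idle.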

    Hope this helps
    Jennifer
  • Götz Echtenacher
    I've also been quite aware of avoiding off-topic, until a drop-in pointed out that those are usually the most interesting threads - including the one we were straying off of... :-) And after all, most people will find threads via search engines.
    It is related insofar as the OP is looking into expanding HW, and this could be an alternative.

    So: YELL! :lol:

    I've been trying to find more specifics (on Azure) but have no time right now to get through the "this is all so great" and "we are fantastic" animated pages...
    I figure it works like this: you install your SW on some virtual PC in their cloud, right?
    Did you try it with RC yet? I would be a bit concerned about their licencing system.
  • Jennifer Cross
    Götz Echtenacher wrote:
    ...
    Did you try it with RC yet? I would be a bit concerned about their licencing system.

    Hi Götz!
    I'm using a Steam licence... We need/have Steam infrastructure anyway for the VR stuff, and having a floating (though account-tied) licence does make life easier. So as long as you have an internet-enabled instance, Steam and hence RC should be good to roll!
    And yes - there are pre-made instances for Steam. Quite a few people use the cloud servers to host games - turn them on just for your clan practice times (pick a zone handy to your players, and they usually have very good network connectivity).

    For Azure, you use the Azure "store" to get your pre-made instance (usually free). On Azure, Microsoft covers all the various licence/CAL costs (a really cheap way to host SQL database instances!). Amazon calls them Amazon Machine Images (AMI) and also has a marketplace for them (https://aws.amazon.com/marketplace/ref= ... ct?b_k=291).
    So you could grab an AMI (https://aws.amazon.com/marketplace/pp/B ... duct_title), install it to your instance (a few seconds), then load Steam and RC.
    There is also a very low storage cost ($0.10/GB/month) for non-instance storage (it belongs to the account, not the instance), but your files are available on all your active instances.
    All the cloud services have a free tier or new-user free time to get you started. Have a play, watch the videos - see which one works for you! :D
  • Götz Echtenacher
    Thanks a lot Jennifer!
    That is something for the next rainy afternoon... :D
  • Castlenock
    Hey all,
    I've been away a bit and quickly scanned this - I did some AWS GPU instancing right after the Steam release (spot instances are a good way to test and keep costs down!). Long story short: it works in a pinch, but I got better performance out of my 4 GHz i7 paired with a 980 Ti. Any service that would come close to matching your rig's CPU clock speeds and GPU resources isn't really available except on obscure cloud services, and those are prohibitively expensive. Nvidia was going to release a reserved-instance version of its cloud-gaming GeForce Now service (so a 4-core CPU above 3.5 GHz and a 1080) that I was going to play with, but that never materialised.

    My primary hope is to hear back on when RC is going to support network node clustering or give more details. I'm curious about potential pricing on it as well.

    In the interim, I'm hoping to upgrade from a 64 GB, 4-core, 4 GHz, 2x 980 Ti machine to a 128 GB, 16-core at 4 GHz, 2x 1080 Ti machine. From my napkin math I think I could get as much as a 50 percent improvement in my render times, but I don't know how the efficiency scales across multiple GPUs. My guess is that after 12 CPU cores your margin of return isn't great, so I'll use the remaining 4 cores for work in Modo or other modelling programs (probably wrapped in a VM so as to not bump into things). Does anyone's experience match up with this? Do you think the speed bump could be as high as 50 percent going from 4x 4 GHz to 12x 4 GHz plus a 35% boost to GPU?

    Thanks!
  • Jennifer Cross
    Castlenock wrote:
    Hey all,
    I've been away a bit and quickly scanned this - I did some AWS GPU instancing right after the Steam release (spot instances are a good way to test and keep costs down!). Long story short: it works in a pinch, but I got better performance out of my 4 GHz i7 paired with a 980 Ti. Any service that would come close to matching your rig's CPU clock speeds and GPU resources isn't really available except on obscure cloud services, and those are prohibitively expensive. Nvidia was going to release a reserved-instance version of its cloud-gaming GeForce Now service (so a 4-core CPU above 3.5 GHz and a 1080) that I was going to play with, but that never materialised.

    Hi Castlenock,
    You are absolutely right that the cloud GPUs are slower than the latest gamer/AI GPU boards (well, until you get into the expensive stuff anyway).
    But the advantage is you only pay for what you need - for example, RC only uses the GPU for meshing (as far as I can tell).
    So most of the time, most of my $8000 machine is sitting idle, depreciating and sucking power. If you use it a lot, local is good; if you want flexibility for stuff you do from time to time, then cloud is better.
    And the cloud providers do have very large systems for those rush jobs!
    Castlenock wrote:
    My primary hope is to hear back on when RC is going to support network node clustering or give more details. I'm curious about potential pricing on it as well.

    Cluster processing is really hard (message-passing latency on off-the-shelf hardware is a killer).
    But as Wishgranter mentioned yesterday, the AMD EPYC/Threadripper (half an EPYC) should help bring high-core-count CPU prices down. It's still going to be sitting idle most of the time though :(
    Just my $0.02
    Jennifer
  • chris
    Keep in mind EPYC isn't really suitable for RC.

    You would need 4x RC licenses to run one dual 32-core machine.

    I don't think you can even get that going on the Steam version.

    So you would probably need 4x CLI licenses.

    Threadripper is as high as you can go before you need multiple licenses, or you can disable hyperthreading.
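    These numbers imply a per-license cap on logical threads; a quick sketch of that arithmetic (the 32-thread limit per license is inferred from the examples in this post, not from official terms, so check the actual EULA):

    ```python
    import math

    # Licenses needed if one RC license covers up to 32 logical threads
    # (a limit inferred from the examples in this post, not official).
    def licenses_needed(physical_cores, smt=2, threads_per_license=32):
        return math.ceil(physical_cores * smt / threads_per_license)

    print(licenses_needed(16))         # Threadripper 16C/32T -> 1 license
    print(licenses_needed(64))         # dual 32-core EPYC    -> 4 licenses
    print(licenses_needed(32, smt=1))  # 32 cores, SMT off    -> 1 license
    ```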
  • Götz Echtenacher
    This is probably a total noob question, but what about the bottleneck of the internet connection for cloud processing?
    Don't laugh, but in my area it is still notable to get over 10 Mbit!
    Everything higher (fibre), up to 50, is not reliable; we read about streaming-TV outages all the time in the local newspapers.
    So if I imagine having to heave all those images through there, it would probably still be quicker to calculate locally.
    It may be different if you only send the raw data through, but that is still a lot.
  • Jennifer Cross
    Götz Echtenacher wrote:
    This is probably a total noob question, but what about the bottleneck of the internet connection for cloud processing?
    Don't laugh, but in my area it is still notable to get over 10 Mbit!...

    Yes, slow links will make the batch load painful... 10 Mbit is about 1 MB/sec, or ~5-10 secs per picture. Hours for a batch!
    Faster speeds, even if a bit flaky, would be better - you don't need the consistency people look for in streams, just average bandwidth per hour. Cloud certainly isn't the answer for everyone... but it can save investing in a lot of hardware if you can make use of it. We would never buy a DGX-1... but I can budget a couple of hours a month on one :)
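    That rule of thumb generalises; here is a tiny illustrative helper (the 80% link-efficiency factor and the 8 MB image size are assumptions, not figures from this thread):

    ```python
    # Estimate batch upload time: 10 Mbit/s is 1.25 MB/s raw, and with
    # real-world overhead (the assumed 80% efficiency below) it lands near
    # the ~1 MB/s rule of thumb above.
    def upload_hours(n_images, mb_per_image, mbit_per_sec, efficiency=0.8):
        """Hours to push n_images of mb_per_image MB over the given link."""
        mbyte_per_sec = mbit_per_sec / 8.0 * efficiency
        return n_images * mb_per_image / mbyte_per_sec / 3600.0

    # 1000 photos at ~8 MB each over a 10 Mbit/s link:
    print(f"{upload_hours(1000, 8, 10):.1f} hours")  # about 2.2 hours
    ```

    So a big batch is a few hours of uploading on a slow link, which quickly eats into any cloud-side time savings.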
    Have a good weekend all!
    Jen
  • Castlenock
    ...
    So most of the time, most of my $8000 machine is sitting idle, depreciating and sucking power. If you use it a lot, local is good, if you want flexibility for stuff you do time to time, then cloud is better.
    ...

    That has always been my thinking as well. I tested on AWS immediately after the Steam release and didn't do a lot of testing, so a lot of this is speculation on my behalf. I do think that the lower CPU clocks really put a damper on performance. My question for you, since you have a machine that is close to what I am gunning for in my next iteration: I wonder how the inefficiencies look after CPU core 6 or 7 for RC. I know it's a different program, but Photoscan has good articles on CPU/GPU scaling, and they show CPUs becoming rather insignificant after core 8.

    I'm hoping to VM Modo and a few other programs, giving RC about 8-10 cores at 4 GHz and up to 8 cores for Modo, Lightroom, Photoshop, etc. via VMware (as they don't really need the GPUs), to run a few workflows simultaneously.

    Again, you seem to have a machine (and are doing VR scenes) with specs that are close to mine - could I PM/e-mail you a few more questions if you're willing?

    Thanks!