Hardware Optimisation & Benchmarking Shenanigans




  •

    Hi Benjy

    I'm glad it worked this time, thank you for posting the results.

    Some applications create profiles on a per-user basis and some don't; it's fine either way. What it does help rule out is your user profile being the issue.

    Another possibility is that the uninstaller doesn't fully remove everything and leaves certain values behind. Again, this is often not a problem, but at times it can be if your software plays up.

    The catch, if you try to fix this, is that playing with the registry can cause more harm than good.

    It may be worth using CCleaner to uninstall, and running its registry cleaner tool (it may need to be run a few times in succession). It is a well-respected tool; be sure to get it from the official Piriform site. If in doubt, don't: sometimes it can cause more harm than good.

  •

    Thanks Ivan, great advice. Even without a second PC, the benchmark proved it's not the photographs and, generally speaking, it's not my PC. I'll give CCleaner a whirl.


  • Tom Foster

    I'd like to add my appreciation for this project. If I understood more of the detail of the expert discussion, I would be right in there, keen to participate and learn. As it is, I can only watch and be amazed at the effort needed to prepare for and collect the resulting data, let alone to analyze and make sense of it.

    I also hope and pray that the eventual distillation of guidance will be made public, because, fast as RC is, speed is going to be make-or-break for an independent operator trying to get established, initially offering a modest local service via an efficient, reliable workflow, without access to any high-finance render farm. Having the optimum standalone machine will be one of the keys.

  •

    You are welcome.  
    Don't be fooled into thinking we are experts or have the remotest clue what we are talking about or doing.
    You are more than welcome to add your 2 cents.

    Fear not, the intention of the benchmark is as you hope for.

    There has been a lot of talk from me and not much evidence of my web-based results page. I have struggled immensely with that part: so many solutions that claimed to let you upload and then display data failed to deliver. I think I have it cracked... mostly.

    This is as good a time as any to share where I am. There are currently two parts:
     1) the upload, and 2) the public results.

    Getting the publicly viewable results shown in a clear and presentable manner that can be analysed and interrogated was an important part.
    Here is where I am with that. The data is drawn live from the Google spreadsheet that results are uploaded to, and updates accordingly.
    The pie charts etc. are not final, and I will change the metrics displayed/used; it's just a test to get it working. There will be more useful data shown for your viewing pleasure.

    Note: the contents were fabricated by me changing the rawresults.txt files that have been uploaded each time, and don't represent real results yet.


    The uploading part, which is currently not as pretty (that will change), is here.

    I'd very much appreciate it if people could try uploading some data, using the rawresults.txt file generated by the benchmark. Please use that file rather than results.txt, which will add garbage to the spreadsheet; I have not yet added code to reject the incorrect file. Yes, the results will be fairly useless for now, as we are all using different datasets, but at the moment I need help checking that the upload process is working correctly and that the results are displayed properly.
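    Until the uploader rejects the wrong file, a check along these lines could catch it. This is a minimal sketch, not the uploader's actual code: it assumes rawresults.txt is a plain `Key=Value` file like the block George posts further down this thread, and `REQUIRED_KEYS` and `parse_raw_results` are hypothetical names chosen for illustration.

```python
# Hypothetical validator: the real uploader's code is not shown in this thread.
# Assumes rawresults.txt is a plain "Key=Value" file like the block George posted.
REQUIRED_KEYS = {"Version", "Total Time", "CPU", "GPU", "RAM (Bits)"}


def parse_raw_results(text: str) -> dict[str, str]:
    """Parse Key=Value lines; raise ValueError if the text doesn't look like rawresults.txt."""
    fields: dict[str, str] = {}
    for line_no, raw_line in enumerate(text.splitlines(), start=1):
        line = raw_line.strip()
        if not line:
            continue  # blank lines are harmless
        if "=" not in line:
            raise ValueError(f"line {line_no} is not Key=Value: {line!r}")
        key, value = line.split("=", 1)  # split on the first "=" only
        fields[key.strip()] = value.strip()
    missing = REQUIRED_KEYS - set(fields)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return fields


sample = """Comment=1159 images
Version= Demo RC
Total Time=266
CPU=Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz * 2
GPU=GeForce GTX TITAN X * 2
RAM (Bits)=206158430208"""
print(parse_raw_results(sample)["Total Time"])  # 266
```

    A file with free-text lines (as results.txt presumably has) fails the `Key=Value` check, and a file missing the expected keys fails the second check, so garbage never reaches the spreadsheet.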

    Known issues:

    1) Results are shown instantly on the upload page, but can take a minute or more to appear on the pretty public results, and you will need to manually refresh the page for your uploaded data to appear. This is a limitation of the platform: it caches data on the server to save resources. Poor Google and their lack of resources...

    2) Works on Chrome; I do not know about other browsers.

    3) rawresults.txt must be selected for the upload or terrible things may happen (not that terrible, but it will make a mess of the spreadsheet with garbage data).

    4) The chart is full of made-up data. For now, the fact that results can be generated, uploaded, displayed and analysed is the important part.

    5) You can download the results for your own analytical pleasure; there is a hidden button next to the word "Total".


    As always, feedback is really appreciated.



  • Tom Foster

    For me, Christmas followed by year-end accounts always wipes a month out of my 'working' life, so I can understand why this vital and timely topic has gone quiet.

    Any update? Shopping for some 'ideal' hardware is getting closer, so I'm looking forward to 'the answer'!

  • George Jarvis

    Hi Folks,

    Here are the results from the benchmark I just ran with 1159 images at 17 MP. I have uploaded the rawresults.txt to the spreadsheet.

    Comment=1159 images
    Version= Demo RC
    Depth (GPU)=15.272
    Total Time=266
    CPU=Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz * 2
    GPU=GeForce GTX TITAN X * 2
    Cache Drive=INTEL SSDSC2BA400G3 ATA Device
    RAM (Bits)=206158430208
    Ram Speed=2133


    CPU1: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
    CPU2: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
    Number Of Processor Packages (Physical): 2
    Number Of Processor Cores: 36
    Number Of Logical Processors: 72
    Motherboard: Supermicro X10DRG-Q
    Chipset: Intel C612 (Wellsburg-G)
    Memory: 256 GBytes @ 933 MHz, 13-13-13-31, DDR4-2132 / PC4-17000
    GPU1: NVIDIA GeForce GTX TITAN X 12288 MBytes of GDDR5 SDRAM [Hynix]
    GPU2: NVIDIA GeForce GTX TITAN X 12288 MBytes of GDDR5 SDRAM [Hynix]
    Drive: INTEL SSDSC2BA400G3 400GB, 381,554 MBytes, Serial ATA 6Gb/s @ 6Gb/s
    Network: Intel Ethernet Server Adapter I340-T4
    OS: Microsoft Windows 7 Professional (x64) Build 7601

    There is some confusion with the RAM specs here, as the hardware info shows what is physically installed (256 GB), although the OS can only use 192 GB. It's currently clocked at 933 MHz, not the 1066 MHz suggested by the benchmark test.
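    The two RAM figures can be reconciled with a little arithmetic. A minimal sketch: the `RAM (Bits)` value divides exactly into 192 GiB, so despite its label it looks like a byte count, which matches the 192 GB memory ceiling of Windows 7 Professional x64. DDR memory transfers data twice per clock, so a reported 933 MHz clock means the modules are running at about DDR4-1866, while 1066 MHz corresponds to the DDR4-2133 rating. The helper names are hypothetical.

```python
# Reconcile the benchmark's raw fields with the hardware report.
GIB = 1024 ** 3  # bytes per GiB


def bytes_to_gib(n_bytes: int) -> float:
    """The "RAM (Bits)" field divides exactly into GiB, so treat it as a byte count."""
    return n_bytes / GIB


def ddr_transfer_rate(clock_mhz: float) -> float:
    """DDR memory moves data on both clock edges, so MT/s is twice the reported clock."""
    return 2 * clock_mhz


print(bytes_to_gib(206158430208))  # 192.0
print(ddr_transfer_rate(933))      # 1866: the modules are running at DDR4-1866
print(ddr_transfer_rate(1066))     # 2132: roughly the DDR4-2133 rating
```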

  •

    Hello George,

    That's one mighty PC you've built there. It's beyond me to extrapolate from your benchmark results to my world, i.e. working with a different image count at 42 MP and very different system resources, so I'm more interested in your general response to some questions aimed at evaluating things like dual-socket CPUs and the performance of the Titans.

    Ivan (thanks again, Ivan, for getting this conversation and utility on its feet; see how we benefit?) suggested early on that he'd take fewer, faster cores over more, slower ones. I see you went with dual 2.3 GHz CPUs, 18 cores each. Maybe you had other considerations in your choice, e.g. running two CPUs cooler or the needs of other apps, but did you run any comparisons with a single CPU and/or a faster CPU to inform that perspective?

    It's been my impression that a Titan-class card verges into diminishing returns relative to a 1080 Ti, the cost/value being hard to justify. Time is money, so I assume your projects justify your pulling out a sledgehammer, correction, two sledgehammers. What kind of memory pressure are you seeing in Resource Monitor on the first GPU, and on the second?

    How about RAM? At a fraction of your whopping 256 GB, I rarely see memory pressure in RC, really only after reconstruction and texturing, where GPU memory doesn't support Sweet display quality. I suppose that's at least one area where your Titans rock.

    I'm still relatively new to Windows PCs. I came up with Macs in the mid-80s and do miss the construction quality, and in the case of the Mac Pro "trashcan" I've yet to see a slicker approach to thermal regulation. I'm currently running an i7-7820X in an ITX case, the smallest form factor supporting the 980 Ti short of a laptop (I didn't want the heat).

    This ITX is now gasping its last breath, thanks to a Delta agent who performed an atomic body slam when dropping my well-padded storm case onto the belt from chest height. (I later found the GPU loose inside the case, its rear metal mounting plate peeled back and the mounting screws ripped clean out! There were so many loose connections that I had to rebuild from scratch three times to finally get video up and the operating system to load.) I'm now getting BSODs and believe a weakness on the motherboard has broadened into a tiny bridging gap hiding somewhere. So I'm now going with a Boxx laptop I can take as carry-on and use in the field to validate a day's capture and for demos, but for the desktop I'm pulling out my tower, ready for parts. I'd like to repurpose my i7-7820X and possibly bring in a second one. Thanks for your thoughts.


  • Tom Foster

    It would be wonderful if this important thread came back to life.

  •

    Hello Ivan,

    Anybody home? I'd like to revisit this topic. I'm looking at spec'ing a new PC and am curious what conclusions you think one could draw from the admittedly limited participation you worked so hard to make possible. What you did with this benchmark utility is really important work, now gathering dust. Ugh.

    It was my impression that RC values clock speed more than number of cores, right? What was the actual evidence for that? It appears that the i7 with the highest base frequency, 4.0 GHz (i7-8086K), would then win over the fastest of the i9 class, topping out at 3.6 GHz (i9-9900K), though the maximum turbo speed is the same at 5.0 GHz. Then there's the difference in their respective core/thread counts to account for: 6 cores/12 threads for the i7-8086K v. 8 cores/16 threads for the i9-9900K.

    I then wonder about a CPU with a slower base frequency than either of those, the i9-9960X at 3.1 GHz, which sports 16 cores/32 threads. Multiplying the base frequency by the thread count for these three processors gives:

    • i7-8086K: 4.0 x 12 = 48
    • i9-9900K: 3.6 x 16 = 57.6
    • i9-9960X: 3.1 x 32 = 99.2

    Those totals roughly track the proportions between the Passmark scores:

    • i7-8086K: 16,684
    • i9-9900K: 20,168
    • i9-9960X: 30,641
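    A quick way to test the "base clock x threads" idea is to put both sets of numbers on the same footing, relative to the i7-8086K. This sketch uses Intel's published specs (the i9-9900K has 8 cores/16 threads, and the 16-core part is sold as the i9-9960X) together with the Passmark scores quoted above; `relative_scores` is a hypothetical helper.

```python
# Compare a naive "base GHz x threads" score with Passmark, both relative to a baseline CPU.
# Specs are Intel's published figures; Passmark scores are the ones quoted in the post.
CPUS = {
    # name: (base clock in GHz, hardware threads, Passmark CPU Mark)
    "i7-8086K": (4.0, 12, 16684),
    "i9-9900K": (3.6, 16, 20168),
    "i9-9960X": (3.1, 32, 30641),
}


def relative_scores(cpus: dict, baseline: str) -> dict:
    """Return {name: (naive_ratio, passmark_ratio)} relative to the baseline CPU."""
    base_ghz, base_threads, base_passmark = cpus[baseline]
    base_naive = base_ghz * base_threads
    return {
        name: (ghz * threads / base_naive, passmark / base_passmark)
        for name, (ghz, threads, passmark) in cpus.items()
    }


for name, (naive, passmark) in relative_scores(CPUS, "i7-8086K").items():
    print(f"{name}: GHz x threads {naive:.2f}x, Passmark {passmark:.2f}x")
```

    With 16 threads for the i9-9900K the naive product (about 1.20x) lines up closely with Passmark (about 1.21x), while for the i9-9960X it predicts about 2.07x against Passmark's 1.84x, which is consistent with diminishing returns at high core counts. Passmark is a synthetic benchmark, so none of this says how RC itself scales.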

    So, I'm curious about how these three CPUs really stack up in RC. Is there a sweet spot, as suggested before, and would this be the middle one, the i9-9900K? Do we see diminishing returns with the i9-9960X? Note that there's one model with even more cores/threads, the i9-9980XE, with a slightly lower base frequency of 3.0 GHz, and even with its 18 cores/36 threads Passmark shows it coming in beneath the i9-9960X, which seems to underscore the diminishing-returns principle. Even so, the roughly 50% jump in Passmark score between the i9-9900K and the i9-9960X would seem to outweigh whatever is lost, apples for apples, to diminishing returns. No?

    Many thanks for your noggin.


  •

    Hi All & Benjy

    I have been hiding. I am working on some new fancy tools to help. I have not forgotten you all. My license expired too, which didn't help :D

    Your question is tough to answer.

    From my testing (which isn't finished), there is no easy answer.

    The issue is as follows.
    The initial benchmark gives a rough idea of the performance to be had. Of course, the more cores and the more MHz, the faster things go.

    However, it isn't so simple.
    The software behaves differently depending on the dataset given.
    As you know, the computational stages are split into:

    a) alignment - a predominantly fast stage
    b) point cloud creation - this is where most of the calculations are done
    c) texturing
    d) mesh export

    However, there are multiple sub-stages within point cloud creation, such as depth calculation (GPU-accelerated). Some of these interim stages are single-threaded and some are multi-threaded, so some stages benefit from core count and others from pure GHz.
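    To see why the stage mix decides between clock speed and core count, here is a toy model, not RC's actual scheduler: each stage is either single-threaded (its time scales only with clock speed) or multi-threaded (idealised as scaling with clock x cores, ignoring hyper-threading and diminishing returns). The stage names, reference times and `estimate_seconds` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    ref_seconds: float    # hypothetical time on a 1-core, 1 GHz reference machine
    multi_threaded: bool


def estimate_seconds(stages: list[Stage], ghz: float, cores: int) -> float:
    """Idealised runtime: single-threaded stages use one core, multi-threaded stages use all of them."""
    total = 0.0
    for stage in stages:
        throughput = ghz * cores if stage.multi_threaded else ghz
        total += stage.ref_seconds / throughput
    return total


# Hypothetical workloads: one dominated by serial work, one by parallel work.
serial_heavy = [Stage("serial step", 400, False), Stage("parallel step", 600, True)]
parallel_heavy = [Stage("serial step", 50, False), Stage("parallel step", 5000, True)]

for label, stages in (("serial-heavy", serial_heavy), ("parallel-heavy", parallel_heavy)):
    six_fast = estimate_seconds(stages, ghz=4.0, cores=6)       # e.g. i7-8086K base clock
    sixteen_slow = estimate_seconds(stages, ghz=3.1, cores=16)  # e.g. i9-9960X base clock
    print(f"{label}: 6 x 4.0 GHz -> {six_fast:.0f} s, 16 x 3.1 GHz -> {sixteen_slow:.0f} s")
```

    In the serial-heavy mix the six faster cores win; in the parallel-heavy mix the sixteen slower cores win by a wide margin, which is the same trade-off described for small versus large datasets.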

    Application settings, image resolution, number of images and image quality (not just sharpness, but quality good enough that the software has an easy time) can all throw the weighting one way or the other.

    Roughly speaking, when dealing with small, low-resolution datasets, MHz is king.

    When dealing with many images (200+ images at 40 MP, for example), core count wins.

    There are diminishing returns with multi-core systems, and even more so with dual-CPU systems. My old world-record-breaking dual 14-core Xeon system is slower than my new one, which has almost half the cores. There are so many variables in system architecture that make a difference.

    I have the i9 980XE with all cores @ 4.5 GHz (some motherboards allow all-core turbo by default). However, it was silly expensive, is hot and power-hungry, and is wasted most of the time, but I love it. Enthusiast things have drawbacks.

    AMD may be worth considering too. They will be releasing their 7 nm parts this summer and are taking the lead over Intel on price/performance when high core count is key. I'd be cautious of the current generation, but the new ones on the horizon look very interesting.

    The GPU processing part does not tax the GPU to the fullest, so you'd be much better off getting a regular gaming card, perhaps a 1080/1080 Ti or better. There may be circumstances where multiple GPUs can help; however, GPU work is such a small part of the overall calculation time that it would depend on the project you're working on. I don't believe RC takes your video RAM into consideration, and since the Sweet setting is hard-coded, extra memory is wasted (I could be wrong here).
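    Amdahl's law shows why a faster GPU buys little when GPU work is a small share of the total run. A minimal sketch; the GPU shares below are hypothetical, since the thread doesn't measure what fraction of an RC run is actually GPU-bound.

```python
def overall_speedup(gpu_share: float, gpu_speedup: float) -> float:
    """Amdahl's law: only the GPU-bound fraction of the runtime gets faster."""
    return 1.0 / ((1.0 - gpu_share) + gpu_share / gpu_speedup)


# Hypothetical GPU shares of total runtime, with a GPU that is 20% faster (1.2x).
for share in (0.05, 0.10, 0.25):
    print(f"GPU share {share:.0%}: overall speedup {overall_speedup(share, 1.2):.3f}x")
```

    Even if a quarter of the run were GPU-bound, a 1.2x faster card would shorten the whole job by only about 4%.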

    When you're waiting a day or more for a test to finish, 5% here and 10% there start to make a significant difference.
    Tweaking BIOS settings, RAM timings, motherboard choice and Windows setup all make a difference too.

    SSDs are essential these days, but past a certain point they make little difference to performance. There's no need to get some fancy, expensive NVMe drive. However, be sure it can sustain long-term writes: many drives fall to about 30 MB/s after their cache is full.
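    The sustained-write point is easy to underestimate. A minimal sketch of how a write-cache cliff plays out: only the 30 MB/s post-cache figure comes from the post; the 20 GB cache, 500 MB/s burst speed, 100 GB write and `write_time_seconds` helper are hypothetical.

```python
def write_time_seconds(total_gb: float, cache_gb: float,
                       burst_mb_s: float, sustained_mb_s: float) -> float:
    """Seconds to write total_gb: the first cache_gb at burst speed, the rest at the post-cache speed."""
    fast_gb = min(total_gb, cache_gb)
    slow_gb = total_gb - fast_gb
    return fast_gb * 1000 / burst_mb_s + slow_gb * 1000 / sustained_mb_s  # 1 GB = 1000 MB


cliff = write_time_seconds(100, cache_gb=20, burst_mb_s=500, sustained_mb_s=30)
steady = write_time_seconds(100, cache_gb=20, burst_mb_s=500, sustained_mb_s=500)
print(f"drive with cache cliff: {cliff / 60:.0f} min, drive that sustains 500 MB/s: {steady / 60:.1f} min")
```

    In this example the cache cliff turns a write of a few minutes into one of about 45 minutes, which is why sustained speed matters more than the headline figure.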

    So, to conclude: the best system will be different for different people, depending on the datasets they commonly use.

    What type of datasets are you throwing, or planning to throw, at RC?





  •

    Hello Ivan,

    All very interesting. You deserve a medal for all this work and sharing it. 

    I'd like to respond with more questions, but let's begin at the end of your post by answering your question: what am I throwing at RC? Most of my work begins with at least 400 images, so maybe right there you'll say my case benefits from many cores. My projects easily run past 2500 images, and I've had one up at 10,000. These images are 42 MP, and I'm increasingly interested in running them at High. For the same amount of manual input I get four times the data I get at Normal (the downscale is 2:1, i.e. by a factor of two both vertically and horizontally), which lets me move a virtual camera as close to surfaces as I want, in the case of virtual cinematography, or as close as the user wants, in interactive applications. We enter a virtual environment, look around, and are swept up in the believability. Curiosity urges us to explore; for humans that means we want to approach things and get within arm's reach, proprioception urging us to touch surfaces. Even if we can't, the mirror neurons in our brain, taking in stereoscopic, finely detailed texture information, let us feel those textures without actually touching them, simply from memory of the many times we've touched similar surfaces. Okay, that's a long way of saying I want all the data my system can handle, especially if it's doing the heavy lifting with nothing more required from me.
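    The four-fold figure follows from the downscale factor applying to both image dimensions. A minimal sketch, assuming High processes images at full resolution (downscale 1) and Normal at downscale 2 as described above; the 42 MP and 2,500-image figures come from this post, and `processed_megapixels` is a hypothetical helper.

```python
def processed_megapixels(image_mp: float, downscale: int) -> float:
    """A downscale factor d divides width and height by d, so the pixel count drops by d**2."""
    return image_mp / downscale ** 2


IMAGE_MP = 42   # camera resolution from the post
IMAGES = 2500   # a typical project size from the post

for label, factor in (("High", 1), ("Normal", 2)):
    per_image = processed_megapixels(IMAGE_MP, factor)
    total_gigapixels = IMAGES * per_image / 1000
    print(f"{label}: {per_image:.1f} MP per image, {total_gigapixels:.1f} gigapixels in total")
```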

    It's hard to imagine my case wouldn't benefit from as much of everything as any component can deliver, but that's not to say I don't hear you on the "no easy answers". There is a point, or rather points, where values come into contention. For instance, offline rendering for film-grade animation favors a very different GPU than online (real-time) rendering, such as game engines require, and that's where big things are happening and where I'm more focused. To your statement:

    "The GPU processing part does not tax the GPU to the fullest, and you'd be much better getting a regular gaming card. Perhaps a 1080/ti or better. There may be circumstances where multi GPU can help, however it's such a small amount of the overall calculation time, it would depend on the project you're working on. I don't believe RC takes into consideration your video ram..."

    I had suggested wanting the RTX 2080 Ti, so I'm not sure what you mean by "regular gaming card". I know the 2080 Ti isn't regular compared to everything below it, but before its recent release you could have said the same about the GTX 1080 Ti. Both have the same amount of video RAM, 11 GB, but the memory speed differs: GDDR5X at 11 Gbps for the 1080 Ti, GDDR6 at 14 Gbps for the 2080 Ti. My only thought about the value of video RAM capacity would be how large a clipping box your system can handle after texturing a model. Add to that, Passmark shows 17,100 for the 2080 Ti versus 14,241 for the 1080 Ti, about a 20% boost. I'd think the GPU-intensive depth map calculations would clearly warrant the RTX 2080 Ti, no matter what you're working on in RC. No? Whether that is cost-justified is another matter; I'm thinking simply in terms of performance. I hear what you're saying about even small performance improvements acting as a multiplier that favors larger projects spanning days or even weeks.
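    The memory-speed difference can also be put in bandwidth terms. A minimal sketch: peak bandwidth is the per-pin data rate times the bus width divided by 8. The 352-bit bus width (shared by both cards) is Nvidia's published spec rather than something stated in this thread, and whether RC's depth-map stage is bandwidth-bound is not established here; `memory_bandwidth_gbs` is a hypothetical helper.

```python
BUS_WIDTH_BITS = 352  # GTX 1080 Ti and RTX 2080 Ti both use a 352-bit memory bus


def memory_bandwidth_gbs(data_rate_gbps: float, bus_width_bits: int = BUS_WIDTH_BITS) -> float:
    """Peak bandwidth in GB/s = per-pin data rate (Gbps) x bus width (bits) / 8 bits per byte."""
    return data_rate_gbps * bus_width_bits / 8


gtx_1080_ti = memory_bandwidth_gbs(11)  # GDDR5X at 11 Gbps
rtx_2080_ti = memory_bandwidth_gbs(14)  # GDDR6 at 14 Gbps
print(f"1080 Ti: {gtx_1080_ti:.0f} GB/s, 2080 Ti: {rtx_2080_ti:.0f} GB/s, "
      f"ratio {rtx_2080_ti / gtx_1080_ti:.2f}")
print(f"Passmark ratio: {17100 / 14241:.2f}")
```

    That works out to about 484 GB/s versus 616 GB/s, roughly 27% more bandwidth, compared with the roughly 20% Passmark gap.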

    I'm still interested to learn what you mean by "no easy answers", never mind my specific case, just generally. We know that during any particular stage there are sub-stages that swing between CPU and GPU; we keep Resource Monitor and GPU-Z open to see all that. To the extent that camera orientation seems purely CPU-intensive, am I missing something in thinking that this could be an easy one? If you have many images with lots of pixels, you'll benefit from fast read/write SSDs, lots of system RAM, and many fast cores, possibly with a sweet spot below which core speed shouldn't be allowed to drop relative to core count, as would appear to be the case with Xeon processors. Might this also apply to the case I pointed to, with the i9-9960X being a bit cheaper but outperforming the i9-9980XE, where the former has slightly faster, albeit fewer, cores?

    You talk about quality of images, e.g. sharpness, and I would add to that the roughness of the subject matter, the two combining to pose different problems in finding a solution that keeps projection errors in check when returning a component. I've observed this while feeding my system manmade subject matter, simple planar shapes, versus extremely folded subjects, like a plant. I've observed that the sharp/occluded subject-matter factor takes longer processing times on both CPU and GPU, which stands to reason: my brain has an easier time forming a map of a box on a floor than of a pencil jade plant. If that's true, why wouldn't this be simple to keep separate when thinking about optimizing a system for RC?

    I'm thinking of it this way. If, say, every RC user ran the same two datasets with your benchmark utility, one featuring a low number of images containing not-optimally-sharp pixels of simple subject matter, the other the converse, and assuming we all had a nice spread of system specs ranging from low end to high along all the lines you describe, I'd expect the folks running the low-end dataset on low-end systems to score lower than those running the same dataset on high-end systems, with that spread being X. I'd then expect the spread to be far greater for folks running the high-end dataset on low-end systems compared to high-end systems. Yes?
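    Whether the "spread" grows depends on whether it is measured in hours or as a ratio. A minimal sketch, assuming runtime is simply work divided by machine throughput; the 3x speed difference, the 20x heavier dataset and the `spread` helper are hypothetical. In this model the absolute gap grows with dataset size while the ratio stays fixed; the ratio would only widen if the low-end machine hit some additional bottleneck on the heavy dataset, which the model leaves out.

```python
def spread(work_units: float, low_throughput: float, high_throughput: float) -> tuple[float, float]:
    """Return (absolute gap, ratio) between runtimes on a low-end and a high-end machine."""
    slow = work_units / low_throughput
    fast = work_units / high_throughput
    return slow - fast, slow / fast


# Hypothetical: the high-end machine is 3x faster; the heavy dataset is 20x the work.
for label, work in (("light dataset", 10.0), ("heavy dataset", 200.0)):
    gap, ratio = spread(work, low_throughput=1.0, high_throughput=3.0)
    print(f"{label}: gap {gap:.1f} hours, ratio {ratio:.1f}x")
```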

    I realize that with so many variables in the imagery and the system specs, the final numbers in RC's reporting will be all over the map, but I'm eager to avoid a can't-see-the-forest-for-the-trees situation. If, as a starting point, we can know that the above statement holds true (up to that finer point about diminishing returns for core count), where would you throw the monkey wrench into that understanding? So far I'm hearing: high-end datasets benefit from many cores, but not so many that you lose speed. High-end datasets benefit from the strongest CUDA cores in the GPU; video RAM is nice but not essential, and more than one GPU quickly hits diminishing returns. High-end datasets love fast SSDs and lots of system RAM; RAM speed doesn't make much of a difference.

    I so value whatever monkey wrenches you bring, but I also lack understanding about what to look for in a motherboard, power supply, thermal regulation and the rest of the system. At this stage I'm only able to make sure it's the right socket type for the CPU and am oblivious to much of the rest. Big thanks for all you do!



