Hardware Optimisation & Benchmarking Shenanigans
Hi
As we all know, taking as many good pictures as possible is the best way to ensure things go well, and it ultimately saves a lot of time back in the office.
However, time is always against us. As fast as Capturing Reality is, waiting is inevitably part of the game, and it can be a long time, sometimes days, only to find out your photo shoot was not good enough.
The hardware requirements are listed here: https://support.capturingreality.com/hc/en-us/articles/115001524071-OS-and-hardware-requirements. However, they are very vague. From experience with rendering, video encoding, etc., a badly configured system that looks great on paper can perform half as fast as a similarly priced system with carefully selected components, optimised appropriately. Throwing more money at the problem is not always the answer, and at times it can slow things down.
Various stages of the calculations stress different parts of the system, but to what extent, I am struggling to figure out. How can I/we optimise a system to perform best with the software?
I recently got rid of my dual-Xeon v3 28-core workstation, which was awesome for rendering but painfully slow in RealityCapture. A much higher-clocked, newer-architecture consumer Skylake system is not hugely different in RealityCapture (yes, a little slower), yet it is 4x+ slower for rendering (Cinebench), cost 5x+ less, and has 4 cores versus 28.
Below are the areas which I know can make a difference. Unfortunately, as with many things, we can't have our cake and eat it: cost has a big influence, and technological restrictions mean you can have 32 GB of very fast RAM or 128 GB+ of slow RAM; a 5.2 GHz 6-core CPU or a 3.6 GHz 16-core CPU.
- CPU speed, MHz (IPC): more is always better.
- Core count (threads): more is better, to an extent, and not at the cost of IPC. From my experience a dual-CPU system worked awesomely with some applications, but the system architecture did not agree with others, such as RealityCapture, and underperformed. I have a feeling that, much as with GPUs, you get increasingly diminishing returns in RealityCapture as core count rises, even when the CPUs are maxed at 100%.
- CPU instruction support (AVX etc.): does RealityCapture take advantage of it? Or will it soon, and to what extent? I see you are looking for an AVX coder. Has AVX been enabled when the software is compiled or tested?
I personally wish to build a new system. AMD offer good-value CPUs at last with Threadripper and EPYC, but do not support AVX well at all. It would be a disaster to invest in the wrong architecture. I am aware that AMD hardware does not perform ideally with some other software due to the lack of AVX. Is this, or will this be, true with RealityCapture?
- GPU count: 3 is the max and, as with most things, you get diminishing returns.
- GPU speed/CUDA performance: 1080 Ti/Titan/Quadro etc. are the go-to cards, with the Ti being the best bang for the buck. The new Tesla V100s are compute monsters with a cost to match. Soon* we should have the consumer Volta Titans and gaming cards available.
- GPU memory: is 12 GB enough? RealityCapture frequently complains that I do not have enough video memory; maybe this is a bug, as my monitoring software says only around 1 GB is in use.
- RAM amount: RealityCapture is fantastic in that, in theory, it doesn't require massive amounts like its competitors; however, it does have its limits. What impact does maxing out the RAM and forcing swap-file usage have on performance?
I have hit out-of-memory errors in RealityCapture many times; is throwing more RAM at the system the best solution?
- RAM speed: 2666 MHz or 4400 MHz?
- RAM latency: ties into the above; some apps love raw speed, others tighter timings. From my experience, optimising cache and memory performance for the CPU/RAM can double the speed of certain applications. Has this been tested? There sure is a lot of data being passed around.
- HDD for cache/virtual memory: latency vs throughput. I expect this is less important, but every bit counts to an extent. I assume this becomes more valuable once RAM limits are hit.
From all the above it's easy to pick the best of everything, but you can't: you'll have to sacrifice one area to get maximum performance in another.
So, the solution:
Benchmark datasets. I searched the forum and found others have mentioned the availability of a benchmark, and some even stated they would create one; however, that was a year or more ago and nothing came of it.
Unless an integrated benchmarking tool is to appear in the software very soon (which would be best), I propose the following.
Have two different datasets available to run, reflecting varying workloads. (I can make some, we could use data provided by Capturing Reality, or maybe someone can suggest something suitable.)
a) light dataset - will be fast
b) Heavy dataset - will take longer, however may give more accurate results.
Users will then start the application and hit Start. Theoretically, everyone should then be on a level playing field.
Users will be required to upload the contents of the generated logs either to the forum thread or, ideally, to a Google Form I create.
The easy part: RealityCapture.log. This is basically a duplicate of the console window and logs timestamps for the various stages as they complete. It should be located here: c:\Users\USER\AppData\Local\Temp\
It outputs the following as an example:
RealityCapture 1.0.2.3008 Demo RC (c) Capturing Reality s.r.o.
Using 8 CPU cores
Added 83 images
Feature detection completed in 11 seconds
Finalizing 1 component
Reconstruction completed in 31.237 seconds
Processing part 1 / 5. Estimated 1225441 vertices
Processing part 3 / 5. Estimated 38117 vertices
Processing part 4 / 5. Estimated 926526 vertices
Processing part 5 / 5. Estimated 538277 vertices
Reconstruction in Normal Detail completed in 232.061 seconds
Coloring completed in 30.105 seconds
Coloring completed in 0.116 seconds
Coloring completed in 30.363 seconds
Creating Virtual Reality completed in 294.092 seconds
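As a side note, those "completed in ... seconds" lines are regular enough to scrape automatically. Below is a minimal Python sketch of that idea; the function name and regex are mine, not part of RC, and the real log format may differ between versions:

```python
import re

# Hypothetical helper: pull "<stage> completed in <seconds>" pairs out of
# text copied from RealityCapture.log. Only lines matching the pattern
# seen in the sample log above are captured.
def parse_stage_times(log_text):
    times = []
    for line in log_text.splitlines():
        m = re.match(r"(.+?) completed in ([\d.]+) seconds?", line.strip())
        if m:
            times.append((m.group(1), float(m.group(2))))
    return times

sample = (
    "Feature detection completed in 11 seconds\n"
    "Reconstruction completed in 31.237 seconds\n"
    "Coloring completed in 30.105 seconds\n"
)
print(parse_stage_times(sample))
```

Lines that don't match the pattern (image counts, "Processing part" progress, etc.) are simply skipped, which is what we want for timing comparisons.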
The trickier part: system analysis. There is a nice little freeware tool called hardwareinfo that does not require installation and can spit out a nice little text report, as below. It contains no sensitive info. These two logs combined should, I believe, contain all the information needed to compile a nice comparative dataset. And when I say we, I mean me: I'll have to parse the data into a Google spreadsheet, which will do the calculations so we can all see the results.
CPU: Intel Core i7-6700K (Skylake-S, R0)
4000 MHz (40.00x100.0) @ 4498 MHz (45.00x100.0)
Motherboard: ASUS MAXIMUS VIII HERO
Chipset: Intel Z170 (Skylake PCH-H)
Memory: 32768 MBytes @ 1599 MHz, 16-18-18-36
Graphics: NVIDIA GeForce GTX 1080 Ti, 11264 MB GDDR5X SDRAM
Drive: Samsung SSD 850 EVO 500GB, 488.4 GB, Serial ATA 6Gb/s @ 6Gb/s
Sound: Intel Skylake PCH-H - High Definition Audio Controller
Sound: NVIDIA GP102 - High Definition Audio Controller
Network: Intel Ethernet Connection I219-V
OS: Microsoft Windows 10 Professional (x64) Build 15063.674 (RS2)
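For what it's worth, that report is plain "Key: Value" lines, so turning it into something comparable across systems is only a few lines of scripting. A rough Python sketch under that assumption (the function is hypothetical; repeated keys such as the two Sound entries are kept as lists):

```python
# Hypothetical helper: split a "Key: Value" hardware summary (like the
# report above) into a dict of lists, keeping duplicate keys such as the
# two Sound entries. Lines without a colon are skipped.
def parse_hw_report(report_text):
    specs = {}
    for line in report_text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        specs.setdefault(key.strip(), []).append(value.strip())
    return specs

report = (
    "CPU: Intel Core i7-6700K (Skylake-S, R0)\n"
    "Graphics: NVIDIA GeForce GTX 1080 Ti, 11264 MB GDDR5X SDRAM\n"
    "Sound: Intel Skylake PCH-H - High Definition Audio Controller\n"
    "Sound: NVIDIA GP102 - High Definition Audio Controller\n"
)
specs = parse_hw_report(report)
print(specs["CPU"], len(specs["Sound"]))
```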
I'll need your help :)
A) Input on my wall of text above.
B) Suggestions on the proposed benchmark and setup.
C) People to run the benchmark and post the results.
If you've read through all that and think, "Yeah, I'd spend 15 minutes running the test files and report back", please say so.
If you've read part of it and fell asleep thinking, "Ain't nobody got time for that", please say so too :D
What do we get out of all this?
Eventually, if enough people with varying hardware post their results, we can determine where to spend our precious money to improve the areas of RealityCapture where we are bottlenecked: which components and configurations help most with, say, reconstruction or texturing, and which hardware is simply ineffective.
What say you? Do you think this is a worthwhile task, and should I proceed?
-
Ivan,
looking at those logs ("Reconstruction Time 48 seconds"), do you know if there is a way to split that time into the depth-map and model-generation parts?
It might not be possible, but without it we won't get separate GPU and CPU scores, just a mixed GPU+CPU score, which is less interesting to me.
-
Indeed, learning from the log how a particular GPU, CPU, etc. influences performance is the point of the exercise. Maybe that PhD in imaging is precisely what's needed here ;^( Someone could write a macro to take time-lapse snips of Resource Monitor and a GPU monitoring utility like TechPowerUp GPU-Z at intervals, then find a way to glean values from the changing rasters... This appears to call for heavier lifting than off-the-shelf tools support. Definitely above my pay grade to properly conceive. We need a hero.
-
Hmmm, that gives me a thought. Whatever happens within these CPU/GPU monitoring apps ends up rasterised into a set of graphs; what's actually needed is access to the discrete samples driving those graphs, plus a routine for averaging and comparing values. We're not the first to talk about the relative worth of benchmarking with off-the-shelf tools when the functions running within a specific app call for more granular insight.
If someone approached the developers behind something like TechPowerUp's GPU-Z and pitched them the concept we're after, with the broader value to them of vastly improving their app's utility when pointed at a user-specified application (RC in this case) to generate this awesomely informative log, I believe there's a strong selling point here, especially with manufacturers vying for sales, to take benchmarking to the next level.
I personally don't mind taking a stab at making contact and opening a dialogue. Maybe that's all I do, handing the dialogue off to Ivan, Michal, whomever, to present an orderly list of specs we'd want this app to meet. TechPowerUp is just an example; if it's the right idea, we should generate a prioritised hit list. It sounds a bit ambitious, but I'm game to put in a little time toward this end and fly it up the flagpole.
-
I have already contacted the devs about this. It's the weekend, so let's be patient :) I'm sure it will be possible.
The application is aware of this stage, so theoretically it shouldn't be an issue. I have put a placeholder in the code for it; alternatively I can extrapolate the data via trickier scripting, which isn't so elegant.
I have tried quite a few ungraceful things, some using third-party apps including GPU-Z (which is free to distribute non-commercially; modifications cost $). I think the way forward is to avoid external applications entirely, to avoid legal issues, the complexity of working with the different ways each app handles data, and the problem of keeping compatibility across versions. I also think it would be improper to contact third parties, even if the intentions are good :)
The Capturing Reality team likely have their own agenda, and I do not believe it would be professional of us to overstep or act on their behalf.
All this could be done in-app; however, development time is likely better spent on other new features. It may be on the roadmap somewhere down the line.
Everything is currently achievable via the app and some crafty scripting, even GPU monitoring.
-
That makes much better sense, keeping this within the app. Since with every export of an asset RC alerts the user that stats are being sent their way, couldn't this performance data be collected alongside system specs, with the comparison among all users leveraged to provide the benchmarks? Right, development time is better spent on new features.
-
Hey Benjy,
awesome that you offered to share your salt mine stuff!
I have seen the staff room and I agree this would be an excellent choice.
It's rare, quirky and an interior; interiors are rarely seen and people often seem to have difficulties with them, so it could contribute to showing that it IS possible.
I'm somewhere in between Ivan and Benjy on the approach, but since Ivan is writing everything, the call is entirely up to him. The only issue might be, as Chris pointed out, that the CPU and GPU parts should be separate to maximise the benefit.
Anyway, thank you again, Ivan, for doing all this!
-
Good news
The benchmark now records the time taken for the following stages:
1) Alignment
2) Depth map reconstruction stage (GPU assisted)
3) Model reconstruction (Cpu)
4) Simplify
5) Texturing
It was a case of me misunderstanding the effect of one of the CLI switches. Michal kindly pointed me in the right direction.
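The raw times the benchmark records for those stages could be paired with their names at parse time along these lines (a sketch; only the stage names come from the list above, the helper itself is my own assumption):

```python
# Sketch: label an ordered list of raw stage times with the five stage
# names recorded by the benchmark. The names come from the post above;
# the function is a hypothetical parsing helper, not the benchmark code.
STAGES = [
    "Alignment",
    "Depth map reconstruction",
    "Model reconstruction",
    "Simplify",
    "Texturing",
]

def label_times(times):
    # zip stops at the shorter sequence, so a partial run yields fewer labels
    return dict(zip(STAGES, times))

print(label_times([42.1, 180.5, 95.0, 12.3, 60.7]))
```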
-
Hey Ivan,
excellent! I think now everything is covered, right?
So now we "just" need a suitable image set... ;-)
I'm just seeing if I can get an interior of a small Gothic choir covered in wall paintings to align with <500 images. Coverage would not be perfect (e.g. behind the altar), but it would certainly give a nice impression.
-
One thing just came to mind. I guess we would need to rely on the automatic reconstruction region, right? Since RC tends to vary its orientation quite a bit, it would not be identical in most cases and would skew the results. To avoid this, we should add some GCPs to the scene. I have no idea whether the automatic box is then always the same or whether it also varies. If it does, we need to import a custom reconstruction region.
-
Good point, Gotz. I also considered this; however, from my tests the region was always the same (maybe it isn't, or is only slightly different).
I expect there will always be slight variance from run to run, as with all benchmarks. The best way to get accurate results is to run something many times and take the average. I don't think that level of accuracy is needed, unless the region does indeed cause issues.
There is no project file, except the one generated by the benchmark at the very end. I could, however, set a fixed region via the code if need be, so that it is always constrained.
As it stands, though, the code is totally dataset-agnostic, which I think is best: it would let us and end users benchmark a different project simply by swapping the contents of the images folder. That works out best in the long run.
I *may* be able to add different options. That would be Alpha 0.2; we're not past 0.1 yet :P
-
Technically you should be able to generate a custom .rcproj file, though I don't recall if you can specify the reconstruction region or GCPs in there. Those might be saved as separate data files by RC.
Also you need to keep in mind that the demo version of RC likely uses a slightly different format for the project files.
-
Shadow, you are correct: I could create a custom .rcproj file for the project and global settings (backing up the existing ones), and also have a specific file just for the region. Avoiding GCPs is always best, as in my experience needing them just means there is something off with your images. These things can be *easily* added/adjusted as we proceed.
What are people's thoughts on an ID for the benchmark? No identifiable system data is scraped.
However, some kind of identifier will be required, as the results list will eventually get long and you will need a way to find your own results. I can extract the username from RealityCapture and maybe just use the first name plus the first letter of the last name.
So, for instance, Ivan Humperdinck becomes Ivan H (not my real last name :) ).
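For illustration, that first-name-plus-initial scheme is a one-liner to script; a hypothetical Python version (not the actual benchmark code, which is a .bat file):

```python
# Hypothetical sketch of the proposed identifier scheme: first name plus
# the first letter of the last name, e.g. "Ivan Humperdinck" -> "Ivan H".
def short_id(full_name):
    parts = full_name.split()
    if len(parts) < 2:
        return full_name  # single-word names are left as-is
    return parts[0] + " " + parts[-1][0]

print(short_id("Ivan Humperdinck"))  # -> Ivan H
```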
-
Oh, I would have thought to create an .rcproj file with the images and GCPs already included. All people would need to do is adjust the image path.
If the file format is different in the demo, there might be a problem. I guess Michal would have pointed that out to Ivan. There are many users out there without a CLI version...
-
Hey Ivan,
it seems like you want to preserve your anonymity! :-)
I think it might be better to use the hardware as the identifier, for exactly that reason. Maybe with the first letter of the name or so; in my case, though, the first name alone would be almost a 100% give-away...
The name is OK for internal purposes while testing, but later, when it goes public, I think it should be as neutral as possible.
GCPs are not there to patch up a model but to geo-reference (scale) it. It's an entirely standard procedure, and it would make sure that the model is identical for everyone who runs the benchmark...
-
You're 100% correct, GCPs are for what you suggest; I was getting mixed up with CPs. With all the scripting done over the last week, I don't even see the code any more. All I see is blonde, brunette, redhead.
Re: an identifier, I can have it prompt the user for whatever they wish to enter at the start, so that may be better.
-
Random is not what I wanted.
Say you upload your results, then at a later date upload more from a different system or after a hardware change, and wish to compare them: being able to locate those results in the database by scrolling down to 'F', which shows all results by "FluffyBunny", is the idea. I haven't finalised how selection will work yet, but I did want a visible identifier so people could compare their own results with others' if they wanted.
-
It's a great project as is, but maybe I misunderstood the scope. I thought there'd be some kind of analytics applied to the database to make the comparisons for us. If we manually study others' system specs against our own and try to correlate performance values for a given operation with the hardware that represents the winning horse, that could be a non-trivial exercise to tease out from the confluence of so many factors, no? I may be totally off base here and overthinking it; maybe it's a more straightforward exercise.
-
The idea is that the data you upload will be displayed to you (and then others), and you can choose to compare that dataset against a previous run you did, or against other variables.
I expect different people will wish to analyse the data in different ways. For me personally, and maybe selfishly, I initially wanted to compare against my own results, so I can make adjustments to my system and see how they affect each stage. These will be presented to everyone, for better or worse.
Seeing how things compare on other systems will be great too. Data analysis can be a complex matter in itself; yes, we are talking another PhD :D. Presenting it in a manner that is ideal for everyone won't be possible, but we can work on some pretty graphs etc. The idea isn't to make it a race to show who has the fastest system; ultimately, that data will be very valuable for seeing the system specs behind how and why a result was achieved.
However, it may not be the case that one system is the fastest at all stages, so I'll let users choose whether the data is ordered by date, user, fastest alignment, fastest depth map, fastest model creation, fastest texturing, graphics card, CPU, etc., and make their comparison from there. I suppose a few default calculations could be done to say "Your computer sucks; it is 22448% slower than the fastest shown here" etc. :D Then you can try to see why. It will probably leave me full of buyer's remorse. :)
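That "X% slower than the fastest" figure is just a ratio against the best time for a stage. A quick sketch of the default calculation (the helper name and sample numbers are mine):

```python
# Hypothetical helper: percentage slowdown of a run relative to the
# fastest recorded time for the same stage. A result of 100.0 means the
# run took twice as long as the fastest.
def percent_slower(your_time, fastest_time):
    return (your_time / fastest_time - 1.0) * 100.0

print(percent_slower(200.0, 100.0))  # -> 100.0 (twice as slow)
```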
-
Ok, time for some testing. Welcome to the RealityCapture Benchmark Alpha 0.4.
This is not close to final; however, I need feedback on how (and whether) the scripting works on other systems.
The online database is a work in progress, so I have omitted any code regarding that.
1) Things you need to do: unzip to your desktop. It will currently only run from there.
2) Choose your images and place them inside the Desktop/CapturingRealityBenchmark/Images folder.
For now I'd suggest a smaller collection that you know works. None are currently included.
3) Run the benchmark.bat file.
You will be asked to enter an identifier/nickname at the start.
4) Sit back and relax.
5) Once the benchmark has run its course, you will be given the option to enter any additional notes.
6) The results will be generated into a file called results.txt. It should look similar to this.
Don't worry that the times are not labelled etc.; that is all dealt with when the data is parsed at the database.
If your txt file looks different to this, please share it, especially if you have multiple GPUs or HDDs.
Current Known Issues/Potential Issues
1) If the dataset is too small or the computer is too fast, completing a section in under 15 s may not record the timestamp for that section. Fix: increase the number of photos.
2) I cannot identify whether more than one GPU is present (this requires the CUDA toolkit), or we must wait until my workstation arrives so I can test multi-GPU.
2.5) The same goes for multiple HDDs.
3) I run Windows 10; I am unsure whether all the commands/scripts will work on earlier versions/VMs/servers.
4) The code is English, as are the commands; I do not know if they work with other locales.
5) It will likely only run with the Demo and Full/CLI versions of the application, so if you have the Promo, please try installing the demo.
6) The script assumes you have installed the application in the default directory.
7) Admin privileges may be required.
8) Be wary of running software from unknown sources on the internet. Both *.bat files are in plain text; you are free to inspect the code in Notepad to ensure no shenanigans. You can also check with www.virustotal.com.
9) The project will delete your cache and may change some application settings away from their defaults. Fear not: a backup of your settings is saved first, as "GlbBackup.bak.rcconfig".
10) If you looked at the code and wonder why it's such a mess, why I did it that way, and why it took me so long: me too. I'm no expert.
11) If you made a suggestion and I ignored or refuted it, sorry. If you think it is important, try a different way to convince me; I may not have understood. This project is for the benefit of us all, and my opinion is just one of many. Everyone's input and suggestions are valued :)
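Regarding point 9, the settings backup could be sketched like this (the real benchmark is a .bat script; the paths and function name here are assumptions, only the backup filename comes from the post):

```python
import pathlib
import shutil

# Hypothetical sketch of the backup in issue 9: copy the global settings
# file aside as "GlbBackup.bak.rcconfig" before the benchmark changes
# anything, so defaults can be restored afterwards.
def backup_settings(config_path, backup_name="GlbBackup.bak.rcconfig"):
    src = pathlib.Path(config_path)
    dst = src.with_name(backup_name)
    if src.exists():
        shutil.copy2(src, dst)  # preserves timestamps as well as contents
    return dst
```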
Please, if you have time, give it a go and report back. If it does not behave as expected, please explain what happened with as much info as you can.
Thank you :)
Edit:
Changelog: minor edit to the code to allow the CapturingRealityBenchmark folder to be located anywhere, not restricted to the desktop. I found when testing on a Mac/Parallels VM that virtual paths/directories did not work so well, so ideally it should be located in a real location.
-
Hi Benjamin
Unfortunately, for the moment the posted version will not work with the Promo due to its restrictions on CLI instructions (one of the caveats of the more accessible price point).
My work-in-progress code does check which license type the user has, so longer term a less detailed benchmark running on the Promo version may be possible. Compromises are always an issue, and getting the most detailed and accurate data took the higher priority.
I am currently unable to test the effect of installing the demo while the Promo is installed.