.NET Developer Days 2017: The Performance Investigator's Field Guide – by Sasha Goldshtein

by Oliver 26. October 2017 10:00

Although the choice of the best session was not easy, the award has to go to Sasha Goldshtein and his session on performance detective work. It was very well prepared, had a clear goal and path, was packed with insights into the challenges and problems of performance investigation work, and offered hand-crafted solutions such as his own real-time ETW event tracing tool, etrace.

What follows is an excerpt of Sasha Goldshtein's presentation.

Structure of a Performance Investigation

  1. Obtain the problem description
  2. Build a system diagram
  3. Run a quick performance checklist
  4. Understand which component is exhibiting the problem
  5. Investigate thoroughly
  6. Find the root cause
  7. Resolve the issue
  8. Verify resolution
  9. Conduct and document post-mortem

Performance Metrics, Goals, Monitoring

  • Performance metrics don’t live in a vacuum!
  • Derive performance metrics from business goals
  • Monitor these metrics in your APM solution, home-made dashboard, or collection script, and get alerts
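
As a sketch of that last point, the snippet below (Python, with invented metric names and thresholds, not from the talk) derives an alertable metric from a business goal and checks collected samples against it:

```python
import statistics

# Hypothetical: the business goal "95% of checkouts complete within
# 800 ms" becomes a concrete, alertable p95 latency threshold.
GOAL_P95_MS = 800

def p95(samples):
    """95th percentile via statistics.quantiles (n=20 gives 5% steps)."""
    return statistics.quantiles(samples, n=20)[18]  # 19th cut point = 95%

def check_slo(latencies_ms):
    observed = p95(latencies_ms)
    return {"p95_ms": observed, "alert": observed > GOAL_P95_MS}

samples = [120, 180, 200, 250, 300, 320, 400, 450, 500, 900,
           130, 170, 210, 260, 310, 330, 410, 460, 510, 950]
print(check_slo(samples))  # a real setup would push this to a dashboard
```

In a real system the `check_slo` result would feed an APM solution or alerting pipeline rather than a print statement.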

Investigation Anti-Methods

  1. Make assumptions
  2. Trust “instincts” and irrational beliefs
  3. Look under the street light
  4. Use random tools
  5. Blame the tools

The USE Method

USE: Utilization, Saturation, Errors

  1. Build a functional diagram of the system, including hardware/software resources
  2. For each resource, identify utilization, saturation, and errors
  3. Understand, resolve, and verify errors, excessive saturation/utilization, under-utilization
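
Step 2 can be mechanized as a simple checklist. The following Python sketch (resource names and thresholds are invented for illustration) flags errors, saturation, and high or low utilization for each resource in the functional diagram:

```python
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    utilization: float  # 0.0-1.0, fraction of time the resource was busy
    saturation: float   # queued work, e.g. run-queue length per core
    errors: int         # error-event count over the observation window

def use_findings(resources, util_hi=0.9, util_lo=0.05, sat_hi=1.0):
    """Apply the USE checks to each resource; thresholds are assumptions."""
    findings = []
    for r in resources:
        if r.errors:
            findings.append(f"{r.name}: {r.errors} errors")
        if r.saturation > sat_hi:
            findings.append(f"{r.name}: saturated ({r.saturation})")
        if r.utilization > util_hi:
            findings.append(f"{r.name}: high utilization ({r.utilization:.0%})")
        elif r.utilization < util_lo:
            findings.append(f"{r.name}: under-utilized ({r.utilization:.0%})")
    return findings

system = [
    Resource("CPU", utilization=0.97, saturation=2.5, errors=0),
    Resource("disk", utilization=0.40, saturation=0.1, errors=3),
    Resource("NIC", utilization=0.02, saturation=0.0, errors=0),
]
for finding in use_findings(system):
    print(finding)
```

The point is the shape of the loop, not the numbers: every resource gets all three questions asked, so nothing in the diagram is skipped.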

Statistics Lie – Be Careful With Statistics

  • Averages are meaningless
  • Medians are almost meaningless
  • Percentiles are OK if you know what you’re doing
  • Find good visualizations for your performance data
  • Beware coordinated omission
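
The last bullet deserves an illustration. Coordinated omission happens when a load generator that sends a request every fixed interval blocks on a slow response: the requests it would have sent during the stall are never measured, so the stall is under-represented in the data. The Python sketch below applies an HdrHistogram-style correction by back-filling the omitted samples (the numbers are made up):

```python
import statistics

def correct_coordinated_omission(latencies_ms, interval_ms):
    """If one response took longer than the intended send interval, the
    requests that *would* have been sent during the stall were omitted.
    Back-fill them with the latency they would plausibly have seen."""
    corrected = []
    for lat in latencies_ms:
        corrected.append(lat)
        missed = lat - interval_ms
        while missed > 0:
            corrected.append(missed)
            missed -= interval_ms
    return corrected

# 99 fast responses and one 10-second stall, sampled every 100 ms
raw = [5] * 99 + [10_000]
fixed = correct_coordinated_omission(raw, interval_ms=100)
print(f"raw mean {statistics.mean(raw):.0f} ms, "
      f"corrected mean {statistics.mean(fixed):.0f} ms, "
      f"samples {len(raw)} -> {len(fixed)}")
```

The single stall hides 99 omitted requests; after correction the mean and the high percentiles look far worse, which is the honest picture.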

Look at histograms, or sometimes even percentile plots (a.k.a. cumulative distribution charts), to really understand your data, e.g. your performance traces. Just look at this dinosaur to see that very differently shaped data can produce the same statistical values:

[Image: a dinosaur-shaped scatter plot whose summary statistics match those of very differently shaped datasets]
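
To make the dinosaur's point concrete without the picture, here is a small Python sketch with two invented latency datasets that share the same mean but have completely different shapes; a text histogram exposes the difference immediately:

```python
import statistics
from collections import Counter

# Two made-up latency datasets with the identical mean of 50 ms but
# very different shapes: summary statistics alone cannot tell them apart.
uniform_ish = [50] * 100
bimodal     = [10] * 80 + [210] * 20   # mostly fast, a slow tail

for name, data in [("uniform", uniform_ish), ("bimodal", bimodal)]:
    print(f"{name}: mean={statistics.mean(data):.0f} ms")
    buckets = Counter((x // 50) * 50 for x in data)  # 50 ms buckets
    for lo in sorted(buckets):
        print(f"  {lo:4d}-{lo + 49:<4d} ms | {'#' * (buckets[lo] // 5)}")
```

The averages are identical, yet one workload is perfectly steady and the other has a fifth of its requests running twenty times slower.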

Conduct a Postmortem – Do It!

  1. Document the steps taken to identify, diagnose, resolve, and verify the problem
  2. Which tools did you use? Can they be improved?
  3. Where were the bottlenecks in your investigation?
  4. Can you add monitoring for sysadmins/ops?
  5. Can you add instrumentation for investigators?
  6. How do we triage this problem automatically next time it happens?


And now, happy performance hunting!

