This post describes a use case from a client environment.
The production webapp, Magnolia EE 3.6.8 running on Apache Tomcat behind Apache Httpd with MySql, was permanently under high load: CPU stuck at 100% even when the live servers were disconnected from the internet.
The client has a separate test environment, but unfortunately it is not fully compliant with the production one.
The main goal was to replicate the critical situation on the test environment, in order to investigate and experiment freely without disturbing the live site.
So, these are the steps that were followed:
- Trace Apache accesses
- Reproduce the same requests against the test environment
- Fix the error (if found) and retry the test
- Based on the log file produced at step 2, compute some statistics to summarize the work
Trace apache accesses
This part is easy: Apache httpd stores an access log by default. Simply make sure the feature is enabled and wait for the error to happen. You can find more info here: [apache-access-log].
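For reference, a typical setup uses the LogFormat and CustomLog directives of mod_log_config; the file path and format name below are illustrative, check your distribution's default config:

```apacheconf
# "combined" log format: client IP, identity, user, timestamp,
# request line, status, response size, referer and user agent.
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog logs/apache-access.log combined
```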
Logs are in the form of this snippet:
82.72.135.51 - - [14/Apr/2010:04:05:35 +0200] "GET /home/comunicazione/impressum.html HTTP/1.0" 200 4906 "" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.1)" [9580]
82.72.135.51 - - [14/Apr/2010:04:04:24 +0200] "GET /home/channels/com/about/teleone.html HTTP/1.0" 200 5469 "" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.1)" [10401]
82.72.135.51 - - [14/Apr/2010:04:03:41 +0200] "GET /home/net/02-rightBox/06/smart/0/image/showcase.jpg HTTP/1.0" 503 402 "" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; GTB6; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)" [11725]
82.72.135.51 - - [14/Apr/2010:04:05:39 +0200] "GET /docroot/pria-layout/css/templates/t11-content-page.css HTTP/1.0" 200 305 "" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; GTB5; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; InfoPath.2; .NET CLR 3.5.30729; .NET CLR 3.0.30618)" [20709]
82.72.135.51 - - [14/Apr/2010:04:04:01 +0200] "GET /home/net/rete/radio.html?po=4f6679a4-4171-4c01-8727-5cf4708eaaf1&date=25.12.2009 HTTP/1.0" 200 5502 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)" [8945]
Reproduce accesses over test instance
To replay these calls against the test instance, a second file was first produced, containing only the URI of each request.
The code used is a simple UNIX command:
awk '{print $7}' apache-access.log >> localhost-resources.txt
where ‘7’ is the “column” (whitespace-separated field) of the access log you want to extract from the original apache-access.log into localhost-resources.txt.
This produced a file like this:
...
/resource/uri/page.html
/resource/uri/stylesheet.css
/resource/uri/page2.html
/resource/uri/page3.html
...
Now another shell command launches curl to send the same requests to the webapp (a Tomcat webapp running on localhost).
A third file is produced, so that statistics can be computed over it.
for i in $(cat "$1"); do
    curl -w "%{http_code}|%{time_total}|%{size_download}|%{speed_download}|$i\n" -s -o /dev/null "http://localhost:8080$i"
done
The log file is something like:
...
200|0.001|61|47843.000|/resource/uri/page.html
200|0.002|314|208915.000|/resource/uri/stylesheet.css
200|0.394|29049|73802.000|/resource/uri/page2.html
404|0.003|27260|9422744.000|/resource/uri/page3.html
...
On another machine we run the top Unix command, to keep performance monitored.
Now it is easy to filter for requests that end with a 500 / 503… code (or 404), or for requests that took more than x seconds to complete, in order to discover critical pages / actions.
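As a sketch, the pipe-separated log lends itself to awk filtering; the file name and the one-second threshold below are illustrative:

```shell
# Sample lines in the same status|time|size|speed|uri format as above
# (curl-test.log is a hypothetical file name).
cat > curl-test.log <<'EOF'
200|0.001|61|47843.000|/resource/uri/page.html
503|0.120|402|3350.000|/resource/uri/feed.rss
200|2.394|29049|12133.000|/resource/uri/page2.html
EOF

# Requests that came back with a 5xx status code
awk -F'|' '$1 >= 500 && $1 < 600' curl-test.log

# Requests that took longer than one second
awk -F'|' '$2 > 1.0' curl-test.log
```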
Fix error and retry test
The error was easy to find: an RSS feed triggered a loop in Java code. After a few requests for the same RSS feed, Java reached 100% on all processors without serving pages, and this brought the client's servers down. After fixing the code that generates the XML feed (too-heavy JCR queries!), the test was relaunched without any further problems.
Statistics to summarize the work
Now it is time to report some numbers.
- Requests: 100,447 generated requests
- 77,127 (77%) returned 200 (resource loaded as expected)
- 22,496 (22%) returned 404 (resource not found; this is normal, since the test environment is missing some extra websites)
- 824 (1%) returned a 500, 302 or 400 code. The 500 responses were investigated, and we found another small bug in some redirect code.
- Timing: 11,987 seconds (around 3 hours and a bit…)
- 98,765 (98%) responses below 1 second
- 1,682 (2%) responses between 1 second and 5 seconds
- 0 responses over 5 seconds
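Numbers like these can be pulled straight out of the pipe-separated curl log with awk; the file name is hypothetical and the sample below is deliberately tiny:

```shell
# Tiny sample in the status|time|size|speed|uri format produced above
cat > curl-test.log <<'EOF'
200|0.001|61|47843.000|/a.html
200|0.394|29049|73802.000|/b.html
404|0.003|27260|9422744.000|/c.html
500|1.200|120|100.000|/d.html
EOF

# Count responses per HTTP status code
awk -F'|' '{count[$1]++} END {for (c in count) print c, count[c]}' curl-test.log

# Bucket responses by total time: below 1s, 1-5s, over 5s
awk -F'|' '$2 < 1 {fast++} $2 >= 1 && $2 <= 5 {mid++} $2 > 5 {slow++}
           END {print fast+0, mid+0, slow+0}' curl-test.log
```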
More in detail on this procedure
The method used here is really simple and full of constraints, but it allowed the team to reach an analytical solution to the puzzle.
I'm convinced that starting from the Apache logs is a good idea, because they represent a real pool of usage requests. The test can be optimized by collecting only the requests that occurred temporally around the servers' 100% threshold; alternatively, it is possible to use only the requests that generated HTTP error codes (500, 503…).
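For instance, restricting the replay to requests that already failed in production is a one-line change to the extraction step: in the combined log format, field 9 holds the status code just as field 7 holds the URI. File names below are illustrative:

```shell
# Small sample in Apache combined log format
# ($7 is the request URI, $9 the returned status code).
cat > apache-access.log <<'EOF'
82.72.135.51 - - [14/Apr/2010:04:05:35 +0200] "GET /ok.html HTTP/1.0" 200 4906 "" "Mozilla/4.0"
82.72.135.51 - - [14/Apr/2010:04:03:41 +0200] "GET /broken.jpg HTTP/1.0" 503 402 "" "Mozilla/4.0"
EOF

# Keep only the URIs of requests that got a 5xx answer in production
awk '$9 ~ /^5/ {print $7}' apache-access.log > error-resources.txt
cat error-resources.txt
```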
I strongly encourage using some kind of standard stress tool, because the curl method is really simple and cannot simulate concurrent calls, calls coming from different IPs, calls at different frequencies, and so on.
A good idea is to log the per-process / total CPU and memory usage at the same time, in order to correlate requests with server load.
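A minimal sketch of such a logger on Linux, reading /proc directly; the output file name, snapshot count and interval are illustrative:

```shell
# Take a few timestamped snapshots of the load average and free memory,
# so each curl log entry can later be matched against server load.
for i in $(seq 1 3); do
    read load _ < /proc/loadavg
    memfree=$(awk '/MemFree/ {print $2}' /proc/meminfo)
    echo "$(date '+%H:%M:%S') load=$load memfree_kb=$memfree"
    sleep 1   # use a longer interval (e.g. 5s) in a real run
done > server-load.log
```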