Wednesday, 18 May 2022

 Loop the values of a correlated variable


A ForEach controller loops through the values of a set of related variables. When you add samplers (or controllers) to a ForEach controller, every sampler (or controller) is executed one or more times, and on each iteration the variable takes a new value. The input should consist of several variables, each named with the input variable name followed by an underscore and a number. Each such variable must have a value. For example, when the input variable is named inputVar, the following variables should be defined:

  • inputVar_1 = wendy
  • inputVar_2 = charles
  • inputVar_3 = peter
  • inputVar_4 = john

Note: the "_" separator is now optional.

When the return variable is given as "returnVar", the collection of samplers and controllers under the ForEach controller will be executed 4 consecutive times, with the return variable having the respective above values, which can then be used in the samplers.
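The iteration described above can be modelled in a few lines of plain Python. This is only a sketch of the behaviour, not JMeter code: the function name foreach_values and the dictionary jmeter_vars are made up for illustration, and it assumes the loop starts at index 1 and stops at the first undefined variable.

```python
def foreach_values(variables, input_prefix, use_separator=True):
    """Yield the values a ForEach-style loop would visit, in order."""
    sep = "_" if use_separator else ""
    index = 1
    while True:
        name = f"{input_prefix}{sep}{index}"
        if name not in variables:
            break  # the first undefined variable ends the loop
        yield variables[name]
        index += 1


# Hypothetical variable store mirroring the example above.
jmeter_vars = {
    "inputVar_1": "wendy",
    "inputVar_2": "charles",
    "inputVar_3": "peter",
    "inputVar_4": "john",
}

for return_var in foreach_values(jmeter_vars, "inputVar"):
    # Each pass stands in for one execution of the child samplers.
    print(f"returnVar = {return_var}")
```

Passing use_separator=False looks up inputVar1, inputVar2, and so on, which corresponds to switching off the "_" separator.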




In this example, we created a Test Plan that sends a particular HTTP Request only once and sends another HTTP Request to every link that can be found on the page.

Figure 7 - ForEach Controller Example

We configured the Thread Group for a single thread and a loop count value of one. You can see that we added one HTTP Request to the Thread Group and another HTTP Request to the ForEach Controller.

After the first HTTP request, we added a regular expression extractor, which extracts all the HTML links from the returned page and stores them in variables named after inputVar.

In the ForEach loop, we added an HTTP sampler that requests each of the links extracted from the first returned HTML page.
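To make the hand-off between the extractor and the ForEach controller concrete, here is a rough Python imitation of the extraction step. It assumes the extractor is configured to keep every match (Match No. set to -1), in which case the matches are stored as inputVar_1, inputVar_2, and so on, along with a match count. The function extract_links, the regular expression, and the sample page are illustrative assumptions, not taken from the test plan.

```python
import re


def extract_links(html, ref_name="inputVar"):
    """Store every href found in the page as ref_name_1, ref_name_2, ..."""
    # Deliberately simple pattern for illustration; real pages may need more care.
    hrefs = re.findall(r'<a\s+[^>]*href="([^"]+)"', html, flags=re.IGNORECASE)
    variables = {f"{ref_name}_{i}": href for i, href in enumerate(hrefs, start=1)}
    variables[f"{ref_name}_matchNr"] = str(len(hrefs))  # number of matches found
    return variables


# Hypothetical response body of the first HTTP request.
sample_page = """
<html><body>
  <a href="/home">Home</a>
  <a class="nav" href="/about">About</a>
</body></html>
"""

for name, value in extract_links(sample_page).items():
    print(name, "=", value)
```

The numbered variables produced here are exactly the shape of input the ForEach controller expects.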


Friday, 28 January 2022

Performance Tuning: Garbage Collection

Spark runs on the Java Virtual Machine (JVM). Because Spark can store large amounts of data in memory, it has a major reliance on Java’s memory management and garbage collection. Therefore, garbage collection can be a major issue that can affect many Spark applications.


Common symptoms of excessive garbage collection in Spark are:
  • Slow application speed.
  • Executor heartbeat timeouts.
  • "GC overhead limit exceeded" errors.


The first step in garbage collection tuning is to collect statistics by enabling verbose GC logging (for example, the JVM option -verbose:gc) when submitting Spark jobs.
In an ideal situation we try to keep GC overheads < 10% of heap memory.
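One possible way to collect those statistics is to pass verbose GC flags to the executors through spark.executor.extraJavaOptions. The sketch below only assembles the spark-submit command line. It assumes executors running on Java 8, where -XX:+PrintGCDetails and -XX:+PrintGCTimeStamps are accepted (newer JVMs use -Xlog:gc* instead). The application name my_job.py is a placeholder.

```python
import shlex


def gc_logging_args(app="my_job.py"):
    """Build a spark-submit argument list that turns on executor GC logging."""
    # Java 8 style GC logging flags; adjust for the JVM actually in use.
    gc_flags = "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
    return [
        "spark-submit",
        "--conf", f"spark.executor.extraJavaOptions={gc_flags}",
        app,  # hypothetical application file
    ]


# shlex.join quotes the --conf value so the spaces survive the shell.
print(shlex.join(gc_logging_args()))
```

The GC messages then show up in each executor's stdout log.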
The Spark execution engine and Spark storage can both store data off-heap. 
You can switch on off-heap storage using the following commands:
--conf spark.memory.offHeap.enabled=true
--conf spark.memory.offHeap.size=Xgb
If using RDD-based applications, use data structures with fewer objects. For example, use an array instead of a list.
If you are dealing with primitive data types, consider using specialized data structures like Koloboke or fastutil. These structures optimize memory usage for primitive types.
Be careful when using off-heap storage as it does not impact on-heap memory size, i.e. it won’t shrink heap memory. So, to define an overall memory limit, assign a smaller heap size.
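As a small illustration of that budgeting, the sketch below splits an overall per-executor limit into a heap size and an off-heap size. The helper name executor_memory_conf and the example numbers are assumptions. It also ignores other consumers such as spark.executor.memoryOverhead, so treat it as a starting point only.

```python
def executor_memory_conf(total_limit_gb, off_heap_gb):
    """Shrink the heap so that heap plus off-heap stays within total_limit_gb."""
    if not 0 < off_heap_gb < total_limit_gb:
        raise ValueError("off-heap size must be positive and below the total limit")
    heap_gb = total_limit_gb - off_heap_gb  # off-heap is not carved out of the heap
    return {
        "spark.executor.memory": f"{heap_gb}g",
        "spark.memory.offHeap.enabled": "true",
        "spark.memory.offHeap.size": f"{off_heap_gb}g",
    }


# Example: a 16 GB budget with 4 GB kept off-heap leaves a 12 GB heap.
for key, value in executor_memory_conf(16, 4).items():
    print(f"--conf {key}={value}")
```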
If you are using Spark SQL, try to use the built-in functions as much as possible instead of writing new UDFs. Most built-in functions can work directly on UnsafeRow and don't need to convert to wrapper data types. This avoids creating garbage, and it also plays well with code generation.
Remember that we may be working with billions of rows. If we create even a small 100-byte temporary object for each row, 1 billion rows produce 1 billion * 100 bytes of garbage, which is roughly 100 GB.
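The same arithmetic is easy to repeat for other row counts and object sizes. The helper garbage_gb below is a made-up name, and it uses decimal gigabytes (10^9 bytes).

```python
def garbage_gb(rows, bytes_per_row):
    """Estimate the temporary garbage, in decimal gigabytes, from per-row objects."""
    return rows * bytes_per_row / 10**9


# One 100-byte temporary object per row, at different scales.
for rows in (10**6, 10**9, 5 * 10**9):
    print(f"{rows:>13,} rows -> {garbage_gb(rows, 100):,.1f} GB of garbage")
```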

Monday, 10 January 2022

Performance Testing and Engineering Knowledge Repository

 

A complete solution for Performance Testing and Engineering Knowledge Repository:

https://github.com/santhoshjsh/PTPEKR