Tuesday, October 29, 2013

Global Variables in Shared Objects

I believe that even people who are smart and capable enough to take on more challenging tasks often do not get the opportunity to do so.

Recently, while analyzing some issues, I came across one that helped me understand an important concept of shared libraries in a real-world scenario. It is a small thing, but one worth knowing and understanding.

One of the middleware processes and an application process shared the same library, through which both could access a shared memory database. A timer was running in the middleware process to update a global counter variable in the library. It was updating and printing the values properly.

Because of an issue reported by the client, we found that the counter variable needed to be reset on a particular request from the GUI. The recipient of GUI requests is the application. Since the application also used the same library, we decided to reset the counter from the application. When we did so and printed the value from the application, the counter appeared to reset properly. But the next time the timer fired in the middleware and printed the value, it continued from the previous value of the counter. In other words, the reset had not taken effect in the middleware, even though the application showed the expected value.



We then realized that, because the library is shared between the application process and the middleware process, each process gets its own copy of the shared object's global variables. That is exactly why our project uses a shared memory database (SMDB). Even with the SMDB in place, the library used to access it is just a shared object, which ideally should not have global variables. In other words, I believe global data lives in the address space of the process that loaded the library, and hence each process has its own copy.

After some Googling, I found the following points to be important.

1. If a shared library uses a global variable, each process that loads the library gets its own copy of that variable, whether or not the symbol is exported. Data belongs to processes, not to libraries.

2. Only CODE is shared, not data (well, read-only constant data might be shared).
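The two points above can be illustrated with a minimal sketch. It is not our project code: it assumes a POSIX system, uses fork() to get two processes instead of two separately launched programs, and the names g_counter and reset_in_child are made up for illustration. An ordinary global is reset only in the process that writes it, while a counter placed in a shared mapping is reset for both.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Ordinary global: every process ends up with its own copy. */
static int g_counter = 5;

/* Hypothetical helper: a child process resets both counters, and the
 * parent reports what it sees afterwards. Returns 0 on success. */
int reset_in_child(int *private_after, int *shared_after)
{
    assert(private_after != NULL && shared_after != NULL);

    /* Counter placed in memory that parent and child really share. */
    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;

    g_counter = 5;
    *shared = 5;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    if (pid == 0) {
        g_counter = 0;   /* changes only the child's private copy */
        *shared = 0;     /* visible to the parent as well */
        _exit(0);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        munmap(shared, sizeof *shared);
        return -1;
    }

    *private_after = g_counter;  /* parent's copy was never touched */
    *shared_after = *shared;     /* reset done by the child is seen here */
    munmap(shared, sizeof *shared);
    return 0;
}
```

Calling reset_in_child(&p, &s) leaves p at 5 and s at 0, which mirrors what we saw: the reset worked in the process that performed it, but the other process kept its old value.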

If any of you have more information or good links related to this, please share them in the comments.

Thanks in advance...

Sunday, July 14, 2013

Responses were getting delayed and taking more than 50ms....!

One of the critical issues reported after vehicle production started was a delay in the responses to diagnostics requests.

As per the requirement, or as per the standard, the response to any diagnostics request should arrive within 50 ms. At the end of assembly in production, tests are performed using scripts. These are called EOL (End of Line) tests, and the result was NG (No Go or Not Good). Because the EOL tests were failing, these issues became critical.

For our analysis we used the same scripts that were used for the EOL tests. The script sent DIAG requests continuously, one every 60 ms, and our software module was expected to respond to each request within 50 ms. Whenever there was a delay, the script showed an error. Sometimes there was no response at all!

We were not able to reproduce the issue with the GGDS tester simulation that came with CANoe, so we depended entirely on the customer's script. We spent more than 3-4 days analyzing the issue and trying to understand the pattern of its reproducibility. Finally, we decided to use an oscilloscope to find out where and when the delay started. Using debug ports, we instrumented the code so that we could see the delay in the transmission of the CAN signal (Tx) on the oscilloscope and in CANoe itself.

We started enabling the periodic messages one by one and found that one of the CAN signal handlers was taking more than 50 ms, and because it handled a periodic message, the delay kept accumulating. Normally a CAN signal handler should not take more than 1-2 ms, because the Vector task should get the CPU every 2 ms (its schedule time). When we looked into the code of the signal handler callback to see which line was taking so long, we found an EEPROM write operation that took more than 50 ms.

To fix this issue, we later moved the EEPROM operation to the ACC OFF handler.

The issue was not observed when there was no CAN traffic. It showed up only with periodic CAN messages on the network, which is what led us to suspect that one of the signal handlers might be taking too long.

Because of all of the above, the DIAG task was not getting the CPU in time to respond to the requests, and the delay kept accumulating.

Ideally, CAN signal handlers should execute a minimum of instructions so that Vector's tasks are not delayed or blocked on anything. The schedule time of the CAN task is 2-3 ms, so it should get the CPU every 2-3 ms. Any operation in a handler that takes longer than the schedule time of the CAN task will cause issues.
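One common way to keep a handler short is to only record the work in the handler and do the slow part later. The sketch below illustrates that idea and is not our actual code: every name in it (on_can_signal, on_acc_off, eeprom_write_slow) is hypothetical, and the EEPROM is faked with a variable so the example can run on a PC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the real hardware (assumptions for illustration only). */
static uint8_t  g_fake_eeprom;
static unsigned g_eeprom_writes;

/* State shared between the fast handler and the deferred writer. */
static volatile bool g_write_pending;
static uint8_t       g_pending_value;

/* Pretend this takes more than 50 ms on the target. */
static void eeprom_write_slow(uint8_t value)
{
    g_fake_eeprom = value;
    g_eeprom_writes++;
}

/* CAN signal handler: only remembers the latest value, then returns. */
void on_can_signal(uint8_t value)
{
    g_pending_value = value;
    g_write_pending = true;
}

/* ACC OFF handler: the slow write happens here, outside the CAN path. */
void on_acc_off(void)
{
    if (g_write_pending) {
        g_write_pending = false;
        eeprom_write_slow(g_pending_value);
        assert(g_fake_eeprom == g_pending_value);
    }
}

/* Small accessors so a caller can observe the fake EEPROM. */
unsigned eeprom_write_count(void) { return g_eeprom_writes; }
uint8_t  eeprom_value(void)       { return g_fake_eeprom; }
```

With this split, any number of periodic messages cost only two assignments each, and a single EEPROM write with the latest value happens at ACC OFF. On a real target the shared state would also need whatever protection the OS requires between the two contexts, which this sketch leaves out.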

This was one of the better analyses we have done, and it can help us root cause other issues where response delay is the main symptom.

Friday, February 22, 2013

Experience on ‘void’ and ‘extern’

Fortunately, after a very long time, I have something new for my tech blog. This is thanks to my onsite stay and a change in the nature of my work.

For the past few months, maybe since August 2012, I have been working on something brand new that I had never worked on or even heard of before. This really is the advantage of service companies. During this time I have come across many new things. They are small, but they were good to know. I will try to record as many as I can remember.

Since I am working on Vehicle Diagnostics, everything except the programming language is new: CAN, CANoe, Renesas's High-Performance Embedded Workshop IDE (HEW), the uITRON OS platform, the flashing tool, the 16-bit co-processor HW platform, etc.

While using HEW, I came across some surprising behavior of the Renesas compiler. Even a missing function prototype resulted in a linker error. It was the first strange thing I faced with the HEW/Renesas compiler.

'void' at declaration and call:

While writing a function, I declared its argument list as void. Then I copied the same prototype to the place where the function is called.

void function (void);
Init()
{
    function(void);   /* the prototype copied as-is into the call */
}

I then got an error message from the compiler. The error was

C2766 (E) parse error at near 'void'

It took me around 30 minutes to root cause this error. Because I had copied the prototype into the place of the call, 'void' was passed as a parameter to the function, and that caused the error above. In a declaration, 'void' in the parameter list means the function takes no arguments. In a call, however, the argument list must contain expressions, and 'void' is a type name, not a value, so the compiler reports a parse error. The call should simply be function(); with empty parentheses. It looks simple, but it was new to me.
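For reference, here is a minimal corrected version of the snippet. The counter g_calls and the body of function are additions for illustration so that the example does something observable. The names function and Init are taken from the snippet above.

```c
#include <assert.h>

static int g_calls;      /* illustrative counter, not in the original code */

void function(void);     /* declaration: 'void' means "takes no arguments" */

void function(void)
{
    g_calls++;
}

void Init(void)
{
    int before = g_calls;

    function();          /* call: empty parentheses, no 'void' here */

    assert(g_calls == before + 1);
}

int call_count(void)
{
    return g_calls;
}
```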

'extern' without datatype:

As I am working with only a few KBs of RAM, we used many 'extern' declarations to refer to variables shared between files of the same module. When I copied data from an extern variable to a local array, I saw different data in the destination. Say the source array a[4] was {1,2,3,4}; the destination b[4] came out as {1,3}. Once again I started to suspect a junk platform or a junk IDE.

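/* In one of the source files */
unsigned char a[4];   /* holds {1, 2, 3, 4} at run time */

/* In another source file */
extern a[];           /* no data type given */
unsigned char b[4];   /* local destination array */

b[0] = a[0];
b[1] = a[1];
b[2] = a[2];
b[3] = a[3];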

After the last line executed, a was {1,2,3,4} and b was {1,3}. After some analysis, I understood the behavior and changed the code as below.

/* In one of the source files */
unsigned char a[4];

/* In another source file */
extern unsigned char a[];

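In the first version no data type was associated with the extern declaration, so the compiler treated 'a' as an array of int (the default type in older C). As the MCU was 16-bit, each element was taken as 16-bit data, so a[0] and a[1] covered bytes 0-1 and bytes 2-3 of the real array, and only the 0th and 2nd bytes ended up in the destination.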
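The effect can be reproduced on a PC without relying on a mismatched extern. The helper below is a sketch with a made-up name, copy_as_16bit_elements. It walks a byte array the way code compiled against a 16-bit int declaration would, assuming a little-endian layout (an assumption, though it matches the {1,3} I observed), and keeps the low byte of each element, as the assignment to an unsigned char does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: index 'src' as if it were an array of 16-bit
 * little-endian ints and store the low byte of each element in 'dst'.
 * Stops when either array is exhausted; returns the number copied. */
size_t copy_as_16bit_elements(const unsigned char *src, size_t src_len,
                              unsigned char *dst, size_t dst_len)
{
    assert(src != NULL && dst != NULL);

    size_t copied = 0;
    for (size_t i = 0; i < dst_len && 2 * i + 1 < src_len; i++) {
        /* Element i covers bytes 2*i and 2*i + 1 of the real array. */
        uint16_t element = (uint16_t)(src[2 * i] | (src[2 * i + 1] << 8));

        /* Storing into an unsigned char keeps only the low byte. */
        dst[i] = (unsigned char)(element & 0xFFu);
        copied++;
    }
    return copied;
}
```

For a source of {1, 2, 3, 4} this fills the destination with 1 and 3 and then stops, because the third 16-bit element would already lie outside the 4-byte array.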

The same kind of thing happened to me in a different way.
In another module there was a 2D array with around 20 rows of 25 bytes each.

const char twodarray[21][25];

I referred to this 2D array as extern.

extern const char twodarray[][25];

In my module's init function I copied the data to my local array like this:

U8 localArray[25];
strcpy( localArray, twodarray[3] );

It worked fine for a few days. But after some time, when we tested again, only 4 bytes of data were being copied to the destination array. I was puzzled, and it took me a few hours to root cause.

The main definition of the array had been changed to

const char twodarray[21][30];

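Because the array was still declared extern with 25 columns, the compiler in my file computed row addresses as if the array were twodarray[21][25], even though the real definition now had 30 columns. So twodarray[3] pointed at byte offset 3 * 25 = 75 instead of 3 * 30 = 90, which is in the middle of row 2. When I changed the declaration to

extern const char twodarray[][30];

it worked fine.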

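The reason, I think, is that an array of any dimension is one contiguous block of memory, and the column count in the declaration is what the compiler uses to find the start of each row. With the wrong column count, the expected chunk of data was not referenced.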
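A small sketch of that address arithmetic follows. It does not use a mismatched extern; instead it computes the row address by hand with both column counts. The names storage, fill_rows and row_with_stride are made up, and the row contents (20 identical letters per row) are invented test data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { ROWS = 21, REAL_COLS = 30, STALE_COLS = 25, TEXT_LEN = 20 };

/* The real definition: 21 rows of 30 bytes. */
static char storage[ROWS][REAL_COLS];

/* Invented test data: row r holds 20 copies of 'A' + r, then NULs. */
void fill_rows(void)
{
    for (int r = 0; r < ROWS; r++) {
        memset(storage[r], 0, REAL_COLS);
        memset(storage[r], 'A' + r, TEXT_LEN);
    }
}

/* Where the compiler looks for row 'row' if it believes each row is
 * 'cols' bytes long: base address + row * cols. */
const char *row_with_stride(size_t row, size_t cols)
{
    assert(row * cols < sizeof storage);
    return (const char *)storage + row * cols;
}
```

After fill_rows(), row_with_stride(3, REAL_COLS) is the start of row 3, a string of 20 'D' characters. row_with_stride(3, STALE_COLS) lands at offset 75, which is column 15 of row 2, so a strcpy from there copies only the last 5 'C' characters of that row. That is the same kind of short, wrong copy I saw.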

I was really happy when I could root cause these kinds of issues.