Topic: How to improve performance on STM32F4

Hello, I'm working on an STM32F437 with the crypto hardware enabled.
My problem is that the packet exchange takes too long. Measured with Wireshark, one frame takes 220 ms from [SYN] to [FIN, ACK].

The code is built with optimisation, and the webserver task runs at high priority.

The first test used a 2048-bit key and these options:

#ifdef CYASSL_STM32F2
#define SIZEOF_LONG_LONG 8
#define NO_DEV_RANDOM
#define NO_CYASSL_DIR
#define NO_RABBIT
#define STM32F2_RNG
#define USER_TIME
#define STM32F2_HASH
#define STM32F2_CRYPTO
//#define USE_FAST_MATH
//#define TFM_TIMING_RESISTANT
//#define TFM_ARM
#endif

The second test used a 1024-bit key and these options:

#ifdef CYASSL_STM32F2
#define SIZEOF_LONG_LONG 8
#define NO_DEV_RANDOM
#define NO_CYASSL_DIR
#define NO_RABBIT
#define STM32F2_RNG
#define USER_TIME
#define STM32F2_HASH
#define STM32F2_CRYPTO
#define USE_FAST_MATH
#define TFM_TIMING_RESISTANT
#define TFM_ARM
#endif

There is no difference in timing between the two tests.

As you can see in the attached capture, the latency is between packets 5 and 14.
How can I improve the speed? The device will be used on a local network, so in this case security is less important than speed.

Thank you for your help.
Pierre

Post's attachments

Capture.GIF

Re: How to improve performance on STM32F4

Now I use CyaSSL_set_session/CyaSSL_get_session for inter-card communication, and it improves the timing.
I use it on the client side.

This is my implementation. I don't know if it is correct, but it has worked so far:

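...
    xCyaSSL_Object = CyaSSL_new(ctx);
    if (xCyaSSL_Object == NULL)
        assert_param(0);

    /* Offer the session saved from the previous connection, if any. */
    if (session)
        err = CyaSSL_set_session(xCyaSSL_Object, session);

    if (CyaSSL_set_fd(xCyaSSL_Object, sock) != SSL_SUCCESS)
        assert_param(0);

    /* No explicit CyaSSL_connect here: CyaSSL_write runs the handshake
       if it has not been done yet. */
    if (len < sizeof(trame))
        bw = CyaSSL_write(xCyaSSL_Object, (unsigned char*)trame, len);

    /* Save the session for the next connection before freeing the object. */
    session = CyaSSL_get_session(xCyaSSL_Object);

    CyaSSL_free(xCyaSSL_Object);
...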

Re: How to improve performance on STM32F4

Hi pcu,

Yes, using session resumption will improve the performance of consecutive connections from the same client.

Have you run the CTaoCrypt benchmarks in your environment?  Looking at CyaSSL performance, this is usually one of the best metrics to compare.  You can find the CTaoCrypt benchmarks in the <cyassl_root>/ctaocrypt/benchmark/benchmark.c file.

With the benchmark.c file, if you will be using your own main() or driver function, you can define NO_MAIN_DRIVER.  You can also define BENCH_EMBEDDED to reduce the benchmark data sizes, making it more reasonable to use on an embedded device.

For the public key benchmarks, if you don't have a file system available, you can define either USE_CERT_BUFFERS_1024 or USE_CERT_BUFFERS_2048 which will use the certificate buffers located in the <cyassl/certs_test.h> header instead of files.
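
As a rough sketch, the build options mentioned above could be collected like this. Where they go is an assumption (for example, next to the other defines in the CYASSL_STM32F2 settings block), and the entry point to call from your own task should be checked in the benchmark.c of your CyaSSL version.

```c
/* Build-time options for ctaocrypt/benchmark/benchmark.c.
   Placement is an assumption: any header seen by benchmark.c will do. */
#define NO_MAIN_DRIVER         /* benchmark is called from your own main() or task */
#define BENCH_EMBEDDED         /* smaller benchmark data sizes for embedded targets */
#define USE_CERT_BUFFERS_2048  /* key buffers from <cyassl/certs_test.h>, no file system;
                                  use USE_CERT_BUFFERS_1024 for the 1024-bit case */
```

Only one of USE_CERT_BUFFERS_1024 and USE_CERT_BUFFERS_2048 is needed, matching the key size you want to measure.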

Best Regards,
Chris

Re: How to improve performance on STM32F4

Also, one quick note.  At the moment, we don't recommend defining STM32F2_HASH when using wolfSSL for SSL/TLS.  Because of restrictions of the Standard Peripheral Library and how its hash API is designed, the wolfSSL STM32 hardware hash integration only works correctly on a single context at a time.  An SSL connection uses several hash contexts simultaneously.

Best Regards,
Chris

Re: How to improve performance on STM32F4

chrisc wrote:

Also, one quick note.  At the moment, we don't recommend defining STM32F2_HASH when using CyaSSL for SSL/TLS.  Because of restrictions of the Standard Peripheral Library and how its hash API is designed, the CyaSSL STM32 hardware hash integration only works correctly on a single context at a time.  An SSL connection uses several hash contexts simultaneously.

Best Regards,
Chris


Hello Chris,
Thank you for your response.

Is it the same for STM32F2_CRYPTO?
Can I simply add a mutex around hash access in the Standard Peripheral Library, or is it more complicated?

For the inter-card communication I tried PSK with success. I will check with benchmark.c.

Kind regards,
Pierre

Re: How to improve performance on STM32F4

Hi pcu,

STM32F2_CRYPTO doesn't have the same problem with multiple block cipher contexts, since the state needed for each call is stored in the respective wolfCrypt structure (e.g. Aes, Des, etc.).

I don't think a mutex would solve the issue, since the limitation isn't necessarily just with access to the hardware hash module, but more with the internal state of the hash module.
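
To make the point about internal state concrete, here is a toy model in C. Everything in it is an assumption for illustration: the hw_* functions stand in for a single shared hash engine, SoftHash stands in for a context that carries its own state, and FNV-1a replaces the real digest. It is not the STM32 peripheral, the Standard Peripheral Library API, or the library's integration code.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy running hash (FNV-1a), used only to show where the state lives. */
#define FNV_OFFSET 2166136261u
#define FNV_PRIME  16777619u

static uint32_t fnv_step(uint32_t state, const char *data, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        state ^= (unsigned char)data[i];
        state *= FNV_PRIME;
    }
    return state;
}

/* Model A (hypothetical): one shared engine with a single running state. */
static uint32_t hw_state;
static void hw_init(void) { hw_state = FNV_OFFSET; }
static void hw_update(const char *d, size_t n) { hw_state = fnv_step(hw_state, d, n); }
static uint32_t hw_final(void) { return hw_state; }

/* Model B (hypothetical): each context carries its own running state. */
typedef struct { uint32_t state; } SoftHash;
static void soft_init(SoftHash *h) { h->state = FNV_OFFSET; }
static void soft_update(SoftHash *h, const char *d, size_t n) { h->state = fnv_step(h->state, d, n); }
static uint32_t soft_final(const SoftHash *h) { return h->state; }

/* Two logical hashes interleaved on the shared engine.
   Returns 1 if hash A still comes out right, 0 otherwise. */
int shared_engine_is_correct(void)
{
    uint32_t expected = fnv_step(FNV_OFFSET, "handshake-A", 11);

    hw_init();                    /* hash A starts */
    hw_update("handshake-", 10);  /* hash A, first part */
    hw_init();                    /* hash B starts and wipes A's state */
    hw_update("other", 5);        /* hash B */
    hw_update("A", 1);            /* hash A continues, but on B's state */
    return hw_final() == expected;
}

/* The same interleaving with per-context state. */
int per_context_is_correct(void)
{
    uint32_t expected = fnv_step(FNV_OFFSET, "handshake-A", 11);
    SoftHash a, b;

    soft_init(&a);
    soft_update(&a, "handshake-", 10);
    soft_init(&b);
    soft_update(&b, "other", 5);
    soft_update(&a, "A", 1);
    return soft_final(&a) == expected;
}
```

In this model the calls already run strictly one after another, which is all a mutex around each call would guarantee, yet shared_engine_is_correct() still reports a wrong result for hash A while per_context_is_correct() does not. A lock would only help if it were held for the entire lifetime of one hash, which does not fit a connection that keeps several hashes open at once.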

Best Regards,
Chris