C++ - Troubleshooting auto-vectorizer reason '1200'


MSVC 2013 Ultimate with Update 4

I don't understand why I'm getting this message on a seemingly simple example:

info C5002: loop not vectorized due to reason '1200'

which is documented as:

1200: Loop contains loop-carried data dependences

I don't see how the iterations of the loop could interfere with each other.

    __declspec( align( 16 ) ) class physicssystem
    {
    public:
        static const int32_t maxentities = 65535;

        // structure-of-arrays storage: one array per component
        __declspec( align( 16 ) ) struct vectorizedxyz
        {
            double      mx[ maxentities ];
            double      my[ maxentities ];
            double      mz[ maxentities ];

            vectorizedxyz()
            {
                memset( mx, 0, sizeof( mx ) );
                memset( my, 0, sizeof( my ) );
                memset( mz, 0, sizeof( mz ) );
            }
        };

        void update( double dt )
        {
            for ( int32_t i = 0; i < maxentities; ++i ) // <== 1200
            {
                mtmp.mx[ i ] = mpos.mx[ i ] + mvel.mx[ i ] * dt;
                mtmp.my[ i ] = mpos.my[ i ] + mvel.my[ i ] * dt;
                mtmp.mz[ i ] = mpos.mz[ i ] + mvel.mz[ i ] * dt;
            }
        }

    private:
        vectorizedxyz   mtmp;
        vectorizedxyz   mpos;
        vectorizedxyz   mvel;
    };

Edit: Judging from http://blogs.msdn.com/b/nativeconcurrency/archive/2012/05/08/auto-vectorizer-in-visual-studio-11-rules-for-loop-body.aspx this seems to be a case of "Example 1 – Embarrassingly parallel", yet the compiler acts as if it thinks the arrays are unsafe because of aliasing, which is puzzling me.
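A workaround often tried when a compiler suspects aliasing is to pass the arrays through local pointers marked `__restrict`. The following is only a minimal sketch under assumptions: `integrate_axis` is a hypothetical helper that is not part of the original class, `__restrict` is a compiler extension (accepted by MSVC, GCC and Clang), and whether this actually changes the reason '1200' diagnosis on MSVC 2013 is untested here.

```cpp
#include <cstddef>

// Hypothetical helper, not from the original post: advances one axis.
// __restrict promises the compiler that out, pos and vel never overlap,
// so no iteration can depend on a value written by another iteration.
void integrate_axis( double* __restrict out,
                     const double* __restrict pos,
                     const double* __restrict vel,
                     double dt,
                     std::size_t count )
{
    for ( std::size_t i = 0; i < count; ++i )
    {
        out[ i ] = pos[ i ] + vel[ i ] * dt;
    }
}
```

Inside `update`, this would be called once per component, for example `integrate_axis( mtmp.mx, mpos.mx, mvel.mx, dt, maxentities );`.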

Edit 2: It would be nice if someone could share the reasons why auto-vectorization fails on such a seemingly simple example, but after tinkering with it for some time, I opted instead to take the reins myself:

    void physicssystem::update( real dt )
    {
        const __m128d mdt = { dt, dt };

        // advance by 2, since a __m128d holds 2 double-precision values at a time
        // note: _mm_load_pd / _mm_store_pd require 16-byte-aligned addresses
        for ( size_t i = 0; i < maxentities; i += 2 )
        {
            __m128d posx = _mm_load_pd( &mpos.mx[ i ] );
            __m128d posy = _mm_load_pd( &mpos.my[ i ] );
            __m128d posz = _mm_load_pd( &mpos.mz[ i ] );

            __m128d velx = _mm_load_pd( &mvel.mx[ i ] );
            __m128d vely = _mm_load_pd( &mvel.my[ i ] );
            __m128d velz = _mm_load_pd( &mvel.mz[ i ] );

            // displacement for this frame: velocity * dt
            __m128d velframex = _mm_mul_pd( velx, mdt );
            __m128d velframey = _mm_mul_pd( vely, mdt );
            __m128d velframez = _mm_mul_pd( velz, mdt );

            // each component adds its own position (y and z previously reused posx)
            _mm_store_pd( &mpos.mx[ i ], _mm_add_pd( posx, velframex ) );
            _mm_store_pd( &mpos.my[ i ], _mm_add_pd( posy, velframey ) );
            _mm_store_pd( &mpos.mz[ i ], _mm_add_pd( posz, velframez ) );
        }
    }
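Two details of the hand-written loop deserve care. With `maxentities = 65535`, each array occupies 65535 * 8 = 524280 bytes, which is not a multiple of 16, so `my` and `mz` do not start on 16-byte boundaries inside the struct; and a stride of 2 over an odd count touches one element past the end on the last iteration. A possible way around both, sketched below under assumptions, is to use unaligned loads and stores plus a scalar loop for the leftover element. The helper name `advance_axis_sse2` is hypothetical and not from the original post, and the sketch assumes an x86 target with SSE2.

```cpp
#include <cstddef>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper: pos[i] += vel[i] * dt for count elements.
void advance_axis_sse2( double* pos, const double* vel, double dt, std::size_t count )
{
    const __m128d vdt = _mm_set1_pd( dt );  // broadcast dt into both lanes

    std::size_t i = 0;

    // vector part: two doubles per iteration, no alignment requirement
    for ( ; i + 2 <= count; i += 2 )
    {
        const __m128d p = _mm_loadu_pd( pos + i );
        const __m128d v = _mm_loadu_pd( vel + i );
        _mm_storeu_pd( pos + i, _mm_add_pd( p, _mm_mul_pd( v, vdt ) ) );
    }

    // scalar tail: handles the last element when count is odd
    for ( ; i < count; ++i )
    {
        pos[ i ] += vel[ i ] * dt;
    }
}
```

Alternatively, rounding `maxentities` up to an even number (for example 65536) would keep every array 16-byte aligned and make the tail loop unnecessary, at the cost of one unused slot per array.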

I'm not sure whether your compiler supports it, but to enforce proper vectorisation you can do it portably like this:

    void physicssystem::update( double dt )
    {
        // plain pointers to each component array, so they can be named in the pragma
        double *tx=mtmp.mx, *ty=mtmp.my, *tz=mtmp.mz;
        double *px=mpos.mx, *py=mpos.my, *pz=mpos.mz;
        double *vx=mvel.mx, *vy=mvel.my, *vz=mvel.mz;
        #pragma omp simd aligned( tx, ty, tz, px, py, pz, vx, vy, vz )
        for ( int i = 0; i < maxentities; ++i ) {
            tx[ i ] = px[ i ] + vx[ i ] * dt;
            ty[ i ] = py[ i ] + vy[ i ] * dt;
            tz[ i ] = pz[ i ] + vz[ i ] * dt;
        }
    }

You need to enable OpenMP support for the directive to be taken into account.
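As a rough way to see whether the directive can be honored at all, the standard `_OPENMP` macro can be inspected: it is defined only when OpenMP is enabled, and its value is the date of the supported specification (`201307` corresponds to OpenMP 4.0, which introduced `simd`). The sketch below is a heuristic under assumptions, with hypothetical function names; some compilers have a SIMD-only mode that may honor the pragma without defining the macro.

```cpp
// Hypothetical helpers, not from the original post.

// Returns the OpenMP specification date (yyyymm) the compiler claims,
// or 0 when OpenMP support is not enabled for this translation unit.
long openmp_version()
{
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}

// True when the claimed OpenMP level is at least 4.0 (201307),
// the version that added "#pragma omp simd".
bool omp_simd_expected()
{
    return openmp_version() >= 201307L;
}
```

If `omp_simd_expected()` returns false, the loop still compiles and runs correctly, but the pragma is most likely being ignored.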

