C++ - Troubleshooting auto-vectorization reason '1200'
I'm using MSVC 2013 Ultimate with Update 4.
I don't understand why I'm getting this message on a seemingly simple example:
info C5002: loop not vectorized due to reason '1200'
which is documented as:
1200: Loop contains loop-carried data dependences
I don't see how the iterations of this loop could interfere with each other.
#include <cstdint>  // int32_t
#include <cstring>  // memset

__declspec( align( 16 ) ) class physicssystem
{
public:
    static const int32_t maxentities = 65535;

    // Structure-of-arrays layout: one array per component.
    __declspec( align( 16 ) ) struct vectorizedxyz
    {
        double mx[ maxentities ];
        double my[ maxentities ];
        double mz[ maxentities ];

        vectorizedxyz()
        {
            memset( mx, 0, sizeof( mx ) );
            memset( my, 0, sizeof( my ) );
            memset( mz, 0, sizeof( mz ) );
        }
    };

    void update( double dt )
    {
        for ( int32_t i = 0; i < maxentities; ++i ) // <== 1200
        {
            mtmp.mx[ i ] = mpos.mx[ i ] + mvel.mx[ i ] * dt;
            mtmp.my[ i ] = mpos.my[ i ] + mvel.my[ i ] * dt;
            mtmp.mz[ i ] = mpos.mz[ i ] + mvel.mz[ i ] * dt;
        }
    }

private:
    vectorizedxyz mtmp;
    vectorizedxyz mpos;
    vectorizedxyz mvel;
};
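For contrast, here is a minimal sketch of what a genuine loop-carried dependence looks like next to an independent loop. The function names `scale_add` and `prefix_sum` are made up for illustration and are not part of the code in question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Independent iterations: element i is computed only from inputs at index i,
// so iterations can run in any order (or several at once in a SIMD register).
// Assumes a and b hold at least out.size() elements.
void scale_add( std::vector<double>& out, const std::vector<double>& a,
                const std::vector<double>& b, double k )
{
    assert( a.size() >= out.size() && b.size() >= out.size() );
    for ( std::size_t i = 0; i < out.size(); ++i )
        out[ i ] = a[ i ] + b[ i ] * k;
}

// Loop-carried dependence: iteration i reads the value that iteration i - 1
// just wrote, so the iterations cannot simply be executed side by side.
void prefix_sum( std::vector<double>& a )
{
    for ( std::size_t i = 1; i < a.size(); ++i )
        a[ i ] += a[ i - 1 ];
}
```

The `update` loop above has the shape of `scale_add`, not of `prefix_sum`, which is why the diagnostic is surprising.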
Edit: Judging by http://blogs.msdn.com/b/nativeconcurrency/archive/2012/05/08/auto-vectorizer-in-visual-studio-11-rules-for-loop-body.aspx this seems to be a case of "Example 1 – Embarrassingly parallel", yet the compiler acts as if it thinks the arrays are unsafe because of aliasing, which is puzzling to me.
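If aliasing really is what the compiler is worried about, one thing that may be worth trying is to pass the arrays to a free function through restrict-qualified pointers. The sketch below assumes a hypothetical helper named `advance_axis`; `__restrict` is a compiler extension (accepted by MSVC, GCC and Clang), not standard C++, and there is no guarantee that it changes the vectorizer's decision in MSVC 2013.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: the __restrict qualifiers promise the compiler that
// out, pos and vel never overlap, which removes aliasing as a possible reason
// for assuming a dependence between iterations.
void advance_axis( double* __restrict out, const double* __restrict pos,
                   const double* __restrict vel, double dt, std::size_t n )
{
    assert( out != nullptr && pos != nullptr && vel != nullptr );
    for ( std::size_t i = 0; i < n; ++i )
        out[ i ] = pos[ i ] + vel[ i ] * dt;
}
```

Inside `update` this would be called once per component, for example `advance_axis( mtmp.mx, mpos.mx, mvel.mx, dt, maxentities );`.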
Edit 2: It would be nice if someone could share the reasons why auto-vectorization fails on such a seemingly simple example, but after tinkering with it for some time, I opted instead to take the reins myself:
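#include <emmintrin.h>  // SSE2 intrinsics (__m128d, _mm_*_pd)

void physicssystem::update( real dt )
{
    const __m128d mdt = { dt, dt };

    // Advance by 2, since a __m128d holds 2 values at double precision.
    // Caution: maxentities is 65535 (odd), so the last iteration (i = 65534)
    // also touches element 65535, one past the end of each array, and my
    // starts 65535 * 8 bytes into the struct, which is not a multiple of 16,
    // so the aligned loads/stores on my are misaligned. An even maxentities
    // avoids both problems.
    for ( size_t i = 0; i < maxentities; i += 2 )
    {
        __m128d posx = _mm_load_pd( &mpos.mx[ i ] );
        __m128d posy = _mm_load_pd( &mpos.my[ i ] );
        __m128d posz = _mm_load_pd( &mpos.mz[ i ] );

        __m128d velx = _mm_load_pd( &mvel.mx[ i ] );
        __m128d vely = _mm_load_pd( &mvel.my[ i ] );
        __m128d velz = _mm_load_pd( &mvel.mz[ i ] );

        __m128d velframex = _mm_mul_pd( velx, mdt );
        __m128d velframey = _mm_mul_pd( vely, mdt );
        __m128d velframez = _mm_mul_pd( velz, mdt );

        // Each component adds its own position (the y and z stores
        // previously used posx by mistake).
        _mm_store_pd( &mpos.mx[ i ], _mm_add_pd( posx, velframex ) );
        _mm_store_pd( &mpos.my[ i ], _mm_add_pd( posy, velframey ) );
        _mm_store_pd( &mpos.mz[ i ], _mm_add_pd( posz, velframez ) );
    }
}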
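A hand-written SSE2 loop over an odd number of doubles needs a scalar tail, and arrays that are not known to be 16-byte aligned need the unaligned load/store intrinsics. The following is a minimal sketch of that pattern for a single component, assuming a hypothetical helper `integrate_axis` and a made-up guard macro `PHYS_HAVE_SSE2`; it is not the original poster's code.

```cpp
#include <cassert>
#include <cstddef>

// PHYS_HAVE_SSE2 is a hypothetical macro: set when the compiler targets SSE2.
#if defined( __SSE2__ ) || defined( _M_X64 ) || ( defined( _M_IX86_FP ) && _M_IX86_FP >= 2 )
#include <emmintrin.h>
#define PHYS_HAVE_SSE2 1
#endif

// Hypothetical helper: pos[i] += vel[i] * dt for i in [0, n), in place.
void integrate_axis( double* pos, const double* vel, double dt, std::size_t n )
{
    assert( pos != nullptr && vel != nullptr );
    std::size_t i = 0;
#ifdef PHYS_HAVE_SSE2
    const __m128d vdt = _mm_set1_pd( dt );  // { dt, dt }
    // Process full pairs only; loadu/storeu tolerate any alignment.
    for ( ; i + 2 <= n; i += 2 )
    {
        const __m128d p = _mm_loadu_pd( pos + i );
        const __m128d v = _mm_loadu_pd( vel + i );
        _mm_storeu_pd( pos + i, _mm_add_pd( p, _mm_mul_pd( v, vdt ) ) );
    }
#endif
    // Scalar tail: the leftover element when n is odd
    // (or every element when SSE2 is not available).
    for ( ; i < n; ++i )
        pos[ i ] += vel[ i ] * dt;
}
```

If the arrays are guaranteed to be 16-byte aligned and the count is even, the aligned `_mm_load_pd` / `_mm_store_pd` variants can be used instead and the tail loop never runs.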
I'm not sure whether your compiler supports it, but for enforcing proper vectorisation, you can do it portably like this:
void physicssystem::update( double dt )
{
    // Plain pointers to the member arrays, so they can be named in the pragma.
    double *tx = mtmp.mx, *ty = mtmp.my, *tz = mtmp.mz;
    double *px = mpos.mx, *py = mpos.my, *pz = mpos.mz;
    double *vx = mvel.mx, *vy = mvel.my, *vz = mvel.mz;

    #pragma omp simd aligned( tx, ty, tz, px, py, pz, vx, vy, vz )
    for ( int i = 0; i < maxentities; ++i )
    {
        tx[ i ] = px[ i ] + vx[ i ] * dt;
        ty[ i ] = py[ i ] + vy[ i ] * dt;
        tz[ i ] = pz[ i ] + vz[ i ] * dt;
    }
}
You need to enable OpenMP support for the directive to be taken into account.
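As a self-contained illustration of the directive, here is a minimal sketch using a hypothetical free function `advance_simd`. With GCC or Clang the pragma takes effect when compiling with `-fopenmp` or `-fopenmp-simd`; without such a flag it is ignored and the loop remains ordinary, correct scalar code. Note that `#pragma omp simd` was introduced in OpenMP 4.0, while the `/openmp` switch in Visual Studio 2013 implements OpenMP 2.0, so the directive may well have no effect with that compiler.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: out[i] = pos[i] + vel[i] * dt for i in [0, n).
// The simd directive asserts that the iterations are independent, so the
// arrays must not overlap.
void advance_simd( double* out, const double* pos, const double* vel,
                   double dt, int n )
{
    assert( out != nullptr && pos != nullptr && vel != nullptr );
    #pragma omp simd
    for ( int i = 0; i < n; ++i )
        out[ i ] = pos[ i ] + vel[ i ] * dt;
}
```

The `aligned` clause is left out here so that the sketch makes no promise about how the caller allocated the arrays.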