Main Page/DAT

From Nekcem
Latest revision as of 14:43, 13 June 2012

SIZEu file

 ldim: the spatial dimension
 lxi: the polynomial degree
 lx1: the number of grid points in one direction of an element (lx1 = lxi+1)
 ly1=lx1; lz1=lx1
 lelt: the maximum number of elements per core
 lp : the maximum number of cores

Use (E, lelt, lx1, lp) to represent the problem size:

E = total number of elements, lelt = number of elements per core,
lx1 = number of grid points in one direction, lp = number of cores.

There are many different .rea files, e.g., c3d_6 (E=136K), c3d_7 (E=273K), etc.

Even for a fixed number of elements, as with c3d_7 (E=273K), memory usage differs for different numbers of cores (lp=32k, 65k, 131k).
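A rough illustration of one reason per-core memory can change with lp even when E is fixed: each core only has to hold about E/lp elements, so the smallest lelt that fits shrinks as lp grows. This is a minimal sketch, assuming an even element distribution, E=273000 for c3d_7, lx1=16, and that 32k/65k/131k mean 32768/65536/131072 cores; `min_lelt` is a hypothetical helper, not part of NekCEM, and fixed per-core overhead (e.g. communication buffers) is not modeled.

```python
def min_lelt(E: int, lp: int) -> int:
    """Smallest per-core element limit (lelt) that can hold E elements on lp cores,
    assuming elements are spread as evenly as possible (integer ceiling of E/lp)."""
    return (E + lp - 1) // lp

E = 273_000   # c3d_7, assumed to mean exactly 273,000 elements
lx1 = 16      # assumed, as in the cases below
for lp in (32_768, 65_536, 131_072):  # assumed values for 32k, 65k, 131k
    lelt = min_lelt(E, lp)
    print(f"lp={lp:>7}: lelt >= {lelt}, grid points per core <= {lelt * lx1**3}")
```

Under these assumptions lelt must be at least 9, 5 and 3 respectively, so the per-element arrays on each core shrink by roughly 3x between 32k and 131k cores.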

Made major changes to the code to cut memory usage by a factor of 2, so we can go further up from 1.1 billion to 2.2 billion grid-point cases:

 from (E=273k, lx1=16, lp=131k): limit in the past ---> (E=546k, lx1=16, lp=131k)
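The 1.1 billion and 2.2 billion figures match the total grid-point count npt = E*lx1^3 for these two cases (the same formula used for the 32k-core run further down). A quick check, assuming E=273k and E=546k mean 273000 and 546000 elements; `total_grid_points` is a hypothetical helper name.

```python
def total_grid_points(E: int, lx1: int) -> int:
    """npt = E * lx1**3: lx1 points in each of the 3 directions, per element."""
    return E * lx1 ** 3

old = total_grid_points(273_000, 16)  # past limit (lp=131k)
new = total_grid_points(546_000, 16)  # new limit (lp=131k)
print(f"{old:,} -> {new:,} grid points")  # 1,118,208,000 -> 2,236,416,000 grid points
```

Doubling E at fixed lx1 and lp doubles npt, which is why a 2x memory reduction was needed to reach the larger case.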

Memory usage for (E=999k, lx1=16, lp=131k) was 500M, so I couldn't run it on BGP. But it should be OK on XK6, even with lp=262k.


If we assume the number of cells "nc" is approximately the same as the total number of grid points "n", we have the following:

 For the header:
 (1) coordinate => 3 columns * 4 bytes
 (2) cell data  => 9 columns * 4 bytes
 (3) cell type  => 1 column * 4 bytes
 For the 8 fields:
     3 columns * 4 bytes

So, we have 275M*(8 fields*3*4) + 275M*(3+9+1)*4 ≈ 40.7 GB

Or, neglecting the cell type, we get 275M*(8*3*4 + 12*4) ≈ 39.6 GB


The problem size on 32k cores was npt = E*lx1*lx1*lx1 = 546000*16*16*16 (where lx1 = lxi+1), i.e., npt = 2,236,416,000 ≈ 2.2 billion grid points.

The output size with 4 fields will be: 2236416000*(4*3*4)+2236416000*(1+9+3)*4 = 223,641,600,000 (223GB)
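Both output-size estimates above follow one formula: bytes = npt * (nfields*3 + header columns) * 4, with 13 header columns (3 coordinate + 9 cell data + 1 cell type), or 12 if the cell type is neglected. A minimal sketch, assuming nc ≈ n as stated above and 1 GB = 10^9 bytes; `output_bytes` is a hypothetical helper.

```python
def output_bytes(npt: int, nfields: int, with_cell_type: bool = True) -> int:
    """Estimated output size in bytes, assuming one cell per grid point (nc ~ n).

    Header: coordinates (3 cols) + cell data (9 cols) + cell type (1 col, optional);
    each field adds 3 cols; every column entry is 4 bytes.
    """
    header_cols = 3 + 9 + (1 if with_cell_type else 0)
    return npt * (nfields * 3 + header_cols) * 4

npt = 546_000 * 16 ** 3  # E * lx1**3 = 2,236,416,000
print(f"{output_bytes(npt, 4) / 1e9:.1f} GB")                 # 223.6 GB
print(f"{output_bytes(275_000_000, 8) / 1e9:.1f} GB")         # 40.7 GB
print(f"{output_bytes(275_000_000, 8, False) / 1e9:.1f} GB")  # 39.6 GB
```

With 4 fields each grid point costs (4*3 + 13)*4 = 100 bytes, which is why the 2.2-billion-point case comes out at exactly 100x npt.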

Memory required is 420M.


TODO: make sure the thread is joined at the last step; make sure the buffer size is optimized according to the formula given above.
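A minimal sketch of the two TODO items, not NekCEM's actual I/O code: it assumes output is handed to a single background writer thread, and that each core's buffer only needs to hold its own grid points (lelt*lx1^3) times the bytes per point from the formula above. `buffer_bytes` and `writer` are hypothetical names, and appending the buffer length stands in for the real file write. The example uses lelt=5, which is what E=546k on 131072 cores would need under an even distribution.

```python
import queue
import threading

def buffer_bytes(lelt: int, lx1: int, nfields: int, with_cell_type: bool = True) -> int:
    """Per-core buffer size: local grid points times bytes per point (hypothetical)."""
    header_cols = 3 + 9 + (1 if with_cell_type else 0)
    return lelt * lx1 ** 3 * (nfields * 3 + header_cols) * 4

def writer(q: queue.Queue, written: list) -> None:
    """Background I/O thread: drains buffers until it receives the None sentinel."""
    while True:
        buf = q.get()
        if buf is None:
            break
        written.append(len(buf))  # stand-in for the actual file write

q: queue.Queue = queue.Queue()
written: list = []
t = threading.Thread(target=writer, args=(q, written))
t.start()

size = buffer_bytes(lelt=5, lx1=16, nfields=4)  # 5 * 4096 * 100 bytes
for step in range(3):
    q.put(bytes(size))  # one output dump per step
q.put(None)  # signal that this was the last step
t.join()     # make sure the I/O thread has finished before exiting
print(written)
```

Joining after the sentinel guarantees every queued dump has been handled before the program ends; sizing the buffer from lelt rather than a fixed constant keeps it no larger than the local data.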