Difference between revisions of "Main Page/DAT"
Revision as of 16:39, 25 March 2012
SIZEu file
ldim: dimension
lxi: the degree of polynomials
lx1: the number of grid points on the face; ly1=lx1; lz1=lx1
lelt: the maximum number of elements per core
lp: the maximum number of cores
We'll have to use (E, lelt, lx1, lp) to represent the size of the problem, instead of c3d.rea.
E = total number of elements, lelt = number of elements per core, lx1 = grid points in one direction, lp = number of cores.
I had many different rea files: c3d_6 (E=136K), c3d_7 (E=273K), etc.
Even for a fixed number of elements with c3d_7 (E=273K), mem usage is different for different numbers of cores (lp=32k, 65k, 131k). So... sorry, I wouldn't know which case it is if it's just c3d...
By the way, please remember that I have made huge changes in the code so far, for a 2 times reduction in mem usage, to go further up from the 1.1 billion to the 2.2 billion case.
from (E=273k, lx1=16, lp=131k): limit in the past ---> (E=546k, lx1=16, lp=131k)
(E=999k, lx1=16, lp=131k) was 500M, so I couldn't do it on BGP. But it would be OK on XK6, even with lp=262k.
If you still keep the old version of the code, you can compile it and
see what the mem usage was. From the example below, the fourth one (92352484) will
always be the mem usage.
In there, if we assume "nc" is approximately the same as the total number of grid points "n",
we have the following:
For the header: (1) coordinate => 3 columns * 4 bytes (2) cell data => 9 columns * 4 bytes (3) cell type => 1 column * 4 bytes
For each of the 8 fields: 3 columns * 4 bytes
So, we have 275M*(8 fields*3*4) + 275M*(3+9+1)*4 = 40 GB ?
Or, neglecting the cell type, we get 275M*(8*3*4 + 12*4) = 39 GB ?
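The estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic in these notes, not code from the solver: the function name `output_bytes` and its parameters are made up for illustration, and it assumes every column is stored as a 4-byte value, as stated above.

```python
BYTES_PER_VALUE = 4  # assumption taken from the notes: every column is 4 bytes


def output_bytes(n_points, n_fields, include_cell_type=True):
    """Estimate output size in bytes for n_points grid points.

    Header columns: 3 (coordinate) + 9 (cell data) + 1 (cell type, optional).
    Each field contributes 3 columns.
    """
    header_cols = 3 + 9 + (1 if include_cell_type else 0)
    field_cols = 3 * n_fields
    return n_points * (header_cols + field_cols) * BYTES_PER_VALUE


n = 275_000_000  # "275M" grid points
print(output_bytes(n, 8))                           # 40700000000 bytes, about 40 GB
print(output_bytes(n, 8, include_cell_type=False))  # 39600000000 bytes, about 39 GB
```

So both figures quoted above come out right to within rounding: about 40.7 GB with the cell type and about 39.6 GB without it.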
The problem size on 32k cores was npt = E*lx1*lx1*lx1 = 546000*16*16*16 (where lx1=lxi+1), i.e., npt = 2,236,416,000, about 2.2 billion grid points.
The output size with 4 fields will be: 2236416000*(4*3*4) + 2236416000*(1+9+3)*4 = 223,641,600,000 bytes (about 223 GB)
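As a cross-check of the two numbers above, here is a small Python sketch. The helper name `grid_points` is hypothetical, and the `ldim=3` default is an assumption that just reproduces the E*lx1*lx1*lx1 formula for the 3D case.

```python
def grid_points(E, lx1, ldim=3):
    """Total number of grid points: E elements with lx1 points per direction."""
    return E * lx1 ** ldim


E = 546_000   # total number of elements
lx1 = 16      # grid points in one direction (lx1 = lxi + 1)
n_fields = 4

npt = grid_points(E, lx1)

# Columns per grid point: 3 per field, plus 1 + 9 + 3 header columns; 4 bytes each.
bytes_per_point = (n_fields * 3 + (1 + 9 + 3)) * 4
total_bytes = npt * bytes_per_point

print(npt)          # 2236416000
print(total_bytes)  # 223641600000
```

With 4 fields each grid point costs exactly 100 bytes, which is why the output size is just npt with two extra zeros.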
Memory required is 420M (I wouldn't know how to get a breakdown of memory for computation and I/O). But let me know if there is any way we can get the info; we can work on it.
TODO:
make sure the thread is joined at the last step;
make sure the buffer size is optimized according to the formula given above;