Main Page/DAT
Latest revision as of 14:43, 13 June 2012
SIZEu file
ldim: dimension
lxi: the polynomial degree
lx1: the number of grid points in one direction (lx1=lxi+1); ly1=lx1; lz1=lx1
lelt: the maximum number of elements per core
lp : the maximum number of cores
Use (E, lelt, lx1, lp) to represent the problem size:
E = total number of elements, lelt = number of elements per core, lx1 = grid points in one direction, lp = number of cores.
There are many different rea files, e.g. c3d_6 (E=136K), c3d_7 (E=273K), etc.
Even for a fixed number of elements, as with c3d_7 (E=273K), memory usage differs for different numbers of cores (lp=32k, 65k, 131k).
Made a major change in the code that reduces memory usage by a factor of 2, so that runs can go from the 1.1 billion to the 2.2 billion grid-point case:
from (E=273k, lx1=16, lp=131k), the limit in the past ---> (E=546k, lx1=16, lp=131k)
For (E=999k, lx1=16, lp=131k) the memory required was 500M, so I couldn't run it on BGP. But it should be OK on XK6, even with lp=262k.
There, if we assume "nc" is approximately the same as the total number of grid points "n", we have the following:
For the header:
(1) coordinate => 3 columns * 4 bytes
(2) cell data => 9 columns * 4 bytes
(3) cell type => 1 column * 4 bytes
For each of the 8 fields:
3 columns * 4 bytes
So, we have 275M*(8 fields*3*4) + 275M*(3+9+1)*4 = 40.7 GB, i.e. about 40 GB.
Or, neglecting the cell type, we get 275M*(8*3*4 + 12*4) = 39.6 GB, i.e. about 39 GB.
The problem size on 32k cores was npt = E*lx1*lx1*lx1 = 546000*16*16*16 (where lx1=lxi+1),
i.e., npt = 2,236,416,000 = 2.2 billion grid points.
The output size with 4 fields will be:
2236416000*(4*3*4) + 2236416000*(1+9+3)*4 = 223,641,600,000 bytes (223 GB)
Memory required is 420M.
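The size estimates above can be checked with a short script. This is only a sketch of the arithmetic on this page: the function and constant names (grid_points, output_bytes, HEADER_COLUMNS, and so on) are made up for illustration and are not taken from the code, and it assumes nc is about equal to n, as stated above.

```python
# Sketch of the output-size arithmetic from this page.
# All names here are illustrative and not part of the actual code.

BYTES_PER_VALUE = 4          # every column is stored as a 4-byte value
HEADER_COLUMNS = 3 + 9 + 1   # coordinate + cell data + cell type
FIELD_COLUMNS = 3            # each field contributes 3 columns


def grid_points(E, lx1):
    """Total number of grid points: npt = E * lx1^3."""
    return E * lx1 ** 3


def output_bytes(n, nfields, include_cell_type=True):
    """Estimated output size in bytes for n grid points (assuming nc ~ n)."""
    header_cols = HEADER_COLUMNS if include_cell_type else HEADER_COLUMNS - 1
    return n * (nfields * FIELD_COLUMNS + header_cols) * BYTES_PER_VALUE


npt = grid_points(546_000, 16)
print(npt)                                   # 2236416000
print(output_bytes(npt, 4))                  # 223641600000
print(output_bytes(275_000_000, 8))          # 40700000000
print(output_bytes(275_000_000, 8, False))   # 39600000000
```

With 4 fields the estimate comes to exactly 100 bytes per grid point, which is why the 2.2 billion point case gives about 223 GB.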
TODO:
make sure the thread is joined at the last step;
make sure the buffer size is optimized according to the formula given above;
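For the first TODO item, the pattern might look like the sketch below. This page does not say what the thread does, so the sketch assumes a single background writer thread fed through a queue. The names writer and run, and the use of None as a stop marker, are hypothetical and not from the actual code.

```python
import queue
import threading

# Sketch only: assumes one background writer thread fed through a queue.
# The names writer/run and the None stop marker are hypothetical.


def writer(q, written):
    """Take items off the queue until the stop marker (None) arrives."""
    while True:
        item = q.get()
        if item is None:
            break
        written.append(item)   # stand-in for the real output call


def run(nsteps):
    q = queue.Queue()
    written = []
    t = threading.Thread(target=writer, args=(q, written))
    t.start()
    for step in range(nsteps):
        q.put(step)            # hand each step's data to the writer
    q.put(None)                # after the last step, ask the writer to stop
    t.join()                   # wait until everything queued has been handled
    assert not t.is_alive()
    return written


print(run(5))
```

Joining only after the stop marker is queued means the main loop never exits while output is still pending.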