Difference between revisions of "Main Page/DAT"

Latest revision as of 15:43, 13 June 2012

SIZEu file

 ldim: dimension

 lxi: the degree of polynomials
 lx1: the number of grid points in one direction (lx1=lxi+1)
 ly1=lx1; lz1=lx1

 lelt: the maximum number of elements per core

 lp : the maximum number of cores

Use (E, lelt, lx1, lp) to represent the size of a problem:

E = total number of elements, lelt = number of elements per core,
lx1 = grid points in one direction, lp = number of cores.
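As a rough check on these parameters, lelt has to be at least the number of elements that land on the busiest core. The sketch below assumes the elements are spread as evenly as possible across the cores and that "131k" means 131072 cores; the function name `min_lelt` is hypothetical and not part of the code base.

```python
def min_lelt(total_elements: int, cores: int) -> int:
    """Smallest lelt that can hold E elements on lp cores.

    Assumes an even distribution, so the busiest core holds
    ceil(E / lp) elements.
    """
    if total_elements < 0 or cores <= 0:
        raise ValueError("need total_elements >= 0 and cores > 0")
    # Integer ceiling division, avoids floating-point rounding.
    return -(-total_elements // cores)


if __name__ == "__main__":
    # E=546k on lp=131072 cores (assumed meaning of "131k").
    print(min_lelt(546_000, 131_072))
```

If the actual partitioning is uneven, lelt would need to be larger than this lower bound.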

There are many different .rea files, such as c3d_6 (E=136K), c3d_7 (E=273K), etc.

Even for a fixed number of elements, as with c3d_7 (E=273K), memory usage differs with the number of cores (lp=32k, 65k, 131k).

The code was changed substantially to cut memory usage by a factor of 2, which raised the limit from 1.1 billion to 2.2 billion grid points:

 from (E=273k, lx1=16, lp=131k), the limit in the past  ---> (E=546k, lx1=16, lp=131k)

The (E=999k, lx1=16, lp=131k) case needed 500M of memory, so it could not be run on BGP. It should be OK on XK6, even with lp=262k.

If we assume "nc" is approximately the same as the total number of grid points "n", we have the following:

 For the header:
 (1) coordinate => 3 columns * 4 bytes
 (2) cell data  => 9 columns * 4 bytes
 (3) cell type  => 1 column * 4 bytes

 For each of the 8 fields:
     3 columns * 4 bytes

So, we have 275M*(8 fields*3*4) + 275M*(3+9+1)*4 = 40 GB

Or, neglecting the cell type, we get 275M*(8*3*4+12*4) = 39 GB
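The estimate above can be written as a small function. This is only a sketch of the arithmetic in the text: the name `output_bytes` and its parameters are hypothetical, and it assumes every value is stored in 4 bytes and that each field has 3 columns, as stated above.

```python
BYTES_PER_VALUE = 4       # every column is stored as a 4-byte value
COORD_COLS = 3            # (1) coordinate
CELL_DATA_COLS = 9        # (2) cell data
CELL_TYPE_COLS = 1        # (3) cell type
COLS_PER_FIELD = 3        # each field contributes 3 columns


def output_bytes(n_points: int, n_fields: int,
                 include_cell_type: bool = True) -> int:
    """Estimated output size in bytes, assuming nc is about n."""
    header_cols = COORD_COLS + CELL_DATA_COLS
    if include_cell_type:
        header_cols += CELL_TYPE_COLS
    field_cols = n_fields * COLS_PER_FIELD
    return n_points * (field_cols + header_cols) * BYTES_PER_VALUE


if __name__ == "__main__":
    n = 275_000_000
    print(output_bytes(n, 8) / 1e9, "GB with cell type")
    print(output_bytes(n, 8, include_cell_type=False) / 1e9,
          "GB without cell type")
```

With n = 275M and 8 fields this gives 40.7 GB with the cell type and 39.6 GB without it, which the text rounds to 40 GB and 39 GB.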

The problem size on 32k cores was npt = E*lx1*lx1*lx1 = 546000*16*16*16 (where lx1=lxi+1), i.e., npt = 2,236,416,000, or about 2.2 billion grid points.

The output size with 4 fields will be: 2236416000*(4*3*4)+2236416000*(1+9+3)*4 = 223,641,600,000 (223GB)
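The two numbers above follow directly from E and lx1. The sketch below reproduces them; `grid_points` and `output_bytes_per_point` are hypothetical helper names, and the per-point byte count uses the same header layout (3+9+1 columns of 4 bytes) and 3 columns per field as the estimate above.

```python
def grid_points(total_elements: int, lx1: int) -> int:
    """npt = E * lx1^3, with lx1 grid points in each direction."""
    return total_elements * lx1 ** 3


def output_bytes_per_point(n_fields: int) -> int:
    """Bytes written per grid point: fields plus header columns."""
    field_bytes = n_fields * 3 * 4        # 3 columns * 4 bytes per field
    header_bytes = (1 + 9 + 3) * 4        # cell type + cell data + coordinate
    return field_bytes + header_bytes


if __name__ == "__main__":
    npt = grid_points(546_000, 16)
    total = npt * output_bytes_per_point(4)
    print(f"npt   = {npt:,}")
    print(f"bytes = {total:,}")
```

With 4 fields each grid point costs exactly 100 bytes, so the output size in bytes is simply 100 times npt.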

Memory required is 420M.

TODO: make sure the thread is joined at the last step; make sure the buffer size is optimized according to the formula given above.

@@ Line 11: / Line 11: @@
    lp : the maximum number of cores
+use (E,lelt,lx1,lp), to represent size of prob
-We'll have to use (E,lelt,lx1,lp), to represent size of prob, instead of c3d.rea.
   E=total element numbers, lelt=element # per core,
   lx1= grid points in one direction, lp= # of cores.
-I had many different rea with c3d_6 (E=136K), c3d_7(E=273K), etc..
+There are many different rea with c3d_6 (E=136K), c3d_7(E=273K), etc..
 Even for a fixed num of element with c3d_7 (E=273K), men usage is different for
-different # of cores (lp=32k, 65k, 131k).  So... sorry I wouldn't know which
+different # of cores (lp=32k, 65k, 131k).
-case if it's just c3d...
-By the way, please remember I made huge change in the code so far for 2 times
+ made huge change in the code for 2 times
 reduction in mem usage to go further up from 1.1 billion to 2.2 billion cases.
- from (E=273, lx1=16, lp= 131k): limit in the past  ---> (E=546k, lx1=16, lp=131k)
+  from (E=273, lx1=16, lp= 131k): limit in the past  ---> (E=546k, lx1=16, lp=131k)
 (E=999k, lx1=16, lp=131k) was 500M. So I couldn't do on BGP. But is be ok on XK6,
@@ Line 32: / Line 30: @@
-If you still keep the old version old version of the code: you can compile and
+In there, if we assume "nc" is approximately same as the total grids "n".
-see what men usage was. From example below, always the fourth one (92352484) will
+we have the following:
-be the mem usage.
+  For the header:
+  (1) coordinate => 3 columns * 4 bytes
+  (2) cell data  => 9 columns * 4 bytes
+  (3) cell type  => 1 columns * 4 bytes
+  For the 8 fields:
+     3 columns * 4 bytes
+So, we have 275M*(8 fields *3*4)+ 275M*(3+9+1)*4 = 40 GB
+Or, neglecting the cel type, we get 275M(8*3*4+12*4)=39GB
+The problem size on 32k cores was  npt= 546000*16*16*16 = E*lx1*lx1*lx1
+(where lx1=lxi+1). i.e., npt=2,236,416,000 = 2.2 billion grids
+The output size with 4 fields will be:
+2236416000*(4*3*4)+2236416000*(1+9+3)*4 = 223,641,600,000 (223GB)
+Memory required is 420M .
-======
-jl_sparse_cholesky.o jl_poly.o jl_tensor.o jl_findpt.o jl_pfindpt.o comm_mpi2.o rbIO_nekcem.o vtkbin.o coIO_nekcem.o coIO_nekcem_read.o io_util.o mpiio_util.o io_driver.o   -llapack -lblas
-  text    data     bss     dec     hex filename
-4173564  266664 87912256        92352484        5812fe4 nekcem
-I am done
-======
-Let me know if not clear on this --
+TODO:
+make sure thread joined at last step;
+make sure buffer size is optimized according to the formula given above;