vba - Estimating duplication percentage between rows in Excel -
i have excel (2010) data file on 200 variables (columns), , on 1,000 records (rows), each identified unique id number. however, i'm suspicious of these records fabricated, i.e., took existing record, replicated it, , changed few numbers make little different. therefore, need produce matrix show me number/percent of "same values" between each record , other records (e.g., record 1 , record 2 share 75 equal values, record 1 , record 3 share 57 equal values, record 2 , record 3 share 45 equal values, etc.). have few workarounds, take hours , don't produce simple matrix. don't care difference between values - whether equal or not. ideas appreciated!
don't know how perform on huge dataset but:
sub t() dim d, m(), nr long, nc long, r long, r2 long, c long dim v1, v2, long d = sheet1.range("a1").currentregion.value nr = ubound(d, 1) nc = ubound(d, 2) redim m(1 nr, 1 nr) r = 1 nr r2 = r nr = 0 c = 1 nc v1 = d(r, c): if iserror(v1) v1 = "error!" v2 = d(r2, c): if iserror(v2) v2 = "error!" if v1 = v2 = + 1 next c m(r2, r) = next r2 next r sheet2 .range("b2").resize(nr, nr).value = m 'assuming id's in first column... r = 1 nr .cells(1 + r, 1) = d(r, 1) .cells(r, r + 1) = d(r, 1) next r end end sub
Comments
Post a Comment