python - Assign unique id to columns in a pandas DataFrame
Hello, I have the following DataFrame:

    df =
        a       b
     john     tom
    homer    bart
      tom  maggie
     lisa    john
I want to assign each name a unique id, so that it returns:

    df =
        a       b  c  d
     john     tom  0  1
    homer    bart  2  3
      tom  maggie  1  4
     lisa    john  5  0
What I have done is the following:

    ll1 = pd.concat([df.a, df.b], ignore_index=True)
    ll1 = pd.DataFrame(ll1)
    ll1.columns = ['a']
    nameun = pd.unique(ll1.a.ravel())
    llout['c'] = 0
    llout['d'] = 0
    nn = list(nameun)
    for i in range(1, len(llout)):
        llout.c[i] = nn.index(llout.a[i])
        llout.d[i] = nn.index(llout.b[i])
But since I have a large dataset, this process is very slow.
Here's one way. First get the array of unique names:

    In [11]: df.values.ravel()
    Out[11]: array(['john', 'tom', 'homer', 'bart', 'tom', 'maggie', 'lisa', 'john'], dtype=object)

    In [12]: pd.unique(df.values.ravel())
    Out[12]: array(['john', 'tom', 'homer', 'bart', 'maggie', 'lisa'], dtype=object)
And make this a Series, mapping names to their respective numbers:

    In [13]: names = pd.unique(df.values.ravel())

    In [14]: names = pd.Series(np.arange(len(names)), names)

    In [15]: names
    Out[15]:
    john      0
    tom       1
    homer     2
    bart      3
    maggie    4
    lisa      5
    dtype: int64
Now use applymap and names.get to look up these numbers:

    In [16]: df.applymap(names.get)
    Out[16]:
       a  b
    0  0  1
    1  2  3
    2  1  4
    3  5  0
And assign it to the correct columns:

    In [17]: df[["c", "d"]] = df.applymap(names.get)

    In [18]: df
    Out[18]:
           a       b  c  d
    0   john     tom  0  1
    1  homer    bart  2  3
    2    tom  maggie  1  4
    3   lisa    john  5  0
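For readers who want to run the steps above outside an interactive session, here is a minimal sketch of the same approach as a script. The variable names (`unique_names`, `name_ids`, `lookup`) are chosen here for illustration and are not from the answer. It also assumes that on pandas 2.1 and later `DataFrame.map` is the preferred spelling of `applymap`, and falls back to `applymap` on older versions.

```python
import numpy as np
import pandas as pd

# The example frame from the question.
df = pd.DataFrame({
    "a": ["john", "homer", "tom", "lisa"],
    "b": ["tom", "bart", "maggie", "john"],
})

# Unique names, in row-major order of first appearance.
unique_names = pd.unique(df[["a", "b"]].to_numpy().ravel())

# Series indexed by name whose values are the integer ids.
name_ids = pd.Series(np.arange(len(unique_names)), index=unique_names)

# Element-wise lookup: DataFrame.map where available, applymap otherwise.
name_cols = df[["a", "b"]]
lookup = name_cols.map if hasattr(name_cols, "map") else name_cols.applymap
ids = lookup(name_ids.get)

# Assign each id column explicitly.
df["c"] = ids["a"]
df["d"] = ids["b"]

print(df)
```

The lookup is still applied cell by cell, but each lookup goes through the Series index instead of a linear `list.index` scan, which is where the original loop spends most of its time.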
Note: this assumes all the values are names to begin with; you may want to restrict this to some columns only:

    df[['a', 'b']].values.ravel()
    ...
    df[['a', 'b']].applymap(names.get)
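As an alternative that is not part of the answer above, `pd.factorize` can produce the ids in one vectorized call when restricted to the name columns. This is only a sketch: `name_cols`, `codes`, `uniques` and `result` are illustrative names, and it assumes the ids should follow order of first appearance when reading the frame row by row, which matches the output above.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["john", "homer", "tom", "lisa"],
    "b": ["tom", "bart", "maggie", "john"],
})

# Only these columns are assumed to hold names.
name_cols = ["a", "b"]

# factorize returns one integer code per cell (row-major after ravel)
# plus the array of unique values in order of first appearance.
codes, uniques = pd.factorize(df[name_cols].to_numpy().ravel())

# Reshape the flat codes back to one row per original row.
ids = pd.DataFrame(
    codes.reshape(len(df), len(name_cols)),
    index=df.index,
    columns=["c", "d"],
)

result = df.join(ids)
print(result)
```

Because no Python-level function is called per cell, this may scale better on a large dataset, though that is worth measuring on the real data.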