Python + Pandas : Update ONE column in csv based on another csv -


i have 2 similar tables ("hist.csv") :

historical : id | url | url2 | url3 | time 1         b      c      5 2    d      e      f      8 

and ("new.csv") :

new : id | url | url2 | url3 | time 1         z      k      9 2    g      h           11 

i want update new.time column historical.time value, if "url" columns match. i.e. desired output here url "a" updated :

new2 : id | url | url2 | url3 | time 1         z      k      5 2    g      h           11 

i tried following :

historical = pd.dataframe.from_csv("hist.csv", index_col='id', sep='\t', encoding='utf-8') new = pd.dataframe.from_csv("new.csv", index_col='id', sep='\t', encoding='utf-8')  index, row in new.iterrows():     new.loc[index,'time']=historical.loc[historical['url'] == row['url'],'time']  new.to_csv("new2.csv", sep='\t', encoding='utf-8') 

raising :

 valueerror: must have equal len keys , value when setting iterable 

ps : found thread : updating dataframe based on dataframe looks proposed solution "merge" not fit needs, have many columns ?

the basic issue historical.loc[historical['url'] == row['url'],'time'] returns series (even if there 1 row or no row condition - historical['url'] == row['url'] -matches). example -

in [15]: df out[15]:     b 0  1  2 1  2  3  in [16]: df.loc[df['a']==1,'b'] out[16]: 0    2 name: b, dtype: int64 

and try set dataframe single cell of new dataframe, causing issue.

since in comments -

i may have several rows "url" in historical, have same time value. in case, should consider first occurence/match.

a quick fix code check whether row['url'] exists in other dataframe, , if true, value using -

for index, row in new.iterrows():     if row['url'] in historical['url'].values:         row['time']=historical.loc[historical['url'] == row['url'],'time'].values[0] 

Comments

Popular posts from this blog

How to show in django cms breadcrumbs full path? -

php - Invalid Cofiguration - yii\base\InvalidConfigException - Yii2 -

ruby on rails - npm error: tunneling socket could not be established, cause=connect ETIMEDOUT -