Python + Pandas: Update ONE column in a CSV based on another CSV
I have 2 similar tables. The first ("hist.csv"):

historical:

    id | url | url2 | url3 | time
    1  | a   | b    | c    | 5
    2  | d   | e    | f    | 8
and the second ("new.csv"):

new:

    id | url | url2 | url3 | time
    1  | a   | z    | k    | 9
    2  | g   | h    |      | 11
I want to update the new.time column with the historical.time value whenever the "url" columns match. I.e. this is the desired output, with the row for url "a" updated:

new2:

    id | url | url2 | url3 | time
    1  | a   | z    | k    | 5
    2  | g   | h    |      | 11
I tried the following:

    import pandas as pd

    historical = pd.DataFrame.from_csv("hist.csv", index_col='id', sep='\t', encoding='utf-8')
    new = pd.DataFrame.from_csv("new.csv", index_col='id', sep='\t', encoding='utf-8')
    for index, row in new.iterrows():
        new.loc[index, 'time'] = historical.loc[historical['url'] == row['url'], 'time']
    new.to_csv("new2.csv", sep='\t', encoding='utf-8')
which raises:

    ValueError: Must have equal len keys and value when setting with an iterable
PS: I found this thread: Updating a DataFrame based on another DataFrame. The proposed "merge" solution doesn't fit my needs, as I have many columns.
The basic issue is that

    historical.loc[historical['url'] == row['url'], 'time']

returns a Series (even if there is only 1 row, or no row, matching the condition historical['url'] == row['url']). Example:

    In [15]: df
    Out[15]:
       a  b
    0  1  2
    1  2  3

    In [16]: df.loc[df['a']==1,'b']
    Out[16]:
    0    2
    Name: b, dtype: int64

And you are trying to set that Series into a single cell of the new DataFrame, which causes the issue.
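To see this concretely, here is a minimal self-contained sketch (the toy frame below is a hypothetical stand-in for historical, not data from the question): a boolean-mask .loc selection always yields a Series, which is why a scalar has to be extracted explicitly.

    import pandas as pd

    # Hypothetical stand-in for the historical frame
    historical = pd.DataFrame({'url': ['a', 'd'], 'time': [5, 8]})

    # A boolean-mask selection returns a Series, even with exactly one match...
    one_match = historical.loc[historical['url'] == 'a', 'time']
    print(type(one_match).__name__)   # Series, not a scalar
    print(one_match.values[0])        # 5 - extracting the scalar by hand

    # ...and an empty Series when nothing matches
    no_match = historical.loc[historical['url'] == 'zzz', 'time']
    print(len(no_match))              # 0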
Since, as clarified in the comments:

    I may have several rows with the same "url" in historical, all with the same time value. In that case, we should consider the first occurrence/match.
a quick fix would be to check whether row['url'] exists in the other DataFrame and, if true, set the value taking the first match with .values[0]:

    for index, row in new.iterrows():
        if row['url'] in historical['url'].values:
            # write back through .loc - assigning to the row yielded by
            # iterrows() would not modify the DataFrame itself
            new.loc[index, 'time'] = historical.loc[historical['url'] == row['url'], 'time'].values[0]
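As an aside (not part of the original answer), the loop can also be replaced with a vectorized lookup via Series.map: drop_duplicates('url') implements the "first occurrence" rule, and fillna keeps the old time where there is no match. A sketch, with small hypothetical frames standing in for the two CSVs:

    import pandas as pd

    # Hypothetical stand-ins for hist.csv / new.csv
    historical = pd.DataFrame({'url': ['a', 'd', 'a'], 'time': [5, 8, 5]})
    new = pd.DataFrame({'url': ['a', 'g'], 'time': [9, 11]})

    # First occurrence per url wins; url -> time becomes a lookup Series
    lookup = historical.drop_duplicates('url').set_index('url')['time']

    # Unmatched urls map to NaN; fillna restores their original time
    new['time'] = new['url'].map(lookup).fillna(new['time'])
    print(new)

Note that fillna may leave the column as float64 because of the intermediate NaN; cast back with astype(int) if an integer column is required.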