Pandas/Python ruining JSON data in DataFrames -
i'm interacting api , getting json data back. @ top level of json object have 'regular' data fields have more advanced structures:
{ "foo": 1, "bar": "string", "spam": { "egg":"green", "ham":"yum", "ran": { "out_of":"fake_words" } } }
i need preserve these advanced structures ("spam") valid json
i'm getting data using requests , loading pandas dataframe so:
api_result = api.get_data().json() #the json output of request object df = read_json(json.dumps(api_result))
this gives me nice dataframe 3 columns, expect (this want). problem comes strings:
foo bar spam 1 'string' {'egg':'green','ham':'yum','ran':{'out_of':'fake_words'}
pandas or python has changed strings in data single quotes (') instead of valid json double quotes ("). behaviour ruins downstream processing expecting valid json objects since quotes single quotes.
edit--> program writes out csv ingested database table expecting valid json in many of fields. table used many other processes further analysis , data preparation. <--edit
is there way tell pandas/python stop changing strings double single quotes? know general concession single quotes more pythonic they're ruining me.
thanks!
if want generate valid json in python, best route built-in json
package. can use dumps
function create valid json string python dictionary:
>>> import json >>> data = {'egg':'green','ham':'yum','ran':{'out_of':'fake_words'}} >>> json.dumps(data) '{"ham": "yum", "ran": {"out_of": "fake_words"}, "egg": "green"}'
edited answer based on edited question:
the problem when read json pandas dataframe, converts everything python objects. in case, json strings being converted python dicts, , when print results shows python's string representation of dicts. representation json, not json.
what need convert dicts in dataframe valid json strings. conversion in "spam"
column, use apply()
method call, e.g.
data['spam'] = data['spam'].apply(json.dumps)
now column contains json strings rather python dicts.
Comments
Post a Comment