xml - Python Scrapy XPath?
For a non-profit college assignment, I'm trying to scrape data from the website www.rateyourmusic.com using the Scrapy framework in Python. I've had a small amount of success: I can scrape the artist's name from an artist page with XPath, but the other info (birth date, nationality) is proving difficult for me to scrape. Does anyone know what the correct XPath for these fields would be? Here is my parsing method, which has at least worked for the artist name:
def parse_dir_contents(self, response):
    item = rateyourmusicartist()
    for sel in response.xpath('//div/div/div/div/table/tbody/tr/td'):
        item['dateofbirth'] = sel.xpath('td/text()').extract()  # these 2 selectors aren't working
        item['nationality'] = sel.xpath('td/a/text()').extract()
    for sel in response.xpath('//div/div/div/div/div/h1'):
        item['name'] = sel.xpath('text()').extract()  # this one works
    yield item
Here is a sample URL of an artist page I'm scraping: http://rateyourmusic.com/artist/kanye_west
Here is the real snippet of HTML on that page (you can see it if you open the page source):
<table class="artist_info">
  <tr><td><div class="info_hdr">born</div> june 8, 1977, <a class="location" href="/location/atlanta/ga/united states">atlanta, ga, united states</a></td></tr>
  <tr><td><div class="info_hdr">currently</div><a class="location" href="/location/hidden hills/ca/united states">hidden hills, ca, united states</a></td></tr>
</table>
To get the birth date, run this XPath (the content of the first row in the table):
//table[@class='artist_info']/tr[1]/td/text()
Result:
'june 8, 1977,'
To get the "currently" location, run this XPath (the content of the second row in the table):
//table[@class='artist_info']/tr[2]/td/a/text()
Result:
'hidden hills, ca, united states'