xml - Python Scrapy XPath?

- July 15, 2012

For a non-profit college assignment, I'm trying to scrape data from the website www.rateyourmusic.com using the Scrapy framework in Python. I have had a small amount of success: I have been able to scrape the name of the artist from the artist page with XPath, but other info (birth date, nationality) is proving difficult for me to scrape. Does anyone know what the correct XPath for these objects would be? Here is my parsing method, which has at least worked for the artist name.

def parse_dir_contents(self, response):
    item = rateyourmusicartist()

    for sel in response.xpath('//div/div/div/div/table/tbody/tr/td'):
        # These two selectors aren't working
        item['dateofbirth'] = sel.xpath('td/text()').extract()
        item['nationality'] = sel.xpath('td/a/text()').extract()

    for sel in response.xpath('//div/div/div/div/div/h1'):
        # This one works
        item['name'] = sel.xpath('text()').extract()

    yield item

Here is a sample URL of an artist page I'm scraping: http://rateyourmusic.com/artist/kanye_west

Here is the real snippet of HTML on the page (you can see it if you open the page source).

<table class="artist_info">
  <tr><td><div class="info_hdr">born</div> june 8, 1977, <a class="location" href="/location/atlanta/ga/united states">atlanta, ga, united states</a></td></tr>
  <tr><td><div class="info_hdr">currently</div><a class="location" href="/location/hidden hills/ca/united states">hidden hills, ca, united states</a></td></tr>
</table>

To get the birth date, run this XPath (the text content of the first row in the table):

//table[@class='artist_info']/tr[1]/td/text()

Result:

'june 8, 1977,'

To get the current location, run this XPath (the link text in the second row of the table):

//table[@class='artist_info']/tr[2]/td/a/text()

Result:

'hidden hills, ca, united states'
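Note that the page source has no tbody element, so a path containing /tbody/ matches nothing, and a relative td/text() evaluated on a td looks for a td nested inside it. As a rough offline check of the two rows, here is a minimal sketch that uses the standard library's xml.etree.ElementTree instead of Scrapy's selectors (a swapped-in parser, chosen only so the example runs without Scrapy or a network request). ElementTree supports only a subset of XPath and has no text() step, so the text that follows the div is read from the div's .tail. The function name artist_info and the dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET

# HTML snippet copied from the page source quoted above. It happens to be
# well-formed XML, which is the only reason ElementTree can parse it.
HTML = """<table class="artist_info">
<tr><td><div class="info_hdr">born</div> june 8, 1977, <a class="location" href="/location/atlanta/ga/united states">atlanta, ga, united states</a></td></tr>
<tr><td><div class="info_hdr">currently</div><a class="location" href="/location/hidden hills/ca/united states">hidden hills, ca, united states</a></td></tr>
</table>"""


def artist_info(html):
    """Hypothetical helper: pull birth date and locations from the table."""
    root = ET.fromstring(html)
    # Locate the table by its class attribute, mirroring
    # //table[@class='artist_info'] (iter() also visits the root itself).
    table = next(t for t in root.iter("table")
                 if t.get("class") == "artist_info")

    born_cell = table.find("tr[1]/td")     # first row, as in tr[1]/td
    current_cell = table.find("tr[2]/td")  # second row, as in tr[2]/td

    # No text() step in ElementTree: the date is the text node that follows
    # the <div>, which ElementTree stores as the div's tail.
    date_text = (born_cell.find("div").tail or "").strip().rstrip(",")

    return {
        "dateofbirth": date_text,
        "birthplace": born_cell.find("a").text,
        "currently": current_cell.find("a").text,
    }


if __name__ == "__main__":
    print(artist_info(HTML))
```

Inside a Scrapy callback the two XPath expressions above can be passed to response.xpath(...) directly; stripping the surrounding whitespace and the trailing comma from the date is still needed there.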
