xml - Python Scrapy Xpath? -

- July 15, 2012

for non-profit college assignment i'm trying scrape data website www.rateyourmusic.com using scrapy framework in python, have had small amount of success have been able scrape name of artist artist page xpath other info (birth date, nationality) proving difficult me scrape. of know correct xpath these objects be? here parsing method has @ least worked artist name.

def parse_dir_contents(self, response):     item = rateyourmusicartist()      sel in response.xpath('//div/div/div/div/table/tbody/tr/td'):           item['dateofbirth'] = sel.xpath('td/text()').extract() #these 2 selectors aren't working         item['nationality'] = sel.xpath('td/a/text()').extract()      sel in response.xpath('//div/div/div/div/div/h1'):          item['name'] = sel.xpath('text()').extract() #this 1 works      yield item

here sample url of artist page i'm scraping http://rateyourmusic.com/artist/kanye_west

here real snippet of html have on page (you can see if open page source).

<table class="artist_info"> <tr><td><div class="info_hdr">born</div> june 8, 1977, <a class="location" href="/location/atlanta/ga/united states">atlanta, ga, united states</a></td></tr> <tr><td><div class="info_hdr">currently</div><a class="location" href="/location/hidden hills/ca/united states">hidden hills, ca, united states</a></td></tr> </table>

in order birthday run suhc xpage (content of first row in table)

//table[@class='artist_info']/tr[1]/td/text()

result

'june 8, 1977,'

in order currently run suhc xpage (content of 2-nd row in table)

//table[@class='artist_info']/tr[2]/td/a/text()

result

'hidden hills, ca, united states'

Search This Blog

Maxid

xml - Python Scrapy Xpath? -

Comments

Post a Comment

Popular posts from this blog

How to show in django cms breadcrumbs full path? -

php - Invalid Cofiguration - yii\base\InvalidConfigException - Yii2 -

ruby on rails - npm error: tunneling socket could not be established, cause=connect ETIMEDOUT -