xml - Python Scrapy XPath?


For a non-profit college assignment I'm trying to scrape data from the website www.rateyourmusic.com using the Scrapy framework in Python. I have had a small amount of success: I have been able to scrape the artist's name from the artist page with XPath, but the other info (birth date, nationality) is proving difficult for me to scrape. Does anyone know what the correct XPath for these objects would be? Here is my parsing method, which has at least worked for the artist name.

def parse_dir_contents(self, response):
    item = rateyourmusicartist()

    for sel in response.xpath('//div/div/div/div/table/tbody/tr/td'):
        item['dateofbirth'] = sel.xpath('td/text()').extract()  # these two selectors aren't working
        item['nationality'] = sel.xpath('td/a/text()').extract()

    for sel in response.xpath('//div/div/div/div/div/h1'):
        item['name'] = sel.xpath('text()').extract()  # this one works

    yield item

Here is a sample URL of an artist page I'm scraping: http://rateyourmusic.com/artist/kanye_west

Here is the real snippet of HTML you have on the page (you can see it if you open the page source).

<table class="artist_info">
<tr><td><div class="info_hdr">born</div> june 8, 1977, <a class="location" href="/location/atlanta/ga/united states">atlanta, ga, united states</a></td></tr>
<tr><td><div class="info_hdr">currently</div><a class="location" href="/location/hidden hills/ca/united states">hidden hills, ca, united states</a></td></tr>
</table>

To get the birthday, run this XPath (the text content of the first row in the table):

//table[@class='artist_info']/tr[1]/td/text() 

Result:

'june 8, 1977,'

To get the current location, run this XPath (the link text in the second row of the table):

//table[@class='artist_info']/tr[2]/td/a/text() 

Result:

'hidden hills, ca, united states'

