Iterating over a large number of XML documents
I have a list of roughly 56,000 XML documents. For each one I need to open it,
pull out an attribute, and verify the attribute's value against another list
(a CSV file).
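Every document is checked against the same CSV, so one option is to load that file into a dictionary once, before looping over the documents. The following is a minimal sketch. It assumes, hypothetically, that each CSV row has the form id,name with no header row. load_expected is an illustrative name, not part of any library.

```python
import csv
import io
from typing import Dict, TextIO


def load_expected(csv_file: TextIO) -> Dict[str, str]:
    """Read the reference CSV into a dict mapping id -> name.

    Assumes (hypothetically) that each row is `id,name` with no header.
    Dict lookups are constant time on average, so 56,000 checks stay cheap.
    """
    expected = {}
    for row in csv.reader(csv_file):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        expected[row[0].strip()] = row[1].strip()
    return expected


# In-memory example; for a real file use open("expected.csv", newline="").
sample = io.StringIO("123,Alpha\n456,Beta\n")
expected = load_expected(sample)
print(expected["456"])  # prints "Beta"
```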
At the moment I am using the following, which works for a single XML document:
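from bs4 import BeautifulSoup

soup = BeautifulSoup(xmlText, "xml")  # xmlText holds one document; the "xml" parser needs lxml
nameTag = soup.find('instrument', {"name": True})  # first <instrument> with a name attribute
idTag = soup.find('instrument', {"id": True})  # first <instrument> with an id attribute
print(idTag['id'] + "," + nameTag['name'])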
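For comparison, here is a sketch of the same extraction using Python's standard-library xml.etree.ElementTree instead of BeautifulSoup. This is a substitute technique, not what the snippet above uses. It avoids a third-party dependency and is often lighter when parsing thousands of small files. The sketch assumes the instrument elements carry no XML namespace. With a default namespace, the tag would have to be written as {namespace-uri}instrument. extract_id_and_name is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def extract_id_and_name(xml_text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (id, name), each taken from the first <instrument> that has it.

    Mirrors the two soup.find(...) calls: the id and the name may come from
    different <instrument> elements. Missing attributes come back as None.
    Assumes no XML namespace on the <instrument> tag.
    """
    root = ET.fromstring(xml_text)
    inst_id = inst_name = None
    for elem in root.iter("instrument"):  # document order, includes root itself
        if inst_id is None and "id" in elem.attrib:
            inst_id = elem.attrib["id"]
        if inst_name is None and "name" in elem.attrib:
            inst_name = elem.attrib["name"]
        if inst_id is not None and inst_name is not None:
            break  # both found; no need to scan further
    return inst_id, inst_name


doc = '<data><instrument id="123" name="Alpha"/></data>'
print(extract_id_and_name(doc))  # prints ('123', 'Alpha')
```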
This gives me the ID and the name of the item, which I can then compare
against my other list. But with 56,000 of these documents, what is the best
way to handle them? For each one I will need to download the document, load
it into BeautifulSoup, and pull out the name and ID. Can I do all of that in a
simple for loop?
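A plain for loop can do this, because each document can be handled independently. The sketch below uses only the standard library:

- It downloads each document with urllib.request.
- It parses the document with xml.etree.ElementTree, which stands in for BeautifulSoup.
- It compares the id/name pair against an id-to-name dictionary built from the CSV.
- It records per-document failures instead of stopping at the first one.

Much of the time for 56,000 downloads is probably spent waiting on the network, so the sketch also includes a thread-pool variant. The helper names, the 30-second timeout, and the 16-worker default are illustrative assumptions. The lookup rule (first instrument element with an id, first with a name, no namespace) mirrors the question's snippet.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Optional, Tuple
from urllib.request import urlopen

Problem = Tuple[str, str]  # (url, description of what went wrong)


def fetch(url: str) -> bytes:
    """Download one document; the 30 s timeout is an arbitrary choice."""
    with urlopen(url, timeout=30) as resp:
        return resp.read()


def check_one(url: str, expected: Dict[str, str],
              fetch_fn: Callable[[str], bytes]) -> Optional[Problem]:
    """Return None if the document's id/name pair matches the CSV, else a Problem."""
    try:
        root = ET.fromstring(fetch_fn(url))
    except (OSError, ET.ParseError) as exc:  # network failure or malformed XML
        return url, f"error: {exc}"
    # Like the soup.find calls: first <instrument> carrying each attribute.
    inst_id = next((e.get("id") for e in root.iter("instrument") if "id" in e.attrib), None)
    name = next((e.get("name") for e in root.iter("instrument") if "name" in e.attrib), None)
    if inst_id is None or name is None:
        return url, "missing id or name"
    if expected.get(inst_id) != name:
        return url, f"id {inst_id}: expected {expected.get(inst_id)!r}, got {name!r}"
    return None


def check_all(urls: Iterable[str], expected: Dict[str, str],
              fetch_fn: Callable[[str], bytes] = fetch) -> List[Problem]:
    """Sequential version: one document at a time in a plain for loop."""
    problems = []
    for url in urls:
        problem = check_one(url, expected, fetch_fn)
        if problem is not None:
            problems.append(problem)
    return problems


def check_all_threaded(urls: Iterable[str], expected: Dict[str, str],
                       fetch_fn: Callable[[str], bytes] = fetch,
                       workers: int = 16) -> List[Problem]:
    """Same checks, but overlaps the downloads using a pool of threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda u: check_one(u, expected, fetch_fn), urls)
        return [r for r in results if r is not None]  # map keeps input order
```

To run this over the real data, a call like check_all_threaded(urls, expected) would work. Here urls stands for the list of 56,000 document URLs and expected for the dictionary loaded from the CSV; both are assumed inputs. The fetch_fn parameter exists so the download step can be swapped out, for example to read from local files instead.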
Thanks a lot in advance.