NLP: Reading ingredients from food packaging with NLTK

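I am trying to read the ingredients from food packaging into my database. I can already extract the text from the package with pytesseract, and now I want to structure that data. The data looks quite complex; here is an example: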

RASPBERRY FILLING (INVERT SUGAR, CORN SYRUP, SUGAR, RASPBERRY PUREE, GLYCERIN, 
MALTODEXTRIN, MODIFIED CORN STARCH, RASPBERRY JUICE CONCENTRATE, SODIUM ALGINATE, 
METHYLCELLULOSE, VEGETABLE JUICE FOR COLOR (RADISH, CARROT, APPLE, BLACKCURRANT, 
HIBISCUS CONCENTRATES), MONOCALCIUM PHOSPHATE, XANTHAN GUM, DICALCIUM PHOSPHATE, 
CITRIC ACID, NATURAL FLAVORS, MALIC ACID), WHOLE GRAIN ROLLED OATS, 
WHOLE GRAIN WHEAT FLOUR, ENRICHED WHEAT FLOUR (BLEACHED WHEAT FLOUR, 
MALTED BARLEY FLOUR, NIACIN, REDUCED IRON, THIAMIN MONONITRATE, RIBOFLAVIN, 
FOLIC ACID), VEGETABLE OIL BLEND (CANOLA, PALM, PALM KERNEL), INVERT SUGAR, 
SUGAR, GLYCERIN, CONTAINS LESS THAN 2% OF THE FOLLOWING: WHEY, SOLUBLE CORN FIBER, 
CALCIUM CARBONATE HONEY, WHEAT BRAN, SALT, POTASSIUM BICARBONATE (LEAVENING), 
SORBITAN MONOSTEARATE, VITAL WHEAT GLUTEN, CORN STARCH, XANTHAN GUM, REDUCED IRON, 
NIACINAMIDE, PYRIDOXINE HYDROCHLORIDE (VITAMIN B6), DICALCIUM PHOSPHATE, ZINC OXIDE, 
VITAMIN A PALMITATE, FOLIC ACID, THIAMIN HYDROCHLORIDE (VITAMIN B1), RIBOFLAVIN 
(VITAMIN B2), CYANOCOBALAMIN (VITAMIN B12), NATURAL FLAVOR, MOLASSES,
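
I am new to NLP, so I started with NLTK. I tokenize the data and, where there are parentheses, I take the innermost contents as the real ingredients for that part and ignore the outer part. This works well for this example, but there are many other cases it cannot handle. Can someone point me to the best way to approach this problem? Please let me know if any further clarification is needed.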
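
For reference, the parenthesis handling described above can be sketched in plain Python (no NLTK). This is only a minimal illustration under assumptions: the function names `split_top_level`, `parse_ingredients`, and `leaves` are hypothetical, the input is assumed to be the comma-separated OCR text shown earlier, and prefixes such as "CONTAINS LESS THAN 2% OF THE FOLLOWING:" are not handled. It keeps the whole nesting as a tree rather than discarding the outer names, so the "innermost only" rule becomes one possible view of the data.

```python
def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)  # tolerate unbalanced OCR output
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]  # drop empty pieces (e.g. trailing comma)


def parse_ingredients(text):
    """Return a list of {"name", "children"} dicts, one per ingredient."""
    result = []
    # Collapse line breaks and repeated spaces left over from OCR.
    for item in split_top_level(" ".join(text.split())):
        open_idx = item.find("(")
        close_idx = item.rfind(")")
        if open_idx != -1 and close_idx > open_idx:
            name = item[:open_idx].strip()
            children = parse_ingredients(item[open_idx + 1:close_idx])
        else:
            name, children = item, []
        result.append({"name": name, "children": children})
    return result


def leaves(nodes):
    """Yield only the innermost names (the rule described in the question)."""
    for node in nodes:
        if node["children"]:
            yield from leaves(node["children"])
        else:
            yield node["name"]


sample = ("VEGETABLE OIL BLEND (CANOLA, PALM, PALM KERNEL), "
          "INVERT SUGAR, RIBOFLAVIN (VITAMIN B2)")
tree = parse_ingredients(sample)
print([node["name"] for node in tree])
# ['VEGETABLE OIL BLEND', 'INVERT SUGAR', 'RIBOFLAVIN']
print(list(leaves(tree)))
# ['CANOLA', 'PALM', 'PALM KERNEL', 'INVERT SUGAR', 'VITAMIN B2']
```

The second output hints at one case where the innermost rule may fall short: in "RIBOFLAVIN (VITAMIN B2)" the parenthesis is an alias rather than a sub-ingredient list, so the leaf is "VITAMIN B2" and the name "RIBOFLAVIN" is lost. Keeping the tree lets that decision be made later, per node.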