C#: How to check for the enclosure tag in a C# console multi-RSS feed reader
I have created a C# console application that reads multiple RSS feed URLs, extracts the values, and stores them in a database.

I now frequently run into a problem with feeds produced by the WordPress RSS generator, where the article image is carried in an enclosure tag.

How can I get and parse the article image URL when it is in the feed's enclosure? At the moment my code reads all image URLs from the description tag.

Here is the code I use to read the feed and write to the database:
using (var xmlReader = XmlReader.Create(izvorURLX))
{
    var rssFormatter = new Rss20FeedFormatter();
    rssFormatter.ReadFrom(xmlReader);
    foreach (SyndicationItem syndicationItem in rssFormatter.Feed.Items)
    {
        Console.OutputEncoding = Encoding.UTF8;
        string link = syndicationItem.Links[0].Uri.ToString();
        var statCat1 = Convert.ToString((""));
        foreach (var kategorija in syndicationItem.Categories.Take(1))
        {
            statCat1 = (kategorija.Name);
        }
        var rr = syndicationItem.AttributeExtensions.Values;
        var LastIZV1 = rssFormatter.Feed.LastUpdatedTime.DateTime;
        var SiteTitle = rssFormatter.Feed.Title.Text;
        var itemFD = Convert.ToString(syndicationItem.Summary.Text);
        // Strip HTML tags from the description.
        string clitemFD = Regex.Replace(itemFD, @"<[^>]*>", String.Empty, RegexOptions.IgnoreCase).Trim();
        var ItemItem = Convert.ToString(rssFormatter.Feed.Items);
        var ff = rssFormatter.Feed.Items.ToString();
        // Fall back to the current time when the item has no publish date.
        var datumIZV0 = syndicationItem.PublishDate.DateTime;
        var nula = Convert.ToDateTime("01.01.0001 00:00:00");
        var datumIZVX = Convert.ToDateTime(DateTime.Now);
        if (datumIZV0 == nula)
        {
            datumIZVX = Convert.ToDateTime(DateTime.Now);
        }
        else
        {
            datumIZVX = Convert.ToDateTime(datumIZV0);
        }
        XmlDocument doc = new XmlDocument();
        doc.Load(izvorURLX);
        // Look for the first image URL inside the description HTML.
        var imgSRC = Convert.ToString("");
        var reg1 = new Regex("src=(?:\"|\')?(?<imgSrc>[^>]*[^/].(?:bmp|jpg|jpeg|gif|png))(?:\"|\')?");
        var match1 = reg1.Match(itemFD);
        if (match1.Success)
        {
            Uri UrlImage = new Uri(match1.Groups["imgSrc"].Value, UriKind.Absolute);
            imgSRC = UrlImage.ToString();
        }
        var feedXML = Convert.ToString(izvorURLX);
        int KatX = Convert.ToInt32(KatIzv);
        var statTitle = Convert.ToString(syndicationItem.Title.Text);
        var statLink = Convert.ToString(syndicationItem.Links[0].Uri);
        SqlConnection conn = new SqlConnection("Server=localhost\\SQLEXPRESS;Database=RSSFeedAgregator;Integrated Security=true");
        conn.Open();
        var FeedID = Convert.ToInt32(0);
        var LastinDB = Convert.ToDateTime("01.01.0001 00:00:00");
        string FeedInDB = Convert.ToString("a");
        using (SqlCommand cmdX2 = new SqlCommand("SELECT Feed_ID, Izvor, LastUpd, feed, Kategorija, iID, izvTitle, statCat FROM [dbo].[tbl_feeds]", conn))
        {
            SqlDataReader readerX = cmdX2.ExecuteReader();
            while (readerX.Read())
            {
                Console.OutputEncoding = Encoding.UTF8;
                var feedTxt = Convert.ToString(readerX["feed"]);
                FeedID = Convert.ToInt32(readerX["Feed_ID"]);
                LastinDB = Convert.ToDateTime(readerX["LastUpd"]);
                FeedInDB = Convert.ToString(readerX["feed"]);
            }
            readerX.Close();
        }
        // Insert the item only if its text is not already stored.
        bool inList = DB.Contains(clitemFD);
        var statIMG = Convert.ToString("");
        if (inList == false)
        {
            Console.WriteLine("false: Ne postoi");
            using (SqlCommand cmd1 = new SqlCommand("INSERT INTO tbl_feeds VALUES (" + "@Izvor, @LastUpd, @feed, @Kategorija, @iID, @izvTitle, @statCat, @statTitle, @statLink, @statImage)", conn))
            {
                cmd1.Parameters.AddWithValue("@Izvor", feedXML);
                cmd1.Parameters.AddWithValue("@LastUpd", datumIZVX);
                cmd1.Parameters.AddWithValue("@feed", clitemFD);
                cmd1.Parameters.AddWithValue("@Kategorija", KatX);
                cmd1.Parameters.AddWithValue("@iID", IzvID);
                cmd1.Parameters.AddWithValue("@izvTitle", SiteTitle);
                cmd1.Parameters.AddWithValue("@statCat", statCat1);
                cmd1.Parameters.AddWithValue("@statTitle", statTitle);
                cmd1.Parameters.AddWithValue("@statLink", statLink);
                cmd1.Parameters.AddWithValue("@statImage", imgSRC);
                int rows = cmd1.ExecuteNonQuery();
                Console.WriteLine("Uspesno dodadeno nov zapis !");
            }
            conn.Close();
        }
        else
        {
            Console.WriteLine("true: Postoi");
        }
    }
}
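Sorry that my code is so long, but I hope someone can help me understand this better.

Update

I also found the following code, which reads the enclosure URL, but it reads it repeatedly: if the feed has 10 articles, this code reads all 10 image URLs 10 times and saves only the last one to the database.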
XmlNodeList items = doc.SelectNodes("//item");
for (int i = 0; i < items.Count; i++)
{
    var encImg = (items[i].SelectSingleNode("enclosure").Attributes["url"].Value);
}
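Can anyone modify this code so that it works properly?

Honestly, I cannot follow your code very well, so I suggest you indent and format it. You can use Ctrl+K, Ctrl+D for that, and you can look up more shortcuts as well.

That said, you can easily find the enclosure URL with the following lines of code: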
foreach (SyndicationItem syndicationItem in rssFormatter.Feed.Items)
{
    Console.OutputEncoding = Encoding.UTF8;
    // Reset per item so an item without an enclosure does not reuse the previous URL.
    string link = "";
    // Make sure `syndicationItem.Links` has more than 1 element before reading index 1.
    if (syndicationItem.Links.Count > 1)
    {
        // This line reads the URL of the "enclosure" tag:
        link = syndicationItem.Links[1].Uri.ToString();
    }
    // Prints the image's src.
    Console.WriteLine("Image src: " + link);
}
The code above prints:
Image src: http://a1on.mk/wp-content/uploads/2017/07/turcija-ucenici.jpg
Image src: http://a1on.mk/wp-content/uploads/2017/07/vlada-18juli.jpg
Image src: http://a1on.mk/wp-content/uploads/2017/07/tomas-greminger.jpg
Image src: http://a1on.mk/wp-content/uploads/2014/08/toplo.jpg
Image src: http://a1on.mk/wp-content/uploads/2017/06/grncarov.gif
Image src: http://a1on.mk/wp-content/uploads/2015/04/uprava-finansiska-policija.gif
Image src: http://a1on.mk/wp-content/uploads/2017/05/pritvor-turska.jpg
Image src: http://a1on.mk/wp-content/uploads/2017/07/kosarkari-do20.jpg
Image src: http://a1on.mk/wp-content/uploads/2017/07/vardar-fk-nat.jpg
Image src: http://a1on.mk/wp-content/uploads/2017/07/burgas.jpg
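Reading `Links[1]` relies on the enclosure always being the second link of an item. A slightly more defensive variant is to pick the link by its relationship type, since the RSS 2.0 formatter exposes an `<enclosure>` element as a `SyndicationLink` whose `RelationshipType` is "enclosure". The following is only a sketch: the class name `EnclosureReader` and the helpers `GetEnclosureUrl` and `LoadFeed` are hypothetical names introduced for illustration, and on newer .NET versions the `System.ServiceModel.Syndication` package has to be referenced separately.

```csharp
using System;
using System.IO;
using System.Linq;
using System.ServiceModel.Syndication;
using System.Xml;

public static class EnclosureReader
{
    // Returns the URL of the item's first enclosure link,
    // or an empty string when the item has no enclosure.
    public static string GetEnclosureUrl(SyndicationItem item)
    {
        SyndicationLink enclosure = item.Links.FirstOrDefault(l =>
            string.Equals(l.RelationshipType, "enclosure", StringComparison.OrdinalIgnoreCase));
        return enclosure?.Uri?.ToString() ?? "";
    }

    // Hypothetical helper: parses a feed from an XML string instead of a URL,
    // which makes the sketch easy to try without network access.
    public static SyndicationFeed LoadFeed(string rssXml)
    {
        using (var stringReader = new StringReader(rssXml))
        using (var xmlReader = XmlReader.Create(stringReader))
        {
            return SyndicationFeed.Load(xmlReader);
        }
    }
}
```

Inside the question's `foreach` loop this could be used as `imgSRC = EnclosureReader.GetEnclosureUrl(syndicationItem);`, so every item is stored with its own image URL, or with an empty string when the feed provides none.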
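Don't do that. You need to use an HTML parser. Also, your Convert.ToString calls are pointless. Use First instead of that strange loop.

I'm not an expert. Could you write code that shows how to read the enclosure tag? My code never defines any enclosure reading. Thanks. The Convert.ToString calls are there because I put all values into the database as strings, and I then read them back as strings as well, not from an XML file.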
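As for the `XmlDocument` loop from the update: the repeated reads most likely come from running a loop over all `//item` nodes inside the outer per-item `foreach`, so each pass overwrites the variable and only the last URL survives. This is an assumption, since the surrounding code is not shown. One way to avoid it is to read each item's link and enclosure together in a single pass. Below is a minimal sketch with LINQ to XML; `RssEnclosures` and `ReadLinkAndImage` are hypothetical names, and the feed is passed in as a string for simplicity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class RssEnclosures
{
    // Returns one (article link, image URL) pair per <item>.
    // A missing <link> or <enclosure> yields an empty string instead of an exception.
    public static List<KeyValuePair<string, string>> ReadLinkAndImage(string rssXml)
    {
        XDocument doc = XDocument.Parse(rssXml);
        return doc.Descendants("item")
            .Select(item => new KeyValuePair<string, string>(
                (string)item.Element("link") ?? "",
                (string)item.Element("enclosure")?.Attribute("url") ?? ""))
            .ToList();
    }
}
```

Because every pair is built from a single `<item>` element, each article keeps its own image URL, and the null checks cover feeds in which some items have no enclosure at all.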