Selecting the most recent rows in a SQL query

sql hive

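I want to join two tables, selecting only the most recent row for each ID value in table 1.

That is, for each ID value in table 1, only the most recently added row for that ID should be returned. For example, table 1 looks like this: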

Columns: ID-value, date-added,      other-information
row 1:    ID_1,    21/2/2020-12:30, other_newer_information...
row 2:    ID_1,    21/2/1990-12:30, other_older_information...

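So if the same ID value is found twice in that table, only the most recent entry should be returned, which in the case above is row 1.

I then want to join these rows with information from a second table. For example, table 2 looks like this: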

Columns: column-present-in-table-1, another-column-present-in-table-1, other-columns
row 1:   some_data,                 some_more_data...                  additional data
row 2:-  some_data, infor_2:        some_more_data...                  additional data
etc

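My SQL query below works fine for joining the two tables. What I can't work out is how to return only the most recent row from table 1 when the same ID value has been entered on multiple dates. I'm also not sure whether the date filtering should be done as part of the SELECT or when the data is first pulled from table 1.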

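Looking elsewhere on StackOverflow, the suggestions are along the lines of MAX(date_time), but my understanding is that this returns only the maximum date_time value, not the most recent row. Please correct me if I'm wrong. My query is as follows: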

SELECT
    id_1,
    info_1,
    info_2,
    date_time,
    info_3,
    info_4,
    max(info_3),
    min(info_4)
FROM table_1
INNER JOIN table_2
    ON table_1.info_1 = table_2.infor_1
    AND table_1.info_2 = table_2.infor_2
WHERE id_1 in ("id1", "id2")
    AND info_3 = "10"
GROUP BY id_1, info_1, info_2, info_3, info_4
ORDER BY id_1, id_2, date_time DESC

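Other suggestions from StackOverflow:

SELECT TOP id_1 … min(info_4)
(gives a syntax error),

ORDER BY id_1 ... date_time DESC LIMIT 1
(returns only one row, i.e. the one with the most recent date_time),

ROW_NUMBER() OVER (PARTITION BY id ORDER BY date_time) AS "row_number"
(returns a row number, not the most recent row).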

So if the same ID value is found twice in my table, only the newer entry should be returned, i.e. row 1 in the example above.

You can use ROW_NUMBER():
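A minimal sketch of that approach, assuming the table and column names from the question (table_1, table_2, id_1, date_time, info_1, info_2, infor_1, infor_2) and assuming date_time is stored in a type or format that sorts chronologically (a string such as 21/2/2020-12:30 would not). The aliases t1, t2, latest and rn are made up for illustration. The rows of table_1 are numbered per ID from newest to oldest, and only the rows numbered 1 are joined to table_2:

```sql
-- Sketch only: keep the newest table_1 row per id_1, then join to table_2.
-- Aliases (t1, t2, latest, rn) are illustrative, not from the question.
SELECT latest.id_1,
       latest.date_time,
       latest.info_1,
       latest.info_2,
       t2.*
FROM (
    SELECT t1.*,
           -- rn = 1 marks the most recent row within each id_1
           ROW_NUMBER() OVER (PARTITION BY t1.id_1
                              ORDER BY t1.date_time DESC) AS rn
    FROM table_1 t1
) latest
INNER JOIN table_2 t2
    ON  latest.info_1 = t2.infor_1
    AND latest.info_2 = t2.infor_2
WHERE latest.rn = 1;
```

The window function on its own only labels the rows; it is the outer filter on rn = 1 that discards the older ones. If two rows for the same ID share the same date_time, one of them is picked arbitrarily.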

I don't really see what your query has to do with your question. If the "table" is actually the result of a query, then just use a CTE or subquery:

with t as (
      <your query here>
     )
<query with row_number here>
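Filled in, the skeleton above might look roughly like the following. This is a sketch under assumptions: the CTE body is a simplified stand-in for the question's join (without its aggregates), id_1 and date_time are assumed to come from table_1, and the names ranked and seqnum are invented for the example.

```sql
-- Sketch only: wrap the existing join in a CTE, then rank its rows.
WITH t AS (
    -- simplified stand-in for the question's query
    SELECT table_1.id_1,
           table_1.date_time,
           table_1.info_1,
           table_1.info_2
    FROM table_1
    INNER JOIN table_2
        ON  table_1.info_1 = table_2.infor_1
        AND table_1.info_2 = table_2.infor_2
    WHERE table_1.id_1 IN ('id1', 'id2')
)
SELECT ranked.id_1,
       ranked.date_time,
       ranked.info_1,
       ranked.info_2
FROM (
    SELECT t.*,
           -- seqnum = 1 marks the most recent row within each id_1
           ROW_NUMBER() OVER (PARTITION BY t.id_1
                              ORDER BY t.date_time DESC) AS seqnum
    FROM t
) ranked
WHERE ranked.seqnum = 1;
```

Here the ranking happens after the join rather than before it, so if one table_1 row matches several table_2 rows, only one of the joined rows per ID survives. Where CTEs are not available, the body of t can be moved into the FROM clause as an ordinary subquery.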

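Please provide sample data and desired results. Your explanation describes a single table, but your query is much more complicated.

I've updated the description. To put it more concisely: table 1 contains an ID column, and IDs have been added on multiple dates. For each ID value, I want to select only the most recently added row and then join those rows to table 2. The join works; fetching only the most recent row does not.

Thanks, your first option worked. I'm not sure whether my platform supports CTEs, but I'll try that next.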