当我必须在相当大的sqlite数据库中迭代多个表时,我有一个场景。在表格中,我存储了多年来行星在天空中的位置的信息。例如,对于火星,我有桌子Mars_2000,Mars_2001等等。表结构总是相同的:
|id:INTEGER|date:TEXT|longitude:REAL|问题是,对于特定的任务,我需要迭代这个表,这需要花费很多时间(对于10个以上的查询来说,这是痛苦的)。
我认为,如果将所有表合并为一个大表,那么性能可能会更好,因为一个大表中的一个查询比通过较小表的50个表要好。我想确保这可能是可行的,因为数据库是庞大的(约20 it ),重塑它将需要一段时间。
我刚才描述的这个计划可行吗?对于这种情况,还有其他解决办法吗?
这可能会有所帮助,因此我附加了生成SQL查询的函数,该函数对于每个表都是唯一的:
pub fn transition_query(
select_param: &str, // usually asterix
table_name: &str, // table I'd like to query
birth_degree: &f64, // constant number
wanted_degree: &f64, // another constant number
orb: &f64, // another constant number
upper_date_limit: DateTime<Utc>, // casts to SQL-like string
lower_date_limit: DateTime<Utc>, // casts to SQL-like string
) -> String {
let parsed_upper_date_limit = CelestialBodyPosition::parse_date(upper_date_limit);
let parsed_lower_date_limit = CelestialBodyPosition::parse_date(lower_date_limit);
return format!("
SELECT *,(SECOND_LAG>60 OR SECOND_LAG IS NULL) AS TRANSIT_START, (SECOND_LEAD > 60 OR SECOND_LEAD IS NULL) AS TRANSIT_END, time FROM (
SELECT
*,
UNIX_TIME - LAG(UNIX_TIME,1) OVER (ORDER BY time) as SECOND_LAG,
LEAD(UNIX_TIME,1) OVER (ORDER BY time) - UNIX_TIME as SECOND_LEAD FROM (
SELECT {select_param},
DATE(time) as day_scoped_date,
CAST(strftime('%s', time) AS INT) AS UNIX_TIME,
longitude
FROM {table_name}
WHERE ((-{orb} <= abs(realModulo(longitude -{birth_degree} -{wanted_degree},360))
AND abs(realModulo(longitude -{birth_degree} -{wanted_degree},360)) <= {orb})
OR
(-{orb} <= abs(realModulo(longitude -{birth_degree} +{wanted_degree},360))
AND abs(realModulo(longitude -{birth_degree} +{wanted_degree},360)) <= {orb}))
AND time < '{parsed_upper_date_limit}' AND time > '{parsed_lower_date_limit}'
)
) WHERE (TRANSIT_START AND NOT TRANSIT_END) OR (TRANSIT_END AND NOT TRANSIT_START) ;
");
}发布于 2022-07-01 05:36:02
我用程序解决了这个问题。整个过程是用锈菌库和r2d2_sqlite库完成的。我仍然在执行很多查询,但现在它是在线程中完成的。它允许我将执行时间从25s缩短到大约3s。下面是代码:
use std::sync::mpsc;
use std::thread;
use r2d2_sqlite::SqliteConnectionManager;
use r2d2;
let manager = SqliteConnectionManager::file("db_path");
let pool = r2d2::Pool::builder().build(manager).unwrap();
let mut result: Vec<CelestialBodyPosition> = vec![]; // Vector of structs
let (tx, rx) = mpsc::channel(); // Allows ansynchronous communication
let mut children = vec![]; //vector of join handlers (not sure if needed at all
for query in queries {
let pool = pool.clone(); // For each loop I clone connection to databse
let inner_tx = tx.clone(); // and messager, as each thread should have spearated one.
children.push(thread::spawn(move || {
let conn = pool.get().unwrap();
add_real_modulo_function(&conn); // this adds custom sqlite function I needed
let mut sql = conn.prepare(&query).unwrap();
// this does query, and maps result to my internal type
let positions: Vec<CelestialBodyPosition> = sql
.query_map(params![], |row| {
Ok(CelestialBodyPosition::new(row.get(1)?, row.get(2)?))
})
.unwrap()
.map(|position| position.unwrap())
.collect();
// this sends partial result to receiver
return inner_tx.send(positions).unwrap();
}));
}
// first messenger has to be dropped, otherwise program will wait for its input
drop(tx);
for received in rx {
result.extend(received); // combine all results
}
return result;正如你所看到的,没有任何优化发生在sqlite网站上,这让我觉得我做错了什么,但是现在还好。对生成线程的数量按下更多的控制可能更好。
https://stackoverflow.com/questions/72810996
复制相似问题